AU678942B2

AU678942B2 - Data compression encoder/decoder and method for efficient duplicate string handling

Info

Publication number: AU678942B2
Application number: AU52948/96A
Authority: AU
Inventors: Jeffrey T Klayman
Original assignee: Motorola Inc
Current assignee: Motorola Solutions Inc
Priority date: 1995-03-24
Filing date: 1996-01-16
Publication date: 1997-06-12
Anticipated expiration: 2016-01-16
Also published as: CN1148916A; EP0761041A1; AU5294896A; WO1996031007A1; CA2189566A1

Description

1

DATA COMPRESSION ENCODER/DECODER AND METHOD FOR EFFICIENT DUPLICATE STRING HANDLING

Field of the Invention

The present invention relates generally to data compression, and more particularly to increasing the efficiency of handling duplicate strings in the input data.

Background of the Invention

Data communication is the movement of computer- encoded information from one point to another by means of a transmission system. Data communication results in nearly instantaneous information exchange over long distances.

Data communication links data terminal equipment (DTE) such as terminals, printers or computers that transmit or receive data. Data communication equipment (DCE) is a device attached between a DTE and the communication channel that manipulates the transmitted signal or data. The DCE usually comprises a DTE interface for interfacing with the DTE, a random access memory (RAM) for storing a computer program and data, a microprocessor for executing the stored computer program, and a data pump for interfacing with the communication channel. The communication channel is often a telephone network, although it could be a cellular network, a digital communication network, or a satellite network.

The information sent by a transmitter DTE (TXDTE) to a receiver DTE (RXDTE) consists of a sequence of characters. The information generally contains a significant amount of redundancy. The information, therefore, may be compressed so that it can be transmitted in less time over a communication channel.

Among known data compression methods is the Ziv- Lempel 78 algorithm ("ZL78"). In the ZL78 algorithm, the transmit DCE (TXDCE) records the history of recently transmitted data by storing the strings in an encoder dictionary maintained in the TXDCE RAM. Each string in the encoder dictionary has assigned to it a corresponding codeword. By comparing successive elements of the current data with strings in the encoder dictionary, redundant data is found. The TXDCE, instead of sending the entire redundant sequence, sends the corresponding codeword. Data compression occurs whenever the number of bits required to send the codeword is less than the number of bits in the redundant data sequence. The process of generating the codeword from the redundant data sequence is typically called the encoding process, and the software or hardware in the TXDCE that performs the encoding process is typically called the encoder.

At the other end of the channel, the receiver DCE (RXDCE) maintains a decoder dictionary in the RXDCE RAM similar to the encoder dictionary maintained by the TXDCE. Upon receipt of the codeword from the TXDCE, the RXDCE uses the codeword to find the redundant data sequence in the decoder dictionary. The RXDCE then transmits the data sequence to the RXDTE. The process of recovering the redundant data sequence from the codeword is typically called the decoding process, and the software or hardware in the RXDCE that performs the decoding process is typically called the decoder.

The encoder in the TXDCE and the decoder in the RXDCE are peer data compression entities. The combination of the encoder in the TXDCE and the decoder in the RXDCE may be called a data compression system. A DCE typically includes both an encoder and a decoder and thus may act as a TXDCE in a first data compression system and as an RXDCE in a second data compression system.

One goal of a data compression system is to maximize data compression efficiency, i.e. to generate the fewest number of bits to represent the data. In a ZL78 data compression system, the more characters there are in the redundant data sequence, the greater the compression efficiency. Thus, a goal of a ZL78 data compression system is to add strings to the dictionary that represent the longest redundant data sequences that will be encountered in the data.

Another goal of a data compression system is to minimize the number of steps and amount of memory needed to compress and decompress the data. The number of steps may affect the number of instructions executed by a microprocessor, or the number of logic gates required in an integrated circuit or other hardware implementation. The amount of memory may affect the amount of RAM utilized, or the number of registers required in an integrated circuit or other hardware implementation. Thus, the complexity of the data compression system has a direct impact on the cost of the system, both in throughput (i.e. how fast the data is processed) and the number of resources required.

One implementation of a ZL78 algorithm is the CCITT (Comite Consultatif International de Telegraphie et Telephonie) V.42bis data compression standard. V.42bis maintains a finite number of strings in the dictionary. V.42bis is adaptive such that new strings are added to the dictionary and old strings are deleted from the dictionary so that the strings stored in the dictionary are representative of the most recent data. The V.42bis encoder and decoder build their respective dictionaries according to a set of rules. One method of building the dictionaries is by storing the strings in a tree structure data base with various levels of interconnected nodes. Both the encoder and decoder maintain a free entry in their respective dictionaries, where the free entry is a codeword with no corresponding string associated with it. A new string is added to the free entry, and an old string is deleted to become the free entry for a next new string. A full description of a procedure for creating the tree structure data base, adding strings to the tree structure data base, and deleting strings from the tree structure data base may be found in the CCITT V.42bis specification and in Clark, U.S. Patent No. 5,153,591 and Welsh, U.S. Patent No. 4,558,302. Other patents of interest include Holtz, U.S. Patent No. 4,366,551 and MacCrisken, U.S. Patent No. 4,730,348.

One requirement in the ZL78 data compression system is that a particular string must be assigned the same codeword in both the encoder dictionary and the decoder dictionary so that when the encoder matches a redundant sequence of characters in its dictionary and sends the corresponding codeword, the decoder recovers from the codeword the same redundant sequence of characters from the decoder dictionary. A second requirement in the ZL78 data compression system is that the encoder only send codewords for strings that exist in the decoder dictionary.

Since there is no string associated with the free entry in the encoder dictionary, the encoder will never match on the free entry and thus will never send the codeword that corresponds to the free entry. Similarly, the decoder should never receive the codeword associated with the free entry in the decoder vocabulary. If the decoder does receive the codeword for its free entry, it is considered an error, and some error recovery procedure must take place to resynchronize the encoder and decoder. As will be demonstrated below, however, the free entry in the encoder dictionary may at times be different than the free entry in the decoder dictionary. Therefore, the encoder must ensure that it does not send the codeword corresponding to the free entry in the decoder dictionary.

The V.42bis encoder operates generally as follows. The encoder employs a string matching procedure in order to find the longest matched string of input characters in the encoder dictionary. The string matching procedure is an iterative process, which begins by searching in the dictionary for a search string equal to a first character from a succession of input characters, and extending the search string one input character at a time until the search string is not matched in the vocabulary. The input character that caused the non-match may be called the unmatched character. The codeword representing the longest matched string is sent, and the search string is added to the dictionary if the length of the search string (i.e. the number of characters in the search string) does not exceed a predetermined maximum string length. The string added to the vocabulary may be called the last added string. The string matching procedure is restarted with the search string equal to the input character that caused the non-match. Note that the unmatched character will be the first character of the next longest matched string.

It is important to note that at the moment the encoder adds a new string (referred to above as the last added string) to its dictionary, the encoder dictionary includes a string that is not represented in the decoder dictionary. The decoder will not add the new string to its dictionary until it receives the codeword generated by the iteration of the string matching procedure immediately following the addition of the new string to the encoder dictionary. Thus, the encoder cannot send the codeword for the last added string until it has sent a codeword for a string other than the last added string. The codeword assigned to the last added string by the encoder is the same as the codeword for the free entry in the decoder dictionary (since the decoder will subsequently add the string to the decoder dictionary at the free entry, which must have the same codeword as was assigned to the string by the encoder). If the encoder were to send the codeword for the last added string, then the decoder would receive the codeword for its free entry, which is considered an error.

Thus, a problem arises when the encoder matches on the last added string, which can occur when there is a repeated string of characters in the data (for example, a run of a single character). The encoder cannot send the codeword for the last added string, but instead acts as if the match on the last added string was a non-match, except that the string is not added again to the dictionary. This is considered a premature termination of the string matching procedure, as the string matching procedure is terminated prior to the longest match, and no string is added to the dictionary. The encoder sends the codeword for the longest matched string, and restarts the string matching procedure with the search string equal to the unmatched character. Subsequent matches on the string are considered valid matches, and the encoder continues the string matching procedure as usual.

The effect of the prohibition against sending the codeword for the last added string is that the encoder must check if the matched string is the last added string

(sometimes called the duplicate string check) each time the encoder matches a string in the dictionary. In order to perform the duplicate string check, the encoder must remember what string was the last added string. Therefore, each time the encoder adds a string to its dictionary, it saves the string (or a representation of the string suitable for the execution of the duplicate string check) in RAM as the last added string. The saving of the last added string and the duplicate string check consume both executional and memory resources and generally add complexity to the encoder.

The premature termination of the string matching procedure lowers the efficiency of the compression by forcing the encoder to generate a codeword prior to the true longest matched string, and also by preventing the encoder from adding a new string to the encoder dictionary.

During operation of the data compression system, the decoder is generally one step behind the encoder, since the encoder adds a string to its dictionary one codeword before the decoder adds the same string to its dictionary. The encoder generally searches until it finds a longest matched string and an unmatched character. The encoder sends the codeword for the longest matched string and adds a new string to its dictionary, where the new string is the concatenation of the longest matched string and the unmatched character. The encoder then restarts the string matching procedure with the unmatched character. The next codeword sent by the encoder will therefore represent a string with the unmatched character as the first character of the string.

The decoder receives a series of codewords, and for each codeword, decodes the string represented by the codeword and adds a string to the decoder dictionary as required to keep the decoder dictionary synchronized with the encoder dictionary.

When the decoder decodes a current string from a codeword, the first character of the current string (sometimes called the innovation character) is the unmatched character from the previous string. The string that must be added by the decoder to the decoder dictionary is the concatenation of the previous string and the innovation character of the current string. In a typical V.42bis decoder, then, the decoder saves each decoded string as the previous string for processing of the next codeword.

The series of codewords received by the decoder may include one or more control codewords. A data compression system typically reserves one or more codewords for use in conveying control information from the encoder to the decoder. For example, V.42bis defines three control codewords for signaling the occurrence of a flush, step up, or mode switch by the encoder. No string is associated with a control codeword. For each codeword received, the decoder determines whether or not the codeword is a control codeword. Where the codeword is a control codeword, the decoder performs a corresponding control function; where the codeword is not a control codeword, the decoder decodes the string corresponding to the codeword.

Brief Descriptions of the Drawings

FIG. 1 is a flow diagram representing one embodiment of the process character procedure for the prior art V.42bis encoder.

FIG. 2 is a flow diagram representing one embodiment of a process character procedure in accordance with the present invention.

FIG. 3 is a flow diagram representing one embodiment of the procedure for decoding a non-control codeword in the prior art V.42bis decoder. FIG. 4 is a flow diagram representing a first embodiment of the procedure for decoding a non-control codeword in accordance with the present invention.

FIG. 5 is a flow diagram representing a second embodiment of the procedure for decoding a non-control codeword in accordance with the present invention.

FIG. 6 shows a block diagram of a data communication equipment as is known in the prior art.

FIG. 7 shows a functional block diagram of a data compression encoder in accordance with the present invention.

FIG. 8 shows a functional block diagram of a data compression decoder in accordance with the present invention.

FIG. 9 shows a functional block diagram of a data communication equipment with both transmit and receive modes in accordance with the present invention.

Detailed Description of a Preferred Embodiment

This invention increases the efficiency of both the encoder and decoder by eliminating the duplicate string check from the encoder and allowing the decoder to decode the correct string when the encoder sends the codeword for its last added string.

Elimination of the duplicate string check is counter¬ intuitive, since the purpose of the duplicate string check in the V.42bis encoder is to prevent the encoder from sending an invalid codeword to the decoder. However, elimination of the duplicate string check in the encoder of the present invention has a number of advantages. First, the encoder performs one less step when the string matching procedure detects a match, and thus the microprocessor executes fewer instructions. Second, the string matching procedure in the encoder continues until it finds a longest matched string (i.e. there is no premature termination of the string matching procedure), and thus compression efficiency is improved both by generating a codeword from the longest matched string and by updating the dictionary upon each non-match. Third, the step of saving the last added string each time a new string is added to the vocabulary, which consumes both microprocessor and memory resources, is eliminated. As a result, the encoder of the present invention provides a more efficient data compression system than that of the prior art.

The increased efficiency of the encoder is realized when the encoder matches on its last added string. In order for the encoder to match on its last added string, the last character of the string must be the same as the first character of the string. For example, if the input data comprises the characters 'ABCABCABC...' and the encoder dictionary includes the string 'ABC, then the encoder would match the first 'ABC string as the longest matched string, send the codeword for string 'ABC, and add the non-matched string 'ABCA' to the vocabulary. String 'ABCA' would be the last added string. The encoder would continue the string matching procedure from the second 'ABC string and would match the string 'ABCA'. In the V.42bis encoder, this would be a match on the last added string, so the encoder would treat it as a non-match and would send the codeword for string 'ABC without adding a string to the vocabulary. However, in the encoder of the present invention, this would be a longest matched string, so the encoder of the present invention would send the codeword for string 'ABCA' and add the non-matched string 'ABCAB' to the encoder dictionary. In the V.42bis encoder, the string matching procedure would continue from the third 'A'; in the encoder of the present invention, the string matching procedure would continue from the third 'B'. The V.42bis encoder would be able to match the string 'ABCA', but the encoder of the present invention would be able to match both 'ABCA' and 'ABCAB'.

The decoder of the present invention allows the decoder to decode the correct string when the encoder sends the codeword for its last added string. Each string decoded by the decoder is saved as the previous string for the next codeword. Then the decoder decodes the next codeword, appends the first character of the decoded string onto the previous string, and adds the resulting string to its vocabulary at the free entry. However, when the encoder sends the codeword for its last added string, the decoder receives the codeword corresponding to the free entry in the decoder vocabulary. In the V.42bis decoder, this would be considered an error. However, in the decoder of the present invention, the decoder decodes the codeword by appending the first character of the previous string onto the previous string. The resulting string is added to the decoder dictionary at the free entry.

In the above example, the encoder of the present invention first sends the codeword for string 'ABC and then sends the codeword for string 'ABCA'. In response to the codeword for string 'ABC, the decoder of the present invention recovers the string 'ABC from its dictionary and saves it as the previous string. In response to the codeword for string 'ABCA', which is the codeword corresponding to the free entry in the decoder dictionary, the decoder appends the first

(innovation) character of the previous string ('A') onto the previous string ('ABC), and the resulting string ('ABCA') is added to the decoder vocabulary at the free entry. The decoder correctly decodes the string and correctly adds the string to the decoder dictionary. Although the free entry in the decoder of the present invention will typically be a valid codeword (i.e. will have a string associated with it), there are at least two times when the free entry will be invalid. First, the free entry will be invalid immediately following a reset of the dictionary, where there is no previous string stored in the decoder. Second, the free entry will be invalid when the decoder receives the codeword for a maximum length string. This occurs when the longest matched string in the encoder is a maximum length string, and the encoder does not add a string to the encoder dictionary (since extending the longest matched string by one character would create a string that exceeds the maximum string length). Therefore, there is no new last added node on which the encoder can match.

The free entry in the decoder will also be invalid whenever the encoder sends the codeword for a partially matched string (i.e. a string matched by the encoder prior to finding the longest matched string). This can occur, for example, when the encoder performs a "flush" operation (refer to the CCITT V.42bis specification). The flush operation comprises sending the codeword for the partially matched string, and if the encoder output is then non-octet aligned (i.e. the number of bits in the sequence of codewords generated by the encoder is not an exact multiple of eight), sending a control codeword to signal that the flush operation occurred. The decoder is unable to determine that a codeword represents a partially matched string, and therefore the decoder may fail to mark the free entry invalid. However, this can be remedied by requiring the encoder to send the "flush" control codeword following each flush operation and marking the free entry invalid upon receipt of the "flush" control codeword. The advantages of the encoding/decoding process of the present invention can best be seen where the input data is a run of characters. An example showing the encoding of a run of seven character (ex. 'AAAAAAA') will demonstrate the benefit of the encoder of the present invention over the V.42bis encoder. In the following two examples, it is assumed that the encoder dictionary includes the string 'A'.

The following is an example showing the processing of the string 'AAAAAAA' in the V.42bis encoder. The encoder finds the longest matched string 'A', sends the codeword for string 'A', adds string 'AA' to the dictionary, and restarts the string matching procedure from the second 'A'. The encoder then matches on string 'AA* which is the last added string, so the encoder sends the codeword for string 'A' and restarts the string matching procedure from the third 'A'. The encoder then finds the longest matched string 'AA', sends the codeword for string 'AA', adds string 'AAA' to the dictionary, and restarts the string matching procedure from the fifth 'A'. Finally, the encoder matches on string 'AAA' which is the last added string, so the encoder sends the codeword for string 'AA' and restarts the string matching procedure from the seventh 'A'. So, the V.42bis encoder sends four codewords to represent six characters (the seventh 'A' will be represented in the next codeword sent by the encoder) and adds two strings to the dictionary.

The following is an example showing the processing of the string 'AAAAAAA' in the encoder of the present invention. The encoder finds the longest matched string 'A', sends the codeword for string 'A', adds string 'AA' to the dictionary, and restarts the string matching procedure from the second 'A'. The encoder then finds the longest matched string 'AA', sends the codeword for string 'AA', adds string 'AAA' to the dictionary, and restarts the string matching procedure from the fourth 'A'. Finally, the encoder finds the longest matched string 'AAA', sends the codeword for string 'AAA', adds string 'AAAA' to the dictionary, and restarts the string matching procedure with the seventh 'A'. So, the encoder of the present invention sends three codewords to represent six characters (the seventh 'A' will be represented in the next codeword sent by the encoder) and adds three strings to the dictionary.

In the example above, not only was the encoder of the present invention more efficient than the V.42bis encoder in encoding the run of characters, but if the data were to include a subsequent run of the character 'A', the encoder of the present invention would be able to match up to the four character string 'AAAA' where the V.42bis encoder would only be able to match up to the three character string 'AAA'. Thus, the encoder of the present invention would be more efficient in encoding the subsequent run also.

FIG. 1 is a flow diagram representing one embodiment of the process character procedure for the prior art V.42bis encoder. The method processes one input character. The method (102) includes the steps of: 1) searching (104) in a dictionary for a new string equal to the concatenation of a matched string and the input character; 2) where there is an unsuccessful match (106 - Not Found), adding (108) the new string to the dictionary, sending (1 10) a codeword to represent the matched string, and setting (1 12) the matched string equal to the input character; 3) where there is a successful match (106 - Found), testing (114) whether the new string is equal to a last added string; 4) where the new string is equal to the last added string (1 14 - Yes), sending (1 10) a codeword to represent the matched string, and setting (1 12) the matched string equal to the input character; 5) where the new string is a string other than the last added string (114 - No), setting (116) the matched string equal to the new string. FIG. 2 is a flow diagram representing one embodiment of a process character procedure in accordance with the present invention. The method processes one input character. The method (202) includes the steps of: 1 ) searching (204) in a dictionary for a new string equal to the concatenation of a matched string and the input character; 2) where there is an unsuccessful match (206 - Not Found), adding (208) the new string to the dictionary, sending (210) a codeword to represent the matched string, and setting (212) the matched string equal to the input character; 3) where there is a successful match (206 - Found), setting (216) the matched string equal to the new string.

FIG. 3 is a flow diagram representing one embodiment of the procedure for decoding a non-control codeword in the prior art V.42bis decoder. The method processes one non-control codeword. The method (302) includes the steps of: 1 ) testing (304) whether the codeword is equal to a free entry; 2) where the codeword is equal to the free entry (304 - Yes), taking an appropriate error recovery action; 3) where the codeword is one other than the free entry (304 - No), decoding (306) the codeword to get a current string, outputting (308) the current string, and searching (310) a dictionary for a new string equal to the concatenation of a previous string and an innovation character of the current string; 4) where there is an unsuccessful match (312 - Not Found), adding (314) the new string to the dictionary and setting (316) the previous string equal to the current string; 5) where there is a successful match (412 - Found), setting (416) the previous string equal to the current string.

FIG. 4 is a flow diagram representing a first embodiment of the procedure for decoding a non-control codeword in accordance with the present invention. The method processes one non-control codeword. The method (402) includes the steps of: 1 ) testing (404) whether the codeword is equal to a free entry; 2) where the codeword is one other than the free entry (404 - No), decoding (406) the codeword to get a current string; 3) where the codeword is equal to the free entry (404 - Yes), setting (418) the current string equal to the concatenation of a previous string and an innovation character of the previous string; 4) outputting (408) the current string; 5) searching (410) a dictionary for a new string equal to the concatenation of the previous string and an innovation character of the current string; 6) where there is an unsuccessful match (412 - Not Found), adding (414) the new string to the dictionary and setting (416) the previous string equal to the current string; 7) where there is a successful match (412 - Found), setting (416) the previous string equal to the current string.

FIG. 5 is a flow diagram representing a second embodiment of the procedure for decoding a non-control codeword in accordance with the present invention. The method processes one non-control codeword. The method (502) includes the steps of: 1 ) testing (504) whether the codeword is valid; 2) where the codeword is invalid (504 - No), taking an appropriate error recovery action; 3) where the codeword is valid (304 - Yes), decoding (506) the codeword to get a current string, outputting (508) the current string, and searching (510) a dictionary for a new string equal to the concatenation of a previous string and an innovation character of the current string; 4) where there is an unsuccessful match (512 - Not Found), adding (514) the new string to the dictionary and testing (518) whether the length of the current string is equal to a maximum string length; 5) where there is a successful match (512 - Found), testing (518) whether the length of the current string is equal to the maximum string length; 6) where the length of the current string is less than the maximum string length (518 - No), assigning (520) to a free entry a string equal to the concatenation of the current string and an innovation character of the current string and setting (516) the previous string equal to the current string; 7) where the length of the current string is equal to the maximum string length (518 - Yes), setting (522) the free entry to be invalid and setting (516) the previous string equal to the current string.

FIG. 6, numeral 600, shows a block diagram of a data communication system as is known in the art. DTE 602 is coupled to DCE 604. DCE 604 includes a DTE interface 614 through which it interfaces with DTE 602. DTE 602 sends information for transmission (TXD) to DCE 604 via DTE interface 614. DCE 604 includes a microprocessor 606. Microprocessor 606 performs the functions of a data compression encoder 608 and a data compression decoder 610. Data compression encoder 608 takes TXD and compresses the TXD into codewords, if possible. Data pump 616 sends the compressed TXD via a communication channel 618 to a DCE/DTE pair at some other location. As is commonly used in data communication, an "RX" prefix indicates "receiver", while a "TX" prefix indicates "transmitter".

Similarly, data pump 616 obtains compressed RXD from the communication channel 618. Data compression decoder 610 then decompresses the compressed RXD into RXD. DTE interface 614 sends the RXD to DTE 602.

RAM 612 is coupled to microprocessor 606. RAM 612 contains, among other things, the encoder and decoder dictionaries and the program controlling the microprocessor.

FIG. 7, numeral 700, shows a functional block diagram of a data compression encoder (702) utilizing the device of the present invention. String Matching Unit 706 takes as input character strings, and searches in the Encoder Dictionary 704 using the method of the present invention to find the longest matched string. Upon finding a longest matched string, an optional Codeword Generator 708 outputs a codeword corresponding to the longest matched string. Also upon finding a longest matched string, an optional Encoder Dictionary Update Unit 712 updates the Encoder Dictionary 704 by adding a new string and recovering a free entry.

FIG. 8, numeral 800, shows a functional block diagram of a data compression decoder (802) utilizing the device of the present invention. String Decoding Unit 806 accepts as input codewords, and decodes each codeword into its corresponding character string using the Decoder Dictionary 804 and optionally the previous string stored in the optional String Memory Unit 810 using the method of the present invention. The String Decoding Unit 806 outputs the decoded character string. Decoder Dictionary 804 is used to store previously decoded character strings, with each character string having a corresponding codeword. Optional String Memory Unit 810 is used to store a previous codeword or previously decoded character string. After a codeword is decoded, optional Decoder Dictionary Update Unit 812 updates the Decoder Dictionary 804 by adding a new string and recovering a new free entry. Decoder Dictionary Update Unit 812 optionally associates a duplicate string equal to the concatenation of the decoded string and the innovation character of the decoded string with the new free entry using the method of the present invention. After decoding the codeword, the String Decoding

Unit 806 optionally stores the codeword or the decoded string in the String Memory Unit 810.

FIG. 9, numeral 900, shows a functional block diagram of a DCE 901 with at least one of a transmit and a receive mode utilizing the device of the present invention. Clearly, where selected, the DCE may be operated with the device of the present invention only in transmit or only in receive mode, or in both modes. Also, in this example, but clearly selectably, the TX data pump 914 and the RX data pump 920 are located within the DCE 901 having both transmit and receive modes. Transmitter DCE (TXDCE) 902 communicates with a remote Receiver DCE (RXDCE) by way of a communication channel. In most cases, a DCE contains both a TXDCE and a RXDCE.

TXDCE 902 receives TXD via the transmitter DTE (TXDTE) interface 906. Data compression encoder 908 compresses the TXD into compressed TXD using the method of the present invention.

Optional TX error correction 912 receives compressed TXD from data compression encoder 908, and sends the data to TX data pump 914 for transmission via a communication channel to a remote RXDCE.

RX data pump 920 receives compressed RXD from the communication channel. Optional RX error correction 918 processes the compressed RXD, and sends the data to the data compression decoder 922 which decompresses the compressed RXD into RXD using the method of the present invention. The

Receiver DTE (RXDTE) interface 924 then typically sends the RXD to the attached DTE.

Although FIG. 9 shows a DCE as the device for efficient processing of a duplicate string in an encoder/decoder, the encoder/decoder with efficient duplicate string processor may reside in a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC) or other integrated circuit, discrete logic circuits, programmable logic circuits, or any combination thereof. The figures described above do not represent complete encoder and decoder implementations. The figures are intended only to show how the invention works, and only those parts of the data compression system that are pertinent to the invention are shown. It will be obvious to those skilled in the art how to integrate the invention into a new or existing data compression process, system, or device.

Although exemplary embodiments are described above, it will be obvious to those skilled in the art that many alterations and modifications may be made without departing from the invention. Accordingly, it is intended that all such alterations and modifications be included within the spirit and scope of the invention as defined in the appended claims.

I claim:

Claims

CLAIMS 2 l

1. An efficient method for encoding a duplicate string in a data compression system, comprising the steps of:

1 A) searching for a longest matched string of input characters in an encoder dictionary; and 1 B) upon matching the longest matched string of input characters in the encoder dictionary, where said longest matched string of input characters is equal to a last added string, sending a codeword corresponding to a free entry in a decoder dictionary to represent the longest matched string.

2. The method of claim 1 wherein searching for the longest matched string of input characters in the encoder dictionary comprises:

2A) appending a next input character onto a partially matched string to form a search string;

2B) searching in the encoder dictionary for a match on the search string;

2C) recycling, upon a successful match of the search string in the encoder dictionary, to step 2A without checking whether the search string is equal to the last added string, and using the search string as the partially matched string; and 2D) setting, upon an unsuccessful match of the search string in the encoder dictionary, a longest matched string equal to said partially matched string, and classifying said next input character as a nonmatched character; and where selected, further including, after step 2D, the step of sending a codeword to represent the longest matched string; and where further selected, further including the step of adding the search string to the encoder dictionary; and where further selected, at least one of 2E-2F:

2E) wherein adding the search string to the encoder dictionary comprises associating the search string with a free entry in the encoder dictionary; and 2F) including the step of recovering a next free entry from the encoder dictionary and where selected, including one of 2F1-2F2:

2F1 ) wherein recovering the next free entry from the encoder dictionary comprises deleting a leaf node from the vocabulary tree; and 2F2) further including the step of, upon an unsuccessful match of the search string in the encoder dictionary, continuing searching using the nonmatched character as the partially matched string.

3. An efficient method for decoding a duplicate string in a data compression system, comprising the steps of: 3A) upon receiving a codeword for a free entry in a decoder dictionary, decoding a current string equal to the concatenation of an immediately previous string and an innovation character of the immediately previous string; and 3B) upon receiving a codeword for a node other than the free entry, decoding a current string equal to a string in a decoder dictionary corresponding to the codeword.

4. The method of claim 3 including at least one of 4A-4F: 4A) wherein decoding the current string upon receiving the codeword for the free entry comprises forming the current string from the concatenation of the immediately previous string saved in a memory and the innovation character of the immediately previous string;

4B) wherein decoding the current string upon receiving the codeword for the free entry comprises the steps of:

4B1 ) decoding the immediately previous string from an immediately previous codeword saved in a memory; and 4B2) forming the current string from the concatenation of the immediately previous string and the innovation character of the immediately previous string;

4C) further including the step of outputting the current string, and where selected, further including the steps of: 4C1 ) searching in the decoder dictionary for a match on a new string, where said new string is equal to the concatenation of the immediately previous string and the innovation character of the current string; and

4C2) upon an unsuccessful match of the new string in the decoder dictionary, adding the new string to the decoder dictionary by associating the new string with the free entry, and where further selected, further including the step of recovering a next free entry from the decoder dictionary, and where further selected, at least one of 4C3-4C5:

4C3) wherein recovering the next free entry from the decoder dictionary comprises deleting a leaf node from the vocabulary tree; 4C4) further including the step of saving the current string in a memory as the immediately previous string for a next iteration of the decoder; and

4C5) further including the step of saving the codeword in a memory as the immediately previous string for a next iteration of the decoder; 4D) including the steps of: 4D1 ) determining whether the codeword is for a free entry in a decoder dictionary or for a node other than the free entry in the decoder dictionary by comparing the codeword with a codeword associated with the free entry; 4D2) where the codeword is for the free entry in the decoder dictionary, decoding a current string equal to the concatenation of an immediately previous string and an innovation character of the immediately previous string; and

4D3) where the codeword is for a node other than the free entry, decoding a current string equal to a string in a decoder dictionary corresponding to the codeword, and where selected,

4D4) further including the step of outputting the current string; and where further selected,

4D5) further including the steps of: 4D5a) searching in the decoder dictionary for a match on a new string, where said new string is equal to the concatenation of the immediately previous string and the innovation character of the current string; and

4D5b) upon an unsuccessful match of the new string in the decoder dictionary, adding the new string to the decoder dictionary by associating the new string with the free entry, and where further selected,

4D6) further including the step of recovering a next free entry from the decoder dictionary; and where further selected, at least one of 4D6a-4D6c: 4D6a) wherein recovering the next free entry from the decoder dictionary comprises deleting a leaf node from the vocabulary tree; 4D6b) further including the step of saving the current string in a memory as the immediately previous string for a next iteration of the decoder; and

4D6c) further including the step of saving the codeword in a memory as the immediately previous string for a next iteration of the decoder; 4E) wherein decoding the current string upon receiving the codeword for the free entry comprises decoding a string in the decoder dictionary corresponding to the codeword, where the string equal to the concatenation of the immediately previous string and the innovation character of the immediately previous string had previously been assigned to the codeword for the free entry; and where selected, further including the step of outputting the current string; and where further selected, further including the steps of 4E1 -4E2: 4E1 ) searching in the decoder dictionary for a match on a new string, where said new string is equal to the concatenation of the immediately previous string and the innovation character of the current string; and

4E2) upon an unsuccessful match of the new string in the decoder dictionary, adding the new string to the decoder dictionary by associating the new string with the free entry; and where further selected, further including the step of recovering a next free entry from the decoder dictionary; and where further selected, at least one of 4E3-4E4:

4E3) wherein recovering the next free entry from the decoder dictionary comprises deleting a leaf node from the vocabulary tree; and 4E4) further including the step of associating with the next free entry the string formed by the concatenation of the current string and the innovation character of the current string and where further selected one of 4E4a-4E4b:

4E4a) further including the step of saving the current string in a memory as the immediately previous string for a next iteration of the decoder; and 4E4b) further including the step of saving the codeword in a memory as the immediately previous string for a next iteration of the decoder; F) including the steps of:

4F1 ) determining whether the codeword is associated with a valid entry or is invalid; 4F2) where the codeword is invalid, initiating an error recovery procedure;

4F3) where the codeword is valid and the codeword corresponds to a node other than a free entry in a decoder dictionary, decoding a current string equal to a string in the decoder dictionary corresponding to the codeword;

4F4) where the codeword is valid and the codeword corresponds to the free entry in the decoder dictionary, decoding the codeword as a codeword for a node other than the free entry, where the string equal to the concatenation of the immediately previous string and the innovation character of the immediately previous string had previously been assigned to the codeword for the free entry; and where selected,

4F5) further including the step of outputting the current string; and where further selected, including the steps of:

4F5a) searching in the decoder dictionary for a match on a new string, where said new string is equal to the concatenation of the immediately previous string and the innovation character of the current string; and 4F5b) upon an unsuccessful match of the new string in the decoder dictionary, adding the new string to the decoder dictionary by associating the new string with the free entry; and where selected,

4F6) further including the step of recovering a next free entry from the decoder dictionary, and where selected, one of 4F6a-46Fb:

4F6a) wherein recovering the next free entry from the decoder dictionary comprises deleting a leaf node from the vocabulary tree;

4F6b) further including the steps of:

4F6b1 ) comparing the length of the current string with a predetermined maximum string length;

4F6b2) where the length of the current string is equal to the predetermined maximum string length, marking the next free entry as invalid; and 4F6b3) where the length of the current string is unequal to the predetermined maximum string length, associating with the next free entry the string formed by the concatenation of the current string and the innovation character of the current string, and where selected, one of 4F6b3- 4F6b4:

4F6b3) further including the step of saving the current string in a memory as the immediately previous string for a next iteration of the decoder; and 4F6b3) further including the step of saving the codeword in a memory as the immediately previous string for a next iteration of the decoder.

5. A device for efficient encoding of a duplicate string in a data compression system comprising:

5A) an encoder dictionary for storing strings of characters, said encoder dictionary having in it a free entry and zero or more strings, each string having associated with it a corresponding codeword; and

5B) a string matching unit for finding in the encoder dictionary a longest matched string of input characters from a succession of input characters by iteratively searching in the encoder dictionary for a match to a search string initially equal to a single input character, and extended by one input character upon each successful match in the encoder dictionary without first checking whether the matched search string is equal to a last added string in the encoder dictionary.

6. The device of claim 5 further comprising a codeword generator for outputting a codeword corresponding to the longest matched string, and where selected, further comprising an encoder dictionary update unit providing means for adding a new string to the encoder dictionary, where the new string is equal to the concatenation of the longest matched string and a next input character, and where further selected, at least one of 6A-6B:

6A) wherein the means for adding the new string to the encoder dictionary comprises assigning to the new string a codeword corresponding to the free entry in the encoder dictionary without saving a pointer to the free entry as the last added string; and

6B) wherein the encoder dictionary update unit further comprises means for recovering a next free entry from the encoder dictionary; and where selected, one of 6B1 -

6B2:

6B2a) wherein the means for recovering the next free entry comprises deleting a string from the encoder dictionary; and 6B2b) further wherein the string matching unit begins iteratively searching from the character immediately following the longest matched string.

7. A device for efficient decoding of a duplicate string in a data compression system comprising:

7A) a decoder dictionary for storing strings of characters, said decoder dictionary having in it a free entry and zero or more strings, each string having associated with it a corresponding codeword; and 7B) a string decoding unit with means for generating a current string from a received codeword, wherein said current string is equal to one of:

7B1 ) the string in the decoder dictionary corresponding to the received codeword, where the received codeword corresponds to an entry in the decoder dictionary other than the free entry; and

7B2) the string formed from the concatenation of a previously decoded string and the innovation character of the previously decoded string, where the received codeword is equal to the free entry in the decoder dictionary.

8. The device of claim 7 further comprising at least one of 8A-8D)

8A) a string memory unit for storing the previously decoded string; 8B) further comprising a string memory unit for storing a previously received codeword; 8C) the means for generating the current string when the codeword corresponds to the free entry in the decoder dictionary comprises generating the current string equal to the string in the decoder dictionary corresponding to the received codeword, where the string equal to the concatenation of the immediately previous string and the innovation character of the immediately previous string had previously been assigned to the codeword for the free entry; and 8D) wherein the means for generating the current string from the received codeword determining whether the received codeword is valid or invalid, and where the received codeword is valid, further comprising generating the current string equal to the string in the decoder dictionary corresponding to the received codeword.

9. A data communications equipment comprising at least one of a transmitter data communications equipment and a receiver data communications equipment, wherein: the transmitter data communications equipmentcomprises:

9A) a transmitter DTE interface for accepting transmit data from a transmitter DTE;

9B) an encoder with efficient duplicate string processor for encoding the transmit data into compressed data, further comprising:

9B1 ) an encoder dictionary for storing strings of characters, said encoder dictionary having in it a free entry and zero or more strings, each string having associated with it a corresponding codeword; and 9B2) a string matching unit for finding in the encoder dictionary a longest matched string of input characters from a succession of input characters by iteratively searching in the encoder dictionary for a match to a search string initially equal to a single input character, and extended by one input character upon each successful match in the encoder dictionary without first checking whether the matched search string is equal to a last added string in the encoder dictionary; and 9C) a transmitter data pump for sending the compressed data over a communication channel; and where selected, further comprising a transmitter error correction unit between the encoder and the transmitter data pump; and the receiver comprises a receiver data communications equipment comprising:

9D) a receiver data pump for receiving compressed data from a communication channel;

9E) a decoder with efficient duplicate string processor for decoding the compressed data into receive data, further comprising:

9E1 ) a decoder dictionary for storing strings of characters, said decoder dictionary having in it a free entry and zero or more strings, each string having associated with it a corresponding codeword; and

9E2) a string decoding unit with means for generating a current string from a received codeword, wherein said current string is equal to one of:

9E2a) the string in the decoder dictionary corresponding to the received codeword, where the received codeword corresponds to an entry in the decoder dictionary other than the free entry; and

9E2b) the string formed from the concatenation of a previously decoded string and the innovation character of the previously decoded string, where the received codeword is equal to the free entry in the decoder dictionary; and

9F) a receiver DTE interface for sending the receive data to a receiver DTE, and where selected, further comprising a receiver error correction unit between the receiver data pump and the decoder.

1 0. A device for efficient encoding and decoding of a duplicate string in a data compression system comprising: 10A) an encoder with efficient duplicate string processor, further comprising:

10A1 ) an encoder dictionary for storing strings of characters, said encoder dictionary having in it a free entry and zero or more strings, each string having associated with it a corresponding codeword; and

10A2) a string matching unit for finding in the encoder dictionary a longest matched string of input characters from a succession of input characters by iteratively searching in the encoder dictionary for a match to a search string initially equal to a single input character, and extended by one input character upon each successful match in the encoder dictionary without first checking whether the matched search string is equal to a last added string in the encoder dictionary; and

10B) a decoder with efficient duplicate string processor, further comprising:

10B1 ) a decoder dictionary for storing strings of characters, said decoder dictionary having in it a free entry and zero or more strings, each string having associated with it a corresponding codeword; and

10B2) a string decoding unit with means for generating a current string from a received codeword, wherein said current string is equal to one of: 10B2A) the string in the decoder dictionary corresponding to the received codeword, where the received codeword corresponds to an entry in the decoder dictionary other than the free entry; and 10B2B) the string formed from the concatenation of a previously decoded string and the innovation character of the previously decoded string, where the received codeword is equal to the free entry in the decoder dictionary.