WO2000038331A1 - An efficient, locally-adaptive data reduction method and apparatus - Google Patents

An efficient, locally-adaptive data reduction method and apparatus

Info

Publication number
WO2000038331A1
Authority
WO
WIPO (PCT)
Prior art keywords
token
cache
stored
data
reducer
Application number
PCT/US1999/030530
Other languages
French (fr)
Inventor
Henry Collins
Original Assignee
Citrix Systems, Inc.
Application filed by Citrix Systems, Inc. filed Critical Citrix Systems, Inc.
Priority to AU20580/00A priority Critical patent/AU2058000A/en
Publication of WO2000038331A1 publication Critical patent/WO2000038331A1/en

Classifications

    • H - ELECTRICITY
    • H03 - ELECTRONIC CIRCUITRY
    • H03M - CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 - Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3086 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77

Definitions

  • the token cache 12 stores received data tokens and may be implemented as any convenient memory element or memory data structure. For example, if tokens are 8-bit bytes, the token cache 12 may be implemented as a byte-wide memory chip such as SRAM, DRAM, SDRAM, or flash memory. Alternatively, the token cache 12 may be implemented as an array memory structure in which data tokens are stored. In these embodiments, the array memory structure matches the size of the tokens. In one advantageous embodiment, the token cache 12 is implemented as two arrays. One array stores encodings of data tokens, that is, the array maps tokens to their position in the output list. A second array, which corresponds to the first array, stores the decodings of positions, that is, it maps positions in an output list to the corresponding token value.
  • the comparator 14 compares received data tokens to the token cache 12.
  • the comparison of the received data token to the token cache 12 is effected by comparing the received data token to every element in the token cache 12.
  • FIG. 3A depicts an embodiment of the comparator 14 which combinatorially compares a received data token to the token cache 12.
  • a 4-bit nibble is shown as the token size and only the block of logic 30 required to compare one entry in the token cache with a received data token is shown.
  • Circuitry 34 compares each bit of the received data token with the corresponding bit of an element in the token cache.
  • comparison circuitry 34 includes two AND gates 35, 36 and an OR gate 37.
  • the bits to be compared are delivered to the inputs of AND gate 35.
  • the output of this gate is positive only when both bits are equal to a logical "1" value.
  • the inversion of each bit is delivered to the inputs of AND gate 36.
  • the output of AND gate 36 is high only when both bits are a logic "0".
  • the output of AND gates 35, 36 are connected to the inputs of OR gate 37, which outputs a logic "1” if the bits to be compared are both "0" or "1".
  • the result of each individual comparison, that is, the output of each OR gate 37, is combined with the results of the other OR gates in the logic block 30.
  • the output of AND gate 38 is a logic "1" only when every output of the OR gates 37 is a logic "1".
  • a logic "1" output from AND gate 38 indicates that the received data token stored in buffer memory element 32 matches a token stored in the token cache 12.
  • the output of the respective AND gates 38 can be used to determine the position of the matching data token.
  • FIG. 3B shows a block diagram of a hardware implementation for swapping two data tokens in the token cache 12.
  • the token cache 12 is implemented as a plurality of token-wide latches that can be simultaneously, selectively read.
  • the outputs of the token cache 12 are fed back to the inputs of the token cache 12 through a crossbar switching element 39.
  • the crossbar switching logic 39 allows any output of the token cache 12 to be fed to any input of the token cache 12.
  • This implementation allows data tokens to be swapped in one clock cycle.
  • the token cache 12 is provided as token-wide RAM memory elements.
  • each token cache location is read and the output is stored in a latch element. After both entries are read, they are written back to the memory element using the address of the other token entry. The addresses may be latched to associate them with the token entry.
  • the token cache 12 may be provided as dual-port RAM to allow the token cache 12 to be written and read at the same time.
  • the circuitry described above in relation to FIG. 3B may still be used, provided that the selected token to be removed from the token cache 12 is not fed back to the inputs of the cache 12.
  • a system 40 for transmitting reduced data includes a transmitter 42 and a receiver 44.
  • the transmitter 42 and the receiver 44 communicate over a communications channel 46 that may be a local area network connection or a wide area network connection.
  • Communications channel 46 may use any suitable communications protocol.
  • Channel 46 may be a wireless connection.
  • the transmitter 42 includes a token cache 12, a comparator 14, and a repositioning mechanism 16.
  • the token cache 12 is provided as two 256-byte arrays. One of the arrays will be referred to as Encode and the other as Decode, and their entries satisfy the property that Decode[Encode[T]] == T for every token T stored in the cache.
  • the comparator 14 accesses the Encode array to determine the current encoding for token T. That encoding is provided to transceiver 48, which transmits the encoding over the communications channel 46. The comparator 14 determines with which token the received data token should be swapped, and the repositioning mechanism swaps the two tokens, updating both the Encode and Decode arrays so that they remain consistent with one another.
  • the program may be written in any one of a number of high level languages such as FORTRAN, PASCAL, JAVA, C, C++, or BASIC.
  • the software could be implemented in an assembly language directed to the microprocessor resident on the target computer; for example, the software could be implemented in Intel 80x86 assembly language if it were configured to run on an IBM PC or PC clone.
  • the software may be embodied on an article of manufacture including, but not limited to, a floppy disk, a hard disk, an optical disk, a magnetic tape, a PROM, an EPROM, an EEPROM, a field-programmable gate array, or a CD-ROM.
  • the software may be configured to run on any personal-type computer or workstation such as a PC or PC-compatible machine, an Apple Macintosh, a Sun workstation, etc.
  • any device could be used as long as it is able to perform all of the functions and capabilities described herein.
  • the particular type of computer or workstation is not central to the invention.
  • the computer 500 typically will include a central processor 520, a main memory unit 522 for storing programs and/or data, an input/output (I/O) controller 524, a display device 526, and a data bus 528 coupling these components to allow communication therebetween.
  • the memory 522 includes random access memory (RAM) and read only memory (ROM) chips.
  • the computer 500 typically also has one or more input devices 530 such as a keyboard 532 (e.g., an alphanumeric keyboard and/or a musical keyboard), a mouse 534, and, in some embodiments, a joystick 536.
  • the computer 500 typically also has a hard drive 550 with hard disks therein and a floppy drive 552 for receiving floppy disks such as 3.5 inch disks.
  • Other devices 560 also can be part of the computer 500 including output devices (e.g., printer or plotter) and/or optical disk drives for receiving and reading digital data on a CD-ROM.
  • one or more computer programs define the operational capabilities of the system 500, as mentioned previously. These programs can be loaded onto the hard drive 550 and/or into the memory 522 of the computer 500 via the floppy drive 552.
  • the controlling software program(s) and all of the data utilized by the program(s) are stored on one or more of the computer's storage mediums such as the hard drive 550, CD-ROM, etc.
  • the programs implement the invention on the computer 500, and the programs either contain or access the data needed to implement all of the functionality of the invention on the computer 500.
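The combinational comparator of FIG. 3A lends itself to a simple software model. The sketch below is written in Python purely for illustration (the patent does not specify an implementation language for this logic, and the function names are hypothetical). It mirrors the gate structure directly: AND gates 35 and 36 with OR gate 37 form a per-bit equality test (an XNOR built from simpler gates), and AND gate 38 combines the four per-bit results.

```python
def bit_equal(a, b):
    """One copy of comparison circuitry 34: AND gate 35 is high when both
    bits are 1, AND gate 36 is high when both inverted bits are 1 (i.e.
    both bits are 0), and OR gate 37 combines the two results."""
    gate35 = a & b                 # both bits are logic "1"
    gate36 = (a ^ 1) & (b ^ 1)     # AND of the inverted bits: both "0"
    return gate35 | gate36         # OR gate 37: bits are equal

def nibble_match(token, entry):
    """AND gate 38: high only when all four bit comparisons are high."""
    result = 1
    for i in range(4):
        result &= bit_equal((token >> i) & 1, (entry >> i) & 1)
    return result

print(nibble_match(0b1010, 0b1010))  # 1 (token matches the cache entry)
print(nibble_match(0b1010, 0b1000))  # 0 (bit 1 differs)
```

One such block of logic per cache entry, evaluated in parallel, reproduces the position-detection behavior described for the AND gates 38.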


Abstract

An apparatus for efficiently reducing data includes a token cache, a comparator, and a repositioning mechanism. The token cache stores a plurality of processed tokens. The comparator compares an input token to the token cache and outputs an indication that the input token is stored by the cache at a first position. Alternatively, the comparator may indicate that the input token is not stored by the token cache. A repositioning mechanism swaps the input token with a second token already stored in the token cache, the second token selected responsive to the position of the input token. Corresponding methods are also described.

Description

AN EFFICIENT, LOCALLY-ADAPTIVE DATA REDUCTION METHOD AND APPARATUS
Field of the Invention
The present invention relates to data reduction methods and, in particular, to methods and apparatus for efficiently reducing data in a locally-adaptive manner.
Background of the Invention
Information processing systems and data transmission systems frequently need to store large amounts of digital data in a mass memory device or to transfer large amounts of digital data using a resource which may only carry a limited amount of data at a time, such as a communications channel. Therefore, approaches have been developed to increase the amount of data that can be stored in memory and to increase the information carrying capacity of capacity-limited resources. Most conventional approaches to realizing such increases are costly, because they require the installation of additional resources or the physical improvement of existing resources. Data reduction, in contrast with other conventional approaches, provides such increases without incurring large costs. In particular, it does not require the installation of additional resources or the physical improvement of existing resources.
Data reduction methods and apparatus remove redundancy from an input data stream, while still preserving the information content. An input data stream can be a stream of data to be transmitted or a file to be compressed, and the input data stream is sometimes referred to as an alphabet A of symbols. The data reduction methods and apparatus which are of the greatest interest are those which are fully reversible, such that an original data stream may be reconstructed from reduced data without any loss of information content. Techniques, such as filtering, which are not fully reversible, are sometimes suitable for reducing the size of visual images or sound data. They are, nevertheless, not suitable for reductions of program image files, textual report files and the like, because the information content of such files must be preserved exactly. There are two major goals in digital data reduction. The first goal is to maximize reduction by using the fewest possible bits to represent a given quantity of input data. The second goal is to minimize the resources required to perform reduction and reconstruction. The second goal encompasses such objectives as minimizing computation time and minimizing the amount of memory required to reduce and reconstruct the data. Data reduction methods of the prior art typically achieve only one of these goals. For example, one reduction technique is "move-to-front" coding. The basic idea of this method is to maintain the alphabet A of symbols as a list where frequently occurring symbols are located near the front. A symbol "s" is encoded as the number of symbols that precede it in this list. Thus if A=("a", "m", "o", "n", . . .) and the next symbol in the input stream to be encoded is "n", it will be encoded as "3", since it is preceded by three other symbols. This encoding may be supplied to any appropriate standard variable-size bit-encoding scheme, such as Huffman encoding.
After symbol "n" is encoded, it is stored (or moved if it was previously stored) at the front of A. Thus, after encoding "n" the list is modified to A=("n", "a", "m", "o", . . .). This move-to-front step reflects the hope that once "n" has been read from the input stream, it will be read many more times and will, at least for a while, be a common symbol. The move-to-front method is locally adaptive since it adapts itself to the frequencies of symbols in local areas of the input stream. Unfortunately, this technique consumes a large amount of computational resources when the list A is large because each symbol in the list must be moved when a new symbol is brought to the front of list A. A variation of move-to-front coding, called "move-ahead-k," attempts to solve the computational efficiency problems of move-to-front coding by moving a list element matched by the current symbol k positions toward the front, instead of all the way to the front of list A. The parameter k can be specified by the user, with a default value of either n or 1. However, this variation does not eliminate moving multiple list elements. The computational load, therefore, may not be significantly lessened.
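The prior-art move-to-front scheme described above can be sketched in a few lines. This is a minimal Python illustration of the background technique, not the patent's method; the function name is hypothetical.

```python
def mtf_encode(stream, alphabet):
    """Move-to-front coding: each symbol is encoded as the number of
    symbols that precede it in list A, then moved to the front of A."""
    a = list(alphabet)           # working copy of the alphabet list A
    out = []
    for s in stream:
        i = a.index(s)           # number of symbols preceding s
        out.append(i)
        a.insert(0, a.pop(i))    # the costly step: move s to the front
    return out

# With A = ("a", "m", "o", "n"), symbol "n" is encoded as 3 and then
# moved to the front, giving A = ("n", "a", "m", "o").
print(mtf_encode("n", ["a", "m", "o", "n"]))  # [3]
```

The `insert`/`pop` pair hides the linear cost the text complains about: in a naive array implementation, every element ahead of the matched symbol must shift by one position.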
A second variation of move-to-front coding attempts to lessen computational load by moving an element of the list A to the front only after it has been matched c times to symbols from the input stream (not necessarily c consecutive times). In this variation, known as "wait-c-and-move," each element of A should have a counter associated with it, to count the number of matches. As above, however, this variation does not avoid shuffling the elements of the list A and the concomitant computational load, but merely delays it.
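The wait-c-and-move variation can be sketched the same way. Again this is an illustrative Python rendering of the prior-art description, with hypothetical names; each element's counter delays, but does not avoid, the move-to-front shuffle.

```python
def wait_c_and_move_encode(stream, alphabet, c=2):
    """'wait-c-and-move': a list element is moved to the front only after
    it has matched c input symbols (not necessarily consecutively)."""
    a = list(alphabet)
    counts = {s: 0 for s in a}   # one match counter per element of A
    out = []
    for s in stream:
        i = a.index(s)
        out.append(i)
        counts[s] += 1
        if counts[s] >= c:       # c matches seen: now pay the move cost
            a.insert(0, a.pop(i))
            counts[s] = 0
    return out

# With c=2, the first two "n" symbols are both encoded as 3; only then
# is "n" moved to the front, so the third is encoded as 0.
print(wait_c_and_move_encode("nnn", ["a", "m", "o", "n"], c=2))  # [3, 3, 0]
```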
Summary of the Invention
The present invention provides efficient data reduction systems and apparatus that exhibit the desirable properties of the move-to-front reduction schemes discussed above while avoiding their computational intensity. The present invention therefore provides data reduction techniques and apparatus that operate quickly and with minimal computational load. In one aspect, the invention relates to an apparatus for efficiently reducing data which includes a token cache, a comparator, and a repositioning mechanism. The token cache stores a plurality of processed tokens. The comparator receives as input a first token to be processed and provides as output an indication that the first token is stored by the cache at a first position. The repositioning mechanism swaps the first token with a second token stored in the token cache. The second token is selected in response to the position of the first token.
In another aspect, the present invention relates to a reducer which can be used in a system for efficiently reducing data. The reducer includes a token cache, a receiver, a decoder, and a repositioning mechanism. The token cache stores a plurality of processed tokens. The receiver receives token position information. The decoder accepts as input the received position information and provides as output a first token corresponding to the position information. The repositioning mechanism swaps the first token with a second token stored in the token cache. The second token is selected responsive to the received position information.
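The reducer of this aspect can be sketched as follows. This Python sketch is illustrative only: the patent names no implementation language here, the halving rule stands in for whichever fraction is in use, and `fraction` is a hypothetical parameter name. The key point is that the receiver applies the same repositioning rule as the sender, so the two token caches stay synchronized.

```python
def receive_position(pos, cache, fraction=2):
    """Receiver sketch: decode the transmitted position into a token,
    then swap that token with the entry at 1/fraction of its position,
    mirroring the sender's repositioning mechanism."""
    token = cache[pos]                            # decoder: position -> token
    j = pos // fraction                           # second token, selected
                                                  # from the received position
    cache[pos], cache[j] = cache[j], cache[pos]   # repositioning mechanism
    return token

cache = ["a", "m", "o", "n"]
print(receive_position(2, cache))  # "o"; cache is now ["a", "o", "m", "n"]
```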
In yet another aspect, the invention relates to a method for efficiently reducing data. The method includes the steps of determining the position occupied by a first data token in a token cache, selecting a second data token stored in the token cache, and swapping the first token and the second token. The second data token is selected responsive to the position of the first token.
In yet another aspect, the invention relates to a method for receiving transmitted data in a system for efficiently transmitting data. The method includes the steps of receiving a transmitted token position indicator, identifying a first token stored in a token cache using the received token position indicator, determining a second token stored in the token cache based on the position of the first token, and swapping the first token and the second token.
Brief Description of the Drawings
The invention is pointed out with particularity in the appended claims. The advantages of the invention described above, and further advantages, may be better understood by reference to the following description taken in conjunction with the accompanying drawings, in which: FIG. 1 is a block diagram of one embodiment of a data reduction apparatus; FIG. 2 is a flowchart of one embodiment of the steps taken to reduce data; FIG. 3A is a logic diagram of one embodiment of a comparator as used in the present invention;
FIG. 3B is a block diagram of an embodiment of token switching logic; FIG. 4 is a block diagram of an exemplary system using the apparatus and method of the present invention to efficiently transmit data; and
FIG. 5 is a diagram illustrating the components of a general purpose computer.
Detailed Description of the Invention
Throughout the Specification, reference will be made interchangeably to "data tokens" or "symbols". A data token or symbol is any conveniently-sized datum in which the described technique may find utility. Thus, a data token may be a 4-bit nibble, an 8-bit byte, a 16-bit word, a 32-bit longword, or some other conveniently sized datum.
Referring now to FIG. 1, an apparatus 10 for efficiently reducing data includes a token cache 12, a comparator 14, and a repositioning mechanism 16. The token cache 12 stores data tokens. For simplicity, reference throughout will be made to the token cache storing tokens, although it should be understood to include embodiments in which the token cache stores representations of tokens. The comparator 14 compares tokens or symbols from an input stream to the token cache 12, and provides to the repositioning mechanism 16 an indication whether a representation of the current token is stored in the token cache 12. The input stream may be a stream of data tokens to be transmitted, or it may be a file to be reduced. The apparatus outputs a string of encodings which can be used by a decoding unit to reconstruct the reduced information. Referring also to FIG. 2, the steps taken by the apparatus 10 to efficiently reduce data are shown. A data token to be processed is received (step 102). The data token may be received as one of a stream of tokens to be transmitted over a communications channel, such as a local area network connection, a wide area network connection, or a wireless network connection. In some embodiments, the data token may be accessed from a buffer memory (not shown in FIG. 1) in which received tokens are stored before being processed. The buffer memory may be one or more transceivers embodied as integrated circuits. Alternatively, the data token may be accessed from a file to be reduced using the method of the invention. In general, use of the term
"received" is intended to refer to any method of accessing a data token for reduction, whether or not it is buffered before processing.
Once the data token is received (step 102), the comparator 14 determines whether a representation of the received data token is present in the token cache 12 (step 104). In one embodiment, the comparator 14 makes this determination by comparing the received data token with every entry in the token cache 12. In other embodiments, the token cache 12 includes two arrays, one of which stores token encodings and the other of which stores token decodings. An encoding array maps tokens to their respective positions in the output list. A decoding array maps positions in the output list to tokens. In these embodiments, the encoding array may be provided with a flag that indicates whether a token is contained in the token cache 12. The comparator 14 accesses the appropriate element of the encoding array to determine whether the received data token is stored in the token cache and, if so, what position it occupies in the output list. For example, an encoding for token "c" could be stored in the third element of an array. Alternatively, the token encoding could be stored in the 99th element of an array, which corresponds to the ASCII encoding (0x63) of "c". When a "c" token is received, the comparator 14 refers to the encoding array to determine if "c" is stored in the token cache; the encoding array may additionally provide the comparator 14 with the position of the token "c".
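The two-array organization described above can be sketched in Python (an illustrative language choice; the flag value and function names are assumptions, not part of the patent). A single indexed read of the encoding array plays the comparator's role, answering both whether the token is cached and, if so, at which position.

```python
NOT_PRESENT = -1                    # flag: token not contained in the cache

def make_cache(size=256):
    """Two corresponding arrays: Encode maps a token value to its
    position in the output list; Decode maps a position back to the
    token value stored there."""
    encode = [NOT_PRESENT] * size
    decode = [None] * size
    return encode, decode

def lookup(token, encode):
    """Comparator step 104/106: one indexed read tells us whether the
    token is stored and, if so, what position it occupies."""
    pos = encode[ord(token)]
    return pos != NOT_PRESENT, pos

encode, decode = make_cache()
encode[ord("c")] = 2                # suppose "c" currently occupies position 2
decode[2] = "c"
print(lookup("c", encode))          # (True, 2)
print(lookup("d", encode))          # (False, -1)
```

Indexing by `ord(token)` corresponds to storing the encoding for "c" at the array element given by its ASCII code, as in the text's example.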
If a received data token is stored by the token cache 12, the position of the data token in the token cache 12 is determined (step 106). This determination may be made by the comparator 14 when it determines if the data token representation is stored in the token cache 12. Alternatively, a separate functional unit may be provided which independently determines the position of the token representation.
Once the position of the data token is determined, a second data token is selected (step 108) and the received data token is swapped with the second data token stored in the token cache 12 (step 110). Selection of the second token is made responsively to the position of the received data token. For example, the second token may have a position equal to three-quarters, one-half, one-quarter, one-eighth, or one-sixteenth the current position of the received data token. That is, the second token may be closer to the head of the list by three-quarters, one-half, one-quarter, one-eighth, or one-sixteenth the position of the first token. Selection of the second data token may also be made in response to current performance characteristics of the data reduction. For example, the apparatus 10 may determine that the data reduction achieved by swapping received tokens with data tokens at one-half their position is not acceptable and can begin using some other rule, such as three-quarters, to attempt to improve performance. Once the tokens are swapped, the next token in the input stream is processed until no more tokens exist in the stream.

Referring back to step 104, if the comparator 14 determines that a received data token is not stored in the token cache 12, a second data token is selected (step 120). The second data token may be selected using a pseudorandom number generator. In some embodiments, the second data token is selected using the bits from successive numbers in the Fibonacci number sequence. Alternatively, a subset of those bits, such as the bottom three or top five, may be used. In other embodiments, well-known cache management techniques such as most-recently-used (MRU) or least-recently-used (LRU) may be used to select the second data token. The selected second data token is replaced by the received data token (step 122), and the next token in the input stream is processed until no more tokens exist in the stream.
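The hit and miss paths just described can be modeled in software as follows. This Python sketch is illustrative only: the dictionary-plus-list cache layout, the one-half swap rule, and the choice to return the raw token on a miss are assumptions made for the example, not requirements of the method.

```python
import random

def reduce_step(token, encode, decode, rng=random):
    """Process one token against the cache.

    On a hit (steps 106-110), swap the token with the entry at one-half
    its current position, moving frequent tokens toward the head of the
    list. On a miss (steps 120-122), evict a pseudorandomly selected
    entry and install the new token in its place.
    """
    pos = encode.get(token)
    if pos is not None:
        # Hit: swap with the token occupying half the current position.
        half = pos // 2
        other = decode[half]
        decode[pos], decode[half] = other, token
        encode[other], encode[token] = pos, half
        return pos
    # Miss: pseudorandom victim selection (a simple stand-in for the
    # Fibonacci-bit or MRU/LRU policies mentioned above).
    victim_pos = rng.randrange(len(decode))
    encode.pop(decode[victim_pos], None)
    decode[victim_pos] = token
    encode[token] = victim_pos
    return token
```

Processing a cached token 'd' at position 3, for example, swaps it with the token at position 1, so the next occurrence of 'd' emits a smaller encoding.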
Referring back to FIG. 1, and in greater detail, the apparatus 10 for performing the methods described above may be provided as hardware or software executing on a general-purpose computer. The token cache 12 stores received data tokens and may be implemented as any convenient memory element or memory data structure. For example, if tokens are 8-bit bytes, the token cache 12 may be implemented as a byte-wide memory chip such as SRAM, DRAM, SDRAM, or flash memory. Alternatively, the token cache 12 may be implemented as an array memory structure in which data tokens are stored. In these embodiments, the array memory structure matches the size of the tokens. In one advantageous embodiment, the token cache 12 is implemented as two arrays. One array stores encodings of data tokens, that is, the array maps tokens to their position in the output list. A second array, which corresponds to the first array, stores the decodings of positions, that is, it maps positions in an output list to the corresponding token value.
As noted above, the comparator 14 compares received data tokens to the token cache 12. In software, the comparison of the received data token to the token cache 12 is effected by comparing the received data token to every element in the token cache 12. FIG. 3 depicts an embodiment of the comparator 14 which combinatorially compares a received data token to the token cache 12. In the embodiment shown in FIG. 3, for simplicity, a 4-bit nibble is shown as the token size and only the block of logic 30 required to compare one entry in the token cache with a received data token is shown.
In the embodiment shown in FIG. 3, a received data token is stored in a buffer element 32. Circuitry 34 compares each bit of the received data token with the corresponding bit of an element in the token cache. In the embodiment depicted in FIG. 3, comparison circuitry 34 includes two AND gates 35, 36 and an OR gate 37. The bits to be compared are delivered to the inputs of AND gate 35. The output of this gate is positive only when both bits are equal to a logical "1" value. The inversion of each bit is delivered to the inputs of AND gate 36. The output of AND gate 36 is high only when both bits are a logic "0". The outputs of AND gates 35, 36 are connected to the inputs of OR gate 37, which outputs a logic "1" if the bits to be compared are both "0" or both "1". The result of each individual comparison, that is, the output of each OR gate 37, is combined with the results of the other OR gates in the logic block 30. The output of AND gate 38 is a logic "1" only when every output of the OR gates 37 is a logic "1". A logic "1" output from AND gate 38 indicates that the received data token stored in buffer memory element 32 matches a token stored in the token cache 12. In the embodiment shown in FIG. 3, the outputs of the respective AND gates 38 can be used to determine the position of the matching data token. Although only one logic block, which compares one token with a single received data token, is shown in FIG. 3, it will be readily apparent to one of ordinary skill in the art how to extend the embodiment shown to accommodate an entire token cache or different token sizes.
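The gate logic of FIG. 3 can be modeled bit-for-bit in software. The sketch below is an illustration only, not the claimed circuit: it mirrors the two AND gates and the OR gate of comparison circuitry 34, and the final AND gate 38, for the 4-bit token width used in the figure.

```python
def bit_equal(a, b):
    """One cell of comparison circuitry 34: AND gate 35 detects both
    bits high, AND gate 36 detects both bits low, and OR gate 37
    combines the two results (an XNOR built from AND/OR gates)."""
    return (a & b) | ((1 - a) & (1 - b))

def match_token(received, entry, width=4):
    """AND gate 38: output is 1 only when every per-bit comparison
    (every OR gate 37 output) is 1, i.e. the tokens match exactly."""
    result = 1
    for i in range(width):
        result &= bit_equal((received >> i) & 1, (entry >> i) & 1)
    return result
```

One such match_token evaluation corresponds to one logic block 30; a full cache comparison would instantiate one block per cache entry.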
As noted above, if the received data token is stored in the token cache 12, a second data token is selected and the positions of those tokens are swapped. FIG. 3B shows a block diagram of a hardware implementation for swapping two data tokens in the token cache 12. In the embodiment shown in FIG. 3B, the token cache 12 is implemented as a plurality of token-wide latches that can be simultaneously, selectively read. The outputs of the token cache 12 are fed back to the inputs of the token cache 12 through a crossbar switching element 39. The crossbar switching logic 39 allows any output of the token cache 12 to be fed to any input of the token cache 12. This implementation allows data tokens to be swapped in one clock cycle. In another implementation, the token cache 12 is provided as token-wide RAM memory elements. In this embodiment, each token cache location is read and the output is stored in a latch element. After both entries are read, they are written back to the memory element using the address of the other token entry. The addresses may be latched to associate them with the token entries. In other embodiments, the token cache 12 may be provided as dual-port RAM to allow the token cache 12 to be written and read at the same time.
For embodiments in which the received data token is not stored in the token cache 12, the circuitry described above in relation to FIG. 3B may still be used, provided that the selected token to be removed from the token cache 12 is not fed back to the inputs of the cache 12.
EXAMPLE

The following example illustrates one way in which the invention can be used and should not be read to unduly limit the invention.
Referring now to FIG. 4, a system 40 for transmitting reduced data is shown and includes a transmitter 42 and a receiver 44. The transmitter 42 and the receiver 44 communicate over a communications channel 46 that may be a local area network connection or a wide area network connection. Communications channel 46 may use any suitable communications protocol such as
Ethernet, TCP/IP, or ATM. Channel 46 may be a wireless connection.
The transmitter 42 includes a token cache 12, a comparator 14, and a repositioning mechanism 16. In this example, the token cache 12 is provided as two 256-byte arrays. One of the arrays will be referred to as Encode and the other as Decode, and their entries satisfy the property:
Decode[Encode[i]] = i for all 0 <= i <= 255. The Encode array and the Decode array may be initialized such that Encode[i] == i and Decode[i] == i for all elements i.
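The property above states that the two arrays are inverse permutations of one another, and it is preserved by every swap performed below. A brief Python check of the invariant (illustrative only; the helper name is an assumption):

```python
def is_consistent(encode, decode):
    """Check the stated invariant, Decode[Encode[i]] == i for all i,
    together with its converse: the arrays are inverse permutations."""
    n = len(encode)
    return (all(decode[encode[i]] == i for i in range(n))
            and all(encode[decode[i]] == i for i in range(n)))

# Identity initialization, Encode[i] == i and Decode[i] == i,
# satisfies the property trivially.
identity = list(range(256))
```

Any pair of arrays related by the same sequence of swaps on both sides remains consistent, which is what allows the receiver below to track the transmitter's state.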
When a data token T is received by the comparator 14, whether read from a file or received over a communications channel, the comparator 14 accesses the Encode array to determine the current encoding for token T. That encoding is provided to transceiver 48, which transmits the encoding over the communications channel 46. The comparator 14 determines with which token the received data token should be swapped, and the repositioning mechanism
16 performs the swap. In software, the actions of the transmitter may be modeled as follows:

    e = Encode[T]        /* What's the current encoding for token T? */
    Output[e]            /* Transmit encoding */
    half_e = e/2         /* What encoding should token T have next time? */
    X = Decode[half_e]   /* What token has the future encoding of T? */
    Decode[e] = X        /* Swap tokens */
    Decode[half_e] = T
    Encode[X] = e
    Encode[T] = half_e

The receiver 44 reconstructs T from e using its own copy of the token cache 12 constructed as two arrays, Encode2 and Decode2 (not shown in FIG. 4). The receiver 44 performs the following steps:
    Receive[e]            /* Receive encoding */
    T = Decode2[e]        /* What token does encoding e decode to? */
    half_e = e/2          /* What encoding should token T have next time? */
    X = Decode2[half_e]   /* What token has the future encoding of T? */
    Decode2[e] = X        /* Swap tokens */
    Decode2[half_e] = T
    Encode2[X] = e
    Encode2[T] = half_e
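Because the transmitter and receiver apply identical swaps to identically initialized arrays, the round trip is lossless. Assembled into a runnable Python sketch (an illustration of the example above, assuming 256-entry identity-initialized arrays):

```python
def transmit(data, size=256):
    """Encode a byte sequence with the half-position swap rule."""
    enc, dec = list(range(size)), list(range(size))
    out = []
    for t in data:
        e = enc[t]                  # current encoding for token t
        out.append(e)               # transmit the encoding
        half = e // 2               # t's encoding next time
        x = dec[half]               # token holding t's future encoding
        dec[e], dec[half] = x, t    # swap tokens
        enc[x], enc[t] = e, half
    return out

def receive(codes, size=256):
    """Mirror the transmitter's swaps to reconstruct the sequence."""
    enc, dec = list(range(size)), list(range(size))
    out = []
    for e in codes:
        t = dec[e]                  # decode token t from encoding e
        out.append(t)
        half = e // 2
        x = dec[half]
        dec[e], dec[half] = x, t    # apply the identical swap
        enc[x], enc[t] = e, half
    return out
```

A token that repeats has its encoding roughly halved on each occurrence, so frequent tokens migrate toward small encodings; for instance, transmitting token 100 four times in a row emits the encodings 100, 50, 25, 12.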
For embodiments in which the invention is provided as software, the program may be written in any one of a number of high level languages such as FORTRAN, PASCAL, JAVA, C, C++, or BASIC. Additionally, the software could be implemented in an assembly language directed to the microprocessor resident on the target computer; for example, the software could be implemented in Intel 80x86 assembly language if it were configured to run on an IBM PC or PC clone. The software may be embodied on an article of manufacture including, but not limited to, a floppy disk, a hard disk, an optical disk, a magnetic tape, a PROM, an EPROM, an EEPROM, a field-programmable gate array, or a CD-ROM. In these embodiments, the software may be configured to run on any personal-type computer or workstation such as a PC or PC-compatible machine, an Apple Macintosh, a Sun workstation, etc. In general, any device could be used as long as it is able to perform all of the functions and capabilities described herein. The particular type of computer or workstation is not central to the invention.

Referring to FIG. 5, the computer 500 typically will include a central processor 520, a main memory unit 522 for storing programs and/or data, an input/output (I/O) controller 524, a display device 526, and a data bus 528 coupling these components to allow communication therebetween. The memory 522 includes random access memory (RAM) and read only memory (ROM) chips. The computer 500 typically also has one or more input devices 530 such as a keyboard 532 (e.g., an alphanumeric keyboard and/or a musical keyboard), a mouse 534, and, in some embodiments, a joystick 536.
The computer 500 typically also has a hard drive 550 with hard disks therein and a floppy drive 552 for receiving floppy disks such as 3.5 inch disks. Other devices 560 also can be part of the computer 500 including output devices (e.g., printer or plotter) and/or optical disk drives for receiving and reading digital data on a CD-ROM. In the disclosed embodiment, one or more computer programs define the operational capabilities of the system 500, as mentioned previously. These programs can be loaded onto the hard drive 550 and/or into the memory 522 of the computer 500 via the floppy drive 552. In general, the controlling software program(s) and all of the data utilized by the program(s) are stored on one or more of the computer's storage mediums such as the hard drive 550, CD-ROM, etc. In general, the programs implement the invention on the computer 500, and the programs either contain or access the data needed to implement all of the functionality of the invention on the computer 500.
Having described certain embodiments of the invention, it will now become apparent to one of ordinary skill in the art that other embodiments incorporating the concepts of the invention may be used. Therefore, the invention should not be limited to certain embodiments, but rather should be limited only by the spirit and scope of the following claims.

Claims

CLAIMS

What is claimed is:

1. An apparatus for efficiently reducing data comprising: a token cache storing a plurality of processed tokens; a comparator receiving as input a first token to be processed and providing as output an indication that said first token is stored by said cache at a first position; and a repositioning mechanism swapping said first token with a second token stored in said token cache responsive to the indication of the first position.
2. The apparatus of claim 1 wherein said token cache comprises a random access memory.
3. The apparatus of claim 1 wherein said token cache comprises an ordered array of bytes.
4. The apparatus of claim 1 wherein said indication output by said comparator comprises a cache hit indicator and a cache position indicator.
5. The apparatus of claim 1 wherein said repositioning mechanism swaps said first token with a second token stored in said token cache at a second position, the second position being half the first position.
6. The apparatus of claim 1 further comprising a cache management unit, said cache management unit receiving as input said indication from said comparator and selecting a token to be removed from said token cache when said indication is negative.
7. The apparatus of claim 6 wherein said cache management unit selects a token to be removed using a pseudorandom number generator.
8. The apparatus of claim 6 wherein said cache management unit selects a token to be removed using bits from successive members of a Fibonacci number sequence.
9. The apparatus of claim 1 further comprising a transmitter transmitting said first position.
10. In a system for efficiently reducing data, a reducer comprising: a token cache storing a plurality of processed tokens; a receiver receiving token position information; a decoder accepting as input said received position information and providing as output a first token corresponding to said position information; and a repositioning mechanism swapping said first token with a second token stored in said token cache responsive to said received position information.
11. The reducer of claim 10 wherein said token cache comprises a random access memory.
12. The reducer of claim 10 wherein said token cache comprises an ordered array of bytes.
13. The reducer of claim 10 wherein said repositioning mechanism swaps said first token with a second token stored in said token cache at a second position, said second position being half said received position information.
14. The reducer of claim 10 wherein said receiver receives an indication of whether a token is stored in said token cache.
15. The reducer of claim 14 further comprising a cache management unit, said cache management unit receiving as input a token and a negative indication that said received token is stored in said token cache and selecting a token to be removed from said token cache.
16. The reducer of claim 15 wherein said cache management unit selects a token to be removed using a pseudorandom number generator.
17. The reducer of claim 15 wherein said cache management unit selects a token to be removed using bits from successive members of a Fibonacci number sequence.
18. A method for efficiently reducing data, the method comprising the steps of: (a) determining the position occupied by a first data token in a token cache; (b) selecting, responsive to the position of the first token, a second data token stored in the token cache; and (c) swapping the first token with the second token.
19. The method of claim 18 further comprising the step of receiving a first data token.
20. The method of claim 18 wherein steps (a) and (b) comprise: (a) determining that a first data token is not stored in a token cache; and (b) selecting a second data token stored in the token cache.
21. The method of claim 18 further comprising the step of transmitting the first token.
22. The method of claim 20 wherein step (b) comprises selecting, using a pseudorandom number generator, a second token stored in the token cache.
23. The method of claim 20 wherein step (b) comprises selecting, using the bits of successive numbers in a Fibonacci number sequence, a second token stored in the token cache.
24. In a system for efficiently transmitting data, a method for receiving transmitted data, the method comprising the steps of: (a) receiving a transmitted token position indicator; (b) identifying a first token stored in a token cache using the received token position indicator; (c) determining a second token stored in the token cache based on the position of the first token; and (d) swapping the first token and the second token.
25. The method of claim 24 wherein step (c) comprises determining a second token stored in the token cache, the position of the second token equal to half the position of the first token.
26. The method of claim 24 wherein steps (b) and (c) comprise (b) determining that said received first token is not stored in the token cache; and (c) selecting a second token stored in the token cache.
27. The method of claim 26 wherein step (c) comprises selecting, using a pseudorandom number generator, a second token stored in the token cache.
28. The method of claim 26 wherein step (c) comprises selecting, using bits from successive members of a Fibonacci number sequence, a second token stored in the token cache.
PCT/US1999/030530 1998-12-22 1999-12-21 An efficient, locally-adaptive data reduction method and apparatus WO2000038331A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU20580/00A AU2058000A (en) 1998-12-22 1999-12-21 An efficient, locally-adaptive data reduction method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US21883498A 1998-12-22 1998-12-22
US09/218,834 1998-12-22

Publications (1)

Publication Number Publication Date
WO2000038331A1 true WO2000038331A1 (en) 2000-06-29

Family

ID=22816695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/030530 WO2000038331A1 (en) 1998-12-22 1999-12-21 An efficient, locally-adaptive data reduction method and apparatus

Country Status (2)

Country Link
AU (1) AU2058000A (en)
WO (1) WO2000038331A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4796003A (en) * 1984-06-28 1989-01-03 American Telephone And Telegraph Company Data compaction
EP0283735A2 (en) * 1987-02-24 1988-09-28 Hayes Microcomputer Products, Inc. Adaptive data compression method and apparatus
US4870662A (en) * 1987-12-01 1989-09-26 Concord Data Systems, Inc. System and method for compressing transmitted or stored data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAYASHI S ET AL: "A NEW SOURCE CODING METHOD BASED ON LZW ADOPTING THE LEAST RECENTLY USED DELETION HEURISTIC", PROCEEDINGS OF THE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING, US, NEW YORK, IEEE, 1993, pages 190-193, XP000409284 *
SALOMON D: "DATA COMPRESSION", 1998, SPRINGER-VERLAG, NEW YORK, XP002135752 *
YEHESKEL BAR-NESS ET AL: "WORD BASED DATA COMPRESSION SCHEMES 1", PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, US, NEW YORK, IEEE, vol. SYMP. 22, 1989, pages 300-303, XP000131243 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110661759A (en) * 2018-06-30 2020-01-07 华为技术有限公司 Access detection method and device
CN110661759B (en) * 2018-06-30 2021-10-01 华为技术有限公司 Access detection method and device

Also Published As

Publication number Publication date
AU2058000A (en) 2000-07-12

Similar Documents

Publication Publication Date Title
JP4008826B2 (en) Device for cache compression engine to increase effective cache size by on-chip cache data compression
US6779088B1 (en) Virtual uncompressed cache size control in compressed memory systems
US5410671A (en) Data compression/decompression processor
US8161206B2 (en) Method and system for storing memory compressed data onto memory compressed disks
US20060047916A1 (en) Compressing data in a cache memory
US5235697A (en) Set prediction cache memory system using bits of the main memory address
US4558302A (en) High speed data compression and decompression apparatus and method
US6453388B1 (en) Computer system having a bus interface unit for prefetching data from system memory
US5778255A (en) Method and system in a data processing system for decompressing multiple compressed bytes in a single machine cycle
US7653798B2 (en) Apparatus and method for controlling memory allocation for variable size packets
US20090138663A1 (en) Cache memory capable of adjusting burst length of write-back data in write-back operation
CA2103445A1 (en) Data compression using multiple levels
US5805086A (en) Method and system for compressing data that facilitates high-speed data decompression
US20200304146A1 (en) Variable-sized symbol entropy-based data compression
EP2005594A2 (en) High-speed data compression based on set associative cache mapping techniques
US4821171A (en) System of selective purging of address translation in computer memories
US5737638A (en) System for determining plurality of data transformations to be performed upon single set of data during single transfer by examining communication data structure
US7007135B2 (en) Multi-level cache system with simplified miss/replacement control
US7243204B2 (en) Reducing bus width by data compaction
US20020042861A1 (en) Apparatus and method for implementing a variable block size cache
US6654867B2 (en) Method and system to pre-fetch compressed memory blocks using pointers
WO2000038331A1 (en) An efficient, locally-adaptive data reduction method and apparatus
US5765190A (en) Cache memory in a data processing system
JP2003510685A (en) Cache replacement method and apparatus
US7133997B2 (en) Configurable cache

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase