GB2311635A

GB2311635A - Compression of data for storage using two CAM dictionaries in parallel

Info

Publication number: GB2311635A
Application number: GB9706452A
Authority: GB
Inventors: Alan Welsh Sinclair
Original assignee: Memory Corp PLC
Current assignee: Memory Corp PLC
Priority date: 1996-03-27
Filing date: 1997-03-27
Publication date: 1997-10-01
Also published as: GB9706452D0; GB9606465D0

Description

Data Conversion Device

This invention relates to data conversion devices and in particular to a device for compressing and storing data.

The benefits of data compression in the fields of transmission and storage of data or images are well-known and highly desirable. Certain compression techniques exist which provide significant reduction in data volume. The highest compression ratios are achievable when the original data does not need to be reproduced exactly after decompression, as is often the case with image or speech information. In such applications, efficient methods such as the discrete cosine transform may be used.

However, for almost all data storage devices the data compression employed must not result in any loss of the data to be stored and retrieved. One effect of this need to ensure no data loss due to the compression stage is that the possible reduction in data volume resulting from the compression algorithm is much lower than for applications where a small loss in the data integrity is acceptable.

Many algorithms for error-free data compression have been developed to operate on long streams of data to be transmitted or on files of data to be stored. However, if data is to be compressed for storage on a random access storage device such as a magnetic disk or a solid state disk, independent compression must be performed on each block of data, since data blocks may be accessed in any order. Many operating systems use a blocksize of only 512 bytes. Thus, a viable compression ratio for random access devices must be achieved over a limited volume of data.

Data compression is achieved through elimination of redundancy which occurs in most sequences of digital data. Redundancy is in the form of repeated patterns of bits occurring within the data. These redundant sequences can be detected and replaced by a reference to a previous occurrence of the pattern within the data. A dictionary containing referenced bit patterns must be created from the data stream during both the compression and decompression operations to eliminate any necessity to store or transmit a dictionary with the data.

Data to be compressed is frequently in the form of a string of characters, which are commonly represented by a single byte. This is the natural form of textual and database information. Redundancy is present as a repetition of characters or character strings and compression is achieved through operations on subsets of the data comprising one or more characters. This still allows efficient compression of information, such as image data, which has no inherent character structure as redundant patterns are still present.

One form of compression known as Huffman compression operates on single characters and substitutes a codeword for a repeated character. A dictionary is created by storing each input character as it occurs whilst eliminating any character in the dictionary which matches it. A substitute codeword points to an entry in the dictionary. The number of entries that can be maintained in the dictionary is restricted by the need to ensure that the codeword is shorter than the character which it replaces. If an input character matches a dictionary entry then a pointer to the entry is output instead of the character. If no match occurs between an input character and a dictionary entry then the input character is output with no modification.

An architecture to implement this simple character substitution algorithm has been designed. The dictionary is implemented as a Content Addressable Memory (CAM) to allow very fast searches for character matches to be made. A shifting facility is incorporated in the CAM to allow insertion of additional entries and elimination of duplicate entries.

Examples of the compression schemes which operate on strings of characters are the LZ1 and LZ2 algorithms developed by Lempel and Ziv. These are substitutional compressors whereby a repeated string is replaced by a reference to the original occurrence of that string. The resulting compaction can be significant depending on the redundancy in the data.

The LZ1 algorithm operates by buffering the input byte sequence. At each step in the compression process, the longest prefix of the input stream which matches a subset of the previously read byte stream is identified as a match. A codeword is output containing a pointer to the previous occurrence of that string and the length of the string. If there is no match, the input byte is output and the algorithm repeats the process from the next byte.
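As a rough illustration of the LZ1 step described above, the following Python sketch searches the previously read bytes for the longest match and emits either a pointer with a length or a literal byte. The function names, the token tuples, the window limit and the minimum match length of two are assumptions made for this example; they are not taken from the specification.

```python
def lz1_tokens(data, window=4096, min_len=2):
    """Greedy LZ1-style tokenizer (illustrative sketch only).

    Emits ("ptr", position, length) for a match found in the bytes
    already read, or ("lit", byte) when no usable match exists.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        best_len, best_pos = 0, -1
        for j in range(max(0, i - window), i):
            k = 0
            # Only compare against bytes that were read before position i.
            while i + k < n and j + k < i and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len >= min_len:
            out.append(("ptr", best_pos, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out


def lz1_expand(tokens):
    """Rebuild the original bytes from the token list."""
    buf = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            buf.append(tok[1])
        else:
            _, pos, length = tok
            buf += buf[pos:pos + length]
    return bytes(buf)
```

The window parameter stands in for the limited buffer length that a practical implementation would use.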

In theory, LZ1 remembers the entire input sequence, but in practice a limited buffer length is normally used.

The LZ2 algorithm does not store the input sequence of bytes but instead builds a local dictionary of recurring strings. At each step in the compression process, the longest prefix of the input stream which matches one of the strings in the local dictionary is identified and a codeword is output containing a pointer to the dictionary entry. The input is then advanced one byte and the string with the new byte appended is added to the dictionary. The process is repeated for each successive byte.
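The dictionary-building behaviour of LZ2 can be sketched in Python as follows. This example uses the well-known LZW variant, in which the dictionary is assumed to be pre-seeded with all 256 single-byte strings so that every output is a dictionary pointer; that seeding, the function name and the integer code numbering are assumptions for illustration and are not stated in the specification.

```python
def lz2_compress(data):
    """LZW-style sketch of an LZ2 compressor (illustrative only).

    Returns a list of integer codes, each pointing to a dictionary entry.
    """
    # Assumption: dictionary starts with every single-byte string.
    table = {bytes([b]): b for b in range(256)}
    out = []
    cur = b""
    for b in data:
        nxt = cur + bytes([b])
        if nxt in table:
            cur = nxt                    # keep extending the match
        else:
            out.append(table[cur])       # pointer to longest matching entry
            table[nxt] = len(table)      # add the string plus the new byte
            cur = bytes([b])
    if cur:
        out.append(table[cur])
    return out
```

For the input b"ababab" this sketch emits the codes for "a" and "b" and then reuses the newly created entry for "ab" twice.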

Some data compressors implementing a form of LZ2 are available commercially.

The dictionary is held on SRAM with entries in the form of linked lists located according to a complex algorithm which aids hash searching procedures. Hash searching is a method of performing a function on a string to produce a signature for accessing a table. The LZ2 algorithm is used in preference to LZ1 because of its lesser dictionary search requirements, but data throughput is still limited to a maximum of 1.5Mbytes per second. This performance is inadequate for application in solid state disks.

Significantly higher performance can be achieved by using variants of the hardware architecture. A CAM can be used as the dictionary store to allow very fast searching to be performed. The CAM architecture does not lend itself to adaptive modification of dictionary entries in the form of linked lists of character strings and is therefore not the best implementation for the LZ2 algorithm. However, it is extremely suitable for implementing LZ1 since it can store all input data in sequence and perform fast matches of multiple byte strings in a large dictionary memory.

Previous teaching has been that a variant of the LZ1 algorithm is the better choice for small data blocksizes because it provides inherently faster learning characteristics than LZ2.

Thus, most of the data compression algorithms which are currently known have been substitutional. They replace recurring strings in the input data with shorter codewords. These algorithms are normally adaptive: they learn from the input data stream to enable effective compression of any data type.

The learning characteristics of some algorithms prevent efficient compression from being achieved on a small block of data although the algorithms may be able to make large string substitutions and achieve good compression of a long stream of data. Conversely, other algorithms adapt very quickly to the data characteristics but can achieve only a limited degree of compression.

The current invention is concerned with methods and apparatus for merging compressed data output streams to combine fast learning characteristics with high compression ratio characteristics and achieve effective compression of small blocks of data.

The present invention accordingly provides a data conversion device comprising a circuit arrangement for generating a plurality of data streams from a first single data stream characterised in that the circuit arrangement merges at least some of the generated data streams into a second single data stream.

For a better understanding of the present invention and to show how the same may be carried into effect, reference will now be made by way of example to the accompanying drawings, in which:

Figure 1 shows the effect of two algorithms performed on a string of data;
Figure 2 shows an architecture for one embodiment of the present invention;
Figure 3 shows the flow in the data compressor;
Figure 4 shows an example of the data format;
Figure 5 shows the string substitution output data format;
Figure 6 shows the dictionary CAM architecture;
Figure 7 shows a dictionary CAM cell for string substitution;
Figure 8 shows the character substitution output data format; and
Figure 9 shows a dictionary CAM cell for character substitution.

One embodiment of the present invention uses two independent data compression algorithms as shown in Figure 1. These two independent data compression algorithms operate simultaneously on the input data stream by encoding their respective compressed data streams into a single data output. A corresponding decoding process occurs during data decompression to create the data sequences for two independent decompression algorithms.

The algorithms employed are adaptive substitution techniques which each use a dictionary in the form of a buffer store for input data. Independent dictionaries are used for the two algorithms since different operations on the stored data are required by the two differing forms of compression. An effective implementation uses a string substitution algorithm coupled with a separate character substitution method, but combinations of other forms of algorithm are also possible.

Figure 1 shows a block of uncompressed input data 10. One algorithm is designated as the primary data compressor and this will typically use a string substitution technique.

The output data stream from this string substitution technique consists of a series of pointers to entries in its dictionary, which is created and adapted from the input data stream, mixed with strings of unmodified input data where no redundancy was present and no data compression was achievable. The result of the operation of this primary data compressor is shown in Figure 1 as an area of compressed data containing primary algorithm dictionary references 12 and primary algorithm uncompressed data 14. Although these two areas would be interleaved in practice, they are shown as separate groups of data for diagrammatic purposes.

A second data compressor, the secondary algorithm, operates simultaneously with the primary data compressor on the input data 10. The secondary algorithm will typically use a character substitution method and will create a series of secondary algorithm dictionary references 16 marking where compression can be achieved. Although secondary compression will show considerable overlap with that from the primary compressor it will achieve compression on a significant proportion of the data which is uncompressed by the primary compressor. The final uncompressed data 18 will therefore be smaller than the primary algorithm uncompressed data 14. The resultant data streams from the primary and secondary compressors are merged and encoded to incorporate substitutions by entries from both dictionaries, with priority given to the primary compressor.
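The merging rule with priority given to the primary compressor might be modelled as in the following Python sketch. The token shapes, the function name and the idea of holding the secondary output as one entry per input character are assumptions made for this example; the specification does not define how the merged entries are represented.

```python
def merge_streams(primary, secondary):
    """Merge two compressor outputs, giving priority to the primary one.

    primary:   tokens in input order, ("ptr", address, length) or ("lit", ch)
    secondary: one token per input character, ("ref", index) or ("lit", ch)
    (Both layouts are hypothetical, chosen only for illustration.)
    """
    merged = []
    pos = 0                              # position in the uncompressed input
    for tok in primary:
        if tok[0] == "ptr":
            merged.append(tok)           # primary substitution wins
            pos += tok[2]
        else:
            # Primary could not compress this character: use the secondary
            # result, which is either a dictionary reference or the literal.
            merged.append(secondary[pos])
            pos += 1
    return merged
```

In this model a character appears as a literal in the merged stream only when neither dictionary produced a match, which corresponds to the final uncompressed data 18 being smaller than the primary algorithm uncompressed data 14.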

Figure 2 shows an architecture for implementing data compression using two algorithms. A primary algorithm compression sequencer 20 operates on the input data 10 at the same time as a secondary algorithm compression sequencer 22.

Independent dictionaries (a primary algorithm dictionary 24 and a secondary algorithm dictionary 26) are constructed and maintained by the two compression sequencers 20,22. The secondary algorithm dictionary 26 may be constructed from the full input data 10, or it may be based only on characters or strings which have no entry in the primary algorithm dictionary 24.

A compressed data encoder/decoder 28 combines the primary algorithm output data stream 30 and the secondary algorithm output data stream 32 in such a way that the resultant compressed data may be interpreted during the decompression operation to recover the data components to be applied to the separate compression sequencers 20,22.

Algorithms for both primary and secondary data compressors may be chosen for implementation on Content Addressable Memories (CAMs). High sustained data throughput results from the fast character and string matches which can be performed with this associative memory structure. Such a "twin-CAM compression engine" achieves a very high performance, measured by both the high data throughput resulting from the CAM architecture and the high compression ratio on a small data blocksize achieved by the use of two compression algorithms.

The memory system within which the data compression resides will operate with a data path of a fixed width. However, data from application programs on a host system is most commonly byte structured and redundancy in the data is highest if subdivisions of one byte are considered, e.g. text. The substitutional algorithms implemented by the data compressor are therefore optimum when they operate with a granularity of a one byte character.

Accordingly, Figure 3 shows the input and output control functions that should be incorporated to translate the format of data between that of a host system and that which is appropriate for data compression operations.

An input data control block 40 formats fixed words of uncompressed data into characters forming an input data stream 42. The input data stream 42 is then routed to each of the data compressors which implement the data compression algorithms. The primary data compressor comprises: the primary algorithm compression sequencer 20, the primary algorithm dictionary 24, and the compressed data encoder/decoder 28. Similarly, the secondary data compressor comprises: the secondary algorithm compression sequencer 22, the secondary algorithm dictionary 26, and the compressed data encoder/decoder 28. The data compressor 44 shown in Figure 3 is a combination of the primary and the secondary data compressors. The output data stream 46 arising from this has a variable word width.

An output data control 48 then formats the variable word widths into fixed length words.
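One way to picture the output data control is as a bit packer that concatenates variable-width entries and cuts the result into fixed length words. The Python sketch below assumes most-significant-bit-first packing, a 16-bit default word and zero padding of the final word; none of these details is given in the specification, and the function name is hypothetical.

```python
def pack_words(entries, word_bits=16):
    """Pack (value, nbits) entries into fixed-width words (illustrative).

    Bits are packed most significant first; the last word is padded
    with zero bits on the right if it is only partly filled.
    """
    mask = (1 << word_bits) - 1
    words = []
    acc = 0          # bits collected so far but not yet emitted
    nacc = 0         # number of valid bits in acc
    for value, nbits in entries:
        acc = (acc << nbits) | (value & ((1 << nbits) - 1))
        nacc += nbits
        while nacc >= word_bits:
            nacc -= word_bits
            words.append((acc >> nacc) & mask)
            acc &= (1 << nacc) - 1
    if nacc:
        words.append((acc << (word_bits - nacc)) & mask)
    return words
```

For example, a 9-bit literal entry followed by a 5-bit dictionary entry occupies 14 bits and is emitted as a single 16-bit word with two padding bits.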

Output data from the primary data compressor is in the form of parallel data with a variable word width.

Figure 4 shows the output format of a data compression sequencer. The output data stream 46 can be considered as a series of entries which represent either a pointer to a dictionary entry 50 or a literal entry 52 from the input data. The entries may be mixed together in any order, as shown in Figure 4.

A literal entry 52 is a character from the input data for which no string match has been found in the dictionary. It is of fixed length and comprises a start bit 60 and a literal data character 54 from the input data 10.

Figure 5 shows a dictionary entry 50 and a literal entry 52. Each dictionary entry 50 is a fixed length string which is substituted in the output data stream for a variable length string of characters in the input data stream 10 for which a match was found in the dictionary. A dictionary entry 50 comprises: a start bit 60, a dictionary address pointer 62 which points to the dictionary address at which the start of the input data string is located, and the input data string length 64.

The dictionary address pointer 62 points to a location in the dictionary which contains a character from the preceding input data stream. The range of locations to be addressed therefore cannot exceed the length of the input data stream to date. The number of bits assigned to the dictionary address may be set to a function of the current input data character count to minimise the volume of the compressed data stream. The number of bits assigned to define the string length may be set to a function of the string length. The dictionary entry therefore varies in length as a function of input data character count and string length.
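The specification does not state the exact function relating address width to character count. A plausible choice, assumed here purely for illustration, is the smallest number of bits able to address every character stored so far, which can be written in Python as:

```python
def address_bits(char_count):
    """Bits needed to address any of char_count stored characters.

    Assumption: the width is ceil(log2(char_count)), with a minimum
    of one bit. The specification only says the width is "a function
    of" the character count.
    """
    return max(1, (char_count - 1).bit_length())
```

Under this assumption a pointer emitted after 5 input characters needs 3 bits, while a pointer near the end of a 512 byte block needs 9 bits.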

Figure 6 shows a Content Addressable Memory (CAM) 70 which can be used as an efficient dictionary store for the string substitution algorithm. When a data character is applied at the data in port 72 of the CAM block 74, a match line 76 is asserted for every row of the CAM 70 which contains a character matching the input character. These signals on the match lines 76 are processed by the match logic 78 which controls the address encode and decode block 80 and may enable an address corresponding to one of the matched row locations to be output. The CAM 70 performs a character match operation concurrently on every character entry in the dictionary and this permits high data throughput.

On occurrence of a character match in the CAM 70, the Match output line 82 from the Match Logic 78 is asserted whilst the address of the row which registered a match (the matching row) is encoded as an output from the address encode and decode block 80. If more than one row matches a character, the matching row nearest the row which was written earliest is used and its address is encoded as the output from the address encode and decode block 80.

Two types of matches can be made, selected by the Global/Local control signal 84 to the Match Logic 78. A global match allows any single row match to be registered as a CAM match, whilst a local match allows matches to be registered only in CAM rows where a match was recorded in the preceding row in the preceding cycle.
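The difference between global and local matching can be modelled in software as shown below. This Python class is only a behavioural sketch: the class and method names are hypothetical, the rows are held in a list in write order, and the parallel hardware comparison is replaced by a loop. When several rows match, the row nearest the earliest-written row is returned.

```python
class StringCam:
    """Behavioural model of the string-substitution CAM (sketch only)."""

    def __init__(self):
        self.rows = []       # stored characters, earliest written first
        self.prev = set()    # rows that matched in the preceding cycle

    def write(self, ch):
        """Write a character to the next free row."""
        self.rows.append(ch)

    def match(self, ch, local=False):
        """Return the matching row address, or None if there is no match.

        Global mode: any row holding ch counts as a match.
        Local mode: a row counts only if the preceding row matched
        in the preceding cycle.
        """
        hits = {i for i, c in enumerate(self.rows) if c == ch}
        if local:
            hits = {i for i in hits if i - 1 in self.prev}
        self.prev = hits
        return min(hits) if hits else None
```

With the characters "abcab" stored, a global match on "a" returns row 0, and a following local match on "b" returns row 1 because row 0 matched in the preceding cycle.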

The field over which a data comparison is performed in the CAM 70 can be controlled. The start of the field is always the location of the CAM block 74 which was written earliest. The end of the field can be set by asserting one of a number of match field markers 86. The default end position is the last writable entry of the CAM block 74. This facility allows dictionary searches to be performed over a restricted area of the dictionary, so that the number of bits assigned to the dictionary address can be minimised when the dictionary contains few entries.

The CAM 70 can be pre-loaded with a standard dictionary at the start of a data block compression operation to minimize the need for adaptive learning from the input data stream to achieve string substitutions. Each CAM block location comprises a shadow latch in addition to the data latch. A starting dictionary can be loaded into the shadow latches when the data compressor is first initialised by using the address in port 88 and data in port 72 in conjunction with the program strobe signal 90. The contents of the shadow latches can be globally written to the data latches at the start of a compression operation by assertion of the load strobe 92.

Read and write access of the CAM 70 can be performed in an identical fashion to a conventional static RAM via the read line 94 and the write line 96.

Figure 7 shows the logical block structure of a CAM cell. Data in lines 100 and Data out lines 102 are shared by all cells in one CAM block 74 column. These data lines 100,102 may be differential pairs, or data input lines 100 and data output lines 102 may be combined into a single input/output line, depending on the physical circuit implementation of the cell. Write 104, read 106, load 108, and program lines 110 are shared by all cells in one row of the CAM block 74. A separate match signal may be output in parallel from each cell in a row and a row match condition detected by gating these parallel signals, or the match signal from one cell may be passed to an adjacent cell to be combined with the match signal from that cell. This leads to a ripple propagation of the match signals along a row.

This CAM architecture 70 is used to perform a string substitution data compression algorithm as follows.

A character is input and is applied as input data 10 to the CAM 70, with global match mode set (that is, with the global/local control signal 84 set to global mode). If no match is detected, the character is transmitted as output data and is also written to the next free location in the CAM block 74.

The process is then repeated, with the next character applied to the data in line 100 of the CAM 70, with global match mode still set. If a match is detected, the character is written to the next free location in the CAM block 74 and is also written to a buffer (not shown). The next character is then input and is applied to the CAM 70, with local match mode set (that is, with the global/local control signal 84 set to local). If no match is found, the buffered character is transmitted via the data output line 102, because a single character is not considered to be a substitutable string, and the process is restarted with the same character as input to the CAM 70 with global match mode set. However, if a match in local mode is found, the character is written to the next free location in the CAM block 74, a counter recording matched string length is incremented, and the next character is input to the CAM 70 in local mode. This process is repeated in local mode until no match is detected for a character.

The address returned by the CAM 70 is then decremented by the string length and is output as a dictionary address whose bit length is a function of input character count as defined previously, the counter value is encoded as described previously and is output as the string length, the counter is cleared, and the process is restarted with the same character as input to the CAM 70 with global match mode set. This procedure continues until the number of input characters is equal to the data blocksize.
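The string substitution procedure described above can be sketched in Python at the level of output entries rather than bits. The function name and the tuple format are assumptions for this example. The start address is computed here as the final matching row minus one less than the string length, which is one reading of "decremented by the string length" when the counter is taken to count only the local matches. Handling of a match that is still open when the block ends is not described in the text, so the sketch simply flushes it.

```python
def compress_strings(data):
    """Token-level sketch of the CAM string substitution compressor.

    Returns ("lit", ch) entries and ("ptr", start_address, length) entries.
    """
    rows = []                 # dictionary CAM contents in write order
    out = []
    i, n = 0, len(data)
    while i < n:
        ch = data[i]
        # Global match: every row holding this character.
        hits = {r for r, c in enumerate(rows) if c == ch}
        rows.append(ch)       # the character is always written to the CAM
        i += 1
        if not hits:
            out.append(("lit", ch))
            continue
        first = ch            # buffered character
        length = 1
        while i < n:
            ch = data[i]
            # Local match: rows whose preceding row matched last cycle.
            nxt = {r + 1 for r in hits if rows[r + 1] == ch}
            if not nxt:
                break         # restart with the same character in global mode
            hits = nxt
            rows.append(ch)
            i += 1
            length += 1
        if length == 1:
            out.append(("lit", first))   # one character is not a string
        else:
            out.append(("ptr", min(hits) - (length - 1), length))
    return out
```

Because each matched character is written to the CAM before the next comparison, a match may run into characters that were written during the same match, as in the input "abcabcabcx", which this sketch encodes as three literals, one pointer of length 6 and a final literal. A bit-level encoder would additionally need to cap the string length at whatever the length field can represent.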

The decompression algorithm operates on entries in the compressed data stream at the bit level. As shown in Figure 5, each entry is of a defined length and is preceded by a start bit 60 differentiating dictionary entries 50 and literal entries 52. If the start bit 60 is logical 1, a following number of bits are read to form a literal data character 54. This will typically be 8 bits. This literal entry 52 is written to the next free location of the CAM block 74 and is transmitted as uncompressed data. A counter recording the number of uncompressed data characters is incremented. If the start bit 60 is logical 0, the entry is a dictionary entry and a following number of bits are read and stored in a dictionary address counter. The number of bits assigned to the dictionary address is a function of the uncompressed character count as described previously. If the first bit read following the dictionary address in the compressed data stream is logical 0, the string length is defined as two characters as described previously and the entry is complete. If the first two bits following the dictionary address are logical 1 and 0 respectively, the string length is defined as 3 characters, and if both bits are logical 1, the following 3 bits define the length of the string in excess of 3 characters.
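The string length code described above can be illustrated with a small Python encoder and decoder working on strings of "0" and "1" characters. The reading adopted here is an assumption: the 3-bit field is taken to hold the string length minus 3, so that lengths from 4 to 10 are representable. The function names are hypothetical.

```python
def encode_length(length):
    """Encode a matched string length as a bit string (illustrative)."""
    if length == 2:
        return "0"
    if length == 3:
        return "10"
    if 4 <= length <= 10:
        # Assumption: the 3-bit field stores (length - 3).
        return "11" + format(length - 3, "03b")
    raise ValueError("length not representable in this sketch")


def decode_length(bits, pos=0):
    """Decode a length starting at bits[pos]; return (length, next_pos)."""
    if bits[pos] == "0":
        return 2, pos + 1
    if bits[pos + 1] == "0":
        return 3, pos + 2
    return 3 + int(bits[pos + 2:pos + 5], 2), pos + 5
```

The shortest codes are given to the shortest strings, which is consistent with the statement that the number of bits assigned to the string length is a function of the string length.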

When the dictionary address and string length information has been recovered from the compressed data stream in this way, the original data characters defined by the dictionary address and string length are read from the CAM block 74. Each original data character entry which is read from the CAM 70 is written to the next free location of the CAM block 74 and is transmitted as uncompressed data. The counter recording the number of uncompressed data characters is incremented. This procedure continues until the number of uncompressed characters is equal to the data blocksize.
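At the level of decoded entries, the decompression procedure can be sketched in Python as follows. The function name and tuple format are assumptions for this example, and the bit-level parsing of start bits, addresses and lengths is taken as already done. Each character read from the dictionary is written straight back to the next free location, so a string that extends into characters produced by the same entry is reproduced correctly.

```python
def decompress_strings(entries):
    """Token-level sketch of the string substitution decompressor.

    entries: ("lit", ch) or ("ptr", start_address, length) tuples.
    Returns the list of uncompressed characters.
    """
    rows = []                          # dictionary CAM contents
    for entry in entries:
        if entry[0] == "lit":
            rows.append(entry[1])      # literal: write and transmit
        else:
            _, addr, length = entry
            for k in range(length):
                # Read one character, then write it to the next free row.
                rows.append(rows[addr + k])
    return rows
```

The dictionary contents and the uncompressed output are the same sequence in this model, since every transmitted character is also written to the CAM.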

Figure 8 shows the output data format for a character substitution algorithm. Like the primary compressor, the secondary data compressor provides output data as parallel data with a variable word width in the form of a series of entries which represent either a pointer to a dictionary entry 50 or a literal entry 52 from the input data 10. The entries 50,52 may be mixed together in any order, as shown in Figure 4.

A literal entry 52 in the output data is a literal entry from the input data 10 for which no string match has been found in the dictionary. It is of fixed length and comprises a start bit 60 and a single literal data character 54 from the input data 10.

A dictionary entry is a fixed length string which is substituted in the output data stream for a single character in the input data stream 10 for which a match was found in the dictionary. It comprises a start bit 60, and a pointer to the dictionary address 62 at which the matching character is located.

To ensure that compression of a single character is possible, the number of bits assigned to a dictionary address pointer 62 must be two or more bits less than the number of bits in an uncompressed character. For a character size of 8 bits, typically 4 or 5 bits are allocated to the address pointer 62, giving a maximum dictionary size of 16 or 32 entries.

The compression ratio achieved by the character substitution algorithm is determined by the compression factor of a matched character, the expansion factor of an unmatched character and the character match "hit ratio".
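The relationship between these three quantities can be made concrete with a short Python calculation. It assumes that the ratio is expressed as compressed size divided by original size, that a matched character costs one start bit plus the pointer bits, and that an unmatched character costs one start bit plus the full character. The function and parameter names are hypothetical.

```python
def char_substitution_ratio(hit_ratio, pointer_bits=4, char_bits=8):
    """Compressed size / original size for character substitution.

    Assumes every output entry carries exactly one start bit.
    """
    matched = (1 + pointer_bits) / char_bits     # compression factor
    unmatched = (1 + char_bits) / char_bits      # expansion factor
    return hit_ratio * matched + (1 - hit_ratio) * unmatched
```

Under these assumptions, with 8-bit characters and a 4-bit pointer, the output is smaller than the input only when the hit ratio exceeds 25%; a hit ratio of 50% gives a ratio of 0.875.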

A Content Addressable Memory (CAM) 70 substantially the same as that in Figure 6 can be used as an efficient dictionary store for the character substitution algorithm. When a data character is applied at the Data in port 72 of the CAM block 74, a match line 76 is asserted for every row of the CAM block 74 which contains a character matching the input character. These match signals are processed by the match logic 78 which controls the address encode & decode block 80 and enables an address corresponding to the location of the matched row to be output. In practice, steps are taken to eliminate duplicate entries in the CAM and only a single match can be detected. The CAM is performing a character match operation concurrently on every character entry in the dictionary and this permits high data throughput.

To eliminate duplicate entries in the CAM block 74 and to ensure that characters which have achieved a recent match are maintained near the start of the dictionary, a shifting function is incorporated for the data write operation. The write operation causes the character at the data in port 72 to be written in the first row and other rows to be written with the data from the preceding row up to the location defined by the address in port 88.

A read operation causes the character at the addressed location to be transferred to the data out port 120.

Only global matches are required and no local match facility in the match logic 78 is necessary.

The CAM 70 can be pre-loaded with a standard dictionary at the start of a data block compression operation to minimise the need for adaptive learning from the input data stream 10 to achieve character substitutions. Each CAM block location comprises a shadow latch 122 in addition to a data latch 124. Each CAM block location has an individual write 104, read 106, load 108, and program 110 line. A starting dictionary can be loaded into the shadow latches 122 when the data compressor is first initialised by using the address in 88 and data in 72 ports in conjunction with the individual program strobe signal 110. The contents of the shadow latches 122 can be globally written to the data latches 124 at the start of a compression operation by assertion of the individual load strobe line 108.

Figure 9 shows the logical block structure of a CAM cell for use with a character substitution algorithm. Data in lines 100 and Data out lines 102 are shared by all cells in one CAM column. These data lines 100,102 may be differential pairs, or data input lines 100 and output lines 102 may be combined into a single input/output line, depending on the physical circuit implementation of the cell. Individual write 104, read 106, load 108, and program 110 lines are shared by all cells in one row of the CAM. Shift in lines 130 and shift out lines 132 provide connections to adjacent cells in the same column for data shift operations. The Data latch 124 must have two phase operation to support shifting. A separate match signal 76 may be output in parallel from each cell in a row, and a row match condition detected by gating these parallel signals, or the match signal 76 from one cell may be passed to an adjacent cell to be combined with the match signal 76 from that cell. This leads to a ripple propagation of the match signals 76 along a row.

This CAM architecture is used to perform a character substitution data compression algorithm as follows.

A character is input and is applied as input data 10 to the CAM 70. If no match is detected, the character is transmitted as output data and is written to the CAM block 74 with the address in port 88 addressing the location with the oldest data. This causes all of the rows to be shifted down the CAM block 74 and the character to be written at the location previously occupied by the newest data. If the CAM block 74 is full prior to the shifting operation then the oldest data (which is in the bottom location of the CAM block 74) falls out of the dictionary. The process is then repeated, with the next character applied as input to the CAM block 74. If a match is detected, the address returned by the CAM 70 is interpreted as a dictionary pointer 50 and is transmitted as output data. This address is then applied to the address in port 88 of the CAM 70 to define the limit of a shift operation and the character is written immediately above the location of the newest data whilst its existing location in the CAM is overwritten. Another character is then input and the procedure continues until the number of input characters is equal to the data blocksize.
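The character substitution procedure behaves like a small move-to-front dictionary, and can be sketched in Python as follows. The function name, the tuple format and the default 4-bit pointer (16 entries) are assumptions for this example; the shifting CAM is modelled as a list whose first element is the newest entry.

```python
def compress_chars(data, pointer_bits=4):
    """Token-level sketch of the character substitution compressor.

    Returns ("lit", ch) for unmatched characters and ("ref", address)
    for characters found in the dictionary.
    """
    size = 1 << pointer_bits
    cam = []                         # cam[0] holds the newest character
    out = []
    for ch in data:
        if ch in cam:
            addr = cam.index(ch)
            out.append(("ref", addr))
            cam.pop(addr)            # shift is limited to rows above the match
        else:
            out.append(("lit", ch))
            if len(cam) == size:
                cam.pop()            # oldest entry falls out of the dictionary
        cam.insert(0, ch)            # character becomes the newest entry
    return out
```

Removing the matched entry before re-inserting it at the front keeps the dictionary free of duplicates and keeps recently matched characters at low addresses, as the shifting write operation is intended to do.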

The decompression algorithm operates on en [...]

[...] described embodiments within the scope of the present invention. For example, the start bit 60 may be logical one for the dictionary entry 50 and logical zero for the literal entry 52.

Claims

1. Data storage means including a data conversion device comprising a circuit arrangement for generating a plurality of data streams from a first single data stream characterised in that the circuit arrangement merges at least some of the generated data streams into a second single data stream.