GB2395641A - Data compression - Google Patents

Data compression

Info

Publication number
GB2395641A
GB2395641A (Application GB0227400A)
Authority
GB
United Kingdom
Prior art keywords
data
compression
chunks
data chunks
buffer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0227400A
Other versions
GB0227400D0 (en)
Inventor
Gregory Keith Trezise
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HP Inc
Original Assignee
Hewlett Packard Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Co filed Critical Hewlett Packard Co
Priority to GB0227400A priority Critical patent/GB2395641A/en
Publication of GB0227400D0 publication Critical patent/GB0227400D0/en
Publication of GB2395641A publication Critical patent/GB2395641A/en
Withdrawn legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/91Television signal processing therefor
    • H04N5/92Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N5/926Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback by pulse code modulation
    • H04N5/9261Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback by pulse code modulation involving data reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method of compressing a data set, said method comprising: inserting a plurality of reset tokens at pre-determined intervals within said data set, to convert said data set into a plurality of smaller data chunks; compressing a plurality of said data chunks simultaneously in parallel with each other 202, 203, to produce a plurality of compressed data chunks. The method may be applied in a data storage device such as a tape drive storage system. Other embodiments disclosed include:

  1) A method of converting a data stream comprising a plurality of reset tokens, a plurality of compressed data chunks and a plurality of sections of junk (stuffing) data, into a continuous data stream having a constant bit rate, wherein searching is performed for the plurality of reset tokens, and between receiving immediately consecutive reset tokens a stream of data is output, and between receiving alternate immediately consecutive reset tokens no data is output.

  2) A data processing channel comprising: a first buffer memory 200 for receiving an incoming data set; a plurality of compression engines 202, 203 arranged in parallel, which each receive data chunks from the first buffer memory; a further buffer memory 205, 206, 207, 208, 209 for receiving and storing the plurality of compression processed data chunks from the plurality of compression engines; and a scheduler 201 for allocating the plurality of data chunks to the plurality of compression engines.

  3) A write channel comprising: a first buffer memory 200 for receiving an incoming data set; a plurality of compression engines 202, 203 arranged in parallel, which each receive data chunks from the first buffer memory; a second buffer memory; and a packer device for receiving a stream of compression processed data chunks and a plurality of packing bytes, and for removing the packing bytes in order to output a continuous stream of compression processed data chunks.

Description

DATA COMPRESSION
Field of the Invention

The present invention relates to a method of data compression, to a method of converting a data stream, to an electronic data processing channel, to a data processing device for compressing data, and to a data storage device capable of compressing data.
Background to the Invention
Various types of data compression are known for multiple applications including data storage, telecommunications, image processing and the like. In the case of data storage, for example tape drive units, data is compressed before it is written to tape, and then decompressed after it is read from tape, to reconstitute the original uncompressed data. Benefits of compression are that a larger amount of data can be stored on the same amount of tape compared to the uncompressed format, and information transfer rates from a write channel of a tape drive unit to a tape are higher than for uncompressed data. Similarly, reading of compressed data can be achieved at a higher information transfer rate than for uncompressed data.
For prior art tape data storage systems incorporating data compression, there are two fundamental rate determining processes which limit the rate at which data can be written to and read from tape. The first of these is the transfer rate between a read/write head and a tape data storage medium. The second is the rate at which compression can be applied to a data stream in real time. In prior art tape drive technology, compressed data rates of 30 MB/s for reading and writing onto tape are achieved. This is the sustainable data transfer rate over time for reading/writing to and from a tape data storage medium. Since the majority of data compresses in the ratio 4:1 or less, it is usual to require the compression engine to accept data from the host computer at a data rate of 120 MB/s. Because the nature of data is that some parts of data are highly compressible, whereas other parts of the data stream are less compressible, or sometimes uncompressible, within the tape data storage industry manufacturers tend to quote to users a figure of twice the data rate at which compressed data can be written to or read from tape. For example, a tape drive unit capable of writing compressed data to tape at a rate of 30 MB/s would be sold as a 60 MB/s tape data storage device, that is, a device which is capable of inputting and outputting data from a host computer at the rate of 60 MB/s. In fact, such a tape drive would momentarily be able to transfer data to and from a host at a rate of up to 200 MB/s.
Referring to Fig. 1 herein, there is illustrated schematically a prior art host computer 100, and a write channel of a tape drive unit 101 connected to the host computer, for storing data from the host computer. The write channel shown in Fig. 1 is of a prior art second generation linear tape open (LTO) format tape drive. A connection between the host computer 100 and the tape drive unit 101 may send and receive data at a peak data rate of 200 MB/s. Data received from the host computer is input into a relatively small burst buffer 103, typically of capacity 32 MB. Data is output from the burst buffer and is input into a compression engine 104 which compresses the data. The compression engine can compress this data at a rate of 120 MB/s. The compressed data output of the compression engine is input into a formatter 105 and into a main buffer 106 which rate matches the output data stream from the compression engine, via a second formatter 107, to a tape mechanism 108, which is capable of writing data to a tape data storage medium at a rate of up to 30 MB/s. First formatter 105 formats the compressed data output from the compression engine. Second formatter 107 formats the data in the main buffer prior to input into the tape mechanism 108. The compression engine 104 may achieve compression ratios of between 1:1 and 100:1 depending upon the inherent compressibility of the incoming data stream.
The burst buffer 103 is provided to temporarily buffer bursts of data from the host device, until compression engine 104 is ready to input the data burst. Typically, continuous streaming of data can occur between the host and the tape data storage mechanism, but if large bursts of data are received from the host, then the burst buffer 103 can become full, in which case data streaming needs to be interrupted. Typically, the burst buffer can fill up within less than a second where large bursts of data are received from the host.
The fundamental limits on the rate at which data can be written to the data storage medium are the physical limitations on data transfer rates to the data storage medium, and the rate at which the compression engine can compress data. Ideally, the compression engine would not be a rate limiting stage in the write channel, and the rate at which data could be written should depend only upon the physical limitations of the write speed of data to the data storage medium. A typical physical limitation on writing data to tape limits the data transfer rate from a write head to a tape data storage medium to 30 MB/s. Ideally, to maintain that rate, the compression should be effectively transparent to incoming data, so that the only data rate problem occurs in matching the data rate into the main buffer with the rate at which data can be physically written to the tape data storage medium.
In practice, the compression engine 104 does have a maximum rate of data throughput, i.e. the rate at which data can be compressed. This takes two forms. The first limit is a maximum output rate, which practically is of the order of 30 MB/s. Secondly, there is also a maximum input rate, which is of the order of 120 MB/s. Upstream of the compression engine, the burst data rate output from the host computer may be as high as 200 MB/s, and variations between the output data rate of the host computer and the input data rate of the compression engine are matched by the burst buffer 103.
The speed at which the compression engine operates is fundamentally limited by the clock rate at which the compression engine circuit operates. With current technology, there is a fundamental limit of 1 uncompressed byte per clock cycle. Therefore, for an incoming data rate of 120 MB/s, a clock rate of 120 MHz is required.
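By way of illustration only, the arithmetic follows directly from the bytes-per-cycle limit; the helper below is ours, not part of the patent disclosure:

```python
def required_clock_mhz(input_rate_mb_s: float, bytes_per_cycle: float = 1.0) -> float:
    """Clock frequency (MHz) needed to sustain a given uncompressed
    input data rate, given how many bytes are consumed per clock cycle."""
    return input_rate_mb_s / bytes_per_cycle

# One engine at 1 byte/cycle needs a 120 MHz clock for 120 MB/s; two
# engines in parallel behave like 2 bytes/cycle, halving the required clock.
assert required_clock_mhz(120.0) == 120.0
assert required_clock_mhz(120.0, bytes_per_cycle=2.0) == 60.0
```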
For future generations of tape drive mechanism, write and read data rates of the order of 60 MB/s are being considered. Therefore, a compression engine having a maximum output of 30 MB/s becomes the primary data rate determining stage in a write channel for a tape drive unit. There is therefore the problem of how to increase the data rate throughput of a compression engine, in order to achieve a higher data rate read/write channel, for next generation tape drive units.
Summary of the Invention

According to a first aspect of the present invention there is provided a method of compressing a data set, said method comprising: inserting a plurality of reset tokens at predetermined intervals within said data set, to convert said data set into a plurality of data chunks; compressing a plurality of said data chunks simultaneously in parallel with each other, to produce a plurality of compressed data chunks.
According to a second aspect of the invention, there is provided a method of converting a data stream comprising: a plurality of reset tokens; a plurality of compressed data chunks; and a plurality of sections of junk data; into a continuous data stream having a substantially constant bit rate, said method comprising: searching for said plurality of reset tokens; between receiving immediately consecutive said reset tokens, outputting a stream of data; and between receiving alternate immediately consecutive said reset tokens, producing no data output.
According to a further aspect of the present invention, there is provided an electronic data processing channel for processing a data set of electronic digital data, said channel comprising: a first buffer for receiving a plurality of incoming digital data sets; a plurality of ALDC format compression engines arranged in parallel, said plurality of compression engines arranged to input said data sets as a plurality of data chunks, wherein each data chunk is of a smaller size than a said data set, said plurality of compression engines outputting a plurality of compression processed data chunks; a second buffer memory, said second buffer memory receiving said plurality of compression processed data chunks, and storing said plurality of compression processed data chunks consecutively in an order corresponding to an original order of said plurality of input data chunks; and a scheduler component for allocating said plurality of data chunks to input into said plurality of compression engines.
According to yet a further aspect of the present invention, there is provided a write channel for writing a stream of digital data to a data storage medium, said write channel comprising: a first buffer device for receiving an incoming data set; a plurality of compression engines, each compression engine operating to compression process at least one data chunk of said data set; a plurality of second output buffer devices, said plurality of second output buffers operating to receive compression processed data chunks output from said plurality of compression engines; a third buffer device, said third buffer device receiving data chunks output from said plurality of second buffer devices; and a scheduler component for scheduling input of said data chunks into said plurality of compression engines, for scheduling output of said compression processed data chunks into said second plurality of buffers, and for routing said plurality of compression processed data chunks from said second plurality of buffers into said third buffer.
According to yet a further aspect of the present invention there is provided a write channel for writing digital data to a data storage medium, said write channel comprising: a first buffer for receiving an incoming stream of data sets; a plurality of compression engines arranged to operate in parallel, each said compression engine arranged to process a stream of data chunks, a plurality of said data chunks comprising a data set; a second buffer memory, said second buffer memory comprising a plurality of memory locations, each said memory location of a size suitable for storing a said data chunk; and a packer device, said packer device arranged for inputting a stream of said compression processed data chunks and a plurality of packing bytes, and removing said plurality of packing bytes from said data stream, to output a continuous stream of said compression processed data chunks.
The scope of the invention is limited only by the features of the claims herein.
Brief Description of the Drawings

For a better understanding of the invention and to show how the same may be carried into effect, there will now be described by way of example only, specific embodiments, methods and processes according to the present invention with reference to the accompanying drawings in which:

Fig. 1 illustrates schematically a prior art write channel in a tape data storage device, operating according to the known adaptive lossless data compression standard;

Fig. 2 illustrates schematically a first write channel according to a first specific embodiment of the present invention, including a plurality of parallel data compression engines;

Fig. 3 illustrates schematically scheduler components of the first write channel of Fig. 2, for routing data chunks between components of the first write channel;

Fig. 4 illustrates schematically a plurality of compressed data chunks stored in the main buffer memory of the first write channel;

Fig. 5 illustrates schematically data flow of a plurality of data chunks through a plurality of compression engines, and storage of compressed data chunks in a main buffer memory;

Fig. 6 illustrates schematically process steps carried out by the first write channel for processing a stream of data sets into a stream of smaller data chunks, which become compressed individually;

Fig. 7 illustrates schematically a second write channel according to a second specific embodiment of the present invention, in which compressed data chunks are stored in main buffer slots of a size equal to an uncompressed data chunk in a burst buffer;

Fig. 8 illustrates schematically a data stream output of a main buffer memory of the second write channel, which is input into an on the fly packer component;

Fig. 9 illustrates schematically an output of an on the fly packer component of the second write channel;

Fig. 10 illustrates schematically memory locations in a main buffer of the second write channel, containing compressed data chunks;

Fig. 11 illustrates schematically a method of operation of the data packer for removing packing bytes and outputting a continuous stream of compressed data;

Fig. 12 illustrates schematically a data flow of a data set divided into a plurality of uncompressed data chunks, through a plurality of compression engines, resulting in a plurality of compressed data chunks, processed by the second write channel;

Fig. 13 illustrates schematically data processing carried out by the second write channel for parallel processing of a plurality of data chunks making up a data set;

Fig. 14 illustrates schematically a read channel for reading a plurality of compressed data chunks and decompressing the data chunks, including a plurality of parallel operating decompression engines; and

Fig. 15 illustrates schematically a data flow diagram for reading and decompression of a plurality of data chunks, carried out by the read channel of Fig. 14 herein.
Detailed Description of a Specific Mode for Carrying Out the Invention

There will now be described by way of example a specific mode contemplated by the inventors for carrying out the invention. In the following description numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent however, to one skilled in the art, that the present invention may be practiced without limitation to these specific details. In other instances, well known methods and structures have not been described in detail so as not to unnecessarily obscure the present invention.
Referring to Fig. 2 herein, there is illustrated schematically a write channel having a compression engine, suitable for use in a tape drive unit, according to a first specific embodiment of the present invention. The architecture uses two compression engines, with swing buffers and a chunk size of 32 kBytes. The write channel comprises a burst buffer 200 which receives incoming data from a host computer. The burst buffer outputs two data streams, to a pair of compression engines 202, 203 respectively. Each compression engine outputs data to two chunk sized 'swing' buffers 205, 206 and 207, 208 respectively. Fixed sized data chunks, of a size 32 kBytes in the best mode, are output from the burst buffer into the compression engines. When processing a data chunk, the compression engine outputs a reset token as the first symbol of each compression processed chunk. The swing buffers send compressed data to a main buffer 210. The main buffer 210 outputs a continuous stream of compressed data to a tape drive mechanism, for writing the data to a tape data storage medium.
A first write channel scheduler 201 allocates data chunks stored in the burst buffer 200 alternately to the first compression engine 202 and to the second compression engine 203.
A second write scheduler 204 schedules the outputs of the first swing buffer 205 and second swing buffer 206 to fill memory slots in the main buffer 210, and similarly schedules the outputs of the third and fourth swing buffers 207, 208, outputting data from the second compression engine 203, to fill memory slots in main buffer 210.
A long data stream is divided into a plurality of data blocks. The compression engine compresses each data block in turn. Data blocks are separated by means of a reset token, which indicates when a new data block is being started.
The compression engines 202, 203 may be implemented as prior art ALDC/CAM engines. ALDC engines are known in the art, for example in the ECMA standard ECMA-222 published June 1995, and available from http://www.ecma.ch. ALDC compression engines have the characteristic that according to the ALDC algorithm, data is processed at the rate of no more than one input byte per clock cycle. ALDC is a particular implementation of LZ-1 compression. The compression engine has the features that it compresses data in a linear fashion, and identifies reset tokens. Unlike DCLZ (which is LZ-2), a compression dictionary is not required. Instead, a content addressable memory (CAM) is used as a 'history buffer' in a long shift register configuration. Data to be compressed is pushed into the shift register one byte at a time, and the content addressable memory logic indicates any matches found in the previous 1000 or so bytes of data. Further bytes are pushed in until there is only one match remaining, and at this point the address and length of the match are encoded into a token. If there were no matches in the first place, then a 'literal' token is generated based on the data byte instead. As tokens are produced, they are packed into bytes that are output to form the compressed data stream. The bytes that 'fall out' of the far end of the content addressable memory are discarded and do not form part of the compressed data stream. At various points, the content addressable memory is usually reset in order to provide access points during reads. For example, in the known Hewlett Packard Ultrium product, the access points occur at least at the start of each first complete record in the data set.
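As an illustration only, the following sketch models the LZ-1 style match search described above in software. It is not the ALDC bitstream format; the sequential window scan stands in for the CAM, which reports all matches in parallel, and the `max_len` value loosely mirrors ALDC's maximum copy length:

```python
def lz1_tokens(data: bytes, window: int = 1024, max_len: int = 271):
    """Greedy LZ-1 style tokeniser: emits ('literal', byte) or
    ('copy', offset, length) tokens against a sliding history window.
    Only the match search is modelled, not the ALDC bit-encoding or
    the history reset behaviour."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        # The CAM finds matches over the last ~1000 bytes in parallel;
        # here the window is scanned sequentially instead.
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data) and length < max_len
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= 2:          # encode the match as a copy token
            tokens.append(('copy', best_off, best_len))
            i += best_len
        else:                      # no usable match: emit a literal
            tokens.append(('literal', data[i]))
            i += 1
    return tokens

# 'abcabcabc' yields three literals, then one self-referential copy.
assert lz1_tokens(b'abcabcabc') == [
    ('literal', ord('a')), ('literal', ord('b')), ('literal', ord('c')),
    ('copy', 3, 6)]
```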
A typical content addressable memory for a tape data storage device would be 1024 x 9 bits in size. Making the content addressable memory larger improves the compression performance, but makes the design a little slower, because the content addressable memory logic has to OR together matched bits for every location. Standard ALDC cores usually have 512, 1024 or 2048 entries.
To implement decompression requires that the content addressable memory works backwards, i.e. like an SRAM. The smallest hardware implementations tend to add SRAM functionality to the content addressable memory, but a separate decompression engine would need a small amount of static random access memory, as little as one kByte.
Referring again to Fig. 2 herein, data received from a host computer is stored in the burst buffer 200. In the burst buffer, the raw data received from the host computer is stored in a plurality of data blocks, each of a pre-determined size. Scheduler 201 takes pairs of data blocks, and routes a first data chunk of the pair to a first compression engine 202, and in parallel routes a second data chunk of the pair to a second compression engine 203. The first and second compression engines operate in parallel, compressing the two data chunks at the same time. Depending upon the compressibility of data within each chunk, each data chunk may be compressed to a lesser or greater extent. An output of the first compression engine 202 is fed to a first pair of swing buffers 205, 206. The output of the first compression engine is a compressed data chunk. The size of the compressed data chunk depends upon the compressibility of the input data, and in the general case can be different for each compressed data chunk.
Therefore, compressed data chunks stored in the main buffer 210 may be of different sizes to each other, and separated from each other by a reset token. Since the compressed data chunks are written sequentially into memory, and since the compressed data chunks can be of arbitrary length between a lower (fully compressed) data size and an upper data size where further compression has not been possible, or compression is very minimal, the swing buffers are necessary in order to allow continuous compressed data to be written to the main buffer in successive data chunks. The second scheduler 204 schedules output of the compression engines into the first and second pairs of swing buffers. Third scheduler 209 schedules the output of the first and second pairs of swing buffers into the main buffer memory 210.
Each compression engine needs to know only:

  • where its uncompressed data starts in the burst buffer;
  • which swing buffer to write the compressed data into; and
  • how much data to compress (fixed at 32 kB in the first embodiment).

It is the job of the scheduler to supply this information to each compression engine. Using the architecture of Fig. 2, each of the first and second compression engines can be run at the same clock speed as the rest of the circuit, but because a plurality of compression engines are operated in parallel, compression of a plurality of data blocks can occur simultaneously, so that effectively more than one byte per clock cycle is being processed by the compressors.
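The three items of per-chunk information could be modelled as a simple work descriptor; a hypothetical sketch, in which the class and field names are ours (the patent defines no software interface):

```python
from dataclasses import dataclass

@dataclass
class CompressionJob:
    """What the scheduler supplies to one compression engine."""
    burst_buffer_offset: int    # where the uncompressed chunk starts
    swing_buffer_id: int        # which swing buffer takes the output
    length: int = 32 * 1024     # how much data to compress (fixed)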
In the first specific embodiment, compression processed data chunks are stored in a contiguous area of the main buffer. The rate at which data arrives at the write channel is not necessarily constant, and it is the purpose of the main buffer to smooth out the data rate variations in the compressed data stream issuing from the compression engines, so that the downstream part of the tape drive, i.e. the tape heads and tape mechanism, can run synchronously with the data rate arriving at the tape heads.
Referring to Fig. 3 herein, there is illustrated schematically components of the scheduler in the write channel for scheduling passage of data into and out of the compression engines, through the swing buffers and into the main buffer. The scheduler 300 comprises a component 301 for multiplexing the output of the burst buffer to inputs of a plurality of compression engines; a plurality of components 302 - 304 for multiplexing the output of the plurality of compression engines into the corresponding pairs of swing buffers; and a plurality of components for routing the outputs of the pairs of swing buffers into memory locations of the main buffer.
Referring to Fig. 4 herein, there is illustrated schematically a logical arrangement of a plurality of compressed data chunks stored in the main buffer of the write channel of Fig. 2 herein. The chunks of compressed data may be of random size, provided this is less than or equal to a size of a block of uncompressed data of fixed size. The compressed chunks of data are separated from each other by reset tokens.
Within the prior art LTO format, there is a requirement that periodically, a reset token is included. This provides an access point from which recovery of data can commence. Decompression of data may recommence from a reset token, but not from a position between reset tokens. The length between access points is one data set, which is equivalent to one block of compressed data.

The beginning and end of a data set are not necessarily coincident with the reset tokens. The LTO format requires that there is at least one reset token within the data set, and does not care if there is more than one reset token per data set.
According to the first embodiment shown in Figs. 2 - 4, in the main buffer there are many access points, provided by many reset tokens. That is, there are tokens which occur before the end of a full data set. Therefore, the data between each access point and the next can be compressed and decompressed independently of each other, and this feature enables the capability of employing a plurality of compression engines to operate in parallel. Data in the burst buffer is divided into a plurality of data chunks. In the best mode implementation, a data chunk is of a size of 32 kBytes. The chunks are then multiplexed by the scheduler, and sent to the first and second compression engines 202, 203 in parallel, so that compression of the two data chunks occurs simultaneously. Within each compression engine, compression of the two different data chunks takes approximately the same time, but can result in compressed data chunks of widely different bit length, due to the different compressibilities of the two data chunks. For example, data in the first chunk in the first compression engine may be compressed to a small byte size, whereas the second chunk of data in the second compression engine may be compressed only marginally, to a much larger byte size (but still within the 32 kByte chunk size).
Referring to Fig. 5 herein, there is illustrated schematically storage of uncompressed data chunks in the burst buffer, their routing to a first compression engine and a second compression engine, and storage of compressed data chunks in memory locations of a main buffer, after output from the first and second compression engines.
In an implementation having two compression engines, the first compression engine may compress odd numbered chunks, and the second compression engine may compress even numbered chunks, so that the first two chunks N, N+1 are compressed in parallel by the two compression engines, then the next two chunks N+2, N+3 are compressed, and so on; the first compression engine compresses chunks N, N+2, N+4, N+6... and so on, whilst the second compression engine compresses chunks N+1, N+3, N+5, N+7... and so on.
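This odd/even allocation is a plain round-robin mapping from chunk index to engine; a one-line illustration (the function name is ours):

```python
def engine_for_chunk(chunk_index: int, num_engines: int = 2) -> int:
    """Round-robin allocation: with two engines, engine 0 takes chunks
    N, N+2, N+4, ... and engine 1 takes chunks N+1, N+3, N+5, ..."""
    return chunk_index % num_engines

assert [engine_for_chunk(n) for n in range(6)] == [0, 1, 0, 1, 0, 1]
```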
The compressed data chunks are stitched back together again to form a compressed data set. The compressed data set has embedded within it a plurality of reset tokens, providing access points, so that in the main buffer, the final compressed data set comprises a plurality of compressed data chunks, in sequence, because the data set is divided up with a plurality of reset tokens.
This means that each compression engine can continue to compress the next chunk of data, without having prior knowledge of the previous data chunk, so that the compression algorithm can operate from the access point, rather than needing to have a complete knowledge of the whole of the data set in order to perform compression of the data chunk.
In the above description there have been shown specific embodiments in which two compression engines are operated in parallel. However, the general architectures are extendable to incorporate an arbitrary plurality of compression engines in parallel. For implementations having two compression engines in parallel, compression processing occurs at a rate of two bytes per clock cycle. For the general case having a plurality of a number P of compression engines operating in parallel, compression processing operates at the rate of P bytes per clock cycle.
When joining back together the compressed data set in the main buffer, there is no prior knowledge of how long each compressed data chunk will be. Therefore, as a real time operation, there is no knowledge of where to start writing a second data chunk which is being compression processed in parallel to the first data chunk, until the compression of the first data chunk has finished. In the first embodiment, this problem is overcome by the provision of the swing buffers, which provide temporary storage of the compressed data chunks until both data chunks have finished being compressed. At that point, the length of each compressed data chunk is known, and memory locations can be allocated for storage of each data chunk in the memory, so that the data chunks are stored contiguously in the main buffer.
Each pair of swing buffers works in tandem so that, for example, whilst a first swing buffer 205 is filling up with the output of a current compression process of first compression engine 202, the output of a previous compression process for a previous chunk stored in second swing buffer 206 may be emptying into a memory location of the main buffer. In parallel with this process, the third swing buffer 207 may be filling up with the output of the second compression engine 203, whilst simultaneously the fourth swing buffer 208 is emptying the compressed data chunk previously output from the second compression engine into a memory location in the main buffer contiguous with the end of the memory location used to store the output of the second swing buffer 206. Emptying of a swing buffer into the main buffer is triggered by the compression engine finishing compression of a whole data chunk. In one embodiment, once both compression engines have finished their compression process, then the two compressed data chunks can be emptied in parallel into the main buffer. In the best mode implementation, the order of data chunks stored in the burst buffer is the same as the order of compressed data chunks in the main buffer.
When the compression engine completes compression of a data chunk, the ALDC compression engine sends a signal to the scheduler, so that the scheduler is able to determine that the swing buffers can be emptied into the main buffer. Once the scheduler receives a signal from each of the plurality of compression engines that they have each completed compression of a chunk of data, then the scheduler can trigger emptying of the swing buffers into the main memory in an ordered fashion, so that the compressed data chunks are stored in the main memory in the original order in which the uncompressed data chunks were received in the burst buffer.

Signals from the ALDC compression engines that they have completed compression of a data chunk can also be used by the scheduler to input a new uncompressed data chunk into each of the compression engines. The scheduler may wait until it has received all compression complete signals from all of the plurality of compression engines before inputting any new chunks of data into any compression engine, or may deal with each compression engine independently, and route a next uncompressed chunk of data to a compression engine once the scheduler has received a compression complete signal from that particular compression engine. Therefore, the plurality of compression engines may be clocked synchronously to input uncompressed data chunks in one mode of operation, or may operate asynchronously to each input uncompressed chunks of data at different times in a second mode of operation.
The size of each swing buffer is determined by the size of an uncompressed data chunk. In a case where an uncompressed data chunk cannot be compressed any further, that data chunk will pass through the compression engine without any significant compression. Therefore, each swing buffer needs to be of a size capable of storing an uncompressed data chunk, although for much of their operation, the swing buffers will only store compressed data chunks, which are of a significantly smaller number of bytes than an uncompressed data chunk. Consequently, the architecture becomes more expensive the larger the chunk size which is selected, since this involves having a larger capacity swing buffer. On the other hand, having a relatively smaller size chunk lowers the efficiency of the ALDC compression algorithm. Therefore, there is a tradeoff between having a data chunk size which incurs the cost of large swing buffers, and having a decrease in efficiency in the compression algorithm through selection of a relatively smaller chunk size.
In the first specific embodiment, taking as an example a case where the incoming uncompressed data is, on average, compressed by a compression ratio of 4:1, then a plurality of uncompressed data chunks of, for example, 8 x 32 kBytes would result in 8 compressed data chunks in the main buffer, occupying a memory space of 64 kBytes.
The compression rate of each compression engine can be described in two ways: (a) firstly, the rate at which uncompressed data enters the compression engine, which is the usual way of describing a compression engine's throughput; and (b) secondly, the rate at which compressed data is generated by the compression engine.
A characteristic of the ALDC algorithm is that the rate at which uncompressed data enters the compression engine is constant at 1 byte per cycle. When the data is split into 32 kB chunks, this worsens the compression ratio by approximately 10% compared to the situation where a whole data set is being compressed at once, and also worsens the rate at which compressed data is generated by the engine by approximately 10%, but does not cause the compression operation to take longer. Therefore, using two parallel ALDC compression engines can be said to exactly double the rate of compression. Because each compression engine is operating to compress data independently of each other compression engine, any further replication of the compression engines, i.e. adding more engines in parallel, does not incur any further compression rate penalty due to the smaller chunk size relative to the original data set.
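The direction of this effect is easy to reproduce in software. The sketch below uses zlib as a stand-in for ALDC (so the exact 10% figure will not reproduce), comparing whole-set compression against independent 32 kB chunks:

```python
import zlib

def chunked_len(raw: bytes, chunk: int = 32 * 1024) -> int:
    # Each chunk is compressed independently, as if the history
    # buffer were reset at every chunk boundary.
    return sum(len(zlib.compress(raw[i:i + chunk]))
               for i in range(0, len(raw), chunk))

data = b"the quick brown fox jumps over the lazy dog " * 8192  # ~360 kB
whole = len(zlib.compress(data))
print(f"whole set   : {len(data) / whole:.2f}:1")
print(f"32 kB chunks: {len(data) / chunked_len(data):.2f}:1")  # slightly worse
```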
Referring to Fig. 6 herein, there is illustrated schematically overall process steps carried out by the first write channel illustrated with reference to Figs. 2 - 5 herein. In process 600, a plurality of reset tokens are introduced into a received data set in the burst buffer to create a plurality of uncompressed data chunks. The data chunks are typically of a size of around 32 kB. In process 601, a plurality of the data chunks are input in parallel to a plurality of compression engines for simultaneous compression. Selection of data chunks in the burst buffer and routing of those data chunks to the plurality of compression engines is achieved by the scheduler. In process 602, the plurality of data chunks are compressed in parallel in the plurality of compression engines. Since each compression engine must process the whole of a data chunk in order to apply the ALDC compression algorithm, each of the plurality of compression engines takes an approximately equal amount of time to compress a data chunk. However, the compressed chunk size output from each of the plurality of compression engines will in the general case be different, and unpredictable. The size of a compressed data chunk may range from around one hundredth of the size of an uncompressed data chunk at one extreme, up to the full size of an uncompressed data chunk at the other extreme. In process 603, the plurality of data chunks are input, via the swing buffers, into the main buffer, in the same order in which they appeared in the original data set. Each pair of swing buffers at the output of a compression engine operates in tandem. Whilst one swing buffer is filling up with a currently processed data chunk, the other swing buffer is emptying into the main buffer the result of the immediately previously processed data chunk.

The process of Fig. 6 operates in real time as an ongoing data processing operation. Data sets input into the burst buffer as a continuous data stream are divided up into uncompressed data chunks, fed into a plurality of compression engines in parallel, compressed in parallel, and the compressed data chunks are input sequentially into the main buffer to reconstitute the original data set. The original data set is emptied from the main buffer down a write channel of a data storage device, for writing to a data storage medium.
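The overall Fig. 6 pipeline can be sketched in a few lines, again with zlib standing in for the ALDC engines; `pool.map` returns results in input order, mirroring the order-preserving reassembly through the swing buffers:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 32 * 1024  # chunk size used in the best mode

def compress_data_set(data: bytes, engines: int = 2) -> list:
    """Split a data set into fixed-size chunks and compress the chunks
    in parallel, returning compressed chunks in their original order."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=engines) as pool:
        return list(pool.map(zlib.compress, chunks))

data = bytes(range(256)) * 512          # a 128 kB test data set
compressed = compress_data_set(data)    # four compressed chunks
assert b"".join(zlib.decompress(c) for c in compressed) == data
```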
Referring to Fig. 7 herein, there is illustrated schematically a second write channel according to a second specific embodiment of the present invention. The second write channel comprises a burst buffer 700 for receiving raw data from a host computer; a first scheduler 701 for assigning individual data chunks into a plurality of compression engines 702, 703 respectively; a second scheduler 704; a main buffer 705 for storing compressed data chunks; and an 'on the fly' packer 706.

Burst buffer 700 inputs raw data from a host computer in full data sets. First scheduler 701 reads the data set in chunks of data, assigning each chunk to a compression engine of the plurality of compression engines. In the embodiment shown in Fig. 7, there is a first compression engine 702 and a second compression engine 703. The first scheduler 701 assigns a consecutive data chunk to each of the first and second compression engines in parallel, so that the two data chunks are each compressed in parallel. Each compression engine compresses data chunks in a self contained manner, so that the output of each compression engine is a compressed data chunk. Second scheduler 704 routes the output of the first and second compression engines into memory slots in the main buffer. The memory slots in the main buffer are each of a size which is capable of containing a whole uncompressed data chunk, in order to account for the eventuality that an uncompressed data chunk originating from the host computer cannot be further compressed. Since each incoming uncompressed data chunk is allocated an amount of memory in the main buffer after compression which is greater than or equal to the data size of the uncompressed data chunk, the amount of memory occupied by each of the reserved memory slots in the main buffer is greater in the second embodiment than in the first embodiment, where a plurality of swing buffers were provided.
The scheduler writes out the compression processed data chunks into the plurality of fixed size slots in the main buffer. Each compression processed data chunk starts with a reset token, and has a reset token at the end of the compression processed data. Since this compression processed chunk may be of variable size, and each slot is of a fixed size, the slot will therefore also contain whatever data was there before writing of the compression processed data, for example the tail end of a previous chunk of data which was not quite so compressible. This does not matter, since the on-the-fly packer will interpret the end reset token to mean "skip to the end of the slot". The first reset token of the compression processed data, which is on the slot boundary, is interpreted as the start of a compression processed data chunk. The next reset token, which is not on a slot boundary, is interpreted to mean that the on-the-fly packer should skip to the end of the memory slot.
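A sketch of this slot-filling rule, under the simplifying assumption that the reset token can be represented as a byte string (in real ALDC it is a bit-level code, and the token pattern below is a placeholder of ours):

```python
SLOT = 32 * 1024
RESET = b"\xf0\x0f"  # placeholder reset-token pattern, not the real ALDC code

def write_slot(main_buffer: bytearray, slot_index: int,
               compressed_chunk: bytes) -> None:
    """Frame one compressed chunk with start and end reset tokens and
    place it at the start of its fixed-size slot. Bytes after the end
    token keep whatever was in the buffer before: the junk data the
    on-the-fly packer will later skip."""
    framed = RESET + compressed_chunk + RESET
    assert len(framed) <= SLOT, "chunk must fit its fixed-size slot"
    start = slot_index * SLOT
    main_buffer[start:start + len(framed)] = framed
```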
Referring to Fig. 8 herein, the output of the main buffer comprises a set of compressed data chunks, interspersed with junk data. The junk data may be random or pseudo random data for example.
Referring to Fig. 9 herein, there is illustrated schematically a data output of the 'on the fly' packer 706, in which junk data is removed from the data stream. After passing through the 'on the fly' packer 706, the data stream is continuous, having the junk data removed. Individual compressed data chunks are delineated from each other by a plurality of reset tokens.
Referring to Fig. 10 herein, there is illustrated schematically storage of compressed data chunks in the main buffer memory at a set of pre-determined memory locations, each memory location capable of containing data of the same size as the uncompressed data chunks, in this embodiment 32 kB.
In the second embodiment, taking as an example 8 x 32 kB uncompressed data chunks in the burst buffer, these would be stored in the main buffer occupying a memory space of 8 x 32 kB, including packing bytes. On removal of the packing bytes by the on the fly packer, the compressed data chunks would occupy a space of 64 kB, i.e. a quarter of the space occupied by the original uncompressed data chunks. Consequently, in the second embodiment herein, the main buffer needs to be of a larger size than in the first specific embodiment described herein, for the same data handling capacity in a write channel.
Although the second specific embodiment, from a practical point of view, requires a larger main buffer than the first specific embodiment, implementing interfacing and scheduling between the output of the plurality of compression engines and a single main buffer is easier in the second specific embodiment than in the first specific embodiment. Therefore, the second specific embodiment is preferred practically compared to the first specific embodiment. As a practical implementation, both the first and second embodiment write channels are likely to have the main buffer implemented as a separate integrated circuit.
At the end of each compressed data chunk there is inserted a second reset token, so that each compressed data chunk is marked by a first reset token at the beginning of the compressed data chunk, and a second reset token at the end of the compressed data chunk. Therefore, the junk data which occupies the unused remaining portion of each memory location in the main buffer not occupied by compressed data can be readily identified by the packer.
The on the fly packer simply outputs data continuously until it reads a reset token. Once it reads a reset token, it skips to the next reset token before re-commencing output of data. Therefore, the packer skips over the junk data, and outputs a continuous stream of compressed data.

Referring to Fig. 11 herein, there is illustrated schematically process steps carried out by the packer. In process 1100, when the packer inputs a reset token, the packer outputs data continuously in process 1101. When the packer receives a next reset token in process 1102, the packer closes the output and outputs no data in process 1103. On receiving a next reset token, the packer reverts to process 1100 and continues to output the data. The process continues alternating between outputting compressed data chunks and outputting no data. According to the ALDC format, reset tokens are highly identifiable bit sequences, and so can be identified by the packer. The packer may be implemented entirely in hardware.
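The Fig. 11 behaviour is a two-state toggle. A minimal software model (the real packer also retains the delimiting reset tokens in its output stream; only the skip logic is shown here):

```python
def pack(stream):
    """Each reset token flips the packer between 'output data' and
    'discard junk'. Items in `stream` are either the marker string
    'RESET' or a unit of data."""
    out, emitting = [], False
    for item in stream:
        if item == 'RESET':
            emitting = not emitting   # alternate output / no output
        elif emitting:
            out.append(item)
    return out

# Compressed chunks survive; junk between an end token and the next
# start token is discarded.
s = ['RESET', 1, 2, 3, 'RESET', 9, 9, 'RESET', 4, 5, 'RESET']
assert pack(s) == [1, 2, 3, 4, 5]
```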
In the general case, there is no reason why the scheduler cannot assign incoming uncompressed data chunks to whichever compression engine becomes available, and whether the scheduler assigns consecutive incoming uncompressed data chunks to one compression engine or another of the plurality of compression engines is an implementation specific detail. Each compression engine can already determine the size of data chunk which it is to compress, and can establish the start of the data chunk which it is to compress next by reading a memory start address at which the uncompressed data chunk is stored in the burst buffer.
In the second specific embodiment, each compression engine needs to have the information of:

  • a start location at which an uncompressed data chunk is to be read from the burst buffer;
  • a destination memory location in the main buffer to which the compressed data chunk is to be output; and
  • how much data to compress (in the second embodiment, 32 kBytes).

It is the job of the scheduler to supply the above information to each compression engine. The scheduler may be implemented as a single hardware item, or alternatively may be implemented as a firmware item.
Since the memory locations from which uncompressed data chunks are being read in the burst buffer are known by the scheduler, the on the fly packer does not need to have any functionality to distinguish between a first reset token in a compressed data chunk and a second reset token; it can simply alternate its output mode for each reset token which it receives, provided it receives a synchronization from the scheduler.
In the second embodiment, the on-the-fly packer can run freely through the main buffer, outputting the content of each slot in order to create a steady stream of data. The on-the-fly packer only needs to know how far through the main buffer the compressed data extends. More specifically, since more than one slot of the memory will be being filled at any one moment, the packer needs to know the buffer address of the earliest slot which has not yet been completely filled. The scheduler supplies this information to the packer.
The packer needs to receive the information of the point in the main buffer up to which the last complete compressed data chunk has been read. It receives this data from the scheduler. In other words, the packer needs to have the information as to the point in the main buffer from which compressed data chunks have already been read.
The output of the packer has to be at a fixed data rate, which is synchronized to the analog part of the write channel, and leads to the write head. Consequently, in the specific embodiments, the rate determining stage is the output of the main buffer to the analog write channel driving the write head, since, with a plurality of compression engines provided, the plurality of compression engines can provide compressed chunk data to the main buffer at a rate as fast as the main buffer can receive it.
The ALDC standard and the industry in general require that data written to tape using the ALDC format can be read back from tape using a single decompression engine. This is possible with data written to tape using the specific methods described herein.
Referring to Fig. 12 herein, there is illustrated schematically data flow from a plurality of memory locations in the burst buffer 1200, through first and second compression engines 1201, 1202 respectively, and into a plurality of equally sized memory locations in the main buffer. In this specific implementation, the first compression engine 1201 and the second compression engine 1202 compress alternate data chunks from the first buffer, and store the corresponding compressed data chunks in alternate memory locations in the main buffer. Thus, for a sequence N, N+1, N+2, N+3... of uncompressed data chunks, first compression engine 1201 compresses odd numbered chunks N, N+2, N+4, whilst second compression engine 1202 compresses even numbered chunks N+1, N+3, N+5 and so on, and the scheduler arranges for the odd numbered compressed data chunks to be stored alternately with the even numbered compressed data chunks in the main memory.
The ALDC format specifies that there must be at least one access point (reset token) in every data set. However, it does not exclude having more than one access point per data set.
By causing each ALDC engine to use a start address offset by a full chunk size from the engine processing the previous chunk, gaps are thereby left in the main buffer, which are packed with junk data, being data previously existing in the main buffer and which is ignored. The engine concludes each compression processed data chunk with a 'history reset' token. On emptying the main buffer, the presence of a 'history reset' token not already on a chunk size boundary causes the buffer pointer to advance to the next chunk size boundary, skipping the invalid junk data between the compressed data chunks. If the chunk size is a power of two, this can be trivially accomplished in hardware. The history reset token is chosen as an end of data marker, as the presence of this extra token in this position, i.e. adjacent to the 'history reset' token at the beginning of the next chunk, will not have any detrimental effect during decompression.
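The hardware-trivial skip follows because advancing to the next boundary of a power-of-two chunk size is a mask-and-add; a small illustration (the function name is ours):

```python
CHUNK = 32 * 1024  # must be a power of two for this trick

def next_chunk_boundary(pointer: int) -> int:
    """On seeing an end-of-chunk 'history reset' token, advance the
    buffer pointer past the junk to the next chunk-size boundary."""
    return (pointer | (CHUNK - 1)) + 1

assert next_chunk_boundary(100) == 32 * 1024         # inside slot 0
assert next_chunk_boundary(40_000) == 2 * 32 * 1024  # inside slot 1
```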
The serial data is multiplexed into two or more compression engines, and the resulting data streams reconstituted into a single data stream prior to input to a tape drive mechanism.
When decompressing, the incoming chunks of data will be of varying size, but will always decompress to exactly the chunk size already used. Providing this is known, each ALDC engine can write its decompressed data directly into the burst buffer, and no gaps will result.
This has the following results:

  • The overall compression throughput is not limited to the speed of the content addressable memory. By generalizing the above architecture, a number of ALDC compression engines can be used to achieve a compression throughput N times faster than the maximum read access rate of a single content addressable memory.

  • A traditional architecture ALDC engine, for example in a legacy product, can decompress the data stream, albeit more slowly.

  • With moderately sized chunks, for example 32 kBytes, the compression ratio is reduced by only a small amount. For example, for data that is compressible in the ratio 4:1 using continuous ALDC, the scheme achieves 3.56:1 with a 32 kByte chunk size.
Referring to Fig. 13 herein, there is illustrated schematically processes carried out by the second write channel according to a second specific method of the present invention. In process 1300, an incoming data set stored in the buffer memory is divided up into a plurality of data chunks of equal size. In process 1301, uncompressed data chunks are input into a plurality of compression engines in parallel. In process 1302, the plurality of uncompressed data chunks are each compressed individually, with compression of the plurality of data chunks occurring in parallel by a plurality of compression engines arranged in parallel. In process 1303, the compressed data chunks output from the plurality of compression engines are stored in a plurality of uncompressed chunk sized memory locations, such that the compressed data chunks are stored in the same order in the memory locations of the main buffer as were the uncompressed data chunks in the burst buffer, so that the main buffer stores a plurality of compressed data chunks having packing bytes there between, and the full data set is reconstituted within the main buffer. In process 1304, the on the fly packer reads consecutive compressed data chunks from the main buffer, and outputs a continuous steady data rate stream of compressed data to an analog write channel for writing to a data storage medium.
Referring to Fig. 14 herein, there is illustrated schematically a read channel for reading compressed data from a tape data storage medium, incorporating a plurality of parallel ALDC/CAM decompression engines. The read channel reads data from the tape drive mechanism 1400, and comprises a main buffer 1401, the main buffer 1401 outputting compressed data chunks into a plurality of decompression engines 1403, 1404; and a burst buffer 1405.
Compression data is streamed continuously from tape drive mechanism 5 1400, being read by a read head and being converted into analog signals. The analog signals are digitised and unformatted and stored in main buffer 1401.
Read scheduler 1402 identifies individual compressed data chunks by means of reset tokens and allocates them alternately to first and second ALDC decompression engines 1403, 1404 respectively. Each decompression engine JO inputs one compression chunk at a time, and outputs a decompressed data chunk. Decompressed data chunks are output into a plurality of destination memory locations in burst buffer 1405. The memory locations in burst buffer 1405 may all be of a same size, in the best mode implementation 32kB.
Although the compressed data chunks may each have a different data size, less than or equal to 32 Kbytes in this example, when decompressed they each expand to occupy 32 Kbytes of data.
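A corresponding read-side sketch, with zlib again standing in for the ALDC decompression engines: because every chunk decompresses to exactly the fixed chunk size, each engine can write its output straight into a pre-assigned burst buffer slot with no gaps. The slot size and two-engine count follow the best mode above; everything else is illustrative.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 32 * 1024  # fixed decompressed chunk size (32 Kbytes)

def decompress_into_burst_buffer(compressed_chunks: list) -> bytearray:
    # Pre-allocate one fixed-size slot per incoming compressed chunk.
    burst_buffer = bytearray(len(compressed_chunks) * CHUNK_SIZE)

    def engine(job):
        index, chunk = job
        data = zlib.decompress(chunk)
        # Each chunk expands to exactly CHUNK_SIZE (the final chunk of a
        # data set may be shorter), so its destination slot is known in
        # advance and the engines never contend for buffer space.
        start = index * CHUNK_SIZE
        burst_buffer[start:start + len(data)] = data

    # Two engines fed alternately; mapping over the enumerated chunks
    # gives the same round-robin allocation as the read scheduler.
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(engine, enumerate(compressed_chunks)))
    return burst_buffer
```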
Referring to Fig. 15 herein, there is illustrated schematically a plurality of compressed data chunks in main buffer 1500. Each compressed data chunk has a different data size. The decompressed data chunks are stored in the burst buffer 1502, the decompressed data chunks each occupying a same amount of burst buffer memory as each other.
In the embodiment described, in a tape drive architecture, a novel buffering arrangement is used to allow a plurality of compression engines to operate concurrently on a single data stream. In this way, the speed bottleneck of a single compression engine is avoided, and data can be compressed at more than one symbol per clock cycle.

Claims (23)

  1. 1. A method of compressing a data set, said method comprising:
    inserting a plurality of reset tokens at pre-determined intervals within said data set, to convert said data set into a plurality of data chunks; and compression processing a plurality of said data chunks in parallel with each other, to produce a plurality of compression processed data chunks.
  2. 2. The method as claimed in claim 1, comprising: inserting a said reset token as a first symbol of each said compression processed data chunk.
  3. 3. The method as claimed in claim 1 or 2, comprising: inserting a said reset token at the end of each said compression processed data chunk.
  4. 4. The method as claimed in any one of the preceding claims, further comprising: arranging said plurality of compression processed data chunks in order, such that said plurality of compression processed data chunks are in a same order as said plurality of data chunks before compression processing.
  5. 5. The method as claimed in any one of the preceding claims, wherein said process of compression processing said plurality of data chunks comprises applying an adaptive lossless data compression algorithm to said plurality of data chunks.
  6. 6. The method as claimed in any one of the preceding claims, further comprising: storing said plurality of compression processed data chunks in a plurality of memory locations, wherein said plurality of compression processed data chunks stored in said plurality of memory locations form a contiguous sequence of compression processed data chunks.
  7. 7. The method as claimed in any one of the preceding claims, further comprising: storing said plurality of compression processed data chunks in a plurality of fixed size memory slots, wherein each said fixed size memory slot is capable of storing an uncompressed said data chunk.
  8. 8. The method as claimed in any one of the preceding claims, further comprising: storing said plurality of compression processed data chunks in a plurality of fixed size memory slots, wherein each said fixed size memory slot is capable of storing an uncompressed said data chunk; and converting said plurality of compression processed data chunks into a continuous bit stream having a substantially constant data rate.
  9. 9. A method of converting a data stream comprising: a plurality of reset tokens; a plurality of compressed data chunks; and
    a plurality of sections of junk data; into a continuous data stream having a substantially constant bit rate, said method comprising: searching for said plurality of reset tokens; between receiving immediately consecutive said reset tokens, outputting a stream of data; and between receiving alternate immediately consecutive said reset tokens, producing no data output.
  10. 10. An electronic data processing channel for processing a data set of electronic digital data, said channel comprising: a first buffer memory for receiving a plurality of incoming digital data sets; a plurality of ALDC format compression engines arranged in parallel, said plurality of compression engines arranged to receive said data sets from said first buffer memory as a plurality of data chunks, wherein each data chunk is of a smaller size than a said data set, said plurality of compression engines outputting a plurality of compression processed data chunks; a second buffer memory, said second buffer memory receiving said plurality of compression processed data chunks, and storing said plurality of compression processed data chunks consecutively in an order corresponding to an original order of said plurality of input data chunks; and a scheduler component for allocating said plurality of data chunks to input into said plurality of compression engines.
  11. 11. A data processing channel as claimed in claim 10, further comprising: a plurality of third buffers, said plurality of third buffers receiving an output of a single said compression engine, said plurality of third buffers operating to receive consecutive compression processed data chunks output from said compression engine.
  12. 12. A write channel for writing a stream of digital data to a data storage medium, said write channel comprising: a first buffer device for receiving an incoming data set; a plurality of compression engines, each compression engine operating to compression process at least one data chunk of said data set; a plurality of second output buffer devices, said plurality of second output buffers operating to receive compression processed data chunks output from said plurality of compression engines; a third buffer device, said third buffer device receiving data chunks output from said plurality of second buffer devices; and a scheduler component for scheduling input of said data chunks into said plurality of compression engines, for scheduling output of said compression processed data chunks into said second plurality of buffers, and for routing said plurality of compression processed data chunks from said second plurality of buffers, into said third buffer.
  13. 13. The write channel as claimed in claim 12, wherein:
    each said compression engine outputs a stream of compression processed data chunks to at least a pair of said second buffers, wherein one of said pair of buffers operates to receive a said compression processed data chunk, and in parallel, another of said pair of second buffers operates to output a compression processed data chunk.
  14. 14. The write channel as claimed in claim 12, wherein: an output of a plurality of compression processed data chunks is stored in a set of memory locations in said third buffer device, wherein individual memory locations of said set of memory locations are of variable size.
  15. 15. A write channel for writing digital data to a data storage medium, said write channel comprising: a first buffer for receiving an incoming stream of data sets; a plurality of compression engines arranged to operate in parallel, each said compression engine arranged to process a stream of data chunks, a plurality of said data chunks comprising a data set; a second buffer memory, said second buffer memory comprising a plurality of memory locations, each said memory location of a size suitable for storing a said data chunk; and a packer device, said packer device arranged for receiving a stream of said compression processed data chunks and a plurality of packing bytes, and removing said plurality of packing bytes from said data stream, to output a continuous stream of said compression processed data chunks.
  16. 16. The write channel as claimed in claim 15, further comprising a scheduler device for: routing a plurality of said data chunks to said plurality of compression engines.
  17. 17. The write channel as claimed in claim 15, further comprising a scheduler for: routing a plurality of compression processed data chunks output from said plurality of compression engines into said plurality of memory slots of said second buffer.
  18. 18. A data processing device operable for carrying out a method of compressing a data set, said method comprising: inserting a plurality of reset tokens at pre-determined intervals within said data set, to convert said data set into a plurality of data chunks; and compression processing a plurality of said data chunks in parallel with each other, to produce a plurality of compression processed data chunks.
  19. 19. A data processing device comprising: means for inserting a plurality of reset tokens at pre-determined intervals within a data set, to convert said data set into a plurality of data chunks; and a plurality of compression engines operable for compression processing a plurality of said data chunks in parallel with each other, to produce a plurality of compression processed data chunks.
  20. 20. Program data comprising instructions for operating a data processing device for carrying out compression processing of a plurality of data sets, by: inserting a plurality of reset tokens at pre-determined intervals within said data set, to convert said data set into a plurality of data chunks; and compression processing a plurality of said data chunks in parallel with each other, to produce a plurality of compression processed data chunks.
  21. 21. A data storage medium carrying program data comprising instructions for operating a data processing device for carrying out compression processing of a plurality of data sets, by: inserting a plurality of reset tokens at pre-determined intervals within said data set, to convert said data set into a plurality of data chunks; and compression processing a plurality of said data chunks in parallel with each other, to produce a plurality of compression processed data chunks.
  22. 22. A data processing device loaded with program instructions for carrying out compression processing of a plurality of data sets, said instructions causing said data processing device to carry out the processes of: inserting a plurality of reset tokens at pre-determined intervals within said data set, to convert said data set into a plurality of data chunks; and compression processing a plurality of said data chunks in parallel with each other, to produce a plurality of compression processed data chunks.
  23. 23. A data storage device comprising:
    means for inserting a plurality of reset tokens at pre-determined intervals within a data set, to convert said data set into a plurality of data chunks; and a plurality of compression engines operable for compression processing a plurality of said data chunks in parallel with each other, to produce a plurality of compression processed data chunks.
GB0227400A 2002-11-23 2002-11-23 Data compression Withdrawn GB2395641A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0227400A GB2395641A (en) 2002-11-23 2002-11-23 Data compression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0227400A GB2395641A (en) 2002-11-23 2002-11-23 Data compression

Publications (2)

Publication Number Publication Date
GB0227400D0 GB0227400D0 (en) 2002-12-31
GB2395641A true GB2395641A (en) 2004-05-26

Family

ID=9948421

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0227400A Withdrawn GB2395641A (en) 2002-11-23 2002-11-23 Data compression

Country Status (1)

Country Link
GB (1) GB2395641A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8373584B2 (en) 2011-05-16 2013-02-12 Hewlett-Packard Development Company, L.P. Compressing and decompressing data

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113644915B (en) * 2021-07-01 2022-06-07 中国科学院空天信息创新研究院 Data compression method, data compression device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63204922A (en) * 1987-02-20 1988-08-24 Fujitsu Ltd Compression controller for image data
JPH0385922A (en) * 1989-08-30 1991-04-11 Fujitsu Ltd Multiplex decoding circuit
US5097261A (en) * 1989-11-22 1992-03-17 International Business Machines Corporation Data compression for recording on a record medium
WO1994008428A1 (en) * 1992-10-02 1994-04-14 Zoran Corporation Parallel encoding/decoding of dct compression/decompression algorithms
US5583500A (en) * 1993-02-10 1996-12-10 Ricoh Corporation Method and apparatus for parallel encoding and decoding of data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
http://www.ecma-international.org/publications/files/ecma-st/Ecma-321.pdf, ECMA Standard 321, "Streaming Lossless Data Compression Algorithm (SLDC)", June 2001 *

Also Published As

Publication number Publication date
GB0227400D0 (en) 2002-12-31

Similar Documents

Publication Publication Date Title
EP0457840B1 (en) Adaptive data compression apparatus for a tape drive system
US5963260A (en) Macroblock-level partitioned HDTV video decoder and related method
JP3824676B2 (en) Apparatus and method for creating interactive television signal, and interactive television signal transmission system
EP0464191B1 (en) Compressed data access
US6108584A (en) Multichannel digital audio decoding method and apparatus
US5572333A (en) Compressed data recording method using integral logical block size and physical block size ratios
US20020002642A1 (en) Input and output systems for data processing
WO1991010997A1 (en) Storage of compressed data
WO1991011001A1 (en) Data storage on tape
EP0587324B1 (en) Data formatting apparatus
KR100725766B1 Transcoders for fixed and variable rate data streams
US8514753B2 (en) Data format and data transfer
US5686915A (en) Interleaved Huffman encoding and decoding method
US7936814B2 (en) Cascaded output for an encoder system using multiple encoders
GB2395641A (en) Data compression
JP2003018544A (en) Recording equipment for digital broadcast
US5652582A (en) Method of high speed Huffman coding and decoding of lab color images
JP2001186100A (en) Data multiplexer
WO1991010998A1 (en) Data storage
JP4608831B2 (en) Data multiplexing apparatus and data multiplexing method
TWI736070B (en) Method and system for processing image data
JP2004015384A (en) Video tape recorder
JP2003274364A (en) Recording and reproducing device
JPH06153173A (en) Moving picture storage coding transmitter
JPH03176756A (en) Information processing system

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)