WO1993000635A1 - Data storage management systems - Google Patents

Data storage management systems

Info

Publication number
WO1993000635A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data storage
management system
blocks
storage management
Prior art date
Application number
PCT/GB1992/001137
Other languages
French (fr)
Inventor
Alan Welsh Sinclair
Original Assignee
Anamartic Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anamartic Limited filed Critical Anamartic Limited
Publication of WO1993000635A1 publication Critical patent/WO1993000635A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/1724Details of de-fragmentation performed by the file system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/40Specific encoding of data in memory or cache
    • G06F2212/401Compressed data

Definitions

  • This invention relates to data storage management systems and in particular to systems for storing data which has a variable block size, such as may arise when data compression techniques are employed.
  • Data is held on the physical storage medium in corresponding blocks whose physical addresses can readily be derived from their logical addresses.
  • the logical to physical address mapping requirement constrains the device to operate with a fixed physical block size and for physical and logical addresses to follow the same order.
  • the physical block address is simply a function of the logical address and the physical block size.
  • the physical address space may not be continuous but may accommodate the characteristics of the storage medium.
  • the effective capacity of a storage device can be increased by the use of data compression techniques, such as run-length encoding, which are commonly applied to data transmission to reduce the volume of data.
  • This type of technique can be applied in magnetic tape devices where data is written to and read from the tape in a continuous stream and fast random access of a block of data is not necessary. It can also be applied by either hardware or software means to a file of data to reduce its volume before it is passed to a device driver which interfaces with a storage device and transfers data in units of a fixed blocksize.
  • the logical address of a block of data is translated to a physical address algorithmically, that is, the physical location of the block can be computed from a knowledge of sequential block number, fixed blocksize and the characteristics of the medium.
  • This physical address is defined in terms of cylinder or track and sector and the physical blocks are ordered identically to the logical blocks. Deviation from such a logically sequential ordering of physical blocks would require the use of a lookup table for address translation which, for a high capacity disk, would be of a size which could reasonably be stored only on the disk itself.
  • the seek and latency times for retrieval from magnetic disk of address information from a look up table would unacceptably compromise the performance of the disk system.
  • a logical block is assigned a fixed physical area on a magnetic disk when the disk is first formatted.
  • One object of the present invention is to provide a scheme for efficient management of storage of data having a variable blocksize.
  • Figure 1 shows schematically one example of compression of a logical block of data
  • Figure 2 shows examples of continuous data space, and partitioned data space
  • Figure 3 shows an address map for use with continuous data space
  • Figure 4 shows various options for writing a data block to memory
  • Figure 5 shows various options for relocating a block of stored data
  • Figure 6 shows a block diagram of the control system for the storage management system of the present invention.
  • Figure 7 shows the storage system architecture for a system which includes a memory overflow device.
  • a data compression system is resident in the data path between the device interface, which operates with fixed blocksize, and the memory controller which manages the storage of the resultant blocks of data of variable size.
  • the compression of a logical block of data is shown schematically in figure 1.
  • the logical data block is of fixed size, which may be defined during the initial formatting of the device.
  • the size of the compressed block is variable, and whilst it is typically significantly smaller than the logical block, it may also be larger if the data proves to be uncompressible.
  • the physical blocksize is not infinitely variable but is constrained to be an integral number of capacity units of the storage device which may be designated as tiles.
  • the size of a tile is defined during initial formatting of the device and the maximum blocksize is set to typically 16 tiles. Thus the number of tiles in a block of data to be stored depends on the data compression ratio achieved for that block.
  • a tile is a physically contiguous rectangular area of N memory locations.
  • the addressing scheme for a tile is set out in WO90/09634-A and the arrangement is such that the N memory locations are addressed by N successive counts of an address counter. This is achieved by interleaving at least the higher order address bits between row and column decoders.
  • a tile is not limited to a physically contiguous area of memory but is used to describe a set of consecutive memory addresses which may not be physically contiguous on the memory.
  • the data storage management technique employed has four elements:
  • Data is stored in the memory in the form of data blocks, whose length is variable, in discrete steps. That is to say each block comprises an integral number of smaller fixed capacity units called tiles.
  • a physical block may typically comprise up to 16 tiles.
  • the order in which the blocks are stored in the memory device is not constrained to be the same as their logical order and a block may be located without any fixed relationship to other logical tiles.
  • a look up table is required to translate a logical tile number to a physical address.
  • This address map may be held in a semiconductor memory which is either within the memory controller or is in a dedicated region of the memory device and is accessed for each data block transfer.
  • a record of the logical block number can be stored as a header with each data block so that any block read from physical memory may be identified.
  • the organisation of stored data blocks may be one of the following.
  • data is stored in a continuous area within the storage memory and blocks of differing size can be located adjacently, as shown in figure 2.
  • the memory space can be considered as cyclic and the head of the data space can wrap around to the bottom of the physical memory if it is free.
  • An address map provides the physical start address and physical block length for each logical block number, as shown in figure 3.
  • data is stored within discrete partitioned areas of the memory space according to the length of the data block.
  • all blocks have identical length and one partitioned area is allocated for each possible size of data block, as shown in figure 2.
  • the memory space can be treated as cyclic in the same manner as for the continuous data space.
  • the address map need contain only the physical block number related to the bottom address for the partitioning area, since all blocksizes are identical within the partition.
  • the partitions are of variable size and their boundaries may be moved by a process known as adaptive partitioning in accordance with the actual volume of data stored in each partitioned area. If a partitioned area cannot be expanded further because of data stored in adjacent areas, the partition may be fragmented in two or more non-adjacent areas. This may be achieved by simple management of the address pointers indicating the partition boundaries.
  • Data is written to the storage memory in blocks of variable size.
  • When a block is modified its size may increase, so there may not be sufficient space available at the address it previously occupied for it to be re-stored there. For this reason a write scheme is used which will locate the block at a suitable physical location irrespective of whether the block exists elsewhere in the memory.
  • an entry is made in the address table to define its physical location, thereby mapping its logical to its physical address. If the block previously existed elsewhere in memory, it becomes obsolete and a hole is effectively created in the data space. The start address and length of this hole is entered in a separate hole address table.
  • a head of the data space may be defined as an address immediately above which is an area of free memory which is large enough to guarantee storage of a data block of maximum size.
  • the exact length of the physical block is known before its location in memory need be determined and the block can be located in a hole of the exact size to accommodate the block. Block and hole address tables are updated accordingly.
  • This scheme is similar to the write to exact and best fit schemes, but the hole selected for the write location is the first encountered which is large enough to accommodate the data block.
  • the data block from either the head or tail of the data space is read into a buffer memory and is written to a location where a hole exists in accordance with the write to exact, best or first fit schemes.
  • a hole in the data space is enlarged by relocation of an adjacent data block in accordance with any of the data write schemes.
  • Data may be swapped between the storage memory and an alternative storage device to which the memory controller also has access.
  • This facility may be used to accommodate data overflow which may occur from a mass storage device which operates with variable blocksize.
  • When data compression is employed in a storage device, the variability of the data blocksize as a function of the data characteristics results in the exact storage capacity of the device being undefined and therefore the possibility of a device overflow as a result of a data block write exists.
  • the storage device controller may write the data block to a location on an alternative storage device and record this location in the block address table.
  • the data block may be written in either compressed or uncompressed format.
  • the block may be relocated to the main storage device when sufficient memory space later becomes free.
  • This organisation is particularly suited to the use of a low cost magnetic disk as an overflow storage device for a high cost semiconductor solid state disk storage device.
  • the magnetic disk may also function as a backup non-volatile storage device.
  • Data blocks may also be relocated between the storage device memory and the alternative storage device. This allows data which has been written to the alternative storage device to be restored to the storage memory. It also allows data to be interchanged between the two storage devices in accordance with a cacheing algorithm resident in the storage device controller. This allows a high cost solid state disk to be configured as a high-speed cache memory to a much higher capacity, low cost, magnetic disk memory.
  • Figure 6 shows a block diagram of a data storage device which contains a hardware data compression facility and a storage management system for handling the variable blocksize resulting from data compression.
  • An interface bus 2 couples the system to a host computer and typically conforms to SCSI or other industry standards for peripheral device interfaces. Communications on the bus are controlled by the Host Controller (4) which contains a small buffer memory for transfer of bursts of data on the bus.
  • a data Storage device (6) may typically be either semiconductor memory as in the case of a solid state disk or a magnetic medium in the case of a magnetic disk.
  • a data Controller (8) controls both the Data Storage device and all data transfers and manipulations within the system controller. It contains and executes the algorithms for data relocation and data swapping and performs all address translation and table management functions. The software required for these can be implemented in a straightforward manner by a man skilled in the art.
  • the data compressor (10) performs compression and decompression in real time on data which is transferred to it.
  • Data transfers are routed through a data buffer (12), which can store several blocks of data, via data channels (14) and a DMA Controller (16).
  • An interface Controller (18) provides access to alternative storage devices via another industry standard bus (20). Data transfers will typically occur in units of one block. Data is written from a host via bus 2, to the data storage device 6 via the DMA controller 16, the data buffer 12 and the data controller 8, and is read via the inverse path. Data is relocated within the Data Storage device by reading out to the data buffer 12 and then back to the storage device 6.
  • Data in compressed format is swapped between the Data Storage device and an alternative storage device on bus 20 via the data controller 8, the DMA controller 16, the data buffer 12, and the interface controller 18 and is restored via the inverse path.
  • Interface Controller (18) may operate with a fixed blocksize and may store data in compressed format with the addition of a pad.
  • Data in uncompressed format is swapped by being routed from the data storage device to the bus 20 via the data compressor 10 and restored via the inverse path. Data is transferred directly between a host on Bus A and a backup storage device on Bus B via the host controller 4, the DMA controller 16, the buffer 12 and the interface controller 18.
  • a high performance memory system will use semiconductor memory for the data storage device 6 and may take the form of a solid state disk.
  • the host controller (4) need not operate on an I/O channel and may be coupled directly to the CPU bus of a host computer system.
  • the storage system will normally incorporate a magnetic disk as a backup storage device to provide non-volatility for the stored data, and the system will be configured as shown in figure 7.
  • a solid state disk 24 is connected to a host via bus A and to a backup storage device 26 (e.g. a magnetic disk) and to other storage devices 28 via a bus B.
  • the cost of the additional magnetic disk is low relative to the cost of semiconductor memory. Because semiconductor data storage is relatively expensive in comparison with magnetic storage, optimised use of data compression is very desirable.
  • Figure 7 allows the system to continue operation in the event of a solid state disk overflow with only a small impact on performance.
  • a magnetic disk 26 of capacity at least as large as the maximum logical capacity of the solid state disk 24 is connected as a backup storage device on bus B to provide non-volatility when power is disconnected. If the physical capacity of the solid state disk is filled before its stated logical capacity, the storage system controller may locate additional data blocks which are written by the host on the magnetic disk. This results in a performance degradation rather than a storage system failure, but will occur very infrequently.
  • the frequency of occurrence is a direct function of the logical capacity to physical capacity ratio which is assumed for the solid state disk and it is therefore possible to trade off average input/output rate for the solid state disk against logical capacity.
  • One implementation of the management system operates with the storage system controller configuring the data storage device as a continuous data space.
  • a hole is defined as a gap in the data space which is not guaranteed to be large enough to accommodate a data block of undefined exact size.
  • a head is defined as a gap in the data space which is guaranteed to be large enough to accommodate a data block of undefined exact size.
  • a head may wrap around from above the highest address in the data space to below the lowest address.
  • Read and write of data blocks is given priority by the controller and relocation of data blocks is run as a background task, which can run concurrently if the data bandwidth to the data storage device 6 on channel 14 is greater than that to the host on bus 2.
  • the controller manages data write and data relocation according to the following algorithms.
  • For a medium performance memory system such as a magnetic disk, the characteristics of the storage medium dictate a different choice of options for data storage, data write and data relocation, but the same architecture for the system controller as shown in figure 6 can be applied.
  • the data storage is organised as partitioned data space with variable blocksize.
  • the partition is arranged to be a track or cylinder on the disk and is fixed when the disk is formatted. Sectors need not be formatted within a track or cylinder, therefore each partition has capacity for an unknown number of data blocks of variable blocksize.
  • Each block is written within the partition with an associated header to identify its logical block number and the block address table need only identify the partition within which each logical block is located, that is, the block's track or cylinder number.
  • a block is read by reading a partition until the target block is identified from its header.
  • the access time is the time to locate the partition plus the latency.
  • the average latency is half a rotation of the disk which is the time to read 50% of a partition.
  • Data of variable blocksize is always written within a partition at the head of the data space.
  • the controller selects the partition to which to allocate a block on the basis of the head size available in each partition.
  • the operation of writing a block is combined with the operation of relocating blocks within the partition using the compact partition scheme.
  • the complete data space from the partition is first read into the data buffer and compacted to eliminate any holes which exist.
  • the block being written is then appended to the end of this data space, and the compacted partition written back to the data storage device.
  • a hole address table for each partition is maintained so that obsolete blocks may be identified and eliminated when compaction is performed in the data buffer.
  • the logical number of the block is entered in the hole address table for that other partition.
  • the typical latency is half a rotation of the disk, which is the time to read or write half a track.
  • the variable blocksize management scheme requires two rotations of the disk to perform the compaction of the partition.
  • average seek time for a write is 14 ms and average latency is 6 ms, giving a typical write time of 20 ms.
  • the average increase in disk access time for the variable blocksize management scheme is 18%.
  • variable blocksize storage management techniques support incorporation of a data compression facility within a data storage device in a manner which is transparent to the host system.
  • the system architecture allows use of a magnetic disk as an overflow device for a solid state disk memory with data compression to eliminate system errors resulting from the undefined capacity of the compressed data storage. Compromises can be made between solid state disk capacity and average input/output time.
  • the storage management enables introduction of data compression techniques to magnetic disk at the expense of only a modest increase in device access time.
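The overflow arrangement described in the points above can be illustrated with a minimal Python sketch. All names, the tile-based free-space accounting and the allocator callbacks are illustrative assumptions, not taken from the patent: a block that still fits is placed in the main (solid state) store, and a block that no longer fits is written to the alternative (magnetic) device, with its location recorded in the block address table either way.

```python
# Sketch of solid-state-disk overflow to a magnetic disk (names assumed).
main_free_tiles = 2            # remaining capacity of the main store, in tiles
block_table = {}               # logical block no. -> ("main" | "overflow", location)

def write_block(logical_no, n_tiles, main_alloc, overflow_alloc):
    """Place a block in the main store if it fits, otherwise on the
    alternative device; record the location in the block address table."""
    global main_free_tiles
    if n_tiles <= main_free_tiles:
        main_free_tiles -= n_tiles
        block_table[logical_no] = ("main", main_alloc(n_tiles))
    else:
        block_table[logical_no] = ("overflow", overflow_alloc(n_tiles))

write_block(1, 2, lambda n: 0, lambda n: 500)    # fits: goes to main store
write_block(2, 3, lambda n: 64, lambda n: 500)   # no longer fits: overflows
```

A block written to the overflow device can later be relocated back to the main store simply by rewriting it and updating the same table, which is how the caching configuration described above would operate.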

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data storage management system stores logical blocks of data in a main memory device (6). A data compressor (10) compresses the blocks prior to storage and thereby produces blocks having various sizes. A data controller (8) stores the blocks at physical locations in free memory space. A supplementary memory device (fig. 7, 26) is used for the storage of logical blocks whose physical size exceeds the largest area of free memory space available in the main memory device. Blocks of data are stored within partitions in the memory device, along with an identifying header. A block address table then stores the partition address corresponding to the partition in which the logical block is stored.

Description

DATA STORAGE MANAGEMENT SYSTEMS
This invention relates to data storage management systems and in particular to systems for storing data which has a variable block size, such as may arise when data compression techniques are employed.
In many data storage devices such as magnetic disk, optical disk and magnetic tape data is stored in units of fixed size called blocks, which may typically be of size 512 bytes. Each block is allocated a logical address which uniquely identifies it and the set of logical block addresses forms a continuous series.
Data is held on the physical storage medium in corresponding blocks whose physical addresses can readily be derived from their logical addresses. For a random block access storage device such as a magnetic disk, the logical to physical address mapping requirement constrains the device to operate with a fixed physical block size and for physical and logical addresses to follow the same order. The physical block address is simply a function of the logical address and the physical block size. The physical address space may not be continuous but may accommodate the characteristics of the storage medium.
The effective capacity of a storage device can be increased by the use of data compression techniques, such as run-length encoding, which are commonly applied to data transmission to reduce the volume of data. This type of technique can be applied in magnetic tape devices where data is written to and read from the tape in a continuous stream and fast random access of a block of data is not necessary. It can also be applied by either hardware or software means to a file of data to reduce its volume before it is passed to a device driver which interfaces with a storage device and transfers data in units of a fixed blocksize. However, it has not been possible to incorporate such data compression within the immediate control structure of a random block access storage device such as magnetic disk because of the difficulty of managing the variable physical blocksize which results from compression of a fixed logical block of data.
For a magnetic disk, the logical address of a block of data is translated to a physical address algorithmically, that is, the physical location of the block can be computed from a knowledge of sequential block number, fixed blocksize and the characteristics of the medium. This physical address is defined in terms of cylinder or track and sector and the physical blocks are ordered identically to the logical blocks. Deviation from such a logically sequential ordering of physical blocks would require the use of a lookup table for address translation which, for a high capacity disk, would be of a size which could reasonably be stored only on the disk itself. The seek and latency times for retrieval from magnetic disk of address information from a look up table would unacceptably compromise the performance of the disk system.
Thus a logical block is assigned a fixed physical area on a magnetic disk when the disk is first formatted. This presents a major obstacle to effective use of any technique such as data compression which results in variation in the size of a block of data to be stored. Any increase in size on modification of a previously stored block cannot be tolerated if no spare capacity for the block has been allowed, and the alternative of allocating sufficient storage capacity to a block to accommodate its maximum size for worst case data compression ratio is not viable because of the wide spread of data compression efficiency for different types of data.
Current techniques for incorporation of data compression in magnetic disk storage systems are confined to operation during data transfer to the device driver for the magnetic disk. The device driver accesses the disk with a single fixed blocksize, or with a very limited range of fixed blocksizes if partitioning of the disk is employed.
Another problem which can arise after data compression is the fact that some data is very incompressible and may result in a larger blocksize than that of the original data. Clearly this presents a problem if the compressed blocksize is larger than the blocksize on the memory device.
One object of the present invention is to provide a scheme for efficient management of storage of data having a variable blocksize.
The invention is defined in the appended claims to which reference should now be made.
The invention will now be described in detail by way of example with reference to the accompanying drawings in which:
Figure 1 shows schematically one example of compression of a logical block of data;
Figure 2 shows examples of continuous data space, and partitioned data space;
Figure 3 shows an address map for use with continuous data space;
Figure 4 shows various options for writing a data block to memory;
Figure 5 shows various options for relocating a block of stored data;
Figure 6 shows a block diagram of the control system for the storage management system of the present invention; and
Figure 7 shows the storage system architecture for a system which includes a memory overflow device.
In this description of the invention no specific details of the software used by the management system are given since this can be implemented straightforwardly by a man skilled in the art.
In the context of this invention it is assumed that a data compression system is resident in the data path between the device interface, which operates with fixed blocksize, and the memory controller which manages the storage of the resultant blocks of data of variable size. The compression of a logical block of data is shown schematically in figure 1. The logical data block is of fixed size, which may be defined during the initial formatting of the device. The size of the compressed block is variable, and whilst it is typically significantly smaller than the logical block, it may also be larger if the data proves to be uncompressible. The physical blocksize is not infinitely variable but is constrained to be an integral number of capacity units of the storage device which may be designated as tiles. The size of a tile is defined during initial formatting of the device and the maximum blocksize is set to typically 16 tiles. Thus the number of tiles in a block of data to be stored depends on the data compression ratio achieved for that block.
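The rounding of a compressed block up to a whole number of tiles can be sketched in a few lines of Python; the 512-byte tile size is an illustrative value for the formatting-time parameter, and the 16-tile limit is the typical maximum quoted above.

```python
import math

TILE_SIZE = 512           # bytes per tile (assumed; fixed at formatting time)
MAX_TILES_PER_BLOCK = 16  # typical maximum blocksize from the text

def tiles_needed(compressed_bytes):
    """Number of tiles occupied by a compressed block: the physical
    blocksize is constrained to an integral number of tiles."""
    tiles = math.ceil(compressed_bytes / TILE_SIZE)
    if tiles > MAX_TILES_PER_BLOCK:
        raise ValueError("block exceeds the maximum physical blocksize")
    return tiles

# A 4096-byte logical block compressed to 1300 bytes occupies 3 tiles;
# incompressible data may need more tiles than the uncompressed block would.
```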
A tile is a physically contiguous rectangular area of N memory locations. The addressing scheme for a tile is set out in WO90/09634-A and the arrangement is such that the N memory locations are addressed by N successive counts of an address counter. This is achieved by interleaving at least the higher order address bits between row and column decoders.
For the purposes of the current invention a tile is not limited to a physically contiguous area of memory but is used to describe a set of consecutive memory addresses which may not be physically contiguous on the memory.
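The details of WO90/09634-A are not reproduced here, so the following is only an illustrative Python sketch of the general idea of interleaving address bits between row and column decoders: when alternate bits of a linear count feed the two decoders, N successive counts sweep a compact rectangular area rather than a single long row.

```python
def row_col(count, bits=4):
    """Split a linear address count into row and column by interleaving:
    even-numbered bits drive the column decoder, odd-numbered bits the
    row decoder (an illustrative scheme, not the patented one)."""
    row = col = 0
    for i in range(bits):
        col |= ((count >> (2 * i)) & 1) << i
        row |= ((count >> (2 * i + 1)) & 1) << i
    return row, col

# Counts 0..3 cover a 2x2 square: (0,0), (0,1), (1,0), (1,1);
# counts 0..15 cover a 4x4 square.
```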
The data storage management technique employed has four elements:
(1) the organisation of the storage of data blocks in physical memory and the means of logical to physical address translation
(2) the operation of writing a data block to the storage memory
(3) the relocation of data blocks within the storage memory to eliminate fragmentation of the available memory space
(4) the operation of swapping a data block between the storage memory and another storage device
Data is stored in the memory in the form of data blocks, whose length is variable, in discrete steps. That is to say each block comprises an integral number of smaller fixed capacity units called tiles. A physical block may typically comprise up to 16 tiles. The order in which the blocks are stored in the memory device is not constrained to be the same as their logical order and a block may be located without any fixed relationship to other logical tiles. To accommodate this storage scheme, a look up table is required to translate a logical tile number to a physical address. This address map may be held in a semiconductor memory which is either within the memory controller or is in a dedicated region of the memory device and is accessed for each data block transfer. A record of the logical block number can be stored as a header with each data block so that any block read from physical memory may be identified.
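A minimal Python sketch of the look up table and per-block header just described; the table layout and the 4-byte header width are assumptions for illustration only.

```python
# Address map: logical block number -> (physical start tile, length in tiles).
address_map = {}

def record_block(logical_no, start_tile, n_tiles):
    """Enter a block's physical location in the address map."""
    address_map[logical_no] = (start_tile, n_tiles)

def physical_location(logical_no):
    """Translate a logical block number to its physical address."""
    return address_map[logical_no]

def make_header(logical_no):
    """Header stored with each data block, so that any block read back
    from physical memory can be identified (4-byte width assumed)."""
    return logical_no.to_bytes(4, "big")

record_block(7, 120, 3)   # logical block 7 occupies tiles 120-122
```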
The organisation of stored data blocks may be one of the following.
(i) Continuous data space.
With this organisation, data is stored in a continuous area within the storage memory and blocks of differing size can be located adjacently, as shown in figure 2. The memory space can be considered as cyclic and the head of the data space can wrap around to the bottom of the physical memory if it is free. An address map provides the physical start address and physical block length for each logical block number, as shown in figure 3.
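The cyclic wrap-around of the continuous data space amounts to modular arithmetic on tile numbers, as this small sketch shows; the total memory size is an assumed figure.

```python
MEMORY_TILES = 1024  # total tiles in the data space (illustrative)

def block_tiles(start_tile, length):
    """Tiles occupied by a block; the head of the data space wraps
    around to the bottom of physical memory when it is free."""
    return [(start_tile + i) % MEMORY_TILES for i in range(length)]

# A 5-tile block starting at tile 1022 wraps past the top of memory.
```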
(ii) Partitioned data space, fixed blocksize.
With this organisation, data is stored within discrete partitioned areas of the memory space according to the length of the data block. Within one partitioned area all blocks have identical length and one partitioned area is allocated for each possible size of data block, as shown in figure 2. The memory space can be treated as cyclic in the same manner as for the continuous data space. The address map need contain only the physical block number related to the bottom address for the partitioning area, since all blocksizes are identical within the partition. The partitions are of variable size and their boundaries may be moved by a process known as adaptive partitioning in accordance with the actual volume of data stored in each partitioned area. If a partitioned area cannot be expanded further because of data stored in adjacent areas, the partition may be fragmented in two or more non-adjacent areas. This may be achieved by simple management of the address pointers indicating the partition boundaries.
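Because every block in a fixed-blocksize partition has the same length, the address map can hold a bare block number and the physical address follows directly; a sketch with assumed parameter names:

```python
def physical_address(partition_base, block_number, blocksize_tiles):
    """All blocks within the partition have identical length, so only
    the block number relative to the partition base need be stored."""
    return partition_base + block_number * blocksize_tiles

# In a partition of 4-tile blocks based at tile 100, block 3 starts at 112.
```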
(iii) Partitioned data space, variable blocksize.
With this organisation, data is stored within discrete partitioned areas of the memory space but blocks of differing size can be located adjacently, as shown in figure 2. The partitions are of fixed size and location. This technique is particularly appropriate to data storage on magnetic disk, where a partitioned area could be a track or cylinder on the disk. The address map need contain only the address of the partitioned area at which each logical block number is located.
Data is written to the storage memory in blocks of variable size. When a block is modified its size may increase. For this reason there may not be sufficient space available at the address it previously occupied for it to be re-stored there, and so a write scheme is used which will locate the block at a suitable physical location irrespective of whether the block exists elsewhere in the memory. Whenever a block is written, an entry is made in the address table to define its physical location, thereby mapping its logical to its physical address. If the block previously existed elsewhere in memory, it becomes obsolete and a hole is effectively created in the data space. The start address and length of this hole are entered in a separate hole address table.
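The bookkeeping described here, writing a block anywhere and turning its previous location into a hole, can be sketched as follows; the table layouts are assumptions:

```python
block_table = {}     # logical block number -> (start address, length)
hole_table = []      # start address and length of each hole

def write_block(logical_no, start, length):
    # If the block previously existed elsewhere in memory, that copy
    # becomes obsolete and its space is recorded as a hole.
    if logical_no in block_table:
        hole_table.append(block_table[logical_no])
    block_table[logical_no] = (start, length)

write_block(3, 100, 8)
write_block(3, 200, 10)   # modified block has grown and is re-located
```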
Options for locating a data block during a write operation are shown in figure 4 and are as follows.
(i) Write to head of data space.
Data is written at the first available address above the highest address occupied by data, and an entry is made in the block address table. The length of the data block need not be known prior to starting the write operation and so this scheme does not require prior buffering of the write data stream, unless it is implemented in conjunction with the partitioned data space with fixed blocksize organisation. A head of the data space may be defined as an address immediately above which is an area of free memory which is large enough to guarantee storage of a data block of maximum size. Several heads may therefore exist concurrently within the data space and these can be marked by address pointers in a separate head address table.
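A sketch of the head address table: each entry guarantees room for a maximum-size block, and is retired once it can no longer give that guarantee. Sizes and layout are hypothetical:

```python
MAX_BLOCK = 16                     # maximum block size in tiles (hypothetical)
head_table = [(0, 64), (500, 20)]  # (head address, free tiles above it)

def write_to_head(length):
    # Write at the first head; keep its entry only while the remaining
    # free area can still guarantee storage of a maximum-size block.
    addr, free = head_table[0]
    remaining = free - length
    if remaining >= MAX_BLOCK:
        head_table[0] = (addr + length, remaining)
    else:
        del head_table[0]
    return addr
```

Note that the block length is only needed after the write, to advance the head, which is why no prior buffering of the write data stream is required.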
(ii) Write to exact fit.
If the data stream is buffered prior to starting the write operation, the exact length of the physical block is known before its location in memory need be determined and the block can be located in a hole of the exact size to accommodate the block. Block and hole address tables are updated accordingly.
(iii) Write to best fit.
This scheme is similar to the write to exact fit scheme, but the hole selected for the write location is the one with the closest available size to the data block.
(iv) Write to first fit.
This scheme is similar to the write to exact and best fit schemes, but the hole selected for the write location is the first encountered which is large enough to accommodate the data block.
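The three fit policies differ only in how a hole is selected from the hole address table. A sketch, with a hypothetical hole list:

```python
holes = [(40, 6), (120, 9), (300, 12)]   # (start address, length) of free areas

def exact_fit(length):
    # (ii) a hole of exactly the block's size, or None
    return next((h for h in holes if h[1] == length), None)

def best_fit(length):
    # (iii) the smallest hole that still accommodates the block
    candidates = [h for h in holes if h[1] >= length]
    return min(candidates, key=lambda h: h[1], default=None)

def first_fit(length):
    # (iv) the first hole encountered which is large enough
    return next((h for h in holes if h[1] >= length), None)
```

Exact fit leaves no residual fragment but may fail; best fit minimises the fragment left behind; first fit minimises search time.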
The schemes described for writing a block of data have the effect of introducing holes in the data space where blocks had previously been located and this leads to fragmentation of the unused memory space and effective reduction of the available memory space for writing data. To combat this fragmentation problem, a data relocation scheme which involves physical movement of blocks of data within the storage memory can be adopted. A block or group of blocks is read from the storage memory to a data buffer and is relocated by a write operation to a different location in memory. Block, hole and head address tables are updated accordingly.
Options for data relocation are shown in figure 5 and are as follows.
(i) Fill hole from head or tail.
The data block from either the head or tail of the data space is read into a buffer memory and is written to a location where a hole exists in accordance with the write to exact, best or first fit schemes.
(ii) Enlarge hole.
A hole in the data space is enlarged by relocation of an adjacent data block in accordance with any of the data write schemes.
(iii) Compact partition.
With this technique, all data blocks residing in a partition of the memory space are read into a buffer memory and are written back to the same partition after compaction to eliminate holes.
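Compaction can be sketched as reading every live block of the partition into a buffer and writing the blocks back contiguously; the record layout is an assumption:

```python
def compact_partition(blocks):
    # blocks: (logical number, start, length) tuples for one partition.
    # Read into the buffer in address order, then write back from the
    # bottom of the partition with all holes squeezed out.
    buffered = sorted(blocks, key=lambda b: b[1])   # read into buffer memory
    compacted, addr = [], 0
    for logical, _start, length in buffered:
        compacted.append((logical, addr, length))
        addr += length
    return compacted
```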
Data may be swapped between the storage memory and an alternative storage device to which the memory controller also has access. This facility may be used to accommodate data overflow which may occur from a mass storage device which operates with variable blocksize. When data compression is employed in a storage device, the variability of the data blocksize as a function of the data characteristics results in the exact storage capacity of the device being undefined, and therefore the possibility exists of a device overflow as a result of a data block write. In such circumstances, the storage device controller may write the data block to a location on an alternative storage device and record this location in the block address table. The data block may be written in either compressed or uncompressed format. The block may be relocated to the main storage device when sufficient memory space later becomes free. This organisation is particularly suited to the use of a low cost magnetic disk as an overflow storage device for a high cost semiconductor solid state disk storage device. The magnetic disk may also function as a backup non-volatile storage device.
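The overflow handling can be sketched as a check at write time; the device names and table layout below are assumptions:

```python
block_table = {}   # logical block number -> (device, length)

def write_with_overflow(logical_no, length, main_free):
    # When a compressed block no longer fits on the main device, it is
    # written to the alternative (overflow) device instead, and the
    # block address table records which device holds it.
    if length <= main_free:
        block_table[logical_no] = ("solid_state_disk", length)
        return main_free - length
    block_table[logical_no] = ("magnetic_disk", length)
    return main_free

free = write_with_overflow(9, 8, main_free=10)    # fits on the main device
free = write_with_overflow(5, 6, main_free=free)  # overflows to magnetic disk
```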
Data blocks may also be relocated between the storage device memory and the alternative storage device. This allows data which has been written to the alternative storage device to be restored to the storage memory. It also allows data to be interchanged between the two storage devices in accordance with a cacheing algorithm resident in the storage device controller. This allows a high cost solid state disk to be configured as a high-speed cache memory to a much higher capacity, low cost, magnetic disk memory.
Figure 6 shows a block diagram of a data storage device which contains a hardware data compression facility and a storage management system for handling the variable blocksize resulting from data compression.
An interface bus 2 couples the system to a host computer and typically conforms to SCSI or other industry standards for peripheral device interfaces. Communications on the bus are controlled by the host controller (4), which contains a small buffer memory for transfer of bursts of data on the bus. A data storage device (6) may typically be either semiconductor memory, in the case of a solid state disk, or a magnetic medium, in the case of a magnetic disk. A data controller (8) controls both the data storage device and all data transfers and manipulations within the system controller. It contains and executes the algorithms for data relocation and data swapping and performs all address translation and table management functions. The software required for these can be implemented in a straightforward manner by a man skilled in the art. It also contains a block buffer memory for data transferred from a data compressor (10) so that the blocksize of a compressed block may be determined prior to allocation of a physical address for the block. The data compressor (10) performs compression and decompression in real time on data which is transferred to it. Data transfers are routed through a data buffer (12), which can store several blocks of data, via data channels (14) and a DMA controller (16). An interface controller (18) provides access to alternative storage devices via another industry standard bus (20). Data transfers will typically occur in units of one block. Data is written from a host via bus 2 to the data storage device 6 via the DMA controller 16, the data buffer 12 and the data controller 8, and is read via the inverse path. Data is relocated within the data storage device by reading out to the data buffer 12 and then back to the storage device 6.
Data in compressed format is swapped between the data storage device and an alternative storage device on bus 20 via the data controller 8, the DMA controller 16, the data buffer 12 and the interface controller 18, and is restored via the inverse path. The interface controller 18 may operate with a fixed blocksize and may store data in compressed format with the addition of a pad. Data in uncompressed format is swapped by being routed from the data storage device to the bus 20 via the data compressor 10, and restored via the inverse path. Data is transferred directly between a host on bus A and a backup storage device on bus B via the host controller 4, the DMA controller 16, the buffer 12 and the interface controller 18.
A high performance memory system will use semiconductor memory for the data storage device 6 and may take the form of a solid state disk. However, the host controller (4) need not operate on an I/O channel and may be coupled directly to the CPU bus of a host computer system. The storage system will normally incorporate a magnetic disk as a backup storage device to provide non-volatility for the stored data, and the system will be configured as shown in figure 7. A solid state disk 24 is connected to a host via bus A, and to a backup storage device 26 (e.g. a magnetic disk) and to other storage devices 28 via a bus B. The cost of the additional magnetic disk is low relative to the cost of semiconductor memory. Because semiconductor data storage is relatively expensive in comparison with magnetic storage, optimised use of data compression is very desirable. This, however, leads to an increased probability of solid state disk overflow because the data compression efficiency actually achieved has not been sufficient to provide the logical capacity assumed when the system was first initialised. The configuration shown in figure 7 allows the system to continue operation in the event of a solid state disk overflow with only a small impact on performance. A magnetic disk 26 of capacity at least as large as the maximum logical capacity of the solid state disk 24 is connected as a backup storage device on bus B to provide non-volatility when power is disconnected. If the physical capacity of the solid state disk is filled before its stated logical capacity, the storage system controller may locate additional data blocks which are written by the host on the magnetic disk. This results in a performance degradation rather than a storage system failure, but will occur very infrequently.
The frequency of occurrence is a direct function of the logical capacity to physical capacity ratio which is assumed for the solid state disk, and it is therefore possible to trade off average input/output rate for the solid state disk against logical capacity.
One implementation of the management system operates with the storage system controller configuring the data storage device as a continuous data space. A hole is defined as a gap in the data space which is not guaranteed to be large enough to accommodate a data block of undefined exact size. A head is defined as a gap in the data space which is guaranteed to be large enough to accommodate a data block of undefined exact size. A head may wrap around from above the highest address in the data space to below the lowest address. Reading and writing of data blocks are given priority by the controller and relocation of data blocks is run as a background task, which can run concurrently if the data bandwidth to the data storage device 6 on channel 14 is greater than that to the host on bus 2.
The controller manages data write and data relocation according to the following algorithms.
Data Write Algorithm
If data block matches a hole with exact fit then
    write data block to hole of exact fit
    update block and hole address tables
else
    write data block to smallest head
    update block and head address tables
end
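A literal rendering of the data write algorithm, with the tables as simple Python structures (layouts assumed, bulk data movement elided):

```python
def data_write(logical_no, length, holes, heads, block_table):
    # Prefer a hole of exactly matching size; otherwise write to the
    # smallest head, shrinking it by the block just written.
    exact = next((h for h in holes if h[1] == length), None)
    if exact is not None:
        block_table[logical_no] = exact[0]
        holes.remove(exact)                        # update hole address table
    else:
        i = min(range(len(heads)), key=lambda j: heads[j][1])
        addr, free = heads[i]
        block_table[logical_no] = addr
        heads[i] = (addr + length, free - length)  # update head address table

holes, heads, table = [(30, 4)], [(100, 64), (400, 20)], {}
data_write(1, 4, holes, heads, table)   # exact fit: the hole at address 30
data_write(2, 8, holes, heads, table)   # no exact fit: smallest head, at 400
```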
Data Relocation Algorithm
If number of holes > 0 then
    perform hole relocation
else
    perform head relocation
end

Hole Relocation Routine

If data block following largest hole matches a hole with exact fit then
    move data block to hole of exact fit
    update block and hole address tables
else
    move data block to smallest head
    update block and head address tables
end

Head Relocation Routine

If number of heads > 1 then
    move data block following largest head to smallest head
    update block and head address tables
end
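The dispatch between the two routines can be sketched as one step of the background task; the state representation is assumed and the data move itself is elided:

```python
def relocation_step(holes, heads):
    # Holes are eliminated first; only when none remain are multiple
    # heads merged by moving data into the smallest head.
    if holes:
        largest = max(holes, key=lambda h: h[1])
        return ("hole relocation", largest[0])   # move block following this hole
    if len(heads) > 1:
        largest = max(heads, key=lambda h: h[1])
        smallest = min(heads, key=lambda h: h[1])
        return ("head relocation", largest[0], smallest[0])
    return ("idle",)
```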
In a medium performance memory system such as a magnetic disk, the characteristics of the storage medium dictate a different choice of options for data storage, data write and data relocation, but the same architecture for the system controller as shown in figure 6 can be applied.
The data storage is organised as a partitioned data space with variable blocksize. For a magnetic disk, the partition is arranged to be a track or cylinder on the disk and is fixed when the disk is formatted. Sectors need not be formatted within a track or cylinder; therefore each partition has capacity for an unknown number of data blocks of variable blocksize.
Each block is written within the partition with an associated header to identify its logical block number, and the block address table need only identify the partition within which each logical block is located, that is, the block's track or cylinder number. A block is read by reading a partition until the target block is identified from its header. The access time is the time to locate the partition, the latency. For a magnetic disk, the average latency is half a rotation of the disk, which is the time to read 50% of a partition.
Data of variable blocksize is always written within a partition at the head of the data space. The controller selects the partition to which to allocate a block on the basis of the head size available in each partition. The operation of writing a block is combined with the operation of relocating blocks within the partition using the compact partition scheme. When a block is to be written in a partition, the complete data space from the partition is first read into the data buffer and compacted to eliminate any holes which exist. The block being written is then appended to the end of this data space, and the compacted partition is written back to the data storage device.
A hole address table for each partition is maintained so that obsolete blocks may be identified and eliminated when compaction is performed in the data buffer. When a block is written which has previously been located in another partition, the logical number of the block is entered in the hole address table for that other partition.
For conventional operation of a magnetic disk with fixed blocksize, the typical latency is half a rotation of the disk, which is the time to read or write half a track. The variable blocksize management scheme requires two rotations of the disk to perform the compaction of the partition. For a typical disk, average seek time for a write is 14 ms and average latency is 6 ms, giving a typical write time of 20 ms. For the variable blocksize management scheme the typical write time is 14 ms + (2 x 12) ms = 38 ms. For a typical read to write ratio of 4 to 1, the average increase in disk access time for the variable blocksize management scheme is 18%.
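The 18% figure follows directly from the quoted timings; a worked check:

```python
# Figures from the text: 14 ms average seek, 6 ms average latency (half a
# 12 ms rotation), two full rotations for compaction, 4:1 read/write ratio.
conventional = 14 + 6            # 20 ms per conventional access
variable_write = 14 + 2 * 12     # 38 ms per write under the variable scheme
reads, writes = 4, 1
new_avg = (reads * conventional + writes * variable_write) / (reads + writes)
increase = (new_avg - conventional) / conventional   # fractional increase
```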
The variable blocksize storage management techniques support incorporation of a data compression facility within a data storage device in a manner which is transparent to the host system.
The system architecture allows use of a magnetic disk as an overflow device for a solid state disk memory with data compression to eliminate system errors resulting from the undefined capacity of the compressed data storage. Compromises can be made between solid state disk capacity and average input/output time. The storage management enables introduction of data compression techniques to magnetic disk at the expense of only a modest increase in device access time.

Claims

1. A data storage management system for storing logical blocks of data in a main memory device comprising means for compressing the logical blocks of data to produce compressed blocks having varying sizes, means for storing the blocks at physical locations in free memory space in the memory device, and a supplementary memory device for storage of logical blocks of data whose size exceeds the largest area of free memory space available in the main memory device.
2. A data storage management system according to claim 1 including means for relocating blocks stored in the supplementary memory device in the main memory when sufficient free memory space becomes available.
3. A data storage management system according to claim 1 or 2 in which the main memory device comprises a solid state memory.
4. A data storage management system according to claim 1, 2, or 3 in which the supplementary memory device comprises a magnetic disc.
5. A data storage management system according to any preceding claim in which the storing means is arranged to write a data block to a physical address above which is an area of free memory space large enough to store a data block of maximum possible size.
6. A data storage management system according to any preceding claim including means for storing a list of the physical addresses of all areas of free memory space and their sizes.
7. A data storage management system according to claim 6 in which the storing means stores a logical block of data in an area of free memory space of identical physical size to the physical size of the logical block.
8. A data storage management system according to claim 6 in which the storing means stores a logical block of data in an area of free memory space whose physical size is closest to that of the logical block.
9. A data storage management system according to any preceding claim including means for relocating stored blocks of data thereby reducing fragmentation of free memory space.
10. A data storage management system according to any preceding claim including means for compacting logical blocks stored within a partition in the memory device thereby reducing fragmentation of free memory space in that partition.
11. A data storage management system according to claim 10 in which the compacting means comprises means for relocating all blocks stored in a partition to a buffer memory, and means for re-storing the logical blocks sequentially from the start of the partition.
12. A data storage management system according to any preceding claim in which the storing means stores logical blocks of data in partitions within the memory along with an identifying header and stores a partition address for that logical block in a block address table.
13. A data storage management system according to claim 12 including means responsive to a logical block address to read a partition address from the block address table, means for searching a partition for a header identifying a logical block, and means for reading that logical block.
14. A data storage management system for storing logical blocks of data of variable size comprising a memory device divided into a plurality of partitions, means for storing a logical block of data in a partition along with an identifying header, and means for storing a partition address, corresponding to the partition in which the logical block is stored, in a block address table.
15. A data storage management system according to claim 14 including means responsive to the logical block address to read a partition address from a block address table, means for searching a partition for a header identifying the logical block, and means for reading the logical block.
16. A data storage management system according to claim 14 or 15 including a hole address table for each partition storing the physical addresses of all areas of free memory space within a respective partition.
17. A data storage management system according to claim 14, 15, or 16 including means for compacting all blocks of data stored within a partition prior to writing a new block of data into the partition.
18. A data storage management system for a memory device comprising input means receiving logical blocks of data, means for compressing the data in the blocks thereby altering the block sizes of the received logical blocks, means for storing the compressed blocks at physical locations in the memory device such that the fragmentation of free memory space is minimised, and means for translating logical addresses of stored blocks to physical addresses in the memory device in response to a request to access a logical block.
19. A data storage management system according to claim 18 including means for storing a list of the physical addresses of all areas of free memory space and their sizes.
20. A data storage management system according to claim 19 in which the storing means stores a logical block of data in an area of free memory space of identical physical size to the physical size of the logical block.
21. A data storage management system according to claim 19 in which the storing means stores a logical block of data in an area of free memory space whose physical size is closest to that of the logical block.
22. A data storage management system according to any one of claims 18 to 21 including means for relocating stored blocks of data.
23. A data storage management system according to any of claims 18 to 22 including means for compacting logical blocks of data stored within a partition in the memory device.
24. A data storage management system according to claim 23 in which the compacting means comprises means for relocating all blocks stored in a partition to a memory buffer and means for re-storing the logical blocks sequentially from the start of the partition.
25. A data storage management system substantially as herein described with reference to the accompanying drawings.
PCT/GB1992/001137 1991-06-21 1992-06-22 Data storage management systems WO1993000635A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB919113469A GB9113469D0 (en) 1991-06-21 1991-06-21 Data storage management systems
GB9113469.2 1991-06-21

Publications (1)

Publication Number Publication Date
WO1993000635A1 true WO1993000635A1 (en) 1993-01-07

Family

ID=10697117

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1992/001137 WO1993000635A1 (en) 1991-06-21 1992-06-22 Data storage management systems

Country Status (2)

Country Link
GB (1) GB9113469D0 (en)
WO (1) WO1993000635A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2226705A1 (en) * 1973-04-23 1974-11-15 Honeywell Inf Systems
US4075694A (en) * 1975-10-23 1978-02-21 Telefonaktiebolaget L M Ericsson Apparatus in connection with a computer memory for enabling transportation of an empty memory field from one side to the other of an adjacent data field while the computer is operative
JPS62257553A (en) * 1986-04-30 1987-11-10 Toshiba Corp Disk controller

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
COMPUTER ARCHITECTURE NEWS. vol. 19, no. 2, April 1991, IEEE,WASHINGTON D.C., US pages 96 - 107; APPEL ET AL.: 'Virtual memory primitives for user programs' *
PATENT ABSTRACTS OF JAPAN vol. 12, no. 136 (P-694)26 April 1988 & JP,A,62 257 553 ( TOSHIBA ) 10 November 1987 *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8019925B1 (en) * 2004-05-06 2011-09-13 Seagate Technology Llc Methods and structure for dynamically mapped mass storage device
US7617358B1 (en) 2005-05-05 2009-11-10 Seagate Technology, Llc Methods and structure for writing lead-in sequences for head stability in a dynamically mapped mass storage device
US7603530B1 (en) 2005-05-05 2009-10-13 Seagate Technology Llc Methods and structure for dynamic multiple indirections in a dynamically mapped mass storage device
US7620772B1 (en) 2005-05-05 2009-11-17 Seagate Technology, Llc Methods and structure for dynamic data density in a dynamically mapped mass storage device
US7752491B1 (en) 2005-05-05 2010-07-06 Seagate Technology Llc Methods and structure for on-the-fly head depopulation in a dynamically mapped mass storage device
US7685360B1 (en) 2005-05-05 2010-03-23 Seagate Technology Llc Methods and structure for dynamic appended metadata in a dynamically mapped mass storage device
US7653847B1 (en) 2005-05-05 2010-01-26 Seagate Technology Llc Methods and structure for field flawscan in a dynamically mapped mass storage device
US7916421B1 (en) 2005-05-05 2011-03-29 Seagate Technology Llc Methods and structure for recovery of write fault errors in a dynamically mapped mass storage device
EP1898301A3 (en) * 2006-09-01 2009-10-07 Continental Automotive GmbH Database system, method for operating a database system and computer program product
EP1898301A2 (en) * 2006-09-01 2008-03-12 Siemens VDO Automotive AG Database system, method for operating a database system and computer program product
US8122216B2 (en) 2006-09-06 2012-02-21 International Business Machines Corporation Systems and methods for masking latency of memory reorganization work in a compressed memory system
WO2008030672A2 (en) * 2006-09-06 2008-03-13 International Business Machines Corporation Systems and methods for masking latency of memory reorganization work in a compressed memory system
WO2008030672A3 (en) * 2006-09-06 2008-05-08 Ibm Systems and methods for masking latency of memory reorganization work in a compressed memory system
US7562203B2 (en) 2006-09-27 2009-07-14 Network Appliance, Inc. Storage defragmentation based on modified physical address and unmodified logical address
WO2008039527A2 (en) * 2006-09-27 2008-04-03 Network Appliance, Inc. Method and apparatus for defragmenting a storage device
WO2008039527A3 (en) * 2006-09-27 2008-07-24 Network Appliance Inc Method and apparatus for defragmenting a storage device
WO2008042283A2 (en) * 2006-09-28 2008-04-10 Network Appliance, Inc. Write-in-place within a write-anywhere filesystem
US7562189B2 (en) 2006-09-28 2009-07-14 Network Appliance, Inc. Write-in-place within a write-anywhere filesystem
WO2008042283A3 (en) * 2006-09-28 2008-07-03 Network Appliance Inc Write-in-place within a write-anywhere filesystem
US8331663B2 (en) 2007-06-28 2012-12-11 Qualcomm Incorporated Efficient image compression scheme to minimize storage and bus bandwidth requirements
WO2009006099A2 (en) * 2007-06-28 2009-01-08 Qualcomm Incorporated An efficient image compression scheme to minimize storage and bus bandwidth requirements
WO2009006099A3 (en) * 2007-06-28 2009-02-19 Qualcomm Inc An efficient image compression scheme to minimize storage and bus bandwidth requirements
EP2012544A3 (en) * 2007-06-28 2009-03-11 Qualcomm Incorporated An efficient image compression scheme to minimize storage and bus bandwidth requirements
KR101139563B1 (en) * 2007-06-28 2012-04-27 콸콤 인코포레이티드 An efficient image compression scheme to minimize storage and bus bandwidth requirements
US20110099350A1 (en) * 2009-10-23 2011-04-28 Seagate Technology Llc Block boundary resolution for mismatched logical and physical block sizes
US8745353B2 (en) * 2009-10-23 2014-06-03 Seagate Technology Llc Block boundary resolution for mismatched logical and physical block sizes
US9329991B2 (en) 2013-01-22 2016-05-03 Seagate Technology Llc Translation layer partitioned between host and controller
WO2014114947A1 (en) * 2013-01-24 2014-07-31 Acunu Ltd Method and system for allocating space on a storage device
GB2519211A (en) * 2013-08-16 2015-04-15 Lsi Corp Translation layer partitioned between host and controller

Also Published As

Publication number Publication date
GB9113469D0 (en) 1991-08-07


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): GB JP US