WO2008007348A1 - A data storage system - Google Patents

Publication number
WO2008007348A1
WO2008007348A1 PCT/IE2007/000067 IE2007000067W WO2008007348A1 WO 2008007348 A1 WO2008007348 A1 WO 2008007348A1 IE 2007000067 W IE2007000067 W IE 2007000067W WO 2008007348 A1 WO2008007348 A1 WO 2008007348A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage system
mapping
data
data storage
mapping manager
Prior art date
Application number
PCT/IE2007/000067
Other languages
French (fr)
Inventor
William Opperman
Jerome Kelleher
Original Assignee
Mpstor Limited
Priority to IE2006/0510
Application filed by Mpstor Limited
Publication of WO2008007348A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601Dedicated interfaces to storage systems
    • G06F3/0628Dedicated interfaces to storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601Dedicated interfaces to storage systems
    • G06F3/0602Dedicated interfaces to storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • G06F3/0605Improving or facilitating administration, e.g. storage management by facilitating the interaction with a user or administrator
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601Dedicated interfaces to storage systems
    • G06F3/0602Dedicated interfaces to storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0616Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601Dedicated interfaces to storage systems
    • G06F3/0628Dedicated interfaces to storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647Migration mechanisms
    • G06F3/0649Lifecycle management
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601Dedicated interfaces to storage systems
    • G06F3/0668Dedicated interfaces to storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

A data storage system controls storage of data using tiered logical volumes. A mapping manager automatically re-maps data between logical volumes and physical media addresses to optimise use of disk resources, by pre-emptively re-mapping data according to predicted access to the data to reduce risk of a disk exceeding a duty cycle limit. The mapping manager bases its predictions on statistical time series analysis of historical hit count values for logical volume chunks, maintaining a data access hit count for each logical volume chunk over a time period and periodically sampling the counts. The mapping manager minimises the differences between current and predicted mappings before performing a migration.

Description

"A Data Storage System"

INTRODUCTION

Field of the Invention

The invention relates to data storage, for example for servers linked with storage area networks (SANs).

Prior Art Discussion

In such data storage, the servers access the physical storage through the SAN, which allows sharing of storage space, centralized management, and redundancy.

At present, emerging fields are High Density Storage (HDS) and Object Storage Disks (OSD). These developments have arisen in order to address the current storage requirements, including for example converging media such as digital broadcast, file data, video, music, voice mail and archiving. Data is in many cases stored in a tiered storage disk array, in which disks are aggregated into logical tiers. Each logical tier is made up of disks of a particular class. Each class has a particular price point and functionality level, or quality of service (QoS). QoS designates the overall performance of a drive. Drives used in top tiers have a high QoS and drives in the lower tiers have a low QoS.

Enterprise data tends to grow at a very substantial rate. The value of this data ranges from low (such as archive, infrequently used data) to high (such as mission-critical, frequently used data). Because not all data is of equal value to an enterprise, a strategy of storing it on different drive types with differing costs is required in order to optimise media costs.

Drive prices are generally based on a QoS level, drives with high QoS (enterprise drives) being expensive and drives with lower QoS (e.g. desktop drives) being less expensive. The QoS of a disk is related to its overall performance, a critical parameter being duty cycle, or the ratio of the time the disk heads are moving to the time the disks are powered on. For example, desktop drives are specified to work during a normal workday with a duty cycle of about 10%. Enterprise drives are specified to work 24/7 with a duty cycle of about 60%.

Enterprise class drives typically have a rating of 60%-80% duty cycle and an MTBF of 1.2 million hours. Midline class drives have a rating of 15%-25% duty cycle and an MTBF of 500K to 1.2 million hours.

Desktop/Mobile class drives have a rating of 5%-10% duty cycle, and an MTBF of 300K hours or warranty notice.

An optimal strategy for storage of any given data flow is that frequently accessed data should be stored on high QoS disks, and low/medium accessed data should be stored on low QoS disks.

Several methods of implementing tiered storage are possible. For example, tiered storage may be managed at a file system level where files are individually tagged with importance levels (either through manual tagging or access counts), and these files are then stored on the appropriate medium. There are disadvantages to this approach, including portability and granularity. If tiered storage is to be implemented at a file system level, then the tiered storage server must implement each file system type and modify each file system explicitly to support the tiered storage functionality. The granularity problem arises when large files are required, since a file is associated with only one importance level. Some blocks of a large file may be accessed very often while other blocks are rarely accessed. Since all blocks of the file are stored on media of the same quality there is an inevitable mismatch between the value of certain blocks of data and the quality of the physical medium it is stored on.

WO2005/017737 describes a RAID system having dynamic allocation of data across a pool of storage. US2004/0024854 describes a method of managing a storage area network by first and second three-tier management systems.

The invention is directed towards providing data storage management for improved utilisation of disk resources.

SUMMARY OF THE INVENTION

According to the invention, there is provided a data storage system for controlling storage of data using tiered logical volumes, the system comprising a mapping manager for automatically re-mapping data between logical volumes and physical media addresses to optimise use of physical media resources, wherein the mapping manager comprises means for pre-emptively re-mapping data according to predicted access to the data to reduce risk of a physical medium exceeding a duty cycle limit.

In one embodiment, the mapping manager predicts based on statistical time series analysis of historical hit count values for logical volume chunks.

In one embodiment, the mapping manager performs prediction by maintaining over a time period a data access hit count for each logical volume chunk, and periodically sampling the counts.

In one embodiment, the mapping manager re-maps data according to predicted data access hit count, and to minimise the differences between current and predicted mappings.

In one embodiment, the mapping manager resolves predicted QoS capacity deficits by swapping chunks of high predicted usage with chunks of low predicted usage and iteratively repeats these steps until all QoS deficits have been resolved.

In another embodiment, the mapping manager executes tie-breaking algorithms according to configurable conditions to choose between alternative re-mappings. In one embodiment, configurable conditions include:

QoS (quality-of-service) top-down or bottom-up order of re-mapping, and

equalising QoS deficits across all disk resources.

In one embodiment, the mapping manager generates a set of predicted usage values for each physical media unit and manages mappings so that the sum of predicted values in each set is no more than a predefined value representing a QoS-derived limit for each unit of physical medium.

In one embodiment, each unit of physical media is a Raid disk or disk array.

In one embodiment, the mapping manager swaps large values from sets that exceed their required sum with small values from sets that are less than a required sum.

In one embodiment, the mapping manager operates iteratively until capacity deficits are resolved.

In a further embodiment, the mapping manager migrates data on a chunk basis, in which a chunk is a unit of space in the logical volume and a unit of space on the physical media.

In one embodiment, the mapping manager automatically generates a predictor function for each chunk.

In one embodiment, the mapping manager uses predictor functions for contiguous chunks as a basis for developing a predictor function of a particular chunk.

In one embodiment, the chunk size is in the range of 16MB to 128MB.

In one embodiment, the system further comprises a client interface for performing client-initiated queries to the mapping manager.

In one embodiment, the client interface is programmed with an API protocol.

In one embodiment, the mapping manager comprises means for responding to a client interface mapping query with a current mapping.

In one embodiment, the client interface comprises means for generating the mapping query with a logical volume identifier, a start address specification, and an end address specification.

In one embodiment, the mapping manager comprises means for responding to a mapping query with a logical start address, logical end address, and a logical volume identifier for a contiguous segment of logical volume addresses.

In one embodiment, the client interface comprises means for generating a statistics query and the mapping manager comprises means for responding to such a query with the data access statistics stored for a segment of a logical volume.

In one embodiment, the mapping manager comprises means for including in a statistics query response a data access count for a particular logical volume address range over a period of time.

In one embodiment, the client interface comprises means for generating a mapping management command and the mapping manager comprises means for responding to such a command by performing re-mapping according to criteria in the command.

In one embodiment, the mapping manager comprises means for performing a verification of a migration decision before performing a migration. In one embodiment, the mapping manager monitors data access during a probationary period and compares the monitored data accesses with the predicted data accesses to perform the verification.

In another aspect, the invention provides a computer readable medium comprising software code for performing operations of any data storage system defined above when executing on a digital processor.

DETAILED DESCRIPTION OF THE INVENTION

Brief Description of the Drawings

The invention will be more clearly understood from the following description of some embodiments thereof, given by way of example only with reference to the accompanying drawings in which:-

Fig. 1 illustrates a mechanism by which logical block addresses are mapped to addresses on physical media in a tiered logical volume manager; and

Figs. 2 and 3 illustrate an algorithm for resolving predicted QoS capacity deficits in the tiered logical volume manager.

Description of the Embodiments

Referring to Fig. 1, a self-organising storage (SOS) system manages data storage in a tiered storage disk array. Within the array, disks are organised into Raids (Raid 1 and Raid 2) and data is striped across the Raids. Each Raid is built exclusively with disks from a particular tier of the tiered storage array. A tiered Raid inherits the QoS of its member disks, and as such the data flow to the Raid must be kept within the operational limits of the member disks.

The data flow within a system is composed of the following flows:

• Host data flow
  o Read data from cache
  o Write data to cache
  o Read data from disk
  o Write data to disk

• Internal data flow
  o Migration of data from one Raid to another (during an SOS operation)
  o Replication of data (copying data to/from a remote system)
  o Rebuild data (rebuilding the parity of a Raid)

All of these data flows contribute to the Raid duty cycle and hence drive duty cycle. The overall dataflows within the system must be measured and managed so that the system operates reliably.

Read and Write requests from a host server can only be sent to a Logical Volume (LV) device within the SOS system. This LV appears as a disk volume to the host server. A tiered logical volume (TLV) is a logical volume mapped to Raids in different tiers of a disk array. As shown in Fig. 1, a TLV appears as one contiguous disk space to the host, and this logical disk is mapped to space on a number of Raids, here Raid 1 and Raid 2. The Raids used depend on the QoS required by the user for the volume. A mission-critical volume could be configured to use storage space from only the highest QoS Raid of the array, while a volume with mixed data may be configured to use space from several Raids. In this embodiment, Raid 1 has a very high QoS, for mission-critical data, whereas Raid 2 is for archive data. In practice, there will typically be many more Raids, only two being shown here for clarity.

The SOS system has a mapping manager which pre-emptively re-maps LV chunks to Raid chunks according to predicted data use. A factor in the prediction is the number of accesses to the data on the physical Raid chunks. These are quantified as hit values. The total number of times a Raid is accessed must not exceed a predefined limit, and this limit is derived from its disk duty cycle characteristics. The mapping manager enforces the restriction that a disk does not exceed its allowed duty cycle by enforcing the corresponding restriction that the total number of accesses on a Raid does not exceed its hit limit in a given time period.
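By way of illustration only, the hit-limit restriction described above can be sketched as a check of predicted hits per Raid against each Raid's limit. The function name, the dictionary layout and all numeric values below are hypothetical and are not taken from the specification; how a hit limit is derived from a duty cycle rating is not shown.

```python
def predicted_overloads(mapping, predicted_hits, hit_limits):
    """Return {raid: excess} for every Raid whose predicted hits exceed its limit.

    mapping        -- {lv_chunk: raid_id}, the current chunk mapping (assumed layout)
    predicted_hits -- {lv_chunk: predicted hit count for the next period}
    hit_limits     -- {raid_id: maximum hits allowed in the period}
    """
    totals = {raid: 0 for raid in hit_limits}
    for chunk, raid in mapping.items():
        totals[raid] += predicted_hits[chunk]
    return {raid: total - hit_limits[raid]
            for raid, total in totals.items()
            if total > hit_limits[raid]}


# Hypothetical figures: raid1 is a high-QoS Raid, raid2 a low-QoS Raid.
mapping = {1: "raid1", 2: "raid1", 3: "raid2", 4: "raid2"}
predicted_hits = {1: 500, 2: 300, 3: 40, 4: 90}
hit_limits = {"raid1": 1000, "raid2": 100}
overloads = predicted_overloads(mapping, predicted_hits, hit_limits)
```

In this example only raid2 is reported, with a predicted excess of 30 hits, so at least one of its chunks would be a candidate for re-mapping.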

Each logical volume chunk must map to a Raid chunk, and these mappings are dynamically changed in a three-step process:

1. Based on past experience, predict the number of hits on each logical volume chunk that will occur over the next time period.

2. Using these predicted hit counts, compute a new mapping between logical volume chunks and Raid chunks such that (a) the total number of predicted hits on any Raid does not exceed its designated hit limit, and (b) the difference between the new computed mapping and the current mapping is minimised.

3. Schedule migration operations for any chunks that are moved to a different Raid under the new mapping to minimise disruption of incoming IO operations.

It is not sufficient to use the number of data access "hits" per logical volume chunk in time period t to predict the number of hits on a particular segment in time period t + 1. A migration strategy based on this information would inevitably be myopic, resulting in undesirable behaviour such as migration thrashing, since data access patterns do not tend to be constant but rather follow statistical patterns. For example, a particular logical volume segment may be accessed heavily every third day and not at all during the intervening two days. Adopting a reactive migration policy, in which segments are migrated based on the previous day's usage, would in this instance result in the segment being located on low QoS Raids during days of high usage and the segment being located on high QoS Raids for one of the low usage days. This behaviour is clearly undesirable, and therefore the invention provides an intelligent mapping manager to discover statistical patterns in segment usage in order to predict periods of high and low usage and pre-emptively migrate data to the appropriate physical media. To continue the example above, the system would pre-emptively migrate the chunk in question to a high QoS Raid before its days of high usage and to a low QoS Raid for the days of low usage. The example above, however, is simple, serving to illustrate the basic concept of predictive migration in contrast to reactive migration. One fundamental assumption underlies the possibility of predictive migration: access patterns tend to follow regular, discoverable, statistical patterns. For instance, logical volume chunks may experience:

• Constant hit rates

• Linearly increasing/decreasing hit rates

• Exponentially increasing/decreasing hit rates

• Periodically varying hit rates

Using time-series analysis the system can accurately predict the number of hits on each chunk for the forthcoming time period. This data is the primary input for a chunk mapping generator algorithm, which pre-emptively migrates data to the appropriate medium, based on the forecasted hit rate.
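The contrast between reactive and predictive behaviour for the "every third day" example can be sketched as follows. The lag-matching heuristic used here is a simple stand-in chosen for illustration, not the time-series method of the specification, and the function names and sample values are assumptions.

```python
def reactive_forecast(history):
    """Reactive policy: assume the next period looks like the last one."""
    return history[-1]


def periodic_forecast(history, max_period=7):
    """Pick the lag whose shifted copy best matches the history,
    then repeat the value observed one such lag ago."""
    best_period, best_error = 1, float("inf")
    for period in range(1, max_period + 1):
        pairs = list(zip(history[period:], history[:-period]))
        if len(pairs) < period:
            continue  # too few overlapping samples to trust this lag
        error = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if error < best_error:
            best_period, best_error = period, error
    return history[-best_period]


# A chunk that is hit heavily every third day and not at all otherwise.
daily_hits = [900, 0, 0, 900, 0, 0, 900, 0, 0]
reactive = reactive_forecast(daily_hits)    # 0: misses tomorrow's peak
predictive = periodic_forecast(daily_hits)  # 900: anticipates the peak
```

The reactive forecast would leave the chunk on a low QoS Raid on its busy day, whereas the periodic forecast anticipates the peak.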

To predict the number of accesses to a particular logical volume chunk during time period t + 1 we analyse the access values for that chunk to derive a function f. This function is computed by performing a time series analysis on the historical data to derive any trends from the data. To take a simplified example, suppose the access values for a logical volume chunk over the time periods [1, 2, 3, 4, 5, 6] were [100,110,120,130,140,150]. In order to predict the usage count over time period 7 we must derive some function in parameter t that will return the predicted usage count. Clearly, in this example, the function that we require is f(t) = 100 + 10 * (t - 1),

and so we can predict that the usage for time period 7 will be 160. This example is contrived, but serves to illustrate the basic point that we are using historical values to generate a predictor function, and this function is then used to generate the predicted usage value for the time period we are interested in. The methods used to generate the predictor function are well-known statistical techniques.
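As one possible realisation of such a predictor, an ordinary least-squares line can be fitted to the historical samples. This is only a sketch for the linear case; the specification does not state which statistical technique is used, and the function name is hypothetical.

```python
def fit_linear_predictor(samples):
    """Fit value = intercept + slope * t by least squares, with t = 1..n.

    Requires at least two samples; returns a function f(t).
    """
    n = len(samples)
    times = range(1, n + 1)
    mean_t = sum(times) / n
    mean_v = sum(samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, samples))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return lambda t: intercept + slope * t


f = fit_linear_predictor([100, 110, 120, 130, 140, 150])
prediction = f(7)  # 160.0, matching f(t) = 100 + 10 * (t - 1)
```

For these samples the fitted slope is 10 and the intercept is 90, which is the same line as 100 + 10 * (t - 1).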

There is one prediction per LV chunk at any time. Contiguous LV chunks are likely to have similar predictor functions by the principle of locality of reference. Therefore the task of generating a predictor function for a chunk may use the function of a neighbouring chunk as a starting point.

Fig. 1 demonstrates the basic scheme used for mapping logical volume addresses to physical Raid addresses. The logical volume address space is broken into 32MB chunks, each of which is mapped to a corresponding 32MB chunk on a Raid. For example, addresses within the 32MB range covered by lv1 chunk 1 are mapped to corresponding addresses within raid1 chunk 1; similarly, the logical block addresses within lv1 chunk 3 are mapped to raid2 chunk 1. The system has a mapping manager which manipulates these mappings between logical volume address ranges and Raid address ranges and obtains useful statistical data about each logical volume address chunk. Furthermore, the manager uses these statistics to pre-emptively migrate data to the most appropriate physical medium, improving reliability and performance and reducing the cost of disk array storage systems.

The chunk size may, more generally, be in the range of 16MB to 128MB.
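The chunk-based address translation of Fig. 1 can be sketched as a table lookup followed by an offset calculation. Only the entries for lv1 chunk 1 and lv1 chunk 3 come from the description; the remaining table entries, the byte-addressed interface and the function name are assumptions made for illustration.

```python
CHUNK_SIZE = 32 * 1024 * 1024  # 32MB chunks, as in Fig. 1

# {lv chunk number: (raid id, raid chunk number)}, both 1-based.
# Chunks 1 and 3 follow the description; the other rows are assumed.
LV1_TABLE = {
    1: ("raid1", 1),
    2: ("raid1", 2),
    3: ("raid2", 1),
    4: ("raid2", 2),
}


def translate(lv_byte_address, table):
    """Map a logical volume byte address to (raid id, raid byte address)."""
    chunk_index, offset = divmod(lv_byte_address, CHUNK_SIZE)
    raid_id, raid_chunk = table[chunk_index + 1]
    return raid_id, (raid_chunk - 1) * CHUNK_SIZE + offset


first = translate(0, LV1_TABLE)                   # inside lv1 chunk 1
third = translate(2 * CHUNK_SIZE + 5, LV1_TABLE)  # inside lv1 chunk 3
```

Because only the table entry changes when a chunk migrates, the host-visible address space stays contiguous while the physical location moves.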

To provide the information required for a chunk usage prediction algorithm, detailed statistical information must be stored on a chunk basis. This is achieved by maintaining an access or hit counter for each address chunk and by incrementing this counter each time an address within this chunk is accessed. The mapping manager then periodically samples these counters and stores the sampled values over a certain time period. For example, the counters may be sampled and stored on a daily basis, providing a dataset that may be mined to predict the usage for a particular block. The accesses are monitored as they are performed on the physical Raid disks. By monitoring physical accesses rather than host accesses the mapping manager monitors the true number of accesses without having to account for the effects of post-LV caching.
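A minimal sketch of per-chunk hit counting with periodic sampling is given below. The class and method names are hypothetical, and two simplifications are assumed: the counters are reset at each sample so that every stored value covers one period, and accesses are keyed by logical volume address rather than observed at the Raid.

```python
from collections import defaultdict

CHUNK_SIZE = 32 * 1024 * 1024  # 32MB


class ChunkHitCounters:
    """Live hit counters per chunk plus a sampled history per chunk."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.live = defaultdict(int)      # hits in the current period
        self.history = defaultdict(list)  # one stored sample per period

    def record_access(self, lv_id, byte_address):
        """Increment the counter of the chunk containing the address."""
        chunk_no = byte_address // self.chunk_size + 1
        self.live[(lv_id, chunk_no)] += 1

    def sample(self):
        """Store the current counts (e.g. once a day) and start a new period."""
        for key in set(self.live) | set(self.history):
            self.history[key].append(self.live.get(key, 0))
        self.live.clear()


counters = ChunkHitCounters()
counters.record_access("lv1", 0)
counters.record_access("lv1", 10)
counters.record_access("lv1", CHUNK_SIZE)  # first byte of chunk 2
counters.sample()
counters.record_access("lv1", 0)
counters.sample()
```

After the two samples, the history for chunk 1 of lv1 holds one value per period, which is the kind of time series a predictor would consume.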

The system also comprises a client interface API that allows clients to query the current state of the address mapping, to obtain statistical information on address chunks, and to remap/migrate chunks of data from tier to tier at will. This API logically comprises three operations: mapping query, statistics query, and mapping management.

Mapping Query

To allow client applications to determine the current state of the mapping tables for a logical volume, the API provides a mapping query operation. Since many of the contiguous address segments on logical volumes reside on the same Raid, the queries are structured using the start_address to end_address paradigm. In this way the system minimises the bandwidth required to transmit the mapping information to the client. Using Fig. 1 as an example, the client may perform the following mapping query:

mapping_query(lv1, 1, 6) = {(1,2,raid1), (3,4,raid2), (5,6,raid1)}

The mapping_query operation takes three parameters and returns a set of three-tuples. The first parameter of mapping_query is the logical volume identifier; the second parameter is the start address specification; and the third parameter is the end address specification. The set of three-tuples returned describes the current state of the mapping tables in a concise format. Each tuple in the returned set describes the start address, end address and Raid identifier for a contiguous segment of logical volume addresses. Using this paradigm, the client can derive the complete state of the mapping tables for a particular logical volume, or some subset of the address ranges within a logical volume.
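The start/end compression used by the mapping query can be sketched as a run-length encoding over a per-chunk table. The table layout is an assumption, addresses are treated as chunk numbers as in the Fig. 1 example, and an ordered list is returned here in place of a set.

```python
def mapping_query(mapping_table, lv_id, start, end):
    """Return (start, end, raid_id) tuples covering chunks start..end inclusive.

    mapping_table -- {lv_id: {chunk number: raid_id}} (assumed layout)
    """
    chunk_to_raid = mapping_table[lv_id]
    segments = []
    for chunk in range(start, end + 1):
        raid = chunk_to_raid[chunk]
        if segments and segments[-1][2] == raid:
            # Same Raid as the previous chunk: extend the open segment.
            seg_start, _, _ = segments[-1]
            segments[-1] = (seg_start, chunk, raid)
        else:
            segments.append((chunk, chunk, raid))
    return segments


# The Fig. 1 layout: chunks 1-2 and 5-6 on raid1, chunks 3-4 on raid2.
table = {"lv1": {1: "raid1", 2: "raid1", 3: "raid2",
                 4: "raid2", 5: "raid1", 6: "raid1"}}
result = mapping_query(table, "lv1", 1, 6)
```

Six table entries collapse into three tuples, which is the bandwidth saving the paradigm is intended to provide.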

Statistics Query

Querying the statistical information derived for a particular logical block address range is very advantageous as it allows the system to obtain the raw information required to generate a predicted value for the usage of that segment over time. The information is obtained by issuing a stats_query command for a particular chunk of a particular logical volume, e.g.,

stats_query(lv1, 1) = [0,0,0,1,3,4,5]

The return value from a stats_query operation is an ordered list of usage count samples over specific time periods, in reducing temporal granularity. This data encapsulates the usage count for a particular logical volume address range over a period of time, and provides the time-series data vital for the mapping generation algorithm.

Mapping Management

Once the client has obtained the current snapshot of the state of the address mapping table for a logical volume and computed the state of the desired mapping table according to its own criteria, this new mapping can be implemented by issuing a series of migrate commands. A migrate command is simply a command to move an address range to a different physical medium. For example, we may move chunks 5 and 6 to raid2 in Fig. 1 using the following command:

migrate(lv1, 5, 6, raid2).

The first parameter to the migrate operation is the logical volume identifier, the second parameter is the start address specification, the third is the end address specification and the final parameter is the target Raid identifier. This primitive operation provides excellent flexibility and allows the client, for example, to entirely decommission a Raid set by moving all logical block addresses from one Raid onto another.
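The effect of a migrate command on the mapping table can be sketched as follows. The table layout and the returned list of moved chunks are assumptions for illustration; the copying of the data itself and its scheduling are not modelled.

```python
def migrate(mapping_table, lv_id, start, end, target_raid):
    """Re-point chunks start..end of a logical volume at target_raid.

    Returns the chunk numbers whose data would actually have to move;
    chunks already on the target Raid are left untouched.
    """
    chunk_to_raid = mapping_table[lv_id]
    moved = []
    for chunk in range(start, end + 1):
        if chunk_to_raid[chunk] != target_raid:
            chunk_to_raid[chunk] = target_raid
            moved.append(chunk)
    return moved


# The Fig. 1 layout: chunks 1-2 and 5-6 on raid1, chunks 3-4 on raid2.
table = {"lv1": {1: "raid1", 2: "raid1", 3: "raid2",
                 4: "raid2", 5: "raid1", 6: "raid1"}}
moved = migrate(table, "lv1", 5, 6, "raid2")
```

Decommissioning a Raid then amounts to issuing migrate commands over every address range that still maps to it.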

Remapping Algorithm

Returning to operation of the mapping manager itself, the remapping algorithm computes a new mapping between logical volume and Raid chunks such that the total predicted hit count for each Raid does not exceed its specified hit limit and the difference between the new and old mappings is minimised. By minimising the difference between the new and old mappings the system can fulfil important optimisation criteria. Minimising the number of migrations that need to be undertaken to fulfil the QoS requirements conserves SAN bandwidth and disk usage as well as promoting the stability of the mapping. Once a mapping meeting these requirements has been computed, the migration manager schedules the migration tasks intelligently, during periods of low data access rates, ensuring that normal operation of the system is not disrupted by the migration tasks.

The remapping algorithm takes as input the set of predicted hit values for each logical volume chunk, the current mapping between logical volume and Raid chunks, and the hit limits for each individual Raid. The output of the algorithm is a new mapping between logical volume and Raid chunks that is guaranteed to meet the QoS requirements for each Raid and to do so with the minimum number of migrated blocks. There may be many possible mappings that meet these criteria and so tie-breaking heuristics are used to make intelligent (and configurable) choices between alternative mappings. For example, a policy may specify that higher QoS Raids should be filled first and low hit-rate chunks migrated as more high QoS space is required (Top Down), or alternatively, segments should begin on low QoS Raids and be migrated upwards as demand increases (Bottom Up).

In the case that there is no mapping satisfying the QoS requirements of all Raids in the system there is also a range of heuristic policies that can be chosen from. For example, the operator may wish to specify the policy that particular logical volumes should be brought to within the correct QoS limits at the further expense of other logical volumes. Alternatively, the operator may wish to specify that the QoS costs should be borne equally among all Raids, ensuring that the excessive wear is distributed equally among all disks in the system. Many different heuristic policies for preferential chunk mappings are possible, and a general mechanism for supplying such heuristics is an integral part of the remapping algorithm.

There is a set of predicted usage values for a full logical volume. The set is broken into subsets, in which there is one subset per Raid. The composition of each subset is dynamically modified as necessary to ensure that a QoS threshold per Raid is not exceeded. Each predicted usage value is a hit rate.

The problem of generating a mapping matching the requirements outlined above can be formulated mathematically as follows. A partition of a set A is an expression of A as a union of nonempty, disjoint subsets. Thus, a mapping between LV chunks and Raid chunks is some partition of the set of logical volume chunks, each subset corresponding to a particular Raid. If P is the set (or more precisely, multiset) of predicted hit values for logical volume chunks, then the required output of the mapping algorithm is a partition of P such that there are exactly $k$ subsets and the sum of each subset does not exceed $m_j$, where $k$ is the number of Raids allocated and $m_j$ is the maximum number of hits for Raid $j$. More formally, let $P = \{p_1, \ldots, p_n\}$ be a multiset of $n$ nonnegative integers and let $R = \{R_1, \ldots, R_k\}$ be a set of $k$ multisets of nonnegative integers such that

$$\bigcup_{j=1}^{k} R_j = P.$$

Furthermore, let $m = \{m_1, \ldots, m_k\}$ be a multiset of positive integers. The problem of mapping logical volume to Raid is then transformed into the following. Given $R = \{R_1, \ldots, R_k\}$ and $m = \{m_1, \ldots, m_k\}$, compute a partition $R' = \{R'_1, \ldots, R'_k\}$ of $P$ such that

$$|R'_j| = |R_j| \quad \text{and} \quad \sum_{r \in R'_j} r \le m_j \quad \text{for } 1 \le j \le k.$$

Any set $R'$ meeting these requirements is guaranteed to meet the required QoS criteria.

As an example, let $P = \{1, 1, 4, 5, 6, 7, 8\}$ be the set of predicted LV chunk demand values for the forthcoming time period. Then, suppose that under the current mapping between logical volume and Raid chunks there is the following distribution of predicted demand values on Raids

$$R_1 = \{1, 8\}, \quad R_2 = \{4, 7\}, \quad R_3 = \{1, 5, 6\}$$

and suppose, furthermore, that the maximum demand values for these Raids are given by $m_1 = 3$, $m_2 = 10$ and $m_3 = 20$. Since the sums for Raids 1 and 2 (9 and 11 respectively) exceed these maxima, the manager must compute a new mapping from logical volume to Raid chunks to prevent the forecasted violation of the designated usage restrictions for these Raids. The multisets

$$R'_1 = \{1, 1\}, \quad R'_2 = \{4, 6\}, \quad R'_3 = \{5, 7, 8\}$$

meet the criteria above (i.e. the size of each new set is the same as the original, and the sum of the set $R'_j$ is no more than its usage limit, $m_j$). The system can also model the requirement that the number of migration operations is minimised as follows. If there is no set $R''$ such that

$$|R''_j| = |R_j| \quad \text{and} \quad \sum_{r \in R''_j} r \le m_j \quad \text{for } 1 \le j \le k$$

and

$$\sum_{j=1}^{k} |R_j \cap R''_j| > \sum_{j=1}^{k} |R_j \cap R'_j|,$$

we are assured that the mapping corresponding to the solution $R'$ is globally optimal in terms of the number of migration operations. Since the current mapping is given by the set of multisets $R$, we can quantify the number of migration operations that will be needed to implement the new mapping by considering the sum of the cardinalities of the intersection of each set $R_j$ from the original mapping with the set $R'_j$ from the newly computed mapping. The larger the intersections of these sets, the more the sets have in common and hence the fewer migration operations required to implement the corresponding mapping.
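The feasibility criteria and the intersection-based measure of migration cost can be checked on the worked example with a small exhaustive search. This brute-force sketch is only practical for tiny inputs and is not the remapping algorithm of the specification; all function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def is_feasible(old, new, limits):
    """Each new set keeps the size of the old one and stays within its limit."""
    return all(len(n) == len(o) and sum(n) <= m
               for o, n, m in zip(old, new, limits))


def chunks_kept(old, new):
    """Sum over Raids of the multiset intersection sizes of old and new sets."""
    return sum(sum((Counter(o) & Counter(n)).values())
               for o, n in zip(old, new))


def best_mapping(old, limits):
    """Try every arrangement of the values; keep the feasible one that
    leaves the most values in place. Returns (None, -1) if none is feasible."""
    values = [v for raid in old for v in raid]
    sizes = [len(raid) for raid in old]
    best, best_kept = None, -1
    for perm in set(permutations(values)):
        new, pos = [], 0
        for size in sizes:
            new.append(sorted(perm[pos:pos + size]))
            pos += size
        if is_feasible(old, new, limits):
            kept = chunks_kept(old, new)
            if kept > best_kept:
                best, best_kept = new, kept
    return best, best_kept


old = [[1, 8], [4, 7], [1, 5, 6]]   # demand values per Raid from the example
limits = [3, 10, 20]                # maximum demand per Raid
new, kept = best_mapping(old, limits)
migrations = sum(len(raid) for raid in old) - kept
```

For this example the search finds a single feasible arrangement, in which three of the seven values stay on their original Raid and four chunks must migrate.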

The algorithm operates by swapping large values from sets that exceed their required sum with small values from sets that are less than the required sum, and iteratively applies this rule until a solution is found. This is illustrated in Fig. 2, where there are three sets of integers R1, R2 and R3, corresponding to Raids 1, 2 and 3. For each Raid set there is a specified limit, m1, m2 and m3 respectively. The integers here refer to the predicted hit counts for particular LV chunks. Thus, we have predicted that two chunks on Raid R1 will receive five hits, two segments will receive three hits and one segment will receive only one hit. Summing these hit counts for each Raid we can see that Raid R1 has a forecasted QoS capacity surplus and R3 has a forecasted QoS capacity deficit. This means that, under the current mapping between LV and Raid chunks, we have predicted that the disks in Raid 3 will exceed their designated QoS access limit. We have also predicted, however, that Raid R1 will be well within its operational limits. We can therefore resolve the capacity deficit in R3 by swapping the block of size 5 in R3 with the block of size 1 in R1. The result of this operation is shown in Fig. 3, where all Raids are within their designated QoS capacity limits.

The block remapping algorithm operates by iteratively applying this step until all of the capacity deficits have been resolved. The distance of any particular solution from the optimum may also be quantified, and so the system can also terminate iteration once a solution of the required quality has been found. In the case that the total capacity surplus is less than the total capacity deficit, no solution exists, and so more space on the high quality Raids must be allocated. The means by which this is obtained is a user-specifiable policy.
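The iterative swap step can be sketched as a simple greedy heuristic in Python. This is a minimal illustration, not the patented implementation: the function name `rebalance`, the choice to treat the worst deficit first, and the rule of picking the largest swap that fits are all assumptions. Being greedy, it may return `None` in cases where a more thorough search would still find a mapping.

```python
from typing import List, Optional, Sequence


def rebalance(sets: Sequence[Sequence[int]],
              limits: Sequence[int]) -> Optional[List[List[int]]]:
    """Swap large values out of over-limit sets until every sum fits its limit."""
    work = [list(s) for s in sets]
    sums = [sum(s) for s in work]
    deficit = sum(max(0, s - m) for s, m in zip(sums, limits))
    surplus = sum(max(0, m - s) for s, m in zip(sums, limits))
    if surplus < deficit:
        return None  # per the text: no solution exists in this case

    while True:
        sums = [sum(s) for s in work]
        over = [j for j, (s, m) in enumerate(zip(sums, limits)) if s > m]
        if not over:
            return [sorted(s) for s in work]
        # Assumed policy: treat the Raid with the largest deficit first.
        j = max(over, key=lambda idx: sums[idx] - limits[idx])
        best = None
        for i in range(len(work)):
            room = limits[i] - sums[i]
            if i == j or room <= 0:
                continue
            for big in work[j]:
                for small in work[i]:
                    delta = big - small
                    # The swap must reduce j's load without overloading i.
                    if 0 < delta <= room and (best is None or delta > best[0]):
                        best = (delta, i, big, small)
        if best is None:
            return None  # the heuristic found no helpful swap
        _, i, big, small = best
        work[j].remove(big)
        work[j].append(small)
        work[i].remove(small)
        work[i].append(big)


print(rebalance([[1, 8], [4, 7], [1, 5, 6]], [3, 10, 20]))
# [[1, 1], [4, 6], [5, 7, 8]]
```

Each accepted swap strictly reduces the total deficit and never pushes the receiving Raid over its limit, so the loop terminates. Each set also keeps its original size because values are only exchanged one for one.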

Once the remapping algorithm has computed the new LV chunk to Raid mapping, this new mapping is passed to a migration function of the mapping manager. The migration function ensures that the normal operation of the system is not affected by the process of data migration by monitoring system activity and intelligently scheduling migrations to and from nearby Raids to minimise SAN bandwidth and disk utilisation.

It will be appreciated that the invention provides for optimum use of available disk resources, giving improved reliability and lower cost.

Another important advantage of the invention is the ability to perform preemptive migration of data to the most appropriate medium for forecasted usage. Data access frequencies are rarely constant but instead follow statistical patterns which may be predicted by analysing block access patterns. Using these predictions the system can then ensure that blocks of data reside on the most appropriate physical medium and optimal usage of storage resources is obtained.
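As a rough illustration of forecasting a chunk's next-period hit count from its sampled history, the sketch below uses simple exponential smoothing. This technique is a stand-in chosen for brevity; the text does not say which time series method the system uses, and the function name `forecast_next` and the parameter `alpha` are hypothetical.

```python
from typing import Sequence


def forecast_next(history: Sequence[float], alpha: float = 0.5) -> float:
    """Forecast the next hit count by simple exponential smoothing.

    `history` holds periodically sampled hit counts, oldest first.
    `alpha` in (0, 1] weights recent samples more heavily as it grows.
    """
    if not history:
        raise ValueError("history must contain at least one sample")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = float(history[0])
    for count in history[1:]:
        level = alpha * count + (1.0 - alpha) * level
    return level


print(forecast_next([10, 10, 10]))      # 10.0
print(forecast_next([0, 8], alpha=0.5)) # 4.0
```

The forecasts for all chunks on a Raid could then be summed and compared with that Raid's limit to detect a predicted deficit before it occurs.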

The block-based tiered storage management described above has many advantages. Data is automatically analysed at the block level to determine its value and hence the most appropriate storage medium. By exporting raw logical volumes over the storage area network rather than file systems, the storage management is decoupled from file system issues. Furthermore, since block-level access patterns are analysed without knowledge of file structure, the granularity problem discussed above is also solved. Data can be analysed at any level of granularity desired to ensure optimal usage of storage at the various price points and levels of functionality.

The invention is not limited to the embodiments described but may be varied in construction and detail. For example, in another embodiment, once the mapping manager makes a migration decision, it may provisionally change the mappings without causing a migration of the actual data. The manager delays the physical migration for a probationary time period during which the counts are monitored, and the prediction is verified. The migration then occurs only if the prediction is verified.
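The probationary check in this embodiment might be sketched as follows. The comparison rule is an assumption: the text says only that the prediction is verified against monitored counts, so the relative-error test, the `tolerance` parameter and both function names are hypothetical.

```python
def prediction_verified(predicted_hits: float,
                        observed_hits: float,
                        tolerance: float = 0.25) -> bool:
    """Assumed rule: accept if the relative error is within `tolerance`."""
    if predicted_hits == 0:
        return observed_hits == 0
    return abs(observed_hits - predicted_hits) / predicted_hits <= tolerance


def decide_migration(predicted_hits: float,
                     observed_hits: float,
                     tolerance: float = 0.25) -> str:
    """Return 'migrate' to commit the provisional mapping, else 'revert'."""
    if prediction_verified(predicted_hits, observed_hits, tolerance):
        return "migrate"
    return "revert"


print(decide_migration(100, 90))  # migrate
print(decide_migration(100, 40))  # revert
```

Deferring the physical copy this way trades some delay for avoiding migrations triggered by a forecast that turns out to be wrong.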

Claims

1. A data storage system for controlling storage of data using tiered logical volumes, the system comprising a mapping manager for automatically remapping data between logical volumes and physical media addresses to optimise use of physical media resources, wherein the mapping manager comprises means for pre-emptively re-mapping data according to predicted access to the data to reduce risk of a physical medium exceeding a duty cycle limit.
2. A data storage system as claimed in claim 1, wherein the mapping manager predicts based on statistical time series analysis of historical hit count values for logical volume chunks.
3. A data storage system as claimed in claim 2, wherein the mapping manager performs prediction by maintaining over a time period a data access hit count for each logical volume chunk, and by periodically sampling the counts.
4. A data storage system as claimed in any preceding claim, wherein the mapping manager re-maps data according to predicted data access hit count, and to minimise the differences between current and predicted mappings.
5. A data storage system as claimed in any preceding claim, wherein the mapping manager resolves predicted QoS capacity deficits by swapping chunks of high predicted usage with chunks of low predicted usage and iteratively repeats these steps until all QoS deficits have been resolved.
6. A data storage system as claimed in claim 5, wherein the mapping manager executes tie-breaking algorithms according to configurable conditions to choose between alternative re-mappings.
7. A data storage system as claimed in claim 6, wherein configurable conditions include: QoS top-down or bottom-up order of re-mapping, and
equalising QoS deficits across all disk resources.
8. A data storage system as claimed in any of claims 5 to 7, wherein the mapping manager generates a set of predicted usage values for each physical media unit and manages mappings so that the sum of predicted values in each set is no more than a predefined value representing a QoS-derived limit for each unit of physical medium.
9. A data storage system as claimed in claim 8, wherein each unit of physical media is a Raid disk or disk array.
10. A data storage system as claimed in claims 8 or 9, wherein the mapping manager swaps large values from sets that exceed their required sum with small values from sets that are less than a required sum.
11. A data storage system as claimed in claim 10, wherein the mapping manager operates iteratively until capacity defects are resolved.
12. A data storage system as claimed in any preceding claim, wherein the mapping manager migrates data on a chunk basis, in which a chunk is a unit of space in the logical volume and a unit of space on the physical media.
13. A data storage system as claimed in any of claims 2 to 12, wherein the mapping manager automatically generates a predictor function for each chunk.
14. A data storage system as claimed in claim 13, wherein the mapping manager uses predictor functions for contiguous chunks as a basis for developing a predictor function of a particular chunk.
15. A data storage system as claimed in any of claims 2 to 14, wherein the chunk size is in the range of 16MB to 128MB.
16. A data storage system as claimed in any preceding claim, further comprising a client interface for performing client-initiated queries to the mapping manager.
17. A data storage system as claimed in claim 16 wherein the client interface is programmed with an API protocol.
18. A data storage system as claimed in either of claims 16 or 17, wherein the mapping manager comprises means for responding to a client interface mapping query with a current mapping.
19. A data storage system as claimed in claim 18, wherein the client interface comprises means for generating the mapping query with a logical volume identifier, a start address specification, and an end address specification.
20. A data storage system as claimed in either of claims 18 or 19, wherein the mapping manager comprises means for responding to a mapping query with a logical start address, logical end address, and a logical volume identifier for a contiguous segment of logical volume addresses.
21. A data storage system as claimed in any of claims 16 to 20, wherein the client interface comprises means for generating a statistics query and the mapping manager comprises means for responding to such a query with the data access statistics stored for a segment of a logical volume.
22. A data storage system as claimed in claim 21, wherein the mapping manager comprises means for including in a statistics query response a data access count for a particular logical volume address range over a period of time.
23. A data storage system as claimed in any of claims 16 to 22, wherein the client interface comprises means for generating a mapping management command and the mapping manager comprises means for responding to such a command by performing re-mapping according to criteria in the command.
24. A data storage system as claimed in any preceding claim, wherein the mapping manager comprises means for performing a verification of a migration decision before performing a migration.
25. A data storage system as claimed in claim 24, wherein the mapping manager monitors data access during a probationary period and compares the monitored data accesses with the predicted data accesses to perform the verification.
26. A data storage system substantially as described with reference to the accompanying drawings.
27. A computer readable medium comprising software code for performing operations of a data storage system of any preceding claim when executing on a digital processor.
PCT/IE2007/000067 2006-07-12 2007-07-11 A data storage system WO2008007348A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
IE2006/0510 2006-07-12
IE20060510 2006-07-12

Publications (1)

Publication Number Publication Date
WO2008007348A1 (en) 2008-01-17

Family

ID=38521682

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IE2007/000067 WO2008007348A1 (en) 2006-07-12 2007-07-11 A data storage system

Country Status (1)

Country Link
WO (1) WO2008007348A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102221981A (en) * 2010-04-19 2011-10-19 株式会社日立制作所 Method and apparatus to manage tier information
EP2417524A1 (en) * 2009-04-10 2012-02-15 Kaminario Tehnologies Ltd. A mass-storage system utilizing auxiliary solid-state storage subsystem
GB2497172A (en) * 2011-11-14 2013-06-05 Ibm Reserving space on a storage device for new data based on predicted changes in access frequencies of storage devices
EP2811410A1 (en) * 2012-12-21 2014-12-10 Huawei Technologies Co., Ltd. Monitoring record management method and device
US9372630B2 (en) 2014-07-09 2016-06-21 International Business Machines Corporation Migration of newly allocated data to a storage tier
US9703500B2 (en) 2012-04-25 2017-07-11 International Business Machines Corporation Reducing power consumption by migration of data within a tiered storage system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6487562B1 (en) * 1999-12-20 2002-11-26 Emc Corporation Dynamically modifying system parameters in data storage system
WO2005017737A2 (en) * 2003-08-14 2005-02-24 Compellent Technologies Virtual disk drive system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIN QIAO ET AL: "PULSATINGSTORE: An Analytic Framework for Automated Storage Management" DATA ENGINEERING WORKSHOPS, 2005. 21ST INTERNATIONAL CONFERENCE ON TOKYO, JAPAN 05-08 APRIL 2005, PISCATAWAY, NJ, USA,IEEE, 5 April 2005 (2005-04-05), pages 1213-1213, XP010924120 ISBN: 0-7695-2657-8 *
MASSIGLIA P: "THE RAIDBOOK, CHAPTER 11: DYNAMIC DATA MAPPING" THE RAIDBOOK. A SOURCE BOOK FOR RAID TECHNOLOGY, February 1997 (1997-02), pages 197-208, XP002296549 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2417524A1 (en) * 2009-04-10 2012-02-15 Kaminario Tehnologies Ltd. A mass-storage system utilizing auxiliary solid-state storage subsystem
EP2417524A4 (en) * 2009-04-10 2013-03-06 Kaminario Tehnologies Ltd A mass-storage system utilizing auxiliary solid-state storage subsystem
US9753668B2 (en) 2010-04-19 2017-09-05 Hitachi, Ltd. Method and apparatus to manage tier information
EP2378410A3 (en) * 2010-04-19 2012-05-02 Hitachi Ltd. Method and apparatus to manage tier information
US8677093B2 (en) 2010-04-19 2014-03-18 Hitachi, Ltd. Method and apparatus to manage tier information
CN102221981A (en) * 2010-04-19 2011-10-19 株式会社日立制作所 Method and apparatus to manage tier information
GB2497172A (en) * 2011-11-14 2013-06-05 Ibm Reserving space on a storage device for new data based on predicted changes in access frequencies of storage devices
GB2497172B (en) * 2011-11-14 2014-01-01 Ibm Storage reservation apparatus
US9928164B2 (en) 2011-11-14 2018-03-27 International Business Machines Corporation Information processing apparatus
US10049034B2 (en) 2011-11-14 2018-08-14 International Business Machines Corporation Information processing apparatus
US9703500B2 (en) 2012-04-25 2017-07-11 International Business Machines Corporation Reducing power consumption by migration of data within a tiered storage system
EP2811410A4 (en) * 2012-12-21 2017-04-05 Huawei Technologies Co., Ltd. Monitoring record management method and device
EP2811410A1 (en) * 2012-12-21 2014-12-10 Huawei Technologies Co., Ltd. Monitoring record management method and device
US9639293B2 (en) 2014-07-09 2017-05-02 International Business Machines Corporation Migration of newly allocated data to a storage tier
US9372630B2 (en) 2014-07-09 2016-06-21 International Business Machines Corporation Migration of newly allocated data to a storage tier

Similar Documents

Publication Publication Date Title
US9258364B2 (en) Virtualization engine and method, system, and computer program product for managing the storage of data
US10198324B2 (en) Data protection scheduling, such as providing a flexible backup window in a data protection system
EP2378427B1 (en) Management system for calculating storage capacity to be increased/decreased
US8266406B2 (en) System and method for allocation of organizational resources
CN103176881B (en) The energy management data processing resources of an adaptive power management of the data storing operation
US8935493B1 (en) Performing data storage optimizations across multiple data storage systems
US6925529B2 (en) Data storage on a multi-tiered disk system
US20110010514A1 (en) Adjusting Location of Tiered Storage Residence Based on Usage Patterns
US20080059704A1 (en) System and method for allocation of organizational resources
US9110727B2 (en) Automatic replication of virtual machines
US7831766B2 (en) Systems and methods of data storage management, such as pre-allocation of storage space
US9106591B2 (en) Adaptive resource management using survival minimum resources for low priority consumers
US9021203B2 (en) Enhancing tiering storage performance
US20150106578A1 (en) Systems, methods and devices for implementing data management in a distributed data storage system
US9753938B2 (en) Selective deduplication
US8433848B1 (en) Analysis tool for a multi-tier storage environment
US9207874B2 (en) Synchronous extent migration protocol for paired storage
US8688941B2 (en) System and method for controlling automated page-based tier management in storage systems
US20130007302A1 (en) System, Method and Program Product to Manage Transfer of Data to Resolve Overload of a Storage System
US20160011816A1 (en) Method to optimize inline i/o processing in tiered distributed storage systems
CN104272386B (en) By reducing power consumption data migration in the tiered storage system
US20090006877A1 (en) Power management in a storage array
US8892780B2 (en) Management of shared storage I/O resources
US8590050B2 (en) Security compliant data storage management
US8380947B2 (en) Storage application performance matching

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07766779

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

NENP Non-entry into the national phase in:

Ref country code: RU

122 Ep: pct app. not ent. europ. phase

Ref document number: 07766779

Country of ref document: EP

Kind code of ref document: A1