WO2010099992A1 - Method, system and computer program product for managing the placement of storage data in a multi tier virtualized storage infrastructure - Google Patents

Method, system and computer program product for managing the placement of storage data in a multi tier virtualized storage infrastructure

Info

Publication number
WO2010099992A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
data
read
mdg
write
Prior art date
Application number
PCT/EP2010/050254
Other languages
French (fr)
Inventor
Pierre Sabloniere
Original Assignee
International Business Machines Corporation
Compagnie Ibm France
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corporation, Compagnie Ibm France filed Critical International Business Machines Corporation
Priority to CN2010800102363A priority Critical patent/CN102341779A/en
Priority to EP10700239A priority patent/EP2404231A1/en
Publication of WO2010099992A1 publication Critical patent/WO2010099992A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F 3/0647 Migration mechanisms
    • G06F 3/0649 Lifecycle management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0604 Improving or facilitating administration, e.g. storage management
    • G06F 3/0605 Improving or facilitating administration, e.g. storage management by facilitating the interaction with a user or administrator
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0683 Plurality of storage devices
    • G06F 3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Definitions

  • the present invention relates to the field of data processing and in particular to the management of storage and the optimization of data placement in a multi tier virtualized storage infrastructure.
  • Block virtualization provides servers with a logical view of the physical storage, such as disk drives, solid-state disks, and tape drives, on which data is actually stored.
  • the logical view may comprise a number of virtual storage areas into which the available storage space is divided (or aggregated) without regard to the physical layout of the actual storage.
  • the servers no longer see specific physical targets, but instead see logical volumes which can be for their exclusive use.
  • the servers send their data to the virtual storage areas as if they are their direct attached property.
  • Virtualization may take place at the level of volumes, individual files or at the level of blocks that represent specific locations within a disk drive.
  • Block aggregation can be performed within hosts (servers), and/or in storage devices (intelligent disk arrays).
  • Tiered storage is the assignment of different categories of data to different types of storage media in order to reduce total storage cost. Categories may be based on levels of protection needed, performance requirements, frequency of use, capacity and other considerations. User requirements for their placement are quite often loosely specified or based on wishes rather than on accurate capacity planning. Furthermore, even if initial requirements were adequate, applications may undergo drastic data access changes throughout their life cycle. For instance, the roll out of an internet application where the number of future users is difficult to predict is likely to have an actual data access behavior at a given time very different from initial deployment values and/or planned activity. Over time, this application might benefit from functional enhancements causing upward changes in data access behaviors.
  • a method of hierarchical storage data in a storage area network (SAN) has been proposed in WO 2007/009910 from the Assignee where the SAN comprises a plurality of host data processors coupled to a storage virtualization engine, which is coupled to a plurality of physical storage media. Each physical media is assigned a tier level. The method is based on selective relocation of data blocks when their access behaviors exceed tier media threshold values.
  • This method may lead to non-economical solutions for composite workloads including multiple applications consisting of highly demanding applications and low demanding applications. For such workloads, this method would lead to recommending or selecting two types of storage resources.
  • the first storage resource type would be a "high performance SSD like" type and the second one would be a "low performance SATA drive like" type, whereas a solution based on Fiber Channel (FC) disks might be sufficient and more economical to support the "average" performance characteristics of the aggregated workload.
  • the present invention aims to address the aforementioned problems.
  • the invention provides a method for managing the placement of data on the virtualized multi-tier storage infrastructure in a loosely defined and changing environment.
  • Each physical storage media is assigned a tier level based on its Read I/O rate access density.
  • the method comprises a top down method based on data collected from the virtualization engine compared to the Read I/O capability and space capacity of each discrete virtual storage pool to determine whether re-tiering situations exist, and a drill-in analysis algorithm based on relative Read I/O access density to identify which data workload should be right-tiered among the composite workload hosted in the discrete virtual storage pool.
  • the method operates at discrete storage virtual pool and storage virtual disk levels and takes advantage of opportunistic complementary workload profiles present in most aggregated composite workloads. This method significantly reduces the amount of re-tiering activity which would be generated by a micro analysis at block or storage virtual disk level and may provide more economical recommendations.
  • the method, based on a top down approach, analyzes the behavior of storage resources, detects situations where workload re-tiering is suitable and provides re-tiering (upward or downward) recommendations.
  • the suggested re-tiering/right-tiering actions can be analyzed by storage administrators for validation or automatically passed to the virtualization engine for virtual disk migration.
  • the method also comprises a Write response time component which covers quality of service issues.
  • the method uses alerts based on thresholds defined by the storage administrator.
  • the process comprises a structured and repeatable evaluation of the virtualized storage infrastructure and a process flow leading to data workload re-tiering actions.
  • the process also comprises a structured flow to analyze Write response time quality of service alerts, decide whether re-tiering is required and identify which data workload should be re-tiered.
  • Figure 1 shows an example of a Storage Area Network in which the present invention may be implemented
  • Figure 2 shows a simple view of block virtualization
  • FIG. 3 shows components of a virtualization engine in which the present invention may be implemented
  • Figure 4 shows components of the Storage Tiering Analyzer for Right Tiering (START) component according to the invention
  • Figure 5 illustrates the preferred data service model dimensions used in an embodiment of the right tiering process
  • Figure 6 illustrates storage data service technical and economical domains of usage
  • Figures 7A, 7B, 7C and 7D show examples of actual situations of a composite data workload in a technical domain of usage for a storage pool
  • Figure 8 illustrates the Read I/O rate density in a three-dimension model used by the invention
  • Figure 9 shows the Read I/O rate density of a data workload composed of two data workloads of different Read I/O rate densities and illustrates the thermal analogy which is applicable;
  • Figure 10 shows how the Read I/O rate density of a composite workload is modified when removing one of the composing data workloads
  • Figure 11 shows the threshold based alert system supporting the invention
  • Figure 12 provides the process flow supporting the method described in the invention as it relates to Read I/O rate density and Space utilization;
  • Figure 13 provides the process flow supporting an embodiment of the method as it relates to the analysis of Write I/O response time alerts.
  • the invention proposes using a virtualization engine, which has knowledge of both the data and the location of the data, and an analyzer component to identify situations deserving data re-tiering and recommending actual data re-tiering actions.
  • SAN 100 with several host application servers 102 attached. These can be many different types, typically some number of enterprise servers, and some number of user workstations.
  • Tier 1 which may be, for example enterprise level storage, such as the IBM® System Storage DS8000; Tier 2, which may be mid range storage, such as the IBM® System Storage DS5000 equipped with FC disks; and Tier 3 which may be lower end storage, such as the IBM® System Storage DS4700 equipped with Serial Advanced Technology Attachment (SATA) drives.
  • each MDisk corresponds to a single tier and each RAID array 101 belongs to a single tier.
  • Each of the RAID controllers 103 may control RAID storage belonging to different tiers.
  • different tiers may also be applied to different RAID types; for example, a RAID-5 array may be placed in a higher tier than a RAID-0 array.
  • the SAN is virtualized by means of a storage virtualization engine 104 which sits in the data path for all SAN data, and presents Virtual Disks 106a to 106n to the host servers and workstations 102. These virtual disks are made up from the capacity provided across the three tiers of storage devices.
  • the virtualization engine 104 comprises one or more nodes 110 (four shown), which provide virtualization, cache and copy services to the hosts.
  • the nodes are deployed in pairs and make up a cluster of nodes, with each pair of nodes known as an Input/Output (I/O) group.
  • Each RAID controller presents an SCSI (Small Computer System Interface) disk to the virtualization engine.
  • the presented disk may be managed by the virtualization engine, and be called a managed disk, or MDisk.
  • MDisks are split into extents, fixed size blocks of usable capacity, which are numbered sequentially from the start to the end of each MDisk. These extents can be concatenated, striped, or any desirable algorithm can be used to produce larger virtual disks (VDisks) which are presented to the hosts by the nodes.
  • the MDisks M1,M2,...M9 can be grouped together in Managed Disk Groups, or MDGs 108, typically characterized by factors such as performance, RAID level, reliability, vendor, and so on. According to the preferred embodiment, all MDisks in an MDG represent storage of the same tier level, as shown in Figure 1. There may be multiple MDGs of the same tier in the virtualized storage infrastructure, each being a discrete virtual storage pool.
  • the virtualization engine converts Logical Block Addresses (LBAs) of a virtual disk to extents of the VDisk, and maps extents of the VDisk to MDisk extents.
  • An example of the mapping from a VDisk to MDisks is shown in Figure 2.
  • Each of the extents of the VDisk A is mapped to an extent of one of the managed disks M1, M2 or M3.
  • the mapping table, which can be created from metadata stored by each node, shows that some of the managed disk extents are unused. These unused extents are available for use in creating new VDisks, migration, expansion and so on.
  • virtual disks are created and distributed so that the enterprise level servers initially use enterprise level storage or based on application owner requirements. This may not be fully justified by actual data access characteristics.
  • the invention provides a method to identify better data placement scenarios with a structured right tiering process.
  • the invention supports a different and cheaper initial data placement for applications. For instance, initial data placement for all applications could be released in tier 2 storage media and the invention would support the re-tiering of part or all of this data based on the actual situation of the overall virtualized storage infrastructure.
  • a virtualization engine of node 110 comprises the following modules: SCSI Front End 302, Storage Virtualization 310, SCSI Back End 312, Storage Manager 314 and Event Manager 316.
  • the SCSI Front End layer receives I/O requests from hosts; conducts LUN mapping (i.e. between LBAs and Logical Unit Numbers (LUNs) (or extents) of virtual disks A and C); and converts SCSI Read and Write commands into the node's internal format.
  • the SCSI Back End processes requests to Managed disks which are sent to it by the Virtualization layer above, and addresses commands to the RAID controllers.
  • the I/O stack may also include other modules (not shown), such as Remote Copy, Flash Copy or Cache. Caches are usually present both at Virtualization engine and RAID controller levels.
  • the node displayed in Figure 3 belongs to an I/O group to which VDisks A and B are assigned. This means that this node presents an interface to VDisks A and B for hosts. Managed disks 1, 2 and 3 may also correspond to other virtual disks assigned to other nodes.
  • the event manager 316 manages metadata 318, which comprises mapping information for each extent as well as tier level data and an access value for the extent. This metadata is also available to the virtualization layer 310 and storage manager 314.
  • the Front End converts the specified LBA into an extent ID (LUN) of a virtual disk; let us say this is extent 3 of VDisk A (A-3).
  • the virtualization component 310 uses the metadata shown in the form of a mapping table in Figure 2, to map extent A-3 to extent 6 of MDisk 2 (M2-6).
  • the write request is then passed via the SCSI back end 312 to the relevant controller for MDisk 2 and data is written to the extent M2-6.
  • the virtualization layer sends a message 304 to the event manager indicating that a write to extent 6 of MDisk 2 has been requested.
  • the event manager then updates the metadata in respect of extent M2-6 to indicate that this extent is now full.
  • the event manager also updates the access value in the metadata for the extent. This may be by storing the time at which the write occurred as the access value, or by resetting a count value in the metadata.
  • the event manager returns a message 304 to the virtualization component to indicate that the metadata has been updated to reflect the write operation.
  • the Storage Tiering Analyzer for Right Tiering (START) manager component which allows right tiering actions is now described with reference to Figure 4.
  • START performs the analysis of the SAN activity to identify situations deserving right tiering actions and prepares the appropriate VDisk migration action list.
  • the Data Collector 401 acts as a Storage Resource Manager, by periodically collecting topology data contained in the virtualization engine and access activity per LUNs and VDisks. This may comprise write and read activity counts, response times and other monitoring data. This may comprise back end and front end activity data and internal measurements of the virtualization engine such as queue levels.
  • the data collector inserts this series of data in its local repository on a periodic basis (a preferred period is typically every 15 minutes) and stores it for a longer period of time (typically 6 months).
  • the Data Aggregator 402 processes SAN data covering a longer period of time (say one day, e.g. 96 samples of 15 minutes each) by accessing the Data Collector repository (with mechanisms such as batch reports) and produces aggregated values comprising minimum, maximum, average, shape factors, etc., for VDisks and MDGs managed by the virtualization engine of the SAN.
  • the data produced by the Data Aggregator can be compared to the SAN Model Metadata 403 which contains the I/O processing capability for each of the MDGs.
  • This I/O processing capacity may be based on disk array vendor specifications, disk array modeling activity figures (such as produced by Disk Magic application software), or generally accepted industry technology capability figures for the disks controlled by the RAID controller, their number, their redundancy set up and cache hit ratio values at RAID controller level. Other I/O processing modeling capability algorithms may also be used.
  • the data produced by the Data Aggregator can also be compared to the total space capacity of each MDG, which can be stored in the SAN Model Metadata or collected from the virtualization engine.
  • the Data Analyzer component 404 performs these comparisons and raises right tiering alerts based on thresholds set by the storage administrator. These alerts cover MDGs whose utilizations are not balanced and for which VDisk migration actions should be considered.
  • the Data Analyzer provides a drill-in view of all VDisks hosted by the MDG sorted by Read Access Rate Density. This view allows an immediate identification of 'hot' VDisks and 'cold' ones. Depending on the type of alert, this drill-in view easily points to VDisks whose migration to another tier will resolve the MDG alert. By right-tiering these VDisks, the source MDG will see the Read Access rate density value of the composite workload hosted by the MDG becoming closer to the MDG intrinsic capability, making this MDG usage better balanced with regard to its utilization domain.
  • the Data Analyzer computes the Net Read I/O access density as the ratio of the MDG remaining Read I/O processing capability divided by the MDG remaining space capacity. A workload whose Read I/O access density is equal to the Net Read I/O access density would be considered as a complementary workload for this MDG in its current state.
  • the VDisk migration action list, composed of 'hot' or 'cold' VDisks depending on the type of alert, is prepared by the Data Analyzer component and may be passed to the virtualization engine for implementation in the SAN either automatically or after validation by the storage administrator as shown by 405.
  • the MDG target to which a particular VDisk should be re-tiered may be determined using the following algorithm. First, MDGs whose remaining space capacity or Read I/O processing capability is not sufficient to fit the VDisk footprint (the VDisk footprint being equal to the space and Read I/O requirements for this VDisk) are eliminated as possible targets.
  • Then, the MDG whose Net Read I/O access density is of the closest value to the VDisk Read I/O access density is chosen (i.e. the VDisk workload profile is a workload complementary to the MDG in its current state). This operation is repeated for VDisks in an MDG in alert until the cumulated relative weight of the re-tiered VDisks resolves the alert. This operation is also repeated for all MDGs in alert. Other algorithms may be considered to assist in the alert resolution process.
  • Figure 5 illustrates a three-dimension model used in a particular embodiment of the invention.
  • back end storage services are provided by 'Managed Disk Groups' (MDGs) federating a series of Managed Disks (LUNs) hosted on storage arrays and accessed in 'striped mode' by the SVC layer.
  • Front end storage services, as seen by data processing hosts, are provided by VDisks.
  • a composite workload of multiple VDisks, for instance all VDisks hosted in a given MDG, may also be described along this three-dimension model.
  • Figure 6 illustrates two major domains of utilization of a storage service such as a RAID array, an MDG, a LUN or a VDisk.
  • the first domain is the functional domain of the storage service. It lies within the boundaries of the total space (in Mbytes) of the storage pool, its maximum Read I/O rate processing capability and its maximum acceptable response time as defined by the Storage administrator.
  • the second domain is the economical domain of utilization of the storage service. This is a reduced volume inside the previous domain, located close to the boundaries of the maximum Read I/O capability and total storage space, within the acceptable response time limit.
  • Figures 7A-7D provides illustrated examples of workload situations within the two domains of utilization.
  • Figure 8 introduces the Read I/O rate access density factor which can be evaluated for a storage device (in terms of capability) or data workload such as applications or parts of applications (hosted in one VDisk or multiple ones). The following formulas provide additional details.
  • the Read I/O rate access density is measured in IO/sec / Megabyte and its algebra can easily be understood when using a thermal analogy where high access density applications would be 'hot' storage workloads and low access density applications would be 'cold' storage workloads.
  • the weighted thermal formula applicable to mild water (hot + cold) applies to 'hot' and 'cold' data workloads.
  • An MDG operates within its economical zone if the aggregated workload of all VDisks hosted in the MDG is 'close' to the MDG theoretical access density and if the MDG capacity is almost all utilized.
  • the invention proposes a process aiming to optimize MDG usage by exchanging workload(s) with other MDGs of different access density.
  • the preferred embodiment of this invention is the use of the Read I/O rate density to classify MDG capacity among the various tiers.
  • An MDG hosted on a tier 1 RAID controller has the highest Read I/O rate density among all MDGs whereas an MDG of the lowest Read I/O rate access density will belong to a tier of lower ranking (typically tier 3-5 depending on the tier grouping in the virtualized infrastructure).
  • the preferred embodiment of the invention is implemented by the Data Analyzer component when raising alerts based on thresholds defined by the storage administrator. There are three different alerts listed hereafter:
  • MDG capacity allocated to VDisks is close (in %) to the MDG storage capacity.
  • Figure 11 shows these three alert thresholds as they refer to MDG domains of utilization.
  • the driving principles for storage pool optimization are the following ones: 1. If "Allocated capacity" is close to "Maximum capacity" and "Read I/O activity" is significantly lower than the "Read I/O capability", the "Read I/O capability" is not fully leveraged. Then, application data of lowest access rate density must be removed from the discrete virtual storage pool (i.e. MDG) to free up space to host application data of higher access rate density. The removed application data of lowest access rate density should be dispatched to a storage pool of lower Read access rate density capability. This process is called "down-tiering".
  • When determining which VDisk(s) should be right-tiered, absolute Read I/O rate VDisk actual values cannot be used 'as is' because of the cache present at the virtualization engine level. This cache allows serving Read I/O requests to front end data processors without incurring back end Read instructions.
  • the method of the present invention uses the relative Read I/O rate activity for each VDisk compared to the front end aggregated workload hosted in the MDG to sort VDisks between 'hot' and 'cold' data workloads and take practical re-tiering decisions. It will be clear to one skilled in the art that the method of the present invention may suitably be embodied in a logical apparatus comprising means to perform the steps of the method, and such logic means may comprise hardware components or firmware components.
  • Step 1200 checks if the allocated storage capacity is greater than 90% of the total capacity of the Managed Disk Group, where the threshold value (90%) can be set up by the storage administrator according to local policy (a sketch of this decision flow is given after this list).
  • In step 1202 a test is performed to determine whether the actual Read I/O rate is greater than 75% of the Read I/O capability of the MDG, where the threshold value (75%) can be set up by the storage administrator according to local policy. - If the result is No, meaning that the pool is in an intermediate state, no further action is performed and the process goes to step 1216.
  • In step 1208 the up-tiering is performed by selecting the VDisk(s) of highest access density currently hosted in the MDG, and up-tiering them to another MDG for which the VDisk is a good complementary workload. After this VDisk right-tiering operation, the source MDG will see its Read Access rate density actual value decreasing and becoming closer to its intrinsic capability, making this MDG usage better balanced with regard to its utilization domain. The process then goes to step 1216.
  • A test similar to step 1202 is then performed. - If the result is Yes, meaning that the aggregated workload is using a high percentage of the Read I/O capability and most of the space is consumed, the MDG is operating in its economical domain, no further action is performed, and the process stops.
  • In step 1214 the down-tiering is performed by selecting the VDisk(s) of lowest access density in the MDG, and down-tiering them to another MDG for which the VDisk is a good complementary workload. After this VDisk right-tiering operation, the source MDG will see its Read Access rate density actual value increasing and becoming closer to its intrinsic capability, making this MDG usage better balanced with regard to its utilization domain. The process then goes to step 1216.
  • In step 1216 the available MDG storage capacity is allocated to other workloads of complementary access density profile, and the process loops back to step 1200 to analyze the next MDG. When all MDGs are analyzed, the process will wait until the next evaluation period to restart at step 1200 for the first MDG of the list.
  • the analysis / alert method can be integrated in a repeatable storage management process as a regular monitoring task. For instance, every day, a system implementation of the method could produce a storage management dashboard reporting, for each MDG, actual values versus capability and capacity and the Write response time situation, with highlighted alerts when applicable.
  • the dashboard would be accompanied with drill-in views providing behaviors of the VDisks hosted by each MDG, this view being sorted by Read I/O Access rate density, and a list of right-tiering actions which might be evaluated by the storage administrator for passing to the virtualization engine.
  • Figure 13 shows a flow chart of the analysis / alert method to take care of the Write I/O quality of service aspects.
  • the Write I/O response time trigger is replaced by a Write I/O rate indicator.
  • This indicator is based on the ratio between the Front End Write Cache Delay I/O rate and the total Write I/O rate value.
  • Write Cache Delay I/O operations are Write I/O operations retained in the Write cache of the virtualization engine because the back end storage pool cannot accept them due to saturation.
  • the front end application is likely to be slowed down and the response time increases.
  • the usage of this indicator as a re-tiering alert is another embodiment of the present invention.
  • In step 1300 a test is performed to check if the Front End Write Cache Delay I/O rate has reached the threshold, where the threshold value is set up by the storage administrator according to local policy.
  • If the result is No, then the process goes to step 1320. If the result is Yes, then the VDisks causing the alert are traced to the application using these VDisks in step 1302.
  • In step 1303 values for the application batch elapsed time value [A] and the batch elapsed time SLA target [T] are collected. This data is provided externally to the present invention, typically by application performance indicators under IT operation staff responsibility.
  • In step 1304 a new test checks whether the application SLA, typically a batch elapsed time target, is at risk by comparing A and T values versus a safety threshold level. If the result is No, meaning that A is significantly lower than T, then the observed high response time values are not important for the batch duration, no further action is performed in step 1306, and the process goes to step 1320.
  • In step 1308 a trend analysis of Write I/O response time and Write I/O rate values is performed, using for instance TPC graphics reporting as an embodiment.
  • In step 1310 a new test is performed to check whether the total time the application waits for Write I/O operations is increasing or not (this total Write wait time is equal to the sum over all sampling periods of the product of the Write I/O response time and the Write I/O rate for all VDisks in alert): - If the result is No, meaning that the total time the application waits for Write
  • In step 1314 trend analysis results are used to extrapolate, for instance with linear modeling, future batch duration values.
  • In step 1316 a test checks if the SLA Target (T) is at risk or not in the near future. If the result is No, the process goes to step 1312; otherwise, if the result is Yes, the process goes to step 1318 to up-tier some (or all) of the VDisks creating the application SLA risk to an MDG with a higher I/O capability.
  • In step 1320 the available MDG storage capacity is allocated to other workloads of complementary access density profile, and the process loops back to step 1300 to analyze the next MDG.
  • the process will wait until the next evaluation period to restart at step 1300 for the first MDG of the list.
  • the analysis/alert methods described in Figures 12 and 13 can also be used to characterize a new workload whose I/O profile is unknown.
  • This workload may be hosted in a 'nursery' MDG for measurement of its I/O behavior for a certain period (for instance for one month) to collect sufficient behavioral data.
  • application VDisks could be right-tiered based on space requirement, Read I/O requirement and Read I/O density values provided by the Data Analyzer component.
  • This 'nursery' process may replace, at low cost, the need for sophisticated storage performance estimation work required before deciding which storage tier should be used and which MDG(s) would be best suited. Future changes in application behavior would then be handled by the regular monitoring task ensuring alignment of application needs to the storage infrastructure without intervention from costly storage engineers.
  • the analysis/alert method of the present invention may be used to relocate application data when a back end disk array connected to the virtualized storage infrastructure requires de-commissioning.
  • the data available at the Data Analyzer component may be used to decide which storage tier should be used for each of the logical storage units and which discrete storage pool (e.g. MDG) is best suited for each one.
  • the analysis/alert method of the present invention may be used to relocate application data when a disk array not connected to the virtualized storage infrastructure requires de-commissioning.
  • the disk array might be connected to the virtualized storage infrastructure and undergo the nursery characterization process before relocating the virtual logical storage units to other discrete virtual storage pools.
  • the process might consist of using existing performance data collected on the disk array and reinstalling the application on the virtualized storage infrastructure using the data provided by the Data Analyzer component.
  • an MDG may be referred to as a storage pool, virtual storage pool or discrete virtual storage pool, and a VDisk as a Virtual Storage Logical Unit.
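The re-tiering decision flow of Figure 12 and the Write Cache Delay indicator of Figure 13, as described in the steps above, can be summarized in the following sketch. This is an illustrative reading of the flow, not code from the patent: the MdgSample record, the function names and the 5% delay-ratio default are assumptions, and in practice every threshold is set by the storage administrator.

```python
# Hypothetical sketch of the threshold-driven evaluation described for Figure 12
# (steps 1200-1216) and the Write Cache Delay ratio indicator of Figure 13.
from dataclasses import dataclass

@dataclass
class MdgSample:
    allocated_mb: float            # capacity currently allocated to VDisks
    total_capacity_mb: float       # total MDG space capacity
    read_io_rate: float            # measured aggregate Read I/O rate (IO/s)
    read_io_capability: float      # modeled Read I/O processing capability (IO/s)
    write_cache_delay_rate: float  # Front End Write Cache Delay I/O rate (IO/s)
    total_write_rate: float        # total Write I/O rate (IO/s)

def evaluate_mdg(m: MdgSample,
                 space_threshold: float = 0.90,
                 read_threshold: float = 0.75) -> str:
    """Return a re-tiering recommendation for one MDG."""
    space_full = m.allocated_mb > space_threshold * m.total_capacity_mb   # step 1200
    read_high = m.read_io_rate > read_threshold * m.read_io_capability    # step 1202 and the similar later test
    if not space_full and read_high:
        return "up-tier hottest VDisk(s)"     # step 1208: capability nearly exhausted, space is not
    if space_full and not read_high:
        return "down-tier coldest VDisk(s)"   # step 1214: space nearly exhausted, capability under-used
    if space_full and read_high:
        return "no action: MDG in economical domain"
    return "no action: intermediate state"    # proceed to step 1216 / next MDG

def write_delay_alert(m: MdgSample, delay_ratio_threshold: float = 0.05) -> bool:
    """Figure 13 style indicator: ratio of Write Cache Delay I/O to total Write I/O."""
    if m.total_write_rate == 0:
        return False
    return (m.write_cache_delay_rate / m.total_write_rate) > delay_ratio_threshold

sample = MdgSample(allocated_mb=460000, total_capacity_mb=500000,
                   read_io_rate=2500, read_io_capability=10000,
                   write_cache_delay_rate=40, total_write_rate=2000)
print(evaluate_mdg(sample))        # 'down-tier coldest VDisk(s)'
print(write_delay_alert(sample))   # False (40/2000 = 2%, below the assumed 5%)
```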

Abstract

A storage management method for use in a SAN based virtualized multi-tier storage infrastructure in a loosely defined and changing environment. Each physical storage media is assigned a tier level based on its Read I/O rate access density. The method comprises a top down method based on data collected from the virtualization engine compared to the Read I/O capability and space capacity of each discrete virtual storage pool to determine whether re-tiering situations exist, and a drill-in analysis algorithm based on relative Read I/O access density to identify which data workload should be right-tiered among the composite workload hosted in the discrete virtual storage pool.

Description

Method, System and Computer Program Product for Managing the Placement of Storage Data in a multi tier virtualized storage infrastructure
FIELD OF THE INVENTION
The present invention relates to the field of data processing and in particular to the management of storage and the optimization of data placement in a multi tier virtualized storage infrastructure.
BACKGROUND OF THE INVENTION
Enterprises face major challenges due to the fast growth of their storage needs, the increased complexity of managing the storage, and the requirement for high availability of storage. Storage Area Network (SAN) technologies enable storage systems to be engineered separately from host computers through the pooling of storage, resulting in improved efficiency.
Storage virtualization, a storage management technology which masks the physical storage complexities for the user, may also be used. Block virtualization (sometimes also called block aggregation) provides servers with a logical view of the physical storage, such as disk drives, solid-state disks, and tape drives, on which data is actually stored. The logical view may comprise a number of virtual storage areas into which the available storage space is divided (or aggregated) without regard to the physical layout of the actual storage. The servers no longer see specific physical targets, but instead see logical volumes which can be for their exclusive use. The servers send their data to the virtual storage areas as if they are their direct attached property.
Virtualization may take place at the level of volumes, individual files or at the level of blocks that represent specific locations within a disk drive. Block aggregation can be performed within hosts (servers), and/or in storage devices (intelligent disk arrays).
In data storage, the problem of accurate data placement among a set of storage tiers is among the most difficult problems to solve. Tiered storage is the assignment of different categories of data to different types of storage media in order to reduce total storage cost. Categories may be based on levels of protection needed, performance requirements, frequency of use, capacity and other considerations. User requirements for their placement are quite often loosely specified or based on wishes rather than on accurate capacity planning. Furthermore, even if initial requirements were adequate, applications may undergo drastic data access changes throughout their life cycle. For instance, the roll out of an internet application where the number of future users is difficult to predict is likely to have an actual data access behavior at a given time very different from initial deployment values and/or planned activity. Over time, this application might benefit from functional enhancements causing upward changes in data access behaviors. Later, selected functions may become unused because their functional perimeter is taken over by a newer application, leading to downward changes in data access patterns. In addition to application behavior uncertainty, data access behaviors may be far from homogeneous within a single application. For instance, a highly active database log and a static parameter table will feature very different data access patterns. All across these life cycle changes, storage administrators are faced with loosely specified and changing environments where user technical input cannot be considered accurate or trustable to make the right data placement decisions.
The abundance of storage technologies used in storage tiers (Fiber Channel (FC), Serial AT Attachment (SATA), Solid State Drives (SSD)), combined with their redundancy set up (RAID 5, RAID 10, etc.), makes application data placement decisions even more complex in storage tiers where prices per unit of storage capacity may range from 1 to 20 between SATA and SSD. Using the right tiers for application data is now a crucial need for enterprises to reduce their cost while maintaining application performance.
A method for managing allocation of data sets among a plurality of storage devices has been proposed in US 5,345,584. The method, based on data storage factors for data sets and storage devices, is well suited to single dataset placement in single storage devices accessed without a local cache layer. This architecture is today mostly obsolete because modern storage devices host datasets in striped mode across multiple storage devices with a cache layer capable of buffering high numbers of write access instructions. Furthermore, using the total access rate (i.e. the sum of Read activity and Write activity) is grossly inaccurate to characterize modern storage devices; for instance, a 300 GB Fiber Channel drive may typically support 100-150 random accesses per second whereas a write cache layer may buffer 1000 write instructions per second each of 8 Kbytes (a typical database block size) for 15 minutes, causing the total access rate to become inaccurate. This issue derails any model which would be based on total read and write access activity and capability.
A method of hierarchical storage data in a storage area network (SAN) has been proposed in WO 2007/009910 from the Assignee, where the SAN comprises a plurality of host data processors coupled to a storage virtualization engine, which is coupled to a plurality of physical storage media. Each physical media is assigned a tier level. The method is based on selective relocation of data blocks when their access behaviors exceed tier media threshold values. This method may lead to non-economical solutions for composite workloads including multiple applications consisting of highly demanding applications and low demanding applications. For such workloads, this method would lead to recommending or selecting two types of storage resources. The first storage resource type would be a "high performance
SSD like" type and the second one would be a "low performance SATA drive like" type whereas a solution based on Fiber Channel (FC) disks might be sufficient and more economical to support the "average" performance characteristics of the aggregated workload. In essence, using 1, 2 and 20 ratios for type prices/unit of capacity for SATA, FC and SSD storage media would lead to an FC solution being five times cheaper than a combined SSD and SATA solution.
The present invention aims to address the aforementioned problems.
SUMMARY OF THE INVENTION
The invention provides a method for managing the placement of data on the virtualized multi-tier storage infrastructure in a loosely defined and changing environment. Each physical storage media is assigned a tier level based on its Read I/O rate access density. The method comprises a top down method based on data collected from the virtualization engine compared to the Read I/O capability and space capacity of each discrete virtual storage pool to determine whether re-tiering situations exist, and a drill-in analysis algorithm based on relative Read I/O access density to identify which data workload should be right-tiered among the composite workload hosted in the discrete virtual storage pool.
The method operates at discrete storage virtual pool and storage virtual disk levels and takes advantage of opportunistic complementary workload profiles present in most aggregated composite workloads. This method significantly reduces the amount of re-tiering activity which would be generated by a micro analysis at block or storage virtual disk level and may provide more economical recommendations.
The method, based on a top down approach, analyzes the behavior of storage resources, detects situations where workload re-tiering is suitable and provides re-tiering (upward or downward) recommendations.
The suggested re-tiering/right-tiering actions can be analyzed by storage administrators for validation or automatically passed to the virtualization engine for virtual disk migration. The method also comprises a Write response time component which covers quality of service issues. The method uses alerts based on thresholds defined by the storage administrator. The process comprises a structured and repeatable evaluation of the virtualized storage infrastructure and a process flow leading to data workload re-tiering actions. The process also comprises a structured flow to analyze Write response time quality of service alerts, decide whether re-tiering is required and identify which data workload should be re-tiered.
According to the invention, there is provided a method and system as described in the appended independent claims.
Further embodiments are defined in the appended dependent claims. The foregoing and other objects, modules and advantages of the present invention will now be described by way of preferred embodiment and examples, with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows an example of a Storage Area Network in which the present invention may be implemented; Figure 2 shows a simple view of block virtualization;
Figure 3 shows components of a virtualization engine in which the present invention may be implemented;
Figure 4 shows components of the Storage Tiering Analyzer for Right Tiering (START) component according to the invention; Figure 5 illustrates the preferred data service model dimensions used in an embodiment of the right tiering process;
Figure 6 illustrates storage data service technical and economical domains of usage;
Figures 7A, 7B, 7C and 7D show examples of actual situations of a composite data workload in a technical domain of usage for a storage pool;
Figure 8 illustrates the Read I/O rate density in a three-dimension model used by the invention;
Figure 9 shows the Read I/O rate density of a data workload composed of two data workloads of different Read I/O rate densities and illustrates the thermal analogy which is applicable;
Figure 10 shows how the Read I/O rate density of a composite workload is modified when removing one of the composing data workloads;
Figure 11 shows the threshold based alert system supporting the invention; Figure 12 provides the process flow supporting the method described in the invention as it relates to Read I/O rate density and Space utilization; and
Figure 13 provides the process flow supporting an embodiment of the method as it relates to the analysis of Write I/O response time alerts.
DESCRIPTION OF A PREFERRED EMBODIMENT
The invention proposes using a virtualization engine, which has knowledge of both the data and the location of the data, and an analyzer component to identify situations deserving data re-tiering and recommending actual data re-tiering actions.
Referring to Figure 1, there is shown a SAN 100 with several host application servers 102 attached. These can be many different types, typically some number of enterprise servers, and some number of user workstations.
Also attached to the SAN (via Redundant Array of Inexpensive Disks (RAID) controllers A, B and C) are various levels of physical storage. In the present example, there are three levels of physical storage: Tier 1, which may be, for example, enterprise level storage, such as the IBM® System Storage DS8000; Tier 2, which may be mid range storage, such as the IBM® System Storage DS5000 equipped with FC disks; and Tier 3, which may be lower end storage, such as the IBM® System Storage DS4700 equipped with Serial Advanced Technology
Attachment (SATA) drives.
Typically, each MDisk corresponds to a single tier and each RAID array 101 belongs to a single tier. Each of the RAID controllers 103 may control RAID storage belonging to different tiers. In addition to different tiers being applied to different physical disk types, different tiers may also be applied to different RAID types; for example, a RAID-5 array may be placed in a higher tier than a RAID-0 array.
The SAN is virtualized by means of a storage virtualization engine 104 which sits in the data path for all SAN data, and presents Virtual Disks 106a to 106n to the host servers and workstations 102. These virtual disks are made up from the capacity provided across the three tiers of storage devices.
The virtualization engine 104 comprises one or more nodes 110 (four shown), which provide virtualization, cache and copy services to the hosts. Typically, the nodes are deployed in pairs and make up a cluster of nodes, with each pair of nodes known as an Input/Output (I/O) group.
As storage is attached to the SAN it is added to various pools of storage, each controlled by a RAID controller 103. Each RAID controller presents an SCSI (Small Computer System Interface) disk to the virtualization engine. The presented disk may be managed by the virtualization engine, and be called a managed disk, or MDisk. These MDisks are split into extents, fixed size blocks of usable capacity, which are numbered sequentially from the start to the end of each MDisk. These extents can be concatenated, striped, or any desirable algorithm can be used to produce larger virtual disks (VDisks) which are presented to the hosts by the nodes.
The MDisks M1,M2,...M9 can be grouped together in Managed Disk Groups, or MDGs 108, typically characterized by factors such as performance, RAID level, reliability, vendor, and so on. According to the preferred embodiment, all MDisks in an MDG represent storage of the same tier level, as shown in Figure 1. There may be multiple MDGs of the same tier in the virtualized storage infrastructure, each being a discrete virtual storage pool.
The virtualization engine converts Logical Block Addresses (LBAs) of a virtual disk to extents of the VDisk, and maps extents of the VDisk to MDisk extents. An example of the mapping from a VDisk to MDisks is shown in Figure 2. Each of the extents of the VDisk A is mapped to an extent of one of the managed disks M1, M2 or M3. The mapping table, which can be created from metadata stored by each node, shows that some of the managed disk extents are unused. These unused extents are available for use in creating new VDisks, migration, expansion and so on.
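A minimal sketch of how such a VDisk-to-MDisk extent mapping table might be represented and queried; the dictionary layout, extent size and names are illustrative assumptions, not the patent's data structures.

```python
# Hypothetical illustration of a VDisk-to-MDisk extent mapping table,
# in the spirit of Figure 2. Keys and values are (disk_id, extent_number) pairs.
vdisk_to_mdisk = {
    ("A", 0): ("M1", 2),
    ("A", 1): ("M2", 0),
    ("A", 2): ("M3", 5),
    ("A", 3): ("M2", 6),   # the extent used in the write example below
}

EXTENT_SIZE_BLOCKS = 16 * 1024   # assumed extent size, in 512-byte blocks

def resolve_lba(vdisk: str, lba: int):
    """Convert a host LBA on a VDisk to the backing (MDisk, extent, offset)."""
    extent = lba // EXTENT_SIZE_BLOCKS
    offset = lba % EXTENT_SIZE_BLOCKS
    mdisk, mdisk_extent = vdisk_to_mdisk[(vdisk, extent)]
    return mdisk, mdisk_extent, offset

# Example: an LBA that falls in extent 3 of VDisk A resolves to MDisk M2, extent 6.
print(resolve_lba("A", 3 * EXTENT_SIZE_BLOCKS + 10))   # ('M2', 6, 10)
```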
Typically, virtual disks are created and distributed so that the enterprise level servers initially use enterprise level storage, or based on application owner requirements. This may not be fully justified by actual data access characteristics. The invention provides a method to identify better data placement scenarios with a structured right tiering process. The invention supports a different and cheaper initial data placement for applications. For instance, initial data placement for all applications could be released in tier 2 storage media and the invention would support the re-tiering of part or all of this data based on the actual situation of the overall virtualized storage infrastructure.
To accomplish this, in addition to the metadata used to track the mapping of managed disk extents to virtual disks, access rate to each extent is monitored. As the data is read and written to any given extent, the metadata is updated with access count. An I/O flow will now be described with reference to Figure 3. As shown in
Figure 3, a virtualization engine of node 110 comprises the following modules: SCSI Front End 302, Storage Virtualization 310, SCSI Back End 312, Storage Manager 314 and Event Manager 316.
The SCSI Front End layer receives I/O requests from hosts; conducts LUN mapping (i.e. between LBAs and Logical Unit Numbers (LUNs) (or extents) of virtual disks A and C); and converts SCSI Read and Write commands into the node's internal format. The SCSI Back End processes requests to Managed disks which are sent to it by the Virtualization layer above, and addresses commands to the RAID controllers.
The I/O stack may also include other modules (not shown), such as Remote Copy, Flash Copy or Cache. Caches are usually present both at Virtualization engine and RAID controller levels. The node displayed in Figure 3 belongs to an I/O group to which VDisks A and B are assigned. This means that this node presents an interface to VDisks A and B for hosts. Managed disks 1, 2 and 3 may also correspond to other virtual disks assigned to other nodes.
The event manager 316 manages metadata 318, which comprises mapping information for each extent as well as tier level data and an access value for the extent. This metadata is also available to the virtualization layer 310 and storage manager 314.
Now consider the receipt from a host of a write request 350 which includes the ID of the virtual disk to which the request refers, and the LBA to which the write should be made. On receipt of the write request, the Front End converts the specified LBA into an extent ID (LUN) of a virtual disk; let us say this is extent 3 of VDisk A (A-3). The virtualization component 310 uses the metadata, shown in the form of a mapping table in Figure 2, to map extent A-3 to extent 6 of MDisk 2 (M2-6). The write request is then passed via the SCSI back end 312 to the relevant controller for MDisk 2 and data is written to the extent M2-6. The virtualization layer sends a message 304 to the event manager indicating that a write to extent 6 of MDisk 2 has been requested. The event manager then updates the metadata in respect of extent M2-6 to indicate that this extent is now full. The event manager also updates the access value in the metadata for the extent. This may be by storing the time at which the write occurred as the access value, or by resetting a count value in the metadata. The event manager returns a message 304 to the virtualization component to indicate that the metadata has been updated to reflect the write operation.
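The metadata update at the end of this write path can be summarized in a short, hypothetical sketch; the record layout is an assumption, and the two update lines simply mirror the two alternatives mentioned above (storing the write time or resetting a count).

```python
# Hypothetical sketch of the event manager metadata update on a write
# (mirroring the A-3 -> M2-6 example above). Field names are illustrative.
import time

extent_metadata = {
    ("M2", 6): {"vdisk_extent": ("A", 3), "tier": 2, "full": False,
                "last_access": None, "access_count": 0},
}

def record_write(mdisk: str, extent: int) -> None:
    """Update metadata after a write has been issued to an MDisk extent."""
    meta = extent_metadata[(mdisk, extent)]
    meta["full"] = True                 # the extent now holds data
    meta["last_access"] = time.time()   # alternative 1: store the write time as the access value
    meta["access_count"] = 0            # alternative 2: reset a count value

record_write("M2", 6)
print(extent_metadata[("M2", 6)]["full"])   # True
```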
The Storage Tiering Analyzer for Right Tiering (START) manager component which allows right tiering actions is now described with reference to Figure 4. START performs the analysis of the SAN activity to identify situations deserving right tiering actions and prepares the appropriate VDisk migration action list. Firstly, the Data Collector 401 acts as a Storage Resource Manager, by periodically collecting topology data contained in the virtualization engine and access activity per LUNs and VDisks. This may comprise write and read activity counts, response times and other monitoring data. This may comprise back end and front end activity data and internal measurements of the virtualization engine such as queue levels. The data collector inserts this series of data in its local repository on a periodic basis (a preferred period is typically every 15 minutes) and stores it for a longer period of time (typically 6 months).
The Data Aggregator 402 processes SAN data covering a longer period of time (say one day, e.g. 96 samples of 15 minutes each) by accessing the Data Collector repository (with mechanisms such as batch reports) and produces aggregated values comprising minimum, maximum, average, shape factors, etc., for VDisks and MDGs managed by the virtualization engine of the SAN.
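As a rough illustration of this aggregation step (the sample layout and the particular statistics chosen are assumptions, not the patent's exact output), one day of 15-minute samples might be reduced as follows:

```python
# Hypothetical aggregation of one day of 15-minute Read I/O rate samples
# for a single VDisk or MDG (96 samples per day, as in the example above).
from statistics import mean

def aggregate_daily(samples: list[float]) -> dict:
    """Reduce raw samples to the kind of summary values the Data Aggregator produces."""
    avg = mean(samples)
    peak = max(samples)
    return {
        "min": min(samples),
        "max": peak,
        "avg": avg,
        # one possible "shape factor": how peaky the day is relative to its average
        "peak_to_avg": peak / avg if avg else 0.0,
    }

read_iops_samples = [120.0] * 80 + [900.0] * 16   # 96 samples: a quiet day with a hot spell
print(aggregate_daily(read_iops_samples))
```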
The data produced by the Data Aggregator can be compared to the SAN Model Metadata 403 which contains the I/O processing capability for each of the MDGs. This I/O processing capacity may be based on disk array vendor specifications, disk array modeling activity figures (such as produced by Disk Magic application software), or generally accepted industry technology capability figures for the disks controlled by the RAID controller, their number, their redundancy set up and cache hit ratio values at RAID controller level. Other I/O processing modeling capability algorithms may also be used.
The data produced by the Data Aggregator can also be compared to the total space capacity of each MDG, which can be stored in the SAN Model Metadata or collected from the virtualization engine.
The Data Analyzer component 404 performs these comparisons and raises right tiering alerts based on thresholds set by the storage administrator. These alerts cover MDGs whose utilizations are not balanced and for which VDisk migration actions should be considered.
For any MDG in alert, the Data Analyzer provides a drill-in view of all VDisks hosted by the MDG sorted by Read Access Rate Density. This view allows an immediate identification of 'hot' VDisks and 'cold' ones. Depending on the type of alert, this drill-in view easily points to VDisks whose migration to another tier will resolve the MDG alert. By right-tiering these VDisks, the source MDG will see the Read Access rate density value of the composite workload hosted by the MDG becoming closer to the MDG intrinsic capability, making this MDG usage better balanced with regard to its utilization domain.
For all MDGs, the Data Analyzer computes the Net Read I/O access density as the ratio of the MDG remaining Read I/O processing capability divided by the
MDG remaining space capacity. A workload whose Read I/O access density is equal to the Net Read I/O access density would be considered as a complementary workload for this MDG in its current state.
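A minimal sketch of this quantity, under the assumption that remaining capability and remaining capacity are obtained by subtracting the measured load and allocated space from the modeled totals (the function signature is hypothetical):

```python
# Hypothetical computation of the Net Read I/O access density of an MDG,
# i.e. remaining Read I/O capability per remaining megabyte of space.
def net_read_access_density(read_io_capability: float, read_io_rate: float,
                            total_capacity_mb: float, allocated_mb: float) -> float:
    remaining_capability = max(read_io_capability - read_io_rate, 0.0)   # IO/s left
    remaining_space = max(total_capacity_mb - allocated_mb, 0.0)         # MB left
    if remaining_space == 0:
        return 0.0   # a full MDG cannot host a complementary workload
    return remaining_capability / remaining_space   # IO/s per MB

# An MDG with 4000 IO/s of unused capability and 200000 MB free:
print(net_read_access_density(10000, 6000, 500000, 300000))   # 0.02 IO/s per MB
```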
The VDisk migration action list, composed of 'hot' or 'cold' VDisks depending on the type of alert, is prepared by the Data Analyzer component and may be passed to the virtualization engine for implementation in the SAN, either automatically or after validation by the storage administrator, as shown by 405. The MDG target to which a particular VDisk should be re-tiered may be determined using the following algorithm. First, MDGs whose remaining space capacity or Read I/O processing capability is not sufficient to fit the VDisk footprint
(the VDisk footprint being equal to the space and Read I/O requirements for this VDisk) are eliminated as possible targets. Then, the MDG whose Net Read I/O access density is of the closest value to the VDisk Read I/O access density is chosen (i.e. the VDisk workload profile is a workload complementary to the MDG in its current state). This operation is repeated for VDisks in an MDG in alert until the cumulated relative weight of the re-tiered VDisks resolves the alert. This operation is also repeated for all MDGs in alert. Other algorithms may be considered to assist in the alert resolution process.
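One way to express this selection step in code; the data shapes are invented for illustration and the tie-breaking behavior is an assumption rather than part of the described algorithm:

```python
# Hypothetical implementation of the target-MDG selection described above:
# drop MDGs that cannot absorb the VDisk footprint, then pick the MDG whose
# net Read I/O access density is closest to the VDisk's own access density.
from typing import Optional

def choose_target_mdg(vdisk: dict, candidate_mdgs: list[dict]) -> Optional[dict]:
    feasible = [
        m for m in candidate_mdgs
        if m["remaining_space_mb"] >= vdisk["space_mb"]
        and m["remaining_read_iops"] >= vdisk["read_iops"]
    ]
    if not feasible:
        return None   # no MDG can host this VDisk footprint
    vdisk_density = vdisk["read_iops"] / vdisk["space_mb"]
    # Closest net density first; the VDisk is then a complementary workload.
    return min(feasible,
               key=lambda m: abs(m["remaining_read_iops"] / m["remaining_space_mb"]
                                 - vdisk_density))

vdisk = {"space_mb": 50000, "read_iops": 500}          # density 0.01 IO/s per MB
mdgs = [
    {"name": "MDG-T1", "remaining_space_mb": 80000, "remaining_read_iops": 6000},
    {"name": "MDG-T3", "remaining_space_mb": 400000, "remaining_read_iops": 3000},
]
print(choose_target_mdg(vdisk, mdgs)["name"])          # MDG-T3 (net density 0.0075)
```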
Figure 5 illustrates a three-dimension model used in a particular embodiment of the invention. In an embodiment based on the IBM® TotalStorage® SAN Volume Controller (SVC), back end storage services are provided by 'Managed Disk Groups' (MDGs) federating a series of Managed Disks (LUNs) hosted on storage arrays and accessed in 'striped mode' by the SVC layer. Front end storage services as seen by the data processing hosts are provided by VDisks. A composite workload of multiple VDisks, for instance all VDisks hosted in a given MDG, may also be described along this three-dimension model.
Figure 6 illustrates two major domains of utilization of a storage service such as a RAID array, an MDG, a LUN or a VDisk.
The first domain is the functional domain of the storage service. It lies within the boundaries of the total space (in Mbytes) of the storage pool, its maximum Read I/O rate processing capability and its maximum acceptable response time as defined by the storage administrator.
The second domain is the economical domain of utilization of the storage service. This is a reduced volume located inside the previous domain, close to the boundaries of the maximum Read I/O capability and total storage space, within the acceptable response time limit.
Figures 7A-7D provide illustrated examples of workload situations within the two domains of utilization.
In Figure 7A, data occupies all the storage capacity, the I/O processing capability is well utilized and the Write I/O response time value is not a problem.
There is a good match between data placement and the storage pool.
In Figure 7B, the I/O processing capability is almost all utilized, the storage capacity is only very partially allocated and the Write I/O response time value is not a problem. Further capacity allocation is likely to cause I/O constraints. Moving selected data to a storage pool of higher I/O capability would be suitable.
In Figure 7C, data occupies almost all the storage capacity, the I/O processing capability is under-utilized and the Write I/O response time value is not a problem. There is an opportunity to utilize a storage pool of lower I/O processing capability, which is likely to be more economical. In Figure 7D, the storage capacity is almost completely allocated and the I/O processing capability is well leveraged; however, the Write I/O response time value is too high. There is a need to assess whether the high response time value constitutes a risk to the workload SLA (typically a batch elapsed time) before deciding any action.
Figure 8 introduces the Read I/O rate access density factor, which can be evaluated for a storage device (in terms of capability) or for a data workload such as an application or part of an application (hosted in one VDisk or multiple ones). The following formulas provide additional details.
• For MDGs: Maximum Access Density = I/O processing Capability / Total storage capacity
• For Applications: Maximum Access Density = Actual maximum I/O rate / Allocated storage space
• For VDisks: Maximum Access Density = Actual maximum I/O rate / Allocated storage space
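By way of a purely illustrative numerical example (assumed figures, not measurements): an MDG offering an I/O processing capability of 5 000 IO/sec over a total capacity of 1 000 000 Mbytes has a Maximum Access Density of 5 000 / 1 000 000 = 0.005 IO/sec per Mbyte, while an application sustaining a maximum of 800 IO/sec over 100 000 Mbytes of allocated space has an access density of 800 / 100 000 = 0.008 IO/sec per Mbyte and is therefore 'hotter' than this pool.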
The Read I/O rate access density is measured in IO/sec per Megabyte and its algebra can easily be understood using a thermal analogy, where high access density applications would be 'hot' storage workloads and low access density applications would be 'cold' storage workloads. As illustrated in Figures 9 and 10, the weighted thermal formula applicable to mild water (hot + cold) applies to 'hot' and 'cold' data workloads. An MDG operates within its economical zone if the aggregated workload of all VDisks hosted in the MDG is 'close' to the MDG theoretical access density and if the MDG capacity is almost all utilized. The invention proposes a process aiming at optimizing MDG usage as a result of exchanging workload(s) with other MDGs of different access density. The preferred embodiment of this invention uses the Read I/O rate density to classify MDG capacity among the various tiers. An MDG hosted on a tier 1 RAID controller has the highest Read I/O rate density among all MDGs, whereas an MDG of the lowest Read I/O rate access density will belong to a tier of lower ranking (typically tier 3-5 depending on the tier grouping in the virtualized infrastructure).
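As a further purely illustrative example of the weighted thermal formula (assumed values only): a 'hot' workload of 200 000 Mbytes at 0.010 IO/sec per Mbyte combined with a 'cold' workload of 800 000 Mbytes at 0.00125 IO/sec per Mbyte yields an aggregated density of (200 000 x 0.010 + 800 000 x 0.00125) / 1 000 000 = 0.003 IO/sec per Mbyte, in the same way as two volumes of water at different temperatures mix to an intermediate temperature weighted by their volumes.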
The preferred embodiment of the invention is implemented by the Data Analyzer component when raising alerts based on thresholds defined by the storage administrator. There are three different alerts, listed hereafter:
1. Storage capacity almost all allocated: in this situation, the Managed Disk Group capacity allocated to VDisks is close (in %) to the MDG storage capacity.
2. I/O capability almost fully used: in this situation the maximum Read I/O rate on the back end disks (Managed Disk Group) is close (in %) to the maximum theoretical value.
3. 'High' response time values: in this situation the number of write instructions retained in the SVC cache is 'important' (in %) when compared to the total number of write instructions. This phenomenon reveals an increase of the write response time which may cause a breach of SLA target values for batch workloads.
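A minimal sketch of the three alert checks, with the percentage thresholds and field names assumed for illustration only, could be:

def raise_alerts(mdg, capacity_pct, read_pct, write_delay_pct):
    # 'mdg' holds the aggregated values for one Managed Disk Group;
    # the three percentage thresholds are set by the storage administrator.
    alerts = []
    if 100.0 * mdg["allocated_mb"] / mdg["total_mb"] >= capacity_pct:
        alerts.append("storage capacity almost all allocated")
    if 100.0 * mdg["read_rate"] / mdg["read_capability"] >= read_pct:
        alerts.append("Read I/O capability almost fully used")
    if mdg["write_rate"] and \
       100.0 * mdg["write_cache_delay_rate"] / mdg["write_rate"] >= write_delay_pct:
        alerts.append("high Write response time values")
    return alerts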
Figure 11 shows these three alert thresholds as they refer to MDG domains of utilization.
The driving principles for storage pool optimization are the following:
1. If the "Allocated capacity" is close to the "Maximum capacity" and the "Read I/O activity" is significantly lower than the "Read I/O capability", the "Read I/O capability" is not fully leveraged. Then, the application data of lowest access rate density must be removed from the discrete virtual storage pool (i.e. the MDG) to free up space to host application data of higher access rate density. The removed application data of lowest access rate density should be dispatched to a storage pool of lower Read access rate density capability. This process is called "down-tiering".
2. If the "Read I/O activity" is close to the "Read I/O capability" and the "Allocated capacity" is significantly lower than the "Maximum capacity", the storage pool capacity is unbalanced and adding more application data is likely to cause an undesired performance constraint. Handling this situation requires removing the application data of highest access rate density from the storage pool to free up Read I/O capability. The freed capacity will be used later to host application data of lower access rate density. The removed application data (of highest access rate density) may need to be dispatched to a storage pool of higher "Read I/O density capability". This process is called "up-tiering".
3. "Write response time" values increase when write cache buffers are filled up and this may put the application service level agreement (SLA) at risk. In this situation, it is necessary to perform a trend analysis to project future "Write response time" values and assess whether the application SLA will be endangered. If this is the case, the related application data (VDisks) must be "up-tiered" to a storage pool of higher Write I/O capability. If the SLA is not at risk, the application data placement may be kept unchanged in its current storage pool.
4. If the storage pool is in an intermediate state where the space is not fully allocated or its Read I/O activity is not close to the "Read I/O capability", there is no need to consider any action. Even if a hot workload is present in the MDG, its behavior may be balanced by a cold workload, resulting in an average workload within the MDG capability. This opportunistic situation significantly reduces the hypothetical amount of right tiering actions which might be unduly recommended by a micro analysis approach.
5. If the "Read I/O activity" is close to the "Read I/O capability" and the "Allocated capacity" is almost equal to the "Maximum capacity", the storage pool capacity is well balanced as long as the "Write response time" value stays within the acceptable limits; the two alerts compensate each other.
6. When determining which VDisk(s) should be right-tiered, absolute actual Read I/O rate values per VDisk cannot be used 'as is' because of the cache present at the virtualization engine level. This cache allows serving Read I/O requests to front end data processors without incurring back end Read instructions. The method of the present invention uses the relative Read I/O rate activity of each VDisk compared to the front end aggregated workload hosted in the MDG to sort VDisks between 'hot' and 'cold' data workloads and take practical re-tiering decisions.
It will be clear to one skilled in the art that the method of the present invention may suitably be embodied in a logical apparatus comprising means to perform the steps of the method, and such logic means may comprise hardware components or firmware components.
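Principle 6 above can be sketched as follows, the input being the front end Read I/O rate observed for each VDisk of the MDG (names and data structure assumed for illustration only):

def sort_by_relative_read_share(front_end_read_rates):
    # 'front_end_read_rates' maps each VDisk name to its front end Read
    # I/O rate. Using each VDisk's share of the MDG aggregated front end
    # read activity, rather than absolute back end rates, avoids the
    # distortion introduced by the virtualization engine cache.
    total = sum(front_end_read_rates.values())
    if total == 0:
        return []
    shares = {name: rate / total for name, rate in front_end_read_rates.items()}
    # 'Hot' VDisks come first, 'cold' VDisks last.
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)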
The implementation of this optimization approach may be supported by means of a microprocessor supporting a process flow as now described with reference to Figure 12. Step 1200 checks if the allocated storage capacity is greater than 90% of the total capacity of the Managed Disk Group, where the threshold value (90%) can be set up by the storage administrator according to local policy.
If the result is No, then a test is performed (step 1202) to determine whether the actual Read I/O rate is greater than 75% of the Read I/O capability of the MDG, where the threshold value (75%) can be set up by the storage administrator according to local policy.
- If the result is No, meaning that the pool is in an intermediate state, no further action is performed and the process goes to step 1216.
- If the result of test 1202 is Yes, meaning that the aggregated workload is already using a high percentage of the Read I/O capability without all the space being consumed, there is a high probability that adding further workload may saturate the Read I/O capability, causing the workload SLA to suffer. Therefore an up-tiering operation is recommended in step 1206. Next, on step 1208, the up-tiering is performed by selecting the VDisk(s) of highest access density currently hosted in the MDG, and up-tiering it to another MDG for which the VDisk is a good complementary workload. After this VDisk right-tiering operation, the source MDG will see its actual Read Access rate density value decreasing and becoming closer to its intrinsic capability, making this MDG usage better balanced with regard to its utilization domain. The process then goes to step 1216.
Going back to the test performed on step 1200, if the result is Yes, then a test similar to step 1202 is performed.
- If the result is Yes, meaning that the aggregated workload is using a high percentage of the Read I/O capability and most of the space is consumed, the MDG is operating in its economical domain, no further action is performed, and the process stops.
- If the result is No, meaning that the Read I/O capability is underutilized and most of the space is already consumed, then the MDG Read I/O capability is likely to stay underutilized. The VDisks in the MDG would be more economically hosted on an MDG of a lower tier. Therefore a down-tiering operation is recommended in step 1212. Next, on step 1214, the down-tiering is performed by selecting the VDisk(s) of lowest access density in the MDG, and down-tiering it to another MDG for which the VDisk is a good complementary workload. After this VDisk right-tiering operation, the source MDG will see its actual Read Access rate density value increasing and becoming closer to its intrinsic capability, making this MDG usage better balanced with regard to its utilization domain. The process then goes to step 1216.
Finally, on step 1216, the available MDG storage capacity is allocated to other workloads of complementary access density profile, and the process loops back to step 1200 to analyze the following MDG. When all MDGs are analyzed, the process waits until the next evaluation period and restarts at step 1200 for the first MDG of the list.
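For illustration, the decision part of the Figure 12 flow (steps 1200 to 1214) may be sketched as below; the 90% and 75% defaults mirror the thresholds mentioned above and remain administrator-settable, and the field names are assumptions for this sketch.

def evaluate_mdg(mdg, capacity_pct=90.0, read_pct=75.0):
    # Returns the re-tiering recommendation for one MDG, mirroring the
    # four outcomes of the flow chart of Figure 12.
    space_full = mdg["allocated_mb"] >= capacity_pct / 100.0 * mdg["total_mb"]
    read_busy = mdg["read_rate"] >= read_pct / 100.0 * mdg["read_capability"]
    if space_full and read_busy:
        return "no action: MDG operating in its economical domain"
    if not space_full and read_busy:
        return "up-tier the VDisk(s) of highest access density (steps 1206-1208)"
    if space_full and not read_busy:
        return "down-tier the VDisk(s) of lowest access density (steps 1212-1214)"
    return "no action: intermediate state"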
The analysis/alert method can be integrated in a repeatable storage management process as a regular monitoring task. For instance, every day, a system implementation of the method could produce a storage management dashboard reporting, for each MDG, actual values versus capability and capacity and the Write response time situation, with highlighted alerts when applicable. The dashboard would be accompanied by drill-in views providing the behaviors of the VDisks hosted by each MDG, this view being sorted by Read I/O Access rate density, and a list of right-tiering actions which might be evaluated by the storage administrator for passing to the virtualization engine.
Figure 13 shows a flow chart of the analysis/alert method taking care of the Write I/O quality of service aspects. In this figure, the Write I/O response time trigger is replaced by another Write I/O rate indicator. This indicator is based on the ratio between the Front End Write Cache Delay I/O rate and the total Write I/O rate value. Write Cache Delay I/O operations are Write I/O operations retained in the Write cache of the virtualization engine because the back end storage pool cannot accept them due to saturation. When the amount of Write Cache Delay I/O operations reaches a significant percentage of the total Write I/O activity, the front end application is likely to be slowed down and the response time increases. The usage of this indicator as a re-tiering alert is another embodiment of the present invention.
On step 1300, a test is performed to check if the Front End Write Cache Delay I/O rate has reached the threshold, where the threshold value is set up by the storage administrator according to local policy.
If the result is No, then the process goes to step 1320. If the result is Yes, then the VDisks causing the alert are traced to the application using these VDisks on step 1302. Next, on step 1303, values for the application batch elapsed time [A] and the batch elapsed time SLA target [T] are collected. This data is provided externally to the present invention, typically by application performance indicators under IT operation staff responsibility. Next, on step 1304, a new test checks whether the application SLA, typically a batch elapsed time target, is at risk by means of comparing the A and T values against a safety threshold level. If the result is No, meaning that A is significantly lower than T, then the observed high response time values are not important for the batch duration, no further action is performed on step 1306, and the process goes to step 1320.
If the result is Yes, meaning that A is close to T, then on step 1308 a trend analysis of the Write I/O response time and Write I/O rate values is performed, using, for instance, TPC graphics reporting as an embodiment.
The process continues with step 1310 where a new test is performed to check whether the total time the application waits for Write I/O operations is increasing or not (this total Write wait time is equal to the sum, over all sampling periods, of the product of the Write I/O response time and the Write I/O rate for all VDisks in alert):
- If the result is No, meaning that the total time the application waits for Write I/O operations during the batch processing does not increase over time, and therefore does not degrade the batch duration SLA, then no further action is performed on step 1312 and the process follows with step 1320.
- If the result is Yes, meaning that the total time the application waits for Write I/O operations during the batch processing is increasing and may put the batch duration at risk, the process goes to step 1314 where trend analysis results are used to extrapolate, for instance with linear modeling, future batch duration values.
The process continues with step 1316 to check whether the SLA target (T) is at risk or not in the near future. If the result is No, the process goes to step 1312; otherwise, if the result is Yes, the process goes to step 1318 to up-tier some (or all) of the VDisks creating the application SLA risk to an MDG with a higher I/O capability.
Finally, on step 1320, the available MDG storage capacity is allocated to other workloads of complementary access density profile, and the process loops back to step 1300 to analyze the following MDG. When all MDGs are analyzed, the process waits until the next evaluation period and restarts at step 1300 for the first MDG of the list.
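A simplified sketch of the Figure 13 flow is given below; the threshold, the safety margin and the crude monotonic-trend test stand in for the administrator-defined values and the trend extrapolation (for instance linear) described above, and all field names are assumptions for this sketch.

def write_qos_check(mdg, app, delay_threshold_pct, safety_margin=0.9):
    # mdg: front end Write Cache Delay I/O rate and total Write I/O rate.
    # app: batch elapsed time A, SLA target T, and the per-period series of
    # total Write wait time (Write response time x Write I/O rate summed
    # over the VDisks in alert).
    if not mdg["write_rate"]:
        return "no action"
    delay_pct = 100.0 * mdg["write_cache_delay_rate"] / mdg["write_rate"]
    if delay_pct < delay_threshold_pct:                          # step 1300
        return "no action"
    if app["elapsed_time"] < safety_margin * app["sla_target"]:  # step 1304
        return "no action: SLA not at risk"
    waits = app["write_wait_series"]                             # steps 1308-1310
    if not all(b >= a for a, b in zip(waits, waits[1:])):
        return "no action: Write wait time not increasing"
    # Steps 1314-1318: a real implementation would extrapolate the trend
    # before deciding; here the increasing trend directly triggers up-tiering.
    return "up-tier the VDisk(s) in alert to an MDG of higher Write I/O capability"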
The analysis/alert methods described in Figures 12 and 13 can also be used to characterize a new workload whose I/O profile is unknown. This workload may be hosted in a 'nursery' MDG so that its I/O behavior can be measured for a certain period (for instance one month) to collect sufficient behavioral data. After this period, the application VDisks could be right-tiered based on the space requirement, Read I/O requirement and Read I/O density values provided by the Data Analyzer component. This 'nursery' process may replace, at low cost, the sophisticated storage performance estimation work otherwise required before deciding which storage tier should be used and which MDG(s) would be best suited. Future changes in application behavior would then be handled by the regular monitoring task, ensuring alignment of the application needs to the storage infrastructure without intervention from costly storage engineers.
In an alternate embodiment, the analysis/alert method of the present invention may be used to relocate application data when a back end disk array connected to the virtualized storage infrastructure requires de-commissioning. In this situation the data available at the Data Analyzer component may be used to decide which storage tier should be used for each of the logical storage units and which discrete storage pool (e.g. MDG) is best suited for each one.
In yet another embodiment, the analysis/alert method of the present invention may be used to relocate application data when a disk array not connected to the virtualized storage infrastructure requires de-commissioning. In this situation the disk array might be connected to the virtualized storage infrastructure and undergo the nursery characterization process before relocating the virtual logical storage units to other discrete virtual storage pools. Alternatively, the process might consist of using existing performance data collected on the disk array and reinstalling the application on the virtualized storage infrastructure using the data provided by the Data Analyzer component.
It will be understood by those skilled in the art that, although the present invention has been described in relation to the preceding example embodiments, the invention is not limited thereto and there are many possible variations and modifications which fall within the scope of the invention.
The scope of the present disclosure includes any novel feature or combination of features disclosed herein. The applicant hereby gives notice that new claims may be formulated to such features or combinations of features during prosecution of this application or of any further applications derived therefrom. In particular, with reference to the appended claims, features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the claims.
For the avoidance of doubt, the term "comprising", as used herein throughout the description and claims, is not to be construed as meaning 'consisting only of'. It will be understood by those skilled in the art that, although the present invention has been described in relation to the preceding example embodiments using SAN Volume Controller vocabulary, the invention is not limited thereto and there are many possible wordings which can describe an MDG or a VDisk. For instance, an MDG may be referred to as a storage pool, virtual storage pool or discrete virtual storage pool, and a VDisk as a Virtual Storage Logical Unit.

Claims

1. A method for managing storage of data in a network comprising a plurality of host data processors coupled to a plurality of physical storage media through a storage virtualization engine, the storage virtualization engine comprising a mapping unit to map between Virtual Disk(s) (VDisk(s)) and Managed Disks (MDisks), wherein a plurality of Managed Disks of a same tier level are grouped to form discrete virtual storage pool(s) (MDG(s)), the method comprising: • storing metadata describing space capacity and quantifying Read I/O capability of each discrete virtual storage pool;
• periodically collecting from the virtualization engine information on storage usage, Read I/O and Write I/O activity of Virtual Disk(s);
• aggregating the collected information; • comparing the aggregated data to the metadata of each discrete virtual storage pool; and
• generating a list of re-tiering actions for Virtual Disk(s) according to the result of the comparison step based on threshold attainment.
2. The method of claim 1 wherein the Read and Write I/O information collected may be one of access rates, response times, back end and/or front end activity data, and/or queue levels.
3. The method of claim 1 or 2 wherein the collecting step further comprises the step of storing the collected information into a local repository at various time periods.
4. The method of any one of claims 1 to 3 wherein the aggregated data comprise minimum, maximum, average and shape factor values for VDisk(s).
5. The method of any one of claims 1 to 4 wherein the comparing step further comprises the step of checking if the allocated storage capacity is greater than a predefined capacity threshold value.
6. The method of claim 5 wherein the predefined capacity threshold value is set to 90% of the total capacity of the discrete virtual storage pool.
7. The method of claim 5 or 6 further comprising the step of checking if the actual Read I/O rate is greater than a predefined capability threshold value.
8. The method of claim 7 wherein the predefined capability threshold value is set to 75% of the Read I/O capability.
9. The method of any one of claims 1 to 8 wherein the comparing step further comprises the step of checking if the Write cache delay I/O rate is greater than a predefined percentage threshold of the actual Write I/O rate value.
10. The method of any one of claims 1 to 9 wherein the threshold values are set up by a storage administrator.
11. The method of any one of claims 1 to 10 wherein the step of generating a list of re-tiering actions further comprises the step of generating a storage pool dashboard comprising virtual storage pool capability, capacity, actual usage and alerts raised.
12. The method of any one of claims 1 to 11 wherein the step of generating a list of re-tiering actions further comprises the step of generating a drill-in view of VDisks sorted by relative Read I/O rate density.
13. A system for managing storage of data in a network comprising a plurality of host data processors coupled to a plurality of physical storage media through a storage virtualization engine, the storage virtualization engine comprising a mapping unit to map between Virtual Disk(s) (VDisk(s)) and Managed Disks (MDisks), a plurality of Managed Disks of a same tier level being grouped to form a discrete virtual storage pool (MDG), the system comprising means for implementing the steps of the method of any one of claims 1 to 12.
14. A computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 12 when said computer program is executed on a suitable computer device.
15. A computer readable medium having encoded thereon a computer program according to claim 14.
PCT/EP2010/050254 2009-03-02 2010-01-12 Method, system and computer program product for managing the placement of storage data in a multi tier virtualized storage infrastructure WO2010099992A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN2010800102363A CN102341779A (en) 2009-03-02 2010-01-12 Method, system and computer program product for managing the placement of storage data in a multi tier virtualized storage infrastructure
EP10700239A EP2404231A1 (en) 2009-03-02 2010-01-12 Method, system and computer program product for managing the placement of storage data in a multi tier virtualized storage infrastructure

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP09305191.0 2009-03-02
EP09305191 2009-03-02

Publications (1)

Publication Number Publication Date
WO2010099992A1 true WO2010099992A1 (en) 2010-09-10

Family

ID=41716214

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2010/050254 WO2010099992A1 (en) 2009-03-02 2010-01-12 Method, system and computer program product for managing the placement of storage data in a multi tier virtualized storage infrastructure

Country Status (3)

Country Link
EP (1) EP2404231A1 (en)
CN (1) CN102341779A (en)
WO (1) WO2010099992A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106325777A (en) * 2016-08-24 2017-01-11 浪潮(北京)电子信息产业有限公司 Logical unit management method and system
CN111210879B (en) * 2020-01-06 2021-03-26 中国海洋大学 Hierarchical storage optimization method for super-large-scale drug data
CN113448970B (en) * 2021-08-31 2022-07-12 深圳市一号互联科技有限公司 Graph data storage method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5345584A (en) 1991-03-11 1994-09-06 Laclead Enterprises System for managing data storage based on vector-summed size-frequency vectors for data sets, devices, and residual storage on devices
WO2007009910A2 (en) 2005-07-15 2007-01-25 International Business Machines Corporation Virtualisation engine and method, system, and computer program product for managing the storage of data
US20070079099A1 (en) * 2005-10-04 2007-04-05 Hitachi, Ltd. Data management method in storage pool and virtual volume in DKC
US20080147960A1 (en) * 2006-12-13 2008-06-19 Hitachi, Ltd. Storage apparatus and data management method using the same
US20080301763A1 (en) * 2007-05-29 2008-12-04 Hitachi, Ltd. System and method for monitoring computer system resource performance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1782287A2 (en) * 2004-07-21 2007-05-09 Beach Unlimited LLC Distributed storage architecture based on block map caching and vfs stackable file system modules

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8341350B2 (en) 2010-09-21 2012-12-25 Lsi Corporation Analyzing sub-LUN granularity for dynamic storage tiering
US8671263B2 (en) 2011-02-03 2014-03-11 Lsi Corporation Implementing optimal storage tier configurations for a workload in a dynamic storage tiering system
CN102520887A (en) * 2011-12-19 2012-06-27 中山爱科数字科技股份有限公司 Storage space configuration and management method applied to cloud computing
US9495396B2 (en) 2012-09-24 2016-11-15 International Business Machines Corporation Increased database performance via migration of data to faster storage
GB2506164A (en) * 2012-09-24 2014-03-26 Ibm Increased database performance via migration of data to faster storage
WO2016068976A1 (en) * 2014-10-31 2016-05-06 Hewlett Packard Enterprise Development Lp Storage array allocator
US10152411B2 (en) 2014-12-12 2018-12-11 Huawei Technologies Co., Ltd. Capability value-based stored data allocation method and apparatus, and storage system
US9880788B2 (en) 2014-12-19 2018-01-30 International Business Machines Corporation Modeling the effects of switching data storage resources through data storage pool tier performance capacity and demand gap analysis
US10216458B2 (en) 2014-12-19 2019-02-26 International Business Machines Corporation Modeling the effects of switching data storage resources through data storage pool tier performance capacity and demand gap analysis
CN105007330A (en) * 2015-08-04 2015-10-28 电子科技大学 Modeling method for storage resource scheduling model of distributed flow data storage system
CN105007330B (en) * 2015-08-04 2019-01-08 电子科技大学 The modeling method of the storage resource scheduling model of distributed stream data-storage system
US10698823B2 (en) 2018-04-27 2020-06-30 Nutanix, Inc. Method and apparatus for using cache size estimations for guiding hot-tier insertion decisions
US10915272B2 (en) 2018-05-16 2021-02-09 International Business Machines Corporation Data management in shared storage systems including movement of logical units of data and mapping of virtual devices to storage device groups, wherein the movement and the mapping are, both, based on policy specifying that backup data type cannot be shared with other data types
LU501202B1 (en) * 2022-01-04 2023-07-04 Microsoft Technology Licensing Llc Prioritized thin provisioning with eviction overflow between tiers
WO2023133037A1 (en) * 2022-01-04 2023-07-13 Microsoft Technology Licensing, Llc Prioritized thin provisioning with eviction overflow between tiers

Also Published As

Publication number Publication date
EP2404231A1 (en) 2012-01-11
CN102341779A (en) 2012-02-01

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 201080010236.3; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10700239; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2010700239; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)