CN103765370B - Computer system accessing an object storage system - Google Patents


Info

Publication number
CN103765370B
CN103765370B (application CN201280041414.8A)
Authority
CN
China
Prior art keywords
vvol
storage
computer system
virtual machine
logical storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201280041414.8A
Other languages
Chinese (zh)
Other versions
CN103765370A (en)
Inventor
S. B. Vaghani
I. Sokolinski
T. Aswathanarayana
S. Godbole
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
VMware LLC
Original Assignee
VMware LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/219,378 external-priority patent/US8650359B2/en
Application filed by VMware LLC filed Critical VMware LLC
Priority to CN201610515983.1A priority Critical patent/CN106168884B/en
Publication of CN103765370A publication Critical patent/CN103765370A/en
Application granted granted Critical
Publication of CN103765370B publication Critical patent/CN103765370B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

A storage system exports logical storage volumes that are provisioned as storage objects. These storage objects are accessed on demand by connected computer systems using standard protocols, such as SCSI and NFS, through logical endpoints for the protocol traffic that are configured in the storage system. Before issuing IO commands to a logical storage volume, a computer system sends a request to bind the logical storage volume to a protocol endpoint. In response, a first identifier for the protocol endpoint and a second identifier for the logical storage volume are returned. Different second identifiers may be generated for different logical storage volumes even when the same protocol endpoint is used; a single protocol endpoint can therefore serve as a gateway to multiple logical storage volumes.

Description

Computer system accessing an object storage system
Background
As computer systems scale to enterprise level, particularly in the context of supporting large-scale data centers, the underlying data storage systems frequently employ storage area networks (SANs) or network-attached storage (NAS). As is conventionally well understood, SAN and NAS provide a number of technical capabilities and operational benefits, fundamentally including virtualization of data storage devices, redundancy of physical devices with transparent fault-tolerant failover and failsafe controls, geographically distributed and replicated storage, and centralized oversight and storage configuration management decoupled from client-centric computer system management.
Architecturally, the storage devices in a SAN storage system (e.g., disk arrays, etc.) are typically connected to network switches (e.g., Fibre Channel switches, etc.), which in turn connect to servers or "hosts" that require access to the data in the storage devices. The servers, switches, and storage devices in a SAN typically communicate using the Small Computer System Interface (SCSI) protocol, which transfers data across the network at the level of disk blocks. In contrast, a NAS device is typically a device that internally contains one or more storage devices and that is connected to hosts (or intervening switches) through a network protocol such as Ethernet. In addition to containing storage devices, a NAS device also pre-formats its storage devices in accordance with a network file system, such as Network File System (NFS) or Common Internet File System (CIFS). Thus, in contrast to a SAN, which exposes disks (referred to as LUNs and described in further detail below) to the hosts, where the disks then need to be formatted and then mounted in accordance with a file system utilized by the hosts, the NAS device's network file system (which needs to be supported by the operating systems of the hosts) causes the NAS device to appear to the operating systems of the hosts as a file server, which the operating systems can then mount or map, for example, as a network drive accessible by the operating system. It should be recognized that, as storage system vendors continue to innovate and release new products, clear distinctions between SAN and NAS storage systems continue to fade, and actual storage system implementations often exhibit characteristics of both, offering both file-level protocols (NAS) and block-level protocols (SAN) in the same system. For example, in an alternative NAS architecture, a NAS "head" or "gateway" device is networked to the hosts rather than a traditional NAS device. Such a NAS gateway device does not itself contain storage drives, but enables external storage devices to be connected to it (e.g., via a Fibre Channel interface, etc.). Such a NAS gateway device, which is perceived by the hosts in a manner similar to a traditional NAS device, provides a capability to significantly increase the capacity of a NAS-based storage architecture (e.g., to storage capacity levels more traditionally supported by SANs) while retaining the simplicity of file-level storage access.
SCSI and other block-protocol-based storage devices, such as storage system 30 shown in Figure 1A, utilize a storage system manager 31, which represents one or more programmed storage processors, to aggregate the storage units or drives in the storage device and present them as one or more LUNs (logical unit numbers) 34, each with a uniquely identifiable number. LUNs 34 are accessed by one or more computer systems 10 through a physical host bus adapter (HBA) 11 over a network 20 (e.g., Fibre Channel, etc.). Within computer system 10 and above HBA 11, storage access abstractions are characteristically implemented through a series of software layers, beginning with a low-level device driver layer 12 and ending in an operating-system-specific file system layer 15. Device driver layer 12, which enables basic access to LUNs 34, is typically specific to the communication protocol used by the storage system (e.g., SCSI, etc.). A data access layer 13 may be implemented above device driver layer 12 to support multipath consolidation of the LUNs 34 visible through HBA 11 and other data access control and management functions. A logical volume manager 14, typically implemented between data access layer 13 and conventional operating system file system layer 15, supports volume-oriented virtualization and management of the LUNs 34 accessible through HBA 11. Multiple LUNs 34 can be gathered and managed together as a volume under the control of logical volume manager 14 for presentation to, and use as a logical device by, file system layer 15.
Storage system manager 31 implements a virtualization of physical, typically disk drive-based storage units, referred to in Figure 1A as spindles 32, that reside in storage system 30. From a logical perspective, each of these spindles 32 can be thought of as a sequential array of fixed-size extents. Storage system manager 31 abstracts away the complexity of targeting read and write operations to addresses of the actual spindles and extents of the disk drives by exposing to connected computer systems, such as computer system 10, a contiguous logical storage space divided into a set of virtual SCSI devices known as LUNs 34. Each LUN represents some capacity that is assigned for use by computer system 10 by virtue of the existence of such a LUN and its presentation to computer system 10. Storage system manager 31 maintains metadata that includes a mapping for each such LUN to an ordered list of extents, wherein each such extent can be identified as a spindle-extent pair <spindle #, extent #> and may therefore be located in any of the various spindles 32.
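The LUN-to-extent mapping just described can be illustrated with a short sketch. All names and the extent size here are invented for illustration; the patent describes the mapping conceptually, not any particular data structure. Each LUN maps to an ordered list of <spindle #, extent #> pairs, and a logical block offset into the LUN resolves to an offset inside one of those physical extents:

```python
# Sketch of the storage system manager's per-LUN metadata: each LUN is an
# ordered list of (spindle, extent) pairs, each extent EXTENT_SIZE blocks.
EXTENT_SIZE = 1024  # blocks per fixed-size extent (illustrative value)

lun_map = {
    34: [(0, 7), (0, 8), (3, 2)],  # LUN 34 -> three spindle-extent pairs
}

def resolve(lun_id, logical_block):
    """Translate a logical block offset in a LUN to (spindle, extent, offset)."""
    extents = lun_map[lun_id]
    index, offset = divmod(logical_block, EXTENT_SIZE)
    if index >= len(extents):
        raise ValueError("logical block beyond LUN capacity")
    spindle, extent = extents[index]
    return spindle, extent, offset

# The third extent of LUN 34 lives on spindle 3:
print(resolve(34, 2 * EXTENT_SIZE + 5))  # -> (3, 2, 5)
```

Because the list is ordered, the LUN presents a contiguous address space to the host even though its extents may be scattered across different spindles.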
Figure 1B is a block diagram of a conventional NAS or file-level storage system 40 that is connected to one or more computer systems 10 via network interface cards (NICs) 11' over a network 21 (e.g., Ethernet). Storage system 40 includes a storage system manager 41, which represents one or more programmed storage processors. Storage system manager 41 implements a file system 45 on top of physical, typically disk drive-based storage units, referred to in Figure 1B as spindles 42, that reside in storage system 40. From a logical perspective, each of these spindles can be thought of as a sequential array of fixed-size extents 43. File system 45 abstracts away the complexity of targeting read and write operations to addresses of the actual spindles and extents of the disk drives by exposing to connected computer systems, such as computer system 10, a namespace comprising directories and files, which may be organized into file-system-level volumes 44 (hereinafter referred to as "FS volumes") that are accessed through their respective mount points.
Even with the advancements in storage systems described above, it has been widely recognized that they are not sufficiently scalable to meet the particular needs of virtualized computer systems. For example, a cluster of server machines may service as many as 10,000 virtual machines (VMs), each VM using a multiple number of "virtual disks" and a multiple number of "snapshots", each of which may be stored, for example, as a file on a particular LUN or FS volume. Even at a scaled-down estimate of 2 virtual disks and 2 snapshots per VM, this amounts to 60,000 distinct disks for the storage system to support if VMs were directly connected to physical disks (i.e., 1 virtual disk or snapshot per physical disk). In addition, storage device and topology management at this scale is known to be difficult. As a result, the concept of datastores was developed, in which VMs are multiplexed onto a smaller set of physical storage entities (e.g., LUN-based VMFS clustered file systems or FS volumes), such as described in U.S. Patent 7,849,098, entitled "Providing Multiple Concurrent Access to a File System", incorporated herein by reference.
In conventional storage systems employing LUNs or FS volumes, workloads from multiple VMs are typically serviced by a single LUN or a single FS volume. As a result, resource demands from one VM workload will affect the service levels provided to another VM workload on the same LUN or FS volume. Efficiency measures for storage, such as latency and input/output operations per second (IO or IOPS), thus vary depending on the number of workloads in a given LUN or FS volume and cannot be guaranteed. Consequently, storage policies for storage systems employing LUNs or FS volumes cannot be executed on a per-VM basis, and service level agreement (SLA) guarantees cannot be given on a per-VM basis. In addition, data services provided by storage system vendors, such as snapshot, replication, encryption, and deduplication, are provided at the granularity of the LUNs or FS volumes, not at the granularity of a VM's virtual disk. As a result, snapshots can be created for the entire LUN or the entire FS volume using the data services provided by storage system vendors, but a snapshot for a single virtual disk of a VM cannot be created separately from the LUN or the file system in which the virtual disk is stored.
Summary of the invention
One or more embodiments are directed to a storage system that is configured to isolate workloads running therein, so that SLA guarantees can be provided per workload and data services of the storage system can be provided per workload, without requiring a radical redesign of the storage system. In a storage system that stores virtual disks for multiple virtual machines, SLA guarantees can be provided on a per-virtual-disk basis and data services of the storage system can be provided on a per-virtual-disk basis.
According to one embodiment of the present invention, the storage system exports logical storage volumes, referred to herein as "virtual volumes", that are provisioned as storage objects on a per-workload basis out of logical storage capacity assignments, referred to herein as "storage containers". For a VM, a virtual volume may be created for each of the virtual disks and snapshots of the VM. In one embodiment, the virtual volumes are accessed on demand by connected computer systems using standard protocols, such as SCSI and NFS, through logical endpoints for the protocol traffic, known as "protocol endpoints", that are configured in the storage system.
A method for binding a logical storage volume created in a storage system to a protocol endpoint configured in the storage system, for use by an application running in a computer system, according to an embodiment of the invention, includes the steps of: issuing a request to bind the logical storage volume to the storage system via a non-IO path; and storing a first identifier and a second identifier received in response to the request, wherein the first identifier and the second identifier are encoded into IOs that are issued to the logical storage volume via an IO path. The first identifier identifies the protocol endpoint and the second identifier identifies the logical storage volume.
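The bind flow above can be sketched as follows. All class and function names are invented for illustration; the patent specifies the request/response exchange, not an API. The computer system sends the bind request over the out-of-band (non-IO) path and records the returned identifier pair for use when encoding subsequent IOs:

```python
# Minimal sketch of the bind flow: the storage system establishes an IO
# session and returns a (first, second) identifier pair to the requester.

class StorageSystem:
    def __init__(self):
        self._next_sllid = 1
        self.connections = {}  # (pe_id, sllid) -> vvol_id

    def bind(self, vvol_id, pe_id):
        """Establish an IO session; return the identifier pair."""
        sllid = self._next_sllid           # second-level identifier (SLLID)
        self._next_sllid += 1
        self.connections[(pe_id, sllid)] = vvol_id
        return pe_id, sllid                # first id = PE, second id = vvol

storage = StorageSystem()
first_id, second_id = storage.bind(vvol_id="vvol-151", pe_id="PE-161")
# The host now encodes (first_id, second_id) into every IO it issues
# to vvol-151 via the in-band path.
```

Note that the SLLID is generated per binding, so the same protocol endpoint can hand out distinct second identifiers for different virtual volumes, which is what lets one PE act as a gateway to many volumes.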
A method for issuing input-output commands (IOs) to a logical storage volume, according to an embodiment of the invention, includes the steps of: receiving a read/write request for a file from an application; generating a block-level IO corresponding to the read/write request; translating a block device name included in the block-level IO into a first identifier and a second identifier; and issuing an IO to the protocol endpoint identified by the first identifier, the IO including the second identifier for identifying the logical storage volume.
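The translation step above can be sketched with a small table lookup (structures and names are hypothetical): the host keeps a mapping from block device names to the (PE ID, SLLID) pair it received at bind time, and stamps both identifiers into each block-level IO before issuing it:

```python
# Sketch of IO issuance: translate a block device name into the identifier
# pair obtained at bind time, then build the IO addressed to the PE.

bindings = {"vvol-disk0": ("PE-161", 7)}   # populated by the bind process

def issue_io(block_device, op, offset, length):
    pe_id, sllid = bindings[block_device]  # device name -> (first, second) ids
    io = {"target": pe_id,                 # IO is sent to the protocol endpoint
          "sllid": sllid,                  # second id selects the vvol behind it
          "op": op, "offset": offset, "length": length}
    return io

io = issue_io("vvol-disk0", "write", offset=4096, length=512)
# io["target"] == "PE-161"; io["sllid"] == 7
```

The design point this illustrates: the IO path only ever addresses protocol endpoints, while the second identifier carried inside the IO disambiguates which logical storage volume behind that endpoint is the actual target.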
A computer system according to an embodiment of the invention includes a plurality of virtual machines running therein, each of the virtual machines having a virtual disk that is managed as a separate logical storage volume in a storage system. The computer system further includes: a hardware storage interface configured to issue IOs to the storage system; and a virtualization software module configured to receive read/write requests for files on the virtual disks from the virtual machines and to generate, in accordance with the read/write requests, IOs each having a protocol endpoint identifier and a secondary-level identifier.
Embodiments of the invention also include a non-transitory computer-readable storage medium storing instructions that, when executed by a computer system, cause the computer system to perform one of the methods set forth above.
Brief Description of the Drawings
Figure 1A is a block diagram of a conventional block-protocol-based storage device that is connected to one or more computer systems over a network.
Figure 1B is a block diagram of a conventional NAS device that is connected to one or more computer systems over a network.
Figure 2A is a block diagram of a block-protocol-based storage system cluster that implements virtual volumes according to an embodiment of the invention.
Figure 2B is a block diagram of a NAS-based storage system cluster that implements virtual volumes according to an embodiment of the invention.
Figure 3 is a block diagram of components of the storage system cluster of Figure 2A or 2B for managing virtual volumes according to an embodiment of the invention.
Figure 4 is a flow diagram of method steps for creating a storage container.
Figure 5A is a block diagram of an embodiment of a computer system configured to implement virtual volumes hosted on a SAN-based storage system.
Figure 5B is a block diagram of the computer system of Figure 5A configured for virtual volumes hosted on a NAS-based storage system.
Figure 5C is a block diagram of another embodiment of a computer system configured to implement virtual volumes hosted on a SAN-based storage system.
Figure 5D is a block diagram of the computer system of Figure 5C configured for virtual volumes hosted on a NAS-based storage system.
Figure 6 is a simplified block diagram of a computer environment illustrating components and communication paths used to manage virtual volumes according to an embodiment of the invention.
Figure 7 is a flow diagram of method steps for authenticating a computer system to the storage system cluster of Figure 2A or 2B.
Figure 8 is a flow diagram of method steps for creating a virtual volume, according to one embodiment.
Figure 9A is a flow diagram of method steps for discovering protocol endpoints that are available to a computer system.
Figure 9B is a flow diagram of method steps for the storage system to discover protocol endpoints to which a computer system is connected via an in-band path.
Figure 10 is a flow diagram of method steps for issuing and executing a virtual volume bind request, according to one embodiment.
Figures 11A and 11B are flow diagrams of method steps for issuing an IO to a virtual volume, according to one embodiment.
Figure 12 is a flow diagram of method steps for performing an IO at a storage system, according to one embodiment.
Figure 13 is a flow diagram of method steps for issuing and executing a virtual volume rebind request, according to one embodiment.
Figure 14 is a conceptual diagram of a lifecycle of a virtual volume.
Figure 15 is a flow diagram of method steps for provisioning a VM, according to an embodiment using the storage system of Figure 2A.
Figure 16A is a flow diagram of method steps for powering on a VM.
Figure 16B is a flow diagram of method steps for powering off a VM.
Figure 17 is a flow diagram of method steps for extending the size of a vvol of a VM.
Figure 18 is a flow diagram of method steps for moving a vvol of a VM between storage containers.
Figure 19 is a flow diagram of method steps for cloning a VM from a template VM.
Figure 20 is a flow diagram of method steps for provisioning a VM, according to another embodiment.
Figure 21 illustrates sample storage capability profiles and a method for creating a storage container that includes a profile selection step.
Figure 22 is a flow diagram illustrating method steps for creating a vvol and defining a storage capability profile for the vvol.
Figure 23 is a flow diagram illustrating method steps for creating snapshots.
Detailed description of the invention
Figures 2A and 2B are block diagrams of a storage system cluster that implements "virtual volumes" according to embodiments of the invention. The storage system cluster includes one or more storage systems, e.g., storage systems 130₁ and 130₂, which may be disk arrays, each having a plurality of data storage units (DSUs), one of which is labeled 141 in the figures, and storage system managers 131 and 132 that control various operations of storage systems 130 to enable embodiments of the invention described herein. In one embodiment, two or more storage systems 130 may implement a distributed storage system manager 135 that controls the operations of the storage system cluster as if they were a single logical storage system. The operational domain of distributed storage system manager 135 may span storage systems installed in the same data center or across multiple data centers. For example, in one such embodiment, distributed storage system manager 135 may comprise storage system manager 131, which serves as a "master" manager when communicating with storage system manager 132, which serves as a "slave" manager, although it should be recognized that a variety of alternative methods for implementing a distributed storage system manager may be employed. DSUs represent physical storage units, e.g., disk- or flash-based storage units such as rotating disks or solid-state disks. According to embodiments, the storage system cluster creates and exposes "virtual volumes" (vvols), as further detailed herein, to connected computer systems, such as computer systems 100₁ and 100₂. Applications (e.g., VMs accessing their virtual disks, etc.) running in computer systems 100 access the vvols on demand using standard protocols, such as SCSI in the embodiment of Figure 2A and NFS in the embodiment of Figure 2B, through logical endpoints for the SCSI or NFS protocol traffic, known as "protocol endpoints" (PEs), that are configured in storage systems 130. The communication path for application-related data operations from computer systems 100 to storage systems 130 is referred to herein as an "in-band" path. Communication paths between the host bus adapters (HBAs) of computer systems 100 and the PEs configured in storage systems 130, and between the network interface cards (NICs) of computer systems 100 and the PEs configured in storage systems 130, are examples of in-band paths. Communication paths from computer systems 100 to storage systems 130 that are not in-band and that are typically used to carry out management operations are referred to herein as "out-of-band" paths. Examples of out-of-band paths, such as an Ethernet network connection between computer systems 100 and storage systems 130, are illustrated in Figure 6 separately from the in-band paths. For simplicity, computer systems 100 are shown as directly connected to storage systems 130. However, it should be understood that they may be connected to storage systems 130 through multiple paths and one or more switches.
Distributed storage system manager 135, or a single storage system manager 131 or 132, may create vvols (e.g., upon request of a computer system 100, etc.) from logical "storage containers", which represent a logical aggregation of physical DSUs. In general, a storage container may span more than one storage system, and many storage containers may be created by a single storage system manager or a distributed storage system manager. Similarly, a single storage system may contain many storage containers. In Figures 2A and 2B, storage container 142_A, created by distributed storage system manager 135, is shown as spanning storage system 130₁ and storage system 130₂, whereas storage containers 142_B and 142_C are shown as being contained entirely within a single storage system (i.e., storage systems 130₁ and 130₂, respectively). It should be recognized that, because a storage container can span more than one storage system, a storage system administrator can provision to its customers a storage capacity that exceeds the capacity of any one storage system. It should be further recognized that, because multiple storage containers can be created within a single storage system, the administrator can also provision storage to multiple customers using a single storage system.
In the embodiment of Figure 2A, each vvol is provisioned from a block-based storage system. In the embodiment of Figure 2B, a NAS-based storage system implements a file system 145 on top of DSUs 141, and each vvol is exposed to computer systems 100 as a file object within this file system. In addition, as will be described in further detail below, applications running on computer systems 100 access vvols for IO through PEs. For example, as illustrated by the dashed lines in Figures 2A and 2B, vvol 151 and vvol 152 are accessible via PE 161; vvol 153 and vvol 155 are accessible via PE 162; vvol 154 is accessible via PE 163 and PE 164; and vvol 156 is accessible via PE 165. It should be recognized that vvols from multiple storage containers, such as vvol 153 in storage container 142_A and vvol 155 in storage container 142_C, may be accessible via a single PE, such as PE 162, at any given time. It should further be recognized that PEs, such as PE 166, may exist even in the absence of any vvols that are accessible via them.
In the embodiment of Figure 2A, storage systems 130 implement PEs as a special type of LUN using known methods for setting up LUNs. As with LUNs, a storage system 130 provides each PE a unique identifier known as a WWN (World Wide Name). In one embodiment, when creating the PEs, storage system 130 does not specify a size for the special LUN because the PEs described herein are not actual data containers. In one such embodiment, storage system 130 may assign a zero value or a very small value as the size of a PE-related LUN such that administrators can quickly identify PEs when requesting that the storage system provide a list of LUNs (e.g., traditional data LUNs and PE-related LUNs), as further discussed below. Similarly, storage system 130 may assign a LUN number greater than 255 as the identifying number for the PEs to indicate, in a human-friendly way, that they are not data LUNs. As another way to distinguish between PEs and LUNs, a PE bit may be added to the Extended Inquiry Data VPD page (page 86h). The PE bit is set to 1 when the LUN is a PE, and to 0 when it is a regular data LUN. Computer systems 100 may discover the PEs via the in-band path by issuing a SCSI command REPORT_LUNS and may determine whether they are PEs according to embodiments described herein or conventional data LUNs by examining the indicated PE bit. Computer systems 100 may optionally inspect the LUN size and LUN number properties to further confirm whether the LUN is a PE or a conventional LUN. It should be recognized that any of the techniques described above may be used to distinguish a PE-related LUN from a regular data LUN. In one embodiment, the PE bit technique is the only technique used to distinguish a PE-related LUN from a regular data LUN.
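The PE discovery logic above can be sketched as follows. The field names are illustrative stand-ins: real SCSI code would parse raw REPORT_LUNS and VPD page 86h bytes rather than dictionaries. The PE bit is the authoritative test in the embodiment described, with LUN size and LUN number serving only as optional corroboration:

```python
# Sketch of host-side classification of LUNs returned by REPORT_LUNS.

def is_protocol_endpoint(lun):
    """Authoritative test: PE bit set in the Extended Inquiry VPD page (86h)."""
    return lun["pe_bit"] == 1

def looks_like_pe(lun):
    """Optional corroboration: zero (or tiny) size and LUN number above 255."""
    return lun["size_blocks"] == 0 and lun["lun_number"] > 255

reported = [
    {"lun_number": 34,  "size_blocks": 2 ** 31, "pe_bit": 0},  # data LUN
    {"lun_number": 257, "size_blocks": 0,       "pe_bit": 1},  # PE
]
pes = [lun["lun_number"] for lun in reported if is_protocol_endpoint(lun)]
print(pes)  # -> [257]
```

Keeping the corroborating checks separate mirrors the text: an implementation may rely on the PE bit alone, or additionally sanity-check the size and number conventions.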
In the embodiment of Figure 2B, the PEs are created in storage systems 130 using known methods for setting up mount points to FS volumes. Each PE created in the embodiment of Figure 2B is identified uniquely by an IP address and a file system path, also conventionally referred to together as a "mount point". However, unlike conventional mount points, the PEs are not associated with FS volumes. In addition, unlike the PEs of Figure 2A, the PEs of Figure 2B are not discoverable by computer systems 100 via the in-band path unless a virtual volume is bound to a given PE. Therefore, the PEs of Figure 2B are reported by the storage system via the out-of-band path.
Figure 3 is a block diagram of components of the storage system cluster of Figure 2A or 2B for managing virtual volumes according to an embodiment. The components include software modules of storage system managers 131 and 132 executing in storage systems 130 in one embodiment or, in another embodiment, software modules of distributed storage system manager 135, namely an input/output (I/O) manager 304, a volume manager 306, a container manager 308, and a data access layer 310. In the descriptions of the embodiments herein, it should be understood that any actions taken by distributed storage system manager 135 may instead be taken by storage system manager 131 or storage system manager 132, depending on the embodiment.
In the example of Figure 3, distributed storage system manager 135 has created three storage containers, SC1, SC2, and SC3, from DSUs 141, each of which is shown to have spindle extents labeled P1 through Pn. In general, each storage container has a fixed physical size and is associated with specific extents of the DSUs. In the example shown in Figure 3, distributed storage system manager 135 has access to a container database 316 that stores, for each storage container, its container ID, physical layout information, and some metadata. Container database 316 is managed and updated by container manager 308, which in one embodiment is a component of distributed storage system manager 135. The container ID is a universally unique identifier given to the storage container when the storage container is created. The physical layout information consists of the spindle extents of DSUs 141 that are associated with the given storage container, stored as an ordered list of <system ID, DSU ID, extent number>. The metadata section may contain some common metadata and some storage-system-vendor-specific metadata. For example, the metadata section may contain the IDs of computer systems or applications or users that are permitted to access the storage container. As another example, the metadata section contains an allocation bitmap denoting which <system ID, DSU ID, extent number> extents of the storage container have been allocated to existing vvols and which extents are free. In one embodiment, a storage system administrator may create separate storage containers for different business units so that vvols of different business units are not provisioned from the same storage container. Other policies for segregating vvols may be applied. For example, a storage system administrator may adopt a policy that vvols of different customers of a cloud service are to be provisioned from different storage containers. Also, vvols may be grouped and provisioned from storage containers according to their required service levels. In addition, a storage system administrator may create, delete, and otherwise manage storage containers, such as by defining the number of storage containers that can be created and by setting the maximum physical size per storage container.
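The container database 316 described above can be sketched with an illustrative record layout (all structures invented here): a per-container record holding the physical layout as an ordered list of <system ID, DSU ID, extent number> triples, plus metadata such as an access list and an allocation bitmap over those extents:

```python
# Sketch of a container database record and a free-extent query over its
# allocation bitmap, as described for container manager 308.

container_db = {
    "SC1": {
        "layout": [(1, "DSU-141", 0), (1, "DSU-141", 1), (2, "DSU-143", 0)],
        "metadata": {
            "allowed": {"host-1001", "host-1002"},   # permitted accessors
            "allocated": [True, False, False],       # bitmap over layout
        },
    },
}

def free_extents(container_id):
    """Return the <system, DSU, extent> triples not yet allocated to vvols."""
    rec = container_db[container_id]
    return [ext for ext, used in zip(rec["layout"],
                                     rec["metadata"]["allocated"])
            if not used]

print(free_extents("SC1"))  # -> [(1, 'DSU-141', 1), (2, 'DSU-143', 0)]
```

The layout's system-ID field is what allows a single container to span more than one storage system, as in storage container 142_A of Figure 2A.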
Also in the example of Fig. 3, distributed storage system manager 135 has provisioned (on behalf of requesting computer systems 100) multiple vvols, each from a different storage container. In general, a vvol may have a fixed physical size or may be thinly provisioned, and each vvol has a vvol ID, which is a universally unique identifier given to the vvol when the vvol is created. For each vvol, a vvol database 314 stores its vvol ID, the container ID of the storage container in which the vvol was created, and an ordered list of <offset, length> values within that storage container that comprise the address space of the vvol. Vvol database 314 is managed and updated by a volume manager 306, which in one embodiment is a component of distributed storage system manager 135. In one embodiment, vvol database 314 also stores a small amount of metadata about the vvol. This metadata is stored in vvol database 314 as a set of key-value pairs, and may be updated and queried by computer systems 100 via the out-of-band path at any time during the vvol's existence. The stored key-value pairs fall into three categories. The first category: well-known keys — the definitions of certain keys (and hence the interpretations of their values) are publicly available. One example is a key that corresponds to the virtual volume type (e.g., in virtual machine embodiments, whether the vvol contains a VM's metadata or a VM's data). Another example is the App ID, which is the ID of the application that stored data in the vvol. The second category: computer-system-specific keys — the computer system or its management module stores certain keys and values as the virtual volume's metadata. The third category: storage-system-vendor-specific keys — these allow the storage system vendor to associate certain keys with the virtual volume's metadata. One reason a storage system vendor uses this key-value store for its metadata is that all of these keys are readily available to storage system vendor plug-ins and other extensions via the out-of-band channel for vvols. The store operations for key-value pairs are part of virtual volume creation and other processes, and so the store operation should be reasonably fast. Storage systems are also configured to enable searches of virtual volumes based on exact matches to values provided on specific keys.
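A vvol database entry and the exact-match key search mentioned above can be sketched as follows. The names (`VvolEntry`, `search_by_key`) and the linear scan are illustrative assumptions; a real storage system would presumably index the key-value store, but the shape of the row — vvol ID, container ID, ordered <offset, length> segments, and key-value metadata — follows the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class VvolEntry:
    """One row of vvol database 314 (illustrative field names)."""
    vvol_id: str
    container_id: str
    segments: List[Tuple[int, int]]   # ordered <offset, length> pairs in the container
    metadata: Dict[str, str] = field(default_factory=dict)  # key-value pairs

def search_by_key(db: Dict[str, VvolEntry], key: str, value: str) -> List[str]:
    # exact-match search on a specific key, as the storage system supports
    return [v.vvol_id for v in db.values() if v.metadata.get(key) == value]

db = {
    "vvol1":  VvolEntry("vvol1", "SC1", [(0, 1024)],
                        {"AppID": "mail-db", "type": "VM data"}),
    "vvol12": VvolEntry("vvol12", "SC2", [(0, 512), (2048, 512)],
                        {"AppID": "archive"}),
}
assert search_by_key(db, "AppID", "archive") == ["vvol12"]
```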
An IO manager 304 is a software module (also, in certain embodiments, a component of distributed storage system manager 135) that maintains a connection database 312, which stores the currently valid IO connection paths between PEs and vvols. In the example shown in Fig. 3, seven currently valid IO sessions are shown. Each valid session has an associated PE ID, secondary level identifier (SLLID), vvol ID, and reference count (RefCnt) indicating the number of different applications that are performing IO through this IO session. The process of establishing a valid IO session between a PE and a vvol by distributed storage system manager 135 (e.g., on request by a computer system 100) is referred to herein as a "bind" process. For each bind, distributed storage system manager 135 (e.g., via IO manager 304) adds an entry to connection database 312. The process of subsequently tearing down the IO session by distributed storage system manager 135 is referred to herein as an "unbind" process. For each unbind, distributed storage system manager 135 (e.g., via IO manager 304) decrements the reference count of the IO session by one. When the reference count of an IO session is at zero, distributed storage system manager 135 (e.g., via IO manager 304) may delete the entry for that IO connection path from connection database 312. As previously discussed, in one embodiment, computer systems 100 generate and transmit bind and unbind requests via the out-of-band path to distributed storage system manager 135. Alternatively, computer systems 100 may generate and transmit unbind requests via an in-band path by overloading existing error paths. In one embodiment, the generation number is changed to a monotonically increasing number or a randomly generated number when the reference count changes from 0 to 1 or vice versa. In another embodiment, the generation number is a randomly generated number, the RefCnt column is eliminated from connection database 312, and for each bind, even when the bind request is to a vvol that is already bound, distributed storage system manager 135 (e.g., via IO manager 304) adds an entry to connection database 312.
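The bind/unbind lifecycle with reference counting can be sketched as a minimal in-memory connection database. This is an illustrative sketch under stated assumptions — the class name, the dictionary layout, and the sequential SLLID generator are inventions for the example — but the behavior follows the text: a bind adds or re-uses a session and bumps RefCnt, an unbind decrements it, and the entry may be deleted when RefCnt reaches zero.

```python
import itertools

class ConnectionDatabase:
    """Sketch of connection database 312: valid IO sessions between PEs and vvols."""
    def __init__(self):
        self.sessions = {}                  # (pe_id, vvol_id) -> {"sllid", "refcnt"}
        self._gen = itertools.count(1)      # source of unique SLLIDs per database

    def bind(self, pe_id, vvol_id):
        key = (pe_id, vvol_id)
        if key not in self.sessions:
            # SLLID: any number unique among SLLIDs associated with this PE
            self.sessions[key] = {"sllid": f"S{next(self._gen):04d}", "refcnt": 0}
        self.sessions[key]["refcnt"] += 1   # another application shares the session
        return pe_id, self.sessions[key]["sllid"]

    def unbind(self, pe_id, vvol_id):
        key = (pe_id, vvol_id)
        self.sessions[key]["refcnt"] -= 1
        if self.sessions[key]["refcnt"] == 0:
            del self.sessions[key]          # tear down the IO connection path

cdb = ConnectionDatabase()
pe, sllid = cdb.bind("PE_A", "vvol1")
cdb.bind("PE_A", "vvol1")                   # second binding to the same vvol
assert cdb.sessions[("PE_A", "vvol1")]["refcnt"] == 2
cdb.unbind("PE_A", "vvol1")
cdb.unbind("PE_A", "vvol1")
assert ("PE_A", "vvol1") not in cdb.sessions
```

Note how a single PE can serve many vvols at once: two binds to different vvols through the same PE simply yield two sessions with distinct SLLIDs.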
In the storage system cluster of Fig. 2A, IO manager 304 processes IO requests (IOs) from computer systems 100 received through the PEs, using connection database 312. When an IO is received at one of the PEs, IO manager 304 parses the IO to identify the PE ID and the SLLID contained in the IO, in order to determine the vvol for which the IO is intended. By accessing connection database 312, IO manager 304 is then able to retrieve the vvol ID associated with the parsed PE ID and SLLID. In Fig. 3 and subsequent figures, PE IDs are shown as PE_A, PE_B, etc. for simplicity. In one embodiment, the actual PE IDs are the WWNs of the PEs. In addition, SLLIDs are shown as S0001, S0002, etc. The actual SLLIDs are generated by distributed storage system manager 135 as any unique number among the SLLIDs associated with a given PE ID in connection database 312. The mapping between the logical address space of the virtual volume having the vvol ID and the physical locations of DSUs 141 is carried out by volume manager 306 using vvol database 314 and by container manager 308 using container database 316. Once the physical locations of DSUs 141 have been obtained, a data access layer 310 (in one embodiment, also a component of distributed storage system manager 135) performs the IO on these physical locations.
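The two-step resolution just described — (PE ID, SLLID) to vvol ID via the connection database, then logical offset to a physical location via the vvol's extent list — can be sketched as follows. The function name, the plain-dict databases, and the extent naming are hypothetical; the walk over ordered <extent, length> pairs is the part that mirrors the text.

```python
def route_io(connection_db, vvol_db, pe_id, sllid, logical_offset):
    """Sketch: resolve an incoming IO's (PE ID, SLLID) to a vvol, then map the
    vvol logical offset to a physical (extent, offset-within-extent) location."""
    vvol_id = connection_db[(pe_id, sllid)]             # IO manager 304 lookup
    remaining = logical_offset
    for extent, length in vvol_db[vvol_id]["extents"]:  # volume/container managers
        if remaining < length:
            return extent, remaining
        remaining -= length
    raise ValueError("offset beyond the vvol's address space")

conn = {("PE_A", "S0001"): "vvol1"}
vvols = {"vvol1": {"extents": [("ext0", 100), ("ext1", 100)]}}
assert route_io(conn, vvols, "PE_A", "S0001", 150) == ("ext1", 50)
```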
In the storage system cluster of Fig. 2B, IOs are received through the PEs, and each such IO includes an NFS handle (or similar file system handle) to which the IO has been issued. In one embodiment, connection database 312 for such a system contains the IP address of the NFS interface of the storage system as the PE ID and the file system path as the SLLID. The SLLIDs are generated based on the location of the vvol in the file system 145. The mapping between the logical address space of the vvol and the physical locations of DSUs 141 is carried out by volume manager 306 using vvol database 314 and by container manager 308 using container database 316. Once the physical locations of DSUs 141 have been obtained, the data access layer performs the IO on these physical locations. It should be noted that, for a storage system of Fig. 2B, container database 316 may contain an ordered list of file:<offset, length> entries in the container locations entry for a given vvol (i.e., a vvol may comprise multiple file segments that are stored in the file system 145).
In one embodiment, connection database 312 is maintained in volatile memory, while vvol database 314 and container database 316 are maintained in persistent storage, such as DSUs 141. In other embodiments, all of the databases 312, 314, 316 may be maintained in persistent storage.
Fig. 4 is a flow diagram of method steps 410 for creating a storage container. In one embodiment, these steps are carried out by storage system manager 131, storage system manager 132, or distributed storage system manager 135 under the control of a storage administrator. As noted above, a storage container represents a logical aggregation of physical DSUs and may span physical DSUs from more than one storage system. At step 411, the storage administrator (via distributed storage system manager 135, etc.) sets the physical capacity of the storage container. Within a cloud or data center, this physical capacity may, for example, represent the amount of physical storage that is leased by a customer. The flexibility provided by the storage containers disclosed herein is that storage containers for different customers can be provisioned by a storage administrator from the same storage system, and a storage container for a single customer can be provisioned from multiple storage systems, e.g., in cases where the physical capacity of any one storage device is not sufficient to meet the customer's requested size, or in cases such as replication where the physical storage footprint of a vvol will naturally span multiple storage systems. At step 412, the storage administrator sets permission levels for accessing the storage container. In a multi-tenant data center, for example, a customer may only access the storage container that has been leased to him or her. At step 413, distributed storage system manager 135 generates a unique identifier for the storage container. Then, at step 414, distributed storage system manager 135 (e.g., via container manager 308 in one embodiment) allocates free spindle extents of DSUs 141 to the storage container in sufficient quantity. As noted above, in cases where the free space of any one storage system is not sufficient to meet the physical capacity, distributed storage system manager 135 may allocate spindle extents of DSUs 141 from multiple storage systems. After the extents have been allocated, distributed storage system manager 135 (e.g., via container manager 308) updates container database 316 with the unique container ID, an ordered list of <system number, DSU ID, extent number>, and the context IDs of the computer systems that are permitted to access the storage container.
According to embodiments described herein, storage capability profiles, e.g., SLAs or quality of service (QoS), may be configured by distributed storage system manager 135 (e.g., on behalf of requesting computer systems 100) on a per-vvol basis. Therefore, it is possible for vvols with different storage capability profiles to be part of the same storage container. In one embodiment, the system administrator defines a default storage capability profile (or a number of possible storage capability profiles) for newly created vvols at the time a storage container is created, and this profile (or these profiles) is stored in the metadata section of container database 316. If a storage capability profile is not explicitly specified for a new vvol created inside a storage container, the new vvol inherits the default storage capability profile associated with the storage container.
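The inheritance rule above is simple enough to state as a few lines of code. The function name and the example profile keys (`min_iops`, `avg_latency_ms`) are hypothetical; the rule itself — container default unless explicitly overridden per vvol — is from the text.

```python
def effective_profile(container_default, explicit=None):
    """A vvol created without an explicit storage capability profile inherits
    the storage container's default; explicitly specified settings win."""
    profile = dict(container_default)
    if explicit:
        profile.update(explicit)
    return profile

default = {"min_iops": 100, "avg_latency_ms": 10}
assert effective_profile(default) == default                        # inherited
assert effective_profile(default, {"min_iops": 500})["min_iops"] == 500
assert effective_profile(default, {"min_iops": 500})["avg_latency_ms"] == 10
```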
Fig. 5A is a block diagram of an embodiment of a computer system configured to implement virtual volumes hosted on the storage system cluster of Fig. 2A. Computer system 101 may be constructed on a conventional, typically server-class, hardware platform 500 that includes one or more central processing units (CPUs) 501, memory 502, one or more network interface cards (NICs) 503, and one or more host bus adapters (HBAs) 504. HBAs 504 enable computer system 101 to issue IOs to virtual volumes through PEs configured in storage devices 130. As further shown in Fig. 5A, an operating system 508 is installed on top of hardware platform 500, and a number of applications 512-1 through 512-N are executed on top of operating system 508. Examples of operating system 508 include any of the well-known commodity operating systems, such as Microsoft Windows, Linux, and the like.
According to embodiments described herein, each application 512 has one or more vvols associated with it and issues IOs to the block device instances of those vvols, which are created by operating system 508 pursuant to "create device" calls issued by the application 512 into operating system 508. The association between block device names and vvol IDs is maintained in a block device database 533. IOs from applications 512-2 through 512-N are received by a file system driver 510, which converts them into block IOs and provides the block IOs to a virtual volume device driver 532. IOs from application 512-1, on the other hand, are shown to bypass file system driver 510 and be provided directly to virtual volume device driver 532, signifying that application 512-1 accesses its block device directly as a raw storage device, e.g., as a database disk, a log disk, a backup archive, or a content repository, in the manner described in U.S. Patent 7,155,558 entitled "Providing Access to a Raw Data Storage Unit in a Computer System," the entire contents of which are incorporated by reference herein. When virtual volume device driver 532 receives a block IO, it accesses block device database 533 to reference the mapping between the block device name specified in the IO and the PE ID (WWN of the PE LUN) and SLLID that define the IO connection path to the vvol associated with the block device name. In the example shown herein, the block device name "archive" corresponds to the block device instance of vvol12 created for application 512-1, and the block device names "foo", "dbase", and "log" correspond to the block device instances of vvol1, vvol16, and vvol17, respectively, created for one or more of applications 512-2 through 512-N. Other information stored in block device database 533 includes, for each block device, an active bit value indicating whether or not the block device is active, and a CIF (commands-in-flight) value. An active bit of "1" signifies that IOs can be issued to the block device. An active bit of "0" signifies that the block device is inactive and IOs cannot be issued to the block device. The CIF value provides an indication of how many IOs are in flight, i.e., issued but not yet completed. In the example shown herein, the block device "foo" is active and has some commands in flight. The block device "archive" is inactive and will not accept newer commands; however, it is waiting for 2 commands in flight to complete. The block device "dbase" is inactive with no outstanding commands. Finally, the block device "log" is active, but the application currently has no pending IOs to the device. Virtual volume device driver 532 may choose to remove such devices from its database 533 at any time.
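The active-bit and CIF bookkeeping described above can be sketched as a tiny admission check. The class, method names, and use of an exception for rejected IOs are assumptions for illustration; the semantics — inactive devices reject new IOs but may still drain in-flight commands — follow the text.

```python
class BlockDevice:
    """Sketch of a block device database (533) row: active bit plus CIF count."""
    def __init__(self, name, vvol_id, active=True):
        self.name, self.vvol_id = name, vvol_id
        self.active = active   # True ("1"): IOs may be issued; False ("0"): inactive
        self.cif = 0           # commands in flight: issued but not yet completed

def issue_io(dev: BlockDevice):
    if not dev.active:
        raise RuntimeError(f"{dev.name}: inactive, new IOs are rejected")
    dev.cif += 1

def complete_io(dev: BlockDevice):
    dev.cif -= 1

foo = BlockDevice("foo", "vvol1")
issue_io(foo); issue_io(foo); complete_io(foo)
assert (foo.active, foo.cif) == (True, 1)        # active, one command in flight

archive = BlockDevice("archive", "vvol12", active=False)
archive.cif = 2                                  # still draining two in-flight IOs
try:
    issue_io(archive)
    raised = False
except RuntimeError:
    raised = True
assert raised and archive.cif == 2               # rejected, in-flight count unchanged
```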
In addition to performing the mapping described above, virtual volume device driver 532 issues raw block-level IOs to a data access layer 540. Data access layer 540 includes a device access layer 534, which applies command queuing and scheduling policies to the raw block-level IOs, and a device driver 536 for HBAs 504, which formats the raw block-level IOs in a protocol-compliant format and sends them to HBAs 504 for forwarding to the PEs via the in-band path. In embodiments where the SCSI protocol is used, the vvol information is encoded in the SCSI LUN data field, which is an 8-byte structure, as specified in SAM-5 (SCSI Architecture Model - 5). The PE ID is encoded in the first 2 bytes, which is conventionally used for the LUN ID, and the vvol information, in particular the SLLID, is encoded in the SCSI second-level LUN ID, utilizing (a portion of) the remaining 6 bytes.
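The 8-byte split described above — PE ID in the first 2 bytes, SLLID in (a portion of) the remaining 6 — can be sketched directly. The function names and big-endian byte order are assumptions for the example; the patent specifies only the field widths, and real SAM-5 LUN encoding involves addressing-method bits not modeled here.

```python
def encode_lun(pe_id: int, sllid: int) -> bytes:
    """Pack PE ID (2 bytes, the conventional LUN ID position) and SLLID
    (6 bytes, the second-level LUN ID) into the 8-byte SCSI LUN field."""
    return pe_id.to_bytes(2, "big") + sllid.to_bytes(6, "big")

def decode_lun(lun: bytes):
    """Split the 8-byte LUN field back into (PE ID, SLLID)."""
    return int.from_bytes(lun[:2], "big"), int.from_bytes(lun[2:], "big")

lun = encode_lun(0x0001, 0x0002)      # PE_A / S0002, numerically
assert len(lun) == 8
assert decode_lun(lun) == (1, 2)
```

This split is what lets a single PE (one 2-byte LUN ID) serve as the gateway for up to 2^48 distinct vvol sessions.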
As further shown in Fig. 5A, data access layer 540 also includes an error handling unit 542 for handling IO errors received through the in-band path from the storage system. In one embodiment, the IO errors received by error handling unit 542 are propagated through the PEs by I/O manager 304. Examples of IO error classes include path errors between computer system 101 and the PEs, PE errors, and vvol errors. Error handling unit 542 classifies all detected errors into the aforementioned classes. When a path error to a PE is encountered and another path to the PE exists, data access layer 540 transmits the IO along the different path to the PE. When the IO error is a PE error, error handling unit 542 updates block device database 533 to indicate an error condition for each block device issuing IOs through the PE. When the IO error is a vvol error, error handling unit 542 updates block device database 533 to indicate an error condition for each block device associated with the vvol. Error handling unit 542 may also issue an alarm or a system event so that further IOs to block devices having the error condition will be rejected.
Fig. 5B is a block diagram of the computer system of Fig. 5A that has been configured to interface with the storage system cluster of Fig. 2B instead of the storage system cluster of Fig. 2A. In this embodiment, data access layer 540 includes an NFS client 545 and a device driver 546 for NIC 503. NFS client 545 maps block device names to PE IDs (IP addresses of the NAS storage system) and SLLIDs, which are NFS file handles corresponding to the block devices. This mapping is stored in block device database 533 as shown in Fig. 5B. It should be noted that the active and CIF columns are still present but not illustrated in the block device database 533 shown in Fig. 5B. As will be described below, an NFS file handle uniquely identifies a file object within the NAS storage system, and may be generated during the bind process. Alternatively, in response to a request to bind a vvol, the NAS storage system returns the PE ID and the SLLID, and an open of the vvol using regular in-band mechanisms (e.g., lookup or readdirplus) will give the NFS file handle. NFS client 545 also translates the raw block-level IOs received from virtual volume device driver 532 into NFS-file-based IOs. The device driver 546 for NIC 503 then formats the NFS-file-based IOs in a protocol-compliant format and sends them, together with the NFS handle, to NIC 503 for forwarding to one of the PEs via the in-band path.
Fig. 5C is a block diagram of another embodiment of a computer system configured to implement virtual volumes. In this embodiment, computer system 102 is configured with virtualization software, shown herein as hypervisor 560. Hypervisor 560 is installed on top of hardware platform 550, which includes CPU 551, memory 552, NIC 553, and HBA 554, and supports a virtual machine execution space 570 within which multiple virtual machines (VMs) 571-1 through 571-N may be concurrently instantiated and executed. In one or more embodiments, hypervisor 560 and virtual machines 571 are implemented using virtualization products distributed by VMware, Inc. of Palo Alto, California. Each virtual machine 571 implements a virtual hardware platform 573 that supports the installation of a guest operating system (OS) 572, which is capable of executing applications 579. Examples of guest OS 572 include any of the well-known commodity operating systems, such as Microsoft Windows, Linux, and the like. In each instance, guest OS 572 includes a native file system layer (not shown in the figure), for example, either an NTFS or an ext3FS type file system layer. These file system layers interface with virtual hardware platform 573 to access, from the perspective of guest OS 572, a data storage HBA, which in reality is virtual HBA 574 implemented by virtual hardware platform 573 that provides the appearance of disk storage support (in reality, virtual disks 575A-575X) to enable execution of guest OS 572. In certain embodiments, virtual disks 575A-575X may appear to support, from the perspective of guest OS 572, the SCSI standard for connecting to the virtual machine, or any other appropriate hardware connection interface standard known to those of ordinary skill in the art, including IDE, ATA, and ATAPI. Although, from the perspective of guest OS 572, file system calls initiated by such guest OS 572 to implement file-system-related data transfer and control operations appear to be routed to virtual disks 575A-575X for final execution, in reality such calls are processed and passed through virtual HBA 574 to adjunct virtual machine monitors (VMMs) 561-1 through 561-N, which implement the virtual system support needed to coordinate operation with hypervisor 560. In particular, an HBA emulator 562 functionally enables the data transfer and control operations to be correctly handled by hypervisor 560, which ultimately passes such operations through its various layers to HBA 554 connected to storage systems 130.
According to embodiments described herein, each VM 571 has one or more vvols associated with it and issues IOs to the block device instances of those vvols, which are created by hypervisor 560 pursuant to "create device" calls issued by the VM 571 into hypervisor 560. The association between block device names and vvol IDs is maintained in a block device database 580. IOs from VMs 571-2 through 571-N are received by a SCSI virtualization layer 563, which converts them into file IOs understood by a virtual machine file system (VMFS) driver 564. VMFS driver 564 then converts the file IOs into block IOs, and provides the block IOs to a virtual volume device driver 565. IOs from VM 571-1, on the other hand, are shown to bypass VMFS driver 564 and be provided directly to virtual volume device driver 565, signifying that VM 571-1 accesses its block device directly as a raw storage device, e.g., as a database disk, a log disk, a backup archive, or a content repository, in the manner described in U.S. Patent 7,155,558.
When virtual volume device driver 565 receives a block IO, it accesses block device database 580 to reference the mapping between the block device name specified in the IO and the PE ID and SLLID that define the IO session to the vvol associated with the block device name. In the example shown herein, the block device names "dbase" and "log" correspond to the block device instances of vvol1 and vvol4, respectively, created for VM 571-1, and the block device names "vmdk2", "vmdkn", and "snapn" correspond to the block device instances of vvol12, vvol16, and vvol17, respectively, created for one or more of VMs 571-2 through 571-N. Other information stored in block device database 580 includes, for each block device, an active bit value indicating whether or not the block device is active, and a CIF (commands-in-flight) value. An active bit of "1" signifies that IOs can be issued to the block device. An active bit of "0" signifies that the block device is inactive and IOs cannot be issued to the block device. The CIF value provides an indication of how many IOs are in flight, i.e., issued but not yet completed.
In addition to performing the mapping described above, virtual volume device driver 565 issues raw block-level IOs to a data access layer 566. Data access layer 566 includes a device access layer 567, which applies command queuing and scheduling policies to the raw block-level IOs, and a device driver 568 for HBA 554, which formats the raw block-level IOs in a protocol-compliant format and sends them to HBA 554 for forwarding to the PEs via the in-band path. In embodiments where the SCSI protocol is used, the vvol information is encoded in the SCSI LUN data field, which is an 8-byte structure, as specified in SAM-5 (SCSI Architecture Model - 5). The PE ID is encoded in the first 2 bytes, which is conventionally used for the LUN ID, and the vvol information, in particular the SLLID, is encoded in the SCSI second-level LUN ID, utilizing (a portion of) the remaining 6 bytes. As further shown in Fig. 5C, data access layer 566 also includes an error handling unit 569 that functions in the same manner as error handling unit 542.
Fig. 5D is a block diagram of the computer system of Fig. 5C that has been configured to interface with the storage system cluster of Fig. 2B instead of the storage system cluster of Fig. 2A. In this embodiment, data access layer 566 includes an NFS client 585 and a device driver 586 for NIC 553. NFS client 585 maps block device names to PE IDs (IP addresses) and SLLIDs (NFS file handles) corresponding to the block devices. This mapping is stored in block device database 580 as shown in Fig. 5D. It should be noted that the active and CIF columns are still present but not illustrated in the block device database 580 shown in Fig. 5D. As will be described below, an NFS file handle uniquely identifies a file object within the NAS, and in one embodiment is generated during the bind process. NFS client 585 also translates the raw block-level IOs received from virtual volume device driver 565 into NFS-file-based IOs. The device driver 586 for NIC 553 then formats the NFS-file-based IOs in a protocol-compliant format and sends them, together with the NFS handle, to NIC 553 for forwarding to one of the PEs via the in-band path.
It should be recognized that the various terms, layers, and categorizations used to describe the components in Figs. 5A-5D may be referred to differently without departing from their functionality or the spirit or scope of the invention. For example, VMMs 561 may be considered separate virtualization components between VMs 571 and hypervisor 560 (which, in such a conception, may itself be considered a virtualization "kernel" component), since there exists a separate VMM for each instantiated VM. Alternatively, each VMM 561 may be considered to be a component of its corresponding virtual machine, since such VMM includes the hardware emulation components for the virtual machine. In such an alternative conception, for example, the conceptual layer described as virtual hardware platform 573 may be merged with and into VMM 561 such that virtual host bus adapter 574 is removed from Figs. 5C and 5D (i.e., since its functionality is effectuated by host bus adapter emulator 562).
Fig. 6 is a simplified block diagram of a computer environment that illustrates components and communication paths used to manage vvols according to an embodiment of the invention. As previously described, the communication path for IO protocol traffic is referred to as the in-band path and is shown in Fig. 6 as dashed line 601, which connects the data access layer 540 of the computer system (through an HBA or NIC provided by the computer system) with one or more PEs configured in storage systems 130. The communication path used to manage vvols is an out-of-band path (as previously defined, a path that is not "in-band") and is shown in Fig. 6 as solid line 602. According to embodiments described herein, vvols can be managed through a plug-in 612 provided in management server 610 and/or a plug-in 622 provided in each of computer systems 103, only one of which is shown in Fig. 6. On the storage device side, a management interface 625 is configured by storage system manager 131, and a management interface 626 is configured by storage system manager 132. In addition, a management interface 624 is configured by distributed storage system manager 135. Each management interface communicates with plug-ins 612, 622. To facilitate issuing and handling of management commands, special application programming interfaces (APIs) have been developed. It should be recognized that, in one embodiment, both plug-ins 612, 622 are customized to communicate with storage hardware from a particular storage system vendor. Therefore, management server 610 and computer systems 103 will employ different plug-ins when communicating with storage hardware from different storage system vendors. In another embodiment, there may be a single plug-in that interacts with any vendor's management interface. This would require the storage system manager to be programmed to a well-known interface (e.g., one published by the computer system and/or the management server).
Management server 610 is further configured with a system manager 611 for managing the computer systems. In one embodiment, the computer systems are executing virtual machines, and system manager 611 manages the virtual machines running in the computer systems. One example of a system manager 611 that manages virtual machines is a product distributed by VMware, Inc. As shown, system manager 611 communicates with a host daemon (hostd) 621 running in computer system 103 (through appropriate hardware interfaces at both management server 610 and computer system 103) to receive resource usage reports from computer system 103 and to initiate various management operations on applications running in computer system 103.
Fig. 7 is a flow diagram of method steps for authenticating a computer system to the storage system cluster of Fig. 2A or 2B. These method steps are initiated when a computer system requests authentication by transmitting its secure sockets layer (SSL) certificate to the storage system. At step 710, the storage system issues a prompt for authentication credentials (e.g., username and password) to the computer system requesting authentication. Upon receipt of the authentication credentials at step 712, the storage system compares them against stored credentials at step 714. If the correct credentials are provided, the storage system stores the authenticated computer system's SSL certificate in a key store (step 716). If incorrect credentials are provided, the storage system ignores the SSL certificate and returns an appropriate error message (step 718). Once authenticated, the computer system may invoke the APIs to issue management commands to the storage system over the SSL link, and unique context IDs included in the SSL certificates are used by the storage system to enforce certain policies, such as defining which storage containers a given computer system may access. In some embodiments, the context IDs may be used when approving the permissions of computer systems. For example, a host computer may be permitted to create a vvol but not permitted to delete the vvol or snapshot the vvol, or a host computer may be permitted to create a snapshot of a vvol but not permitted to clone the vvol. In addition, permissions may vary in accordance with the user-level privileges of users who are logged into authenticated computer systems.
Fig. 8 is a flow diagram of method steps for creating a virtual volume using a create virtual volumes API command. In one embodiment, computer system 103 issues the create virtual volumes API command to the storage system via out-of-band path 602 when, at step 802, computer system 103 receives a request from one of its applications to create a vvol having a certain size and storage capability profile, such as minimum IOPS and average latency. In response, computer system 103, at step 804, selects a storage container (from among the storage containers that computer system 103 and the requesting application are permitted to access and that have sufficient free capacity to accommodate the request) and issues the create virtual volumes API command to the storage system via plug-in 622. The API command includes the storage container ID, the vvol size, and the storage capability profile of the vvol. In another embodiment, the API command includes a set of key-value pairs that the application requires the storage system to store with the newly created vvol. In another embodiment, management server 610 issues the create virtual volumes API command (via plug-in 612) to the storage system via out-of-band path 602.
At step 806, the storage system manager receives the request to generate the vvol via the management interface (e.g., management interface 624, 625, or 626) and accesses the selected storage container's metadata section in container database 316 to verify that the request context, comprising computer system 103 and the application, has sufficient permissions to create a vvol in the selected storage container. In one embodiment, if the permission level is not sufficient, an error message is returned to computer system 103. If the permission level is sufficient, a unique vvol ID is generated at step 810. Then, at step 812, the storage system manager scans the allocation bitmap in the metadata section of container database 316 to determine the free partitions of the selected storage container. The storage system manager allocates free partitions of the selected storage container sufficient to accommodate the requested vvol size, and updates the allocation bitmap in the storage container's metadata section of container database 316. The storage system manager also updates vvol database 314 with a new vvol entry. The new vvol entry includes the vvol ID generated at step 810, an ordered list of the newly allocated storage container extents, and the metadata of the new vvol expressed as key-value pairs. Then, at step 814, the storage system manager transmits the vvol ID to computer system 103. At step 816, computer system 103 associates the vvol ID with the application that requested creation of the vvol. In one embodiment, one or more vvol descriptor files are maintained for each application, and the vvol ID is written into the vvol descriptor file maintained for the application that requested creation of the vvol.
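The storage-side portion of the flow (steps 806-814) can be condensed into one illustrative handler. The function signature, the dict/set representation of the container, and the exception types are all assumptions made for the sketch; the sequence — permission check, ID generation, bitmap scan and allocation, vvol database update, ID returned — follows the figure.

```python
import uuid

def create_vvol(container, vvol_db, size_extents, context, metadata=None):
    """Sketch of steps 806-814: permission check, vvol ID generation, extent
    allocation from the allocation bitmap, and vvol database update."""
    if context not in container["permitted_contexts"]:        # steps 806/808
        raise PermissionError("insufficient permissions for this storage container")
    free = [e for e in container["extent_list"]
            if e not in container["allocated"]]               # step 812: scan bitmap
    if len(free) < size_extents:
        raise RuntimeError("insufficient free space in storage container")
    chosen = free[:size_extents]
    container["allocated"].update(chosen)                     # update bitmap
    vvol_id = str(uuid.uuid4())                               # step 810
    vvol_db[vvol_id] = {"extents": chosen, "metadata": metadata or {}}
    return vvol_id                                            # step 814

sc = {"permitted_contexts": {"computer_system_103"},
      "extent_list": [("sys1", "DSU141", n) for n in range(4)],
      "allocated": set()}
vvols = {}
vid = create_vvol(sc, vvols, 2, "computer_system_103", {"AppID": "mail"})
assert len(vvols[vid]["extents"]) == 2 and len(sc["allocated"]) == 2
```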
As shown in Figs. 2A and 2B, not all vvols are connected to PEs. A vvol that is not connected to a PE cannot receive IOs issued by a corresponding application, because no IO session has been established to the vvol. Before IOs can be issued to a vvol, the vvol undergoes a bind process, as a result of which the vvol will be bound to a particular PE. Once a vvol is bound to a PE, IOs can be issued to the vvol until the vvol is unbound from the PE.
In one embodiment, computer system 103 issues the bind request to the storage system via out-of-band path 602 using a bind virtual volume API. The bind request identifies the vvol to be bound (using the vvol ID), and in response the storage system binds the vvol to a PE to which computer system 103 is connected via an in-band path. Figure 9A is a flow chart of method steps for a computer system to discover PEs that it is connected to via an in-band path. PEs configured in SCSI protocol-based storage devices are discovered via the in-band path using the standard SCSI command REPORT_LUNS. PEs configured in NFS protocol-based storage devices are discovered via an out-of-band path using an API. The method steps of Figure 9A are performed by the computer system for each connected storage system.
In step 910, the computer system determines whether the connected storage system is SCSI protocol-based or NFS protocol-based. If the storage system is SCSI protocol-based, the SCSI command REPORT_LUNS is issued in-band by the computer system to the storage system (step 912). Then, in step 913, the computer system examines the response from the storage system, in particular the PE bit associated with each of the returned PE IDs, to distinguish between the PE-related LUNs and conventional data LUNs. If the storage system is NFS protocol-based, an out-of-band API call is issued by the computer system from plug-in 622 to the management interface (e.g., management interface 624, 625, or 626) to obtain the IDs of available PEs (step 914). In step 916, which follows steps 913 and 914, the computer system stores the PE IDs of the PE-related LUNs returned by the storage system, or the PE IDs returned by the management interface, for use during the binding process. It should be recognized that PE IDs returned by SCSI protocol-based storage devices each include a WWN, and PE IDs returned by NFS protocol-based storage devices each include an IP address and mount point.
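The two discovery branches of Figure 9A can be sketched as one function. The `storage_system` dictionary and its `report_luns` / `get_available_pes` callables are hypothetical stand-ins for the real in-band SCSI command and out-of-band management API:

```python
def discover_pes(storage_system):
    """Return the list of PE IDs for one connected storage system
    (a sketch of Fig. 9A under invented data shapes)."""
    if storage_system["protocol"] == "scsi":
        # steps 912-913: issue REPORT_LUNS in-band, then keep only the
        # LUNs whose PE bit is set; their PE IDs are WWN-based
        luns = storage_system["report_luns"]()
        return [lun["wwn"] for lun in luns if lun["pe_bit"]]
    # step 914: NFS-based systems report PEs (IP address + mount point)
    # through an out-of-band management-interface API call
    return storage_system["get_available_pes"]()
```

The filtered list is what step 916 caches on the host for later use by the binding process.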
Figure 9B is a flow chart of method steps for storage system manager 131 or storage system manager 132 or distributed storage system manager 135 (hereinafter referred to as "the storage system manager") to discover PEs to which a given computer system 103 is connected via an in-band path. Discovery of such PEs by the storage system manager enables the storage system to return to a requesting computer system, in response to a bind request from that computer system, a valid PE ID onto which the computer system can actually be operatively connected. In step 950, the storage system manager issues an out-of-band "Discovery_Topology" API call to computer system 103 via the management interface and plug-in 622. Computer system 103 returns its system ID and a list of all PE IDs that it discovered via the flow chart of Figure 9A. In one embodiment, the storage system manager executes step 950 by issuing the "Discovery_Topology" API call to management server 610 via the management interface and plug-in 612. In such an embodiment, the storage system receives a response containing multiple computer system IDs and associated PE IDs, one computer system ID and its associated PE IDs for each computer system 103 that management server 610 manages. Then, in step 952, the storage system manager processes the results from step 950. For example, the storage system manager eliminates from the list all PE IDs that are not under its current control. For example, certain PE IDs received by storage system manager 135 when issuing the Discovery_Topology call may correspond to another storage system connected to the same computer system. Similarly, certain received PE IDs may correspond to older PEs that have since been deleted by the storage administrator, and so on. In step 954, the storage system manager caches the processed results for use during subsequent bind requests. In one embodiment, the storage system manager runs the steps of Figure 9B periodically to update its cached results with ongoing computer system and network topology changes. In another embodiment, the storage system manager runs the steps of Figure 9B every time it receives a new vvol creation request. In yet another embodiment, the storage system manager runs the steps of Figure 9B after running the authentication steps of Figure 7.
Figure 10 is a flow chart of method steps for issuing and executing a virtual volume bind request using the bind virtual volume API. In one embodiment, computer system 103 issues the bind request to the storage system via out-of-band path 602 when one of its applications requests IO access to a block device associated with a vvol that has not yet been bound to a PE. In another embodiment, management server 610 issues the bind request in connection with certain VM management operations, including VM power-on and vvol migration from one storage container to another.
Continuing the example described above, in which an application requests IO access to a block device associated with a vvol that has not yet been bound to a PE, computer system 103 determines, in step 1002, the vvol ID of the vvol from block device database 533 (or 580). Then, in step 1004, computer system 103 issues a request to bind the vvol to the storage system via out-of-band path 602.
The storage system manager receives the request to bind the vvol via the management interface (e.g., management interface 624, 625, or 626) in step 1006, and then carries out step 1008, which includes selecting the PE to which the vvol is to be bound, generating an SLLID and generation number for the selected PE, and updating connection database 312 (e.g., via IO manager 304). The selection of the PE to which the vvol is to be bound is made according to connectivity (i.e., only PEs that have an existing in-band connection to computer system 103 are available for selection) and other factors, such as current IO traffic through the available PEs. In one embodiment, the storage system selects from the processed and cached list of PEs that computer system 103 sent to it according to the method of Figure 9B. SLLID generation differs between embodiments employing the storage system cluster of Figure 2A and embodiments employing the storage system cluster of Figure 2B. In the former case, an SLLID that is unique for the selected PE is generated. In the latter case, a file path to the file object corresponding to the vvol is generated as the SLLID. After the SLLID and the generation number have been generated for the selected PE, connection database 312 is updated to include the newly generated IO session to the vvol. Then, in step 1010, the ID of the selected PE, the generated SLLID, and the generation number are returned to computer system 103. Optionally, in the embodiment employing the storage system cluster of Figure 2B, a unique NFS file handle may be generated for the file object corresponding to the vvol and returned to computer system 103 together with the ID of the selected PE, the generated SLLID, and the generation number. In step 1012, computer system 103 updates block device database 533 (or 580) to include the PE ID, SLLID (and optionally the NFS handle), and generation number returned from the storage system. In particular, each set of PE ID, SLLID (and optionally NFS handle), and generation number returned from the storage system is added to block device database 533 (or 580) as a new entry. It should be recognized that the generation number is used to guard against replay attacks. Therefore, in embodiments where replay attacks are not a concern, the generation number is not used.
Upon subsequent bind requests to the same vvol, initiated by different applications desiring to issue IOs to the same vvol, the storage system manager may bind the vvol to the same or a different PE. If the vvol is bound to the same PE, the storage system manager returns the ID of the same PE and the previously generated SLLID, and increments the reference count of this IO connection path stored in connection database 312. On the other hand, if the vvol is bound to a different PE, the storage system manager generates a new SLLID and returns the ID of the different PE and the newly generated SLLID, and adds this new IO connection path to the vvol to connection database 312 as a new entry.
A virtual volume unbind request may be issued using an unbind virtual volume API. The unbind request includes the PE ID and SLLID of the IO connection path by which the vvol has previously been bound. The processing of the unbind request is, however, advisory: the storage system manager is free to unbind the vvol from the PE immediately or after a delay. The unbind request is processed by updating connection database 312 to decrement the reference count of the entry containing the PE ID and SLLID. If the reference count is decremented to zero, the entry may be deleted. It should be noted that, in this case, the vvol continues to exist, but is no longer available for IOs using the given PE ID and SLLID.
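The reference-counted bind/unbind bookkeeping on connection database 312 can be sketched as a small class. The `(PE ID, SLLID)` keying and the field names are illustrative assumptions, not the actual database schema:

```python
class ConnectionDatabase:
    """Toy model of connection database 312: each (PE ID, SLLID)
    IO connection path carries a reference count."""
    def __init__(self):
        self.paths = {}   # (pe_id, sllid) -> {"vvol": ..., "refs": n}

    def bind(self, pe_id, sllid, vvol_id):
        # a repeat bind to the same PE reuses the entry and
        # increments its reference count
        entry = self.paths.setdefault((pe_id, sllid),
                                      {"vvol": vvol_id, "refs": 0})
        entry["refs"] += 1
        return entry["refs"]

    def unbind(self, pe_id, sllid):
        entry = self.paths[(pe_id, sllid)]
        entry["refs"] -= 1
        if entry["refs"] == 0:
            # last binding gone: the entry may be deleted; the vvol
            # itself still exists, it just cannot receive IOs this way
            del self.paths[(pe_id, sllid)]
```

Deleting the entry at refcount zero models the "beneficial removal" discussed next: less state for the IO manager and a reusable SLLID.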
In the case of a vvol that implements a virtual disk of a VM, the reference count for the vvol will be at least one. When the VM is powered off and an unbind request is issued in connection therewith, the reference count will be decremented by one. If the reference count is zero, the vvol entry may be removed from connection database 312. In general, removing entries from connection database 312 is beneficial because IO manager 304 manages less data and can also recycle SLLIDs. Such benefits become significant when the total number of vvols stored by the storage system is large (e.g., on the order of millions of vvols) but the total number of vvols being actively accessed by applications is small (e.g., tens of thousands of VMs). Additionally, when a vvol is not bound to any PE, the storage system has greater flexibility in choosing where to store the vvol in DSUs 141. For example, the storage system can be implemented with asymmetrical, hierarchical DSUs 141, where some DSUs 141 provide faster data access and others provide slower data access (e.g., to save on storage costs). In one implementation, when a vvol is not bound to any PE (which can be determined by checking the reference count of the vvol's entry in connection database 312), the storage system can migrate the vvol to a physical storage device of the slower and/or cheaper type. Then, once the vvol is bound to a PE, the storage system can migrate the vvol to a physical storage device of the faster type. It should be recognized that such migrations can be accomplished by changing one or more elements of the ordered list of container locations that make up the given vvol in vvol database 314 and updating the corresponding extent allocation bitmaps in the metadata section of container database 316.
Binding and unbinding vvols to and from PEs enables the storage system manager to determine vvol liveness. The storage system manager may take advantage of this information to perform storage system vendor-specific optimizations on non-IO-serving (passive) and IO-serving (active) vvols. For example, the storage system manager may be configured to relocate a vvol from a low-latency (high cost) SSD to a mid-latency (low cost) hard drive if the vvol remains in the passive state beyond a particular threshold of time.
Figures 11A and 11B are flow charts of method steps for issuing an IO to a virtual volume, according to one embodiment. Figure 11A is a flow chart of method steps 1100 for issuing an IO from an application directly to a raw block device, and Figure 11B is a flow chart of method steps 1120 for issuing an IO from an application through a file system driver.
Method 1100 begins at step 1102, where an application (e.g., application 512 shown in Figures 5A-5B, or VM 571 shown in Figures 5C-5D) issues an IO to a raw block device. In step 1104, virtual volume device driver 532 or 565 generates a raw block-level IO from the IO issued by the application. In step 1106, the name of the raw block device is translated to a PE ID and SLLID by virtual volume device driver 532 or 565 (and also to an NFS handle by NFS client 545 or 585 in the embodiment employing the storage device of Figure 2B). In step 1108, data access layer 540 or 566 encodes the PE ID and SLLID (and also the NFS handle in the embodiment employing the storage device of Figure 2B) into the raw block-level IO. Then, in step 1110, the HBA/NIC issues the raw block-level IO.
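The host-side translation and encoding of steps 1106-1110 can be sketched as follows. The block device database shape and the IO descriptor fields are assumed for illustration; a real data access layer would encode the identifiers into the protocol-specific IO frame:

```python
def encode_block_io(block_device_db, device_name, offset, length):
    """Sketch of steps 1106-1108: resolve the raw block device name to
    its (PE ID, SLLID) pair and encode them into the block-level IO
    that the HBA/NIC will issue in step 1110."""
    entry = block_device_db[device_name]
    return {
        "pe_id": entry["pe_id"],    # routes the IO to the right PE
        "sllid": entry["sllid"],    # identifies the vvol behind the PE
        "offset": offset,
        "length": length,
    }
```

In the Figure 2B (NFS) variant, an `"nfs_handle"` field would additionally be carried, per the translation done by the NFS client.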
For non-VM applications, such as application 512 shown in Figures 5A-5B, method 1120 begins at step 1121. In step 1121, the application issues an IO to a file stored on a vvol-based block device. Then, in step 1122, a file system driver (e.g., file system driver 510) generates a block-level IO from the file IO. After step 1122, steps 1126, 1128, and 1130, which are identical to steps 1106, 1108, and 1110, are carried out.
For VM applications, such as VM 571 shown in Figures 5C-5D, method 1120 begins at step 1123. In step 1123, the VM issues an IO to its virtual disk. Then, in step 1124, this IO is translated to a file IO, for example by SCSI virtualization layer 563. The file system driver (e.g., VMFS driver 564) then generates a block-level IO from the file IO in step 1125. After step 1125, steps 1126, 1128, and 1130, which are identical to steps 1106, 1108, and 1110, are carried out.
Figure 12 is a flow chart of method steps for performing an IO at a storage system, according to one embodiment. In step 1210, an IO issued by a computer system is received through one of the PEs configured in the storage system. The IO is parsed by IO manager 304 in step 1212. After step 1212, step 1214a is carried out by IO manager 304 if the storage system cluster is of the type shown in Figure 2A, and step 1214b is carried out by IO manager 304 if the storage system cluster is of the type shown in Figure 2B. In step 1214a, IO manager 304 extracts the SLLID from the parsed IO and accesses connection database 312 to determine the vvol ID corresponding to the PE ID and the extracted SLLID. In step 1214b, IO manager 304 extracts the NFS handle from the parsed IO and identifies the vvol using the PE ID and the NFS handle as the SLLID. Step 1216 is carried out after steps 1214a and 1214b. In step 1216, volume manager 306 and container manager 308 access vvol database 314 and container database 316, respectively, to obtain the physical storage locations on which the IO is to be performed. Then, in step 1218, data access layer 310 performs the IO on the physical storage locations obtained in step 1216.
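The storage-side resolution of Figure 12 (the Figure 2A branch) can be sketched in a few lines. The connection and vvol database shapes are the same invented structures used in earlier sketches, not the real schemas:

```python
def resolve_io(io, connection_db, vvol_db):
    """Sketch of steps 1214a and 1216: the IO manager maps the
    (PE ID, SLLID) pair in the incoming IO to a vvol ID via the
    connection database, and the volume/container managers then map
    the vvol to its physical extents for the data access layer."""
    vvol_id = connection_db[(io["pe_id"], io["sllid"])]
    return vvol_id, vvol_db[vvol_id]["extents"]
```

In the Figure 2B branch (step 1214b), the lookup key would use the NFS handle in the SLLID role instead.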
In some situations, an application (application 512 or VM 571), management server 610, and/or the storage system manager may determine that a binding of a vvol to a particular PE is experiencing problems, such as when the PE becomes overloaded with too many bindings. As a way of resolving such problems, a bound vvol may be rebound by the storage system manager to a different PE, even while IO commands are being directed to the vvol. Figure 13 is a flow chart of method steps 1300 for issuing and executing a vvol rebind request using a rebind API, according to one embodiment.
As shown, method 1300 begins at step 1302, where the storage system manager determines that a vvol should be bound to a second PE that is different from the first PE to which the vvol is currently bound. In step 1304, the storage system manager issues, via an out-of-band path, a request to rebind the vvol to a computer system (e.g., computer system 103) running an application that issues IOs to the vvol. In step 1306, computer system 103 receives the rebind request from the storage system manager and, in response, issues a request to bind the vvol to a new PE. In step 1308, the storage system manager receives the rebind request and, in response, binds the vvol to the new PE. In step 1310, the storage system manager transmits to the computer system, in the manner described above in conjunction with Figure 10, the ID of the new PE to which the vvol is now also bound and the SLLID for accessing the vvol.
In step 1312, the computer system receives the new PE ID and SLLID from the storage system manager. In block device database 533 or 580, the active bit of the new PE connection is initially set to 1, meaning that a new IO session to the vvol via the new PE has been established. The computer system also sets the active bit of the first PE connection to 0, signifying that no more IOs can be issued to the vvol through this PE connection. It should be recognized that this PE connection should not be unbound immediately upon deactivation, because there may be IOs to the vvol through this PE connection that are in flight, i.e., issued but not yet completed. Therefore, in step 1314, the computer system accesses block device database 533 or 580 to see whether all "commands in flight" (CIFs) issued to the vvol through the first PE connection have been completed, i.e., whether CIF = 0. The computer system waits for the CIF to go to zero before carrying out step 1318. In the meantime, additional IOs to the vvol are issued through the new PE, since the active bit of the new PE connection is already set to 1. When the CIF does reach zero, step 1318 is carried out, where a request to unbind the first PE connection is issued to the storage system manager. Then, in step 1320, the storage system manager unbinds the vvol from the first PE. In step 1324, the computer system issues all additional IOs to the vvol through the new PE.
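The host-side path switch of steps 1312-1320 can be sketched as follows. The record layout mimics one block device database entry; a real host would wait for the CIF count to drain, whereas this toy assumes it is already zero:

```python
def rebind_paths(entry, new_path, unbind_old):
    """Sketch of steps 1312-1320: activate the new PE connection,
    deactivate the old one, and unbind the old PE only once its
    commands in flight (CIF) have drained to zero."""
    old_path = entry["active"]
    entry["paths"][new_path] = {"active": 1, "cif": 0}
    entry["paths"][old_path]["active"] = 0   # no new IO via the old PE
    # a real host would poll/wait here; this sketch assumes CIF drained
    if entry["paths"][old_path]["cif"] == 0:
        unbind_old(old_path)                 # steps 1318/1320
        del entry["paths"][old_path]
    entry["active"] = new_path
```

Because the new path is activated before the old one is torn down, IO to the vvol never has to stop during the rebind, which is the point of the protocol.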
Figure 14 is a conceptual diagram of the lifecycle of a virtual volume, according to one embodiment. All of the commands shown in Figure 14, namely create, snapshot, clone, bind, unbind, extend, and delete, form a vvol management command set and are accessible through plug-ins 612, 622 described above in conjunction with Figure 6. As shown, when a vvol is generated as a result of any of the commands create vvol, snapshot vvol, or clone vvol, the generated vvol remains in a "passive" state, in which the vvol is not bound to a particular PE and therefore cannot receive IOs. Additionally, when any of the commands snapshot vvol, clone vvol, or extend vvol is executed while the vvol is in the passive state, the original vvol and the newly created vvol (if any) remain in the passive state. As also shown, when a vvol in the passive state is bound to a PE, the vvol enters an "active" state. Conversely, when an active vvol is unbound from a PE, the vvol enters the passive state, assuming that the vvol is not bound to any additional PEs. When any of the commands snapshot vvol, clone vvol, extend vvol, or rebind vvol is executed while the vvol is in the active state, the original vvol remains in the active state and the newly created vvol (if any) remains in the passive state.
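The state machine of Figure 14 is small enough to encode directly. This is a toy transcription of the diagram as described, under the simplifying assumption that the vvol has at most one PE binding:

```python
def next_state(state, op):
    """Toy encoding of the Fig. 14 lifecycle: a vvol is created
    passive, becomes active when bound to a PE, and returns to
    passive when unbound (assuming no other PE binding remains)."""
    if state == "passive" and op == "bind":
        return "active"
    if state == "active" and op == "unbind":
        return "passive"
    # snapshot/clone/extend (and rebind, when active) leave the
    # original vvol's state unchanged
    allowed = {"passive": {"snapshot", "clone", "extend"},
               "active": {"snapshot", "clone", "extend", "rebind"}}
    if op in allowed[state]:
        return state
    raise ValueError("%s not valid in state %s" % (op, state))
```

Note that rebind is rejected in the passive state, matching the diagram: only a bound (active) vvol can be rebound.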
As described above, a VM may have multiple virtual disks, and a separate vvol is created for each virtual disk. The VM also has metadata files that describe the configuration of the VM. The metadata files include VM configuration files, VM log files, disk descriptor files (one for each of the virtual disks of the VM), a VM swap file, etc. A disk descriptor file for a virtual disk contains information relating to the virtual disk, such as its vvol ID, its size, whether the virtual disk is thinly provisioned, and identification of one or more snapshots created for the virtual disk. The VM swap file provides the swap space of the VM on the storage system. In one embodiment, these VM configuration files are stored in a vvol, referred to herein as a metadata vvol.
Figure 15 is a flow chart of method steps for provisioning a VM, according to one embodiment. In this embodiment, management server 610, a computer system hosting the VM (e.g., computer system 102 shown in Figure 5C, hereinafter referred to as the "host computer"), and the storage system cluster of Figure 2A, in particular storage system manager 131, 132, or 135, are used. As shown, the storage system manager receives a request to provision the VM in step 1502. This may be a request generated when a VM administrator, using appropriate user interfaces to management server 610, issues to management server 610 a command to provision a VM having a certain size and storage capability profile. In response to this command, in step 1504, management server 610 initiates the method for creating a vvol to contain the VM's metadata (hereinafter referred to as the "metadata vvol") in the manner described above in conjunction with Figure 8, pursuant to which the storage system manager creates the metadata vvol in step 1508 and returns the vvol ID of the metadata vvol to management server 610. In step 1514, management server 610 registers the vvol ID of the metadata vvol back to the computer system hosting the VM. In step 1516, the host computer initiates the method for binding the metadata vvol to a PE in the manner described above in conjunction with Figure 10, pursuant to which the storage system manager binds the metadata vvol to a PE in step 1518 and returns the PE ID and SLLID to the host computer.
In step 1522, the host computer creates a block device instance of the metadata vvol using a "create device" call into the host computer's operating system. Then, in step 1524, the host computer creates a file system (e.g., VMFS) on the block device, in response to which a file system ID (FSID) is returned. In step 1526, the host computer mounts the file system having the returned FSID, and stores the metadata of the VM into the namespace associated with this file system. Examples of the metadata include VM log files, disk descriptor files (one for each of the virtual disks of the VM), and a VM swap file.
In step 1528, the host computer initiates the method for creating a data vvol for each of the virtual disks of the VM (each such vvol referred to herein as a "data vvol") in the manner described above in conjunction with Figure 8, pursuant to which the storage system manager creates the data vvol in step 1530 and returns the vvol ID of the data vvol to the host computer. In step 1532, the host computer stores the ID of the data vvol in the disk descriptor file for the virtual disk. The method ends with the unbinding of the metadata vvol (not shown) after data vvols have been created for all of the virtual disks of the VM.
Figure 16A is a flow chart of method steps for powering on a VM after the VM has been provisioned in the manner described in conjunction with Figure 15. Figure 16B is a flow chart of method steps for powering off a VM after the VM has been powered on. Both of these methods are carried out by the host computer for the VM.
Upon receiving a VM power-on command in step 1608, the metadata vvol corresponding to the VM is retrieved in step 1610. Then, in step 1612, the metadata vvol undergoes the binding process described above in conjunction with Figure 10. The file system is mounted on the metadata vvol in step 1614, so that the metadata files for the data vvols, in particular the disk descriptor files, can be read and the data vvol IDs obtained in step 1616. The data vvols then undergo, one by one, the binding process described above in conjunction with Figure 10, in step 1618.
Upon receiving a VM power-off command in step 1620, the data vvols of the VM are marked as inactive in the block device database (e.g., block device database 580 of Figure 5C), and the host computer waits for the CIFs associated with each of the data vvols to reach zero (step 1622). As the CIF associated with each data vvol reaches zero, the host computer requests the storage system to unbind that data vvol in step 1624. After the CIFs associated with all of the data vvols reach zero, the metadata vvol is marked as inactive in the block device database in step 1626. Then, in step 1628, when the CIF associated with the metadata vvol reaches zero, the host computer requests, in step 1630, that the metadata vvol be unbound.
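The power-off ordering of Figure 16B, with data vvols unbound before the metadata vvol, can be sketched as follows. This toy assumes all CIF counts have already drained to zero rather than actually waiting, and the database layout is invented:

```python
def power_off_vm(block_db, data_vvols, metadata_vvol, unbind):
    """Sketch of Fig. 16B: mark data vvols inactive, unbind each once
    its CIF count drains, then mark and unbind the metadata vvol last."""
    order = []
    for v in data_vvols:
        block_db[v]["active"] = False            # step 1620
    for v in data_vvols:
        if block_db[v]["cif"] == 0:              # step 1622 (assumed drained)
            unbind(v)                            # step 1624
            order.append(v)
    block_db[metadata_vvol]["active"] = False    # step 1626
    if block_db[metadata_vvol]["cif"] == 0:      # step 1628
        unbind(metadata_vvol)                    # step 1630
        order.append(metadata_vvol)
    return order
```

The metadata vvol must go last because the disk descriptor files it holds may still be needed while the data vvols are being torn down.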
Figures 17 and 18 are flow charts of method steps for reprovisioning a VM. In the examples illustrated herein, Figure 17 is a flow chart of method steps carried out on the host computer for extending the size of a vvol of a VM, in particular a data vvol for a virtual disk of the VM, and Figure 18 is a flow chart of method steps carried out in the storage system for moving a vvol of a VM between storage containers.
The method for extending the size of a data vvol for a virtual disk of a VM begins at step 1708, where the host computer determines whether the VM is powered on. If the host computer determines, in step 1708, that the VM is not powered on, the host computer retrieves the ID of the metadata vvol corresponding to the VM in step 1710. Then, the host computer initiates the binding process for the metadata vvol in step 1712. After the binding, in step 1714, the host computer mounts the file system on the metadata vvol and retrieves the ID of the data vvol corresponding to the virtual disk from the disk descriptor file for the virtual disk, which is a file within the file system mounted on the metadata vvol. Then, in step 1716, the host computer issues an extend-vvol API call to the storage system, where the extend-vvol API call includes the ID of the data vvol and the new size of the data vvol.
If the VM is powered on, the host computer retrieves, in step 1715, the ID of the data vvol of the VM's virtual disk that is to be extended. It should be recognized from the method of Figure 16A that this ID can be obtained from the disk descriptor file associated with the VM's virtual disk. Then, in step 1716, the host computer issues an extend-vvol API call to the storage system, where the extend-vvol API call includes the ID of the data vvol and the new size of the data vvol.
The extend-vvol API call results in the vvol database and the container database (e.g., vvol database 314 and container database 316 of Figure 3) being updated in the storage system to reflect the increased address space of the vvol. Upon receiving acknowledgement that the extend-vvol API call has completed, the host computer updates the disk descriptor file for the VM's virtual disk with the new size in step 1718. Then, in step 1720, the host computer determines whether the VM is powered on. If it is not, the host computer unmounts the file system and issues a request to unbind the metadata vvol to the storage system in step 1722. If, on the other hand, the VM is powered on, the method terminates.
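The storage-side effect of the extend-vvol call can be sketched by reusing the earlier toy database shapes (an allocation bitmap plus a per-vvol ordered extent list); these structures are illustrative, not the patented schemas:

```python
def extend_vvol(vvol_db, bitmap, vvol_id, new_size_extents):
    """Sketch of the extend-vvol effect: allocate additional free
    extents in the container's allocation bitmap and append them to
    the vvol's ordered extent list, growing its address space."""
    entry = vvol_db[vvol_id]
    needed = new_size_extents - len(entry["extents"])
    free = [i for i, used in enumerate(bitmap) if not used]
    if len(free) < needed:
        raise RuntimeError("container cannot accommodate the new size")
    for i in free[:needed]:
        bitmap[i] = True
        entry["extents"].append(i)
```

Appending (rather than rewriting) the extent list models why the existing data remains addressable at its old offsets while the new space appears at the end.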
The method for moving a vvol of a VM that is currently bound to a PE from a source storage container to a destination storage container, where both the source storage container and the destination storage container are within the scope of the same storage system manager, begins at step 1810, where the container IDs of the source and destination storage containers (SC1 and SC2, respectively) and the vvol ID of the vvol to be moved are received. Then, in step 1812, the vvol database (e.g., vvol database 314 of Figure 3) and the extent allocation bitmap of the container database (e.g., container database 316 of Figure 3) are updated as follows. First, the storage system manager removes the vvol extents in SC1 from SC1's entry in container database 316, and then assigns these extents to SC2 by modifying SC2's entry in container database 316. In one embodiment, the storage system may compensate for the loss of storage capacity in SC1 (due to the removal of the vvol storage extents) by assigning new spindle extents to SC1, and make up for the increase of storage capacity in SC2 (due to the addition of the vvol storage extents) by removing some unused spindle extents from SC2. In step 1814, the storage system manager determines whether the currently bound PE is able to optimally service IOs to the vvol's new location. A sample instance in which the current PE would be unable to service IOs to the vvol's new location is if the storage administrator has statically configured the storage system manager to assign different PEs to vvols from different customers and thus from different storage containers. If the current PE is unable to service IOs to the vvol, the vvol undergoes, in step 1815, the rebind process described above in conjunction with Figure 13 (along with the associated changes to the connection database, e.g., connection database 312 of Figure 3). After step 1815, step 1816 is carried out, where an acknowledgement of the completed move is returned to the host computer. If the storage system manager determines, in step 1814, that the current PE is able to service IOs to the vvol's new location, step 1815 is bypassed and step 1816 is carried out next.
When a vvol is moved between incompatible storage containers, e.g., between storage containers created in storage devices of different manufacturers, data movement is carried out between the storage containers in addition to the changes to container database 316, vvol database 314, and connection database 312. In one embodiment, data movement techniques described in U.S. Patent Application No. 12/129,323, filed May 29, 2008 and entitled "Offloading Storage Operations to Storage Hardware," the entire contents of which are incorporated by reference herein, are employed.
Figure 19 is a flow chart of method steps carried out in the host computer and the storage system for cloning a VM from a template VM. This method begins at step 1908, where the host computer issues to the storage system a request to create a metadata vvol for the new VM. At 1910, the storage system creates a metadata vvol for the new VM in accordance with the method described above in conjunction with Figure 8 and returns the new metadata vvol ID to the host computer. Then, in step 1914, a clone-vvol API call is issued from the host computer to the storage system via out-of-band path 601 for all data vvol IDs belonging to the template VM. In step 1918, the storage system manager checks to see whether the data vvols of the template VM and the new VM are compatible. It should be recognized that the data vvols may not be compatible if the cloning occurs between storage containers created in storage systems of different manufacturers. If there is compatibility, step 1919 is carried out. In step 1919, the storage system manager creates new data vvols by generating new data vvol IDs, updating the allocation bitmap in container database 316, and adding new vvol entries to vvol database 314, and copies the content stored in the data vvols of the template VM to the data vvols of the new VM. In step 1920, the storage system manager returns the new data vvol IDs to the host computer. Receipt of the new data vvol IDs provides acknowledgement to the host computer that the data vvol cloning completed without error. Then, in step 1925, the host computer issues IOs to the metadata vvol of the new VM to update the metadata files, in particular the disk descriptor files, with the newly generated data vvol IDs. The IOs issued by the host computer to the storage system are performed by the storage system in step 1926, as a result of which the disk descriptor files of the new VM are updated with the newly generated data vvol IDs.
If, in step 1918, the storage system manager determines that the data vvols of the template VM and the new VM are not compatible, an error message is returned to the host computer. Upon receiving this error message, the host computer issues, in step 1921, a create-vvol API call to the storage system to create new data vvols. In step 1922, the storage system manager creates the new data vvols by generating new data vvol IDs, updating the allocation bitmap in container database 316, and adding new vvol entries to vvol database 314, and returns the new data vvol IDs to the host computer. In step 1923, the host computer carries out data movement in accordance with techniques described in U.S. Patent Application No. 12/356,694, filed January 21, 2009 and entitled "Data Mover for Computer System," the entire contents of which are incorporated by reference herein (step 1923). After step 1923, steps 1925 and 1926 are carried out as described above.
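The two clone paths (compatible vs. incompatible data vvols) can be sketched as one dispatch function. Every callable here is a hypothetical stand-in for the storage-side clone, the host's create-vvol call, and the data mover, respectively:

```python
def clone_data_vvol(template_id, compatible, storage_clone,
                    host_create, host_copy):
    """Sketch of the Fig. 19 branches: a compatible data vvol is
    cloned entirely inside the storage system (steps 1919-1920); an
    incompatible one falls back to a host-issued create-vvol call
    plus a host-driven data move (steps 1921-1923)."""
    if compatible(template_id):
        return storage_clone(template_id)  # storage copies the content
    new_id = host_create(template_id)      # create an empty data vvol
    host_copy(template_id, new_id)         # data mover copies content
    return new_id
```

Either way, the host ends up with a new data vvol ID with which it then patches the new VM's disk descriptor files (steps 1925-1926).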
Figure 20 is a flow chart of method steps for provisioning a VM, according to another embodiment. In this embodiment, management server 610, a computer system hosting the VM (e.g., computer system 102 shown in Figure 5D, hereinafter referred to as the "host computer"), and the storage system cluster of Figure 2B, in particular storage system manager 131 or storage system manager 132 or storage system manager 135, are used. As shown, a request to provision the VM is received in step 2002. This may be a request generated when a VM administrator, using appropriate user interfaces to management server 610, issues to management server 610 a command to provision a VM having a certain size and storage capability profile. In response to this command, in step 2004, management server 610 initiates the method for creating a vvol to contain the VM's metadata, in particular a metadata vvol, in the manner described above in conjunction with Figure 8, pursuant to which the storage system manager creates the metadata vvol, which is a file in the NAS device, in step 2008, and returns the metadata vvol ID to management server 610. In step 2020, management server 610 registers the vvol ID of the metadata vvol back to the host computer. In step 2022, the host computer issues a bind request for the metadata vvol ID to the storage system, in response to which the storage system returns, in step 2023, an IP address and directory path as the PE ID and SLLID, respectively. In step 2024, the host computer mounts the directory at the specified IP address and directory path, and stores metadata files in the mounted directory. In the embodiment employing NFS, NFS client 545 or 585 may resolve the given IP address and directory path into an NFS handle in order to issue NFS requests to such a directory.
At step 2026, the host computer initiates the method for creating a data vvol for each of the virtual disks of the VM in the manner described above in conjunction with Figure 8, pursuant to which the storage system manager at step 2030 creates the data vvol and returns the vvol ID of the data vvol to the host computer. At step 2032, the host computer stores the ID of the data vvol in a disk descriptor file for the virtual disk. The method ends with the unbinding of the metadata vvol (not shown) after data vvols have been created for all of the virtual disks of the VM.
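The Figure 20 provisioning sequence can be walked through in a short sketch, assuming an NFS-backed storage system where the bind of the metadata vvol returns an (IP address, directory path) pair serving as (PE ID, SLLID), and each virtual disk gets a data vvol whose ID is recorded in a disk descriptor. The function and field names (`provision_vm`, `FakeStorage`, `data_vvol_id`) are assumptions of this sketch, not names from the patent.

```python
# Illustrative model of steps 2002-2032 of Figure 20. The storage system is
# faked; only the shape of the control flow is meant to match the text.

def provision_vm(storage, num_disks):
    # Steps 2004-2008: create the metadata vvol
    metadata_vvol_id = storage.create_vvol(kind="metadata")
    # Steps 2022-2023: bind it; PE ID and SLLID are an IP and a directory path
    pe_id, sllid = storage.bind(metadata_vvol_id)
    mount_point = f"nfs://{pe_id}{sllid}"   # step 2024: mount that directory
    # Steps 2026-2032: one data vvol per virtual disk, ID kept in a descriptor
    descriptors = []
    for _ in range(num_disks):
        data_vvol_id = storage.create_vvol(kind="data")
        descriptors.append({"data_vvol_id": data_vvol_id})
    return metadata_vvol_id, mount_point, descriptors

class FakeStorage:
    def __init__(self):
        self.count = 0
    def create_vvol(self, kind):
        self.count += 1
        return f"{kind}-vvol-{self.count}"
    def bind(self, vvol_id):
        return "10.0.0.5", f"/vvols/{vvol_id}"

meta_id, mount, descs = provision_vm(FakeStorage(), num_disks=2)
```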
As described above in conjunction with Figure 8, when a new vvol is created from a storage container and a storage capability profile is not explicitly specified for the new vvol, the new vvol inherits the storage capability profile associated with the storage container. The storage capability profile associated with the storage container may be selected from one of several different profiles. For example, as shown in Figure 21, the different profiles include a production (prod) profile 2101, a development (dev) profile 2102, and a test profile 2103 (collectively referred to herein as "profiles 2100"). It should be recognized that many other profiles may be defined. As illustrated, each profile entry of a particular profile is of a fixed type or a variable type, and has a name and one or more values associated with it. A profile entry of the fixed type has a fixed number of selectable items. For example, the profile entry "Replication" may be set to be TRUE or FALSE. In contrast, a profile entry of the variable type does not have pre-defined selections. Instead, a default value and a range of values are set for a profile entry of the variable type, and the user may select any value that is within the range. If no value is specified, the default value is used. In the exemplary profiles 2100 shown in Figure 21, profile entries of the variable type have three numbers separated by commas. The first number is the lower end of the specified range, and the second number is the higher end of the specified range. The third number is the default value. Thus, a vvol that inherits the storage capability profile defined in production profile 2101 will be replicated (Replication.Value = TRUE), and the recovery time objective (RTO) for the replication may be defined within the range of 0.1 to 24 hours, the default being 1 hour. In addition, snapshots are allowed for this vvol (Replication.Value = TRUE). The number of snapshots retained is within the range of 1 to 100, the default being 1, and the frequency of snapshots is within the range of once per hour to once per 24 hours, the default being once per hour. The SnapInherit column indicates whether a given profile attribute (and its values) should be propagated to a derivative vvol when a given vvol is snapshotted to create a new vvol that is a derivative vvol. In the example of production profile 2101, only the first two profile entries (Replication and RTO) may be propagated to a snapshot vvol of the given vvol. The values of all other attributes of the snapshot vvol will be set to the default values specified in the profile. In other words, any customizations of these other attributes on the given vvol (for example, a non-default value for snapshot frequency) will not be propagated to the snapshot vvol, because their corresponding SnapInherit columns are FALSE. The profile also contains other columns, such as CloneInherit (not shown) and ReplicaInherit (not shown), which respectively control which attribute values are propagated to clones and replicas of a given vvol.
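The profile model just described (fixed entries with a fixed set of selections, variable entries with a range and a default, and a per-entry SnapInherit flag governing propagation to snapshot vvols) can be sketched in a few lines. This is a minimal sketch under the production-profile example in Figure 21; the entry names and the dict layout are this sketch's assumptions, not the patent's data format.

```python
# Minimal model of a storage capability profile per Figure 21: fixed-type
# entries have one value; variable-type entries carry (min, max) and a default;
# SnapInherit decides whether a parent's value propagates to a snapshot vvol.

PROD_PROFILE = {
    "Replication":    {"type": "fixed",    "value": True,                "snap_inherit": True},
    "RTO_hours":      {"type": "variable", "range": (0.1, 24), "default": 1, "snap_inherit": True},
    "SnapshotCount":  {"type": "variable", "range": (1, 100),  "default": 1, "snap_inherit": False},
    "SnapFreq_hours": {"type": "variable", "range": (1, 24),   "default": 1, "snap_inherit": False},
}

def resolve(profile, entry, requested=None):
    """Pick a value for one entry: the requested value if in range, else the default."""
    e = profile[entry]
    if e["type"] == "fixed":
        return e["value"]
    lo, hi = e["range"]
    if requested is not None and lo <= requested <= hi:
        return requested
    return e["default"]

def snapshot_profile(profile, parent_values):
    """Derive a snapshot vvol's values: inherit only where SnapInherit is TRUE."""
    out = {}
    for name, e in profile.items():
        if e["snap_inherit"]:
            out[name] = parent_values[name]
        else:
            out[name] = e["value"] if e["type"] == "fixed" else e["default"]
    return out

parent = {n: resolve(PROD_PROFILE, n) for n in PROD_PROFILE}
parent["SnapFreq_hours"] = resolve(PROD_PROFILE, "SnapFreq_hours", requested=6)
snap = snapshot_profile(PROD_PROFILE, parent)
```

Note how the parent's customized snapshot frequency (6 hours) is dropped in the snapshot vvol, because SnapFreq's SnapInherit flag is FALSE, exactly as the text describes.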
When a storage container is created in accordance with the method of Figure 4, the types of storage capability profiles that may be defined for vvols created from the storage container can be set. The flow chart in Figure 21 illustrates the method for creating a storage container shown in Figure 4, with step 2110 inserted between steps 412 and 413. At step 2110, the storage administrator selects one or more of profiles 2100 for the storage container being created. For example, a storage container created for a customer may be associated with production profile 2101 and development profile 2102, such that a vvol of a production type will inherit the storage capability profile defined in production profile 2101, with default values or customer-specified values as the case may be, and a vvol of a development type will inherit the storage capability profile defined in development profile 2102, with default values or customer-specified values as the case may be.
Figure 22 is a flow chart that illustrates method steps executed by storage system manager 131, 132, or 135 for creating a vvol and defining a storage capability profile for the vvol. The method steps of Figure 22, in particular steps 2210, 2212, 2218, and 2220, correspond to steps 806, 810, 812, and 814 shown in Figure 8, respectively. In addition, the method steps of Figure 22 include steps 2214, 2215, and 2216, which define the storage capability profile for the vvol being created.
At step 2214, the storage system manager determines whether values to be used in the storage capability profile have been specified in the request to create the vvol. If they have not, the storage system manager at step 2215 employs the storage capability profile associated with the vvol's storage container as the vvol's storage capability profile, with default values. If values to be used in the storage capability profile have been specified, the storage system manager at step 2216 employs the storage capability profile associated with the vvol's storage container as the vvol's storage capability profile, with the specified values in lieu of the default values.
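Steps 2214 through 2216 reduce to a defaults-plus-overrides merge, sketched below. The function name and the particular entries are illustrative assumptions; the point is only that the container's profile always supplies the baseline, and request-supplied values, when present, replace the defaults.

```python
# Hedged sketch of steps 2214-2216: the vvol always receives the storage
# container's profile; values specified in the create request override defaults.

def make_vvol_profile(container_profile_defaults, requested_values=None):
    profile = dict(container_profile_defaults)  # step 2215: start from defaults
    if requested_values:                        # step 2216: apply specified values
        profile.update(requested_values)
    return profile

defaults = {"Replication": True, "RTO_hours": 1, "SnapshotCount": 1}
plain = make_vvol_profile(defaults)                       # no values specified
custom = make_vvol_profile(defaults, {"RTO_hours": 4})    # RTO specified
```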
In one embodiment, the storage capability profile of a vvol is stored in vvol database 314 as key-value pairs. Once the storage capability profile of a vvol has been defined and stored in vvol database 314 as key-value pairs, and as long as replication- and snapshot-related attributes and values are part of that profile, as shown in the exemplary profiles of Figure 21, the storage system is able to perform replication and snapshots for the vvol with no further instructions issued by the host computer.
Figure 23 is a flow chart that illustrates method steps executed by storage system manager 131, 132, or 135 for creating snapshots from a parent vvol. In one embodiment, a snapshot tracking data structure is employed to schedule snapshots according to snapshot definitions in the storage capability profile of a given vvol. Upon reaching the scheduled time for a snapshot, the storage system manager at step 2310 retrieves the vvol ID from the snapshot tracking data structure. Then, at step 2312, the storage system manager generates a unique vvol ID for the snapshot. The storage system manager at step 2315 employs the storage capability profile of the parent vvol (i.e., the vvol having the vvol ID retrieved from the snapshot tracking data structure) as the snapshot vvol's storage capability profile. It should be noted that, since this is an automated, profile-driven snapshot process driven by the storage system, the user does not get an opportunity to specify custom values to be used in the storage capability profile of the snapshot vvol. At step 2318, the storage system manager creates the snapshot vvol within the storage container of the parent vvol by updating the allocation bitmap in container database 316 and adding a new vvol entry for the snapshot vvol to vvol database 314. Then, at step 2320, the storage system manager updates the snapshot tracking data structure by scheduling a time for generating the next snapshot of the parent vvol. It should be recognized that the storage system manager must concurrently maintain snapshot tracking data structures and execute the method steps of Figure 23 for all vvols whose storage capability profiles mandate scheduled snapshots.
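The Figure 23 loop can be modeled with a small priority queue standing in for the snapshot tracking data structure: each entry is a (next-due-time, vvol ID) pair, and taking a snapshot reschedules the parent at now plus its configured snapshot interval. The heapq-based structure and all names here are this sketch's assumptions, not the patent's implementation.

```python
# Illustrative model of the Figure 23 scheduled-snapshot loop.
import heapq

def run_due_snapshots(tracking, profiles, now):
    """Take every snapshot due at `now`; return the snapshot vvol IDs created."""
    created = []
    while tracking and tracking[0][0] <= now:
        due, vvol_id = heapq.heappop(tracking)   # step 2310: fetch the vvol ID
        snap_id = f"snap-of-{vvol_id}@{now}"     # step 2312: unique snapshot ID
        # Step 2315: the snapshot inherits the parent's storage capability profile
        profiles[snap_id] = dict(profiles[vvol_id], type="snapshot", parent=vvol_id)
        created.append(snap_id)
        # Step 2320: schedule the parent's next snapshot
        interval = profiles[vvol_id]["SnapFreq_hours"]
        heapq.heappush(tracking, (now + interval, vvol_id))
    return created

profiles = {"vvol-A": {"SnapFreq_hours": 6}, "vvol-B": {"SnapFreq_hours": 24}}
tracking = [(6, "vvol-A"), (24, "vvol-B")]
heapq.heapify(tracking)
snaps = run_due_snapshots(tracking, profiles, now=6)
```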
After snapshots are created in the manner described above, key-value pairs stored in vvol database 314 are updated to indicate that the snapshot vvol is of type = snapshot. Likewise, in embodiments where a generation number is maintained for the snapshots (the generation number being incremented each time a snapshot is taken, or set to be equal to date + time), the generation number is stored as a key-value pair. The parent vvol ID of a snapshot vvol is also stored in the snapshot vvol's entries as a key-value pair. As a result, a host computer may query vvol database 314 for snapshots corresponding to a particular vvol ID. It is also possible for the host computer to issue a query to the vvol database for snapshots corresponding to a particular vvol ID and a particular generation number.
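The two queries described above (all snapshots of a parent vvol, or one specific generation) fall out directly from the key-value representation. In the sketch below an in-memory dict stands in for vvol database 314, and the function name and key names are assumptions of this illustration.

```python
# Sketch of querying snapshot entries stored as key-value pairs: each snapshot
# vvol records its type, parent vvol ID, and generation number.
vvol_db = {
    "vvol-1": {"type": "data"},
    "snap-1": {"type": "snapshot", "parent": "vvol-1", "generation": 1},
    "snap-2": {"type": "snapshot", "parent": "vvol-1", "generation": 2},
}

def snapshots_of(db, parent_id, generation=None):
    """Return snapshot vvol IDs for a parent, optionally for one generation."""
    return sorted(
        vid for vid, kv in db.items()
        if kv.get("type") == "snapshot"
        and kv.get("parent") == parent_id
        and (generation is None or kv.get("generation") == generation)
    )
```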
The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals, where they, or representations of them, are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments may be useful machine operations. In addition, one or more embodiments also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
The various embodiments described herein may be practiced with other computer system configurations, including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
One or more embodiments may be implemented as one or more computer programs, or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc), a CD-ROM, a CD-R or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
Although one or more embodiments have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. For example, SCSI is employed as the protocol for SAN devices and NFS is used as the protocol for NAS devices. Any alternative to the SCSI protocol may be used, such as Fibre Channel, and any alternative to the NFS protocol may be used, such as CIFS (Common Internet File System). Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to the details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
In addition, while the described virtualization methods have generally assumed that virtual machines present interfaces consistent with a particular hardware system, the methods described may be used in conjunction with virtualizations that do not correspond directly to any particular hardware system. Virtualization systems in accordance with the various embodiments, implemented as hosted embodiments, non-hosted embodiments, or as embodiments that tend to blur distinctions between the two, are all envisioned. Furthermore, various virtualization operations may be wholly or partially implemented in hardware. For example, a hardware implementation may employ a look-up table for modification of storage access requests to secure non-disk data.
Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest operating system that perform virtualization functions. Plural instances may be provided for components, operations, or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the embodiments described herein. In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claims.

Claims (14)

1. A computer system (100) having a plurality of virtual machines (571) running therein, each of said virtual machines (571) having a virtual disk (575) that is managed as a separate logical storage volume (151, 152) in a storage system (130), said computer system comprising:
a hardware storage interface (503, 504) configured to issue input-output commands (IOs) to said storage system (130); and
characterized in that said computer system (100) further comprises:
a virtualization software module (565) configured to receive read/write requests from said virtual machines (571) and to generate a first IO to be issued through said hardware storage interface (503, 504) in accordance with a read/write request from a first virtual machine (571₁) and a block device name associated with the virtual disk (575A) of said first virtual machine (571₁), and a second IO to be issued through said hardware storage interface (503, 504) in accordance with a read/write request from a second virtual machine (571₂) and a block device name associated with the virtual disk (575) of said second virtual machine (571₂),
wherein each of said first IO and said second IO includes a protocol endpoint identifier (PE ID) and a secondary level identifier (SLLID), said protocol endpoint identifier (PE ID) identifying a protocol endpoint (161) in said storage system (130) that is associated with one or more of said logical storage volumes (151, 152), and said secondary level identifier (SLLID) identifying, from among said one or more of said logical storage volumes (151, 152), the respective logical storage volume (151) associated with said protocol endpoint (161).
2. The computer system according to claim 1, wherein said virtualization software module (565) is further configured to maintain a mapping data structure (580) that provides a mapping of block device names to said protocol endpoint identifiers and said secondary level identifiers.
3. The computer system according to claim 2, wherein said mapping data structure (580) further includes, for each of said block device names, an entry that indicates whether the logical storage volume (151) associated with the block device name is active or inactive.
4. The computer system according to claim 3, wherein said mapping data structure (580) further includes, for each of said block device names, an entry that indicates the number of IOs in flight to the logical storage volume (151) associated with the block device name.
5. The computer system according to claim 4, wherein said virtualization software module (565) is further configured to update the number of IOs in flight when read data or a write acknowledgement is returned from the logical storage volume (151) through said hardware storage interface (503, 504).
6. The computer system according to claim 1, wherein said protocol endpoint identifier in said first IO and said protocol endpoint identifier in said second IO are the same, and said secondary level identifier in said first IO and said secondary level identifier in said second IO are different.
7. The computer system according to claim 1, wherein said protocol endpoint identifier in said first IO and said protocol endpoint identifier in said second IO are different.
8. The computer system according to claim 1, wherein said virtualization software module (565) is further configured to receive, from a protocol endpoint (161) of said storage system (130), a signal that an error has occurred, to determine that said error is an error associated with said protocol endpoint (161), and to issue an error event indicating that each of the one or more logical storage volumes (151, 152) accessible via said protocol endpoint (161) is unavailable.
9. The computer system according to claim 8, wherein said one or more logical storage volumes (151, 152) have corresponding block device names that are identified using a mapping data structure (580) that provides a mapping of block device names to protocol endpoint identifiers.
10. The computer system according to claim 1, wherein said virtualization software module (565) is further configured to receive, from a protocol endpoint (161) of said storage system (130), a signal that an error has occurred, to determine that said error is an error associated with a logical storage volume (151), and to cease issuing any additional IO commands to said logical storage volume (151).
11. The computer system according to claim 1, wherein said protocol endpoint identifier (PE ID) includes a World Wide Name for a LUN.
12. The computer system according to claim 1, wherein said protocol endpoint identifier (PE ID) includes an IP address and a mount point.
13. A method for use in a computer system (100) having a plurality of virtual machines (571) running therein, each of said virtual machines (571) having a virtual disk (575) that is managed as a separate logical storage volume (151, 152) in a storage system (130), said method comprising:
issuing a first input-output command (IO) and a second input-output command (IO) to said storage system (130) through a hardware storage interface (503, 504); and
characterized by the steps of:
receiving read/write requests from said virtual machines (571), generating said first IO to be issued through the hardware storage interface (503, 504) in accordance with a read/write request from a first virtual machine (571₁) and a block device name associated with the virtual disk (575A) of said first virtual machine (571₁), and generating said second IO to be issued through said hardware storage interface (503, 504) in accordance with a read/write request from a second virtual machine (571₂) and a block device name associated with the virtual disk (575) of said second virtual machine (571₂),
wherein each of said first IO and said second IO includes a protocol endpoint identifier (PE ID) and a secondary level identifier (SLLID), said protocol endpoint identifier (PE ID) identifying a protocol endpoint (161) in said storage system (130) that is associated with one or more of said logical storage volumes (151, 152), and said secondary level identifier (SLLID) identifying, from among said one or more of said logical storage volumes (151, 152), the respective logical storage volume (151) associated with said protocol endpoint (161).
14. An apparatus for use in a computer system (100) having a plurality of virtual machines (571) running therein, each of said virtual machines (571) having a virtual disk (575) that is managed as a separate logical storage volume (151, 152) in a storage system (130), said apparatus comprising:
means for issuing a first input-output command (IO) and a second input-output command (IO) to said storage system (130) through a hardware storage interface (503, 504); and
characterized in that said apparatus further comprises:
means for receiving read/write requests from said virtual machines (571), for generating said first IO to be issued through the hardware storage interface (503, 504) in accordance with a read/write request from a first virtual machine (571₁) and a block device name associated with the virtual disk (575A) of said first virtual machine (571₁), and for generating said second IO to be issued through said hardware storage interface (503, 504) in accordance with a read/write request from a second virtual machine (571₂) and a block device name associated with the virtual disk (575) of said second virtual machine (571₂),
wherein each of said first IO and said second IO includes a protocol endpoint identifier (PE ID) and a secondary level identifier (SLLID), said protocol endpoint identifier (PE ID) identifying a protocol endpoint (161) in said storage system (130) that is associated with one or more of said logical storage volumes (151, 152), and said secondary level identifier (SLLID) identifying, from among said one or more of said logical storage volumes (151, 152), the respective logical storage volume (151) associated with said protocol endpoint (161).
CN201280041414.8A 2011-08-26 2012-08-22 Computer system accessing object storage system Active CN103765370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610515983.1A CN106168884B (en) 2011-08-26 2012-08-22 Computer system accessing object storage system

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13/219,378 US8650359B2 (en) 2011-08-26 2011-08-26 Computer system accessing object storage system
US13/219,378 2011-08-26
PCT/US2012/051840 WO2013032806A1 (en) 2011-08-26 2012-08-22 Computer system accessing object storage system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201610515983.1A Division CN106168884B (en) 2011-08-26 2012-08-22 Computer system accessing object storage system

Publications (2)

Publication Number Publication Date
CN103765370A CN103765370A (en) 2014-04-30
CN103765370B true CN103765370B (en) 2016-11-30


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004355638A (en) * 1999-08-27 2004-12-16 Hitachi Ltd Computer system and device assigning method therefor


Similar Documents

Publication Publication Date Title
CN103765372B (en) It is configured to the object storage system of input/output operations
CN103765371B (en) Derive the data-storage system as the logical volume of storage object
CN106168884B (en) Access the computer system of object storage system
CN103748545B (en) Data storage system and data storage control method
JP7053682B2 (en) Database tenant migration system and method
US20220035714A1 (en) Managing Disaster Recovery To Cloud Computing Environment
EP2306320B1 (en) Server image migration
CN110023896A (en) The merged block in flash-memory storage system directly mapped
US11886334B2 (en) Optimizing spool and memory space management
JP6423752B2 (en) Migration support apparatus and migration support method
US20230195444A1 (en) Software Application Deployment Across Clusters
CN105739930A (en) Storage framework as well as initialization method, data storage method and data storage and management apparatus therefor
US9940073B1 (en) Method and apparatus for automated selection of a storage group for storage tiering
US20220091744A1 (en) Optimized Application Agnostic Object Snapshot System
CN103765370B (en) Access the computer system of object storage system
US20240045609A1 (en) Protection of Objects in an Object-based Storage System
WO2022241024A1 (en) Monitoring gateways to a storage environment
WO2022240938A1 (en) Rebalancing in a fleet of storage systems using data science
WO2024097622A1 (en) Handling semidurable writes in a storage system
EP4338044A1 (en) Role enforcement for storage-as-a-service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: California, USA

Patentee after: Weirui LLC

Country or region after: U.S.A.

Address before: California, USA

Patentee before: VMWARE, Inc.

Country or region before: U.S.A.