CN103765370B - Computer system for accessing an object storage system - Google Patents
- Publication number
- CN103765370B CN103765370B CN201280041414.8A CN201280041414A CN103765370B CN 103765370 B CN103765370 B CN 103765370B CN 201280041414 A CN201280041414 A CN 201280041414A CN 103765370 B CN103765370 B CN 103765370B
- Authority
- CN
- China
- Prior art keywords
- vvol
- storage
- computer system
- virtual machine
- logical storage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
A storage system exports logical storage volumes that are provisioned as storage objects. A connected computer system accesses these storage objects on demand through logical endpoints for protocol traffic that are configured in the storage system, using standard protocols such as SCSI and NFS. Before issuing IO commands to a logical storage volume, the computer system sends a request to bind the logical storage volume to a protocol endpoint. In response, a first identifier for the protocol endpoint and a second identifier for the logical storage volume are returned. Different second identifiers may be generated for different logical storage volumes even when the same protocol endpoint is used. As a result, a single protocol endpoint can serve as a gateway to multiple logical storage volumes.
Description
Background
As computer systems scale to enterprise level, particularly in support of large-scale data centers, the underlying data storage systems are frequently storage area networks (SANs) or network-attached storage (NAS). As is conventionally well understood, SAN and NAS provide a number of technical capabilities and operational benefits, fundamentally including virtualization of data storage devices, redundancy of physical devices with transparent fault-tolerant failover and fail-safe controls, geographically distributed and replicated storage, and centralized oversight and storage configuration management decoupled from client-centric computer system management.
Architecturally, the storage devices in a SAN storage system (e.g., disk arrays, etc.) are typically connected to network switches (e.g., Fibre Channel switches, etc.), which in turn connect to servers or "hosts" that require access to the data in the storage devices. The servers, switches and storage devices in a SAN typically communicate using the Small Computer System Interface (SCSI) protocol, which transfers data across the network at the level of disk blocks. By contrast, a NAS device is typically a device that internally contains one or more storage devices and that is connected to the hosts (or an intervening switch) through a network protocol such as Ethernet. In addition to containing storage devices, the NAS device also pre-formats its storage devices in accordance with a network file system, such as Network File System (NFS) or Common Internet File System (CIFS). Thus, as opposed to a SAN, which exposes disks (referred to as LUNs and described in further detail below) to the hosts, which then need to be formatted and mounted according to the file system utilized by the hosts, the NAS device's network file system (which needs to be supported by the operating systems of the hosts) causes the NAS device to appear to a host's operating system as a file server, which the operating system can then mount or map, for example as a network drive accessible to the operating system. It should be recognized that, as storage system vendors continually innovate and release new products, clear distinctions between SAN and NAS storage systems continue to fade, with actual storage system implementations often exhibiting characteristics of both, offering both file-level protocols (NAS) and block-level protocols (SAN) in the same system. For example, in an alternative NAS architecture, a NAS "head" or "gateway" device is networked to the hosts rather than a traditional NAS device. Such a NAS gateway device does not itself contain storage drives, but enables external storage devices to be connected to it (e.g., via a Fibre Channel interface, etc.). Such a NAS gateway device, which is perceived by the hosts in a similar manner as a traditional NAS device, provides a capability to significantly increase the capacity of a NAS-based storage architecture (e.g., to storage capacity levels more traditionally supported by SANs) while retaining the simplicity of file-level storage access.
SCSI and other block protocol-based storage devices, such as storage system 30 shown in Figure 1A, utilize a storage system manager 31, which represents one or more programmed storage processors, to aggregate the storage units or drives in the storage device and present them as one or more LUNs (logical unit numbers) 34, each with a uniquely identifiable number. LUNs 34 are accessed by one or more computer systems 10 through a physical host bus adapter (HBA) 11 over a network 20 (e.g., Fibre Channel, etc.). Within computer system 10 and above HBA 11, storage access abstractions are characteristically implemented through a series of software layers, beginning with a low-level device driver layer 12 and ending in an operating-system-specific file system layer 15. Device driver layer 12, which enables basic access to LUNs 34, is typically specific to the communication protocol used by the storage system (e.g., SCSI, etc.). A data access layer 13 may be implemented above device driver layer 12 to support multipath consolidation of LUNs 34 visible through HBA 11 and other data access control and management functions. A logical volume manager 14, typically implemented between data access layer 13 and the conventional operating system file system layer 15, supports volume-oriented virtualization and management of LUNs 34 accessible through HBA 11. Multiple LUNs 34 can be gathered and managed together as a volume under the control of logical volume manager 14 for presentation to and use by file system layer 15 as a logical device.
Storage system manager 31 implements a virtualization of physical, typically disk drive-based storage units, referred to in Figure 1A as spindles 32, that reside in storage system 30. From a logical perspective, each of these spindles 32 can be thought of as a sequential array of fixed-size extents. Storage system manager 31 abstracts away the complexity of targeting read and write operations to the addresses of the actual spindles and extents of the disk drives by exposing to connected computer systems, such as computer system 10, a contiguous logical storage space divided into a set of virtual SCSI devices, known as LUNs 34. Each LUN, by virtue of its existence and presentation to computer system 10, represents some capacity that is assigned for use by computer system 10. Storage system manager 31 maintains metadata that includes a mapping for each such LUN to an ordered list of extents, wherein each such extent can be identified as a spindle-extent pair <spindle #, extent #> and may therefore be located in any of the various spindles 32.
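The spindle-extent mapping just described can be sketched as a small lookup structure. The names, extent size, and layout below are illustrative assumptions, not details taken from the patent:

```python
# Sketch of the LUN metadata described above: each LUN maps to an
# ordered list of fixed-size spindle extents <spindle #, extent #>.
# All names and sizes here are invented for illustration.

EXTENT_SIZE = 1 << 20  # assume 1 MiB extents

# Ordered extent list for LUN 34: logical extent i lives at (spindle, extent)
lun_extent_map = {
    34: [(0, 7), (2, 3), (1, 9)],  # spans three spindles
}

def resolve(lun, logical_offset):
    """Translate a logical byte offset within a LUN to a
    (spindle #, extent #, offset-within-extent) triple."""
    extents = lun_extent_map[lun]
    idx, within = divmod(logical_offset, EXTENT_SIZE)
    spindle, extent = extents[idx]
    return spindle, extent, within

print(resolve(34, EXTENT_SIZE + 42))  # (2, 3, 42): second extent, spindle 2
```

The point of the indirection is that a read or write addressed to a contiguous LUN offset can land on any spindle, exactly as the paragraph above states.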
Figure 1B is a block diagram of a conventional NAS or file-level based storage system 40 that is connected to one or more computer systems 10 via network interface cards (NICs) 11' over a network 21 (e.g., Ethernet). Storage system 40 includes a storage system manager 41, which represents one or more programmed storage processors. Storage system manager 41 implements a file system 45 on top of physical, typically disk drive-based storage units, referred to in Figure 1B as spindles 42, that reside in storage system 40. From a logical perspective, each of these spindles can be thought of as a sequential array of fixed-size extents 43. File system 45 abstracts away the complexity of targeting read and write operations to the addresses of the actual spindles and extents of the disk drives by exposing to connected computer systems, such as computer system 10, a namespace comprising directories and files that may be organized into file-system-level volumes 44 (hereinafter referred to as "FS volumes") accessed through their respective mount points.
Even with the advancements in storage systems described above, it has been widely recognized that they are not sufficiently scalable to meet the particular needs of virtualized computer systems. For example, a cluster of server machines may service as many as 10,000 virtual machines (VMs), each VM using multiple "virtual disks" and multiple "snapshots", each of which may be stored, for example, as a file on a particular LUN or FS volume. Even at a scaled-down estimate of 2 virtual disks and 2 snapshots per VM, this amounts to 60,000 distinct disks for the storage system to support if the VMs were directly connected to physical disks (i.e., 1 virtual disk or snapshot per physical disk). In addition, storage device and topology management at this scale is known to be difficult. As a result, the concept of datastores was developed, in which VMs are multiplexed onto a smaller set of physical storage entities (e.g., LUN-based VMFS clustered file systems or FS volumes), such as described in U.S. Patent 7,849,098, entitled "Providing Multiple Concurrent Access to a File System", incorporated herein by reference.
In conventional storage systems employing LUNs or FS volumes, workloads from multiple VMs are typically serviced by a single LUN or a single FS volume. As a result, resource demands from one VM workload will affect the service levels provided to another VM workload on the same LUN or FS volume. Efficiency measures for storage, such as latency and input/output operations per second (IO or IOPS), thus vary depending on the number of workloads in a given LUN or FS volume and cannot be guaranteed. Consequently, storage policies for storage systems employing LUNs or FS volumes cannot be executed on a per-VM basis, and service level agreement (SLA) guarantees cannot be given on a per-VM basis. In addition, data services provided by storage system vendors, such as snapshot, replication, encryption, and deduplication, are provided at the granularity of the LUNs or FS volumes, not at the granularity of a VM's virtual disk. As a result, snapshots can be created for an entire LUN or an entire FS volume using the data services provided by storage system vendors, but a snapshot for a single virtual disk of a VM cannot be created separately from the LUN or the file system in which the virtual disk is stored.
Summary of the Invention
One or more embodiments are directed to a storage system that is configured to isolate workloads running therein, so that SLA guarantees can be provided per workload and data services of the storage system can be provided per workload, without requiring a radical redesign of the storage system. In a storage system that stores virtual disks for multiple virtual machines, SLA guarantees can be provided on a per virtual disk basis and data services of the storage system can be provided on a per virtual disk basis.
According to one embodiment of the invention, the storage system exports logical storage volumes, referred to herein as "virtual volumes", that are provisioned as storage objects on a per-workload basis out of a logical storage capacity assignment referred to herein as a "storage container". For a VM, a virtual volume may be created for each of the virtual disks and snapshots of the VM. In one embodiment, the virtual volumes are accessed on demand by connected computer systems using standard protocols, such as SCSI and NFS, through logical endpoints for the protocol traffic, known as "protocol endpoints", that are configured in the storage system.
A method for binding a logical storage volume created in a storage system to a protocol endpoint configured in the storage system, for use by an application running in a computer system, according to an embodiment of the invention, includes the steps of: issuing a request to bind the logical storage volume to the storage system via a non-IO path; and storing first and second identifiers received in response to the request, wherein the first and second identifiers are encoded into IOs that are issued to the logical storage volume via an IO path. The first identifier identifies the protocol endpoint and the second identifier identifies the logical storage volume.
A method for issuing input-output commands (IOs) to a logical storage volume, according to an embodiment of the invention, includes the steps of: receiving a read/write request for a file from an application; generating a block-level IO corresponding to the read/write request; translating a block device name included in the block-level IO into the first and second identifiers; and issuing the IO to the protocol endpoint identified by the first identifier, the IO including the second identifier for identifying the logical storage volume.
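On the host side, the steps above (translate a block device name into the identifier pair obtained at bind time, then encode both into the outgoing IO) might look like this sketch, with all names and the dictionary-based IO representation invented for illustration:

```python
# Host-side sketch of the IO path described above. The table would be
# populated with identifier pairs returned by earlier bind requests.
# Names are hypothetical, not the patent's implementation.

block_device_table = {
    "vvol-disk0": ("PE-161", 4),  # (first id: PE, second id: vvol)
}

def issue_io(block_device, op, offset, length):
    """Build an IO addressed to a PE, carrying the vvol's second identifier."""
    first_id, second_id = block_device_table[block_device]
    return {
        "target_pe": first_id,      # routes the IO to the protocol endpoint
        "secondary_id": second_id,  # identifies the logical storage volume
        "op": op, "offset": offset, "length": length,
    }

io = issue_io("vvol-disk0", "read", 0, 4096)
print(io["target_pe"], io["secondary_id"])  # PE-161 4
```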
A computer system, according to an embodiment of the invention, includes a plurality of virtual machines running therein, each virtual machine having a virtual disk that is managed as a separate logical storage volume in a storage system. The computer system further includes: a hardware storage interface configured to issue IOs to the storage system; and a virtualization software module configured to receive read/write requests from the virtual machines for files on the virtual disks and to generate, from the read/write requests, IOs each having a protocol endpoint identifier as the first identifier and a secondary identifier as the second identifier.
Embodiments of the invention also include a non-transitory computer-readable storage medium storing instructions that, when executed by a computer system, cause the computer system to perform one of the methods set forth above.
Brief Description of the Drawings
Figure 1A is a block diagram of a conventional block protocol-based storage device that is connected to one or more computer systems over a network.
Figure 1B is a block diagram of a conventional NAS device that is connected to one or more computer systems over a network.
Figure 2A is a block diagram of a block protocol-based storage system cluster that implements virtual volumes according to an embodiment of the invention.
Figure 2B is a block diagram of a NAS based storage system cluster that implements virtual volumes according to an embodiment of the invention.
Figure 3 is a block diagram of components of the storage system cluster of Figure 2A or 2B for managing virtual volumes according to an embodiment of the invention.
Figure 4 is a flow diagram of method steps for creating a storage container.
Figure 5A is a block diagram of an embodiment of a computer system configured to implement virtual volumes hosted on a SAN-based storage system.
Figure 5B is a block diagram of the computer system of Figure 5A configured for virtual volumes hosted on a NAS-based storage system.
Figure 5C is a block diagram of another embodiment of a computer system configured to implement virtual volumes hosted on a SAN-based storage system.
Figure 5D is a block diagram of the computer system of Figure 5C configured for virtual volumes hosted on a NAS-based storage system.
Figure 6 is a simplified block diagram of a computer environment illustrating components and communication paths used to manage virtual volumes according to an embodiment of the invention.
Figure 7 is a flow diagram of method steps for authenticating a computer system to the storage system cluster of Figure 2A or 2B.
Figure 8 is a flow diagram of method steps for creating a virtual volume, according to one embodiment.
Figure 9A is a flow diagram of method steps for discovering protocol endpoints that are available to a computer system.
Figure 9B is a flow diagram of method steps for the storage system to discover protocol endpoints to which a computer system is connected via an in-band path.
Figure 10 is a flow diagram of method steps for issuing and executing a virtual volume bind request, according to one embodiment.
Figures 11A and 11B are flow diagrams of method steps for issuing an IO to a virtual volume, according to one embodiment.
Figure 12 is a flow diagram of method steps for performing an IO at a storage system, according to one embodiment.
Figure 13 is a flow diagram of method steps for issuing and executing a virtual volume rebind request, according to one embodiment.
Figure 14 is a conceptual diagram of the lifecycle of a virtual volume.
Figure 15 is a flow diagram of method steps for provisioning a VM using the storage system of Figure 2A, according to an embodiment.
Figure 16A is a flow diagram of method steps for powering on a VM.
Figure 16B is a flow diagram of method steps for powering off a VM.
Figure 17 is a flow diagram of method steps for extending the size of a vvol of a VM.
Figure 18 is a flow diagram of method steps for moving a vvol of a VM between storage containers.
Figure 19 is a flow diagram of method steps for cloning a VM from a template VM.
Figure 20 is a flow diagram of method steps for provisioning a VM, according to another embodiment.
Figure 21 illustrates sample storage capability profiles and a method for creating a storage container that includes a profile selection step.
Figure 22 is a flow diagram illustrating method steps for creating a vvol and defining a storage capability profile for the vvol.
Figure 23 is a flow diagram illustrating method steps for creating snapshots.
Detailed Description
Figures 2A and 2B are block diagrams of a storage system cluster that implements "virtual volumes" according to embodiments of the invention. The storage system cluster includes one or more storage systems, e.g., storage systems 130₁ and 130₂, which may be disk arrays, each having a plurality of data storage units (DSUs), one of which is labeled 141 in the figures, and storage system managers 131 and 132 that control various operations of storage systems 130 to enable the embodiments of the invention described herein. In one embodiment, two or more storage systems 130 may implement a distributed storage system manager 135 that controls the operations of the storage system cluster as if they were a single logical storage system. The operational domain of distributed storage system manager 135 may span storage systems installed in the same data center or across multiple data centers. For example, in one such embodiment, distributed storage system manager 135 may comprise storage system manager 131, which serves as a "master" manager when communicating with storage system manager 132, which serves as a "slave" manager, although it should be recognized that a variety of alternative methods for implementing a distributed storage system manager may be employed. DSUs represent physical storage units, e.g., disk or flash based storage units such as rotating disks or solid state disks. According to embodiments, the storage system cluster creates and exposes "virtual volumes" (vvols), as further detailed herein, to connected computer systems, such as computer systems 100₁ and 100₂. Applications (e.g., VMs accessing their virtual disks, etc.) running in computer systems 100 access the vvols on demand using standard protocols, such as SCSI in the embodiment of Figure 2A and NFS in the embodiment of Figure 2B, through logical endpoints for the SCSI or NFS protocol traffic, known as "protocol endpoints" (PEs), that are configured in storage systems 130. The communication path for application-related data operations from computer systems 100 to storage systems 130 is referred to herein as an "in-band" path. Communication paths between the host bus adapters (HBAs) of computer systems 100 and the PEs configured in storage systems 130, and between the network interface cards (NICs) of computer systems 100 and the PEs configured in storage systems 130, are examples of in-band paths. Communication paths from computer systems 100 to storage systems 130 that are not in-band, and that are typically used to carry out management operations, are referred to herein as "out-of-band" paths. Examples of out-of-band paths, such as an Ethernet network connection between computer systems 100 and storage systems 130, are illustrated in Figure 6 separately from the in-band paths. For simplicity, computer systems 100 are shown as directly connected to storage systems 130. It should be understood, however, that they may be connected to storage systems 130 through multiple paths and one or more switches.
Distributed storage system manager 135, or a single storage system manager 131 or 132, may create vvols (e.g., upon request of a computer system 100, etc.) from logical "storage containers", which represent a logical aggregation of physical DSUs. In general, a storage container may span more than one storage system, and many storage containers may be created by a single storage system manager or a distributed storage system manager. Similarly, a single storage system may contain many storage containers. In Figures 2A and 2B, storage container 142A, created by distributed storage system manager 135, is shown as spanning storage system 130₁ and storage system 130₂, whereas storage container 142B and storage container 142C are shown as contained within a single storage system (namely, storage system 130₁ and storage system 130₂, respectively). It should be recognized that, because a storage container can span more than one storage system, a storage system administrator can provision to its customers a storage capacity that exceeds the capacity of any one storage system. It should be further recognized that, because multiple storage containers can be created within a single storage system, the storage system administrator can provision storage to multiple customers using a single storage system.
In the embodiment of Figure 2A, each vvol is provisioned from a block based storage system. In the embodiment of Figure 2B, the NAS based storage system implements a file system 145 on top of DSUs 141 and exposes each vvol to computer systems 100 as a file object within this file system. In addition, as will be described in further detail below, applications running on computer systems 100 access vvols for IO through PEs. For example, as illustrated by the dashed lines in Figures 2A and 2B, vvol 151 and vvol 152 are accessible via PE 161; vvol 153 and vvol 155 are accessible via PE 162; vvol 154 is accessible via PE 163 and PE 164; and vvol 156 is accessible via PE 165. It should be recognized that vvols from multiple storage containers, such as vvol 153 in storage container 142A and vvol 155 in storage container 142C, may be accessible via a single PE, such as PE 162, at any given time. It should further be recognized that PEs, such as PE 166, may exist in the absence of any vvols that are accessible via them.
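The accessibility relationships in this example can be modeled as a simple many-to-many map. This is only a restatement of the figures' dashed lines, with Python names invented for illustration:

```python
# Many-to-many vvol/PE accessibility, mirroring Figures 2A/2B:
# each vvol maps to the set of PEs through which it is reachable.
accessible_via = {
    "vvol151": {"PE161"}, "vvol152": {"PE161"},
    "vvol153": {"PE162"}, "vvol155": {"PE162"},  # two containers, one PE
    "vvol154": {"PE163", "PE164"},               # one vvol, two PEs
    "vvol156": {"PE165"},
}

def vvols_behind(pe):
    """List vvols reachable via a PE; a PE such as PE166 may have none."""
    return sorted(v for v, pes in accessible_via.items() if pe in pes)

print(vvols_behind("PE162"))  # ['vvol153', 'vvol155']
print(vvols_behind("PE166"))  # []
```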
In the embodiment of Figure 2A, storage systems 130 implement PEs as a special type of LUN using known methods for setting up LUNs. As with LUNs, a storage system 130 provides each PE a unique identifier known as a WWN (World Wide Name). In one embodiment, when creating the PEs, storage system 130 does not specify a size for the special LUN, because the PEs described herein are not actual data containers. In one such embodiment, storage system 130 may assign a zero value or a very small value as the size of a PE-related LUN, so that administrators can quickly identify PEs when requesting that the storage system provide a list of LUNs (e.g., traditional data LUNs and PE-related LUNs), as further discussed below. Similarly, storage system 130 may assign to the PEs a LUN number greater than 255 as the identifying number for the LUN, to indicate in a human-friendly way that they are not data LUNs. As another way to distinguish between PEs and LUNs, a PE bit may be added to the Extended Inquiry Data VPD page (page 86h). The PE bit is set to 1 when a LUN is a PE, and set to 0 when it is a regular data LUN. Computer systems 100 may discover the PEs via the in-band path by issuing the SCSI command REPORT_LUNS, and may determine whether a LUN is a PE according to the embodiments described herein or a conventional data LUN by examining the indicated PE bit. Computer systems 100 may optionally inspect the LUN size and LUN number properties to further confirm whether the LUN is a PE or a conventional data LUN. It should be recognized that any of the techniques described above may be used to distinguish a PE-related LUN from a regular data LUN. In one embodiment, the PE bit technique is the only technique used to distinguish a PE-related LUN from a regular data LUN.
In the embodiment of Figure 2B, the PEs are created in storage systems 130 using known methods for setting up mount points to FS volumes. Each PE created in the embodiment of Figure 2B is identified uniquely by an IP address and a file system path, also conventionally referred to together as a "mount point". Unlike conventional mount points, however, the PEs are not associated with FS volumes. In addition, unlike the PEs of Figure 2A, the PEs of Figure 2B are not discoverable by computer systems 100 via the in-band path unless virtual volumes are bound to a given PE. Therefore, the PEs of Figure 2B are reported by the storage system via the out-of-band path.
Figure 3 is a block diagram of components of the storage system cluster of Figure 2A or 2B for managing virtual volumes according to an embodiment. The components include, in one embodiment, software modules of storage system managers 131 and 132 executing in storage systems 130, or, in another embodiment, software modules of distributed storage system manager 135, namely an input/output (I/O) manager 304, a volume manager 306, a container manager 308, and a data access layer 310. In the descriptions of the embodiments herein, it should be understood that any actions taken by distributed storage system manager 135 may, depending on the embodiment, be taken by storage system manager 131 or storage system manager 132.
In the example of Figure 3, distributed storage system manager 135 has created three storage containers SC1, SC2, and SC3 from DSUs 141, each of which is shown to have spindle extents labeled P1 through Pn. In general, each storage container has a fixed physical size and is associated with specific extents of the DSUs. In the example shown in Figure 3, distributed storage system manager 135 has access to a container database 316 that stores, for each storage container, its container ID, physical layout information, and some metadata. Container database 316 is managed and updated by container manager 308, which in one embodiment is a component of distributed storage system manager 135. The container ID is a universally unique identifier that is given to the storage container when the storage container is created. The physical layout information consists of the spindle extents of DSUs 141 that are associated with the given storage container, stored as an ordered list of <system ID, DSU ID, extent number>. The metadata section may contain some common metadata and some storage-system-vendor-specific metadata. For example, the metadata section may contain the IDs of computer systems or applications or users that are permitted to access the storage container. As another example, the metadata section contains an allocation bitmap denoting which <system ID, DSU ID, extent number> extents of the storage container are already allocated to existing vvols and which extents are free. In one embodiment, a storage system administrator may create separate storage containers for different business units, so that vvols of different business units are not provisioned from the same storage container. Other policies for segregating vvols may also be applied. For example, a storage system administrator may adopt a policy that vvols of different customers of a cloud service are to be provisioned from different storage containers. Vvols may also be grouped and provisioned from storage containers according to their required service levels. In addition, a storage system administrator may create, delete, and otherwise manage storage containers, such as defining the number of storage containers that can be created and setting the maximum physical size that can be set per storage container.
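A container database 316 entry, as described above, can be sketched as a record holding the container ID, the ordered extent list, the allocation bitmap, and a metadata section. This is a minimal sketch under stated assumptions; the field names and the dictionary-based bitmap are illustrative, not the storage system's actual layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
import uuid

Extent = Tuple[str, str, int]  # <system ID, DSU ID, extent number>

@dataclass
class ContainerRecord:
    """One illustrative entry of container database 316."""
    container_id: str
    extents: List[Extent]                                        # ordered physical layout
    allocated: Dict[Extent, bool] = field(default_factory=dict)  # allocation bitmap
    metadata: Dict[str, object] = field(default_factory=dict)    # common + vendor-specific

    def free_extents(self) -> List[Extent]:
        # Extents not yet allocated to an existing vvol.
        return [e for e in self.extents if not self.allocated.get(e, False)]

def create_container(extents, permitted_ids):
    """Build a record with a universally unique container ID."""
    rec = ContainerRecord(container_id=str(uuid.uuid4()), extents=list(extents))
    rec.metadata["permitted"] = list(permitted_ids)  # IDs allowed to access the container
    return rec
```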
Also, in the example of Figure 3, distributed storage system manager 135 has provisioned (on behalf of requesting computer systems 100) multiple vvols, each from a different storage container. In general, vvols may have a fixed physical size or may be thinly provisioned, and each vvol has a vvol ID, which is a universally unique identifier that is given to the vvol when the vvol is created. For each vvol, a vvol database 314 stores its vvol ID, the container ID of the storage container in which the vvol is created, and an ordered list of <offset, length> values within that storage container that comprise the address space of the vvol. Vvol database 314 is managed and updated by volume manager 306, which in one embodiment is a component of distributed storage system manager 135. In one embodiment, vvol database 314 also stores a small amount of metadata about the vvol. This metadata is stored in vvol database 314 as a set of key-value pairs, and may be updated and queried by computer systems 100 via the out-of-band path at any time during the vvol's existence. The stored key-value pairs fall into three categories. The first category: well-known keys, for which the definitions of certain keys (and therefore the interpretations of their values) are publicly available. One example is a key that corresponds to the virtual volume type (e.g., in virtual machine embodiments, whether the vvol contains a VM's metadata or a VM's data). Another example is the App ID, which is the ID of the application that stored data in the vvol. The second category: computer-system-specific keys, where the computer system or its management module stores certain keys and values as the virtual volume's metadata. The third category: storage-system-vendor-specific keys, which allow the storage system vendor to store certain keys associated with the virtual volume's metadata. One reason for a storage system vendor to use this key-value store for its metadata is that all of these keys are readily available to storage system vendor plug-ins and other extensions via the out-of-band channel for vvols. The store operations for key-value pairs are part of virtual volume creation and other processes, and thus the store operation should be reasonably fast. Storage systems are also configured to enable searches of virtual volumes based on exact matches to values provided on specific keys.
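The per-vvol key-value metadata and the exact-match search described above can be sketched as follows. This is an illustrative sketch only: the class and method names are hypothetical, and the three key categories are not enforced here (in practice they would be distinguished by definition or namespace conventions, which the text does not specify).

```python
class VvolMetadataStore:
    """Sketch of the per-vvol key-value metadata held in vvol database 314."""

    def __init__(self):
        self._kv = {}  # vvol_id -> {key: value}

    def put(self, vvol_id, key, value):
        # Updatable via the out-of-band path at any time during the vvol's life.
        self._kv.setdefault(vvol_id, {})[key] = value

    def get(self, vvol_id, key):
        return self._kv.get(vvol_id, {}).get(key)

    def search(self, key, value):
        """Exact-match search: vvol IDs whose `key` equals `value`."""
        return [v for v, kv in self._kv.items() if kv.get(key) == value]
```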
I/O manager 304 is a software module (also, in certain embodiments, a component of distributed storage system manager 135) that maintains a connection database 312, which stores the currently valid IO connection paths between PEs and vvols. In the example shown in Figure 3, seven currently valid IO sessions are shown. Each valid session has an associated PE ID, secondary level identifier (SLLID), vvol ID, and reference count (RefCnt) indicating the number of different applications performing IO through this IO session. The process of establishing a valid IO session between a PE and a vvol by distributed storage system manager 135 (e.g., at the request of a computer system 100) is referred to herein as a "bind" process. For each bind, distributed storage system manager 135 (e.g., via I/O manager 304) adds an entry to connection database 312. The process by which distributed storage system manager 135 subsequently tears down an IO session is referred to herein as an "unbind" process. For each unbind, distributed storage system manager 135 (e.g., via I/O manager 304) decrements the reference count of the IO session by one. When the reference count of an IO session is at zero, distributed storage system manager 135 (e.g., via I/O manager 304) may delete the entry for that IO connection path from connection database 312. As previously discussed, in one embodiment, computer systems 100 generate and transmit bind and unbind requests via the out-of-band path to distributed storage system manager 135. Alternatively, computer systems 100 may generate and transmit unbind requests via the in-band path by overloading existing error paths. In one embodiment, a generation number is changed to a monotonically increasing number or a randomly generated number when the reference count changes from 0 to 1 or vice versa. In another embodiment, the generation number is a randomly generated number, the RefCnt column is eliminated from connection database 312, and for each bind, even when the bind request is to a vvol that is already bound, distributed storage system manager 135 (e.g., via I/O manager 304) adds an entry to connection database 312.
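The bind/unbind lifecycle with reference counting can be sketched as below. This is a minimal sketch of the first embodiment described above (the RefCnt variant); the class name, SLLID format, and reuse-on-rebind behavior are illustrative assumptions, not the patented implementation.

```python
import itertools

class ConnectionDatabase:
    """Sketch of connection database 312: valid IO sessions between PEs and vvols."""

    def __init__(self):
        self.sessions = {}          # (pe_id, sllid) -> {"vvol": ..., "refcnt": n}
        self._sllid = itertools.count(1)

    def bind(self, pe_id, vvol_id):
        # If a session to this vvol already exists on the PE, bump its RefCnt.
        for (pe, sllid), s in self.sessions.items():
            if pe == pe_id and s["vvol"] == vvol_id:
                s["refcnt"] += 1
                return sllid
        sllid = "S%04d" % next(self._sllid)  # any number unique per PE ID
        self.sessions[(pe_id, sllid)] = {"vvol": vvol_id, "refcnt": 1}
        return sllid

    def unbind(self, pe_id, sllid):
        s = self.sessions[(pe_id, sllid)]
        s["refcnt"] -= 1
        if s["refcnt"] == 0:
            del self.sessions[(pe_id, sllid)]  # entry may be deleted at zero
```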
In the storage system cluster of Figure 2A, I/O manager 304 processes IO requests (IOs) received through the PEs from computer systems 100 using connection database 312. When an IO is received at one of the PEs, I/O manager 304 parses the IO to identify the PE ID and the SLLID contained in the IO, in order to determine the vvol for which the IO is intended. By accessing connection database 312, I/O manager 304 is then able to retrieve the vvol ID associated with the parsed PE ID and SLLID. In Figure 3 and subsequent figures, the PE IDs are shown as PE_A, PE_B, etc. for simplicity. In one embodiment, the actual PE IDs are the WWNs of the PEs. In addition, the SLLIDs are shown as S0001, S0002, etc. The actual SLLIDs are generated by distributed storage system manager 135 as any number unique among the SLLIDs associated with a given PE ID in connection database 312. The mapping between the logical address space of the virtual volume having the vvol ID and the physical locations of DSUs 141 is carried out by volume manager 306 using vvol database 314 and by container manager 308 using container database 316. Once the physical locations of DSUs 141 have been obtained, data access layer 310 (which in one embodiment is also a component of distributed storage system manager 135) performs the IO on these physical locations.
In the storage system cluster of Figure 2B, IOs are received through the PEs, and each such IO includes an NFS handle (or similar file system handle) to which the IO has been issued. In one embodiment, connection database 312 for such a system contains the IP address of the NFS interface of the storage system as the PE ID, and the file system path as the SLLID. The SLLIDs are generated based on the locations of the vvols in file system 145. The mapping between the logical address space of the vvol and the physical locations of DSUs 141 is carried out by volume manager 306 using vvol database 314 and by container manager 308 using container database 316. Once the physical locations of DSUs 141 have been obtained, the data access layer performs the IO on these physical locations. It should be noted that, for the storage system of Figure 2B, container database 316 may contain an ordered list of file:<offset, length> entries in the container location entry for a given vvol (i.e., a vvol may comprise multiple file segments stored in file system 145).
In one embodiment, connection database 312 is maintained in volatile memory, while vvol database 314 and container database 316 are maintained in persistent storage, such as DSUs 141. In other embodiments, all of the databases 312, 314, 316 may be maintained in persistent storage.
Figure 4 is a flow diagram of method steps 410 for creating a storage container. In one embodiment, these steps are carried out by storage system manager 131, storage system manager 132, or distributed storage system manager 135 under the control of a storage administrator. As noted above, a storage container represents a logical aggregation of physical DSUs and may span physical DSUs from more than one storage system. At step 411, the storage administrator (via distributed storage system manager 135, etc.) sets the physical capacity of the storage container. In a cloud or data center, this physical capacity may, for example, represent the amount of physical storage leased by a customer. The flexibility provided by storage containers disclosed herein is that storage containers of different customers can be provisioned by a storage administrator from the same storage system, and that a storage container for a single customer can be provisioned from multiple storage systems, e.g., in cases where the physical capacity of any one storage device is insufficient to meet the size requested by the customer, or in cases, such as replication, where the physical storage footprint of a vvol will naturally span multiple storage systems. At step 412, the storage administrator sets permission levels for accessing the storage container. In a multi-tenant data center, for example, a customer may only access the storage containers that have been leased to him or her. At step 413, distributed storage system manager 135 generates a unique identifier for the storage container. Then, at step 414, distributed storage system manager 135 (e.g., via container manager 308 in one embodiment) allocates free spindle extents of DSUs 141 to the storage container in sufficient quantity. As noted above, in cases where the free space of any one storage system is insufficient to meet the physical capacity, distributed storage system manager 135 may allocate spindle extents of DSUs 141 from multiple storage systems. After the extents have been allocated, distributed storage system manager 135 (e.g., via container manager 308) updates container database 316 with the unique container ID, the ordered list of <system number, DSU ID, extent number>, and the context IDs of the computer systems that are permitted to access the storage container.
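Steps 411 through 414 can be sketched as a single allocation routine. This is a minimal sketch, not the patented method: capacity is expressed in extents rather than bytes, the per-system free lists and return shape are assumptions, and the container ID is a placeholder for the universally unique identifier of step 413.

```python
def create_storage_container(capacity_extents, permitted_hosts, free_extents_by_system):
    """Sketch of Figure 4: allocate free spindle extents, spanning storage
    systems if one system alone cannot meet the capacity, then build the
    container database 316 entry. Names are illustrative."""
    allocated = []
    for system_id, extents in free_extents_by_system.items():
        while extents and len(allocated) < capacity_extents:
            allocated.append(extents.pop(0))  # <system, DSU, extent> tuples
    if len(allocated) < capacity_extents:
        raise RuntimeError("insufficient free space across all storage systems")
    return {
        "container_id": "uuid-placeholder",  # step 413: unique identifier
        "extents": allocated,                # step 414: ordered extent list
        "permitted": list(permitted_hosts),  # step 412: permission levels
    }
```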
According to embodiments described herein, storage capability profiles, e.g., SLA or quality of service (QoS), may be configured by distributed storage system manager 135 (e.g., on behalf of requesting computer systems 100) on a per-vvol basis. Therefore, it is possible for vvols with different storage capability profiles to be part of the same storage container. In one embodiment, the system administrator defines a default storage capability profile (or a number of possible storage capability profiles) for newly created vvols at the time of storage container creation, and this profile (or these profiles) is stored in the metadata section of container database 316. If a storage capability profile is not explicitly specified for a new vvol created in a storage container, the new vvol inherits the default storage capability profile associated with the storage container.
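The inheritance rule above reduces to a small helper, sketched here under stated assumptions: the metadata key name and the profile's dictionary shape (e.g., minimum IOPS) are illustrative, not defined by the text.

```python
def effective_profile(requested_profile, container_metadata):
    """Return the requested storage capability profile if one was given;
    otherwise inherit the container's default (illustrative names)."""
    return requested_profile or container_metadata.get("default_profile", {})
```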
Figure 5A is a block diagram of an embodiment of a computer system configured to implement virtual volumes hosted on the storage system cluster of Figure 2A. Computer system 101 is constructed on a conventional, typically server-class, hardware platform 500 that includes one or more central processing units (CPU) 501, memory 502, one or more network interface cards (NIC) 503, and one or more host bus adapters (HBA) 504. HBA 504 enables computer system 101 to issue IOs to virtual volumes through PEs configured in storage devices 130. As further shown in Figure 5A, an operating system 508 is installed on top of hardware platform 500, and a number of applications 512₁-512ₙ are executed on top of operating system 508. Examples of operating system 508 include any of the well-known commodity operating systems, such as Microsoft Windows, Linux, and the like.
According to embodiments described herein, each application 512 has one or more vvols associated with it and issues IOs to block device instances of the vvols created by operating system 508 pursuant to "create device" calls by the application 512 into operating system 508. The association between block device names and vvol IDs is maintained in a block device database 533. IOs from applications 512₂-512ₙ are received by a file system driver 510, which converts them to block IOs and provides the block IOs to a virtual volume device driver 532. IOs from application 512₁, on the other hand, are shown to bypass file system driver 510 and to be provided directly to virtual volume device driver 532, signifying that application 512₁ accesses its block device directly as a raw storage device, e.g., as a database disk, a log disk, a backup archive, or a content repository, in the manner described in U.S. Patent 7,155,558, entitled "Providing Access to a Raw Data Storage Unit in a Computer System," the entire contents of which are incorporated by reference herein. When virtual volume device driver 532 receives a block IO, it accesses block device database 533 to reference the mapping between the block device name specified in the IO and the PE ID (WWN of the PE LUN) and SLLID that define the IO connection path to the vvol associated with the block device name. In the example shown here, the block device name "archive" corresponds to the block device instance of vvol12 created for application 512₁, and the block device names "foo", "dbase", and "log" correspond respectively to block device instances of vvol1, vvol16, and vvol17 created for one or more of applications 512₂-512ₙ. Other information stored in block device database 533 includes an active bit value for each block device, which indicates whether the block device is active, and a CIF (commands-in-flight) value. An active bit of "1" signifies that IOs may be issued to the block device. An active bit of "0" signifies that the block device is inactive and that IOs may not be issued to the block device. The CIF value provides an indication of how many IOs are in flight, i.e., issued but not yet completed. In the example shown here, the block device "foo" is active and has some commands in flight. The block device "archive" is inactive and will not accept newer commands; however, it is waiting for 2 in-flight commands to complete. The block device "dbase" is inactive, with no outstanding commands. Finally, the block device "log" is active, but the application currently has no pending IOs to the device. Virtual volume device driver 532 may choose to remove such devices from its database 533 at any time.
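The block device database 533 state described above can be sketched as below, mirroring the Figure 5A example. The CIF count of 3 for "foo" is an assumption (the text only says "some commands in flight"), and the vvol assignments follow the example in the text.

```python
blockdev_db = {
    # name:    (vvol,     active, cif)
    "foo":     ("vvol1",  1, 3),   # active, some commands in flight (3 assumed)
    "archive": ("vvol12", 0, 2),   # inactive, waiting on 2 in-flight commands
    "dbase":   ("vvol16", 0, 0),   # inactive, no outstanding commands
    "log":     ("vvol17", 1, 0),   # active, no pending IOs
}

def can_issue_io(name):
    """IOs may be issued only while the active bit is 1."""
    return blockdev_db[name][1] == 1

def removable(name):
    """An inactive device with no commands in flight may be dropped
    from the database at any time."""
    _, active, cif = blockdev_db[name]
    return active == 0 and cif == 0
```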
In addition to performing the mapping described above, virtual volume device driver 532 issues raw block-level IOs to a data access layer 540. Data access layer 540 includes a device access layer 534, which applies command queuing and scheduling policies to the raw block-level IOs, and a device driver 536 for HBA 504, which formats the raw block-level IOs in a protocol-compliant format and sends them to HBA 504 for forwarding to the PEs via the in-band path. In the embodiment where the SCSI protocol is used, the vvol information is encoded in the SCSI LUN data field, which is an 8-byte structure, as specified in SAM-5 (SCSI Architecture Model-5). The PE ID, conventionally used for the LUN ID, is encoded in the first 2 bytes, and the vvol information, in particular the SLLID, is encoded in the SCSI second-level LUN ID, utilizing (a portion of) the remaining 6 bytes.
As further shown in Figure 5A, data access layer 540 also includes an error handling unit 542 for handling IO errors received through the in-band path from the storage system. In one embodiment, the IO errors received by error handling unit 542 are propagated through the PEs by I/O manager 304. Examples of IO error classes include path errors between computer system 101 and the PEs, PE errors, and vvol errors. Error handling unit 542 classifies all detected errors into the aforementioned classes. When a path error to a PE is encountered and another path to the PE exists, data access layer 540 transmits the IO along the different path to the PE. When the IO error is a PE error, error handling unit 542 updates block device database 533 to indicate an error condition for each block device issuing IOs through the PE. When the IO error is a vvol error, error handling unit 542 updates block device database 533 to indicate an error condition for each block device associated with the vvol. Error handling unit 542 may also issue an alarm or a system event, so that further IOs to block devices having the error condition will be rejected.
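The three-way error classification and its effect on block device database 533 can be sketched as a dispatch routine. This is an illustrative sketch only; the error record shape, the flat dictionary standing in for the database, and the retry-path selection are all assumptions.

```python
def handle_io_error(err, block_db, paths):
    """Dispatch on the three IO error classes named in the text (sketch).

    err: {"class": "path"|"pe"|"vvol", plus identifying IDs};
    block_db: name -> {"pe": ..., "vvol": ..., "error": bool};
    paths: pe_id -> list of known paths to that PE.
    Returns an alternative path for path errors, else marks affected devices.
    """
    if err["class"] == "path":
        alt = [p for p in paths.get(err["pe"], []) if p != err["failed_path"]]
        return alt[0] if alt else None       # retry along a different path
    key = "pe" if err["class"] == "pe" else "vvol"
    for dev in block_db.values():
        if dev[key] == err[key]:
            dev["error"] = True              # further IOs will be rejected
    return None
```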
Figure 5B is a block diagram of the computer system of Figure 5A that has been configured to interface with the storage system cluster of Figure 2B instead of the storage system cluster of Figure 2A. In this embodiment, data access layer 540 includes an NFS client 545 and a device driver 546 for NIC 503. NFS client 545 maps the block device name to a PE ID (the IP address of the NAS storage system) and an SLLID, which is an NFS file handle corresponding to the block device. This mapping is stored in block device database 533 as shown in Figure 5B. It should be noted that the Active and CIF columns are still present but not illustrated in the block device database 533 shown in Figure 5B. As will be described below, an NFS file handle uniquely identifies a file object within the NAS storage system, and may be generated during the bind process. Alternatively, in response to a request to bind a vvol, the NAS storage system returns the PE ID and the SLLID, and an open of the vvol using regular in-band mechanisms (e.g., lookup or readdirplus) will give the NFS file handle. NFS client 545 also translates the raw block-level IOs received from virtual volume device driver 532 into NFS file-based IOs. Device driver 546 for NIC 503 then formats the NFS file-based IOs in a protocol-compliant format and sends them, together with the NFS handle, to NIC 503 for forwarding to one of the PEs via the in-band path.
Figure 5C is a block diagram of another embodiment of a computer system configured to implement virtual volumes. In this embodiment, computer system 102 is configured with virtualization software, shown here as a hypervisor 560. Hypervisor 560 is installed on top of a hardware platform 550, which includes CPU 551, memory 552, NIC 553, and HBA 554, and supports a virtual machine execution space 570 within which multiple virtual machines (VMs) 571₁-571ₙ may be concurrently instantiated and executed. In one or more embodiments, hypervisor 560 and virtual machines 571 are implemented using the VMware vSphere product distributed by VMware, Inc. of Palo Alto, California. Each virtual machine 571 implements a virtual hardware platform 573 that supports the installation of a guest operating system (OS) 572 capable of executing applications 579. Examples of guest OS 572 include any of the well-known commodity operating systems, such as Microsoft Windows, Linux, and the like. In each instance, guest OS 572 includes a native file system layer (not shown in the figure), for example, either an NTFS or an ext3FS type file system layer. These file system layers interface with virtual hardware platform 573 in order to, from the perspective of guest OS 572, access a data storage HBA, which in reality is virtual HBA 574 implemented by virtual hardware platform 573 that provides the appearance of disk storage support (in reality, virtual disks 575ₐ-575ₓ) to enable execution of guest OS 572. In certain embodiments, virtual disks 575ₐ-575ₓ may appear, from the perspective of guest OS 572, to support the SCSI standard for connecting to the virtual machine, or any other appropriate hardware connection interface standard known to those with ordinary skill in the art, including IDE, ATA, and ATAPI. Although, from the perspective of guest OS 572, file system calls initiated by such guest OS 572 to implement file-system-related data transfer and control operations appear to be routed to virtual disks 575ₐ-575ₓ for final execution, in reality such calls are processed and passed through virtual HBA 574 to adjunct virtual machine monitors (VMM) 561₁-561ₙ that implement the virtual system support needed to coordinate operation with hypervisor 560. In particular, an HBA emulator 562 functionally enables the data transfer and control operations to be correctly handled by hypervisor 560, which ultimately passes such operations through its various layers to HBA 554 that connects to storage systems 130.
According to embodiments described herein, each VM 571 has one or more vvols associated with it and issues IOs to block device instances of the vvols created by hypervisor 560 pursuant to "create device" calls by the VM 571 into hypervisor 560. The association between block device names and vvol IDs is maintained in a block device database 580. IOs from VMs 571₂-571ₙ are received by a SCSI virtualization layer 563, which converts them into file IOs understood by a virtual machine file system (VMFS) driver 564. VMFS driver 564 then converts the file IOs to block IOs, and provides the block IOs to virtual volume device driver 565. IOs from VM 571₁, on the other hand, are shown to bypass VMFS driver 564 and to be provided directly to virtual volume device driver 565, signifying that VM 571₁ accesses its block device directly as a raw storage device, e.g., as a database disk, a log disk, a backup archive, or a content repository, in the manner described in U.S. Patent 7,155,558. When virtual volume device driver 565 receives a block IO, it accesses block device database 580 to reference the mapping between the block device name specified in the IO and the PE ID and SLLID that define the IO session to the vvol associated with the block device name. In the example shown here, the block device names "dbase" and "log" correspond respectively to block device instances of vvol1 and vvol4 created for VM 571₁, and the block device names "vmdk2", "vmdkn", and "snapn" correspond respectively to block device instances of vvol12, vvol16, and vvol17 created for one or more of VMs 571₂-571ₙ. Other information stored in block device database 580 includes an active bit value for each block device, which indicates whether the block device is active, and a CIF (commands-in-flight) value. An active bit of "1" signifies that IOs may be issued to the block device. An active bit of "0" signifies that the block device is inactive and that IOs may not be issued to the block device. The CIF value provides an indication of how many IOs are in flight, i.e., issued but not yet completed.
In addition to performing the mapping described above, virtual volume device driver 565 issues raw block-level IOs to a data access layer 566. Data access layer 566 includes a device access layer 567, which applies command queuing and scheduling policies to the raw block-level IOs, and a device driver 568 for HBA 554, which formats the raw block-level IOs in a protocol-compliant format and sends them to HBA 554 for forwarding to the PEs via the in-band path. In the embodiment where the SCSI protocol is used, the vvol information is encoded in the SCSI LUN data field, which is an 8-byte structure, as specified in SAM-5 (SCSI Architecture Model-5). The PE ID, conventionally used for the LUN ID, is encoded in the first 2 bytes, and the vvol information, in particular the SLLID, is encoded in the SCSI second-level LUN ID, utilizing (a portion of) the remaining 6 bytes. As further shown in Figure 5C, data access layer 566 also includes an error handling unit 569 that functions in the same manner as error handling unit 542.
Figure 5D is a block diagram of the computer system of Figure 5C that has been configured to interface with the storage system cluster of Figure 2B instead of the storage system cluster of Figure 2A. In this embodiment, data access layer 566 includes an NFS client 585 and a device driver 586 for NIC 553. NFS client 585 maps the block device name to a PE ID (IP address) and an SLLID (the NFS file handle corresponding to the block device). This mapping is stored in block device database 580 as shown in Figure 5D. It should be noted that the Active and CIF columns are still present but not illustrated in the block device database 580 shown in Figure 5D. As will be described below, an NFS file handle uniquely identifies a file object within the NAS, and in one embodiment is generated during the bind process. NFS client 585 also translates the raw block-level IOs received from virtual volume device driver 565 into NFS file-based IOs. Device driver 586 for NIC 553 then formats the NFS file-based IOs in a protocol-compliant format and sends them, together with the NFS handle, to NIC 553 for forwarding to one of the PEs via the in-band path.
It should be recognized that the various terms, layers, and categorizations used to describe the components in Figures 5A-5D may be referred to differently without departing from their functionality or the spirit or scope of the invention. For example, VMMs 561 may be considered separate virtualization components between VMs 571 and hypervisor 560 (which, in such a conception, may itself be considered a virtualization "kernel" component), since there exists a separate VMM for each instantiated VM. Alternatively, each VMM 561 may be considered to be a component of its corresponding virtual machine, since such VMM includes the hardware emulation components for the virtual machine. In such an alternative conception, for example, the conceptual layer described as virtual hardware platform 573 may be merged with and into VMM 561, such that virtual host bus adapter 574 is removed from Figures 5C and 5D (i.e., since its functionality is effectuated by host bus adapter emulator 562).
Figure 6 is a simplified block diagram of a computer environment illustrating components and communication paths used to manage vvols according to an embodiment of the invention. As previously described, the communication path for IO protocol traffic is referred to as the in-band path, shown in Figure 6 as dashed line 601, which connects the data access layer 540 of the computer system (through an HBA or NIC provided by the computer system) with one or more PEs configured in storage systems 130. The communication path used to manage vvols is an out-of-band path (as previously defined, a path that is not "in-band"), shown in Figure 6 as solid line 602. According to embodiments described herein, vvols may be managed through a plug-in 612 provided in a management server 610 and/or a plug-in 622 provided in each of computer systems 103, only one of which is shown in Figure 6. On the storage device side, a management interface 625 is configured by storage system manager 131, and a management interface 626 is configured by storage system manager 132. In addition, a management interface 624 is configured by distributed storage system manager 135. Each management interface communicates with plug-ins 612, 622. To facilitate issuing and handling of management commands, special application programming interfaces (APIs) have been developed. It should be recognized that, in one embodiment, both plug-ins 612, 622 are customized to communicate with storage hardware from a particular storage system vendor. Therefore, management server 610 and computer systems 103 will employ different plug-ins when communicating with storage hardware for different storage system vendors. In another embodiment, there may be a single plug-in that interacts with any vendor's management interface. This would require the storage system manager to be programmed to a well-known interface (e.g., one published by the computer system and/or the management server).

Management server 610 is further configured with a system manager 611 for managing the computer systems. In one embodiment, the computer systems execute virtual machines, and system manager 611 manages the virtual machines running in the computer systems. One example of system manager 611 that manages virtual machines is the vCenter Server product distributed by VMware, Inc. As shown, system manager 611 communicates with a host daemon (hostd) 621 running in computer systems 103 (through appropriate hardware interfaces at both management server 610 and computer systems 103) to receive resource usage reports from computer systems 103 and to initiate various management operations on applications running in computer systems 103.
Figure 7 is a flow diagram of method steps for authenticating a computer system to the storage system cluster of Figure 2A or 2B. These method steps are initiated when a computer system requests authentication by transmitting its secure socket layer (SSL) certificate to the storage system. At step 710, the storage system issues a prompt for authentication credentials (e.g., username and password) to the computer system requesting authentication. Upon receipt of the authentication credentials at step 712, the storage system compares them against stored credentials at step 714. If the correct credentials are provided, the storage system stores the authenticated computer system's SSL certificate in a key store (step 716). If incorrect credentials are provided, the storage system ignores the SSL certificate and returns an appropriate error message (step 718). Once authenticated, the computer system may invoke the APIs to issue management commands to the storage system over SSL links, and the unique context IDs included in the SSL certificates are used by the storage system to enforce certain policies, such as defining which storage containers a computer system may access. In some embodiments, the context IDs of the computer systems may be used in managing the permissions granted to them. For example, a host computer may be permitted to create a vvol but not permitted to delete the vvol or snapshot the vvol, or a host computer may be permitted to create a snapshot of a vvol but not permitted to clone the vvol. In addition, permissions may vary in accordance with the user-level privileges of users who are logged into authenticated computer systems.
Figure 8 is a flow diagram of method steps for creating a virtual volume using a create virtual volume API command. In one embodiment, computer system 103 issues the create virtual volume API command to the storage system via out-of-band path 602 when, in step 802, computer system 103 receives a request from one of its applications to create a vvol having a certain size and storage capability profile (e.g., minimum IOPS and average latency). In response, computer system 103, in step 804, selects a storage container (from among the storage containers that computer system 103 and the requesting application are permitted to access and that have sufficient free capacity to accommodate the request) and issues the create virtual volume API command to the storage system via plug-in 622. The API command includes a storage container ID, the vvol size, and the storage capability profile of the vvol. In another embodiment, the API command includes a set of key-value pairs that the application requires the storage system to store with the newly created vvol. In another embodiment, management server 610 issues the create virtual volume API command (via plug-in 612) to the storage system via out-of-band path 602.
In step 806, the storage system manager receives the request to generate the vvol via a management interface (e.g., management interface 624, 625, or 626) and accesses the metadata section of the selected storage container in container database 316 to verify that the request context, comprising computer system 103 and the application, has sufficient permissions to create a vvol in the selected storage container. In one embodiment, an error message is returned to computer system 103 if the permission level is insufficient. If the permission level is sufficient, a unique vvol ID is generated in step 810. Then, in step 812, the storage system manager scans the allocation bitmap in the metadata section of container database 316 to determine free partitions of the selected storage container. The storage system manager allocates free partitions of the selected storage container sufficient to accommodate the requested vvol size and updates the allocation bitmap in the storage container's metadata section of container database 316. The storage system manager also updates vvol database 314 with a new vvol entry. The new vvol entry includes the vvol ID generated in step 810, an ordered list of the newly allocated storage container extents, and the metadata of the new vvol expressed as key-value pairs. Then, in step 814, the storage system manager transmits the vvol ID to computer system 103. In step 816, computer system 103 associates the vvol ID with the application that requested creation of the vvol. In one embodiment, one or more vvol descriptor files are maintained for each application, and the vvol ID is written into the vvol descriptor file maintained for the application that requested creation of the vvol.
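The allocation performed in steps 810-814 can be illustrated with a minimal sketch. The class and attribute names below are assumptions for illustration only and do not correspond to any published API; the sketch shows only the bitmap scan, extent allocation, and vvol database update described above.

```python
import uuid

class StorageContainer:
    """Illustrative stand-in for a storage container's allocation bitmap."""
    def __init__(self, num_partitions):
        self.bitmap = [False] * num_partitions  # False = free partition

class StorageSystemManager:
    """Illustrative sketch of the create-vvol flow (steps 810-814)."""
    def __init__(self):
        self.containers = {}   # container ID -> StorageContainer
        self.vvol_db = {}      # vvol ID -> (container ID, ordered extent list, metadata)

    def create_vvol(self, container_id, num_partitions, metadata=None):
        sc = self.containers[container_id]
        # Step 812: scan the allocation bitmap for free partitions.
        free = [i for i, used in enumerate(sc.bitmap) if not used]
        if len(free) < num_partitions:
            raise RuntimeError("insufficient free capacity in storage container")
        extents = free[:num_partitions]
        for i in extents:
            sc.bitmap[i] = True               # update the allocation bitmap
        vvol_id = str(uuid.uuid4())           # step 810: generate a unique vvol ID
        # Update the vvol database with the new entry (ID, extents, key-value metadata).
        self.vvol_db[vvol_id] = (container_id, extents, metadata or {})
        return vvol_id                        # step 814: vvol ID returned to the host
```

A second create request against the same container continues allocating from the next free partitions, as the bitmap records which partitions are already in use.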
As shown in Figures 2A and 2B, not all vvols are connected to PEs. A vvol that is not connected to a PE cannot receive IOs issued by a corresponding application because no IO session has been established to the vvol. Before IOs can be issued to a vvol, the vvol undergoes a bind process, as a result of which the vvol becomes bound to a particular PE. Once a vvol is bound to a PE, IOs can be issued to the vvol until the vvol is unbound from the PE.
In one embodiment, computer system 103 issues a bind request to the storage system via out-of-band path 602 using a bind virtual volume API. The bind request identifies the vvol to be bound (using the vvol ID), and in response the storage system binds the vvol to a PE to which computer system 103 is connected via an in-band path. Figure 9A is a flow diagram of method steps for a computer system to discover the PEs to which it is connected via an in-band path. PEs configured in SCSI protocol-based storage devices are discovered via an in-band path using the standard SCSI command REPORT_LUNS. PEs configured in NFS protocol-based storage devices are discovered via an out-of-band path using an API. The method steps of Figure 9A are carried out by the computer system for each connected storage system.
In step 910, the computer system determines whether the connected storage system is SCSI protocol-based or NFS protocol-based. If the storage system is SCSI protocol-based, the SCSI command REPORT_LUNS is issued in-band by the computer system to the storage system (step 912). Then, in step 913, the computer system examines the response from the storage system, in particular the PE bit associated with each of the returned PE IDs, to distinguish between the PE-related LUNs and conventional data LUNs. If the storage system is NFS protocol-based, an out-of-band API call is issued by the computer system from plug-in 622 to the management interface (e.g., management interface 624, 625, or 626) to obtain the IDs of available PEs (step 914). In step 916, which follows steps 913 and 914, the computer system stores the PE IDs of the PE-related LUNs returned by the storage system, or the PE IDs returned by the management interface, for use during a bind process. It should be recognized that the PE IDs returned by SCSI protocol-based storage devices each include a WWN, and the PE IDs returned by NFS protocol-based storage devices each include an IP address and mount point.
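The protocol-dependent branch of Figure 9A can be sketched as follows. The dictionary shapes are invented for illustration; the real in-band REPORT_LUNS exchange and out-of-band API call are abstracted away, and only the PE-bit filtering and the two PE ID formats (WWN versus IP address plus mount point) are shown.

```python
def discover_pes(storage_system):
    """Illustrative sketch of steps 910-916: normalize discovered PE IDs.

    `storage_system` is a hypothetical dict describing one connected array.
    """
    if storage_system["protocol"] == "scsi":
        # Steps 912-913: in-band REPORT_LUNS; keep only LUNs whose PE bit is set,
        # distinguishing PE-related LUNs from conventional data LUNs.
        return [lun["wwn"] for lun in storage_system["luns"] if lun["pe_bit"]]
    else:
        # Step 914: out-of-band API call; each PE ID is an IP address + mount point.
        return [(pe["ip"], pe["mount_point"]) for pe in storage_system["pes"]]
```

The results would be stored per storage system (step 916) for use during later bind requests.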
Figure 9B is a flow diagram of method steps for the storage system manager 131, storage system manager 132, or distributed storage system manager 135 (hereinafter referred to as "the storage system manager") to discover the PEs to which a given computer system 103 is connected via an in-band path. The discovery of such PEs by the storage system manager enables the storage system, in response to a bind request from a requesting computer system, to return to the computer system a valid PE ID onto which the computer system can actually be connected. In step 950, the storage system manager issues an out-of-band "Discovery_Topology" API call to computer system 103 via the management interface and plug-in 622. Computer system 103 returns its system ID and the list of all PE IDs that it discovered via the flow diagram of Figure 9A. In one embodiment, the storage system manager executes step 950 by issuing the "Discovery_Topology" API call to management server 610 via the management interface and plug-in 612. In such an embodiment, the storage system will receive a response that contains multiple computer system IDs and associated PE IDs, one computer system ID and its associated PE IDs for each computer system 103 that management server 610 manages. Then, in step 952, the storage system manager processes the results from step 950. For example, the storage system manager removes from the list all PE IDs that are not under its current control. For example, certain PE IDs received by storage system manager 135 when issuing the Discovery_Topology call may correspond to another storage system connected to the same computer system. Similarly, certain received PE IDs may correspond to older PEs that have since been deleted by the storage system administrator, and so on. In step 954, the storage system manager caches the processed results for use during subsequent bind requests. In one embodiment, the storage system manager runs the steps of Figure 9B periodically to update its cached results with ongoing computer system and network topology changes. In another embodiment, the storage system manager runs the steps of Figure 9B each time it receives a new vvol creation request. In yet another embodiment, the storage system manager runs the steps of Figure 9B after running the authentication steps of Figure 7.
Figure 10 is a flow diagram of method steps for issuing and executing a virtual volume bind request using the bind virtual volume API. In one embodiment, computer system 103 issues a bind request to the storage system via out-of-band path 602 when one of its applications requests IO access to a block device associated with a vvol that has not yet been bound to a PE. In another embodiment, management server 610 issues the bind request in connection with certain VM management operations, including VM power-on and vvol migration from one storage container to another.

Continuing with the example described above, in which an application requests IO access to a block device associated with a vvol not yet bound to a PE, computer system 103 in step 1002 determines the vvol ID of the vvol from block device database 533 (or 580). Then, in step 1004, computer system 103 issues a request to bind the vvol to the storage system through out-of-band path 602.
The storage system manager receives the request to bind the vvol via the management interface (e.g., management interface 624, 625, or 626) in step 1006, and then carries out step 1008, which includes selecting the PE to which the vvol is to be bound, generating an SLLID and generation number for the selected PE, and updating connection database 312 (e.g., via IO manager 304). The PE to which the vvol is to be bound is selected according to connectivity (i.e., only PEs that have an existing in-band connection to computer system 103 are available for selection) and other factors, such as the current IO traffic through the available PEs. In one embodiment, the storage system selects from the processed and cached list of PEs that computer system 103 sent to it according to the method of Figure 9B. SLLID generation differs between the embodiment employing the storage system cluster of Figure 2A and the embodiment employing the storage system cluster of Figure 2B. In the former case, an SLLID that is unique for the selected PE is generated. In the latter case, a file path to the file object corresponding to the vvol is generated as the SLLID. After the SLLID and the generation number have been generated for the selected PE, connection database 312 is updated to include the newly generated IO session to the vvol. Then, in step 1010, the ID of the selected PE, the generated SLLID, and the generation number are returned to computer system 103. Optionally, in the embodiment employing the storage system cluster of Figure 2B, a unique NFS file handle may be generated for the file object corresponding to the vvol and returned to computer system 103 along with the ID of the selected PE, the generated SLLID, and the generation number. In step 1012, computer system 103 updates block device database 533 (or 580) to include the PE ID, SLLID (and optionally, the NFS handle), and generation number returned from the storage system. In particular, each set of PE ID, SLLID (and optionally, NFS handle), and generation number returned from the storage system is added as a new entry to block device database 533 (or 580). It should be recognized that the generation number is used to guard against replay attacks. Therefore, in embodiments where replay attacks are not a concern, the generation number is not used.
Upon subsequent bind requests to the same vvol initiated by a different application desiring to issue IOs to the same vvol, the storage system manager may bind the vvol to the same or a different PE. If the vvol is bound to the same PE, the storage system manager returns the ID of the same PE and the previously generated SLLID, and increments the reference count of this IO connection path stored in connection database 312. If, on the other hand, the vvol is bound to a different PE, the storage system manager generates a new SLLID, returns the ID of the different PE and the newly generated SLLID, and adds this new IO connection path to the vvol as a new entry in connection database 312.
A virtual volume unbind request may be issued using an unbind virtual volume API. An unbind request includes the PE ID and SLLID of the IO connection path by which the vvol was previously bound. The processing of the unbind request is, however, advisory. The storage system manager is free to unbind the vvol from the PE immediately or after a delay. The unbind request is processed by updating connection database 312 to decrement the reference count of the entry containing the PE ID and SLLID. If the reference count is decremented to zero, the entry may be deleted. It should be noted, in this case, that the vvol continues to exist, but is no longer available for IO using the given PE ID and SLLID.
In the case of a vvol that implements a virtual disk of a VM, the reference count for the vvol will be at least one. When the VM is powered off and an unbind request is issued in connection therewith, the reference count is decremented by one. If the reference count reaches zero, the vvol entry may be removed from connection database 312. In general, removing entries from connection database 312 is beneficial because IO manager 304 manages less data and can also recycle SLLIDs. Such benefits become significant when the total number of vvols stored by the storage system is large (e.g., on the order of millions) but the total number of vvols being actively accessed by applications is small (e.g., tens of thousands of VMs). Additionally, when a vvol is not bound to any PE, the storage system has greater flexibility in choosing where to store the vvol among DSUs 141. For example, the storage system can be implemented with asymmetrical, hierarchical DSUs 141, where some DSUs 141 provide faster data access and other DSUs 141 provide slower data access (e.g., to save on storage costs). In one implementation, when a vvol is not bound to any PE (which can be determined by checking the reference count of the vvol's entries in connection database 312), the storage system can migrate the vvol to a slower and/or cheaper type of physical storage. Then, once the vvol is bound to a PE, the storage system can migrate the vvol to a faster type of physical storage. It should be recognized that such migrations can be accomplished by changing one or more elements of the ordered list of container locations that make up the given vvol in vvol database 314, and updating the corresponding extent allocation bitmaps in the metadata sections of container database 316.
Binding and unbinding vvols to and from PEs enables the storage system manager to determine vvol liveness. The storage system manager may take advantage of this information to perform storage system vendor-specific optimizations on non-IO-serving (passive) and IO-serving (active) vvols. For example, the storage system manager may be configured to relocate a vvol from a low-latency (high cost) SSD to a mid-latency (low cost) hard drive if the vvol remains in a passive state beyond a particular threshold of time.
Figures 11A and 11B are flow diagrams of method steps for issuing an IO to a virtual volume, according to one embodiment. Figure 11A is a flow diagram of method steps 1100 for issuing an IO from an application directly to a raw block device, and Figure 11B is a flow diagram of method steps 1120 for issuing an IO from an application through a file system driver.

Method 1100 begins at step 1102, where an application (such as application 512 shown in Figures 5A-5B or VM 571 shown in Figures 5C-5D) issues an IO to a raw block device. In step 1104, virtual volume device driver 532 or 565 generates a raw block-level IO from the IO issued by the application. In step 1106, the name of the raw block device is translated to a PE ID and SLLID by virtual volume device driver 532 or 565 (and also to an NFS handle by NFS client 545 or 585 in the embodiment employing the storage device of Figure 2B). In step 1108, data access layer 540 or 566 encodes the PE ID and SLLID (and also the NFS handle in the embodiment employing the storage device of Figure 2B) into the raw block-level IO. Then, in step 1110, the HBA/NIC issues the raw block-level IO.
For non-VM applications, such as application 512 shown in Figures 5A-5B, method 1120 begins at step 1121. At step 1121, the application issues an IO to a file stored on a vvol-based block device. Then, at step 1122, the file system driver (e.g., file system driver 510) generates a block-level IO from the file IO. After step 1122, steps 1126, 1128, and 1130, which are identical to steps 1106, 1108, and 1110, are carried out.
For VM applications, such as VM 571 shown in Figures 5C-5D, method 1120 begins at step 1123. At step 1123, the VM issues an IO to its virtual disk. Then, at step 1124, this IO is translated to a file IO, for example by SCSI virtualization layer 563. The file system driver (e.g., VMFS driver 564) then generates a block-level IO from the file IO at step 1125. After step 1125, steps 1126, 1128, and 1130, which are identical to steps 1106, 1108, and 1110, are carried out.
Figure 12 is a flow diagram of method steps for performing an IO at a storage system, according to one embodiment. In step 1210, an IO issued by a computer system is received through one of the PEs configured in the storage system. The IO is parsed by IO manager 304 in step 1212. After step 1212, step 1214a is carried out by IO manager 304 if the storage system cluster is of the type shown in Figure 2A, and step 1214b is carried out by IO manager 304 if the storage system cluster is of the type shown in Figure 2B. In step 1214a, IO manager 304 extracts the SLLID from the parsed IO and accesses connection database 312 to determine the vvol ID corresponding to the PE ID and the extracted SLLID. In step 1214b, IO manager 304 extracts the NFS handle from the parsed IO and identifies the vvol using the PE ID and the NFS handle as the SLLID. Step 1216 is carried out after steps 1214a and 1214b. In step 1216, vvol database 314 and container database 316 are accessed by volume manager 306 and container manager 308, respectively, to obtain the physical storage locations on which the IO is to be performed. Then, in step 1218, data access layer 310 performs the IO on the physical storage locations obtained in step 1216.
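The resolution in steps 1214a and 1216 amounts to two lookups followed by an offset translation. The sketch below uses invented, simplified data shapes (plain dicts, fixed-size extents) purely to illustrate how a (PE ID, SLLID) pair and a linear vvol offset would map to a physical location; the actual databases hold richer state.

```python
def route_io(connection_db, vvol_db, pe_id, sllid, offset):
    """Illustrative sketch of steps 1214a and 1216.

    connection_db: (pe_id, sllid) -> vvol ID        (connection database 312)
    vvol_db:       vvol ID -> (container ID, ordered extent list, extent size)
                                                    (vvol database 314)
    Returns (container ID, extent, offset within extent).
    """
    vvol_id = connection_db[(pe_id, sllid)]          # step 1214a: identify the vvol
    container_id, extents, extent_size = vvol_db[vvol_id]
    # Step 1216: translate the linear vvol offset into a physical location.
    extent = extents[offset // extent_size]
    return container_id, extent, offset % extent_size
```

For example, with 4096-byte extents, an IO at vvol offset 5000 lands 904 bytes into the vvol's second extent.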
In some situations, an application (application 512 or VM 571), management server 610, and/or the storage system manager may determine that a binding of a vvol to a particular PE is experiencing problems, such as when the PE becomes overloaded with too many bindings. As a way to resolve such problems, the storage system manager may rebind a bound vvol to a different PE, even while IO commands are being directed to the vvol. Figure 13 is a flow diagram of method steps 1300 for issuing and executing a vvol rebind request using a rebind API, according to one embodiment.
As shown, method 1300 begins at step 1302, where the storage system manager determines that a vvol should be bound to a second PE that is different from the first PE to which the vvol is currently bound. In step 1304, the storage system manager issues, via an out-of-band path, a request to rebind the vvol to a computer system (e.g., computer system 103) running an application that is issuing IOs to the vvol. In step 1306, computer system 103 receives the rebind request from the storage system manager and, in response, issues a request to bind the vvol to a new PE. In step 1308, the storage system manager receives the rebind request and, in response, binds the vvol to the new PE. In step 1310, the storage system manager transmits to the computer system the ID of the new PE to which the vvol is now also bound and the SLLID for accessing the vvol, as described above in conjunction with Figure 10.

In step 1312, the computer system receives the new PE ID and SLLID from the storage system manager. In block device database 533 or 580, the active bit of the new PE connection is initially set to 1, meaning that a new IO session to the vvol via the new PE has been established. The computer system also sets the active bit of the first PE connection to 0, signifying that no more IOs can be issued to the vvol through this PE connection. It should be recognized that this PE connection should not be unbound immediately upon deactivation, because there may be IOs to the vvol through this PE connection that are in flight, i.e., issued but not yet completed. Therefore, in step 1314, the computer system accesses block device database 533 or 580 to check whether all "commands in flight" (CIFs) issued to the vvol through the first PE connection have been completed, i.e., whether CIF = 0. The computer system waits for the CIF to reach zero before carrying out step 1318. In the meantime, additional IOs to the vvol are issued through the new PE, since the active bit of the new PE connection is set to 1. When the CIF reaches zero, step 1318 is carried out, where a request to unbind the first PE connection is issued to the storage system manager. Then, in step 1320, the storage system manager unbinds the vvol from the first PE. Also, in step 1324, the computer system issues all additional IOs to the vvol through the new PE.
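The host-side portion of the rebind (steps 1312-1318) can be sketched as follows. The class below is an assumed stand-in for block device database 533 or 580, tracking only the active bit and CIF count that the text describes; all names are illustrative.

```python
class HostBlockDeviceDB:
    """Illustrative sketch of the rebind drain logic of Figure 13."""
    def __init__(self):
        self.connections = {}   # pe_id -> {"active": bool, "cif": int}

    def rebind(self, old_pe, new_pe):
        # Step 1312: the new PE connection starts active; new IOs go through it.
        self.connections[new_pe] = {"active": True, "cif": 0}
        # Deactivate the old PE connection; no new IOs may be issued through it.
        self.connections[old_pe]["active"] = False

    def complete_io(self, pe_id):
        # An in-flight command on this connection has finished.
        self.connections[pe_id]["cif"] -= 1

    def ready_to_unbind(self, pe_id):
        # Step 1314: the old connection may be unbound only once it is inactive
        # and all commands in flight (CIF) have drained to zero.
        c = self.connections[pe_id]
        return (not c["active"]) and c["cif"] == 0
```

The key design point illustrated here is that deactivation and unbinding are decoupled: the old path is fenced off immediately, but torn down only after its in-flight IOs complete.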
Figure 14 is a conceptual diagram of the lifecycle of a virtual volume, according to one embodiment. All of the commands shown in Figure 14 (namely, create, snapshot, clone, bind, unbind, extend, and delete) form a vvol management command set and are accessible through the plug-ins 612, 622 described above in conjunction with Figure 6. As shown, when a vvol is generated as a result of any of the commands create vvol, snapshot vvol, or clone vvol, the generated vvol remains in a "passive" state, in which the vvol is not bound to a particular PE and therefore cannot receive IOs. In addition, when any of the commands snapshot vvol, clone vvol, or extend vvol is executed while a vvol is in the passive state, the original vvol and the newly created vvol (if any) remain in the passive state. As also shown, when a vvol in the passive state is bound to a PE, the vvol enters an "active" state. Conversely, when an active vvol is unbound from a PE, the vvol enters the passive state, assuming that the vvol is not bound to any additional PEs. When any of the commands snapshot vvol, clone vvol, extend vvol, or rebind vvol is executed while a vvol is in the active state, the original vvol remains in the active state, and the newly created vvol (if any) remains in the passive state.
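The passive/active transitions of Figure 14 reduce to a small state machine driven by bind and unbind. The sketch below is illustrative only; it captures just the rule stated above that a vvol is active exactly while it is bound to at least one PE.

```python
class Vvol:
    """Illustrative sketch of the Figure 14 lifecycle states."""
    def __init__(self):
        self.bound_pes = set()   # a newly created vvol is bound to no PE

    @property
    def state(self):
        # Active iff bound to at least one PE; otherwise passive (cannot receive IOs).
        return "active" if self.bound_pes else "passive"

    def bind(self, pe_id):
        self.bound_pes.add(pe_id)

    def unbind(self, pe_id):
        self.bound_pes.discard(pe_id)
```

Note that unbinding from one PE leaves the vvol active if it remains bound to another PE, matching the "assuming that the vvol is not bound to any additional PEs" qualification above.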
As described above, a VM may have multiple virtual disks, and a separate vvol is created for each virtual disk. The VM also has metadata files that describe the configuration of the VM. The metadata files include a VM configuration file, VM log files, disk descriptor files, a VM swap file, and the like, with one disk descriptor file for each of the VM's virtual disks. The disk descriptor file for a virtual disk contains information relating to the virtual disk, such as its vvol ID, its size, whether the virtual disk is thinly provisioned, identification of one or more snapshots created for the virtual disk, and so on. The VM swap file provides swap space for the VM on the storage system. In one embodiment, these VM configuration files are stored in a vvol, which is referred to herein as the metadata vvol.
Figure 15 is a flow diagram of method steps for provisioning a VM, according to one embodiment. In this embodiment, management server 610, a computer system hosting the VM, e.g., computer system 102 shown in Figure 5C (hereinafter referred to as the "host computer"), and the storage system cluster of Figure 2A, in particular storage system manager 131, 132, or 135, are used. As shown, the storage system manager receives a request to provision the VM in step 1502. This may be a request generated when a VM administrator, using appropriate user interfaces to management server 610, issues to management server 610 a command to provision a VM having a certain size and storage capability profile. In response to the command, in step 1504, management server 610 initiates the method for creating a vvol to contain the VM's metadata (hereinafter referred to as the "metadata vvol") in the manner described above in conjunction with Figure 8, pursuant to which the storage system manager in step 1508 creates the metadata vvol and returns the vvol ID of the metadata vvol to management server 610. In step 1514, management server 610 registers the vvol ID of the metadata vvol back to the computer system hosting the VM. In step 1516, the host computer initiates the method for binding the metadata vvol to a PE in the manner described above in conjunction with Figure 10, pursuant to which the storage system manager in step 1518 binds the metadata vvol to a PE and returns the PE ID and an SLLID to the host computer.
In step 1522, the host computer creates a block device instance of the metadata vvol using a "create device" call into the host computer's operating system. Then, in step 1524, the host computer creates a file system (e.g., VMFS) on the block device, in response to which a file system ID (FSID) is returned. In step 1526, the host computer mounts the file system having the returned FSID and stores the metadata of the VM in the namespace associated with this file system. Examples of the metadata include VM log files, disk descriptor files, and a VM swap file, with one disk descriptor file for each of the VM's virtual disks.
In step 1528, the host computer initiates the method for creating a vvol for each of the VM's virtual disks (each such vvol referred to herein as a "data vvol") in the manner described above in conjunction with Figure 8, pursuant to which the storage system manager in step 1530 creates the data vvol and returns the vvol ID of the data vvol to the host computer. In step 1532, the host computer stores the ID of the data vvol in the disk descriptor file for the virtual disk. The method ends with the unbinding of the metadata vvol (not shown) after data vvols have been created for all of the VM's virtual disks.
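The ordering of Figure 15 (metadata vvol first, then one data vvol per virtual disk, with each data vvol ID recorded in the corresponding disk descriptor file) can be sketched as below. `FakeStorage` and the dict-based descriptor files are stand-ins invented for illustration; the bind/mount steps are omitted.

```python
import itertools

class FakeStorage:
    """Illustrative stand-in for the storage system manager's create-vvol API."""
    def __init__(self):
        self._ids = itertools.count(1)

    def create_vvol(self):
        return f"vvol-{next(self._ids)}"

def provision_vm(storage, virtual_disks):
    """Sketch of the Figure 15 provisioning order (bind/mount steps omitted)."""
    metadata_vvol = storage.create_vvol()       # steps 1504-1508: metadata vvol first
    descriptors = {}
    for disk_name, size in virtual_disks.items():
        data_vvol = storage.create_vvol()       # steps 1528-1530: one data vvol per disk
        # Step 1532: the disk descriptor file records the data vvol's ID.
        descriptors[disk_name] = {"vvol_id": data_vvol, "size": size}
    return metadata_vvol, descriptors
```

The point of the ordering is that the descriptor files, which live on the metadata vvol, must exist before the data vvol IDs can be recorded in them.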
Figure 16A is a flow diagram of method steps for powering on a VM after the VM has been provisioned in the manner described in conjunction with Figure 15. Figure 16B is a flow diagram of method steps for powering off a VM after the VM has been powered on. Both methods are carried out by the host computer for the VM.
Upon receiving a VM power-on command in step 1608, the metadata vvol corresponding to the VM is retrieved in step 1610. Then, in step 1612, the metadata vvol undergoes the bind process described above in conjunction with Figure 10. The file system is mounted on the metadata vvol in step 1614, so that in step 1616 the metadata files for the data vvols, in particular the disk descriptor files, can be read and the data vvol IDs obtained. The data vvols then undergo, one by one, the bind process described above in conjunction with Figure 10 in step 1618.
Upon receiving a VM power-off command in step 1620, the data vvols of the VM are marked as inactive in the block device database (e.g., block device database 580 of Figure 5C), and the host computer waits for the CIF associated with each of the data vvols to reach zero (step 1622). As the CIF associated with each data vvol reaches zero, the host computer in step 1624 requests the storage system to unbind that data vvol. After the CIFs associated with all data vvols have reached zero, the metadata vvol is marked as inactive in the block device database in step 1626. Then, in step 1628, when the CIF associated with the metadata vvol reaches zero, the host computer in step 1630 requests the metadata vvol to be unbound.
Figures 17 and 18 are flow diagrams of method steps for reprovisioning a VM. In the examples illustrated herein, Figure 17 is a flow diagram of method steps executed on the host computer for extending the size of a vvol of a VM, in particular a data vvol for a virtual disk of the VM, and Figure 18 is a flow diagram of method steps executed in the storage system for moving a vvol of a VM between storage containers.
The method for extending the size of a data vvol for a VM's virtual disk begins at step 1708, where the host computer determines whether the VM is powered on. If the host computer determines in step 1708 that the VM is not powered on, the host computer retrieves the ID of the metadata vvol corresponding to the VM in step 1710. Then, the host computer initiates the bind process for the metadata vvol in step 1712. After the bind, in step 1714, the host computer mounts a file system on the metadata vvol and retrieves the ID of the data vvol corresponding to the virtual disk from the disk descriptor file for the virtual disk, which is a file in the file system mounted on the metadata vvol. Then, in step 1716, the host computer sends an extend-vvol API call to the storage system, where the extend-vvol API call includes the ID of the data vvol and the new size of the data vvol.
If the VM is powered on, the host computer retrieves the ID of the data vvol of the VM's virtual disk to be extended in step 1715. It should be recognized from the method of Figure 16A that this ID can be obtained from the disk descriptor file associated with the VM's virtual disk. Then, in step 1716, the host computer sends an extend-vvol API call to the storage system, where the extend-vvol API call includes the ID of the data vvol and the new size of the data vvol.
The extend-vvol API call results in the vvol database and the container database (e.g., vvol database 314 and container database 316 of Figure 3) being updated within the storage system to reflect the increased address space of the vvol. Upon receiving acknowledgement that the extend-vvol API call has completed, the host computer in step 1718 updates the disk descriptor file for the VM's virtual disk with the new size. Then, in step 1720, the host computer determines whether the VM is powered on. If it is not, the host computer in step 1722 unmounts the file system and sends a request to the storage system to unbind the metadata vvol. If, on the other hand, the VM is powered on, the method terminates.
The method for moving a vvol currently bound to a PE from a source storage container to a destination storage container, where both the source and destination storage containers are within the scope of the same storage system manager, begins at step 1810, where the container IDs of the source and destination storage containers (SC1 and SC2, respectively) and the vvol ID of the vvol to be moved are received. Then, in step 1812, the vvol database (e.g., vvol database 314 of Figure 3) and the extent allocation bitmap of the container database (e.g., container database 316 of Figure 3) are updated as follows. First, the storage system manager removes the vvol extents in SC1 from SC1's entry in container database 316, and then assigns these extents to SC2 by modifying SC2's entry in container database 316. In one embodiment, the storage system may compensate for the loss of storage capacity in SC1 (due to the removal of the vvol's storage extents) by assigning new spindle extents to SC1, and make up for the increase in storage capacity in SC2 (due to the addition of the vvol's storage extents) by removing some unused spindle extents from SC2. In step 1814, the storage system manager determines whether the currently bound PE is able to optimally service IO to the vvol's new location. One example in which the current PE cannot service IO to the vvol's new location is if the storage administrator has statically configured the storage system manager to assign different PEs to vvols from different customers, and therefore from different storage containers. If the current PE cannot service IO to the vvol, the vvol undergoes, in step 1815, the rebinding process described above in conjunction with Figure 13 (along with the associated changes to the connection database, e.g., connection database 312 of Figure 3). After step 1815, step 1816 is carried out, in which a confirmation of the successful move is returned to the host computer. If the storage system manager determines in step 1814 that the current PE is able to service IO to the vvol's new location, step 1815 is bypassed and step 1816 is carried out next.
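A minimal sketch of steps 1810 through 1816, under the simplifying assumption that the container database is a dictionary of per-container extent maps and that the Figure 13 rebind process reduces to swapping the PE recorded in the connection database. All names here are illustrative.

```python
def move_vvol(container_db, connection_db, vvol_id, src_id, dst_id,
              pe_serves_new_location):
    """Move a vvol's extents from container src_id to dst_id (steps 1810-1816)."""
    # Step 1812: remove the vvol's extents from SC1's entry and assign them
    # to SC2 by modifying SC2's entry in the container database.
    extents = container_db[src_id].pop(vvol_id)
    container_db[dst_id][vvol_id] = extents
    # Steps 1814-1815: if the currently bound PE cannot optimally serve IO to
    # the new location, rebind (stands in for the Figure 13 process).
    if not pe_serves_new_location:
        connection_db[vvol_id] = "new-PE"
    return "move-confirmed"  # step 1816: confirmation returned to the host
```

Spindle-extent rebalancing between SC1 and SC2, described in the text, is omitted here for brevity.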
When a vvol is moved between incompatible storage containers (e.g., between storage containers created in the storage devices of different manufacturers), data movement is carried out between the storage containers in addition to the changes to container database 316, vvol database 314, and connection database 312. In one embodiment, the data movement techniques described in U.S. Patent Application No. 12/129,323, filed May 29, 2008 and entitled "Offloading Storage Operations to Storage Hardware," the entire contents of which are incorporated herein by reference, are employed.
Figure 19 is a flow diagram of method steps carried out in the host computer and the storage system for cloning a VM from a template VM. This method begins at step 1908, where the host computer sends a request to the storage system to create a metadata vvol for the new VM. At 1910, the storage system creates a metadata vvol for the new VM in accordance with the method described above in conjunction with Figure 8, and returns the new metadata vvol ID to the host computer. Then, in step 1914, a clone-vvol API call is issued from the host computer to the storage system via out-of-band path 601 for all data vvol IDs belonging to the template VM. In step 1918, the storage system manager checks whether the data vvols of the template VM and the new VM are compatible. It should be recognized that the data vvols may not be compatible if the cloning occurs between storage containers created in the storage systems of different manufacturers. If there is compatibility, step 1919 is carried out. In step 1919, the storage system manager creates new data vvols by generating new data vvol IDs, updating the allocation bitmap in container database 316, and adding new vvol entries to vvol database 314, and copies the contents stored in the template VM's data vvols to the new VM's data vvols. In step 1920, the storage system manager returns the new data vvol IDs to the host computer. Receipt of the new data vvol IDs provides the host computer with confirmation that the data vvol cloning completed without error. Then, in step 1925, the host computer issues IO to the metadata vvol of the new VM to update the metadata files, in particular the disk descriptor files, with the newly generated data vvol IDs. The IO issued by the host computer to the storage system is carried out by the storage system in step 1926, as a result of which the disk descriptor files of the new VM are updated with the newly generated data vvol IDs.
If, in step 1918, the storage system manager determines that the data vvols of the template VM and the new VM are not compatible, an error message is returned to the host computer. Upon receiving this error message, the host computer issues, in step 1921, a create-vvol API call to the storage system to create new data vvols. In step 1922, the storage system manager creates the new data vvols by generating new data vvol IDs, updating the allocation bitmap in container database 316, and adding new vvol entries to vvol database 314, and returns the new data vvol IDs to the host computer. In step 1923, the host computer carries out data movement in accordance with the techniques described in U.S. Patent Application No. 12/356,694, filed January 21, 2009 and entitled "Data Mover for Computer System," the entire contents of which are incorporated herein by reference (step 1923). After step 1923, steps 1925 and 1926 are carried out as described above.
Figure 20 is a flow diagram of method steps for provisioning a VM according to another embodiment. In this embodiment, management server 610, a computer system hosting the VM (e.g., computer system 102 shown in Figure 5D, hereinafter referred to as the "host computer"), and the storage system cluster of Figure 2B (in particular storage system manager 131, 132, or 135) are employed. As illustrated, a request to provision the VM is received at step 2002. This may be a request generated when a VM administrator, using appropriate user interfaces to management server 610, issues a command to management server 610 to provision a VM having a certain size and storage capability profile. In response to this command, in step 2004, management server 610 initiates the method described above in conjunction with Figure 8 for creating a vvol to contain the VM's metadata, in particular a metadata vvol. In this method, the storage system manager creates the metadata vvol, which is a file in the NAS device, in step 2008, and returns the metadata vvol ID to management server 610. In step 2020, management server 610 registers the vvol ID of the metadata vvol back to the host computer. In step 2022, the host computer issues a bind request for the metadata vvol ID to the storage system; in response to this bind request the storage system returns, in step 2023, an IP address and a directory path as the PE ID and the SLLID, respectively. In step 2024, the host computer mounts the directory at the specified IP address and directory path, and stores metadata files in the mounted directory. In embodiments that use NFS, NFS client 545 or 585 may resolve the given IP address and directory path into an NFS handle in order to issue NFS requests to such directory.
In step 2026, the host computer initiates the method for creating a data vvol for each of the VM's virtual disks in the manner described above in conjunction with Figure 8. In this method, the storage system manager creates the data vvol in step 2030 and returns the vvol ID of the data vvol to the host computer. In step 2032, the host computer stores the ID of the data vvol in the disk descriptor file for the virtual disk. The method ends with the unbinding of the metadata vvol (not shown) after data vvols have been created for all of the VM's virtual disks.
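The NAS-style bind step of Figure 20, where the bind response carries an IP address as the PE ID and a directory path as the SLLID, can be sketched as follows. The `catalog` mapping and all function names are illustrative assumptions, not part of the patent's interfaces.

```python
def bind_metadata_vvol(catalog, vvol_id):
    """Steps 2022-2023: the storage system answers a bind request with
    (PE ID, SLLID): here an NFS server address and a directory path."""
    pe_ip, directory = catalog[vvol_id]
    return {"pe_id": pe_ip, "sllid": directory}

def mount_and_store(binding, files):
    """Step 2024: mount the directory identified by the binding and store
    the VM's metadata files in it (mount is simulated here)."""
    return {
        "mount": f"{binding['pe_id']}:{binding['sllid']}",
        "files": list(files),
    }
```

An NFS client would resolve the `pe_id`/`sllid` pair into an NFS handle before issuing requests, as the text notes for NFS clients 545 and 585.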
As described above, when a new vvol is created from a storage container and no storage capability profile is explicitly specified for the new vvol, the new vvol inherits the storage capability profile associated with the storage container. The storage capability profile associated with a storage container may be selected from one of several different profiles. For example, as shown in Figure 21, the different profiles include a production (prod) profile 2101, a development (dev) profile 2102, and a test profile 2103 (collectively referred to herein as "profiles 2100"). It should be recognized that many other profiles may be defined. As illustrated, each profile entry of a particular profile is of a fixed type or a variable type, and has a name and one or more values associated with it. A profile entry of the fixed type has a fixed number of selectable items. For example, the profile entry "Replication" may be set to TRUE or FALSE. In contrast, a profile entry of the variable type does not have predefined selections. Instead, a default value and a range of values are set for a variable-type profile entry, and the user may select any value within the range. If no value is specified, the default value is used. In the exemplary profiles 2100 shown in Figure 21, variable-type profile entries have three numbers separated by commas. The first number is the lower end of the specified range, and the second number specifies the higher end of the range. The third number is the default value. Thus, a vvol that inherits the storage capability profile defined in production profile 2101 will be replicated (Replication.Value = TRUE), and the recovery time objective (RTO) for the replication may be defined in the range of 0.1 to 24 hours, with a default of 1 hour. In addition, snapshots are allowed for this vvol (Snapshot.Value = TRUE). The number of snapshots retained is in the range 1 to 100, with a default of 1, and the snapshot frequency is in the range of once per hour to once every 24 hours, with a default of once per hour. The SnapInherit column indicates whether the given profile attribute (and its values) should be propagated to a derivative vvol when a given vvol is snapshotted to create a new vvol that is a derivative vvol. In the example of production profile 2101, only the first two profile entries (Replication and RTO) may be propagated to a snapshot vvol of the given vvol. The values of all other attributes of the snapshot vvol will be set to the default values specified in the profile. In other words, any customizations of these other attributes on the given vvol (for example, a non-default value for snapshot frequency) will not be propagated to the snapshot vvol, because their corresponding SnapInherit columns are FALSE. The profile also contains other columns, such as CloneInherit (not shown) and ReplicaInherit (not shown), which control which attribute values are propagated to clones and replicas, respectively, of a given vvol.
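The profile structure just described can be modeled as below: fixed entries carry an enumerated value, variable entries carry a (min, max) range plus a default, and a per-entry SnapInherit flag decides what a snapshot vvol inherits. The entry names and numbers follow the Figure 21 production-profile example; the dictionary layout itself is an assumption for illustration.

```python
# Illustrative model of production profile 2101 from Figure 21.
PROD_PROFILE = {
    "Replication":  {"value": True, "snap_inherit": True},                   # fixed type
    "RTO":          {"range": (0.1, 24.0), "default": 1.0, "snap_inherit": True},
    "Snapshot":     {"value": True, "snap_inherit": False},                  # fixed type
    "SnapRetained": {"range": (1, 100), "default": 1, "snap_inherit": False},
    "SnapFreqHrs":  {"range": (1, 24), "default": 1, "snap_inherit": False},
}

def snapshot_profile(parent_profile, parent_customizations):
    """Build a snapshot vvol's profile: entries with SnapInherit=True keep
    the parent's customized value; all others revert to the profile default."""
    out = {}
    for name, entry in parent_profile.items():
        default = entry.get("value", entry.get("default"))
        if entry["snap_inherit"] and name in parent_customizations:
            out[name] = parent_customizations[name]
        else:
            out[name] = default
    return out
```

This reproduces the behavior described above: a customized RTO propagates to the snapshot vvol, while a customized snapshot frequency does not, because its SnapInherit column is FALSE.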
When a storage container is created in accordance with the method of Figure 4, the types of storage capability profiles that may be defined for vvols created from the storage container can be set. The flow diagram in Figure 21 illustrates the method for creating a storage container shown in Figure 4, with step 2110 inserted between steps 412 and 413. In step 2110, the storage administrator selects one or more of profiles 2100 for the storage container being created. For example, a storage container created for a particular customer may be associated with production profile 2101 and development profile 2102, so that a vvol of the production type will inherit the storage capability profile defined in production profile 2101 with default values or customer-specified values, as the case may be, and a vvol of the development type will inherit the storage capability profile defined in development profile 2102 with default values or customer-specified values, as the case may be.
Figure 22 is a flow diagram illustrating method steps carried out by storage system manager 131, 132, or 135 for creating a vvol and defining a storage capability profile for the vvol. The method steps of Figure 22, in particular method steps 2210, 2212, 2218, and 2220, correspond respectively to steps 806, 810, 812, and 814 shown in Figure 8. In addition, the method steps of Figure 22 include steps 2214, 2215, and 2216, which define the storage capability profile for the vvol being created.
In step 2214, the storage system manager determines whether values to be used in the storage capability profile have been specified in the request to create the vvol. If they have not, the storage system manager employs, in step 2215, the storage capability profile associated with the vvol's storage container as the vvol's storage capability profile with default values. If the values to be used in the storage capability profile have been specified, the storage system manager employs, in step 2216, the storage capability profile associated with the vvol's storage container as the vvol's storage capability profile with the specified values in place of the default values.
In one embodiment, the storage capability profile of a vvol is stored in vvol database 314 as key-value pairs. Once the storage capability profile of a vvol has been defined and stored in vvol database 314 as key-value pairs, and as long as replication- and snapshot-related attributes and values are part of this profile as shown in the exemplary profiles of Figure 21, the storage system is able to perform replication and snapshots for the vvol with no further instructions issued by the host computer.
Figure 23 is a flow diagram illustrating method steps carried out by storage system manager 131, 132, or 135 for creating snapshots from a parent vvol. In one embodiment, a snapshot tracking data structure is employed to schedule snapshots in accordance with snapshot definitions in the storage capability profiles of a given vvol. Upon reaching the scheduled time for a snapshot, the storage system manager, in step 2310, retrieves the vvol ID from the snapshot tracking data structure. Then, in step 2312, the storage system manager generates a unique vvol ID for the snapshot. The storage system manager, in step 2315, employs the storage capability profile of the parent vvol (i.e., the vvol having the vvol ID retrieved from the snapshot tracking data structure) as the snapshot vvol's storage capability profile. It should be noted that, because this is an automated, profile-driven snapshot process driven by the storage system, the user does not get the opportunity to specify custom values to be used in the snapshot vvol's storage capability profile. In step 2318, the storage system manager creates the snapshot vvol within the storage container of the parent vvol by updating the allocation bitmap in container database 316 and adding a new vvol entry for the snapshot vvol to vvol database 314. Then, in step 2320, the storage system manager updates the snapshot tracking data structure by scheduling a time for generating the next snapshot for the parent vvol. It should be recognized that the storage system manager must concurrently maintain snapshot tracking data structures and carry out the method steps of Figure 23 for all vvols whose storage capability profiles mandate scheduled snapshots.
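The Figure 23 loop can be sketched with a priority queue standing in for the snapshot tracking data structure: pop each due entry, create a snapshot vvol that inherits the parent's profile, and reschedule the next snapshot. The heap-based tracker, the ID format, and the dictionary databases are all illustrative assumptions.

```python
import heapq

def run_due_snapshots(tracker, vvol_db, now):
    """Process all snapshot-tracking entries whose scheduled time has arrived.
    tracker: heap of (scheduled_time, parent_vvol_id) tuples."""
    created = []
    while tracker and tracker[0][0] <= now:
        due, parent_id = heapq.heappop(tracker)       # step 2310: fetch vvol ID
        snap_id = f"{parent_id}-snap-{due}"           # step 2312: unique snapshot ID
        vvol_db[snap_id] = dict(vvol_db[parent_id])   # step 2315: inherit profile
        created.append(snap_id)                       # step 2318: new vvol entry
        freq = vvol_db[parent_id]["SnapFreqHrs"]
        heapq.heappush(tracker, (due + freq, parent_id))  # step 2320: reschedule
    return created
```

Because the manager maintains one tracking entry per snapshot-mandating vvol, a single heap naturally handles the concurrent schedules the text describes.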
After snapshots are created in the manner described above, key-value pairs stored in vvol database 314 are updated to indicate that the snapshot vvols are of type = snapshot. Also, in embodiments in which a generation number is maintained for the snapshots (the generation number being incremented each time a snapshot is taken, or set to be equal to date + time), the generation number is stored as a key-value pair. The parent vvol ID of a snapshot vvol is also stored in the snapshot vvol's entry as a key-value pair. As a result, a host computer may query vvol database 314 for snapshots corresponding to a particular vvol ID. It is also possible for the host computer to issue a query to the vvol database for snapshots corresponding to a particular vvol ID and a particular generation number.
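Given that snapshot metadata is stored as key-value pairs, the two queries just described (all snapshots of a vvol, and snapshots of a vvol at a specific generation) reduce to filtering the vvol database on those keys. The field names `type`, `parent`, and `generation` are illustrative choices, not the patent's actual key names.

```python
def query_snapshots(vvol_db, parent_id, generation=None):
    """Return IDs of snapshot vvols whose key-value pairs name parent_id as
    the parent vvol, optionally restricted to a specific generation number."""
    return [
        vid for vid, kv in vvol_db.items()
        if kv.get("type") == "snapshot"
        and kv.get("parent") == parent_id
        and (generation is None or kv.get("generation") == generation)
    ]
```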
The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals, where they, or representations of them, are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments may be useful machine operations. In addition, one or more embodiments also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
The various embodiments described herein may be practiced with other computer system configurations, including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
One or more embodiments may be implemented as one or more computer programs, or as one or more computer program modules embodied in one or more computer-readable media. The term computer-readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer-readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer-readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (compact disc), a CD-ROM, a CD-R or CD-RW, a DVD (digital versatile disc), a magnetic tape, and other optical and non-optical data storage devices. The computer-readable medium can also be distributed over a network-coupled computer system so that the computer-readable code is stored and executed in a distributed fashion.
Although one or more embodiments have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. For example, SCSI is employed as the protocol for SAN devices and NFS is used as the protocol for NAS devices. Any alternative to the SCSI protocol may be used, such as Fibre Channel, and any alternative to the NFS protocol may be used, such as CIFS (Common Internet File System). Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated otherwise in the claims.
In addition, while the described virtualization methods have generally assumed that virtual machines present interfaces consistent with a particular hardware system, the methods described may be used in conjunction with virtualizations that do not correspond directly to any particular hardware system. Virtualization systems in accordance with the various embodiments, implemented as hosted embodiments, non-hosted embodiments, or as embodiments that tend to blur distinctions between the two, are all envisioned. Furthermore, various virtualization operations may be wholly or partially implemented in hardware. For example, a hardware implementation may employ a look-up table for modification of storage access requests to secure non-disk data. Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest operating system that perform virtualization functions. Plural instances may be provided for components, operations, or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the embodiments described herein. In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claims.
Claims (14)
1. A computer system (100) having a plurality of virtual machines (571) running therein, each of the virtual machines (571) having a virtual disk (575) that is managed as a separate logical storage volume (151, 152) in a storage system (130), the computer system comprising:
a hardware storage interface (503, 504) configured to issue input-output commands (IOs) to the storage system (130);
characterized in that the computer system (100) further comprises:
a virtualization software module (565) configured to receive read/write requests from the virtual machines (571) and to generate a first IO to be issued through the hardware storage interface (503, 504) in accordance with a read/write request from a first virtual machine (571₁) and a block device name associated with the virtual disk (575A) of the first virtual machine (571₁), and a second IO to be issued through the hardware storage interface (503, 504) in accordance with a read/write request from a second virtual machine (571₂) and a block device name associated with the virtual disk (575) of the second virtual machine (571₂),
wherein each of the first IO and the second IO includes a protocol endpoint identifier (PE ID) and a second level identifier (SLLID), the protocol endpoint identifier (PE ID) identifying a protocol endpoint (161) in the storage system (130) that is associated with one or more of the logical storage volumes (151, 152), and the second level identifier (SLLID) identifying, from among the one or more logical storage volumes (151, 152), the respective logical storage volume (151) associated with the protocol endpoint (161).
2. The computer system according to claim 1, wherein the virtualization software module (565) is further configured to maintain a mapping data structure (580) that provides a mapping of block device names to the protocol endpoint identifiers and the second level identifiers.
3. The computer system according to claim 2, wherein the mapping data structure (580) further includes an entry for each of the block device names, the entry indicating whether the logical storage volume (151) associated with the block device name is active or inactive.
4. The computer system according to claim 3, wherein the mapping data structure (580) further includes an entry for each of the block device names, the entry indicating the number of in-flight IOs directed to the logical storage volume (151) associated with the block device name.
5. The computer system according to claim 4, wherein the virtualization software module (565) is further configured to update the number of in-flight IOs when read data or a write acknowledgement is returned from the logical storage volume (151) through the hardware storage interface (503, 504).
6. The computer system according to claim 1, wherein the protocol endpoint identifier in the first IO and the protocol endpoint identifier in the second IO are the same, and the second level identifier in the first IO and the second level identifier in the second IO are different.
7. The computer system according to claim 1, wherein the protocol endpoint identifier in the first IO and the protocol endpoint identifier in the second IO are different.
8. The computer system according to claim 1, wherein the virtualization software module (565) is further configured to receive a signal from a protocol endpoint (161) of the storage system (130) that an error has occurred, determine that the error is an error associated with the protocol endpoint (161), and issue an error event indicating that each of the one or more logical storage volumes (151, 152) accessible via the protocol endpoint (161) is unavailable.
9. The computer system according to claim 8, wherein the one or more logical storage volumes (151, 152) have corresponding block device names that are identified using a mapping data structure (580) that provides a mapping of block device names to protocol endpoint identifiers.
10. The computer system according to claim 1, wherein the virtualization software module (565) is further configured to receive a signal from a protocol endpoint (161) of the storage system (130) that an error has occurred, determine that the error is an error associated with a logical storage volume (151), and cease issuing any additional IO commands to the logical storage volume (151).
11. The computer system according to claim 1, wherein the protocol endpoint identifier (PE ID) includes a World Wide Name for a LUN.
12. The computer system according to claim 1, wherein the protocol endpoint identifier (PE ID) includes an IP address and a mount point.
13. A method for use in a computer system (100) having a plurality of virtual machines (571) running therein, each of the virtual machines (571) having a virtual disk (575) that is managed as a separate logical storage volume (151, 152) in a storage system (130), the method comprising:
issuing a first input-output command (IO) and a second IO to the storage system (130) through a hardware storage interface (503, 504);
characterized by the steps of:
receiving read/write requests from the virtual machines (571) and generating the first IO to be issued through the hardware storage interface (503, 504) in accordance with a read/write request from a first virtual machine (571₁) and a block device name associated with the virtual disk (575A) of the first virtual machine (571₁), and generating the second IO to be issued through the hardware storage interface (503, 504) in accordance with a read/write request from a second virtual machine (571₂) and a block device name associated with the virtual disk (575) of the second virtual machine (571₂),
wherein each of the first IO and the second IO includes a protocol endpoint identifier (PE ID) and a second level identifier (SLLID), the protocol endpoint identifier (PE ID) identifying a protocol endpoint (161) in the storage system (130) that is associated with one or more of the logical storage volumes (151, 152), and the second level identifier (SLLID) identifying, from among the one or more logical storage volumes (151, 152), the respective logical storage volume (151) associated with the protocol endpoint (161).
14. An apparatus for use in a computer system (100) having a plurality of virtual machines (571) running therein, each of the virtual machines (571) having a virtual disk (575) that is managed as a separate logical storage volume (151, 152) in a storage system (130), the apparatus comprising:
means for issuing a first input-output command (IO) and a second IO to the storage system (130) through a hardware storage interface (503, 504);
characterized in that the apparatus further comprises:
means for receiving read/write requests from the virtual machines (571) and generating the first IO to be issued through the hardware storage interface (503, 504) in accordance with a read/write request from a first virtual machine (571₁) and a block device name associated with the virtual disk (575A) of the first virtual machine (571₁), and generating the second IO to be issued through the hardware storage interface (503, 504) in accordance with a read/write request from a second virtual machine (571₂) and a block device name associated with the virtual disk (575) of the second virtual machine (571₂),
wherein each of the first IO and the second IO includes a protocol endpoint identifier (PE ID) and a second level identifier (SLLID), the protocol endpoint identifier (PE ID) identifying a protocol endpoint (161) in the storage system (130) that is associated with one or more of the logical storage volumes (151, 152), and the second level identifier (SLLID) identifying, from among the one or more logical storage volumes (151, 152), the respective logical storage volume (151) associated with the protocol endpoint (161).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610515983.1A CN106168884B (en) | 2011-08-26 | 2012-08-22 | Access the computer system of object storage system |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/219,378 US8650359B2 (en) | 2011-08-26 | 2011-08-26 | Computer system accessing object storage system |
US13/219,378 | 2011-08-26 | ||
PCT/US2012/051840 WO2013032806A1 (en) | 2011-08-26 | 2012-08-22 | Computer system accessing object storage system |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610515983.1A Division CN106168884B (en) | 2011-08-26 | 2012-08-22 | Access the computer system of object storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103765370A CN103765370A (en) | 2014-04-30 |
CN103765370B true CN103765370B (en) | 2016-11-30 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004355638A (en) * | 1999-08-27 | 2004-12-16 | Hitachi Ltd | Computer system and device assigning method therefor |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103765372B (en) | It is configured to the object storage system of input/output operations | |
CN103765371B (en) | Derive the data-storage system as the logical volume of storage object | |
CN106168884B (en) | Access the computer system of object storage system | |
CN103748545B (en) | Data storage system and data storage control method | |
JP7053682B2 (en) | Database tenant migration system and method | |
US20220035714A1 (en) | Managing Disaster Recovery To Cloud Computing Environment | |
EP2306320B1 (en) | Server image migration | |
CN110023896A (en) | The merged block in flash-memory storage system directly mapped | |
US11886334B2 (en) | Optimizing spool and memory space management | |
JP6423752B2 (en) | Migration support apparatus and migration support method | |
US20230195444A1 (en) | Software Application Deployment Across Clusters | |
CN105739930A (en) | Storage framework as well as initialization method, data storage method and data storage and management apparatus therefor | |
US9940073B1 (en) | Method and apparatus for automated selection of a storage group for storage tiering | |
US20220091744A1 (en) | Optimized Application Agnostic Object Snapshot System | |
CN103765370B (en) | Access the computer system of object storage system | |
US20240045609A1 (en) | Protection of Objects in an Object-based Storage System | |
WO2022241024A1 (en) | Monitoring gateways to a storage environment | |
WO2022240938A1 (en) | Rebalancing in a fleet of storage systems using data science | |
WO2024097622A1 (en) | Handling semidurable writes in a storage system | |
EP4338044A1 (en) | Role enforcement for storage-as-a-service |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address |
Address after: California, USA Patentee after: Weirui LLC Country or region after: U.S.A. Address before: California, USA Patentee before: VMWARE, Inc. Country or region before: U.S.A. |