US20200249857A1 - Extent Lock Resolution In Active/Active Replication - Google Patents
- Publication number: US20200249857A1 (application US 16/263,414)
- Authority: US (United States)
- Prior art keywords: lock, request, extent, write, storage device
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F: Electric digital data processing (G: Physics; G06: Computing; calculating or counting)
- G06F9/526: Mutual exclusion algorithms (G06F9/52: Program synchronisation; mutual exclusion, e.g. by means of semaphores)
- G06F16/1774: Locking methods, e.g. locking methods for file systems allowing shared and concurrent access to files
- G06F3/061: Improving I/O performance
- G06F3/0622: Securing storage systems in relation to access
- G06F3/0637: Permissions
- G06F3/065: Replication mechanisms
- G06F3/0659: Command handling arrangements, e.g. command buffers, queues, command scheduling
- G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Description
- Data replication techniques enable organizations to protect data from loss, implement disaster recovery, or migrate data between locations. There are various types of replication modes that can be utilized by an organization, and each mode comes with its own advantages and disadvantages. One popular mode of data replication is active/active replication, in which a network of servers and applications concurrently perform input/output (IO) operations across a virtualized storage layer. This type of replication provides advantages such as continuous availability, as replication operations are not interrupted when one system or node in the network goes down.
- However, an infrastructure that employs active/active replication requires some locking mechanism to enable concurrent updates to data from any site in the network. For example, if a host writes the first 4 KB of one page into one device and the last 4 KB of the same page into its peer device in an active/active setup, both sides will try to lock the page on both storage devices, leading to a deadlock.
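- The deadlock can be illustrated with a minimal sketch (Python; names and structure are illustrative assumptions, not the patented mechanism): each writer locks the page on its local array first and then tries to lock the peer's copy, so two concurrent writers that start on opposite sides can block each other indefinitely.

```python
import threading

# Locks guarding the same page on the two peer arrays (illustrative).
page_lock_device_a = threading.Lock()
page_lock_device_b = threading.Lock()

def write_from_site_a(data):
    with page_lock_device_a:       # lock the page on the local array first
        with page_lock_device_b:   # then try to lock the peer's copy
            pass                   # ... write and replicate `data` ...

def write_from_site_b(data):
    with page_lock_device_b:       # opposite acquisition order
        with page_lock_device_a:   # deadlock if both writers are in flight
            pass                   # ... write and replicate `data` ...
```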
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described herein in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
- One aspect may provide a method for implementing extent lock resolution in an active/active replication session of a storage system.
- The method includes designating one of the storage devices as a lock winner.
- A lock winner designation indicates that the one of the storage devices takes priority, over another of the storage devices, with respect to acquisition of a lock.
- The method also includes receiving a replication write input/output (IO) request issued by one of a first host system and a second host system during the active/active replication session, determining an extent of pages to be modified by the replication write IO request, locking the extent in one of the storage devices determined to be local to the host system that issued the replication write IO request, and executing the replication write IO request at the local storage device.
- The method further includes sending a write request to one of the storage devices remote from the host system that issued the replication write IO request, and receiving the write request at the storage device remote from the host system. If the remote storage device is the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device waits for the lock to become available. If the remote storage device is not the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device rejects the write request and sends a request to the local storage device to resend the write request.
- Another aspect may provide a system for implementing extent lock resolution in an active/active replication session of a storage system. The system includes a memory having computer-executable instructions.
- The system also includes a processor operated by a storage system.
- The processor executes the computer-executable instructions.
- When executed by the processor, the computer-executable instructions cause the processor to perform operations.
- The operations include designating one of the storage devices as a lock winner.
- A lock winner designation indicates that the one of the storage devices takes priority, over another of the storage devices, with respect to acquisition of a lock.
- The operations also include receiving a replication write input/output (IO) request issued by one of a first host system and a second host system during the active/active replication session, determining an extent of pages to be modified by the replication write IO request, locking the extent in one of the storage devices determined to be local to the host system that issued the replication write IO request, and executing the replication write IO request at the local storage device.
- The operations further include sending a write request to one of the storage devices remote from the host system that issued the replication write IO request, and receiving the write request at the storage device remote from the host system. If the remote storage device is the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device waits for the lock to become available. If the remote storage device is not the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device rejects the write request and sends a request to the local storage device to resend the write request.
- Another aspect may provide a computer program product for implementing extent lock resolution in an active/active replication session of a storage system.
- The computer program is embodied on a non-transitory computer readable medium.
- The computer program product includes instructions that, when executed by a computer at a storage system, cause the computer to perform operations.
- The operations include designating one of the storage devices as a lock winner.
- A lock winner designation indicates that the one of the storage devices takes priority, over another of the storage devices, with respect to acquisition of a lock.
- The operations also include receiving a replication write input/output (IO) request issued by one of a first host system and a second host system during the active/active replication session, determining an extent of pages to be modified by the replication write IO request, locking the extent in one of the storage devices determined to be local to the host system that issued the replication write IO request, and executing the replication write IO request at the local storage device.
- The operations further include sending a write request to one of the storage devices remote from the host system that issued the replication write IO request, and receiving the write request at the storage device remote from the host system. If the remote storage device is the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device waits for the lock to become available. If the remote storage device is not the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device rejects the write request and sends a request to the local storage device to resend the write request.
- Objects, aspects, features, and advantages of embodiments disclosed herein will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements. Reference numerals that are introduced in the specification in association with a drawing figure may be repeated in one or more subsequent figures without additional description in the specification in order to provide context for other features. For clarity, not every element may be labeled in every figure. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments, principles, and concepts. The drawings are not meant to limit the scope of the claims included herewith.
- FIG. 1 is a block diagram of a storage system to perform extent lock resolution in an active/active replication session in accordance with an illustrative embodiment;
- FIGS. 2A-2D are flow diagrams of processes for performing extent lock resolution in an active/active replication session of a storage system in accordance with an illustrative embodiment;
- FIG. 3 is a block diagram of a hardware device that may perform at least a portion of the processes shown in FIGS. 2A-2D; and
- FIG. 4 is a simplified block diagram of an apparatus that may be used to implement at least a portion of the systems of FIGS. 1 and 3 and at least a portion of the processes of FIGS. 2A-2D.
- Embodiments described herein provide extent lock resolution in a storage system that performs active/active replication.
- As indicated above, active/active replication refers to a mode of data replication in which a network of servers and applications concurrently perform input/output (IO) operations across a virtualized storage layer.
- This type of replication mode can create challenges, e.g., where a deadlock situation ensues when both sides of a replication system attempt to lock the same page at the same time.
- Existing solutions for resolving this issue include techniques to obtain the lock on both sides of the system before processing the IO or to designate extent ownership on each side, moving the ownership as part of the active/active negotiations. However, this can not only be cumbersome but can also have a negative impact on overall system performance.
- The embodiments described herein provide a solution for extent lock situations by designating one side of the storage network as a lock winner, giving that side of the network priority over locks and lock handling when both sides of the network simultaneously attempt to lock the same page during the active/active session.
- Before describing embodiments of the concepts, structures, and techniques sought to be protected herein, some terms are explained. The following description includes a number of terms for which the definitions are generally known in the art; the following glossary definitions are provided to clarify the subsequent description and may be helpful in understanding the specification and claims.
- As used herein, the term “storage system” is intended to be broadly construed so as to encompass, for example, private or public cloud computing systems for storing data as well as systems for storing data comprising virtual infrastructure and those not comprising virtual infrastructure.
- As used herein, the terms “client,” “host,” and “user” refer, interchangeably, to any person, system, or other entity that uses a storage system to read/write data.
- In some embodiments, the term “storage device” may also refer to a storage array including multiple storage devices.
- In certain embodiments, a storage medium may refer to one or more storage mediums such as a hard drive, a combination of hard drives, flash storage, combinations of flash storage, combinations of hard drives, flash, and other storage devices, and other types and combinations of computer readable storage mediums, including those yet to be conceived.
- A storage medium may also refer to both physical and logical storage mediums, may include multiple levels of virtual-to-physical mappings, and may be or include an image or disk image.
- A storage medium may be computer-readable and may also be referred to herein as a computer-readable program medium.
- In certain embodiments, the term “I/O request” or simply “I/O” or “IO” may be used to refer to an input or output request, such as a data read or data write request.
- In certain embodiments, a storage device may refer to any non-volatile memory (NVM) device, including hard disk drives (HDDs), solid state drives (SSDs), flash devices (e.g., NAND flash devices), and similar devices that may be accessed locally and/or remotely (e.g., via a storage attached network (SAN), also referred to herein as a storage array network).
- In certain embodiments, a storage array (sometimes referred to as a disk array) may refer to a data storage system that is used for block-based, file-based, or object storage, where storage arrays can include, for example, dedicated storage hardware that contains spinning hard disk drives (HDDs), solid-state disk drives, and/or all-flash drives (e.g., the XtremIO all-flash drive, available from DELL/EMC of Hopkinton, Mass.).
- In certain embodiments, a data storage entity may be any one or more of a file system, object storage, a virtualized device, a logical unit, a logical unit number, a logical volume, a logical device, a physical device, and/or a storage medium.
- In certain embodiments, a physical storage unit may be a physical entity, such as a disk or an array of disks, for storing data in storage locations that can be accessed by address, where the term “physical storage unit” is used interchangeably with “physical volume.”
- In certain embodiments, a snapshot may refer to differential representations of an image, i.e., the snapshot may have pointers to the original volume and may point to log volumes for changed locations.
- In certain embodiments, a snapshot may refer to differential representations of the state of a system. Snapshots may be combined into a snapshot array, which may represent different images over a time period or different states of a system over a time period.
- In certain embodiments, a journal may be a record of write transactions (e.g., I/O data) issued to a storage system, which may be used to maintain a duplicate storage system and to roll back the duplicate storage system to a previous point in time.
- In some embodiments, each entry in a journal contains, apart from the I/O data itself, I/O metadata that can include information such as a volume identifier (ID), the I/O block offset within the volume, the I/O length, and a timestamp of the I/O.
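- As a rough sketch of such a journal entry (the field names below are illustrative assumptions, not the patent's actual data layout):

```python
from dataclasses import dataclass

@dataclass
class JournalEntry:
    volume_id: str     # volume identifier (ID)
    block_offset: int  # I/O block offset within the volume
    length: int        # I/O length
    timestamp: float   # timestamp of the I/O
    data: bytes        # the I/O data itself
```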
- In certain embodiments, a data protection strategy that can be advantageous for use with computer systems, especially networked storage systems, is checkpointing.
- A checkpoint, as used herein, contains a consistent point-in-time image of an entire system, including configuration, logical volume mapping metadata, physical on-disk layout metadata, and actual user data.
- In certain embodiments, a checkpoint preserves the state of a system at a given point in time by saving one or more snapshots of, for example, a file system or an application at one or more points in time.
- A checkpoint can preserve a snapshot of an application's state, so that the application can restart from that point in case of failure, which can be useful for long-running applications that are executed in failure-prone computing systems. If a checkpoint is used, an application periodically writes large volumes of snapshot data to persistent storage in an attempt to capture its current state. Thus, if there is a failure, the application can recover by rolling back its execution state to a previously saved checkpoint.
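- A minimal sketch of this save/roll-back cycle, assuming a JSON-serializable application state and an atomic rename on the target file system (both assumptions for illustration):

```python
import json
import os
import tempfile

def save_checkpoint(state: dict, path: str) -> None:
    # Write to a temporary file, then rename, so a crash mid-write
    # never leaves a torn checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def restore_checkpoint(path: str) -> dict:
    # Roll execution state back to the previously saved checkpoint.
    with open(path) as f:
        return json.load(f)
```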
- In certain embodiments, active/active replication refers to a mode of data replication in which a network of servers and applications concurrently perform input/output (IO) operations across a virtualized storage layer.
- In certain embodiments, an extent refers to a contiguous area of storage reserved for a file system that is represented as a range of blocks.
- A file may consist of zero or more extents, and one file fragment requires one extent.
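- For illustration, an extent can be modeled as a range of blocks; the representation below is an assumption used by the later sketches, not the patent's actual format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen, so extents can be hashable set members
class Extent:
    start_block: int
    block_count: int

    def overlaps(self, other: "Extent") -> bool:
        # Two extents conflict when their block ranges intersect.
        return (self.start_block < other.start_block + other.block_count
                and other.start_block < self.start_block + self.block_count)
```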
- Turning now to FIG. 1, the system 100 includes a first host system computer 102A and a second host system computer 102B. Each of the host system computers 102A and 102B is communicatively coupled to storage devices 104A and 104B over one or more networks 112.
- The host system computers 102A and 102B may be implemented as high-speed computer processing devices, such as one or more mainframe computers capable of handling a high volume of activities conducted on behalf of end users of the active/active replication session.
- The storage devices 104A and 104B store a variety of data used by the host system computers 102A and 102B in implementing the active/active replication session. It is understood that the storage devices 104A and 104B may be implemented using memory contained in their respective host system computers 102A and 102B or may be separate physical devices. The storage devices 104A and 104B may be logically addressable as consolidated data sources across a distributed environment that includes the networks 112. The storage devices 104A-104B may communicate over a replication link 118 to perform replication write operations.
- For example, storage device 104A receives a write IO request from host system computer 102A and, once the write operation has been completed on storage device 104A, the write IO is replicated to storage device 104B over the replication link 118. It is understood that other means of communication between the storage devices 104A-104B may be employed, e.g., through one or more of the networks 112.
- The host system computers 102A-102B may operate as database servers and coordinate access to application data, including data stored in the storage devices 104A and 104B.
- The host system computers 102A-102B may be implemented using one or more servers operating in response to a computer program stored in a storage medium accessible by the servers.
- The host system computers 102A-102B may each operate as a network server (e.g., a web server) to communicate with any network entities, such as storage devices 104A and 104B.
- Storage devices 104A and 104B may be implemented as varying types of storage devices.
- The storage devices 104A and 104B may include one or more rotating magnetic storage devices, one or more rotating optical storage devices, and/or one or more solid state drives (SSDs), such as a flash drive.
- The storage devices 104A and 104B may include one or more hard disk drives (HDDs), one or more flash drives, optical disks, as well as one or more other types of data storage devices.
- The storage devices 104A and 104B may include a set of one or more data storage arrays.
- A data storage array may be, for example, a redundant array of inexpensive disks (RAID) array, an optical storage array, or any other type of data storage array.
- The networks 112 may be any type of known networks including, but not limited to, a storage area network (SAN), a wide area network (WAN), a local area network (LAN), a global network (e.g., the Internet), a virtual private network (VPN), and an intranet.
- The networks 112 may be implemented using wireless networks or any kind of physical network implementation known in the art, e.g., using cellular, satellite, and/or terrestrial network technologies.
- The networks 112 may also include short range wireless networks utilizing, e.g., BLUETOOTH™ and WI-FI™ technologies and protocols.
- In embodiments, host system computer 102A and storage device 104A reside in a first data center (not shown), and host system computer 102B and storage device 104B reside in a second data center. That is, host system computers 102A and 102B may reside in geographically disparate locations.
- The host system computer 102A and the storage device 104A at the first data center are communicatively coupled through a local network (e.g., as shown by solid line 114A in FIG. 1), and the host system computer 102B and the storage device 104B may be communicatively coupled through a second local network (e.g., as shown by solid line 114B in FIG. 1).
- In some embodiments, the local communication networks 114A and 114B may include internal (e.g., short distance) communication links (e.g., InfiniBand (IB) links or Fibre Channel (FC) links) to transfer data between storage volumes for storing replicas (also referred to herein as snap sets).
- The host system computer 102A at the first data center and the storage device 104B at the second data center may communicate remotely over a long distance network of the networks 112.
- Likewise, the host system computer 102B at the second data center and the storage device 104A at the first data center may communicate remotely over a long distance network of the networks 112.
- The long distance communication networks (shown in FIG. 1 as dotted lines 116A and 116B, respectively) may be long-distance communication networks of a storage area network (SAN), e.g., over an Ethernet or Internet (e.g., TCP/IP) link that may employ, for example, the iSCSI protocol.
- Also shown in FIG. 1 is a virtualized storage layer 106 including virtual databases 108A-108n.
- The virtualized storage layer 106 represents a storage array virtualized across two or more physical sites to create a data presence mirrored between the sites and to enable simultaneous writes to the two or more sites.
- The databases 108A-108n may reside in one or more of the storage devices 104A-104B.
- The virtualized storage layer 106 is communicatively coupled to the host systems 102A-102B through the storage devices 104A-104B via the networks 112.
- The host system computer 102A and the host system computer 102B each implement a replication manager application, 110A and 110B respectively, to manage the processes described herein.
- The host system computers 102A and 102B perform IO operations on the storage devices 104A and 104B in an active/active replication session.
- The IO operations for each of the host system computers 102A and 102B may be managed by the respective replication manager applications 110A and 110B.
- The replication manager applications 110A and 110B perform data replication to their local storage systems and to remote storage systems over the networks 112 in an active/active replication mode.
- Data replication may be performed based on data replication policies that may define various settings for data recovery operations.
- For example, one policy may define a plurality of attributes, such as a frequency with which replicas are generated and how long each replica is kept at a storage system.
- A policy may define metrics for use in snap set creation and replication process determinations.
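- As a small sketch, such policy attributes might be grouped as follows (the field names are illustrative assumptions, not the patent's policy schema):

```python
from dataclasses import dataclass

@dataclass
class ReplicationPolicy:
    replica_frequency_secs: int  # how often replicas are generated
    retention_period_secs: int   # how long each replica is kept
```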
- The replication manager applications 110A and 110B, through the host systems 102A and 102B, are configured to enable designation of one of the storage devices as a lock winner. In one embodiment, this designation can be determined by criteria such as the serial numbers of the storage devices. For example, the host systems 102A and 102B compare the serial numbers of their local storage devices 104A and 104B and, through their replication manager applications 110A and 110B, determine which serial number is higher. The storage device with the higher serial number is designated as the lock winner. This can be configured as an automated process performed by the replication manager applications or may be a manual process. In an embodiment, a user or administrator at one of the data centers can designate the storage device residing at his/her data center as the lock winner. It will be understood that other means or criteria to designate a lock winner may be employed.
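- A minimal sketch of this designation step (the device objects and serial_number field are illustrative assumptions):

```python
def designate_lock_winner(device_a, device_b):
    """Return (winner, loser), e.g., by the higher serial number."""
    if device_a.serial_number > device_b.serial_number:
        return device_a, device_b
    return device_b, device_a
```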
- Turning now to FIGS. 2A-2D, flow diagrams of processes 200A-200D for implementing extent lock resolution for storage devices in a storage system will now be described.
- The system (e.g., system 100 of FIG. 1 and/or system 300 of FIG. 3) is performing replication in an active/active replication mode.
- In block 202, one of the first storage device 104A and the second storage device 104B is designated as a lock winner.
- A lock winner designation indicates that the winning storage device will take priority, with respect to another storage device, over acquisition of a lock.
- This designation of lock winner may be implemented using different criteria.
- For example, the designation may be an automated function based on serial numbers of the storage devices.
- In this example, the storage device having the higher serial number is automatically designated as the lock winner.
- Alternatively, the designation can be made by user selection.
- The storage device designated as the lock winner is assigned an attribute by the system, such that the host systems 102A-102B implementing the active/active replication session can identify which storage device has been designated the lock winner.
- In block 204, a replication write input/output (IO) request is issued by one of the host systems (e.g., 102A or 102B).
- In block 206, an extent of pages to be modified by the replication write IO request is identified.
- In block 208, the process 200A attempts to lock the extent in the storage device that is local to the host system that issued the replication write IO request. For example, if host system 102A issued the replication write IO request, then the process 200A attempts to lock the extent in storage device 104A, which is local to the host system 102A, regardless of which of the storage devices is designated winner or loser.
- In block 210, it is determined whether the attempt to lock the extent was successful.
- The attempt may be unsuccessful if the extent is not available at the time of the lock attempt. In this case, if the extent is not available, the corresponding host system (to which the storage device is local) waits for the extent to become available in block 212, and the process 200A returns to block 208.
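- The local locking step of blocks 208-212 can be sketched as a small extent-lock table, reusing the Extent type above (the bookkeeping is a hedged assumption, not the patent's implementation); the try_lock method is used later by the lock-loser sketch:

```python
import threading

class ExtentLockTable:
    def __init__(self):
        self._cond = threading.Condition()
        self._locked = set()  # extents currently held

    def lock_wait(self, extent):
        # Blocks 208-212: wait until the extent is available, then lock it.
        with self._cond:
            while any(extent.overlaps(held) for held in self._locked):
                self._cond.wait()
            self._locked.add(extent)

    def try_lock(self, extent):
        # Single, non-blocking attempt; returns False if the extent is held.
        with self._cond:
            if any(extent.overlaps(held) for held in self._locked):
                return False
            self._locked.add(extent)
            return True

    def unlock(self, extent):
        # Release the extent and wake any waiters.
        with self._cond:
            self._locked.discard(extent)
            self._cond.notify_all()
```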
- Once the lock attempt is successful, the process 200A executes the write operation at the local storage device (e.g., 104A) in block 213 and sends a write request to the storage device that is remote from the host system that issued the replication write IO request (e.g., storage device 104B) in block 214.
- This write request is a replication write request from the first storage device 104A requesting that the write operation executed at the local storage device be replicated to the remote storage device.
- The write request includes the data subject to the replication write IO request and the corresponding extent.
- In this example, the storage device remote from the host system 102A is storage device 104B.
- The remote storage device receives the write request from the local storage device.
- The process 200A continues to one of processes 200B or 200C of FIGS. 2B and 2C, respectively, depending on whether the remote storage device has been designated a lock winner or a lock loser. For example, if the remote storage device (e.g., 104B) is the designated lock winner, the process continues in FIG. 2B. If, however, the remote storage device is not the designated lock winner (i.e., it is the lock loser), the process continues in FIG. 2C.
- In FIG. 2B, the process 200B is performed from the perspective of the remote storage device as the lock winner.
- The process 200B attempts to lock the extent at the storage device designated as the lock winner. For example, suppose storage device 104B has been designated the lock winner (from block 202). The storage device 104B attempts to lock the extent of pages to be modified by the write request in block 218.
- The process 200B determines whether the attempt in block 218 was successful. If the attempt is unsuccessful (e.g., the extent is not available or is locked by another IO operation), the process 200B waits for the lock in block 222, and the process 200B returns to block 218.
- Once the lock is acquired, the replication write IO request is executed at the remote storage device and completed in block 224.
- The process 200B returns a success notification (e.g., that the replication write operation was executed at the remote storage device) to the local storage device in block 225, and the process 200B continues to FIG. 2D.
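- A sketch of this lock-winner path (FIG. 2B), reusing the ExtentLockTable above; apply_write stands in for the array's write machinery and is an assumption, as is releasing the remote lock once the write completes:

```python
def handle_remote_write_as_winner(lock_table, extent, data, apply_write):
    lock_table.lock_wait(extent)   # blocks 218/222: wait until the lock frees
    try:
        apply_write(extent, data)  # block 224: execute the replicated write
    finally:
        lock_table.unlock(extent)  # release once complete (assumed here)
    return "SUCCESS"               # block 225: notify the local array
```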
- If the remote storage device is not the designated lock winner, the process 200A proceeds to FIG. 2C, and the process 200C is performed from the perspective of the lock loser.
- In FIG. 2C, an attempt to lock the extent by the remote storage device, as the lock loser, is performed in block 226.
- The remote storage device determines whether the lock attempt is successful. If not, the remote storage device, as the lock loser, rejects the replication IO request with a busy status and returns an indicator to the local storage device in block 230. In block 232, the remote storage device sends a request to the local storage device to resend the replication write IO request.
- If the lock attempt is successful, the replication write IO request is executed and completed (e.g., the data of the request is written to the remote storage device) in block 234, and a success notification is sent by the remote storage device to the local storage device in block 235.
- The process 200C then proceeds to FIG. 2D.
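- The corresponding lock-loser path (FIG. 2C) differs only in that the lock attempt is not retried locally; a busy rejection tells the local array to resend. Again a hedged sketch with assumed names:

```python
def handle_remote_write_as_loser(lock_table, extent, data, apply_write):
    if not lock_table.try_lock(extent):  # block 226: single lock attempt
        # Blocks 230/232: reject with a busy status and ask for a resend,
        # letting the winner's competing write proceed first.
        return "BUSY_RESEND"
    try:
        apply_write(extent, data)        # block 234: execute the write
    finally:
        lock_table.unlock(extent)
    return "SUCCESS"                     # block 235: notify the local array
```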
- In FIG. 2D, the process 200D begins with receipt of the success notification (block 236) by the local storage device from the remote storage device (e.g., from block 225 or block 235).
- The local storage device unlocks the extent, and in block 240, the local storage device sends a notification of successful completion to the host system that issued the replication write IO request.
- The process 200D returns to block 204 of FIG. 2A when another replication write IO request has been issued.
- The processes 200A-200D can be continued in a loop until all replication write IO requests in the session are processed.
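- Tying the pieces together from the local array's side, a hedged end-to-end sketch (send_remote is an assumed transport call returning the remote side's status; retry timing is simplified):

```python
def replicate_write(lock_table, extent, data, apply_write, send_remote):
    lock_table.lock_wait(extent)   # blocks 208-212: lock the extent locally
    try:
        apply_write(extent, data)  # block 213: execute the local write
        # Blocks 214/232: send to the remote side; resend while the
        # lock-loser side reports busy.
        while send_remote(extent, data) == "BUSY_RESEND":
            pass
    finally:
        lock_table.unlock(extent)  # FIG. 2D: unlock the extent
    return "SUCCESS"               # block 240: notify the issuing host
```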
- The host system computers 102A-102B may be implemented as one or more computers, such as a computer 300 as shown in FIG. 3.
- Computer 300 may include processor 302, volatile memory 304 (e.g., RAM), non-volatile memory 306 (e.g., a hard disk drive, a solid state drive such as a flash drive, a hybrid magnetic and solid state drive, etc.), graphical user interface (GUI) 308 (e.g., a mouse, a keyboard, a display, and so forth), and input/output (I/O) device 320.
- Non-volatile memory 306 stores computer instructions 312, an operating system 316, and data 318 such that, for example, the computer instructions 312 are executed by the processor 302 out of volatile memory 304 to perform at least a portion of the processes 200A-200D shown in FIGS. 2A-2D.
- Program code may be applied to data entered using an input device of GUI 308 or received from I/O device 320 .
- Processes 200A-200D shown in FIGS. 2A-2D are not limited to use with the hardware and software of FIG. 3 and may find applicability in any computing or processing environment and with any type of machine or set of machines that is capable of running a computer program. Processes 200A-200D may be implemented in hardware, software, or a combination of the two.
- The processes 200A-200D are not limited to the specific processing order shown in FIGS. 2A-2D. Rather, one or more blocks of processes 200A-200D may be re-ordered, combined, or removed, performed in parallel or in serial, as necessary, to achieve the results set forth herein.
- Processor 302 may be implemented by one or more programmable processors executing one or more computer programs to perform the functions of the system.
- The term “processor” is used to describe an electronic circuit that performs a function, an operation, or a sequence of operations. The function, operation, or sequence of operations can be hard coded into the electronic circuit or soft coded by way of instructions held in a memory device.
- A “processor” can perform the function, operation, or sequence of operations using digital values or using analog signals.
- In some embodiments, the “processor” can be embodied in an application specific integrated circuit (ASIC).
- In some embodiments, the “processor” can be embodied in a microprocessor with associated program memory.
- In some embodiments, the “processor” can be embodied in a discrete electronic circuit.
- The “processor” can be analog, digital, or mixed-signal.
- While illustrative embodiments have been described with respect to processes of circuits, described embodiments may be implemented as a single integrated circuit, a multi-chip module, a single card, or a multi-card circuit pack. Further, as would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing blocks in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer. Thus, described embodiments may be implemented in hardware, a combination of hardware and software, software, or software in execution by one or more processors.
- Some embodiments may be implemented in the form of methods and apparatuses for practicing those methods. Described embodiments may also be implemented in the form of program code, for example, stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation.
- A non-transitory machine-readable medium may include, but is not limited to, tangible media such as magnetic recording media including hard drives, floppy diskettes, and magnetic tape media, optical recording media including compact discs (CDs) and digital versatile discs (DVDs), solid state memory such as flash memory, hybrid magnetic and solid state memory, non-volatile memory, volatile memory, and so forth, but does not include a transitory signal per se.
- When the program code is embodied in a non-transitory machine-readable medium and is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the method.
- Processing devices may include, for example, a general purpose microprocessor, a digital signal processor (DSP), a reduced instruction set computer (RISC), a complex instruction set computer (CISC), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic array (PLA), a microcontroller, an embedded controller, a multi-core processor, and/or others, including combinations of the above.
- Described embodiments may also be implemented in the form of a bitstream or other sequence of signal values electrically or optically transmitted through a medium, stored magnetic-field variations in a magnetic recording medium, etc., generated using a method and/or an apparatus as recited in the claims.
- Processing blocks in the flow diagrams of FIGS. 2A-2D represent computer software instructions or groups of instructions.
- Alternatively, the processing blocks may represent steps performed by functionally equivalent circuits such as a digital signal processor (DSP) circuit or an application specific integrated circuit (ASIC).
- The flow diagrams do not depict the syntax of any particular programming language but rather illustrate the functional information one of ordinary skill in the art requires to fabricate circuits or to generate computer software to perform the processing required of the particular apparatus. It should be noted that many routine program elements, such as initialization of loops and variables and the use of temporary variables, may be omitted for clarity.
- When implemented on one or more processing devices, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
- FIG. 4 shows Program Logic 404 embodied on a computer-readable medium 402 as shown, and wherein the Logic is encoded in computer-executable code configured for carrying out the extent lock resolution process of this invention, thereby forming a Computer Program Product 400.
- The logic may be the same logic on memory loaded on a processor.
- The program logic may also be embodied in software modules, as modules, or as hardware modules.
- A processor may be a virtual processor or a physical processor. Logic may be distributed across several processors or virtual processors to execute the logic.
- A storage medium may be a physical or logical device. In some embodiments, a storage medium may consist of physical or logical devices. In some embodiments, a storage medium may be mapped across multiple physical and/or logical devices. In some embodiments, a storage medium may exist in a virtualized environment. In some embodiments, a processor may be a virtual or physical embodiment. In some embodiments, logic may be executed across one or more physical or virtual processors.
Abstract
Description
- Data replication techniques enable organizations to protect data from loss, implement disaster recovery, or to migrate data between locations. There are various types of replication modes that can be utilized by an organization, and each mode comes with its own advantages and disadvantages. One popular mode of data replication is active/active replication in which a network of servers and applications concurrently perform input/output (TO) operations across a virtualized storage layer. This type of replication provides advantages such as continuous availability, as replication operations are not interrupted when one system or node in the network goes down.
- However, an infrastructure that employs active/active replication requires some locking mechanism to enable concurrent updates to data from any site in the network. For example, if a host writes the first 4 KB of one page into one device and the last 4 KB of the same page into its peer device in an active/active setup, both sides will try to lock the page on both storage devices, leading to a deadlock.
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described herein in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
- One aspect may provide a method for implementing extent lock resolution in an active/active replication session of a storage system. The method includes designating one of the storage devices as a lock winner. A lock winner designation indicates the one of the storage devices takes priority, over another of the storage devices, over acquisition of a lock. The method also includes receiving a replication write input/output (TO) request issued, by one of a first host system and a second host system during the active/active replication session, determining an extent of pages to be modified by the replication write TO request, locking the extent in one of the storage devices determined to be local to the host system that issued the replication write IO request, and executing the replication write IO request at the local storage device. The method further includes sending a write request to one of the storage devices remote from the host system that issued the replication write IO request, and receiving the write request at the storage device remote from the host system. If the remote storage device is the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device waits for the lock to become available. If the remote storage device is not the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device rejects the write request and sends a request to the local storage device to resend the write request.
- Another aspect may provide a system for implementing extent lock resolution in an active/active replication session of a storage system. The system includes a memory having computer-executable instructions. The system also includes a processor operated by a storage system. The processor executes the computer-executable instructions. When executed by the processor, the computer-executable instructions cause the processor to perform operations. The operations include designating one of the storage devices as a lock winner. A lock winner designation indicates the one of the storage devices takes priority, over another of the storage devices, over acquisition of a lock. The operations also include receiving a replication write input/output (TO) request issued, by one of a first host system and a second host system during the active/active replication session, determining an extent of pages to be modified by the replication write IO request, locking the extent in one of the storage devices determined to be local to the host system that issued the replication write IO request, and executing the replication write IO request at the local storage device. The operations further include sending a write request to one of the storage devices remote from the host system that issued the replication write IO request, and receiving the write request at the storage device remote from the host system. If the remote storage device is the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device waits for the lock to become available. If the remote storage device is not the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device rejects the write request and sends a request to the local storage device to resend the write request.
- Another aspect may provide a computer program product for implementing extent lock resolution in an active/active replication session of a storage system. The computer program is embodied on a non-transitory computer readable medium. The computer program product includes instructions that, when executed by a computer at a storage system, causes the computer to perform operations. The operations include designating one of the storage devices as a lock winner. A lock winner designation indicates the one of the storage devices takes priority, over another of the storage devices, over acquisition of a lock. The operations also include receiving a replication write input/output (TO) request issued, by one of a first host system and a second host system during the active/active replication session, determining an extent of pages to be modified by the replication write IO request, locking the extent in one of the storage devices determined to be local to the host system that issued the replication write IO request, and executing the replication write IO request at the local storage device. The operations further include sending a write request to one of the storage devices remote from the host system that issued the replication write IO request, and receiving the write request at the storage device remote from the host system. If the remote storage device is the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device waits for the lock to become available. If the remote storage device is not the designated lock winner, and an attempt to lock the extent is unsuccessful, the remote storage device rejects the write request and sends a request to the local storage device to resend the write request.
- Objects, aspects, features, and advantages of embodiments disclosed herein will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements. Reference numerals that are introduced in the specification in association with a drawing figure may be repeated in one or more subsequent figures without additional description in the specification in order to provide context for other features. For clarity, not every element may be labeled in every figure. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments, principles, and concepts. The drawings are not meant to limit the scope of the claims included herewith.
-
FIG. 1 is a block diagram of a storage system to perform extent lock resolution in an active/active replication session in accordance with an illustrative embodiment; -
FIGS. 2A-2D are flow diagrams of processes for performing extent lock resolution in an active/active replication session of a storage system in accordance with an illustrative embodiment; and -
FIG. 3 is a block diagram of a hardware device that may perform at least a portion of the processes shown inFIGS. 2A-2D ; and -
FIG. 4 is a simplified block diagram of an apparatus that may be used to implement at least a portion of the systems ofFIGS. 1 and 4 and at least a portion of the process ofFIGS. 2A-2D . - Embodiments described herein provide extent lock resolution in a storage system that performs active/active replication. As indicated above, active/active replication refers to a mode of data replication in which a network of servers and applications concurrently perform input/output (IO) operations across a virtualized storage layer. This type of replication mode can create challenges, e.g., where a deadlock situation ensues when both sides of a replication system attempt to lock the same page at the same time. Existing solutions for resolving this issue include techniques to obtain the lock on both sides of the system before processing the IO or to designate extent ownership on each side, moving the ownership as part of the active/active negotiations. However, this can not only be cumbersome but can also have a negative impact on overall system performance. The embodiments described herein provide a solution for extent lock situations by designating one side of the storage network as a lock winner, giving that side of the network priority over locks and lock handling when both sides of the network simultaneously attempt to lock the same page during the active/active session.
- Before describing embodiments of the concepts, structures, and techniques sought to be protected herein, some terms are explained. The following description includes a number of terms for which the definitions are generally known in the art. However, the following glossary definitions are provided to clarify the subsequent description and may be helpful in understanding the specification and claims.
- As used herein, the term “storage system” is intended to be broadly construed so as to encompass, for example, private or public cloud computing systems for storing data as well as systems for storing data comprising virtual infrastructure and those not comprising virtual infrastructure. As used herein, the terms “client,” “host,” and “user” refer, interchangeably, to any person, system, or other entity that uses a storage system to read/write data. In some embodiments, the term “storage device” may also refer to a storage array including multiple storage devices. In certain embodiments, a storage medium may refer to one or more storage mediums such as a hard drive, a combination of hard drives, flash storage, combinations of flash storage, combinations of hard drives, flash, and other storage devices, and other types and combinations of computer readable storage mediums including those yet to be conceived. A storage medium may also refer both physical and logical storage mediums and may include multiple level of virtual to physical mappings and may be or include an image or disk image. A storage medium may be computer-readable and may also be referred to herein as a computer-readable program medium.
- In certain embodiments, the term “I/O request” or simply “I/O” or “10” may be used to refer to an input or output request, such as a data read or data write request.
- In certain embodiments, a storage device may refer to any non-volatile memory (NVM) device, including hard disk drives (HDDs), solid state drivers (SSDs), flash devices (e.g., NAND flash devices), and similar devices that may be accessed locally and/or remotely (e.g., via a storage attached network (SAN) (also referred to herein as storage array network (SAN)).
- In certain embodiments, a storage array (sometimes referred to as a disk array) may refer to a data storage system that is used for block-based, file-based or object storage, where storage arrays can include, for example, dedicated storage hardware that contains spinning hard disk drives (HDDs), solid-state disk drives, and/or all-flash drives (e.g., the XtremIO all-flash drive, available from DELL/EMC of Hopkinton, Mass.). In certain embodiments, a data storage entity may be any one or more of a file system, object storage, a virtualized device, a logical unit, a logical unit number, a logical volume, a logical device, a physical device, and/or a storage medium.
- In certain embodiments, a physical storage unit may be a physical entity, such as a disk or an array of disks, for storing data in storage locations that can be accessed by address, where physical storage unit is used interchangeably with physical volume.
- In certain embodiments, a snapshot may refer to differential representations of an image, i.e. the snapshot may have pointers to the original volume and may point to log volumes for changed locations. In certain embodiments, a snapshot may refer to differential representations of the state of a system. Snapshots may be combined into a snapshot array, which may represent different images over a time period or different states of a system over a time period.
- In certain embodiments, a journal may be a record of write transactions (e.g., I/O data) issued to a storage system, which may be used to maintain a duplicate storage system, and to roll back the duplicate storage system to a previous point in time. In some embodiments, each entry in a journal contains, apart from the I/O data itself, I/O metadata that can include information such as a volume identifier (ID), the I/O block offset within the volume, the I/O length, and a timestamp of the I/O.
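- By way of a non-limiting illustration only (not part of the claimed subject matter), a journal entry carrying the I/O metadata described above might be modeled as in the following Python sketch; the type and field names are hypothetical:

    import time
    from dataclasses import dataclass, field

    # Hypothetical sketch of a journal entry holding the I/O data plus the
    # metadata noted above: volume ID, block offset, length, and timestamp.
    @dataclass(frozen=True)
    class JournalEntry:
        volume_id: str
        block_offset: int
        length: int
        data: bytes
        timestamp: float = field(default_factory=time.time)

    # A journal is an ordered record of such entries; replaying them against a
    # duplicate volume reproduces its state, and stopping the replay at an
    # earlier timestamp rolls the duplicate back to that point in time.
    journal: list[JournalEntry] = []
    journal.append(JournalEntry("vol-1", block_offset=2048, length=8, data=b"\x00" * 4096))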
- In certain embodiments, a data protection strategy that can be advantageous for use with computer systems, especially networked storage systems, is checkpointing. A checkpoint, as used herein, contains a consistent point in time image of an entire system, including configuration, logical volume mapping metadata, physical on disk layout metadata, and actual user data. In certain embodiments, a checkpoint preserves the state of a system at a given point in time by saving one or more snapshots of, for example, a file system, or an application at one or more points in time. A checkpoint can preserve a snapshot of an application's state, so that it can restart from that point in case of failure, which can be useful for long running applications that are executed in failure-prone computing systems. If a checkpoint is used, an application periodically writes large volumes of snapshot data to persistent storage in an attempt to capture its current state. Thus, if there is a failure, the application can recover by rolling-back its execution state to a previously saved checkpoint.
- In certain embodiments, active/active replication refers to a mode of data replication in which a network of servers and applications concurrently perform input/output (IO) operations across a virtualized storage layer. This type of replication provides advantages such as continuous availability, as replication operations are not interrupted when one system or node in the network goes down.
- In certain embodiments, an extent refers to a contiguous area of storage reserved for a file system that is represented as a range of blocks. For example, a file may consist of zero or more extents, and a single file fragment requires one extent.
- While vendor-specific terminology may be used herein to facilitate understanding, it is understood that the concepts, techniques, and structures sought to be protected herein are not limited to use with any specific commercial products. In addition, to ensure clarity in the disclosure, well-understood methods, procedures, circuits, components, and products are not described in detail herein.
- The phrases “such as,” “for example,” “e.g.,” “exemplary,” and variants thereof are used herein to describe non-limiting embodiments and are used herein to mean “serving as an example, instance, or illustration.” Any embodiments described via these phrases and/or variants are not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments. In addition, the word “optionally” is used herein to mean that a feature or process, etc., is provided in some embodiments and not provided in other embodiments. Any particular embodiment of the invention may include a plurality of “optional” features unless such features conflict.
- Turning now to
FIG. 1 , an example storage system 100 for performing extent lock resolution in an active/active replication session will now be described. As indicated above, active/active replication refers to a mode of data replication in which a network of servers and applications concurrently perform input/output (IO) operations across a virtualized storage layer. This type of replication mode can create challenges, e.g., where a deadlock situation ensues when both sides of a replication system attempt to lock the same page at the same time. Existing solutions for resolving this issue include techniques to obtain the lock on both sides of the system before processing the IO or to designate extent ownership on each side, moving the ownership as part of the active/active negotiations; however, this can not only be cumbersome but can also have a negative impact on overall system performance. - The
system 100 includes a first host system computer 102A and a second host system computer 102B. Each of the host system computers 102A and 102B is communicatively coupled to storage devices 104A and 104B over one or more networks 112. The host system computers 102A and 102B may be implemented as high-speed computer processing devices, such as one or more mainframe computers capable of handling a high volume of activities conducted on behalf of end users of the active/active replication session. - The
storage devices 104A and 104B store a variety of data used by the host system computers 102A and 102B in implementing the active/active replication session. It is understood that the storage devices 104A and 104B may be implemented using memory contained in their respective host system computers 102A and 102B or may be separate physical devices. The storage devices 104A and 104B may be logically addressable as consolidated data sources across a distributed environment that includes the networks 112. The storage devices 104A-104B may communicate over a replication link 118 to perform replication write operations. For example, in embodiments, storage device 104A receives a write IO request from host system computer 102A and, once the write operation has been completed on the storage device 104A, the write IO is replicated to the storage device 104B over the replication link 118. It is understood that other means of communication between the storage devices 104A-104B may be employed, e.g., through one or more of the networks 112. - The
host system computers 102A-102B may operate as database servers and coordinate access to application data including data stored in the storage devices 104A and 104B. The host system computers 102A-102B may be implemented using one or more servers operating in response to a computer program stored in a storage medium accessible by the servers. The host system computers 102A-102B may each operate as a network server (e.g., a web server) to communicate with any network entities, such as the storage devices 104A and 104B. -
Storage devices 104A and 104B may be implemented as varying types of storage devices. For example, the storage devices 104A and 104B may include one or more rotating magnetic storage devices, one or more rotating optical storage devices, and/or one or more solid state drives (SSDs), such as a flash drive. The storage devices 104A and 104B may include one or more hard disk drives (HDD), one or more flash drives, optical disks, as well as one or more other types of data storage devices. In other examples, the storage devices 104A and 104B may include a set of one or more data storage arrays. A data storage array may be, for example, a redundant array of inexpensive disks (RAID) array, an optical storage array, or any other type of data storage array. - The
networks 112 may be any type of known networks including, but not limited to, a storage area network (SAN), a wide area network (WAN), a local area network (LAN), a global network (e.g., the Internet), a virtual private network (VPN), and an intranet. The networks 112 may be implemented using wireless networks or any kind of physical network implementation known in the art, e.g., using cellular, satellite, and/or terrestrial network technologies. The networks 112 may also include short range wireless networks utilizing, e.g., BLUETOOTH™ and WI-FI™ technologies and protocols. - In some embodiments,
host system computer 102A and storage device 104A reside in a first data center (not shown), and host system computer 102B and storage device 104B reside in a second data center. That is, host system computers 102A and 102B may reside in geographically disparate locations. In this embodiment, the host system computer 102A and the storage device 104A at the first data center are communicatively coupled through a local network (e.g., as shown by solid line 114A in FIG. 1 ), and the host system computer 102B and the storage device 104B may be communicatively coupled through a second local network (e.g., as shown by solid line 114B in FIG. 1 ). In some embodiments, the local communication networks 114A and 114B may include internal (e.g., short distance) communication links (e.g., InfiniBand (IB) links or Fibre Channel (FC) links) to transfer data between storage volumes for storing replicas (also referred to herein as snap sets). - In embodiments, the
host system computer 102A at the first data center and the storage device 104B at the second data center may communicate remotely over a long distance network of the networks 112. Likewise, the host system computer 102B at the second data center and the storage device 104A at the first data center may communicate remotely over a long distance network of the networks 112. The long distance communication networks (shown in FIG. 1 as dotted lines 116A and 116B, respectively) may be long-distance communication networks of a storage area network (SAN), e.g., over an Ethernet or Internet (e.g., TCP/IP) link that may employ, for example, the iSCSI protocol. - Also shown in
FIG. 1 is a virtualized storage layer 106 including virtual databases 108A-108n. The virtualized storage layer 106 represents a storage array virtualized across two or more physical sites to create a data presence mirrored between the sites and enables simultaneous writes to the two or more sites. The databases 108A-108n may reside in one or more of the storage devices 104A-104B. The virtualized storage layer 106 is communicatively coupled to the host systems 102A-102B through the storage devices 104A-104B via the networks 112. - In embodiments, as shown in
FIG. 1 , the host system computer 102A and the host system computer 102B each implements a replication manager application 110A and 110B, respectively, to manage the processes described herein. The host system computers 102A and 102B perform IO operations on the storage devices 104A and 104B in an active/active replication session. In some embodiments, the IO operations for each of the host system computers 102A and 102B may be managed by the respective replication manager applications 110A and 110B. As changes are made to data stored on storage devices 104A and 104B via the IO operations from the host system computers 102A and 102B, replication manager applications 110A and 110B perform data replication to their local storage systems and to remote storage systems over the networks 112 in an active/active replication mode. - Data replication may be performed based on data replication policies that may define various settings for data recovery operations. For example, one policy may define a plurality of attributes, such as a frequency with which replicas are generated and how long each replica is kept at a storage system. In some embodiments, a policy may define metrics for use in snap set creation and replication process determinations. - In embodiments,
replication manager applications 110A and 110B, through the host systems 102A and 102B, are configured to enable designation of one of the storage devices to be a lock winner. In one embodiment, this designation can be determined by criteria such as a serial number of the storage devices. For example, the host systems 102A and 102B compare the serial numbers of their local storage devices 104A and 104B and, through their replication manager applications 110A and 110B, determine which serial number is higher. The storage device with the higher serial number is designated as the lock winner. This can be configured as an automated process that is performed by the replication manager applications or may be a manual process. In an embodiment, a user or administrator at one of the data centers can designate that the storage device residing at his/her data center become the lock winner. It will be understood that other means or criteria to designate a lock winner may be employed.
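- As a non-limiting illustration of the serial number comparison described above, consider the following minimal Python sketch. The names (StorageDevice, designate_lock_winner) are hypothetical, and a real system would exchange serial numbers over the replication link rather than hold both devices in one process:

    from dataclasses import dataclass

    @dataclass
    class StorageDevice:
        name: str
        serial_number: str
        is_lock_winner: bool = False   # attribute the host systems can query later

    def designate_lock_winner(dev_a: StorageDevice, dev_b: StorageDevice) -> StorageDevice:
        # The device with the higher serial number is designated the lock
        # winner, giving it priority over lock acquisition during the session.
        winner = dev_a if dev_a.serial_number > dev_b.serial_number else dev_b
        winner.is_lock_winner = True
        return winner

    winner = designate_lock_winner(StorageDevice("104A", "SN-9001"),
                                   StorageDevice("104B", "SN-9002"))
    assert winner.name == "104B"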
- Turning now to FIGS. 2A-2D , flow diagrams of processes 200A-200D for implementing extent lock resolution for storage devices in a storage system will now be described. The figures assume that the system (e.g., system 100 of FIG. 1 and/or computer 300 of FIG. 3 ) is performing replication in an active/active replication mode. - In
block 202 of FIG. 2A , one of the first storage device 104A and the second storage device 104B is designated as a lock winner. A lock winner designation indicates that the winning storage device will take priority, with respect to the other storage device, over acquisition of a lock. As indicated above, this designation of lock winner may be implemented using different criteria. For example, the designation may be an automated function based on serial numbers of the storage devices. In this example, the storage device having the higher serial number is automatically designated as the lock winner. In another embodiment, the designation can be made by user selection. In either embodiment, the storage device designated as the lock winner is assigned an attribute by the system, such that the host systems 102A-102B implementing the active/active replication session can identify which storage device has been designated the lock winner. - In
block 204, a replication write input/output (IO) request is issued by one of the host systems (e.g., 102A or 102B). In block 206, an extent of pages to be modified by the replication write IO request is identified. - In
block 208, the process 200A attempts to lock the extent in the storage device that is local to the host system that issued the replication write IO request. For example, if host system 102A issued the replication write IO request, then the process 200A attempts to lock the extent in storage device 104A, which is local to the host system 102A, regardless of which of the storage devices is designated winner or loser. - In
block 210, it is determined whether the attempt to lock the extent was successful. The attempt may be unsuccessful if the extent is not available at the time of the lock attempt. If the extent is not available, the corresponding host system (to which the storage device is local) waits for the extent to become available in block 212, and the process 200A returns to block 208. - If the attempt to lock the extent is successful in
block 210, the process 200A executes the write operation at the local storage device (e.g., 104A) in block 213, and sends a write request to the storage device that is remote from the host system that issued the replication write IO request (e.g., storage device 104B) in block 214. This write request is a replication write request from the first storage device 104A requesting that the write operation executed at the local storage device be replicated to the remote storage device. The write request includes the data subject to the replication write IO request and the corresponding extent. Using the above example, the storage device remote from the host system 102A is storage device 104B. In block 216, the remote storage device (e.g., 104B) receives the write request from the local storage device.
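- For illustration only, the local-side path just described (blocks 204-216) might be sketched in Python as follows; the helper names are hypothetical, and the synchronous remote_send callable is a stand-in for the replication link:

    import threading

    class LocalDevice:
        def __init__(self) -> None:
            # One lock per extent; an extent is modeled here simply as a
            # (starting page, page count) tuple.
            self.extent_locks: dict[tuple[int, int], threading.Lock] = {}
            self.pages: dict[int, bytes] = {}

        def lock_for(self, extent: tuple[int, int]) -> threading.Lock:
            return self.extent_locks.setdefault(extent, threading.Lock())

    def handle_host_write(local: LocalDevice, remote_send, page: int, data: bytes) -> str:
        extent = (page, 1)              # block 206: extent the write modifies
        lock = local.lock_for(extent)
        lock.acquire()                  # blocks 208-212: wait until available
        try:
            local.pages[page] = data    # block 213: execute the local write
            status = remote_send(extent, page, data)  # blocks 214-216: replicate
        finally:
            lock.release()              # block 238: unlock once the remote replies
                                        # (a fuller sketch would resend on a busy reply first)
        return status                   # block 240: report back to the issuing host

    dev = LocalDevice()
    assert handle_host_write(dev, lambda *a: "success", page=7, data=b"new") == "success"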
- The process 200A continues to one of processes 200B and 200C of FIGS. 2B and 2C , respectively, depending on whether the remote storage device is designated as a lock winner or a lock loser. For example, if the remote storage device (e.g., 104B) is the designated lock winner, the process continues in FIG. 2B . If, however, the remote storage device is not the designated lock winner (i.e., is the lock loser), the process continues in FIG. 2C . - Turning now to
FIG. 2B , the process 200B is performed from the perspective of the remote storage device as the lock winner. In block 218, the storage device designated as the lock winner attempts to lock the extent. For example, suppose storage device 104B has been designated the lock winner (from block 202). The storage device 104B attempts to lock the extent of pages to be modified by the write request in block 218. - In
block 220, the process 200B determines whether the attempt in block 218 was successful. If the attempt is unsuccessful (e.g., the extent is not available or is locked by another IO operation), the process 200B waits for the lock in block 222, and the process 200B returns to block 218. - Otherwise, if the attempt to lock the extent is successful in
block 220, the replication write IO request is executed at the remote storage device and completed in block 224. The process 200B returns a success notification (e.g., that the replication write operation was executed at the remote storage device) to the local storage device in block 225, and the process 200B continues to FIG. 2D .
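- For illustration only, the lock-winner path of process 200B (blocks 218-225) might be sketched as follows; because the lock loser backs off rather than waiting on a contended extent, the winner can safely block until the extent frees up. The names are hypothetical:

    import threading

    def winner_handle_replication_write(lock: threading.Lock,
                                        pages: dict[int, bytes],
                                        page: int, data: bytes) -> str:
        lock.acquire()            # blocks 218-222: wait for the extent lock
        try:
            pages[page] = data    # block 224: execute the replication write
        finally:
            lock.release()
        return "success"          # block 225: success notification to the local device

    store: dict[int, bytes] = {}
    assert winner_handle_replication_write(threading.Lock(), store, 7, b"x") == "success"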
- Returning now to FIG. 2A , if the remote storage device is the designated lock loser, the process 200A proceeds to FIG. 2C , and the process 200C is performed from the perspective of the lock loser. In FIG. 2C , the remote storage device, as the lock loser, attempts to lock the extent in block 226. - In
block 228, it is determined whether the lock is successful. If not, the remote storage device, as the lock loser, rejects the replication write IO request with a busy status and returns an indicator to the local storage device in block 230. In block 232, the remote storage device sends a request to the local storage device to resend the replication write IO request. - Returning to block 228, if the attempt to lock the extent is successful, the replication write IO request is executed and completed (e.g., the data of the request is written to the remote storage device) in block 234, and a success notification is sent by the remote storage device to the local storage device in
block 235. The process 200C proceeds to FIG. 2D .
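- The lock-loser path of process 200C (blocks 226-235) differs in that it must not wait on a contended extent, since mutual waiting is precisely the deadlock being avoided. A minimal hypothetical Python sketch:

    import threading

    def loser_handle_replication_write(lock: threading.Lock,
                                       pages: dict[int, bytes],
                                       page: int, data: bytes) -> str:
        if not lock.acquire(blocking=False):  # block 226: single, non-blocking attempt
            # Blocks 230-232: reject with a busy status and ask the local
            # device to resend the replication write IO request later.
            return "busy-resend"
        try:
            pages[page] = data                # block 234: execute the write
        finally:
            lock.release()
        return "success"                      # block 235: success notification

    store: dict[int, bytes] = {}
    lk = threading.Lock()
    assert loser_handle_replication_write(lk, store, 7, b"x") == "success"
    lk.acquire()                              # simulate a conflicting local writer
    assert loser_handle_replication_write(lk, store, 7, b"y") == "busy-resend"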
- In FIG. 2D , the process 200D begins with receipt of a success notification (block 236) by the local storage device from the remote storage device (e.g., from block 225 or block 235). In block 238, the local storage device unlocks the extent, and in block 240, the local storage device sends a notification of successful completion to the host system that issued the replication write IO request. The process 200D returns to block 204 of FIG. 2A when another replication write IO request is issued.
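- Tying the pieces together, the completion path of process 200D might be sketched as below for illustration; the resend and notify_host callables are hypothetical stand-ins for the replication link and the host interface:

    import threading

    def complete_replication_write(status: str, lock: threading.Lock,
                                   resend, notify_host) -> None:
        if status == "busy-resend":
            resend()            # the lock-loser remote asked for a retry (block 232)
            return              # the extent stays locked until a retry succeeds
        lock.release()          # block 238: unlock the extent
        notify_host("success")  # block 240: acknowledge the issuing host

    lk = threading.Lock()
    lk.acquire()
    complete_replication_write("success", lk,
                               resend=lambda: None,
                               notify_host=lambda s: None)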
- The processes 200A-200D may continue in a loop until all replication write IO requests in the session have been processed. - In some embodiments, the
host system computers 102A-102B may be implemented as one or more computers, such as a computer 300 as shown in FIG. 3 . Computer 300 may include processor 302, volatile memory 304 (e.g., RAM), non-volatile memory 306 (e.g., a hard disk drive, a solid state drive such as a flash drive, a hybrid magnetic and solid state drive, etc.), graphical user interface (GUI) 308 (e.g., a mouse, a keyboard, a display, and so forth), and input/output (I/O) device 320. Non-volatile memory 306 stores computer instructions 312, an operating system 316, and data 318 such that, for example, the computer instructions 312 are executed by the processor 302 out of volatile memory 304 to perform at least a portion of the processes 200A-200D shown in FIGS. 2A-2D . Program code may be applied to data entered using an input device of GUI 308 or received from I/O device 320. - Processes 200A-200D shown in
FIGS. 2A-2D are not limited to use with the hardware and software of FIG. 3 and may find applicability in any computing or processing environment and with any type of machine or set of machines that is capable of running a computer program. Processes 200A-200D may be implemented in hardware, software, or a combination of the two. - The processes described herein are not limited to the specific embodiments described. For example, processes 200A-200D are not limited to the specific processing order shown in
FIGS. 2A-2D . Rather, one or more blocks of processes 200A-200D may be re-ordered, combined or removed, performed in parallel or serially, as necessary, to achieve the results set forth herein. -
Processor 302 may be implemented by one or more programmable processors executing one or more computer programs to perform the functions of the system. As used herein, the term “processor” is used to describe an electronic circuit that performs a function, an operation, or a sequence of operations. The function, operation, or sequence of operations can be hard coded into the electronic circuit or soft coded by way of instructions held in a memory device. A “processor” can perform the function, operation, or sequence of operations using digital values or using analog signals. In some embodiments, the “processor” can be embodied in an application specific integrated circuit (ASIC). In some embodiments, the “processor” can be embodied in a microprocessor with associated program memory. In some embodiments, the “processor” can be embodied in a discrete electronic circuit. The “processor” can be analog, digital or mixed-signal. - While illustrative embodiments have been described with respect to processes of circuits, described embodiments may be implemented as a single integrated circuit, a multi-chip module, a single card, or a multi-card circuit pack. Further, as would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing blocks in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer. Thus, described embodiments may be implemented in hardware, a combination of hardware and software, software, or software in execution by one or more processors.
- Some embodiments may be implemented in the form of methods and apparatuses for practicing those methods. Described embodiments may also be implemented in the form of program code, for example, stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation. A non-transitory machine-readable medium may include but is not limited to tangible media, such as magnetic recording media including hard drives, floppy diskettes, and magnetic tape media, optical recording media including compact discs (CDs) and digital versatile discs (DVDs), solid state memory such as flash memory, hybrid magnetic and solid state memory, non-volatile memory, volatile memory, and so forth, but does not include a transitory signal per se. When embodied in a non-transitory machine-readable medium and the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the method.
- When implemented on a processing device, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits. Such processing devices may include, for example, a general purpose microprocessor, a digital signal processor (DSP), a reduced instruction set computer (RISC), a complex instruction set computer (CISC), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic array (PLA), a microcontroller, an embedded controller, a multi-core processor, and/or others, including combinations of the above. Described embodiments may also be implemented in the form of a bitstream or other sequence of signal values electrically or optically transmitted through a medium, stored as magnetic-field variations in a magnetic recording medium, etc., generated using a method and/or an apparatus as recited in the claims.
- Various elements, which are described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. It will be further understood that various changes in the details, materials, and arrangements of the parts that have been described and illustrated herein may be made by those skilled in the art without departing from the scope of the following claims.
- In the above-described flow charts of
FIGS. 2A-2D , rectangular elements, herein denoted “processing blocks,” represent computer software instructions or groups of instructions. Alternatively, the processing blocks may represent steps performed by functionally equivalent circuits such as a digital signal processor (DSP) circuit or an application specific integrated circuit (ASIC). The flow diagrams do not depict the syntax of any particular programming language but rather illustrate the functional information one of ordinary skill in the art requires to fabricate circuits or to generate computer software to perform the processing required of the particular apparatus. It should be noted that many routine program elements, such as initialization of loops and variables and the use of temporary variables, may be omitted for clarity. The particular sequence of blocks described is illustrative only and can be varied without departing from the spirit of the concepts, structures, and techniques sought to be protected herein. Thus, unless otherwise stated, the blocks described are unordered, meaning that, when possible, the functions represented by the blocks can be performed in any convenient or desirable order.
- For example, when the program code is loaded into and executed by a machine, such as the computer of
FIG. 3 , the machine becomes an apparatus for practicing the invention. When implemented on one or more general-purpose processors, the program code combines with such a processor to provide a unique apparatus that operates analogously to specific logic circuits. As such, a general-purpose digital machine can be transformed into a special-purpose digital machine. FIG. 4 shows Program Logic 404 embodied on a computer-readable medium 402 as shown, wherein the Logic is encoded in computer-executable code configured for carrying out the extent lock resolution processes described herein and thereby forming a Computer Program Product 400. The logic may be the same logic loaded from memory onto a processor. The program logic may also be embodied in software modules, as modules, or as hardware modules. A processor may be a virtual processor or a physical processor. Logic may be distributed across several processors or virtual processors to execute the logic. - In some embodiments, a storage medium may be a physical or logical device. In some embodiments, a storage medium may consist of physical or logical devices. In some embodiments, a storage medium may be mapped across multiple physical and/or logical devices. In some embodiments, a storage medium may exist in a virtualized environment. In some embodiments, a processor may be a virtual or physical embodiment. In some embodiments, logic may be executed across one or more physical or virtual processors.
- For purposes of illustrating the present embodiment, the disclosed embodiments are described as embodied in a specific configuration and using special logical arrangements, but one skilled in the art will appreciate that the device is not limited to the specific configuration but rather only by the claims included with this specification. In addition, it is expected that during the life of a patent maturing from this application, many relevant technologies will be developed, and the scopes of the corresponding terms are intended to include all such new technologies a priori.
- The terms “comprises,” “comprising,” “includes,” “including,” “having,” and their conjugates at least mean “including but not limited to.” As used herein, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/263,414 US10719249B1 (en) | 2019-01-31 | 2019-01-31 | Extent lock resolution in active/active replication |
US16/883,024 US10908830B2 (en) | 2019-01-31 | 2020-05-26 | Extent lock resolution in active/active replication |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/263,414 US10719249B1 (en) | 2019-01-31 | 2019-01-31 | Extent lock resolution in active/active replication |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/883,024 Continuation US10908830B2 (en) | 2019-01-31 | 2020-05-26 | Extent lock resolution in active/active replication |
Publications (2)
Publication Number | Publication Date |
---|---|
US10719249B1 US10719249B1 (en) | 2020-07-21 |
US20200249857A1 true US20200249857A1 (en) | 2020-08-06 |
Family
ID=71611718
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/263,414 Active US10719249B1 (en) | 2019-01-31 | 2019-01-31 | Extent lock resolution in active/active replication |
US16/883,024 Active US10908830B2 (en) | 2019-01-31 | 2020-05-26 | Extent lock resolution in active/active replication |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/883,024 Active US10908830B2 (en) | 2019-01-31 | 2020-05-26 | Extent lock resolution in active/active replication |
Country Status (1)
Country | Link |
---|---|
US (2) | US10719249B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220237209A1 (en) * | 2019-05-30 | 2022-07-28 | Zte Corporation | Database processing method and apparatus, and computer-readable storage medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11782604B2 (en) * | 2021-07-23 | 2023-10-10 | EMC IP Holding Company, LLC | IO request flow performance analysis system and method |
Family Cites Families (136)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5913213A (en) * | 1997-06-16 | 1999-06-15 | Telefonaktiebolaget L M Ericsson | Lingering locks for replicated data objects |
US6253274B1 (en) * | 1998-08-28 | 2001-06-26 | International Business Machines Corporation | Apparatus for a high performance locking facility |
US6938122B2 (en) | 2001-01-23 | 2005-08-30 | Emc Corporation | Remote mirroring in a switched environment |
US6883018B1 (en) | 2001-01-24 | 2005-04-19 | Emc Corporation | Scanning a message-list |
US7870195B1 (en) | 2001-01-24 | 2011-01-11 | Emc Corporation | Inter-processor messaging |
US7032228B1 (en) | 2001-03-01 | 2006-04-18 | Emc Corporation | Common device interface |
US6553464B1 (en) | 2001-03-04 | 2003-04-22 | Emc Corporation | Obtaining data from a remote storage device |
US6640280B1 (en) | 2001-03-04 | 2003-10-28 | Emc Corporation | Obtaining data from a remote storage device using multiple jobs per device on RA |
US7577957B1 (en) | 2001-03-04 | 2009-08-18 | Emc Corporation | Multiple jobs per device that are linked via a device record for servicing by different adapters |
US6886164B2 (en) | 2001-05-08 | 2005-04-26 | Emc Corporation | Selection of a resource in a distributed computer system |
US6496908B1 (en) | 2001-05-18 | 2002-12-17 | Emc Corporation | Remote mirroring |
US6968369B2 (en) | 2001-09-27 | 2005-11-22 | Emc Corporation | Remote data facility over an IP network |
US6944726B2 (en) | 2001-11-14 | 2005-09-13 | Emc Corporation | Distributed background track processing |
US6976139B2 (en) | 2001-11-14 | 2005-12-13 | Emc Corporation | Reversing a communication path between storage devices |
US6862632B1 (en) | 2001-11-14 | 2005-03-01 | Emc Corporation | Dynamic RDF system for transferring initial data between source and destination volume wherein data maybe restored to either volume at same time other data is written |
US6910075B2 (en) | 2001-11-14 | 2005-06-21 | Emc Corporation | Dynamic RDF groups |
US7113945B1 (en) | 2002-04-10 | 2006-09-26 | Emc Corporation | Virtual storage device that uses volatile memory |
US7475124B2 (en) | 2002-09-25 | 2009-01-06 | Emc Corporation | Network block services for client access of network-attached data storage in an IP network |
US7640342B1 (en) | 2002-09-27 | 2009-12-29 | Emc Corporation | System and method for determining configuration of one or more data storage systems |
US7292969B1 (en) | 2002-09-27 | 2007-11-06 | Emc Corporation | Method and system for simulating performance on one or more data storage systems |
US7380082B2 (en) | 2003-03-25 | 2008-05-27 | Emc Corporation | Reading virtual ordered writes at local storage device |
US7114033B2 (en) | 2003-03-25 | 2006-09-26 | Emc Corporation | Handling data writes copied from a remote data storage device |
US7051176B2 (en) | 2003-03-25 | 2006-05-23 | Emc Corporation | Reading data provided to a remote storage device |
US6898685B2 (en) | 2003-03-25 | 2005-05-24 | Emc Corporation | Ordering data writes from a local storage device to a remote storage device |
US7054883B2 (en) | 2003-12-01 | 2006-05-30 | Emc Corporation | Virtual ordered writes for multiple storage devices |
US7228456B2 (en) | 2003-12-01 | 2007-06-05 | Emc Corporation | Data recovery for virtual ordered writes for multiple storage devices |
US8185708B2 (en) | 2004-09-30 | 2012-05-22 | Emc Corporation | Host implementation of triangular asynchronous replication |
US8677087B2 (en) | 2006-01-03 | 2014-03-18 | Emc Corporation | Continuous backup of a storage device |
US20070156982A1 (en) | 2006-01-03 | 2007-07-05 | David Meiri | Continuous backup using a mirror device |
US7613890B1 (en) | 2006-03-31 | 2009-11-03 | Emc Corporation | Consistent replication across multiple storage devices |
US7617372B1 (en) | 2006-09-28 | 2009-11-10 | Emc Corporation | Avoiding copy on first write |
US8346719B2 (en) * | 2007-05-17 | 2013-01-01 | Novell, Inc. | Multi-node replication systems, devices and methods |
US7702871B1 (en) | 2007-08-31 | 2010-04-20 | Emc Corporation | Write pacing |
US8335899B1 (en) | 2008-03-31 | 2012-12-18 | Emc Corporation | Active/active remote synchronous mirroring |
US20090265352A1 (en) * | 2008-04-18 | 2009-10-22 | Gravic, Inc. | Methods for ensuring fair access to information |
US7962458B2 (en) * | 2008-06-12 | 2011-06-14 | Gravic, Inc. | Method for replicating explicit locks in a data replication engine |
US8652202B2 (en) | 2008-08-22 | 2014-02-18 | Edwards Lifesciences Corporation | Prosthetic heart valve and delivery apparatus |
US8515911B1 (en) | 2009-01-06 | 2013-08-20 | Emc Corporation | Methods and apparatus for managing multiple point in time copies in a file system |
US8706959B1 (en) | 2009-06-30 | 2014-04-22 | Emc Corporation | Virtual storage machine |
US9575985B2 (en) * | 2009-12-07 | 2017-02-21 | Novell, Inc. | Distributed lock administration |
US8566483B1 (en) | 2009-12-17 | 2013-10-22 | Emc Corporation | Measuring data access activity |
US8380928B1 (en) | 2009-12-17 | 2013-02-19 | Emc Corporation | Applying data access activity measurements |
US8429346B1 (en) | 2009-12-28 | 2013-04-23 | Emc Corporation | Automated data relocation among storage tiers based on storage load |
US8332687B1 (en) | 2010-06-23 | 2012-12-11 | Emc Corporation | Splitter used in a continuous data protection environment |
US8327103B1 (en) | 2010-06-28 | 2012-12-04 | Emc Corporation | Scheduling data relocation activities using configurable fairness criteria |
US8775388B1 (en) | 2010-09-29 | 2014-07-08 | Emc Corporation | Selecting iteration schemes for deduplication |
US8694700B1 (en) | 2010-09-29 | 2014-04-08 | Emc Corporation | Using I/O track information for continuous push with splitter for storage device |
US8683153B1 (en) | 2010-09-29 | 2014-03-25 | Emc Corporation | Iterating for deduplication |
US8335771B1 (en) | 2010-09-29 | 2012-12-18 | Emc Corporation | Storage array snapshots for logged access replication in a continuous data protection system |
US8539148B1 (en) | 2010-12-22 | 2013-09-17 | Emc Corporation | Deduplication efficiency |
US8578204B1 (en) | 2010-12-29 | 2013-11-05 | Emc Corporation | Witness facility for distributed storage system |
US8600943B1 (en) | 2010-12-31 | 2013-12-03 | Emc Corporation | Porting replication relationships |
US8468180B1 (en) | 2010-12-31 | 2013-06-18 | Emc Corporation | Porting storage metadata |
US9110693B1 (en) | 2011-02-17 | 2015-08-18 | Emc Corporation | VM mobility over distance |
US9513814B1 (en) | 2011-03-29 | 2016-12-06 | EMC IP Holding Company LLC | Balancing I/O load on data storage systems |
US8977812B1 (en) | 2011-03-30 | 2015-03-10 | Emc Corporation | Iterating in parallel for deduplication |
US9009437B1 (en) | 2011-06-20 | 2015-04-14 | Emc Corporation | Techniques for shared data storage provisioning with thin devices |
US8862546B1 (en) | 2011-06-30 | 2014-10-14 | Emc Corporation | Virtual access roll |
US8719497B1 (en) | 2011-09-21 | 2014-05-06 | Emc Corporation | Using device spoofing to improve recovery time in a continuous data protection environment |
US8825964B1 (en) | 2011-09-26 | 2014-09-02 | Emc Corporation | Adaptive integration of cloud data services with a data storage system |
US8838849B1 (en) | 2011-12-08 | 2014-09-16 | Emc Corporation | Link sharing for multiple replication modes |
US8966211B1 (en) | 2011-12-19 | 2015-02-24 | Emc Corporation | Techniques for dynamic binding of device identifiers to data storage devices |
US10082959B1 (en) | 2011-12-27 | 2018-09-25 | EMC IP Holding Company LLC | Managing data placement in storage systems |
US8977826B1 (en) | 2011-12-28 | 2015-03-10 | Emc Corporation | Extent commands in replication |
US9524220B1 (en) | 2011-12-30 | 2016-12-20 | EMC IP Holding Company, LLC | Memory optimization for configuration elasticity in cloud environments |
US9811288B1 (en) | 2011-12-30 | 2017-11-07 | EMC IP Holding Company LLC | Managing data placement based on flash drive wear level |
US8732124B1 (en) | 2012-03-26 | 2014-05-20 | Emc Corporation | Multisite replication with coordinated cycle switching |
US9026492B1 (en) | 2012-03-26 | 2015-05-05 | Emc Corporation | Multisite replication with uncoordinated cycle switching |
US8583607B1 (en) | 2012-03-28 | 2013-11-12 | Emc Corporation | Managing deduplication density |
US8712976B1 (en) | 2012-03-28 | 2014-04-29 | Emc Corporation | Managing deduplication density |
US9100343B1 (en) | 2012-03-29 | 2015-08-04 | Emc Corporation | Storage descriptors and service catalogs in a cloud environment |
US8799601B1 (en) * | 2012-06-28 | 2014-08-05 | Emc Corporation | Techniques for managing deduplication based on recently written extents |
US8954699B1 (en) | 2012-06-28 | 2015-02-10 | Emc Corporation | Techniques for identifying IO hot spots using range-lock information |
US8782324B1 (en) | 2012-06-28 | 2014-07-15 | Emc Corporation | Techniques for managing placement of extents based on a history of active extents |
US9483355B1 (en) | 2012-06-29 | 2016-11-01 | EMC IP Holding Company LLC | Tracking copy sessions |
US9152336B1 (en) | 2012-06-30 | 2015-10-06 | Emc Corporation | System and method for LUN adjustment |
US8930746B1 (en) | 2012-06-30 | 2015-01-06 | Emc Corporation | System and method for LUN adjustment |
US8909887B1 (en) * | 2012-09-25 | 2014-12-09 | Emc Corporation | Selective defragmentation based on IO hot spots |
US9542125B1 (en) | 2012-09-25 | 2017-01-10 | EMC IP Holding Company LLC | Managing data relocation in storage systems |
US9684593B1 (en) | 2012-11-30 | 2017-06-20 | EMC IP Holding Company LLC | Techniques using an encryption tier property with application hinting and I/O tagging |
US9449011B1 (en) | 2012-12-28 | 2016-09-20 | Emc Corporation | Managing data deduplication in storage systems |
US9477431B1 (en) | 2012-12-28 | 2016-10-25 | EMC IP Holding Company LLC | Managing storage space of storage tiers |
US9817766B1 (en) | 2012-12-28 | 2017-11-14 | EMC IP Holding Company LLC | Managing relocation of slices in storage systems |
US9355112B1 (en) * | 2012-12-31 | 2016-05-31 | Emc Corporation | Optimizing compression based on data activity |
US9594514B1 (en) | 2013-06-27 | 2017-03-14 | EMC IP Holding Company LLC | Managing host data placed in a container file system on a data storage array having multiple storage tiers |
US9710187B1 (en) | 2013-06-27 | 2017-07-18 | EMC IP Holding Company LLC | Managing data relocation in storage systems |
US9418131B1 (en) | 2013-09-24 | 2016-08-16 | Emc Corporation | Synchronization of volumes |
US9378106B1 (en) | 2013-09-26 | 2016-06-28 | Emc Corporation | Hash-based replication |
US9037822B1 (en) | 2013-09-26 | 2015-05-19 | Emc Corporation | Hierarchical volume tree |
US9384206B1 (en) | 2013-12-26 | 2016-07-05 | Emc Corporation | Managing data deduplication in storage systems |
US9529545B1 (en) | 2013-12-26 | 2016-12-27 | EMC IP Holding Company LLC | Managing data deduplication in storage systems based on storage space characteristics |
US9460102B1 (en) | 2013-12-26 | 2016-10-04 | Emc Corporation | Managing data deduplication in storage systems based on I/O activities |
US9395937B1 (en) | 2013-12-27 | 2016-07-19 | Emc Corporation | Managing storage space in storage systems |
US9606870B1 (en) | 2014-03-31 | 2017-03-28 | EMC IP Holding Company LLC | Data reduction techniques in a flash-based key/value cluster storage |
US9342465B1 (en) | 2014-03-31 | 2016-05-17 | Emc Corporation | Encrypting data in a flash-based contents-addressable block device |
US9396243B1 (en) | 2014-06-27 | 2016-07-19 | Emc Corporation | Hash-based replication using short hash handle and identity bit |
US9459809B1 (en) | 2014-06-30 | 2016-10-04 | Emc Corporation | Optimizing data location in data storage arrays |
US9304889B1 (en) | 2014-09-24 | 2016-04-05 | Emc Corporation | Suspending data replication |
US10025843B1 (en) | 2014-09-24 | 2018-07-17 | EMC IP Holding Company LLC | Adjusting consistency groups during asynchronous replication |
US9489275B2 (en) | 2014-10-02 | 2016-11-08 | Netapp, Inc. | Techniques for error handling in parallel splitting of storage commands |
US10037369B1 (en) | 2015-06-26 | 2018-07-31 | EMC IP Holding Company LLC | Storage tiering in replication target based on logical extents |
US10417056B2 (en) * | 2015-08-04 | 2019-09-17 | Oracle International Corporation | Systems and methods for performing concurrency restriction and throttling over contended locks |
US10152527B1 (en) | 2015-12-28 | 2018-12-11 | EMC IP Holding Company LLC | Increment resynchronization in hash-based replication |
US10534547B2 (en) | 2015-12-29 | 2020-01-14 | EMC IP Holding Company LLC | Consistent transition from asynchronous to synchronous replication in hash-based storage systems |
US10496672B2 (en) | 2015-12-30 | 2019-12-03 | EMC IP Holding Company LLC | Creating replicas at user-defined points in time |
US10459883B1 (en) | 2015-12-30 | 2019-10-29 | EMC IP Holding Company LLC | Retention policies for unscheduled replicas in backup, snapshots, and remote replication |
US9933947B1 (en) * | 2015-12-30 | 2018-04-03 | EMC IP Holding Company LLC | Maintaining write consistency on distributed multiple page writes |
US20170193070A1 (en) * | 2015-12-31 | 2017-07-06 | Synchronoss Technologies, Inc. | System and method for a distributed replication lock for active-active geo-redundant systems |
US10310951B1 (en) | 2016-03-22 | 2019-06-04 | EMC IP Holding Company LLC | Storage system asynchronous data replication cycle trigger with empty cycle detection |
US10324635B1 (en) | 2016-03-22 | 2019-06-18 | EMC IP Holding Company LLC | Adaptive compression for data replication in a storage system |
US9959063B1 (en) | 2016-03-30 | 2018-05-01 | EMC IP Holding Company LLC | Parallel migration of multiple consistency groups in a storage system |
US9959073B1 (en) | 2016-03-30 | 2018-05-01 | EMC IP Holding Company LLC | Detection of host connectivity for data migration in a storage system |
US10565058B1 (en) | 2016-03-30 | 2020-02-18 | EMC IP Holding Company LLC | Adaptive hash-based data replication in a storage system |
US10095428B1 (en) | 2016-03-30 | 2018-10-09 | EMC IP Holding Company LLC | Live migration of a tree of replicas in a storage system |
US10417396B2 (en) | 2016-04-14 | 2019-09-17 | NetSuite Inc. | System and methods for provisioning and monitoring licensing of applications or extensions to applications on a multi-tenant platform |
US10261853B1 (en) | 2016-06-28 | 2019-04-16 | EMC IP Holding Company LLC | Dynamic replication error retry and recovery |
US10496668B1 (en) | 2016-06-28 | 2019-12-03 | EMC IP Holding Company LLC | Optimized tender processing of hash-based replicated data |
US10459632B1 (en) | 2016-09-16 | 2019-10-29 | EMC IP Holding Company LLC | Method and system for automatic replication data verification and recovery |
US10374792B1 (en) | 2016-09-29 | 2019-08-06 | EMC IP Holding Company LLC | Layout-independent cryptographic stamp of a distributed dataset |
US10503427B2 (en) | 2017-03-10 | 2019-12-10 | Pure Storage, Inc. | Synchronously replicating datasets and other managed objects to cloud-based storage systems |
US10235066B1 (en) | 2017-04-27 | 2019-03-19 | EMC IP Holding Company LLC | Journal destage relay for online system checkpoint creation |
US10331350B1 (en) | 2017-04-27 | 2019-06-25 | EMC IP Holding Company LLC | Capacity determination for content-based storage |
US10152381B1 (en) | 2017-04-27 | 2018-12-11 | EMC IP Holding Company LLC | Using storage defragmentation function to facilitate system checkpoint |
US10409520B1 (en) | 2017-04-27 | 2019-09-10 | EMC IP Holding Company LLC | Replication of content-based storage using address space slices |
US10503609B1 (en) | 2017-04-27 | 2019-12-10 | EMC IP Holding Company LLC | Replication link smoothing using historical data |
US10324806B1 (en) | 2017-04-27 | 2019-06-18 | EMC IP Holding Company LLC | Snapshot visualization for content-based storage |
US10176046B1 (en) | 2017-06-29 | 2019-01-08 | EMC IP Holding Company LLC | Checkpointing of metadata into user data area of a content addressable storage system |
US10359965B1 (en) | 2017-07-28 | 2019-07-23 | EMC IP Holding Company LLC | Signature generator for use in comparing sets of data in a content addressable storage system |
US10437855B1 (en) | 2017-07-28 | 2019-10-08 | EMC IP Holding Company LLC | Automatic verification of asynchronously replicated data |
US10466925B1 (en) | 2017-10-25 | 2019-11-05 | EMC IP Holding Company LLC | Compression signaling for replication process in a content addressable storage system |
US10496489B1 (en) | 2017-11-21 | 2019-12-03 | EMC IP Holding Company LLC | Storage system configured for controlled transition between asynchronous and synchronous replication modes |
US10338851B1 (en) | 2018-01-16 | 2019-07-02 | EMC IP Holding Company LLC | Storage system with consistent termination of data replication across multiple distributed processing modules |
US10324640B1 (en) | 2018-01-22 | 2019-06-18 | EMC IP Holding Company LLC | Storage system with consistent initiation of data replication across multiple distributed processing modules |
US10394485B1 (en) | 2018-03-29 | 2019-08-27 | EMC IP Holding Company LLC | Storage system with efficient re-synchronization mode for use in replication of data from source to target |
US10496324B2 (en) | 2018-03-30 | 2019-12-03 | EMC IP Holding Company LLC | Storage system with concurrent fan-out asynchronous replication using decoupled replication sessions |
US10613793B1 (en) | 2018-11-01 | 2020-04-07 | EMC IP Holding Company LLC | Method to support hash based xcopy synchronous replication |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220237209A1 (en) * | 2019-05-30 | 2022-07-28 | Zte Corporation | Database processing method and apparatus, and computer-readable storage medium |
US11928132B2 (en) * | 2019-05-30 | 2024-03-12 | Xi'an Zhongxing New Software Co., Ltd. | Database processing method and apparatus, and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
US20200285409A1 (en) | 2020-09-10 |
US10719249B1 (en) | 2020-07-21 |
US10908830B2 (en) | 2021-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10853200B2 (en) | Consistent input/output (IO) recovery for active/active cluster replication | |
US10191813B2 (en) | Data replication snapshots for persistent storage using operation numbers | |
CN110392876B (en) | Method for synchronously copying data sets and other managed objects to cloud-based storage system | |
US11782783B2 (en) | Method and apparatus to neutralize replication error and retain primary and secondary synchronization during synchronous replication | |
US9916321B2 (en) | Methods and apparatus for controlling snapshot exports | |
US11036423B2 (en) | Dynamic recycling algorithm to handle overlapping writes during synchronous replication of application workloads with large number of files | |
US11593016B2 (en) | Serializing execution of replication operations | |
US11442825B2 (en) | Establishing a synchronous replication relationship between two or more storage systems | |
US11099953B2 (en) | Automatic data healing using a storage controller | |
US20060230243A1 (en) | Cascaded snapshots | |
US11144252B2 (en) | Optimizing write IO bandwidth and latency in an active-active clustered system based on a single storage node having ownership of a storage object | |
US20170351462A1 (en) | Provisioning a slave for data storage using metadata with updated references | |
US9086811B2 (en) | Managing data sets of a storage system | |
US10452502B2 (en) | Handling node failure in multi-node data storage systems | |
US11238063B2 (en) | Provenance-based replication in a storage system | |
US10908830B2 (en) | Extent lock resolution in active/active replication | |
US9749193B1 (en) | Rule-based systems for outcome-based data protection | |
US10776224B2 (en) | Recovery after service disruption during an active/active replication session | |
US10719257B1 (en) | Time-to-live (TTL) license management in an active/active replication session | |
CN112771501A (en) | Remote Direct Memory Operation (RDMO) for transactional processing systems | |
US11226746B2 (en) | Automatic data healing by I/O | |
US8850126B2 (en) | Exclusive access during a critical sub-operation to enable simultaneous operations | |
US11531644B2 (en) | Fractional consistent global snapshots of a distributed namespace |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment |
Owner name: EMC IP HOLDING COMPANY LLC, MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MEIRI, DAVID;CHEN, XIANGPING;SIGNING DATES FROM 20190129 TO 20190211;REEL/FRAME:048305/0548 |
AS | Assignment |
Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., TEXAS Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES, INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:049452/0223 Effective date: 20190320 |
AS | Assignment |
Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., TEXAS Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:053546/0001 Effective date: 20200409 |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |