US7188272B2 - Method, system and article of manufacture for recovery from a failure in a cascading PPRC system - Google Patents
- Publication number
- US7188272B2 (application US10/674,866)
- Authority
- US
- United States
- Prior art keywords
- storage unit
- data
- map
- storage
- mirroring
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2056—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
- G06F11/2082—Data synchronisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2056—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
- G06F11/2058—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using more than 2 mirrored copies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2056—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
- G06F11/2071—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using a plurality of controllers
Definitions
- the present invention relates to a method, system, and article of manufacture for recovery from a failure of a storage unit in a cascading PPRC system.
- Information technology systems, including storage systems, may need protection from site disasters or outages, where outages may be planned or unplanned. Furthermore, information technology systems may require features for data migration, data backup, or data duplication. Implementations for disaster or outage recovery, data migration, data backup, and data duplication may include mirroring or copying of data in storage systems. Such mirroring or copying of data may involve interactions among hosts, storage systems and connecting networking components of the information technology system.
- An enterprise storage server (ESS), such as the IBM* TotalStorage Enterprise Storage Server*, may be a disk storage server that includes one or more processors coupled to storage devices, including high capacity scalable storage devices, Redundant Array of Independent Disks (RAID), etc.
- the enterprise storage servers are connected to a network and include features for copying data in storage systems.
- Peer-to-Peer Remote Copy (PPRC) is an ESS function that allows the shadowing of application system data from a first site to a second site.
- the first site may be referred to as an application site, a local site, or a primary site.
- the second site may be referred to as a recovery site, a remote site or a secondary site.
- the logical volumes that hold the data in the ESS at the local site are called local volumes, and the corresponding logical volumes that hold the mirrored data at the remote site are called remote volumes.
- High speed links, such as ESCON links may connect the local and remote ESS systems.
- In the synchronous type of operation for PPRC, i.e., synchronous PPRC, the updates done by a host application to the local volumes at the local site are synchronously shadowed onto the remote volumes at the remote site. Because synchronous PPRC is a synchronous copying solution, write updates are ensured on both copies (local and remote) before the write is considered to be completed for the host application.
- the host application does not get the “write complete” condition until the update is synchronously done in both the local and the remote volumes. Therefore, from the perspective of the host application the data at the remote volumes at the remote site is equivalent to the data at the local volumes at the local site.
- Synchronous PPRC increases the response time as compared to asynchronous copy operation, and this is inherent to the synchronous operation.
- the overhead comes from the additional steps that are executed before the write operation is signaled as completed to the host application.
- the PPRC activity between the local site and the remote site will be comprised of signals and data that travel through the links that connect the sites, and the overhead response time of the host application write operations will increase proportionally with the distance between the sites. Therefore, the distance affects a host application's response time.
- In the Extended Distance PPRC (also referred to as PPRC Extended Distance) method of operation, PPRC mirrors the updates of the local volume onto the remote volumes in an asynchronous manner, while the host application is running.
- the host application receives a write complete response before the update is copied from the local volumes to the remote volumes.
- a host application's write operations are free of the typical synchronous overheads. Therefore, Extended Distance PPRC is suitable for remote copy solutions at very long distances with minimal impact on host applications. There is no overhead penalty upon the host application's write such as in synchronous PPRC.
- Extended Distance PPRC does not continuously maintain an equivalent copy of the local data at the remote site.
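To make the contrast concrete, the following minimal Python sketch models the two write paths. It is illustrative only and not taken from the patent or from any ESS interface; the class and method names (LocalVolume, RemoteVolume, sync_write, async_write) are hypothetical.

```python
from collections import deque

class RemoteVolume:
    """Stand-in for the remote (secondary) volume."""
    def __init__(self):
        self.tracks = {}
    def write(self, track, data):
        self.tracks[track] = data      # in reality this is a link round trip
        return "write complete"

class LocalVolume:
    """Stand-in for the local (primary) volume mirrored to a remote volume."""
    def __init__(self, remote):
        self.remote = remote
        self.tracks = {}
        self.pending = deque()         # updates not yet shipped to the remote

    def sync_write(self, track, data):
        # Synchronous PPRC: the host sees "write complete" only after the
        # remote volume has acknowledged the update.
        self.tracks[track] = data
        self.remote.write(track, data)  # blocks for the remote acknowledgement
        return "write complete"

    def async_write(self, track, data):
        # Extended Distance PPRC: acknowledge immediately and queue the update
        # for a background process; the remote copy lags the local copy.
        self.tracks[track] = data
        self.pending.append((track, data))
        return "write complete"

local = LocalVolume(RemoteVolume())
local.sync_write(0, b"a")              # remote copy is current
local.async_write(1, b"b")             # remote copy is updated later
```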
- In a cascading PPRC system, a first storage unit receives data from the I/O operations of a host computer. A first storage controller is associated with the first storage unit and synchronously mirrors the data to a second storage unit associated with a second storage controller, which in turn asynchronously mirrors the data to a third storage unit.
- the first, second and third storage units are maintained at separate locations. It is common for the first storage unit to be maintained at the main application site.
- the second storage unit is often maintained at a bunker site near enough to the first storage unit to maintain an efficient synchronous PPRC relationship, but separated and protected from the first storage unit in order to decrease the chance that the first and second storage units would both be destroyed in a common disaster.
- the third storage unit can be located at any distance from the second storage unit.
- the present invention is directed to overcoming one or more of the problems discussed above.
- the need in the art is addressed by a method of recovery from a data storage system failure in a data storage system having a host computer writing data to a first storage unit with a first storage controller synchronously mirroring the data to a second storage unit, and with a second storage controller asynchronously mirroring the data to a third storage unit.
- the method is triggered by the detection of a failure associated with the first storage unit.
- the synchronous PPRC relationship between the first storage unit and the second storage unit is terminated and the host is directed to write data updates directly to the second storage unit.
- the asynchronous PPRC relationship between the second storage unit and the third storage unit is maintained.
- the asynchronous mirroring of data updates from the second storage unit to the third storage unit is suspended and synchronous mirroring of the data updates in a reverse direction, from the second storage unit to the first storage unit, is commenced.
- host I/O operations can be quiesced.
- the synchronous PPRC relationship with the first storage volume mirroring data to the second storage unit may be reestablished and host I/O writes to the first storage unit may be resumed.
- the asynchronous PPRC relationship between the second storage unit and the third storage unit is reestablished and the data stored on the third storage volume is brought current with that maintained on the second storage volume.
- the data storage tracks associated with the second storage unit which contain mirrored data updates from the synchronous PPRC relationship with the first storage unit are identified by a first map, more specifically an out of synch (OOS) bitmap which represents updates that must be sent from the second to the third storage unit.
- the data storage tracks associated with the second storage unit which contain data updates received when host I/O operations are writing data directly to the second storage volume can be identified with a second map, specifically a change recording (CR) bitmap.
- the information contained in the OOS bitmap and the CR bitmap or copies thereof may be manipulated to allow recovery of the first storage volume, resynchronization of the third storage volume and return to normal operations after a failure associated with the first storage volume without the need for a full volume copy.
- host application downtime is minimized.
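A per-track bitmap of this kind can be modelled with a few lines of Python. The sketch below is an assumption about the general data structure, not the ESS implementation; the TrackBitmap name and its methods are hypothetical.

```python
class TrackBitmap:
    """Minimal model of a per-volume track bitmap such as the OOS or CR bitmap."""
    def __init__(self, num_tracks):
        self.bits = bytearray((num_tracks + 7) // 8)

    def set(self, track):      # a track now holds an update that must be replicated
        self.bits[track // 8] |= 1 << (track % 8)

    def clear(self, track):    # the update for this track has been replicated
        self.bits[track // 8] &= 0xFF ^ (1 << (track % 8))

    def is_set(self, track):
        return bool(self.bits[track // 8] & (1 << (track % 8)))

    def dirty_tracks(self):
        return [t for t in range(len(self.bits) * 8) if self.is_set(t)]

# The OOS bitmap marks tracks that still have to be sent to the third storage
# unit; the CR bitmap marks tracks updated by host writes during the failure.
oos, cr = TrackBitmap(1024), TrackBitmap(1024)
oos.set(42)
cr.set(42)
assert oos.dirty_tracks() == cr.dirty_tracks() == [42]
```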
- FIG. 1 illustrates a block diagram of a computing environment in accordance with certain described aspects of the invention;
- FIG. 2 illustrates a block diagram of a cascading copy application in accordance with certain described implementations of the invention;
- FIG. 3 illustrates logic implemented in a first storage unit in accordance with certain described implementations of the invention;
- FIG. 4 illustrates logic for receiving data synchronously as implemented in a second storage unit in accordance with certain described implementations of the invention;
- FIG. 5 illustrates logic for copying data asynchronously as implemented in the second storage unit in accordance with certain described implementations of the invention;
- FIG. 6 illustrates a block diagram of a method of recovering from a failure of the first storage unit which does not require a full volume copy;
- FIG. 7 illustrates a block diagram of the bitmap manipulation associated with asynchronous PPRC mirroring of data from the second storage unit to the third storage unit;
- FIG. 8 illustrates a block diagram of the bitmap manipulation occurring when data updates are written directly to the second storage unit;
- FIG. 9 illustrates a block diagram of the bitmap manipulation occurring when data updates are synchronously mirrored from the second storage unit to the first storage unit; and
- FIG. 10 illustrates a block diagram of the bitmap manipulation occurring when the asynchronous PPRC relationship is reestablished and the third data storage unit is synchronized.
- FIG. 1 illustrates a computing environment utilizing three storage control units, such as a first storage unit 100 , a second storage unit 102 , and a third storage unit 104 , connected by data interface channels 106 , 108 , such as the Enterprise System Connection (ESCON)* channel or any other data interface mechanism known in the art (e.g., fibre channel, Storage Area Network (SAN) interconnections, etc.).
- the three storage control units 100 , 102 , 104 may be at three different sites with the first storage unit 100 and the second storage unit 102 being within a synchronous communication distance of each other.
- the synchronous communication distance between two storage control units is the distance up to which synchronous communication is feasible between the two storage control units.
- the third storage unit 104 may be a long distance away from the second storage unit 102 and the first storage unit 100 , such that synchronous copying of data from the second storage unit 102 to the third storage unit 104 may be time consuming or impractical.
- the second storage unit 102 may be in a secure environment separated from the first storage unit 100 and with separate power to reduce the possibility of an outage affecting both the first storage unit 100 and the second storage unit 102 .
- Certain implementations of the invention create a three site (local, intermediate, remote) disaster recovery solution where there may be no data loss if the first storage unit 100 is lost.
- the first storage unit 100 is kept at the local site
- the second storage unit 102 is kept at the intermediate site
- the third storage unit 104 is kept at the remote site. Data copied on the second storage unit 102 or the third storage unit 104 may be used to recover from the loss of the first storage unit 100 .
- the first storage unit 100 and the second storage unit 102 may be at the same site.
- functions of a plurality of storage control units may be integrated into a single storage control unit, e.g., functions of the first storage unit 100 and the second storage unit 102 may be integrated into a single storage control unit.
- the first storage unit 100 is coupled to a host via data interface channel 112 . While only a single host 110 is shown coupled to the first storage unit 100 , in certain implementations of the invention, a plurality of hosts may be coupled to the first storage unit 100 .
- the host 110 may be any computational device known in the art, such as a personal computer, a workstation, a server, a mainframe, a hand held computer, a palm top computer, a telephony device, network appliance, etc.
- the host 110 may include any operating system (not shown) known in the art, such as the IBM OS/390* operating system.
- the host 110 may include at least one host application 114 that sends Input/Output (I/O) requests to the first storage unit 100 .
- the storage control units 100 , 102 , 104 are coupled to storage volumes such as local site storage volumes 116 , intermediate site storage volumes 118 , and remote site storage volumes 120 , respectively.
- the storage volumes 116 , 118 , 120 may be configured as a Direct Access Storage Device (DASD), one or more RAID ranks, just a bunch of disks (JBOD), or any other data repository system known in the art.
- the storage control units 100 , 102 , 104 may each include a cache, such as cache 122 , 124 , 126 , respectively.
- the caches 122 , 124 , 126 comprise volatile memory to store tracks.
- the storage control units 100 , 102 , 104 may each include a non-volatile storage (NVS), such as non-volatile storage 128 , 130 , 132 , respectively.
- the non-volatile storage 128 , 130 , 132 elements may buffer certain modified tracks in the caches 122 , 124 , 126 , respectively.
- the first storage unit 100 additionally includes an application, such as a local application 134 , for synchronous copying of data stored in the cache 122 , non-volatile storage 128 , and local site storage volumes 116 to another storage control unit, such as the second storage unit 102 .
- the local application 134 includes copy services functions that execute in the first storage unit 100 .
- the first storage unit 100 receives I/O requests from the host application 114 to read and write to the local site storage volumes 116 .
- the second storage unit 102 additionally includes an application such as a cascading PPRC application 136 .
- the cascading PPRC application 136 includes copy services functions that execute in the second storage unit 102 .
- the cascading PPRC application 136 can interact with the first storage unit 100 to receive data synchronously.
- the cascading PPRC application 136 can also send data asynchronously to the third storage unit 104 . Therefore, the cascading PPRC application 136 cascades a first pair of storage control units formed by the first storage unit 100 and the second storage unit 102 and the third storage unit 104 .
- additional storage control units may be cascaded.
- the third storage unit 104 additionally includes an application, such as a remote application 138 , that can receive data asynchronously from another storage control unit such as the second storage unit 102 .
- the remote application 138 includes copy services functions that execute in the third storage unit 104 .
- the second storage unit 102 also includes an out of synch (OOS) bitmap 140 .
- OOS bitmap 140 identifies those tracks having changed data on the intermediate site storage volumes 118 , said data having been changed as a result of the synchronous PPRC updates received from the first storage unit 100 .
- the OOS bitmap 140 can be used to identify those tracks associated with the intermediate site storage volumes 118 which have been updated directly by the host application 114 .
- the second storage unit 102 also includes a change recording (CR) bitmap 142 .
- the CR bitmap 142 is capable of being toggled, creating a preserved copy 144 of the CR bitmap at a point in time.
- the CR bitmap 142 identifies tracks associated with the intermediate site storage volumes 118 which contain changed or updated data.
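The patent does not spell out the mechanics of the toggle, so the sketch below assumes that toggling means freezing the current CR bitmap contents as the preserved copy 144 and continuing to record into an empty map. The function and key names are hypothetical.

```python
def toggle_cr_bitmap(unit):
    """Preserve the current CR bitmap contents and start recording afresh.

    Assumption: "toggling" the CR bitmap 142 freezes its current contents as
    the preserved copy 144 while a fresh, empty CR bitmap keeps recording.
    """
    unit["preserved_cr_144"] = frozenset(unit["cr_142"])
    unit["cr_142"] = set()

unit = {"cr_142": {11, 12}}    # tracks 11 and 12 currently hold changed data
toggle_cr_bitmap(unit)
assert unit["preserved_cr_144"] == {11, 12} and unit["cr_142"] == set()
```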
- FIG. 1 illustrates a computing environment where a host application 114 sends I/O requests to a first storage unit 100 .
- the first storage unit 100 synchronously copies data to the second storage unit 102
- the second storage unit 102 asynchronously copies data to the third storage unit 104 .
- FIG. 2 is a block diagram illustrating communications between the local application 134 , the cascading PPRC application 136 and the remote application 138 , in accordance with certain implementations of the invention.
- the local application 134 performs a synchronous data transfer, such as via synchronous PPRC 200 , to a synchronous copy process 202 that may be generated by the cascading PPRC application 136 .
- the synchronous data transfer 200 takes place over the data interface channel 106 .
- a background asynchronous copy process 204 that may be generated by the cascading PPRC application 136 performs an asynchronous data transfer, such as via Extended Distance PPRC 206 , to the remote application 138 .
- the asynchronous data transfer takes place over the data interface channel 108 .
- the intermediate site storage volumes 118 may include a copy of the local site storage volumes 116 .
- the distance between the first storage unit 100 and the second storage unit 102 is kept as close as possible to minimize the performance impact of synchronous PPRC.
- Data is copied asynchronously from the second storage unit 102 to the third storage unit 104 . As a result, the effect of long distance on the host response time is eliminated.
- FIG. 2 illustrates how the cascading PPRC application 136 on the second storage unit 102 receives data synchronously from the first storage unit 100 and transmits data asynchronously to the third storage unit 104 .
- FIG. 3 illustrates logic implemented in the local storage unit 100 in accordance with certain implementations of the invention. In certain implementations of the invention, the logic of FIG. 3 may be implemented in the local application 134 resident in the first storage unit 100 .
- Control starts at block 300 where the local application 134 receives a write request from the host application 114 .
- the local application 134 writes (at block 302 ) data corresponding to the write request on the cache 122 and the non-volatile storage 128 on the first storage unit 100 .
- Additional applications such as caching applications and non-volatile storage applications in the first storage unit 100 may manage the data in the cache 122 and the data in the non-volatile storage 128 and keep the data in the cache 122 and the non-volatile storage 128 consistent with the data in the local site storage volumes 116 .
- the local application 134 determines (at block 304 ) if the first storage unit 100 is a primary PPRC device, i.e., the first storage unit includes source data for a PPRC transaction. If so, the local application 134 sends (at block 306 ) the written data to the second storage unit 102 via a new write request. The local application 134 waits (at block 308 ) for a write complete acknowledgement from the second storage unit 102 . The local application 134 receives (at block 310 ) a write complete acknowledgement from the second storage unit 102 . Therefore, the local application 134 has transferred the data written by the host application 114 on the first storage unit 100 to the second storage unit 102 via a synchronous copy.
- the local application 134 signals (at block 312 ) to the host application 114 that the write request from the host application 114 has been completed at the first storage unit 100 .
- the local application 134 receives (at block 300 ) a next write request from the host application 114 .
- the local application 134 determines (at block 304 ) that the first storage unit 100 is not a primary PPRC device, i.e., the first storage unit is not a source device for a PPRC transaction, then the local application 134 does not have to send any data to the second storage unit 102 , and the local application 134 signals (at block 312 ) to the host application 114 that the write request from the host application 114 has been completed at the first storage unit 100 .
- FIG. 3 illustrates logic for receiving a write request from the host application 114 to the first storage unit 100 and synchronously copying the data corresponding to the write request from the first storage unit 100 to the second storage unit 102 .
- the host application 114 waits for the write request to be completed while the synchronous copying of the data takes place. Since the first storage unit 100 and the second storage unit 102 are within a synchronous communication distance of each other, the synchronous copying of data from the first storage unit 100 to the second storage unit 102 takes a smaller amount of time when compared to the situation where the first storage unit 100 is beyond a synchronous communication distance to the second storage unit 102 . Since the copy of the data on the second storage unit 102 is written synchronously, the second storage unit 102 includes an equivalent copy of the data on the first storage unit 100 .
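The control flow of blocks 300–312 can be summarized in a short Python sketch. This is an illustration of the described logic only, not IBM code; the Store class and handle_host_write function are hypothetical stand-ins.

```python
class Store:
    """Stand-in for a cache, a non-volatile store, or a peer storage unit."""
    def __init__(self):
        self.tracks = {}
    def write(self, track, data):
        self.tracks[track] = data
        return "write complete"

def handle_host_write(cache, nvs, second_unit, is_pprc_primary, track, data):
    # Block 302: stage the data in the cache and the non-volatile storage.
    cache.write(track, data)
    nvs.write(track, data)
    # Block 304: is the first storage unit the source of a PPRC pair?
    if is_pprc_primary:
        # Blocks 306-310: send the data synchronously to the second storage
        # unit and wait for its write-complete acknowledgement.
        ack = second_unit.write(track, data)
        assert ack == "write complete"
    # Block 312: only now is the write signalled as complete to the host.
    return "write complete"

assert handle_host_write(Store(), Store(), Store(), True, track=7, data=b"u") == "write complete"
```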
- FIG. 4 illustrates logic for receiving data synchronously as implemented in the second storage unit 102 in accordance with certain implementations of the invention.
- the cascading PPRC application 136 may perform the logic illustrated in FIG. 4 .
- Control starts at block 400 where the cascading PPRC application 136 receives a write request from the local application 134 .
- the write request sent at block 306 of FIG. 3 to the second storage unit 102 may be received by the cascading PPRC application 136 .
- the cascading PPRC application 136 writes (at block 402 ) data corresponding to the write request to the cache 124 and the non-volatile storage 130 .
- the second storage unit 102 may keep the cache 124 and the non-volatile storage 130 consistent with the intermediate storage volumes 118 .
- the cascading PPRC application 136 determines (at block 404 ) if data on the second storage unit 102 is to be cascaded, i.e., the data is set to be sent to the third storage unit 104 . If so, the synchronous copy process 202 of the cascading PPRC application 136 marks (at block 406 ) data as PPRC modified. The synchronous copy process 202 of the cascading PPRC application 136 signals (at block 408 ) a write complete acknowledgement to the local application 134 . The cascading PPRC application 136 receives (at block 400 ) the next write request from the local application 134 .
- the cascading PPRC application 136 determines (at block 404 ) that data on the second storage unit 102 does not have to be cascaded, then the synchronous copy process 202 of the cascading PPRC application 136 signals (at block 408 ) a write complete acknowledgement to the local application 134 and the cascading PPRC application 136 receives (at block 400 ) the next request from the local application 134 .
- FIG. 4 illustrates how the second storage unit 102 receives a write request from the first storage unit 100 where the write request responds to a host write request.
- the second storage unit 102 marks data corresponding to the host write request as PPRC modified.
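Correspondingly, the receive side of blocks 400–408 might look like the sketch below, which assumes that marking data as PPRC modified amounts to setting the track's bit in the OOS bitmap (consistent with the bitmap description above); the names are hypothetical.

```python
def receive_sync_write(cache, nvs, oos_bitmap, cascade, track, data):
    # Block 402: stage the mirrored data in cache and non-volatile storage.
    cache[track] = data
    nvs[track] = data
    # Blocks 404-406: if the data is to be cascaded to the third storage unit,
    # mark the track as PPRC modified (modelled here as a bit in the OOS bitmap).
    if cascade:
        oos_bitmap.add(track)
    # Block 408: acknowledge the write so the first storage unit can, in turn,
    # signal write complete to the host application.
    return "write complete"

oos = set()
assert receive_sync_write({}, {}, oos, cascade=True, track=7, data=b"u") == "write complete"
assert 7 in oos
```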
- FIG. 5 illustrates logic for copying data asynchronously as implemented in the second storage unit 102 in accordance with certain implementations of the invention.
- the logic illustrated in FIG. 5 may be performed by the background asynchronous copy process 204 of the cascading PPRC application 136 .
- Control starts at block 500 where the background asynchronous copy process 204 of the cascading PPRC application 136 determines the PPRC modified data stored in the cache 124 , non-volatile storage 130 , and the intermediate site storage volumes 118 of the second storage unit 102 .
- the background asynchronous copy process 204 of the cascading PPRC application 136 sends (at block 502 ) the PPRC modified data to the third storage unit 104 asynchronously, i.e., the background asynchronous copy process 204 keeps sending the PPRC modified data stored in the cache 124 , non-volatile storage 130 , and the intermediate site storage volumes 118 of the second storage unit 102 to the third storage unit 104 in the background.
- the background asynchronous copy process 204 determines (at block 504 ) if the write complete acknowledgement has been received from the third storage unit 104 . If not, the background asynchronous copy process 204 again determines (at block 504 ) if the write complete acknowledgement has been received.
- the background asynchronous copy process 204 determines (at block 504 ) that write complete acknowledgement has been received from the third storage unit 104 then the background asynchronous copy process 204 determines (at block 500 ) the PPRC modified data once again.
- the logic of FIG. 5 illustrates how the background asynchronous copy process 204 , while executing in the background, copies data asynchronously from the second storage unit 102 to the third storage unit 104 . Since the copying is asynchronous, the second storage unit 102 and the third storage unit 104 may be separated by long distances, such as the extended distances allowed by Extended Distance PPRC.
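The drain loop of blocks 500–504 can be sketched as below. This is a single-threaded illustration under the assumption that a track's OOS bit is cleared only after its write-complete acknowledgement; the function names are hypothetical.

```python
import time

def background_async_copy(oos_bitmap, read_track, send_to_third, keep_running):
    # Block 500: determine the PPRC-modified tracks; block 502: send each one
    # to the third storage unit; block 504: clear a track's bit only after the
    # write-complete acknowledgement for that track has been received.
    while keep_running():
        if not oos_bitmap:
            time.sleep(0.1)          # nothing to drain right now; poll again
            continue
        for track in sorted(oos_bitmap):
            if send_to_third(track, read_track(track)) == "write complete":
                oos_bitmap.discard(track)

# Hypothetical single-shot usage: drain two modified tracks into a dict.
third, data, oos = {}, {3: b"a", 9: b"b"}, {3, 9}

def send_to_third(track, payload):
    third[track] = payload
    return "write complete"

background_async_copy(oos, data.get, send_to_third, keep_running=lambda: bool(oos))
assert third == {3: b"a", 9: b"b"} and not oos
```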
- in the event of an outage at the first storage unit 100 , the background asynchronous copy process 204 may quickly complete the copy of all remaining modified data to the third storage unit 104 .
- in that case, the remote site storage volumes 120 will include an equivalent copy of all updates up to the time of the outage. If there are multiple failures such that both the first storage unit 100 and the second storage unit 102 are lost, then there may be data loss at the remote site.
- the data on the third storage unit 104 may not be equivalent to the data on the first storage unit 100 unless all of the data from the second storage unit 102 has been copied up to some point in time.
- certain implementations of the invention may force the data at the third storage unit to contain all dependent updates up to some specified time.
- the consistent copy at the third storage unit 104 may be preserved via a point in time copy, such as FlashCopy*.
- One method may include quiescing the host I/O temporarily at the local site while the third storage unit 104 catches up with the updates. Another method may prevent writes to the second storage unit 102 while the third storage unit 104 catches up with the updates.
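Either approach amounts to the same "hold writes, drain, snapshot" sequence. The sketch below uses hypothetical callbacks (quiesce_host, drain_one_track, flashcopy_third_unit); the patent does not prescribe this exact structure.

```python
def create_consistent_remote_copy(quiesce_host, resume_host, oos_bitmap,
                                  drain_one_track, flashcopy_third_unit):
    # Hold new writes so no further updates enter the OOS bitmap, let the third
    # storage unit catch up with every outstanding update, then preserve the
    # now-consistent remote copy with a point-in-time (FlashCopy-style) snapshot.
    quiesce_host()
    try:
        while oos_bitmap:
            drain_one_track()
        flashcopy_third_unit()
    finally:
        resume_host()

oos = {1, 2}
create_consistent_remote_copy(quiesce_host=lambda: None, resume_host=lambda: None,
                              oos_bitmap=oos, drain_one_track=lambda: oos.pop(),
                              flashcopy_third_unit=lambda: None)
assert not oos
```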
- the implementations create a long distance disaster recovery solution by first copying synchronously from a first storage unit to a second storage unit and subsequently copying asynchronously from the second storage unit to a third storage unit.
- the distance between the first storage unit and the second storage unit may be small enough such that copying data synchronously does not cause a significant performance impact on applications that perform I/O operations on the first storage unit.
- in the event of an outage at the first storage unit 100 , the data can be recovered from replicated copies of the data on either the second storage unit 102 or the third storage unit 104 .
- the described techniques may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof.
- the term “article of manufacture” as used herein refers to code or logic implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or a computer readable medium, such as magnetic storage media (e.g., hard disk drives, floppy disks, tape), optical storage (e.g., CD-ROMs, optical disks, etc.), and volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.).
- Code in the computer readable medium is accessed and executed by a processor.
- the code in which implementations are made may further be accessible through a transmission media or from a file server over a network.
- the article of manufacture in which the code is implemented may comprise a transmission media such as network transmission line, wireless transmission media, signals propagating through space, radio waves, infrared signals, etc.
- the data transfer between the first storage unit 100 and the second storage unit 102 may be via Extended Distance PPRC. However, there may be data loss if there is an outage at the first storage unit 100 . Additionally, in alternative implementations of the invention the data transfer between the second storage unit 102 and the third storage unit 104 may be via synchronous PPRC. However, there may be performance impacts on the I/O from the host 110 to the first storage unit 100 .
- the functions of the first storage unit 100 and the second storage unit 102 may be implemented in a single storage control unit.
- a fourth storage control unit may be coupled to the third storage unit 104 and data may be transferred from the third storage unit 104 to the fourth storage control unit.
- a chain of synchronous data transfers and a chain of asynchronous data transfers may take place among a plurality of cascaded storage control units.
- the storage control units may be any storage unit known in the art.
- FIGS. 3 , 4 , and 5 describe specific operations occurring in a particular order. Further, the operations may be performed in parallel as well as sequentially. In alternative implementations, certain of the logic operations may be performed in a different order, modified, or removed and still conform to implementations of the present invention. Moreover, steps may be added to the above described logic and still conform to the implementations. Yet further, steps may be performed by a single process or by distributed processes.
- A generalized illustration of a method for recovery from a failure associated with the first storage unit 100 is shown in FIG. 6 .
- when the local (or primary) site fails, the balance of the data storage system is initially unaware of the failure.
- High availability cluster multi-processing (HACMP) or other management software detects the loss of the first storage unit 100 (step 600 ).
- the Extended Distance PPRC relationship causing the asynchronous mirroring of data from the second storage unit 102 to the third storage unit 104 is intact and operational.
- the PPRC relationship between the second storage unit 102 and the third storage unit 104 is accomplished as is shown in FIG. 7 .
- the identity of data tracks associated with the second storage unit 102 which have been modified by the synchronous mirroring of data prior to the failure of the first storage unit 100 is reflected in an out of synch (OOS) bitmap 140 (step 702 ). Continuously, data on tracks identified by the OOS bitmap 140 is asynchronously mirrored to the third storage unit (step 704 ). The OOS bitmap 140 will be employed in the recovery method in a new function to allow recovery of the first storage unit 100 without a full volume copy.
- upon detection of the failure associated with the first storage unit, the recovery program issues a command to the second storage unit 102 , which can be a FAILOVER command, suspending the synchronous mirroring of data from the first storage unit 100 to the second storage unit 102 (step 602 ).
- the host application 114 is directed to write data updates directly to the second storage unit 102 (step 604 ). These updates written from the host 110 to the second storage unit 102 are reflected in the existing OOS bitmap 140 .
- changes to tracks associated with the second storage unit 102 are also reflected in a change recording (CR) bitmap 142 ( FIG. 8 , step 802 ), set up as a result of the FAILOVER command.
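Steps 602–604 (and step 802) can be sketched as follows. The dictionary-based model and function names are hypothetical; the point is only that, after FAILOVER, each host write to the second storage unit is recorded in both the OOS and CR bitmaps.

```python
def failover_to_second_unit(unit):
    # Step 602: suspend the synchronous mirroring from the first storage unit;
    # the FAILOVER command also establishes a change recording (CR) bitmap.
    unit["sync_from_first"] = False
    unit["cr_bitmap"] = set()

def host_write_after_failover(unit, track, data):
    # Step 604 and step 802: the host writes directly to the second storage
    # unit; the track is recorded both in the existing OOS bitmap (still
    # feeding the asynchronous copy to the third unit) and in the CR bitmap.
    unit["volume"][track] = data
    unit["oos_bitmap"].add(track)
    unit["cr_bitmap"].add(track)

second_unit = {"volume": {}, "oos_bitmap": set(), "sync_from_first": True}
failover_to_second_unit(second_unit)
host_write_after_failover(second_unit, track=5, data=b"update")
assert 5 in second_unit["oos_bitmap"] and 5 in second_unit["cr_bitmap"]
```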
- the failure associated with the first storage unit 100 can be corrected.
- writes to the second storage unit 102 assure that minimal or no data is lost and normal operations can continue.
- prior to the time host I/O operations to the first storage unit 100 are resumed, the data stored on the first storage unit 100 must be synchronized with the data stored on the second storage unit 102 .
- a preferred method of accomplishing this synchronization, which avoids host I/O interruption or a full volume data copy, is to use the OOS bitmap 140 , the CR bitmap 142 and a reverse PPRC synchronous mirroring operation to synchronize the first storage unit 100 .
- because the OOS bitmap 140 is necessary to the asynchronous mirroring of data updates from the second storage unit 102 to the third storage unit 104 (steps 702 , 704 ), it is necessary to initially suspend the asynchronous mirroring of data updates from the second storage unit 102 to the third storage unit 104 (step 606 ) prior to synchronizing the first storage unit 100 . Then, updates stored on the second storage unit 102 can be synchronously mirrored to the first storage unit 100 (step 608 ).
- the resynchronization of the first storage unit is a two-step process. First, changed data written by the host application 114 to the second storage unit 102 while the first storage unit 100 was not operational is copied to the first storage unit 100 . This changed data is stored on tracks associated with the second storage unit 102 and reflected in the OOS bitmap associated with the second storage unit 102 . During resynchronization, a first pass is made through the OOS bitmap and updates are copied to the first storage unit 100 .
- during this first pass, host writes are not sent to the first storage unit 100 , but are recorded in the OOS bitmap associated with the second storage unit 102 . Then, a second pass is made through the OOS bitmap. During the second pass, host writes are sent synchronously from the second storage unit 102 to the first storage unit 100 .
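A minimal sketch of the two-pass resynchronization, assuming the OOS bitmap behaves like a set of track numbers; the function names are hypothetical and the concurrency of real host writes is omitted.

```python
def resynchronize_first_unit(oos_bitmap, read_track, copy_to_first):
    # First pass: copy every track recorded in the OOS bitmap (the changes made
    # while the first unit was down) back to the first storage unit.  Host
    # writes arriving during this pass are only recorded in the bitmap.
    for track in sorted(oos_bitmap):
        copy_to_first(track, read_track(track))
        oos_bitmap.discard(track)
    # Second pass: copy the tracks recorded while the first pass was running;
    # from this point on, new host writes are mirrored synchronously to the
    # first storage unit instead of merely being recorded.
    for track in sorted(oos_bitmap):
        copy_to_first(track, read_track(track))
        oos_bitmap.discard(track)

first, data, oos = {}, {2: b"x", 6: b"y"}, {2, 6}

def copy_to_first(track, payload):
    first[track] = payload

resynchronize_first_unit(oos, data.get, copy_to_first)
assert first == {2: b"x", 6: b"y"} and not oos
```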
- a method which avoids terminating host I/O operations during the bitmap manipulations necessary to resynchronize the first storage unit 100 is illustrated in FIG. 9 .
- host I/O operations are writing changed data to tracks associated with the second storage unit 102 .
- the identity of the tracks with updated data stored therein is reflected in both the OOS bitmap 140 and the CR bitmap 142 (step 902 ) at the second storage unit 102 .
- the recovery program can issue a command to the second storage unit 102 which causes a swap of the contents of the CR and OOS bitmaps 142 , 140 .
- This command can be a FAILBACK command.
- the second storage unit 102 is marked with a special indicator “primed for resync” in preparation for the resynchronization between the second and third units 102 , 104 at the end of the process.
- the resynchronization between the second and first storage units 102 , 100 is started using the new swapped contents of the OOS bitmap 140 associated with the second storage unit 102 to determine which tracks to send. Throughout this process, host writes continue to be recorded in both the OOS bitmap 140 and the CR bitmap 142 .
- at this point, the OOS bitmap 140 records differences between the second and first storage units 102 , 100 , while the CR bitmap 142 records differences between the second and third storage units 102 , 104 .
- changed data on tracks reflected by the OOS bitmap 140 can be copied from the second storage unit 102 to the first storage unit 100 (step 912 ) while new updates are synchronously mirrored from the second storage unit 102 to the first storage unit 100 until a full duplex between the storage units is reached (step 914 ).
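The FAILBACK bitmap swap and the continued double recording of host writes might be modelled as below; the dictionary keys and the starting bitmap contents are hypothetical.

```python
def failback_swap_bitmaps(unit):
    # The FAILBACK command swaps the contents of the CR and OOS bitmaps and
    # marks the unit "primed for resync".  Afterwards the OOS bitmap describes
    # differences between the second and first units (driving the reverse
    # synchronization), while the CR bitmap accumulates differences between
    # the second and third units.
    unit["oos_bitmap"], unit["cr_bitmap"] = unit["cr_bitmap"], unit["oos_bitmap"]
    unit["primed_for_resync"] = True

def host_write_during_resync(unit, track, data):
    # Steps 902/910: host writes keep being recorded in both bitmaps.
    unit["volume"][track] = data
    unit["oos_bitmap"].add(track)
    unit["cr_bitmap"].add(track)

unit = {"volume": {}, "oos_bitmap": {1, 2}, "cr_bitmap": {2, 3}}  # hypothetical contents
failback_swap_bitmaps(unit)
assert unit["oos_bitmap"] == {2, 3} and unit["cr_bitmap"] == {1, 2}
host_write_during_resync(unit, track=9, data=b"u")
assert 9 in unit["oos_bitmap"] and 9 in unit["cr_bitmap"]
```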
- host I/O operations must be quiesced to allow the PPRC relationship between the first storage unit 100 and the second storage unit 102 to be reestablished and to allow host I/O to be swapped back to the first storage unit 100 (step 610 ).
- the asynchronous PPRC relationship between the second storage unit 102 and the third storage unit 104 may be reestablished (step 616 ).
- reestablishment of the asynchronous mirroring relationship from the second storage unit 102 to the third storage unit 104 occurs as is shown in FIG. 10 (step 1007 ).
- the recovery program is triggered to compare the CR bitmap 142 to the OOS bitmap 140 by the “primed for resync” indicator set by the FAILBACK command used to start the resynchronization of the first storage unit 100 and the second storage unit 102 .
- Both bitmaps are associated with the second storage unit 102 , and have been tracking data changes written synchronously to the second storage unit 102 since the recommencement of host I/O operations to the first storage unit 100 .
- the recovery program must add the identity of tracks containing changed data as identified by the CR bitmap 142 but not identified by the OOS bitmap 140 to the OOS bitmap 140 (step 1008 ). Then, changed data as identified by the OOS bitmap 140 can be mirrored from the second storage unit 102 to the third storage unit 104 (step 1010 ).
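Steps 1008–1010 reduce to a bitwise OR of the two maps followed by a drain to the third storage unit, sketched below with hypothetical names.

```python
def resync_third_unit(unit, read_track, send_to_third):
    # Step 1008: add every track identified by the CR bitmap but not yet in the
    # OOS bitmap to the OOS bitmap (effectively a bitwise OR of the two maps).
    unit["oos_bitmap"] |= unit["cr_bitmap"]
    # Step 1010: mirror every track identified by the OOS bitmap from the
    # second storage unit to the third storage unit.
    for track in sorted(unit["oos_bitmap"]):
        send_to_third(track, read_track(track))
        unit["oos_bitmap"].discard(track)

third = {}
unit = {"oos_bitmap": {4}, "cr_bitmap": {4, 8}}
data = {4: b"x", 8: b"y"}

def send_to_third(track, payload):
    third[track] = payload

resync_third_unit(unit, data.get, send_to_third)
assert third == {4: b"x", 8: b"y"} and not unit["oos_bitmap"]
```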
- the described techniques for recovery from a failure in a cascading PPRC system may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques, as described above.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method of recovery from a data storage system failure in a data storage system having a host computer writing data to a first storage unit with a first storage controller synchronously mirroring the data to a second storage unit, and with a second storage controller asynchronously mirroring the data to a third storage unit. The method begins with the detection of a failure associated with the first storage unit. Upon detection of the error or failure associated with the first storage unit, the synchronous data mirroring relationship between the first storage unit and the second storage unit is terminated and the host is directed to write data updates directly to the second storage unit. Upon correction of the failure associated with the first storage unit, the asynchronous mirroring of data updates from the second storage unit to the third storage unit is suspended and synchronous mirroring of the data updates in a reverse direction, from the second storage unit to the first storage unit, is commenced. When a full duplex state is reached between the first storage unit and the second storage unit, the synchronous PPRC relationship with the first storage volume mirroring data to the second storage volume may be reestablished and host I/O writes to the first storage unit may be resumed.
Description
This application incorporates by reference commonly-assigned and co-pending U.S. patent Ser. No. 10/464,024, filed Jun. 6, 2003, and entitled METHOD, SYSTEM AND ARTICLE OF MANUFACTURE FOR REMOTE COPYING OF DATA. This application also incorporates by reference commonly-assigned and co-pending Ser. No. 10/674,289 entitled METHOD, SYSTEM, AND PROGRAM FOR RECOVERY FROM A FAILURE IN AN ASYNCHRONOUS DATA COPYING SYSTEM; Ser. No. 10/675,289 entitled APPARATUS AND METHOD TO COORDINATE MULTIPLE DATA STORAGE AND RETREIVAL STORAGE SYSTEMS; Ser. No. 10/676,852 entitled METHOD, SYSTEM AND PROGRAM FOR FORMING A CONSISTENCY GROUP; Ser. No. 10/674,900 entitled AUTONOMIC INFRASTRUCTURE ENABLEMENT FOR POINT IN TIME COPY CONSISTENCY GROUPS; Ser. No. 10/674,845 entitled METHOD, SYSTEM, AND PROGRAM FOR MIRRORING DATA AMONG STORAGE SITES; and Ser. No. 10/675,317 entitled METHOD, SYSTEM AND PROGRAM FOR ASYNCHRONOUS COPY, all filed on Sep. 29, 2003.
1. Technical Field
The present invention relates to a method, system, and article of manufacture for recovery from a failure of a storage unit in a cascading PPRC system.
2. Background Art
Information technology systems, including storage systems, may need protection from site disasters or outages, where outages may be planned or unplanned. Furthermore, information technology systems may require features for data migration, data backup, or data duplication. Implementations for disaster or outage recovery, data migration, data backup, and data duplication may include mirroring or copying of data in storage systems. Such mirroring or copying of data may involve interactions among hosts, storage systems and connecting networking components of the information technology system.
An enterprise storage server (ESS), such as the IBM* TotalStorage Enterprise Storage Server*, may be a disk storage server that includes one or more processors coupled to storage devices, including high capacity scalable storage devices, Redundant Array of Independent Disks (RAID), etc. The enterprise storage servers are connected to a network and include features for copying data in storage systems.
Peer-to-Peer Remote Copy (PPRC) is an ESS function that allows the shadowing of application system data from a first site to a second site. The first site may be referred to as an application site, a local site, or a primary site. The second site may be referred to as a recovery site, a remote site or a secondary site. The logical volumes that hold the data in the ESS at the local site are called local volumes, and the corresponding logical volumes that hold the mirrored data at the remote site are called remote volumes. High speed links, such as ESCON links may connect the local and remote ESS systems.
In the synchronous type of operation for PPRC, i.e., synchronous PPRC, the updates done by a host application to the local volumes at the local site are synchronously shadowed onto the remote volumes at the remote site. As synchronous PPRC is a synchronous copying solution, write updates are ensured on both copies (local and remote) before the write is considered to be completed for the host application. In synchronous PPRC the host application does not get the “write complete” condition until the update is synchronously done in both the local and the remote volumes. Therefore, from the perspective of the host application the data at the remote volumes at the remote site is equivalent to the data at the local volumes at the local site.
Synchronous PPRC increases the response time as compared to asynchronous copy operation, and this is inherent to the synchronous operation. The overhead comes from the additional steps that are executed before the write operation is signaled as completed to the host application. Also, the PPRC activity between the local site and the remote site will be comprised of signals and data that travel through the links that connect the sites, and the overhead response time of the host application write operations will increase proportionally with the distance between the sites. Therefore, the distance affects a host application's response time. In certain implementations, there may be a maximum supported distance for synchronous PPRC operations referred to as the synchronous communication distance.
In the Extended Distance PPRC (also referred to as PPRC Extended Distance) method of operation, PPRC mirrors the updates of the local volume onto the remote volumes in an asynchronous manner, while the host application is running. In Extended Distance PPRC, the host application receives a write complete response before the update is copied from the local volumes to the remote volumes. In this way, when in Extended Distance PPRC, a host application's write operations are free of the typical synchronous overheads. Therefore, Extended Distance PPRC is suitable for remote copy solutions at very long distances with minimal impact on host applications. There is no overhead penalty upon the host application's write such as in synchronous PPRC. However, Extended Distance PPRC does not continuously maintain an equivalent copy of the local data at the remote site.
Further details of the PPRC are described in the IBM publication “IBM TotalStorage Enterprise Storage Server: PPRC Extended Distance,” IBM document number SG24-6568-00 (Copyright IBM, 2002), which publication is incorporated herein by reference in its entirety.
Additional flexibility and safety in data storage can be achieved by combining synchronous PPRC and asynchronous Extended Distance PPRC elements in a single data storage system. One such system is disclosed in co-pending and commonly assigned U.S. patent application Ser. No. 10/464,024, filed Jun. 17, 2003 entitled, “Method, System, and Article of Manufacture for Remote Copying of Data” which application is incorporated herein by reference in its entirety. The cascading data storage system described in U.S. patent application Ser. No. 10/464,024 features a first storage unit receiving data from the I/O operations of a host computer. A first storage controller is associated with the first storage unit which synchronously mirrors the data to a second storage unit associated with a second storage controller, which in turn asynchronously mirrors the data to a third storage unit. Typically, the first, second and third storage units are maintained at separate locations. It is common for the first storage unit to be maintained at the main application site. The second storage unit is often maintained at a bunker site near enough to the first storage unit to maintain an efficient synchronous PPRC relationship, but separated and protected from the first storage unit in order to decrease the chance that the first and second storage units would both be destroyed in a common disaster. The third storage unit can be located at any distance from the second storage unit.
As is discussed in U.S. application Ser. No. 10/464,024, return to full operation at the first storage unit after a failure can be accomplished by performing a full copy of all volumes maintained on the second or third storage units to the first storage unit. Unfortunately, a full volume copy may take hours depending upon the amount of data stored in the respective storage units. Therefore, a need exists in the art for a recovery method and apparatus that avoids the need for full copies of volumes to restore the configuration back to normal operation.
The present invention is directed to overcoming one or more of the problems discussed above.
The need in the art is addressed by a method of recovery from a data storage system failure in a data storage system having a host computer writing data to a first storage unit with a first storage controller synchronously mirroring the data to a second storage unit, and with a second storage controller asynchronously mirroring the data to a third storage unit. The method is triggered by the detection of a failure associated with the first storage unit. Upon detection of the error or failure associated with the first storage unit, the synchronous PPRC relationship between the first storage unit and the second storage unit is terminated and the host is directed to write data updates directly to the second storage unit. During the time period when the host begins writing updates to the second storage unit, the asynchronous PPRC relationship between the second storage unit and the third storage unit is maintained. Upon correction of the failure associated with the first storage unit, the asynchronous mirroring of data updates from the second storage unit to the third storage unit is suspended and synchronous mirroring of the data updates in a reverse direction, from the second storage unit to the first storage unit, is commenced. When a full duplex state is reached between the first storage unit and the second storage unit, host I/O operations can be quiesced. Subsequently, the synchronous PPRC relationship with the first storage volume mirroring data to the second storage unit may be reestablished and host I/O writes to the first storage unit may be resumed. Finally, the asynchronous PPRC relationship between the second storage unit and the third storage unit is reestablished and the data stored on the third storage volume is brought current with that maintained on the second storage volume.
Preferably, the data storage tracks associated with the second storage unit which contain mirrored data updates from the synchronous PPRC relationship with the first storage unit are identified by a first map, more specifically an out of synch (OOS) bitmap which represents updates that must be sent from the second to the third storage unit. Similarly, the data storage tracks associated with the second storage unit which contain data updates received when host I/O operations are writing data directly to the second storage volume, can be identified with a second map, specifically a change recording (CR) bitmap. The information contained in the OOS bitmap and the CR bitmap or copies thereof may be manipulated to allow recovery of the first storage volume, resynchronization of the third storage volume and return to normal operations after a failure associated with the first storage volume without the need for a full volume copy. In addition, host application downtime is minimized.
In the following description, reference is made to the accompanying drawings which form a part hereof and which illustrate several implementations. It is understood that other implementations may be utilized and structural and operational changes may be made without departing from the scope of the present invention.
The three storage control units 100, 102, 104 may be at three different sites with the first storage unit 100 and the second storage unit 102 being within a synchronous communication distance of each other. The synchronous communication distance between two storage control units is the distance up to which synchronous communication is feasible between the two storage control units. The third storage unit 104 may be a long distance away from the second storage unit 102 and the first storage unit 100, such that synchronous copying of data from the second storage unit 102 to the third storage unit 104 may be time consuming or impractical. Additionally, the second storage unit 102 may be in a secure environment separated from the first storage unit 100 and with separate power to reduce the possibility of an outage affecting both the first storage unit 100 and the second storage unit 102. Certain implementations of the invention create a three site (local, intermediate, remote) disaster recovery solution where there may be no data loss if the first storage unit 100 is lost. In the three site disaster recovery solution, the first storage unit 100 is kept at the local site, the second storage unit 102 is kept at the intermediate site, and the third storage unit 104 is kept at the remote site. Data copied on the second storage unit 102 or the third storage unit 104 may be used to recover from the loss of the first storage unit 100. In certain alternative implementations, there may be less than three sites. For example, the first storage unit 100 and the second storage unit 102 may be at the same site. In additional alternative implementations of the invention, there may be more than three storage control units distributed among three or more sites. Furthermore, functions of a plurality of storage control units may be integrated into a single storage control unit, e.g., functions of the first storage unit 100 and the second storage unit 102 may be integrated into a single storage control unit.
The first storage unit 100 is coupled to a host 110 via a data interface channel 112. While only a single host 110 is shown coupled to the first storage unit 100, in certain implementations of the invention, a plurality of hosts may be coupled to the first storage unit 100. The host 110 may be any computational device known in the art, such as a personal computer, a workstation, a server, a mainframe, a hand held computer, a palm top computer, a telephony device, a network appliance, etc. The host 110 may include any operating system (not shown) known in the art, such as the IBM OS/390* operating system. The host 110 may include at least one host application 114 that sends Input/Output (I/O) requests to the first storage unit 100.
The storage control units 100, 102, 104 are coupled to storage volumes such as local site storage volumes 116, intermediate site storage volumes 118, and remote site storage volumes 120, respectively. The storage volumes 116, 118, 120 may be configured as a Direct Access Storage Device (DASD), one or more RAID ranks, just a bunch of disks (JBOD), or any other data repository system known in the art.
The storage control units 100, 102, 104 may each include a cache, such as cache 122, 124, 126, respectively. The caches 122, 124, 126 comprise volatile memory to store tracks. The storage control units 100, 102, 104 may each include a non-volatile storage (NVS), such as non-volatile storage 128, 130, 132, respectively. The non-volatile storage 128, 130, 132 elements may buffer certain modified tracks in the caches 122, 124, 126, respectively.
The first storage unit 100 additionally includes an application, such as a local application 134, for synchronous copying of data stored in the cache 122, non-volatile storage 128, and local site storage volumes 116 to another storage control unit, such as the second storage unit 102. The local application 134 includes copy services functions that execute in the first storage unit 100. The first storage unit 100 receives I/O requests from the host application 114 to read and write to the local site storage volumes 116.
The second storage unit 102 additionally includes an application, such as a cascading PPRC application 136. The cascading PPRC application 136 includes copy services functions that execute in the second storage unit 102. The cascading PPRC application 136 can interact with the first storage unit 100 to receive data synchronously. The cascading PPRC application 136 can also send data asynchronously to the third storage unit 104. Therefore, the cascading PPRC application 136 cascades a first pair of storage control units, formed by the first storage unit 100 and the second storage unit 102, with a second pair of storage control units, formed by the second storage unit 102 and the third storage unit 104. In alternative implementations of the invention, additional storage control units may be cascaded.
The third storage unit 104 additionally includes an application, such as a remote application 138, that can receive data asynchronously from another storage control unit such as the second storage unit 102. The remote application 138 includes copy services functions that execute in the third storage unit 104.
The second storage unit 102 also includes an out of synch (OOS) bitmap 140. The OOS bitmap 140 identifies those tracks having changed data on the intermediate site storage volumes 118, said data having been changed as a result of the synchronous PPRC updates received from the first storage unit 100. In addition, the OOS bitmap 140, as will be discussed in detail below, can be used to identify those tracks associated with the intermediate site storage volumes 118 which have been updated directly by the host application 114. The second storage unit 102 also includes a change recording (CR) bitmap 142. The CR bitmap 142 is capable of being toggled, creating a preserved copy 144 of the CR bitmap at a point in time. Like the OOS bitmap 140, the CR bitmap 142 identifies tracks associated with the intermediate site storage volumes 118 which contain changed or updated data.
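Conceptually, each of these maps is a per-volume bit array with one bit per data track. The sketch below is a minimal illustration in Python, assuming a fixed number of tracks per volume; the TrackBitmap class and its method names are hypothetical and do not describe the storage controller's internal data structures.

```python
class TrackBitmap:
    """One bit per track; a set bit means the track holds changed data."""

    def __init__(self, tracks: int):
        self.bits = bytearray((tracks + 7) // 8)

    def mark(self, track: int) -> None:
        self.bits[track // 8] |= 1 << (track % 8)

    def clear(self, track: int) -> None:
        self.bits[track // 8] &= ~(1 << (track % 8)) & 0xFF

    def is_set(self, track: int) -> bool:
        return bool(self.bits[track // 8] & (1 << (track % 8)))

    def preserved_copy(self) -> "TrackBitmap":
        # Analogous to toggling the CR bitmap to preserve a copy at a point in time.
        duplicate = TrackBitmap(len(self.bits) * 8)
        duplicate.bits[:] = self.bits
        return duplicate


# A host-driven update to track 42 is recorded in both maps.
oos, cr = TrackBitmap(1024), TrackBitmap(1024)
for bitmap in (oos, cr):
    bitmap.mark(42)
assert oos.is_set(42) and cr.is_set(42)
```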
Therefore, FIG. 1 illustrates a computing environment where a host application 114 sends I/O requests to a first storage unit 100. The first storage unit 100 synchronously copies data to the second storage unit 102, and the second storage unit 102 asynchronously copies data to the third storage unit 104.
The local application 134 performs a synchronous data transfer, such as via synchronous PPRC 200, to a synchronous copy process 202 that may be generated by the cascading PPRC application 136. The synchronous data transfer 200 takes place over the data interface channel 106.
A background asynchronous copy process 204 that may be generated by the cascading PPRC application 136 performs an asynchronous data transfer, such as via Extended Distance PPRC 206, to the remote application 138. The asynchronous data transfer takes place over the data interface channel 108.
Since data from the first storage unit 100 are copied synchronously to the second storage unit 102, the intermediate site storage volumes 118 may include a copy of the local site storage volumes 116. In certain implementations of the invention the distance between the first storage unit 100 and the second storage unit 102 is kept as close as possible to minimize the performance impact of synchronous PPRC. Data is copied asynchronously from the second storage unit 102 to the third storage unit 104. As a result, the effect of long distance on the host response time is eliminated.
Therefore, FIG. 2 illustrates how the cascading PPRC application 136 on the second storage unit 102 receives data synchronously from the first storage unit 100 and transmits data asynchronously to the third storage unit 104.
Control starts at block 300 where the local application 134 receives a write request from the host application 114. The local application 134 writes (at block 302) data corresponding to the write request on the cache 122 and the non-volatile storage 128 on the first storage unit 100. Additional applications (not shown) such as caching applications and non-volatile storage applications in the first storage unit 100 may manage the data in the cache 122 and the data in the non-volatile storage 128 and keep the data in the cache 122 and the non-volatile storage 128 consistent with the data in the local site storage volumes 116.
The local application 134 determines (at block 304) if the first storage unit 100 is a primary PPRC device, i.e., the first storage unit includes source data for a PPRC transaction. If so, the local application 134 sends (at block 306) the written data to the second storage unit 102 via a new write request. The local application 134 waits (at block 308) for a write complete acknowledgement from the second storage unit 102. The local application 134 receives (at block 310) a write complete acknowledgement from the second storage unit 102. Therefore, the local application 134 has transferred the data written by the host application 114 on the first storage unit 100 to the second storage unit 102 via a synchronous copy.
The local application 134 signals (at block 312) to the host application 114 that the write request from the host application 114 has been completed at the first storage unit 100. The local application 134 receives (at block 300) a next write request from the host application 114.
If the local application 134 determines (at block 304) that the first storage unit 100 is not a primary PPRC device, i.e., the first storage unit is not a source device for a PPRC transaction, then the local application 134 does not have to send any data to the second storage unit 102, and the local application 134 signals (at block 312) to the host application 114 that the write request from the host application 114 has been completed at the first storage unit 100.
Therefore, FIG. 3 illustrates the logic for receiving a write request from the host application 114 at the first storage unit 100 and synchronously copying the data corresponding to the write request from the first storage unit 100 to the second storage unit 102. The host application 114 waits for the write request to be completed while the synchronous copying of the data takes place. Since the first storage unit 100 and the second storage unit 102 are within a synchronous communication distance of each other, the synchronous copying of data from the first storage unit 100 to the second storage unit 102 takes less time than it would if the first storage unit 100 were beyond a synchronous communication distance from the second storage unit 102. Since the copy of the data on the second storage unit 102 is written synchronously, the second storage unit 102 includes an equivalent copy of the data on the first storage unit 100.
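A minimal Python sketch of this write path follows, assuming a toy Unit class with a dictionary of tracks; the class, its fields, and handle_host_write are illustrative names only, and the synchronous wait for the secondary acknowledgement is represented simply by the order of the calls.

```python
class Unit:
    """A toy stand-in for a storage control unit's cache, NVS, and volumes."""

    def __init__(self, pprc_primary: bool = False):
        self.tracks = {}
        self.pprc_primary = pprc_primary

    def write(self, track: int, data: bytes) -> None:
        self.tracks[track] = data


def handle_host_write(first: Unit, second: Unit, track: int, data: bytes) -> str:
    first.write(track, data)        # blocks 300-302: write to cache and NVS on the first unit
    if first.pprc_primary:          # block 304: is the first unit a PPRC primary device?
        second.write(track, data)   # blocks 306-310: synchronous copy; wait for the acknowledgement
    return "write complete"         # block 312: signal completion to the host application


first, second = Unit(pprc_primary=True), Unit()
assert handle_host_write(first, second, track=7, data=b"payload") == "write complete"
assert second.tracks[7] == b"payload"   # the second unit now holds an equivalent copy
```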
Control starts at block 400 where the cascading PPRC application 136 receives a write request from the local application 134. For example, the write request sent at block 306 of FIG. 3 to the second storage unit 102 may be received by the cascading PPRC application 136. The cascading PPRC application 136 writes (at block 402) data corresponding to the write request to the cache 124 and the non-volatile storage 130. The second storage unit 102 may keep the cache 124 and the non-volatile storage 130 consistent with the intermediate storage volumes 118.
The cascading PPRC application 136 determines (at block 404) if data on the second storage unit 102 is to be cascaded, i.e., the data is set to be sent to the third storage unit 104. If so, the synchronous copy process 202 of the cascading PPRC application 136 marks (at block 406) data as PPRC modified. The synchronous copy process 202 of the cascading PPRC application 136 signals (at block 408) a write complete acknowledgement to the local application 134. The cascading PPRC application 136 receives (at block 400) the next write request from the local application 134.
If the cascading PPRC application 136 determines (at block 404) that data on the second storage unit 102 does not have to be cascaded, then the synchronous copy process 202 of the cascading PPRC application 136 signals (at block 408) a write complete acknowledgement to the local application 134 and the cascading PPRC application 136 receives (at block 400) the next request from the local application 134.
Therefore, FIG. 4 illustrates how the second storage unit 102 receives a write request from the first storage unit 100, where the write request corresponds to a host write request. The second storage unit 102 marks data corresponding to the host write request as PPRC modified.
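The handling on the second storage unit can be sketched in the same style, with a plain Python set standing in for the structure that records PPRC modified tracks; handle_mirrored_write and the cascaded flag are assumptions for illustration.

```python
def handle_mirrored_write(second_tracks: dict, oos: set, track: int,
                          data: bytes, cascaded: bool = True) -> str:
    second_tracks[track] = data   # block 402: write to cache and NVS on the second unit
    if cascaded:                  # block 404: is the data to be cascaded to the third unit?
        oos.add(track)            # block 406: mark the track as PPRC modified
    return "write complete"       # block 408: acknowledge the first storage unit


tracks, oos = {}, set()
handle_mirrored_write(tracks, oos, track=7, data=b"payload")
assert 7 in oos   # the track is now queued for the asynchronous copy to the third unit
```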
Control starts at block 500 where the background asynchronous copy process 204 of the cascading PPRC application 136 determines the PPRC modified data stored in the cache 124, non-volatile storage 130, and the intermediate site storage volumes 118 of the second storage unit 102.
The background asynchronous copy process 204 of the cascading PPRC application 136 sends (at block 502) the PPRC modified data to the third storage unit 104 asynchronously, i.e., the background asynchronous copy process 204 keeps sending the PPRC modified data stored in the cache 124, non-volatile storage 130, and the intermediate site storage volumes 118 of the second storage unit 102 to the third storage unit 104.
After the PPRC modified data has been sent, the background asynchronous copy process 204 determines (at block 504) if the write complete acknowledgement has been received from the third storage unit 104. If not, the background asynchronous copy process 204 again determines (at block 504) if the write complete acknowledgement has been received.
If, after all the PPRC modified data has been sent, the background asynchronous copy process 204 determines (at block 504) that the write complete acknowledgement has been received from the third storage unit 104, then the background asynchronous copy process 204 determines (at block 500) the PPRC modified data once again.
The logic of FIG. 5 illustrates how the background asynchronous copy process 204, while executing in the background, copies data asynchronously from the second storage unit 102 to the third storage unit 104. Since the copying is asynchronous, the second storage unit 102 and the third storage unit 104 may be separated by long distances, such as the extended distances allowed by Extended Distance PPRC.
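A sketch of the background copy loop follows, again with a set standing in for the OOS tracking structure and a send_batch callback standing in for the Extended Distance PPRC transport; unlike the continuously running process described above, the sketch simply exits when no modified tracks remain.

```python
def background_async_copy(oos: set, second_tracks: dict, send_batch) -> None:
    while oos:                                        # block 500: determine PPRC modified data
        batch = {t: second_tracks[t] for t in sorted(oos)}
        if send_batch(batch):                         # blocks 502-504: send and await the ack
            oos.difference_update(batch)              # acknowledged tracks are current remotely


sent = []
oos, tracks = {1, 3}, {1: b"a", 3: b"b"}
background_async_copy(oos, tracks, lambda batch: sent.append(batch) or True)
assert not oos and sent == [{1: b"a", 3: b"b"}]
```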
In certain implementations of the invention, if the first storage unit 100 stops sending updates to the second storage unit 102 because of an outage at the local site that has the first storage unit 100, then the background asynchronous copy process 204 may quickly complete the copy of all remaining modified data to the third storage unit 104. At the completion of the copy, the remote site storage volumes 120 will include an equivalent copy of all updates up to the time of the outage. If there are multiple failures such that both the first storage unit 100 and the second storage unit 102 are lost, then there may be data loss at the remote site.
Since the third storage unit 104 is updated asynchronously, the data on the third storage unit 104 may not be equivalent to the data on the first storage unit 100 unless all of the data from the second storage unit 102 has been copied up to some point in time. To maintain an equivalent copy of data at the third storage unit 104 in case of failure of both the first storage unit 100 and the second storage unit 102, certain implementations of the invention may force the data at the third storage unit to contain all dependent updates up to some specified time. The consistent copy at the third storage unit 104 may be preserved via a point in time copy, such as FlashCopy*. One method may include quiescing the host I/O temporarily at the local site while the third storage unit 104 catches up with the updates. Another method may prevent writes to the second storage unit 102 while the third storage unit 104 catches up with the updates.
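The quiesce-and-catch-up method can be sketched as a sequence of hooks; every callback passed to make_consistency_point below is an assumed stand-in for the corresponding controller operation, and the point-in-time copy is represented abstractly rather than as an actual FlashCopy invocation.

```python
def make_consistency_point(quiesce_host, drain_updates_to_third,
                           point_in_time_copy_at_third, resume_host) -> None:
    quiesce_host()                  # temporarily stop host I/O at the local site
    drain_updates_to_third()        # let the third unit catch up with all outstanding updates
    point_in_time_copy_at_third()   # preserve the consistent copy (a point-in-time copy)
    resume_host()                   # dependent updates are now bounded at a single point in time


log = []
make_consistency_point(lambda: log.append("quiesce"), lambda: log.append("drain"),
                       lambda: log.append("copy"), lambda: log.append("resume"))
assert log == ["quiesce", "drain", "copy", "resume"]
```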
The implementations create a long distance disaster recovery solution by first copying synchronously from a first storage unit to a second storage unit and subsequently copying asynchronously from the second storage unit to a third storage unit. The distance between the first storage unit and the second storage unit may be small enough such that copying data synchronously does not cause a significant performance impact on applications that perform I/O operations on the first storage unit.
In implementations of the invention, if either the first storage unit 100 or data on the first storage unit 100 is lost, then the data can be recovered from replicated copies of the data on either the second storage unit 102 or the third storage unit 104. In certain implementations, it may be preferable to recover the data from the second storage unit 102, as the data on the second storage unit 102 is always equivalent to the data on the first storage unit 100 since data is copied synchronously from the first storage unit 100 to the second storage unit 102.
The described techniques may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term “article of manufacture” as used herein refers to code or logic implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or a computer readable medium (e.g., magnetic storage medium such as hard disk drives, floppy disks, tape), optical storage (e.g., CD-ROMs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.). Code in the computer readable medium is accessed and executed by a processor. The code in which implementations are made may further be accessible through a transmission media or from a file server over a network. In such cases, the article of manufacture in which the code is implemented may comprise a transmission media such as network transmission line, wireless transmission media, signals propagating through space, radio waves, infrared signals, etc. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the implementations and that the article of manufacture may comprise any information bearing medium known in the art.
In alternative implementations of the invention, the data transfer between the first storage unit 100 and the second storage unit 102 may be via Extended Distance PPRC. However, there may be data loss if there is an outage at the first storage unit 100. Additionally, in alternative implementations of the invention the data transfer between the second storage unit 102 and the third storage unit 104 may be via synchronous PPRC. However, there may be performance impacts on the I/O from the host 110 to the first storage unit 100.
In alternative implementations of the invention, the functions of the first storage unit 100 and the second storage unit 102 may be implemented in a single storage control unit. Furthermore, in additional implementations of the invention there may be more than three storage control units cascaded to each other. For example, a fourth storage control unit may be coupled to the third storage unit 104 and data may be transferred from the third storage unit 104 to the fourth storage control unit. In certain implementations of the invention, a chain of synchronous data transfers and a chain of asynchronous data transfers may take place among a plurality of cascaded storage control units. Furthermore, while the implementations have been described with storage control units, the storage control units may be any storage unit known in the art.
The logic of FIGS. 3, 4, and 5 describes specific operations occurring in a particular order. Further, the operations may be performed in parallel as well as sequentially. In alternative implementations, certain of the logic operations may be performed in a different order, modified, or removed and still implement implementations of the present invention. Moreover, steps may be added to the above described logic and still conform to the implementations. Yet further, steps may be performed by a single process or by distributed processes.
Many of the software and hardware components have been described in separate modules for purposes of illustration. Such components may be integrated into a fewer number of components or divided into a larger number of components. Additionally, certain operations described as performed by a specific component may be performed by other components.
Therefore, the foregoing description of the implementations has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description but rather by the claims appended hereto. The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many implementations of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.
A generalized illustration of a method for recovery from a failure associated with the first storage unit 100 is shown in FIG. 6. If the local (or primary) site fails, the balance of the data storage system is initially unaware of the failure. High availability cluster multi-processing (HACMP) or other management software detects the loss of the first storage unit 100 (step 600). Meanwhile, the Extended Distance PPRC relationship causing the asynchronous mirroring of data from the second storage unit 102 to the third storage unit 104 is intact and operational. Specifically, the PPRC relationship between the second storage unit 102 and the third storage unit 104 is accomplished as is shown in FIG. 7. The identity of data tracks associated with the second storage unit 102 which have been modified by the synchronous mirroring of data prior to the failure of the first storage unit 100 is reflected in an out of synch (OOS) bitmap 140 (step 702). Continuously, data on tracks identified by the OOS bitmap 140 is asynchronously mirrored to the third storage unit 104 (step 704). The OOS bitmap 140 will be employed in the recovery method in a new function to allow recovery of the first storage unit 100 without a full volume copy.
Upon detection of the failure associated with the first storage unit, the recovery program issues a command to the second storage unit 102 which can be a FAILOVER command suspending the synchronous mirroring of data from the first storage unit 100 to the second storage unit 102 (step 602). However, the direction of the synchronous PPRC pair (first storage unit 100 to second storage unit 102) is not reversed at this time. The host application 114 is directed to write data updates directly to the second storage unit 102 (step 604). These updates written from the host 110 to the second storage unit 102 are reflected in the existing OOS bitmap 140. In addition, changes to tracks associated with the second storage unit 102 are also reflected in a change recording (CR) bitmap 142 (FIG. 8 , step 802), set up as a result of the FAILOVER command.
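The write path after the FAILOVER command can be sketched as follows, with sets standing in for the OOS bitmap 140 and CR bitmap 142; host_write_after_failover is an illustrative name, not a command defined by the system.

```python
def host_write_after_failover(second_tracks: dict, oos: set, cr: set,
                              track: int, data: bytes) -> None:
    second_tracks[track] = data   # step 604: the host writes directly to the second unit
    oos.add(track)                # still drives the asynchronous second -> third mirror
    cr.add(track)                 # step 802: records what the failed first unit is missing


tracks, oos, cr = {}, set(), set()
host_write_after_failover(tracks, oos, cr, track=11, data=b"update")
assert 11 in oos and 11 in cr
```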
After the commencement of host I/O updates being written directly to the second storage unit 102, along with the associated tracking of changes to the second storage unit 102 (steps 604, 802), the failure associated with the first storage unit 100 can be corrected. During the time period when corrections to the first storage unit 100 are occurring, writes to the second storage unit 102 ensure that minimal or no data is lost and normal operations can continue.
Prior to the time host I/O operations to the first storage unit 100 are resumed, the data stored on the first storage unit 100 must be synchronized with the data stored on the second storage unit 102. A preferred method of accomplishing this synchronization which avoids host I/O interruption or a full volume data copy is to use the OOS bitmap 140 and CR bitmap 142 and a reverse PPRC synchronous mirroring operation to synchronize the first storage unit. Since the OOS bitmap 140 is necessary to the asynchronous mirroring of data updates from the second storage unit 102 to the third storage unit 104 (steps 702, 704), it is necessary to initially suspend the asynchronous mirroring of data updates from the second storage unit 102 to the third storage unit 104 (step 606) prior to synchronizing the first storage unit 100. Then, updates stored on the second storage unit 102 can be synchronously mirrored to the first storage unit 100 (step 608).
Prior to resynchronization of the first storage unit 100, the relationship between the second storage unit 102 and the third storage unit 104 must be suspended (step 901). The resynchronization of the first storage unit is a two step process. First, changed data written by the host application 114 to the second storage unit 102 while the first storage unit 100 was not operational is copied to the first storage unit 100. This changed data is stored on tracks associated with the second storage unit 102 and reflected in the OOS bitmap associated with the second storage unit 102. During resynchronization, a first pass is made through the OOS bitmap and updates are copied to the first storage unit 100. During the first pass, host writes are not sent to the first storage unit 100, but are recorded in the OOS bitmap associated with the second storage unit 102. Then, a second pass is made through the OOS bitmap. During the second pass, host writes are sent synchronously from the second storage unit 102 to the first storage unit 100.
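The two-pass resynchronization described above can be sketched as two drains of the OOS map, where copy_to_first is an assumed transport hook; the sketch does not model the change in how concurrent host writes are treated between the passes, which is noted in the comments.

```python
def drain_oos_once(oos: set, second_tracks: dict, copy_to_first) -> None:
    """Copy every track currently recorded in the OOS map to the first unit."""
    for track in sorted(oos):
        copy_to_first(track, second_tracks[track])
        oos.discard(track)


def resynchronize_first_unit(oos: set, second_tracks: dict, copy_to_first) -> None:
    # Pass 1: bulk differences accumulated while the first unit was down;
    # concurrent host writes are only recorded in the OOS map, not sent.
    drain_oos_once(oos, second_tracks, copy_to_first)
    # Pass 2: residual differences recorded during pass 1; from here on, new
    # host writes are also mirrored synchronously, so the pair converges to
    # full duplex.
    drain_oos_once(oos, second_tracks, copy_to_first)


sent = []
resynchronize_first_unit({5, 9}, {5: b"x", 9: b"y"}, lambda t, d: sent.append(t))
assert sent == [5, 9]
```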
A method which avoids terminating host I/O operations during the bitmap manipulations necessary to resynchronize the first storage unit 100 is illustrated in FIG. 9 .
At the beginning of the bitmap manipulation, host I/O operations are writing changed data to tracks associated with the second storage unit 102. The identity of the tracks with updated data stored therein is reflected in both the OOS bitmap 140 and the CR bitmap 142 (step 902) at the second storage unit 102. Next, the recovery program can issue a command to the second storage unit 102 which causes a swap of the contents of the CR and OOS bitmaps 142, 140. This command can be a FAILBACK command. The second storage unit 102 is marked with a special indicator, “primed for resync,” in preparation for the resynchronization between the second and third units 102, 104 at the end of the process. Also, as part of the recovery command, the resynchronization between the second and first storage units 102, 100 is started using the new swapped contents of the OOS bitmap 140 associated with the second storage unit 102 to determine which tracks to send. Throughout this process, host writes continue to be recorded in both the OOS bitmap 140 and the CR bitmap 142. The OOS bitmap 140 records differences between the second and first storage units 102, 100, and the CR bitmap 142 records differences between the second and third storage units 102, 104.
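The bitmap swap driven by the FAILBACK command can be sketched as below; the SimpleNamespace fields oos, cr, and primed_for_resync are hypothetical names chosen for illustration.

```python
from types import SimpleNamespace


def failback(second_unit: SimpleNamespace) -> None:
    # Swap the contents of the CR and OOS maps; afterwards the OOS map records
    # second -> first differences, the CR map records second -> third
    # differences, and host writes continue to be recorded in both.
    second_unit.oos, second_unit.cr = second_unit.cr, second_unit.oos
    second_unit.primed_for_resync = True   # remembered for the later second -> third resync


unit = SimpleNamespace(oos={1, 2}, cr={2, 3}, primed_for_resync=False)
failback(unit)
assert unit.oos == {2, 3} and unit.cr == {1, 2} and unit.primed_for_resync
```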
Once the bitmaps have been manipulated as described above, changed data on tracks reflected by the OOS bitmap 140 can be copied from the second storage unit 102 to the first storage unit 100 (step 912) while new updates are synchronously mirrored from the second storage unit 102 to the first storage unit 100 until a full duplex between the storage units is reached (step 914).
When the recovery program determines that the first storage unit 100 is fully synchronized with the second storage unit 102 (they are in “full duplex”), host writes are no longer recorded in the OOS bitmap 140. Host writes that must be accounted for on behalf of the third storage unit 104 are still recorded in the CR bitmap 142.
At this point in the process, host I/O operations must be quiesced to allow the PPRC relationship between the first storage unit 100 and the second storage unit 102 to be reestablished and to allow host I/O to be swapped back to the first storage unit 100 (step 610). After host I/O is swapped back to the first storage unit 100 (step 614), the asynchronous PPRC relationship between the second storage unit 102 and the third storage unit 104 may be reestablished (step 616).
Specifically, reestablishment of the asynchronous mirroring relationship from the second storage unit 102 to the third storage unit 104 occurs as is shown in FIG. 10 (step 1007). The recovery program is triggered to compare the CR bitmap 142 to the OOS bitmap 140 by the “primed for resync” indicator set by the FAILBACK command used to start the resynchronization of the first storage unit 100 and the second storage unit 102. Both bitmaps are associated with the second storage unit 102, and have been tracking data changes written synchronously to the second storage unit 102 since the recommencement of host I/O operations to the first storage unit 100. Next, the recovery program must add the identity of tracks containing changed data as identified by the CR bitmap 142, but not identified by the OOS bitmap 140, to the OOS bitmap 140 (step 1008). Then, changed data as identified by the OOS bitmap 140 can be mirrored from the second storage unit 102 to the third storage unit 104 (step 1010).
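Steps 1008 and 1010 can be sketched as a set merge followed by a drain, with mirror_to_third standing in for the Extended Distance PPRC transport; the names are illustrative only.

```python
def reestablish_async_mirror(oos: set, cr: set, second_tracks: dict,
                             mirror_to_third) -> None:
    oos |= cr - oos                                   # step 1008: add CR-only tracks to the OOS map
    for track in sorted(oos):
        mirror_to_third(track, second_tracks[track])  # step 1010: mirror to the third unit
    oos.clear()                                       # the third unit is now current


sent = {}
reestablish_async_mirror({1}, {1, 4}, {1: b"a", 4: b"d"}, sent.__setitem__)
assert sent == {1: b"a", 4: b"d"}
```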
The described techniques for recovery from a failure in a cascading PPRC system may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term “article of manufacture” as used herein refers to code or logic implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or a computer readable medium (e.g., magnetic storage medium such as hard disk drives, floppy disks, tape), optical storage (e.g., CD-ROMs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.). Code in the computer readable medium is accessed and executed by a processor. The code in which implementations are made may further be accessible through a transmission media or from a file server over a network. In such cases, the article of manufacture in which the code is implemented may comprise a transmission media such as network transmission line, wireless transmission media, signals propagating through space, radio waves, infrared signals, etc. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the implementations and that the article of manufacture may comprise any information bearing medium known in the art.
The objects of the invention have been fully realized through the embodiments disclosed herein. Those skilled in the art will appreciate that the various aspects of the invention may be achieved through different embodiments without departing from the essential function of the invention. The particular embodiments are illustrative and not meant to limit the scope of the invention as set forth in the following claims.
Claims (15)
1. A method of recovery from a data storage system failure in a data storage system having a host computer writing data to a first storage unit associated with a first storage controller synchronously mirroring the data to a second storage unit associated with a second storage controller asynchronously mirroring the data to a third storage unit, the method comprising:
asynchronously mirroring the data to the third storage unit, comprising:
creating a first map associated with the second storage unit identifying tracks storing mirrored data which has been received on the second storage unit by the synchronous mirroring of the data from the first storage unit; and
continuously transmitting batches of the mirrored data on tracks identified by the first map from the second storage unit to the third storage unit;
detecting a failure associated with the first storage unit;
writing data updates directly to the second storage unit;
correcting the failure associated with the first storage unit; and
synchronously mirroring the data updates from the second storage unit to the first storage unit, comprising:
creating a second map associated with the second storage unit identifying the tracks storing changed data from a point in time where the host computer began writing directly to the second storage unit;
copying to the first storage unit the changed data on the tracks identified by the second map, comprising:
preserving a copy of the second map at a point in time at which copying the changed data to the first storage unit begins;
re-setting the second map to a clear state and thereafter identifying the tracks containing the changed data in the second map;
comparing the first map to the second map and adding the identity of the tracks containing the data updates, as identified by the first map but not identified by the second map, to the second map;
comparing the preserved copy of the second map to the first map and adding the identity of the tracks containing the changed data, as identified by the preserved copy of the second map but not identified by the first map, to the first map; and
copying the data updates on the tracks then identified by the first map from the second storage unit to the first storage unit; and
synchronously mirroring to the first storage unit further changed data written to the second storage unit by the host computer until a full duplex state between the first storage unit and the second storage unit is reached.
2. The method of claim 1 further comprising suspending the synchronous mirroring of the data from the first storage unit to the second storage unit upon detection of the failure associated with the first storage unit.
3. The method of claim 1 further comprising suspending the asynchronous mirroring of the data updates from the second storage unit to the third storage unit while synchronously mirroring the data updates from the second storage unit to the first storage unit.
4. The method of claim 3 further comprising:
re-establishing the synchronous mirroring of the data from the first storage unit to the second storage unit;
resuming the writing of the data from the host computer to the first storage unit; and
re-establishing the asynchronous mirroring of data from the second storage unit to the third storage unit.
5. The method of claim 1 wherein the synchronous mirroring to the first storage unit of the further updates written to the second storage unit by the host computer comprises:
continuing to identify the tracks having the data updates with the first map and with the second map; and
synchronously copying the data updates on the tracks identified by the first map from the second storage unit to the first storage unit.
6. The method of claim 1 further comprising the following steps when the full duplex state between the first storage unit and the second storage unit is reached:
terminating the use of the first map to identify the tracks associated with the second storage unit storing the data updates; and
continuing to use the second map to identify the tracks associated with the second storage unit storing changed data.
7. The method of claim 1 wherein the writing of the data by the host computer is quiesced after the full duplex state between the first storage unit and the second storage unit is reached.
8. The method of claim 4 wherein re-establishing the asynchronous mirroring of the data from the second storage unit to the third storage unit comprises:
comparing the second map to the first map and adding the identity of the tracks containing the changed data, as identified by the second map but not identified by the first map, to the first map; and
asynchronously mirroring the data updates on the tracks then identified by the first map from the second storage unit to the third storage unit.
9. A system for copying stored data and having the ability to recover from a failure, comprising:
a second storage controller associated with a second storage unit, the second storage controller synchronously receiving mirrored data from a first storage controller associated with a host computer and a first storage unit, the second storage controller further asynchronously mirroring the data to a third storage controller associated with a third storage unit, the second storage controller comprising:
a copy of the second map made at a point in time the synchronous mirroring of the data updates to the first storage unit begins; and
the second map set to a clear state at the point in time the synchronous mirroring of the data updates to the first storage unit begins and thereafter identifying the tracks containing new changed data;
means for detecting a failure associated with the first storage unit;
means for writing data updates from the host computer to the second storage controller upon detection of the failure associated with the first storage unit;
means for synchronously mirroring the data updates from the second storage unit to the first storage unit upon correction of the failure associated with the first storage unit.
10. The system for copying stored data of claim 9 wherein the second storage controller further comprises a first map identifying tracks storing the mirrored data which has been received on the second storage unit by the synchronous mirroring of the data from the first storage unit.
11. The system for copying stored data of claim 9 wherein the second storage controller further comprises a second map identifying the tracks storing changed data from a point in time where the host computer begins writing the data updates directly to the second storage unit.
12. The system for copying stored data of claim 9 wherein the second storage controller further comprises means for comparing the tracks identified by the first map to the tracks identified by one of the second map, and the preserved copy of the second map.
13. The system for copying stored data of claim 12 wherein the means for synchronously mirroring the data updates from the second storage unit to the first storage unit causes the following steps to be taken:
comparing the first map to the re-set second map and adding the identity of the tracks containing the data updates, as identified by the first map but not identified by the second map, to the second map;
comparing the preserved copy of the second map to the first map and adding the identity of the tracks containing the changed data, as identified by the preserved copy of the second map but not identified by the first map, to the first map; and
copying the data updates on the tracks then identified by the first map from the second storage unit to the first storage unit.
14. The system for copying stored data of claim 9 wherein the second storage controller further comprises means for terminating the asynchronous mirroring of the data to the third storage controller associated with the third storage unit upon the commencement of the synchronous mirroring of the data updates from the second storage unit to the first storage unit.
15. The system for copying stored data of claim 14 wherein the second storage controller further comprises:
means for re-establishing the asynchronous mirroring of the data to the third storage controller associated with the third storage unit; and
means for synchronizing the data stored on the third storage unit with the data stored on the second storage unit.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/674,866 US7188272B2 (en) | 2003-09-29 | 2003-09-29 | Method, system and article of manufacture for recovery from a failure in a cascading PPRC system |
US11/555,810 US7512835B2 (en) | 2003-09-29 | 2006-11-02 | Method, system and article of manufacture for recovery from a failure in a cascading PPRC system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/674,866 US7188272B2 (en) | 2003-09-29 | 2003-09-29 | Method, system and article of manufacture for recovery from a failure in a cascading PPRC system |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/555,810 Division US7512835B2 (en) | 2003-09-29 | 2006-11-02 | Method, system and article of manufacture for recovery from a failure in a cascading PPRC system |
Publications (2)
Publication Number | Publication Date |
---|---|
US20050081091A1 US20050081091A1 (en) | 2005-04-14 |
US7188272B2 true US7188272B2 (en) | 2007-03-06 |
Family
ID=34422074
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/674,866 Expired - Fee Related US7188272B2 (en) | 2003-09-29 | 2003-09-29 | Method, system and article of manufacture for recovery from a failure in a cascading PPRC system |
US11/555,810 Expired - Fee Related US7512835B2 (en) | 2003-09-29 | 2006-11-02 | Method, system and article of manufacture for recovery from a failure in a cascading PPRC system |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/555,810 Expired - Fee Related US7512835B2 (en) | 2003-09-29 | 2006-11-02 | Method, system and article of manufacture for recovery from a failure in a cascading PPRC system |
Country Status (1)
Country | Link |
---|---|
US (2) | US7188272B2 (en) |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060048014A1 (en) * | 2004-09-01 | 2006-03-02 | Masamitsu Takahashi | Data processing system and copy processing method thereof |
US20060143502A1 (en) * | 2004-12-10 | 2006-06-29 | Dell Products L.P. | System and method for managing failures in a redundant memory subsystem |
US20060161721A1 (en) * | 2004-04-23 | 2006-07-20 | Takashige Iwamura | Remote copying system with consistency guaranteed between a pair |
US20060168228A1 (en) * | 2004-12-21 | 2006-07-27 | Dell Products L.P. | System and method for maintaining data integrity in a cluster network |
US20060179343A1 (en) * | 2005-02-08 | 2006-08-10 | Hitachi, Ltd. | Method and apparatus for replicating volumes between heterogenous storage systems |
US20080168303A1 (en) * | 2007-01-04 | 2008-07-10 | International Business Machines Corporation | Storage management in cascaded replication of data |
US20080172572A1 (en) * | 2007-01-12 | 2008-07-17 | International Business Machines Corporation | Using virtual copies in a failover and failback environment |
US20090055689A1 (en) * | 2007-08-21 | 2009-02-26 | International Business Machines Corporation | Systems, methods, and computer products for coordinated disaster recovery |
US20100050014A1 (en) * | 2008-08-21 | 2010-02-25 | Bramante William J | Dual independent non volatile memory systems |
US20100146348A1 (en) * | 2008-12-08 | 2010-06-10 | International Business Machines Corporation | Efficient method and apparatus for keeping track of in flight data in a dual node storage controller |
US20100191708A1 (en) * | 2009-01-23 | 2010-07-29 | International Business Machines Corporation | Synchronous Deletion of Managed Files |
US20110185121A1 (en) * | 2010-01-28 | 2011-07-28 | International Business Machines Corporation | Mirroring multiple writeable storage arrays |
US20110208932A1 (en) * | 2008-10-30 | 2011-08-25 | International Business Machines Corporation | Flashcopy handling |
US20110225380A1 (en) * | 2010-03-11 | 2011-09-15 | International Business Machines Corporation | Multiple backup processes |
US20110225124A1 (en) * | 2010-03-11 | 2011-09-15 | International Business Machines Corporation | Creating a buffer point-in-time copy relationship for a point-in-time copy function executed to create a point-in-time copy relationship |
US8677088B1 (en) * | 2009-10-29 | 2014-03-18 | Symantec Corporation | Systems and methods for recovering primary sites after failovers to remote secondary sites |
US20140108756A1 (en) * | 2012-10-17 | 2014-04-17 | International Business Machines Corporation | Bitmap selection for remote copying of updates |
US8719523B2 (en) | 2011-10-03 | 2014-05-06 | International Business Machines Corporation | Maintaining multiple target copies |
US8788770B2 (en) | 2010-05-25 | 2014-07-22 | International Business Machines Corporation | Multiple cascaded backup process |
US8843721B2 (en) | 2009-09-24 | 2014-09-23 | International Business Machines Corporation | Data storage using bitmaps |
US8856472B2 (en) | 2011-09-23 | 2014-10-07 | International Business Machines Corporation | Restore in cascaded copy environment |
US8909985B2 (en) | 2012-07-12 | 2014-12-09 | International Business Machines Corporation | Multiple hyperswap replication sessions |
US9250808B2 (en) | 2009-09-25 | 2016-02-02 | International Business Machines Corporation | Data storage and moving of relatively infrequently accessed data among storage of different types |
US9514012B2 (en) | 2014-04-03 | 2016-12-06 | International Business Machines Corporation | Tertiary storage unit management in bidirectional data copying |
US20170364427A1 (en) * | 2016-06-20 | 2017-12-21 | International Business Machines Corporation | After swapping from a first storage to a second storage, mirroring data from the second storage to the first storage for data in the first storage that experienced data errors |
US10078566B2 (en) * | 2016-06-20 | 2018-09-18 | International Business Machines Corporation | Managing health conditions to determine when to restart replication after a swap triggered by a storage health event |
US10579285B2 (en) | 2018-02-07 | 2020-03-03 | International Business Machines Corporation | Automatic data healing by I/O |
US10585767B2 (en) | 2018-02-07 | 2020-03-10 | International Business Machines Corporation | Automatic data healing using a storage controller |
Families Citing this family (80)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6961804B2 (en) * | 2001-07-20 | 2005-11-01 | International Business Machines Corporation | Flexible techniques for associating cache memories with processors and main memory |
CN1241120C (en) * | 2001-08-31 | 2006-02-08 | 联想(北京)有限公司 | Method for backing up and recovering data in hard disk of computer |
US6730883B2 (en) * | 2002-10-02 | 2004-05-04 | Stratagene | Flexible heating cover assembly for thermal cycling of samples of biological material |
JP2005309550A (en) * | 2004-04-19 | 2005-11-04 | Hitachi Ltd | Remote copying method and system |
JP4282464B2 (en) * | 2003-12-17 | 2009-06-24 | 株式会社日立製作所 | Remote copy system |
US7457830B1 (en) * | 2004-02-27 | 2008-11-25 | Symantec Operating Corporation | Method and system of replicating data using a recovery data change log |
JP2005309793A (en) | 2004-04-22 | 2005-11-04 | Hitachi Ltd | Data processing system |
US7707372B1 (en) * | 2004-06-30 | 2010-04-27 | Symantec Operating Corporation | Updating a change track map based on a mirror recovery map |
JP4401895B2 (en) * | 2004-08-09 | 2010-01-20 | 株式会社日立製作所 | Computer system, computer and its program. |
US7360113B2 (en) | 2004-08-30 | 2008-04-15 | Mendocino Software, Inc. | Protocol for communicating data block copies in an error recovery environment |
US20060047714A1 (en) * | 2004-08-30 | 2006-03-02 | Mendocino Software, Inc. | Systems and methods for rapid presentation of historical views of stored data |
US7664983B2 (en) * | 2004-08-30 | 2010-02-16 | Symantec Corporation | Systems and methods for event driven recovery management |
US8078813B2 (en) * | 2004-09-30 | 2011-12-13 | Emc Corporation | Triangular asynchronous replication |
US8275749B2 (en) * | 2005-02-07 | 2012-09-25 | Mimosa Systems, Inc. | Enterprise server version migration through identity preservation |
US7870416B2 (en) * | 2005-02-07 | 2011-01-11 | Mimosa Systems, Inc. | Enterprise service availability through identity preservation |
US7657780B2 (en) * | 2005-02-07 | 2010-02-02 | Mimosa Systems, Inc. | Enterprise service availability through identity preservation |
US8918366B2 (en) * | 2005-02-07 | 2014-12-23 | Mimosa Systems, Inc. | Synthetic full copies of data and dynamic bulk-to-brick transformation |
US7778976B2 (en) * | 2005-02-07 | 2010-08-17 | Mimosa, Inc. | Multi-dimensional surrogates for data management |
US8812433B2 (en) * | 2005-02-07 | 2014-08-19 | Mimosa Systems, Inc. | Dynamic bulk-to-brick transformation of data |
US8161318B2 (en) * | 2005-02-07 | 2012-04-17 | Mimosa Systems, Inc. | Enterprise service availability through identity preservation |
US8271436B2 (en) * | 2005-02-07 | 2012-09-18 | Mimosa Systems, Inc. | Retro-fitting synthetic full copies of data |
US7917475B2 (en) * | 2005-02-07 | 2011-03-29 | Mimosa Systems, Inc. | Enterprise server version migration through identity preservation |
US8543542B2 (en) * | 2005-02-07 | 2013-09-24 | Mimosa Systems, Inc. | Synthetic full copies of data and dynamic bulk-to-brick transformation |
US8799206B2 (en) * | 2005-02-07 | 2014-08-05 | Mimosa Systems, Inc. | Dynamic bulk-to-brick transformation of data |
US20060236149A1 (en) * | 2005-04-14 | 2006-10-19 | Dell Products L.P. | System and method for rebuilding a storage disk |
US7672979B1 (en) * | 2005-04-22 | 2010-03-02 | Symantec Operating Corporation | Backup and restore techniques using inconsistent state indicators |
US7627775B2 (en) * | 2005-12-13 | 2009-12-01 | International Business Machines Corporation | Managing failures in mirrored systems |
US20070234342A1 (en) * | 2006-01-25 | 2007-10-04 | Flynn John T Jr | System and method for relocating running applications to topologically remotely located computing systems |
US7603581B2 (en) * | 2006-03-17 | 2009-10-13 | International Business Machines Corporation | Remote copying of updates to primary and secondary storage locations subject to a copy relationship |
US7613749B2 (en) * | 2006-04-12 | 2009-11-03 | International Business Machines Corporation | System and method for application fault tolerance and recovery using topologically remotely located computing devices |
US7788231B2 (en) | 2006-04-18 | 2010-08-31 | International Business Machines Corporation | Using a heartbeat signal to maintain data consistency for writes to source storage copied to target storage |
US7627729B2 (en) * | 2006-09-07 | 2009-12-01 | International Business Machines Corporation | Apparatus, system, and method for an improved synchronous mirror swap |
US7594138B2 (en) * | 2007-01-31 | 2009-09-22 | International Business Machines Corporation | System and method of error recovery for backup applications |
US7975109B2 (en) | 2007-05-30 | 2011-07-05 | Schooner Information Technology, Inc. | System including a fine-grained memory and a less-fine-grained memory |
US8250323B2 (en) * | 2007-12-06 | 2012-08-21 | International Business Machines Corporation | Determining whether to use a repository to store data updated during a resynchronization |
US8307129B2 (en) * | 2008-01-14 | 2012-11-06 | International Business Machines Corporation | Methods and computer program products for swapping synchronous replication secondaries from a subchannel set other than zero to subchannel set zero using dynamic I/O |
US7761610B2 (en) * | 2008-01-25 | 2010-07-20 | International Business Machines Corporation | Methods and computer program products for defining synchronous replication devices in a subchannel set other than subchannel set zero |
US8229945B2 (en) | 2008-03-20 | 2012-07-24 | Schooner Information Technology, Inc. | Scalable database management software on a cluster of nodes using a shared-distributed flash memory |
US8732386B2 (en) | 2008-03-20 | 2014-05-20 | Sandisk Enterprise IP LLC. | Sharing data fabric for coherent-distributed caching of multi-node shared-distributed flash memory |
US7908514B2 (en) * | 2008-06-26 | 2011-03-15 | Microsoft Corporation | Minimizing data loss in asynchronous replication solution using distributed redundancy |
JP5147570B2 (en) | 2008-07-02 | 2013-02-20 | 株式会社日立製作所 | Storage system and remote copy recovery method |
JP5422147B2 (en) * | 2008-07-08 | 2014-02-19 | 株式会社日立製作所 | Remote copy system and remote copy method |
US8516173B2 (en) * | 2008-07-28 | 2013-08-20 | International Business Machines Corporation | Swapping PPRC secondaries from a subchannel set other than zero to subchannel set zero using control block field manipulation |
US8055943B2 (en) * | 2009-04-24 | 2011-11-08 | International Business Machines Corporation | Synchronous and asynchronous continuous data protection |
WO2011043892A2 (en) * | 2009-10-06 | 2011-04-14 | Motorola Mobility, Inc. | Method and system for restoring a server interface for a mobile device |
US9047351B2 (en) | 2010-04-12 | 2015-06-02 | Sandisk Enterprise Ip Llc | Cluster of processing nodes with distributed global flash memory using commodity server technology |
US9164554B2 (en) | 2010-04-12 | 2015-10-20 | Sandisk Enterprise Ip Llc | Non-volatile solid-state storage system supporting high bandwidth and random access |
US8700842B2 (en) | 2010-04-12 | 2014-04-15 | Sandisk Enterprise Ip Llc | Minimizing write operations to a flash memory-based object store |
US8868487B2 (en) | 2010-04-12 | 2014-10-21 | Sandisk Enterprise Ip Llc | Event processing in a flash memory-based object store |
US8856593B2 (en) | 2010-04-12 | 2014-10-07 | Sandisk Enterprise Ip Llc | Failure recovery using consensus replication in a distributed flash memory system |
US8954385B2 (en) | 2010-06-28 | 2015-02-10 | Sandisk Enterprise Ip Llc | Efficient recovery of transactional data stores |
US8694733B2 (en) | 2011-01-03 | 2014-04-08 | Sandisk Enterprise Ip Llc | Slave consistency in a synchronous replication environment |
US8706946B2 (en) | 2011-02-11 | 2014-04-22 | International Business Machines Corporation | Extender storage pool system |
WO2012127537A1 (en) * | 2011-03-24 | 2012-09-27 | Hitachi, Ltd. | Computer system and data backup method |
US8874515B2 (en) | 2011-04-11 | 2014-10-28 | Sandisk Enterprise Ip Llc | Low level object version tracking using non-volatile memory write generations |
CN102761566B (en) * | 2011-04-26 | 2015-09-23 | 国际商业机器公司 | The method and apparatus of migration virtual machine |
US8806268B2 (en) | 2011-09-29 | 2014-08-12 | International Business Machines Corporation | Communication of conditions at a primary storage controller to a host |
JP5862246B2 (en) * | 2011-11-30 | 2016-02-16 | 富士通株式会社 | Data management program, data management method, and storage apparatus |
US9135064B2 (en) | 2012-03-07 | 2015-09-15 | Sandisk Enterprise Ip Llc | Fine grained adaptive throttling of background processes |
US9251230B2 (en) | 2012-10-17 | 2016-02-02 | International Business Machines Corporation | Exchanging locations of an out of synchronization indicator and a change recording indicator via pointers |
US9251231B2 (en) | 2012-10-17 | 2016-02-02 | International Business Machines Corporation | Merging an out of synchronization indicator and a change recording indicator in response to a failure in consistency group formation |
US9444889B1 (en) | 2013-02-08 | 2016-09-13 | Quantcast Corporation | Managing distributed system performance using accelerated data retrieval operations |
US9032172B2 (en) * | 2013-02-11 | 2015-05-12 | International Business Machines Corporation | Systems, methods and computer program products for selective copying of track data through peer-to-peer remote copy |
US9405628B2 (en) | 2013-09-23 | 2016-08-02 | International Business Machines Corporation | Data migration using multi-storage volume swap |
US9558080B2 (en) * | 2013-10-31 | 2017-01-31 | Microsoft Technology Licensing, Llc | Crash recovery using non-volatile memory |
US9619331B2 (en) | 2014-01-18 | 2017-04-11 | International Business Machines Corporation | Storage unit replacement using point-in-time snap copy |
US9626367B1 (en) | 2014-06-18 | 2017-04-18 | Veritas Technologies Llc | Managing a backup procedure |
JP2016024656A (en) * | 2014-07-22 | 2016-02-08 | 富士通株式会社 | Storage controller, storage system and storage control program |
US9542277B2 (en) | 2014-09-30 | 2017-01-10 | International Business Machines Corporation | High availability protection for asynchronous disaster recovery |
US20160100004A1 (en) * | 2014-10-06 | 2016-04-07 | International Business Machines Corporation | Data replication across servers |
US9836367B2 (en) | 2015-08-28 | 2017-12-05 | Netapp, Inc. | Trust relationship migration for data mirroring |
US9430163B1 (en) * | 2015-12-15 | 2016-08-30 | International Business Machines Corporation | Implementing synchronization for remote disk mirroring |
CN106933493B (en) * | 2015-12-30 | 2020-04-24 | 伊姆西Ip控股有限责任公司 | Method and equipment for capacity expansion of cache disk array |
CN105760264A (en) * | 2016-02-04 | 2016-07-13 | 浪潮电子信息产业股份有限公司 | Method and device for detecting faulty hardware equipment of server |
US9891849B2 (en) | 2016-04-14 | 2018-02-13 | International Business Machines Corporation | Accelerated recovery in data replication environments |
US11474707B2 (en) | 2016-06-03 | 2022-10-18 | International Business Machines Corporation | Data loss recovery in a secondary storage controller from a primary storage controller |
US10437730B2 (en) | 2016-08-22 | 2019-10-08 | International Business Machines Corporation | Read cache synchronization in data replication environments |
US10331363B2 (en) | 2017-11-22 | 2019-06-25 | Seagate Technology Llc | Monitoring modifications to data blocks |
US11853585B2 (en) * | 2020-01-27 | 2023-12-26 | International Business Machines Corporation | Performing a point-in-time snapshot copy operation within a data consistency application |
WO2022258166A1 (en) * | 2021-06-09 | 2022-12-15 | Huawei Technologies Co., Ltd. | Cascaded continuous data protection system and method of initial synchronization therein |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5870537A (en) | 1996-03-13 | 1999-02-09 | International Business Machines Corporation | Concurrent switch to shadowed device for storage controller and device errors |
- 2003-09-29: US application US10/674,866, issued as patent US7188272B2 (en), not active, Expired - Fee Related
- 2006-11-02: US application US11/555,810, issued as patent US7512835B2 (en), not active, Expired - Fee Related
Patent Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5155845A (en) * | 1990-06-15 | 1992-10-13 | Storage Technology Corporation | Data storage system for providing redundant copies of data on different disk drives |
US5544347A (en) * | 1990-09-24 | 1996-08-06 | Emc Corporation | Data storage system controlled remote data mirroring with respectively maintained data indices |
US5555371A (en) * | 1992-12-17 | 1996-09-10 | International Business Machines Corporation | Data backup copying with delayed directory updating and reduced numbers of DASD accesses at a back up site using a log structured array data storage |
US5742792A (en) * | 1993-04-23 | 1998-04-21 | Emc Corporation | Remote data mirroring |
US5504861A (en) * | 1994-02-22 | 1996-04-02 | International Business Machines Corporation | Remote data duplexing |
US5657440A (en) * | 1994-03-21 | 1997-08-12 | International Business Machines Corporation | Asynchronous remote data copying using subsystem to subsystem communication |
US5720029A (en) * | 1995-07-25 | 1998-02-17 | International Business Machines Corporation | Asynchronously shadowing record updates in a remote copy session using track arrays |
US6304980B1 (en) * | 1996-03-13 | 2001-10-16 | International Business Machines Corporation | Peer-to-peer backup system with failure-triggered device switching honoring reservation of primary device |
US5889935A (en) * | 1996-05-28 | 1999-03-30 | Emc Corporation | Disaster control features for remote data mirroring |
US6772303B2 (en) * | 1998-08-13 | 2004-08-03 | International Business Machines Corporation | System and method for dynamically resynchronizing backup data |
US6308284B1 (en) * | 1998-08-28 | 2001-10-23 | Emc Corporation | Method and apparatus for maintaining data coherency |
US6463501B1 (en) * | 1999-10-21 | 2002-10-08 | International Business Machines Corporation | Method, system and program for maintaining data consistency among updates across groups of storage areas using update times |
US6490596B1 (en) | 1999-11-09 | 2002-12-03 | International Business Machines Corporation | Method of transmitting streamlined data updates by selectively omitting unchanged data parts |
US6813683B2 (en) * | 2000-01-28 | 2004-11-02 | Hitachi, Ltd. | Method and apparatus for copying data from a main site to a remote site |
US6499112B1 (en) | 2000-03-28 | 2002-12-24 | Storage Technology Corporation | Automatic stand alone recovery for peer to peer remote copy (PPRC) operations |
US6480970B1 (en) * | 2000-05-17 | 2002-11-12 | Lsi Logic Corporation | Method of verifying data consistency between local and remote mirrored data storage systems |
US6654912B1 (en) * | 2000-10-04 | 2003-11-25 | Network Appliance, Inc. | Recovery of file system data in file servers mirrored file system volumes |
US6718352B1 (en) * | 2001-03-20 | 2004-04-06 | Emc Corporation | Methods and apparatus for managing a data set stored on a data storage device |
US6742138B1 (en) * | 2001-06-12 | 2004-05-25 | Emc Corporation | Data recovery method and apparatus |
WO2003028183A1 (en) | 2001-09-28 | 2003-04-03 | Commvault Systems, Inc. | System and method for generating and managing quick recovery volumes |
US6948089B2 (en) * | 2002-01-10 | 2005-09-20 | Hitachi, Ltd. | Apparatus and method for multiple generation remote backup and fast restore |
US6842825B2 (en) * | 2002-08-07 | 2005-01-11 | International Business Machines Corporation | Adjusting timestamps to preserve update timing information for cached data objects |
US20040034808A1 (en) * | 2002-08-16 | 2004-02-19 | International Business Machines Corporation | Method, system, and program for providing a mirror copy of data |
US20040039959A1 (en) * | 2002-08-21 | 2004-02-26 | Lecrone Douglas E. | SAR restart and going home procedures |
US20040260873A1 (en) * | 2003-06-17 | 2004-12-23 | Hitachi, Ltd. | Method and apparatus for managing replication volumes |
US20040260899A1 (en) * | 2003-06-18 | 2004-12-23 | Kern Robert Frederic | Method, system, and program for handling a failover to a remote storage location |
US7017076B2 (en) * | 2003-07-29 | 2006-03-21 | Hitachi, Ltd. | Apparatus and storage system for controlling acquisition of snapshot |
Cited By (66)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060161721A1 (en) * | 2004-04-23 | 2006-07-20 | Takashige Iwamura | Remote copying system with consistency guaranteed between a pair |
US8161009B2 (en) * | 2004-04-23 | 2012-04-17 | Hitachi, Ltd. | Remote copying system with consistency guaranteed between a pair |
US7370228B2 (en) * | 2004-09-01 | 2008-05-06 | Hitachi, Ltd. | Data processing system and copy processing method thereof |
US20060048014A1 (en) * | 2004-09-01 | 2006-03-02 | Masamitsu Takahashi | Data processing system and copy processing method thereof |
US20080178041A1 (en) * | 2004-09-01 | 2008-07-24 | Masamitsu Takahashi | Data processing system and copy processing method thereof |
US7870423B2 (en) | 2004-09-01 | 2011-01-11 | Hitachi, Ltd. | Data processing system and copy processing method thereof |
US20060143502A1 (en) * | 2004-12-10 | 2006-06-29 | Dell Products L.P. | System and method for managing failures in a redundant memory subsystem |
US20060168228A1 (en) * | 2004-12-21 | 2006-07-27 | Dell Products L.P. | System and method for maintaining data integrity in a cluster network |
US7519851B2 (en) * | 2005-02-08 | 2009-04-14 | Hitachi, Ltd. | Apparatus for replicating volumes between heterogenous storage systems |
US20060179343A1 (en) * | 2005-02-08 | 2006-08-10 | Hitachi, Ltd. | Method and apparatus for replicating volumes between heterogenous storage systems |
US7702953B2 (en) * | 2007-01-04 | 2010-04-20 | International Business Machines Corporation | Storage management in cascaded replication of data |
US20080168303A1 (en) * | 2007-01-04 | 2008-07-10 | International Business Machines Corporation | Storage management in cascaded replication of data |
US20080172572A1 (en) * | 2007-01-12 | 2008-07-17 | International Business Machines Corporation | Using virtual copies in a failover and failback environment |
US8060779B2 (en) | 2007-01-12 | 2011-11-15 | International Business Machines Corporation | Using virtual copies in a failover and failback environment |
US20100192008A1 (en) * | 2007-01-12 | 2010-07-29 | International Business Machines Corporation | Using virtual copies in a failover and failback environment |
US7793148B2 (en) * | 2007-01-12 | 2010-09-07 | International Business Machines Corporation | Using virtual copies in a failover and failback environment |
US20090055689A1 (en) * | 2007-08-21 | 2009-02-26 | International Business Machines Corporation | Systems, methods, and computer products for coordinated disaster recovery |
US20100050014A1 (en) * | 2008-08-21 | 2010-02-25 | Bramante William J | Dual independent non volatile memory systems |
US7882388B2 (en) * | 2008-08-21 | 2011-02-01 | Sierra Wireless America, Inc. | Dual independent non volatile memory systems |
US20110208932A1 (en) * | 2008-10-30 | 2011-08-25 | International Business Machines Corporation | Flashcopy handling |
US8688936B2 (en) * | 2008-10-30 | 2014-04-01 | International Business Machines Corporation | Point-in-time copies in a cascade using maps and fdisks |
US8713272B2 (en) | 2008-10-30 | 2014-04-29 | International Business Machines Corporation | Point-in-time copies in a cascade using maps and fdisks |
US20100146348A1 (en) * | 2008-12-08 | 2010-06-10 | International Business Machines Corporation | Efficient method and apparatus for keeping track of in flight data in a dual node storage controller |
US8176363B2 (en) * | 2008-12-08 | 2012-05-08 | International Business Machines Corporation | Efficient method and apparatus for keeping track of in flight data in a dual node storage controller |
US20100191708A1 (en) * | 2009-01-23 | 2010-07-29 | International Business Machines Corporation | Synchronous Deletion of Managed Files |
US8862848B2 (en) | 2009-09-24 | 2014-10-14 | International Business Machines Corporation | Data storage using bitmaps |
US8843721B2 (en) | 2009-09-24 | 2014-09-23 | International Business Machines Corporation | Data storage using bitmaps |
US9256367B2 (en) | 2009-09-25 | 2016-02-09 | International Business Machines Corporation | Data storage and moving of relatively infrequently accessed data among storage of different types |
US9250808B2 (en) | 2009-09-25 | 2016-02-02 | International Business Machines Corporation | Data storage and moving of relatively infrequently accessed data among storage of different types |
US8677088B1 (en) * | 2009-10-29 | 2014-03-18 | Symantec Corporation | Systems and methods for recovering primary sites after failovers to remote secondary sites |
US20110185121A1 (en) * | 2010-01-28 | 2011-07-28 | International Business Machines Corporation | Mirroring multiple writeable storage arrays |
US9766826B2 (en) | 2010-01-28 | 2017-09-19 | International Business Machines Corporation | Mirroring multiple writeable storage arrays |
US9304696B2 (en) | 2010-01-28 | 2016-04-05 | International Business Machines Corporation | Mirroring multiple writeable storage arrays |
US8862816B2 (en) | 2010-01-28 | 2014-10-14 | International Business Machines Corporation | Mirroring multiple writeable storage arrays |
US20110225124A1 (en) * | 2010-03-11 | 2011-09-15 | International Business Machines Corporation | Creating a buffer point-in-time copy relationship for a point-in-time copy function executed to create a point-in-time copy relationship |
US8533411B2 (en) | 2010-03-11 | 2013-09-10 | International Business Machines Corporation | Multiple backup processes |
US8285679B2 (en) | 2010-03-11 | 2012-10-09 | International Business Machines Corporation | Creating a buffer point-in-time copy relationship for a point-in-time copy function executed to create a point-in-time copy relationship |
US20110225380A1 (en) * | 2010-03-11 | 2011-09-15 | International Business Machines Corporation | Multiple backup processes |
US8566282B2 (en) | 2010-03-11 | 2013-10-22 | International Business Machines Corporation | Creating a buffer point-in-time copy relationship for a point-in-time copy function executed to create a point-in-time copy relationship |
US8788770B2 (en) | 2010-05-25 | 2014-07-22 | International Business Machines Corporation | Multiple cascaded backup process |
US8793453B2 (en) | 2010-05-25 | 2014-07-29 | International Business Machines Corporation | Multiple cascaded backup process |
US9514004B2 (en) | 2011-09-23 | 2016-12-06 | International Business Machines Corporation | Restore in cascaded copy environment |
US8856472B2 (en) | 2011-09-23 | 2014-10-07 | International Business Machines Corporation | Restore in cascaded copy environment |
US8868860B2 (en) | 2011-09-23 | 2014-10-21 | International Business Machines Corporation | Restore in cascaded copy environment |
US8719523B2 (en) | 2011-10-03 | 2014-05-06 | International Business Machines Corporation | Maintaining multiple target copies |
US8732419B2 (en) | 2011-10-03 | 2014-05-20 | International Business Machines Corporation | Maintaining multiple target copies |
US8914671B2 (en) | 2012-07-12 | 2014-12-16 | International Business Machines Corporation | Multiple hyperswap replication sessions |
US8909985B2 (en) | 2012-07-12 | 2014-12-09 | International Business Machines Corporation | Multiple hyperswap replication sessions |
US9141639B2 (en) * | 2012-10-17 | 2015-09-22 | International Business Machines Corporation | Bitmap selection for remote copying of updates |
US9092449B2 (en) * | 2012-10-17 | 2015-07-28 | International Business Machines Corporation | Bitmap selection for remote copying of updates |
US9483366B2 (en) | 2012-10-17 | 2016-11-01 | International Business Machines Corporation | Bitmap selection for remote copying of updates |
US20140108857A1 (en) * | 2012-10-17 | 2014-04-17 | International Business Machines Corporation | Bitmap selection for remote copying of updates |
US20140108756A1 (en) * | 2012-10-17 | 2014-04-17 | International Business Machines Corporation | Bitmap selection for remote copying of updates |
US9514012B2 (en) | 2014-04-03 | 2016-12-06 | International Business Machines Corporation | Tertiary storage unit management in bidirectional data copying |
US10146472B2 (en) | 2014-04-03 | 2018-12-04 | International Business Machines Corporation | Tertiary storage unit management in bidirectional data copying |
US20180276089A1 (en) * | 2016-06-20 | 2018-09-27 | International Business Machines Corporation | After swapping from a first storage to a second storage, mirroring data from the second storage to the first storage for data in the first storage that experienced data errors |
US10083099B2 (en) * | 2016-06-20 | 2018-09-25 | International Business Machines Corporation | After swapping from a first storage to a second storage, mirroring data from the second storage to the first storage for data in the first storage that experienced data errors |
US10078566B2 (en) * | 2016-06-20 | 2018-09-18 | International Business Machines Corporation | Managing health conditions to determine when to restart replication after a swap triggered by a storage health event |
US20180293145A1 (en) * | 2016-06-20 | 2018-10-11 | International Business Machines Corporation | Managing health conditions to determine when to restart replication after a swap triggered by a storage health event |
US20170364427A1 (en) * | 2016-06-20 | 2017-12-21 | International Business Machines Corporation | After swapping from a first storage to a second storage, mirroring data from the second storage to the first storage for data in the first storage that experienced data errors |
US10846187B2 (en) * | 2016-06-20 | 2020-11-24 | International Business Machines Corporation | Managing health conditions to determine when to restart replication after a swap triggered by a storage health event |
US10977142B2 (en) * | 2016-06-20 | 2021-04-13 | International Business Machines Corporation | After swapping from a first storage to a second storage, mirroring data from the second storage to the first storage for data in the first storage that experienced data errors |
US10579285B2 (en) | 2018-02-07 | 2020-03-03 | International Business Machines Corporation | Automatic data healing by I/O |
US10585767B2 (en) | 2018-02-07 | 2020-03-10 | International Business Machines Corporation | Automatic data healing using a storage controller |
US11099953B2 (en) | 2018-02-07 | 2021-08-24 | International Business Machines Corporation | Automatic data healing using a storage controller |
US11226746B2 (en) | 2018-02-07 | 2022-01-18 | International Business Machines Corporation | Automatic data healing by I/O |
Also Published As
Publication number | Publication date |
---|---|
US7512835B2 (en) | 2009-03-31 |
US20070061531A1 (en) | 2007-03-15 |
US20050081091A1 (en) | 2005-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7188272B2 (en) | Method, system and article of manufacture for recovery from a failure in a cascading PPRC system | |
US7610318B2 (en) | Autonomic infrastructure enablement for point in time copy consistency | |
US7921273B2 (en) | Method, system, and article of manufacture for remote copying of data | |
US7278049B2 (en) | Method, system, and program for recovery from a failure in an asynchronous data copying system | |
US5682513A (en) | Cache queue entry linking for DASD record updates | |
US5870537A (en) | Concurrent switch to shadowed device for storage controller and device errors | |
JP3655963B2 (en) | Storage controller, data storage system including the same, and dual pair suppression method | |
US6304980B1 (en) | Peer-to-peer backup system with failure-triggered device switching honoring reservation of primary device | |
US7134044B2 (en) | Method, system, and program for providing a mirror copy of data | |
US7225307B2 (en) | Apparatus, system, and method for synchronizing an asynchronous mirror volume using a synchronous mirror volume | |
US7747576B2 (en) | Incremental update control for remote copy | |
US7206911B2 (en) | Method, system, and program for a system architecture for an arbitrary number of backup components | |
US7120824B2 (en) | Method, apparatus and program storage device for maintaining data consistency and cache coherency during communications failures between nodes in a remote mirror pair | |
US6035412A (en) | RDF-based and MMF-based backups | |
US7702953B2 (en) | Storage management in cascaded replication of data | |
US5720029A (en) | Asynchronously shadowing record updates in a remote copy session using track arrays | |
JP3149325B2 (en) | Method and associated system for forming a consistency group to provide disaster recovery functionality | |
US7779291B2 (en) | Four site triangular asynchronous replication | |
JP2576847B2 (en) | Storage control device and related method | |
US9576040B1 (en) | N-site asynchronous replication | |
US20060136685A1 (en) | Method and system to maintain data consistency over an internet small computer system interface (iSCSI) network | |
US20050071710A1 (en) | Method, system, and program for mirroring data among storage sites | |
US7752404B2 (en) | Toggling between concurrent and cascaded triangular asynchronous replication | |
US7979396B1 (en) | System and method for performing consistent resynchronization between synchronized copies | |
US7680997B1 (en) | Data recovery simulation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: IBM CORPORATION, NEW YORK
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BARTFAI, ROBERT F.;FACTOR, MICHAEL E.;SPEAR, GAIL A.;AND OTHERS;REEL/FRAME:014558/0865
Effective date: 20030925 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
REMI | Maintenance fee reminder mailed | ||
LAPS | Lapse for failure to pay maintenance fees | ||
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20110306 |