WO2003025780A1 - System and method for journal recovery for multinode environments - Google Patents


Info

Publication number
WO2003025780A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
block
journal
transaction
lock held
Application number
PCT/US2002/030083
Other languages
French (fr)
Other versions
WO2003025780A8 (en)
WO2003025780A9 (en)
Inventor
Brent A. Kingsbury
Sam Revitch
Terence M. Rokop
Original Assignee
Polyserve, Inc.
Application filed by Polyserve, Inc.
Publication of WO2003025780A1
Publication of WO2003025780A9
Publication of WO2003025780A8

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526 Mutual exclusion algorithms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99938 Concurrency, e.g. lock management in shared database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99951 File or database maintenance
    • Y10S707/99952 Coherency, e.g. same view to multiple users
    • Y10S707/99953 Recoverability

Definitions

  • Node 1 has set block X to 750 and block Y to 1,250, and the journal entry recording this transaction has been made valid.
  • Node 7 has set block Q to 50. Note that node 7 is in the middle of writing the transaction, and its journal entry is not yet valid.
  • Node 4 retains information in memory and wants to update block A to 500.
  • Node 4 also wants to update block B as part of the same transaction that updates block A to 500.
  • Block B is part of a transaction from node 3 that is not yet valid (as shown in Fig. 2A). Accordingly, node 4 asks node 3 to make that transaction valid so that node 4 can update block B.
  • Node 3 has completed writing a journal entry recording the update that sets block B to 3,000 and block C to 1,500, and the transaction is now valid.
  • Node 3 has also completed writing a journal entry for a prior transaction, which set block C to 1,000 and block D to 250.
  • Node 7's journal entry is still not valid, since node 7 has not yet finished writing out its transaction.
  • Node 4's journal entry remains empty.
  • Node 4 can write a journal entry for its transaction now that node 3 has finished its transaction with regard to block B. Note that valid journal entries do not have to be contiguous according to an embodiment of the present invention. At time 3, nodes 3, 1, and 4 have written valid transactions, while node 7's transaction is still not valid.
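The ordering constraint in Figs. 2A-2C can be illustrated with a small Python model. This is a sketch under stated assumptions, not the patent's implementation: the `Txn` and `Journal` names are hypothetical, node 4 "asking" node 3 is modeled as a direct call, locking is omitted, and the value node 4 writes to block B (2,500) is invented purely for illustration because the text does not give it.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    node: int
    updates: dict        # block name -> new value
    valid: bool = False


class Journal:
    """Hypothetical model of the shared journal shown in Figs. 2A-2C."""

    def __init__(self) -> None:
        self.valid_entries = []  # transactions in the order they were made valid
        self.pending = {}        # node id -> transaction held but not yet valid

    def stage(self, txn: Txn) -> None:
        """A node holds a transaction it has not yet made valid."""
        self.pending[txn.node] = txn

    def make_valid(self, node: int) -> None:
        """Model the atomic write that makes a node's pending entry valid."""
        txn = self.pending.pop(node)
        txn.valid = True
        self.valid_entries.append(txn)

    def commit(self, txn: Txn) -> None:
        """Make txn valid, first asking every other node whose pending
        transaction touches one of txn's blocks to make its entry valid."""
        for other in list(self.pending.values()):  # copy: make_valid mutates
            if other.node != txn.node and other.updates.keys() & txn.updates.keys():
                self.make_valid(other.node)
        self.stage(txn)
        self.make_valid(txn.node)


if __name__ == "__main__":
    journal = Journal()
    journal.stage(Txn(node=3, updates={"B": 3000, "C": 1500}))
    journal.stage(Txn(node=1, updates={"X": 750, "Y": 1250}))
    journal.make_valid(1)
    journal.stage(Txn(node=7, updates={"Q": 50}))  # node 7 still writing
    journal.commit(Txn(node=4, updates={"A": 500, "B": 2500}))  # B value hypothetical
    print([t.node for t in journal.valid_entries])  # [1, 3, 4]
```

Node 3's entry becomes valid before node 4's because both touch block B, while node 7's entry, which shares no block with node 4, stays pending.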
  • Fig. 3 is a flow diagram of a method for journal recovery according to an embodiment of the present invention.
  • A failure is detected (300).
  • Examples of failures include a server failure during a journal update.
  • Grants of new locks are then paused (302).
  • In a shared resource environment such as a storage area network, locks are often used to ensure that all the nodes accessing a block of data will access the latest version.
  • The granting of new locks is paused.
  • The journal is analyzed (304), then selected transactions are replayed (306), and then ordinary operations are continued (308).
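The overall sequence of Fig. 3 can be sketched as follows. The `LockManager` class and its `grant` method are hypothetical, the analysis and replay steps are passed in as callables, and recovering the locks that the failed node held is left out of this sketch.

```python
class LockManager:
    """Hypothetical lock manager that maps each locked block to its holder."""

    def __init__(self) -> None:
        self.held = {}       # block name -> node id
        self.paused = False  # while True, no new locks are granted

    def grant(self, block: str, node: int) -> bool:
        """Return True if the lock was granted; refuse while paused."""
        if self.paused or block in self.held:
            return False
        self.held[block] = node
        return True


def journal_recovery(locks: LockManager, analyze, replay) -> list:
    """Run the Fig. 3 steps in order and return a trace of step numbers."""
    trace = []
    locks.paused = True   # 302: stop new grants; locks already held stay held
    trace.append(302)
    result = analyze()    # 304: analyze the journal
    trace.append(304)
    replay(result)        # 306: replay the selected transactions
    trace.append(306)
    locks.paused = False  # 308: continue ordinary operations
    trace.append(308)
    return trace
```

Pausing grants before the analysis keeps the set of locks held by surviving nodes stable while the journal is examined and replayed.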
  • Figs. 4A-4B are flow diagrams of a method for analyzing the journal, such as step 304 of Fig. 3, according to an embodiment of the present invention.
  • Figs. 5A-5B are flow diagrams of a method for replaying selected transactions during journal recovery, such as step 306 of Fig. 3, according to an embodiment of the present invention.
  • Figs. 4A-4B and 5A-5B are preferably read in conjunction.
  • Metadata blocks include but are not limited to data structure information such as the file name, the date a file was opened, the length of a file, and where the contents of a file are stored.
  • A given block may at one time belong to the set of metadata blocks and later belong to the set of non-metadata blocks, or vice versa; changes of blocks from one category to the other are recorded in journal entries in the journal.
  • The journal recovery mechanism selectively updates only the appropriate blocks according to the metadata/non-metadata distinction.
  • A list of blocks is set to null (400). This list will record blocks that are not metadata blocks.
  • The oldest unexpired transaction is retrieved (402). It is determined whether the transaction contains a change of metadata/non-metadata block status (404). This determination checks whether the journal entry for this transaction records one or more blocks moving from the set of metadata blocks to the set of non-metadata blocks, or vice versa. This type of change is preferably recorded in the transaction and can be determined by reading the transaction. If the transaction does contain a record of a change in metadata/non-metadata block status (404), then the first record of a block changing its metadata/non-metadata status is retrieved (406). There is preferably a record in the journal entry that indicates whether this particular block is or is not metadata, and it is this record that is retrieved.
  • The analysis of the journal results in a list of non-metadata blocks, which ensures that the non-metadata blocks are not tampered with during journal recovery. This method of analyzing the journal allows the set of metadata blocks to change dynamically.
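A minimal sketch of the analysis in Figs. 4A-4B, assuming each transaction carries a hypothetical `status_changes` map from block name to its new metadata status. Because the scan runs from the oldest transaction to the newest, the last recorded change for a block wins. Skipping entries that were never made valid is an assumption of this sketch; the text above only says that unexpired transactions are scanned.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    node: int
    updates: dict = field(default_factory=dict)  # block -> new value
    # Hypothetical status-change records: block -> True if the block becomes
    # a metadata block, False if it becomes a non-metadata block.
    status_changes: dict = field(default_factory=dict)
    valid: bool = True
    expired: bool = False


def analyze_non_metadata(journal: list) -> set:
    """Return the blocks whose most recent recorded status is non-metadata."""
    non_metadata = set()                      # 400: start with an empty list
    for txn in journal:                       # 402: oldest transaction first
        if txn.expired or not txn.valid:      # assumption: skip entries never made valid
            continue
        for block, is_metadata in txn.status_changes.items():  # 404/406
            if is_metadata:
                non_metadata.discard(block)   # block became metadata again
            else:
                non_metadata.add(block)       # block became non-metadata
    return non_metadata
```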
  • Figs. 5A-5B show an example of step 306 of Fig. 3 and are preferably read in conjunction with Figs. 4A-4B.
  • A particular node retrieves the oldest unexpired valid transaction (500). The first block in the transaction is retrieved (502). It is then determined whether this block is in the list of non-metadata blocks (504). The list of non-metadata blocks was established during the analysis of the journal in step 304 of Fig. 3 and Figs. 4A-4B.
  • If this block is not in the list of non-metadata blocks (504), then it is determined whether this block is covered by a lock held by a surviving node (506).
  • A surviving node is a node that has not failed. If the block is not covered by a lock held by a surviving node, then this node writes the block to its final location (508).
  • After the block is written (508), or if this block is in the list of non-metadata blocks (504 of Fig. 5A), or if it is covered by a lock held by a surviving node (506 of Fig. 5A), it is determined whether there are more blocks in the transaction (510). If there are more blocks, the next block is retrieved (516), and it is determined whether this new block is in the list of non-metadata blocks (504 of Fig. 5A).
  • If there are no more blocks in the transaction (510), the transaction is marked as expired (512). It is then determined whether there are more unexpired valid transactions (514). If there are, the next unexpired valid transaction is retrieved (518), and the first block of that transaction is retrieved (502 of Fig. 5A). If there are no more unexpired valid transactions (514), the journal recovery is complete.
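The replay of Figs. 5A-5B can be sketched the same way. The set of non-metadata blocks is the output of the analysis step, and `locked_by_survivor` (a hypothetical name) is the set of blocks covered by locks that surviving nodes hold; both are passed in so the block stays self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    node: int
    updates: dict = field(default_factory=dict)  # block -> new value
    valid: bool = True
    expired: bool = False


def replay_metadata(journal: list, non_metadata: set,
                    locked_by_survivor: set, storage: dict) -> None:
    """Replay unexpired valid transactions, oldest first (500-518)."""
    for txn in journal:
        if txn.expired or not txn.valid:
            continue
        for block, value in txn.updates.items():  # 502 / 516
            if block in non_metadata:             # 504: leave non-metadata alone
                continue
            if block in locked_by_survivor:       # 506: a surviving node holds the lock
                continue
            storage[block] = value                # 508: write to final location
        txn.expired = True                        # 512: mark transaction expired
```

Applying transactions oldest first means that when several valid transactions update the same block, the newest value is the one left in the final location.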
  • Figs. 6A-6B are further details of the step of analyzing the journal (304 of Fig. 3) and should be read in conjunction with Figs. 7A-7C, which are further details of the step of replaying selected transactions (306 of Fig. 3).
  • In this embodiment, the journal recovery is selective for a failed node's updates but does not account for blocks changing from metadata to non-metadata status or vice versa.
  • A list of blocks is set to null (600). When this analysis is finished, this list will consist of blocks last updated by a failed node. The oldest unexpired valid transaction is retrieved (602). It is then determined whether this transaction is from a failed node (604). If it is not from a failed node, the first block in the transaction is retrieved (606), and the block is removed from the list if it is present on the list (608). The list is the one that was established in step 600. Since a particular block may have been involved in several transactions, the block may be added to the list due to a first transaction and later removed from the list due to a second transaction.
  • If this transaction is from a failed node (604), then the first block in the transaction is retrieved (618 of Fig. 6B). This block is added to the list (620), and then it is determined whether there are more blocks in the transaction (622). If there are more blocks, the next block in the transaction is retrieved (624), and this new block is added to the list (620). If there are no more blocks in the transaction, it is determined whether there are more valid transactions in the journal (614 of Fig. 6A).
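The alternate analysis of Figs. 6A-6B reduces to a forward scan that adds every block a failed node wrote and removes any block a surviving node wrote later. The `Txn` fields and the function name below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    node: int
    updates: dict = field(default_factory=dict)  # block -> new value
    valid: bool = True
    expired: bool = False


def blocks_last_updated_by_failed(journal: list, failed: set) -> set:
    """Return the blocks whose latest valid update came from a failed node."""
    last_by_failed = set()                                 # 600: empty list
    for txn in journal:                                    # 602: oldest first
        if txn.expired or not txn.valid:
            continue
        if txn.node in failed:                             # 604
            last_by_failed.update(txn.updates)             # 618-624: add each block
        else:
            last_by_failed.difference_update(txn.updates)  # 606-608: remove if present
    return last_by_failed
```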
  • Figs. 7A-7C are flow diagrams of a method for replaying the selected transactions for journal recovery according to the alternate embodiment of the present invention. This method is the alternate embodiment of step 306 of Fig. 3 and should be viewed in conjunction with Figs. 6A-6B.
  • The oldest unexpired valid transaction is retrieved (700). It is then determined whether this transaction is from a failed node (702). If it is not from a failed node, it is determined whether there are more unexpired valid transactions (704). If there are more transactions, the next unexpired valid transaction is retrieved (706). If there are no more unexpired valid transactions (704), the journal recovery is finished.
  • If the transaction is from a failed node (702), then the first block in the transaction is retrieved (710). It is then determined whether the block is in the list of blocks most recently updated by a failed node (712). This list is the same list that was derived in the method shown in Figs. 6A-6B. If the block is on the list, then it is determined whether the block is covered by a lock held by a surviving node (714). A surviving node is a node that has not failed. If the block is not covered by a lock held by a surviving node, the block is written to its final location (716). It is then determined whether there are more blocks in the transaction (720).
  • This determination is also made if the block is not on the list of blocks most recently updated by a failed node (712 of Fig. 7B), or if the block is covered by a lock held by a surviving node (714 of Fig. 7B). If there are no more blocks in the transaction (720), then it is determined whether there are more unexpired valid transactions (704 of Fig. 7A). If, however, there are more blocks in the transaction (720), then the next block is retrieved (722), and it is determined whether the new block is in the list of blocks most recently updated by a failed node (712 of Fig. 7B).
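The alternate replay of Figs. 7A-7C can be sketched as follows, taking the set produced by the Fig. 6 analysis as input. The names are hypothetical. As in the text, transactions from surviving nodes are skipped entirely; this sketch does not mark transactions expired, because that step is not described for Figs. 7A-7C above.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    node: int
    updates: dict = field(default_factory=dict)  # block -> new value
    valid: bool = True
    expired: bool = False


def replay_failed(journal: list, failed: set, last_by_failed: set,
                  locked_by_survivor: set, storage: dict) -> None:
    """Replay only a failed node's updates that no survivor superseded."""
    for txn in journal:                           # 700 / 706: oldest first
        if txn.expired or not txn.valid:
            continue
        if txn.node not in failed:                # 702: not a failed node's transaction
            continue
        for block, value in txn.updates.items():  # 710 / 722
            if block not in last_by_failed:       # 712: a survivor updated it later
                continue
            if block in locked_by_survivor:       # 714: a surviving node holds the lock
                continue
            storage[block] = value                # 716: write to final location
```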

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)

Abstract

A system and method are disclosed for providing journal recovery in a multi-node environment which comprises determining whether a block was last updated by a first node (500); determining whether the block is associated with a lock held by a second node (506); and writing the block to a final location if the block is not associated with a lock held by the second node (508). In another embodiment, the block is associated with metadata (504).

Description

SYSTEM AND METHOD FOR JOURNAL RECOVERY FOR MULTINODE ENVIRONMENTS
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Patent Application No. 60/324,196 (Attorney Docket No. POLYP001+) entitled SHARED STORAGE LOCK: A NEW SOFTWARE SYNCHRONIZATION MECHANISM FOR ENFORCING MUTUAL EXCLUSION AMONG MULTIPLE NEGOTIATORS filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/324,226 (Attorney Docket No. POLYP002+) entitled JOURNALING MECHANISM WITH EFFICIENT, SELECTIVE RECOVERY FOR MULTI-NODE ENVIRONMENTS filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/324,224 (Attorney Docket No. POLYP003+) entitled COLLABORATIVE CACHING IN A MULTI-NODE FILESYSTEM filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/324,242 (Attorney Docket No. POLYP005+) entitled DISTRIBUTED MANAGEMENT OF A STORAGE AREA NETWORK filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/324,195 (Attorney Docket No. POLYP006+) entitled METHOD FOR IMPLEMENTING JOURNALING AND DISTRIBUTED LOCK MANAGEMENT filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/324,243 (Attorney Docket No. POLYP007+) entitled MATRIX SERVER: A HIGHLY AVAILABLE MATRIX PROCESSING SYSTEM WITH COHERENT SHARED FILE STORAGE filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/324,787 (Attorney Docket No. POLYP008+) entitled A METHOD FOR EFFICIENT ON-LINE LOCK RECOVERY IN A HIGHLY AVAILABLE MATRIX PROCESSING SYSTEM filed September 24, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/327,191 (Attorney Docket No. POLYP009+) entitled FAST LOCK RECOVERY: A METHOD FOR EFFICIENT ON-LINE LOCK RECOVERY IN A HIGHLY AVAILABLE MATRIX PROCESSING SYSTEM filed October 1, 2001, which is incorporated herein by reference for all purposes.
This application is related to co-pending U.S. Patent Application No. (Attorney Docket No. POLYP001) entitled A SYSTEM AND METHOD FOR SYNCHRONIZATION FOR ENFORCING MUTUAL EXCLUSION AMONG MULTIPLE NEGOTIATORS filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent Application No. (Attorney Docket No. POLYP003) entitled A SYSTEM AND METHOD FOR COLLABORATIVE CACHING IN A MULTINODE SYSTEM filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent Application No. (Attorney Docket No. POLYP005) entitled A SYSTEM AND METHOD FOR MANAGEMENT OF A STORAGE AREA NETWORK filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent Application No. (Attorney Docket No. POLYP006) entitled SYSTEM AND METHOD FOR IMPLEMENTING JOURNALING IN A MULTI-NODE ENVIRONMENT filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent Application No. (Attorney Docket No. POLYP007) entitled A SYSTEM AND METHOD FOR A MULTI-NODE ENVIRONMENT WITH SHARED STORAGE filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent Application No. (Attorney Docket No. POLYP009) entitled A SYSTEM AND METHOD FOR EFFICIENT LOCK RECOVERY filed concurrently herewith, which is incorporated herein by reference for all purposes.
FIELD OF THE INVENTION
The present invention relates generally to computer systems. In particular, the present invention relates to computer systems that share resources such as storage.
BACKGROUND OF THE INVENTION
Servers are typically used for big applications and workloads such as those used in conjunction with large web services and manufacturing. Often, a single server does not have enough power to perform the required application. To accommodate these large applications, several servers may be used in conjunction with several shared storage devices in a storage area network (SAN). In addition, it may be valuable to group servers together to achieve better availability or manageability.
As systems become large, it becomes more difficult to coordinate updates to multiple components of shared data structures with high performance and efficient behavior. It would be beneficial to synthesize atomic updates on data structures spread over multiple data blocks when the hardware can only provide atomicity at the level of single-block updates. The need for atomic updates arises because systems can fail, and it can be costly or impossible to find and repair inconsistencies introduced by partially complete updates. One way to manage recovery is through the use of a journal that records information about updates.
What is needed is a system and method for journal recovery in a multi-node environment that efficiently restores common data structures to a consistent state even if some of the processing nodes fail while surviving nodes have overlapping updates in progress. The present invention addresses such needs.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:
Fig. 1 is a block diagram of a shared storage system suitable for facilitating an embodiment of the present invention.
Figs. 2A-2C are examples of journal entries according to an embodiment of the present invention.
Fig. 3 is a flow diagram of a method for journal recovery according to an embodiment of the present invention.
Figs. 4A-4B are flow diagrams of a method for analyzing the journal, such as step 304 of Fig. 3, according to an embodiment of the present invention.
Figs. 5A-5B are flow diagrams of a method for replaying selected transactions during journal recovery according to an embodiment of the present invention.
Figs. 6A-6B are flow diagrams for a method for analyzing the journal according to an alternate embodiment of the present invention.
Figs. 7A-7C are flow diagrams of a method for replaying the selected transactions for journal recovery according to the alternate embodiment of the present invention.
DETAILED DESCRIPTION
It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, or a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links. It should be noted that the order of the steps of disclosed processes may be altered within the scope of the invention.
A detailed description of one or more preferred embodiments of the invention is provided below, along with accompanying figures that illustrate by way of example the principles of the invention. While the invention is described in connection with such embodiments, it should be understood that the invention is not limited to any embodiment. On the contrary, the scope of the invention is limited only by the appended claims and the invention encompasses numerous alternatives, modifications and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. The present invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the present invention is not unnecessarily obscured.
Fig. 1 is a block diagram of a shared storage system suitable for facilitating an embodiment of the present invention. In this example, nodes 102A-102D are coupled together through a network switch 100. The network switch 100 can represent any network infrastructure, such as an Ethernet, InfiniBand, or Fibre Channel network, capable of host-to-host communication. Additionally, the nodes 102A-102D are coupled to a data storage interconnect 104. An example of the data storage interconnect 104 is a Fibre Channel switch, such as a Brocade 3200 Fibre Channel switch. Alternately, the data storage network might be an iSCSI or other IP storage network, an InfiniBand network, or another kind of host-to-storage network. In addition, the network switch 100 and the data storage interconnect 104 may be embodied in a single interconnect. Examples of nodes 102A-102D include but are not limited to computers, servers, and any other processing units or applications that can share storage or data. For exemplary purposes, nodes 102A-102D will sometimes be referred to as servers. The data interconnect 104 is shown to be coupled to shared storage 106A-106D. Examples of shared storage 106A-106D include any form of storage such as hard drive disks, compact disks, tape, and random access memory.
Shared storage can be any storage device, such as hard drive disks, compact disks, tape, and random access memory. A filesystem is a logical entity built on the shared storage. Although the shared storage is typically considered a physical device while the filesystem is typically considered a logical structure overlaid on part of the storage, the filesystem is sometimes referred to herein as shared storage for simplicity. For example, when it is stated that shared storage fails, it can be a failure of a part of a filesystem, one or more filesystems, or the physical storage device on which the filesystem is overlaid. Accordingly, shared storage, as used herein, can mean the physical storage device, a portion of a filesystem, a filesystem, filesystems, or any combination thereof.
Figs. 2A-2C are examples of journal entries according to an embodiment of the present invention. A journaling mechanism is used to allow multiple independent processing nodes to update a common set of data structures atomically, even if these updates affect multiple blocks and the hardware is not capable of updating multiple blocks atomically. A journal records information about updates, possibly affecting multiple blocks, in a way that is easily located following a system failure. The act of creating a set of such updates that preferably occurs atomically is called a transaction. Each transaction is recorded in the journal with a journal entry. A journal entry includes a set of blocks written into the journal; these blocks include copies of the block values to be written as part of the update, along with information specifying the locations where these new values will be written. Sometimes these locations will be referred to as the final locations for the update, distinguished from the copies of the block values in the journal entry itself.
By writing a single block, atomically, to complete a journal entry, the journal seals the intention to perform a multi-block update in the shared storage. This is called making the journal entry valid. Until the journal entry is valid, no block values are updated in their final locations; once it is valid, block values can be updated in their final locations as desired. Accordingly, if there is a failure before a journal entry is made valid, the system can recover a state that includes no part of the corresponding transaction's updates; once the write completing the journal entry has been performed and the entry is valid, the state that includes all parts of that transaction's updates can be recovered. In other words, if the node making the journal entry fails before the entry is made valid, none of the updates will be made, even if they have been partially written into the journal entry. After the entry is made valid, the updates will ultimately be completed even if there is a failure. Thus, by examining the journal after a failure, the data structures can be restored to a state that could have existed if all multi-block updates had been made atomically.
When all new values in a particular journal entry have been written to their final locations, the journal entry can be marked expired. An expired journal entry need not be retained, since all the updates it records have been performed; the space it takes up can be re-used for another purpose.
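The valid/expired life cycle of a journal entry can be sketched as a small in-memory model. This is a minimal Python sketch under stated assumptions. `JournalEntry`, `add_block`, `make_valid`, and `replay_all_valid` are hypothetical names, and blocks are modeled as dictionary entries. The recovery function is only the naive "apply every valid entry" baseline that illustrates the all-or-nothing guarantee. It is not the selective multi-node recovery described later.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JournalEntry:
    """Hypothetical in-memory model of one journal entry (one transaction)."""
    node: int
    blocks: Dict[str, int] = field(default_factory=dict)  # final location -> new value
    valid: bool = False    # set by the single atomic write that completes the entry
    expired: bool = False  # set once every new value has reached its final location

    def add_block(self, location: str, value: int) -> None:
        # Copies of new values are written into the journal first; none of them
        # may reach its final location until the entry is valid.
        if self.valid:
            raise RuntimeError("entry is already valid; start a new transaction")
        self.blocks[location] = value

    def make_valid(self) -> None:
        # Models the one atomic block write that seals the multi-block update.
        self.valid = True


def replay_all_valid(journal: List[JournalEntry], storage: Dict[str, int]) -> None:
    """Naive recovery: apply every valid, unexpired entry in journal order."""
    for entry in journal:
        if entry.valid and not entry.expired:
            storage.update(entry.blocks)  # all of the transaction's updates are applied
            entry.expired = True          # the entry's space may now be reused
        # An entry that never became valid contributes none of its updates.
```

For example, suppose one entry is made valid and another is left half-written because its node failed. Recovery then applies the first entry in full and ignores the second entirely.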
In the example shown in Fig. 2A, at time 1, node 3 retains information in its local memory, ready to update block B to 3,000 and block C to 1,500. In this simplified example, each block is assumed to store just an integer, and blocks are named by letters. For example, blocks B and C may represent two separate bank accounts, where the user has transferred money from bank account B to bank account C. Node 3 has the information but has not yet recorded it in the journal.
Node 1 has set block X to 750 and block Y to 1250, and the journal entry recording this transaction has been made valid. Node 7 has set block Q to 50, but node 7 is in the middle of writing the transaction, so its journal entry is not yet valid. Node 4 retains information in memory and wants to update block A to 500.
Between time 1 (Fig. 2A) and time 2 (Fig. 2B), node 4 wants to update block B as part of the same transaction that updates block A to 500. Block B is part of a transaction from node 3 that is not yet valid (as shown in Fig. 2A). Accordingly, node 4 asks node 3 to make that transaction valid so that node 4 can update block B. At time 2, shown in Fig. 2B, node 3 has completed writing a journal entry recording the update that sets block B to 3000 and block C to 1500, and that transaction is now valid. Node 3 has also completed writing a journal entry for a prior transaction, which set block C to 1000 and block D to 250. Node 7's journal entry is still not valid, since node 7 has not yet finished writing out its transaction. Node 4's journal entry remains empty.
Between time 2 (Fig. 2B) and time 3 (Fig. 2C), node 4 can write a journal entry for its transaction now that node 3 has finished its transaction with regard to block B. Note that valid journal entries do not have to be contiguous according to an embodiment of the present invention. At time 3, nodes 3, 1, and 4 have written valid transactions while node 7's transaction still remains not valid.
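The dependency in the Fig. 2 example can be modeled in a few lines. Node 4 may not include block B in its transaction until node 3's not-yet-valid transaction touching B has been made valid. This is a hypothetical Python sketch. `Node`, `stage`, `commit`, and `stage_with_dependency` are illustrative names, and appending an entry to the journal stands in for both writing it and making it valid. The patent text does not say how node 4's request reaches node 3, so the sketch uses a direct method call. It also omits node 3's prior transaction and nodes 1 and 7.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each journal entry is (node id, {block: new value}).
# Appending an entry stands in for writing it and making it valid.
Journal = List[Tuple[int, Dict[str, int]]]


class Node:
    def __init__(self, node_id: int, journal: Journal) -> None:
        self.node_id = node_id
        self.journal = journal
        self.pending: Dict[str, int] = {}  # updates held only in local memory

    def stage(self, block: str, value: int) -> None:
        self.pending[block] = value

    def commit(self) -> None:
        # Record the pending transaction as a valid journal entry.
        if self.pending:
            self.journal.append((self.node_id, dict(self.pending)))
            self.pending.clear()


def stage_with_dependency(requester: Node, nodes: List[Node], block: str, value: int) -> None:
    """Before updating a block that another node's not-yet-valid transaction
    touches, ask that node to make its transaction valid first."""
    for other in nodes:
        if other is not requester and block in other.pending:
            other.commit()
    requester.stage(block, value)
```

The example does not state the value node 4 writes to block B, so any value used when exercising the sketch is hypothetical. With this ordering, node 3's entry always lands in the journal before node 4's entry that also updates block B.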
Fig. 3 is a flow diagram of a method for journal recovery according to an embodiment of the present invention. In this example, a failure is detected (300). An example of a failure is a server failing during a journal update. When a failure occurs, grants of new locks are paused (302). In a shared resource environment, such as a storage area network, locks are often used to ensure that all the nodes accessing a block of data will access the latest version. The journal is then analyzed (304), selected transactions are replayed (306), and ordinary operations are continued (308).
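The Fig. 3 sequence can be sketched as a driver around two pluggable phases. This is a minimal Python sketch, assuming a hypothetical `LockManager` stand-in for the cluster's lock service. The `analyze` and `replay` callables are placeholders for the analysis and replay methods detailed in the following figures. In the sketch, refused grants simply return `False`; a real lock service might instead queue them until grants resume.

```python
from typing import Callable, Dict, Set


class LockManager:
    """Hypothetical stand-in for the cluster's lock service."""

    def __init__(self) -> None:
        self.granting_paused = False
        self.held: Dict[str, int] = {}  # block -> id of the node holding its lock

    def grant(self, block: str, node: int) -> bool:
        # Refuse while recovery runs, or if another node already holds the lock.
        if self.granting_paused or self.held.get(block, node) != node:
            return False
        self.held[block] = node
        return True


def recover_after_failure(locks: LockManager,
                          analyze: Callable[[], Set[str]],
                          replay: Callable[[Set[str]], None]) -> None:
    """Fig. 3, after a failure has been detected (300)."""
    locks.granting_paused = True   # 302: pause grants of new locks
    selection = analyze()          # 304: analyze the journal
    replay(selection)              # 306: replay selected transactions
    locks.granting_paused = False  # 308: continue ordinary operations
```

Pausing grants before the analysis means no node can acquire a new lock on a block while recovery is still deciding whether to write that block.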
Figs. 4A-4B are flow diagrams of a method for analyzing the journal, such as step 304 of Fig. 3, according to an embodiment of the present invention. Figs. 5A-5B are flow diagrams of a method for replaying selected transactions during journal recovery, such as step 306 of Fig. 3, according to an embodiment of the present invention. Figs. 4A-4B and 5A-5B are preferably read in conjunction.
In this embodiment, some data blocks are updated atomically using the journaling mechanism, and other blocks are updated directly, without using the journaling mechanism. Data blocks that are updated atomically using the journaling mechanism are herein referred to as metadata blocks. Examples of metadata blocks include but are not limited to blocks holding data structure information such as a file's name, the date a file was opened, the length of a file, and where the contents of a file are stored.
In this embodiment, a given block may at one time belong to the set of metadata blocks, and later belong to the set of non-metadata blocks, or vice versa; changes of blocks from one category to the other are recorded in journal entries in the journal. The journal recovery mechanism selectively updates only the appropriate blocks according to the metadata/non-metadata distinction.
In this example, a list of blocks is set to null (400). This list will record blocks which are not metadata blocks.
The oldest unexpired transaction is retrieved (402). It is determined whether the transaction contains a change of metadata/non-metadata block status (404). This determination checks whether the journal entry for this transaction records one or more blocks changing from the set of metadata blocks to the set of non-metadata blocks, or vice versa. This type of change is preferably recorded in the transaction and can be determined by reading the transaction. If the transaction does contain a record of a change in metadata/non-metadata block status (404), then the first record of a block changing its metadata/non-metadata status is retrieved (406). There is preferably a record in the journal entry that indicates whether this particular block is or is not metadata, and it is this record that is retrieved.
It is then determined whether the record shows that this particular block has changed from metadata to non-metadata (410). If this block has become non-metadata, then the block is added to the list that was created in step 400 of Fig. 4A (414). If the block has not become non-metadata (410), then the block is removed from the list if it is currently present on the list (412). If a block's status has changed several times, the block may be added to and removed from the list multiple times; as each change is analyzed, the block is added to or removed from the list, so that at the end of the analysis its presence on the list reflects its most recent recorded status.
It is then determined whether there are more records (416). If so, then the next record is retrieved (418), and it is again determined whether this record indicates the block becoming non-metadata (410). If there are no more records (416), then it is determined whether there is another valid transaction (420). The determination of whether there is another transaction (420) is also made if the current transaction does not contain a change of metadata/non-metadata block status (404 of Fig. 4A). If there are no other transactions, then the analysis of the journal is complete. If, however, there is another transaction, then the next transaction is retrieved (422), and it is determined whether this transaction contains a change of metadata/non-metadata block status (404 of Fig. 4A).
The analysis of the journal results in a list of non-metadata blocks, which ensures that non-metadata blocks are not modified during journal recovery. This method of analyzing the journal allows the set of metadata blocks to change dynamically.
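The analysis of Figs. 4A-4B can be sketched as one pass over the journal, oldest entry first. This is a minimal Python sketch under stated assumptions. The `Transaction` record and its `status_changes` field are hypothetical, with each status change stored as a `(block, now_metadata)` pair in journal order. The "list" of the flow diagrams is modeled as a set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Transaction:
    # Hypothetical journal-entry representation; the journal list is oldest first.
    node: int
    blocks: Dict[str, int]  # block -> new value recorded in the journal
    # Recorded category changes, in order: (block, True if it became metadata)
    status_changes: List[Tuple[str, bool]] = field(default_factory=list)
    valid: bool = True
    expired: bool = False


def analyze_journal(journal: List[Transaction]) -> Set[str]:
    """Figs. 4A-4B: blocks whose latest recorded status is non-metadata."""
    non_metadata: Set[str] = set()                      # 400
    for txn in journal:                                 # 402, 420, 422
        if txn.expired or not txn.valid:
            continue
        for block, now_metadata in txn.status_changes:  # 404-418
            if now_metadata:
                non_metadata.discard(block)             # 412: remove if present
            else:
                non_metadata.add(block)                 # 414
    return non_metadata
```

Because later records simply overwrite the effect of earlier ones, a block that became non-metadata and then metadata again ends up off the set.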
Figs. 5A-5B are flow diagrams of a method for replaying selected transactions during journal recovery according to an embodiment of the present invention. Figs. 5A-5B show an example of step 306 of Fig. 3 and are preferably read in conjunction with Figs. 4A-4B. In this example, a particular node retrieves the oldest unexpired valid transaction (500). The first block in the transaction is retrieved (502). It is then determined whether this block is in the list of non-metadata blocks (504). The list of non-metadata blocks was established in the analysis of the journal in step 304 of Fig. 3 and Figs. 4A-4B.
If this block is not in the list of non-metadata blocks (504), then it is determined if this block is covered by a lock which is held by a surviving node (506). A surviving node is a node that has not failed. If the block is not covered by a lock held by a surviving node then this node writes the block to its final location (508).
It is then determined whether there are more blocks in the transaction (510). If there are, the next block is considered in the same way as the first, beginning with determining whether it is in the list of non-metadata blocks (504).
Likewise, if this block is in the list of non-metadata blocks (504 of Fig. 5A), or if this block is covered by a lock held by a surviving node (506 of Fig. 5A), then it is determined whether there are more blocks in the transaction (510). If there are more blocks, then the next block is retrieved (516), and it is determined whether this new block is in the list of non-metadata blocks (504 of Fig. 5A).
If there are no more blocks in the transaction (510), then the transaction is marked as expired (512). It is then determined whether there are more unexpired valid transactions (514). If there are, then the next unexpired valid transaction is retrieved (518), and the first block of that transaction is retrieved (502 of Fig. 5A). If there are no more unexpired valid transactions (514), then the journal recovery is complete.
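The replay of Figs. 5A-5B can be sketched as follows. This is a minimal Python sketch under stated assumptions. `Transaction` and `replay_metadata` are hypothetical names, and `non_metadata` is the set produced by the analysis step. `locked_by_survivor` is assumed to be a precomputed set of blocks covered by a lock held by a surviving node, which simplifies the scheme to one lock per block.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Transaction:
    # Hypothetical journal-entry representation; the journal list is oldest first.
    node: int
    blocks: Dict[str, int]  # block -> new value recorded in the journal
    valid: bool = True
    expired: bool = False


def replay_metadata(journal: List[Transaction], non_metadata: Set[str],
                    locked_by_survivor: Set[str], storage: Dict[str, int]) -> None:
    """Figs. 5A-5B: replay unexpired valid transactions, skipping some blocks."""
    for txn in journal:                          # 500, 514, 518
        if txn.expired or not txn.valid:
            continue
        for block, value in txn.blocks.items():  # 502, 510, 516
            if block in non_metadata:            # 504: leave non-metadata blocks alone
                continue
            if block in locked_by_survivor:      # 506: a surviving node holds the lock
                continue
            storage[block] = value               # 508: write to the final location
        txn.expired = True                       # 512
```

Replaying oldest first means that when several valid entries update the same block, the newest journaled value is the one left in storage.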
Figs. 6A-6B are flow diagrams for a method for analyzing the journal according to an alternate embodiment of the present invention. Figs. 6A-6B are further details of the step of analyzing the journal (304 of Fig. 3) and should be read in conjunction with Figs. 7A-7C, which are further details of the step of replaying selected transactions (306 of Fig. 3). In this embodiment, the journal recovery is selective for a failed node's updates but does not account for blocks changing from metadata to non-metadata status or vice versa.
In this example, a list of blocks is set to null (600). When this analysis is finished, this list will consist of blocks last updated by a failed node. The oldest unexpired valid transaction is retrieved (602). It is then determined whether this transaction is from a failed node (604). If it is not from a failed node, then the first block in the transaction is retrieved (606). The block is then removed from the list established in step 600, if it is present on the list (608). Since a particular block may have been involved in several transactions, the block may be added to the list due to a first transaction and later removed from the list due to a second transaction.
It is then determined whether there are more blocks in the transaction (610). If there are more blocks in the transaction, the next block in the transaction is retrieved (612). If there are no more blocks in the transaction (610), then it is determined whether there are more valid transactions in the journal (614). If there are more valid transactions, then the next valid transaction is retrieved, and it is again determined whether this new transaction is from a failed node (604). If there are no more valid transactions (614), then the journal analysis is complete.
If this transaction is from a failed node (604), then the first block in the transaction is retrieved (618 of Fig. 6B). This block is then added to the list (620), and it is determined whether there are more blocks in the transaction (622). If there are more blocks, then the next block in the transaction is retrieved (624), and this new block is added to the list (620). If there are no more blocks in the transaction, then it is determined whether there are more valid transactions in the journal (614 of Fig. 6A).
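The alternate analysis of Figs. 6A-6B reduces to a single ordered pass. A failed node's transaction adds its blocks to the list, and a surviving node's transaction removes its blocks from it. This is a minimal Python sketch under stated assumptions. `Transaction` and `blocks_last_updated_by_failed` are hypothetical names, failed nodes are identified by integer ids, and the list is modeled as a set.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Transaction:
    # Hypothetical journal-entry representation; the journal list is oldest first.
    node: int
    blocks: Dict[str, int]  # block -> new value recorded in the journal
    valid: bool = True
    expired: bool = False


def blocks_last_updated_by_failed(journal: List[Transaction], failed: Set[int]) -> Set[str]:
    """Figs. 6A-6B: blocks whose most recent valid update came from a failed node."""
    result: Set[str] = set()                      # 600
    for txn in journal:                           # 602, 614
        if txn.expired or not txn.valid:
            continue
        if txn.node in failed:                    # 604
            result.update(txn.blocks)             # 618-624: add every block
        else:
            result.difference_update(txn.blocks)  # 606-612: a survivor updated it later
    return result
```

Only the order of transactions matters here, so a block written by a failed node and later rewritten by a survivor drops out of the result.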
Figs. 7A-7C are flow diagrams of a method for replaying the selected transactions for journal recovery according to the alternate embodiment of the present invention. This method is the alternate embodiment of step 306 of Fig. 3 and should be viewed in conjunction with Figs. 6A-6B.
In this example, the oldest unexpired valid transaction is retrieved (700). It is then determined if this transaction is from a failed node (702). If it is not from a failed node, it is determined if there are more unexpired valid transactions (704). If there are more transactions, then the next unexpired valid transaction is retrieved (706). If there are no more unexpired valid transactions (704), then the journal recovery is finished.
If the transaction is from a failed node (702), then the first block in the transaction is retrieved (710). It is then determined whether the block is in the list of blocks most recently updated by a failed node (712). This list is the same list that was derived in the method shown in Figs. 6A-6B. If the block is on the list, then it is determined whether the block is covered by a lock held by a surviving node (714). A surviving node is a node that has not failed. If the block is not covered by a lock held by a surviving node, then the block is written to its final location (716). It is then determined whether there are more blocks in the transaction (720). This determination is also made if the block is not on the list of blocks most recently updated by a failed node (712 of Fig. 7B), or if the block is covered by a lock held by a surviving node (714 of Fig. 7B). If there are no more blocks in the transaction (720), then it is determined whether there are more unexpired valid transactions (704 of Fig. 7A). If, however, there are more blocks in the transaction (720), then the next block is retrieved (722), and it is determined whether the new block is in the list of blocks most recently updated by a failed node (712 of Fig. 7B).
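The replay of Figs. 7A-7C writes only blocks from failed nodes' transactions that are still on the list from Figs. 6A-6B and are not covered by a surviving node's lock. This is a minimal Python sketch under stated assumptions. `Transaction` and `replay_failed_updates` are hypothetical names, and `last_by_failed` is the set computed by the analysis. `locked_by_survivor` is assumed to be a precomputed set of blocks covered by a surviving node's lock, with one lock per block. The flow as described does not mark transactions expired, so the sketch does not either.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Transaction:
    # Hypothetical journal-entry representation; the journal list is oldest first.
    node: int
    blocks: Dict[str, int]  # block -> new value recorded in the journal
    valid: bool = True
    expired: bool = False


def replay_failed_updates(journal: List[Transaction], failed: Set[int],
                          last_by_failed: Set[str], locked_by_survivor: Set[str],
                          storage: Dict[str, int]) -> None:
    """Figs. 7A-7C: replay only failed nodes' most recent updates."""
    for txn in journal:                          # 700, 704, 706
        if txn.expired or not txn.valid:
            continue
        if txn.node not in failed:               # 702: skip surviving nodes' entries
            continue
        for block, value in txn.blocks.items():  # 710, 720, 722
            if block not in last_by_failed:      # 712
                continue
            if block in locked_by_survivor:      # 714
                continue
            storage[block] = value               # 716: write to the final location
```

In the Fig. 2 setting, if node 3 failed after node 4 journaled a later update to block B, only block C from node 3's entry would be written back.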
Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. It should be noted that there are many alternative ways of implementing both the process and apparatus of the present invention. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
WHAT IS CLAIMED IS:
1. A method of journal recovery in a multi-node environment comprising: determining whether a block was last updated by a first node;
determining whether the block is associated with a lock held by a second node; and
writing the block to a final location if the block is not associated with a lock held by the second node.
2. The method of claim 1, wherein the first node is a failed node.
3. The method of claim 1, wherein the second node is a surviving node.
4. The method of claim 1, further comprising determining whether a journal entry was created by the first node.
5. A system of journal recovery in a multi-node environment comprising: a processor configured to determine whether a block was last updated by a first node; determine whether the block is associated with a lock held by a second node; and write the block to a final location if the block is not associated with a lock held by the second node; and
a memory coupled to the processor, wherein the memory provides the processor with instructions.
6. A system of journal recovery in a multi-node environment comprising: a first node, wherein the first node has failed; a second node; a third node configured to determine whether a block was last updated by the first node; determine whether the block is associated with a lock held by the second node; and write the block to a final location if the block is not associated with a lock held by the second node.
7. The system of claim 6, wherein the second node and the third node are the same node.
8. The system of claim 6, wherein the second node and the third node are different nodes.
9. The system of claim 6, wherein the second node is a surviving node.
10. A method of journal recovery in a multi-node environment comprising: determining whether a block is associated with metadata;
determining whether the block is associated with a lock held by a surviving node; and
writing the block to a final location if the block is not associated with a lock held by the surviving node.
11. The method of claim 10, wherein the writing the block to a final location occurs if the block is not associated with a lock held by the surviving node, and the block is associated with metadata.
12. A system of journal recovery in a multi-node environment comprising: a first node, wherein the first node has failed; a second node; a third node configured to determine whether a block is associated with metadata; determine whether the block is associated with a lock held by the second node; and write the block to a final location if the block is not associated with a lock held by the second node.
13. The system of claim 12, wherein the second node and the third node are the same node.
14. The system of claim 12, wherein the second node and the third node are different nodes.
15. The system of claim 12, wherein the second node is a surviving node.
16. A computer program product for journal recovery in a multi-node environment, the computer program product being embodied in a computer readable medium and comprising computer instructions for:
determining whether a block was last updated by a first node;
determining whether the block is associated with a lock held by a second node; and
writing the block to a final location if the block is not associated with a lock held by the second node.
17. A computer program product for journal recovery in a multi-node environment, the computer program product being embodied in a computer readable medium and comprising computer instructions for:
determining whether a block is associated with metadata;
determining whether the block is associated with a lock held by a surviving node; and
writing the block to a final location if the block is not associated with a lock held by the surviving node.
PCT/US2002/030083 2001-09-21 2002-09-20 System and method for journal recovery for multinode environments WO2003025780A1 (en)

Applications Claiming Priority (16)

Application Number Priority Date Filing Date Title
US32419501P 2001-09-21 2001-09-21
US32424201P 2001-09-21 2001-09-21
US32419601P 2001-09-21 2001-09-21
US32424301P 2001-09-21 2001-09-21
US32422401P 2001-09-21 2001-09-21
US32422601P 2001-09-21 2001-09-21
US60/324,226 2001-09-21
US60/324,242 2001-09-21
US60/324,196 2001-09-21
US60/324,224 2001-09-21
US60/324,195 2001-09-21
US60/324,243 2001-09-21
US32478701P 2001-09-24 2001-09-24
US60/324,787 2001-09-24
US32719101P 2001-10-01 2001-10-01
US60/327,191 2001-10-01

Publications (3)

Publication Number Publication Date
WO2003025780A1 WO2003025780A1 (en) 2003-03-27
WO2003025780A9 WO2003025780A9 (en) 2004-03-04
WO2003025780A8 WO2003025780A8 (en) 2004-04-01

Family

ID=27575390

Family Applications (5)

Application Number Title Priority Date Filing Date
PCT/US2002/030085 WO2003025751A1 (en) 2001-09-21 2002-09-20 A system and method for efficient lock recovery
PCT/US2002/030083 WO2003025780A1 (en) 2001-09-21 2002-09-20 System and method for journal recovery for multinode environments
PCT/US2002/030082 WO2003025801A1 (en) 2001-09-21 2002-09-20 System and method for implementing journaling in a multi-node environment
PCT/US2002/029859 WO2003027903A1 (en) 2001-09-21 2002-09-20 A system and method for a multi-node environment with shared storage
PCT/US2002/029857 WO2003027853A1 (en) 2001-09-21 2002-09-20 A system and method for synchronisation for enforcing mutual exclusion among multiple negotiators

Country Status (7)

Country Link
US (8) US7266722B2 (en)
EP (2) EP1428149B1 (en)
JP (2) JP4249622B2 (en)
CN (2) CN1302419C (en)
AU (1) AU2002341784A1 (en)
CA (2) CA2460833C (en)
WO (5) WO2003025751A1 (en)

Families Citing this family (192)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7412462B2 (en) * 2000-02-18 2008-08-12 Burnside Acquisition, Llc Data repository and method for promoting network storage of data
US6890968B2 (en) * 2001-05-16 2005-05-10 Kerr Corporation Prepolymerized filler in dental restorative composite
US7640582B2 (en) 2003-04-16 2009-12-29 Silicon Graphics International Clustered filesystem for mix of trusted and untrusted nodes
US7617292B2 (en) * 2001-06-05 2009-11-10 Silicon Graphics International Multi-class heterogeneous clients in a clustered filesystem
US8010558B2 (en) 2001-06-05 2011-08-30 Silicon Graphics International Relocation of metadata server with outstanding DMAPI requests
US20040139125A1 (en) 2001-06-05 2004-07-15 Roger Strassburg Snapshot copy of data volume during data access
US7571215B2 (en) * 2001-07-16 2009-08-04 Bea Systems, Inc. Data replication protocol
US7409420B2 (en) * 2001-07-16 2008-08-05 Bea Systems, Inc. Method and apparatus for session replication and failover
US7702791B2 (en) 2001-07-16 2010-04-20 Bea Systems, Inc. Hardware load-balancing apparatus for session replication
US7113980B2 (en) 2001-09-06 2006-09-26 Bea Systems, Inc. Exactly once JMS communication
US6826601B2 (en) * 2001-09-06 2004-11-30 Bea Systems, Inc. Exactly one cache framework
US7266722B2 (en) * 2001-09-21 2007-09-04 Hewlett-Packard Development Company, L.P. System and method for efficient lock recovery
US7392302B2 (en) 2002-02-21 2008-06-24 Bea Systems, Inc. Systems and methods for automated service migration
US7178050B2 (en) * 2002-02-22 2007-02-13 Bea Systems, Inc. System for highly available transaction recovery for transaction processing systems
US7096213B2 (en) * 2002-04-08 2006-08-22 Oracle International Corporation Persistent key-value repository with a pluggable architecture to abstract physical storage
WO2003092166A1 (en) * 2002-04-25 2003-11-06 Kashya Israel Ltd. An apparatus for continuous compression of large volumes of data
US20030220943A1 (en) * 2002-05-23 2003-11-27 International Business Machines Corporation Recovery of a single metadata controller failure in a storage area network environment
US7774466B2 (en) * 2002-10-17 2010-08-10 Intel Corporation Methods and apparatus for load balancing storage nodes in a distributed storage area network system
US20050286377A1 (en) * 2002-11-07 2005-12-29 Koninkleijke Philips Electronics, N.V. Record carrier having a main file system area and a virtual file system area
US7457906B2 (en) * 2003-01-21 2008-11-25 Nextio, Inc. Method and apparatus for shared I/O in a load/store fabric
US7613797B2 (en) * 2003-03-19 2009-11-03 Unisys Corporation Remote discovery and system architecture
GB0308923D0 (en) * 2003-04-17 2003-05-28 Ibm Low-overhead storage cluster configuration locking
US7409389B2 (en) 2003-04-29 2008-08-05 International Business Machines Corporation Managing access to objects of a computing environment
US7376744B2 (en) * 2003-05-09 2008-05-20 Oracle International Corporation Using local locks for global synchronization in multi-node systems
US20040230903A1 (en) * 2003-05-16 2004-11-18 Dethe Elza Method and system for enabling collaborative authoring of hierarchical documents with associated business logic
CA2429375A1 (en) * 2003-05-22 2004-11-22 Cognos Incorporated Model action logging
WO2005008434A2 (en) * 2003-07-11 2005-01-27 Computer Associates Think, Inc. A distributed locking method and system for networked device management
US7356531B1 (en) * 2003-07-25 2008-04-08 Symantec Operating Corporation Network file system record lock recovery in a highly available environment
US7739541B1 (en) 2003-07-25 2010-06-15 Symantec Operating Corporation System and method for resolving cluster partitions in out-of-band storage virtualization environments
US8234517B2 (en) * 2003-08-01 2012-07-31 Oracle International Corporation Parallel recovery by non-failed nodes
US7584454B1 (en) * 2003-09-10 2009-09-01 Nextaxiom Technology, Inc. Semantic-based transactional support and recovery for nested composite software services
US20050091215A1 (en) * 2003-09-29 2005-04-28 Chandra Tushar D. Technique for provisioning storage for servers in an on-demand environment
US7234073B1 (en) * 2003-09-30 2007-06-19 Emc Corporation System and methods for failover management of manageable entity agents
US7581205B1 (en) 2003-09-30 2009-08-25 Nextaxiom Technology, Inc. System and method of implementing a customizable software platform
US8225282B1 (en) 2003-11-25 2012-07-17 Nextaxiom Technology, Inc. Semantic-based, service-oriented system and method of developing, programming and managing software modules and software solutions
US7376147B2 (en) * 2003-12-18 2008-05-20 Intel Corporation Adaptor supporting different protocols
US7155546B2 (en) * 2003-12-18 2006-12-26 Intel Corporation Multiple physical interfaces in a slot of a storage enclosure to support different storage interconnect architectures
US20050138154A1 (en) * 2003-12-18 2005-06-23 Intel Corporation Enclosure management device
US10776206B1 (en) * 2004-02-06 2020-09-15 Vmware, Inc. Distributed transaction system
US20110179082A1 (en) * 2004-02-06 2011-07-21 Vmware, Inc. Managing concurrent file system accesses by multiple servers using locks
US8560747B1 (en) 2007-02-16 2013-10-15 Vmware, Inc. Associating heartbeat data with access to shared resources of a computer system
US8700585B2 (en) * 2004-02-06 2014-04-15 Vmware, Inc. Optimistic locking method and system for committing transactions on a file system
US7849098B1 (en) * 2004-02-06 2010-12-07 Vmware, Inc. Providing multiple concurrent access to a file system
US8543781B2 (en) * 2004-02-06 2013-09-24 Vmware, Inc. Hybrid locking using network and on-disk based schemes
JP4485256B2 (en) * 2004-05-20 2010-06-16 株式会社日立製作所 Storage area management method and management system
US7962449B2 (en) * 2004-06-25 2011-06-14 Apple Inc. Trusted index structure in a network environment
US8131674B2 (en) 2004-06-25 2012-03-06 Apple Inc. Methods and systems for managing data
US7730012B2 (en) 2004-06-25 2010-06-01 Apple Inc. Methods and systems for managing data
US7386752B1 (en) * 2004-06-30 2008-06-10 Symantec Operating Corporation Using asset dependencies to identify the recovery set and optionally automate and/or optimize the recovery
US7769734B2 (en) * 2004-07-26 2010-08-03 International Business Machines Corporation Managing long-lived resource locks in a multi-system mail infrastructure
WO2006015536A1 (en) * 2004-08-08 2006-02-16 Huawei Technologies Co. Ltd. A method for realizing notification log operation
US20060041559A1 (en) * 2004-08-17 2006-02-23 International Business Machines Corporation Innovation for managing virtual storage area networks
US20060059269A1 (en) * 2004-09-13 2006-03-16 Chien Chen Transparent recovery of switch device
US7310711B2 (en) * 2004-10-29 2007-12-18 Hitachi Global Storage Technologies Netherlands B.V. Hard disk drive with support for atomic transactions
US7496701B2 (en) * 2004-11-18 2009-02-24 International Business Machines Corporation Managing virtual server control of computer support systems with heartbeat message
JP4462024B2 (en) 2004-12-09 2010-05-12 株式会社日立製作所 Failover method by disk takeover
US8495266B2 (en) * 2004-12-10 2013-07-23 Hewlett-Packard Development Company, L.P. Distributed lock
US20060242453A1 (en) * 2005-04-25 2006-10-26 Dell Products L.P. System and method for managing hung cluster nodes
US7506204B2 (en) * 2005-04-25 2009-03-17 Microsoft Corporation Dedicated connection to a database server for alternative failure recovery
JP4648751B2 (en) * 2005-05-02 2011-03-09 株式会社日立製作所 Storage control system and storage control method
US7631016B2 (en) * 2005-05-04 2009-12-08 Oracle International Corporation Providing the latest version of a data item from an N-replica set
US7356653B2 (en) * 2005-06-03 2008-04-08 International Business Machines Corporation Reader-initiated shared memory synchronization
US7437426B2 (en) * 2005-09-27 2008-10-14 Oracle International Corporation Detecting and correcting node misconfiguration of information about the location of shared storage resources
US8060713B1 (en) 2005-12-21 2011-11-15 Emc (Benelux) B.V., S.A.R.L. Consolidating snapshots in a continuous data protection system using journaling
US7774565B2 (en) * 2005-12-21 2010-08-10 Emc Israel Development Center, Ltd. Methods and apparatus for point in time data access and recovery
US7849361B2 (en) * 2005-12-22 2010-12-07 Emc Corporation Methods and apparatus for multiple point in time data access
US7836033B1 (en) * 2006-01-24 2010-11-16 Network Appliance, Inc. Method and apparatus for parallel updates to global state in a multi-processor system
US20070180287A1 (en) * 2006-01-31 2007-08-02 Dell Products L. P. System and method for managing node resets in a cluster
US7577867B2 (en) * 2006-02-17 2009-08-18 Emc Corporation Cross tagging to data for consistent recovery
US7552148B2 (en) * 2006-02-28 2009-06-23 Microsoft Corporation Shutdown recovery
US7899780B1 (en) * 2006-03-30 2011-03-01 Emc Corporation Methods and apparatus for structured partitioning of management information
CN100383750C (en) * 2006-06-07 2008-04-23 中国科学院计算技术研究所 High-reliable journal system realizing method facing to large-scale computing system
US7734960B2 (en) * 2006-08-14 2010-06-08 Hewlett-Packard Development Company, L.P. Method of managing nodes in computer cluster
US7886034B1 (en) * 2006-09-27 2011-02-08 Symantec Corporation Adaptive liveness management for robust and efficient peer-to-peer storage
US20080082533A1 (en) * 2006-09-28 2008-04-03 Tak Fung Wang Persistent locks/resources for concurrency control
US7627612B2 (en) * 2006-09-28 2009-12-01 Emc Israel Development Center, Ltd. Methods and apparatus for optimal journaling for continuous data replication
US7627687B2 (en) * 2006-09-28 2009-12-01 Emc Israel Development Center, Ltd. Methods and apparatus for managing data flow in a continuous data replication system having journaling
US8024521B2 (en) * 2007-03-13 2011-09-20 Sony Computer Entertainment Inc. Atomic operation on non-standard sized data using external cache
US7778986B2 (en) * 2007-08-29 2010-08-17 International Business Machines Corporation Securing transfer of ownership of a storage object from an unavailable owner node to another node
US8055855B2 (en) * 2007-10-05 2011-11-08 International Business Machines Corporation Varying access parameters for processes to access memory addresses in response to detecting a condition related to a pattern of processes access to memory addresses
US7770064B2 (en) * 2007-10-05 2010-08-03 International Business Machines Corporation Recovery of application faults in a mirrored application environment
US7856536B2 (en) * 2007-10-05 2010-12-21 International Business Machines Corporation Providing a process exclusive access to a page including a memory address to which a lock is granted to the process
US7921272B2 (en) * 2007-10-05 2011-04-05 International Business Machines Corporation Monitoring patterns of processes accessing addresses in a storage device to determine access parameters to apply
US7860836B1 (en) 2007-12-26 2010-12-28 Emc (Benelux) B.V., S.A.R.L. Method and apparatus to recover data in a continuous data protection environment using a journal
US7840536B1 (en) 2007-12-26 2010-11-23 Emc (Benelux) B.V., S.A.R.L. Methods and apparatus for dynamic journal expansion
US8041940B1 (en) 2007-12-26 2011-10-18 Emc Corporation Offloading encryption processing in a storage area network
US7958372B1 (en) 2007-12-26 2011-06-07 Emc (Benelux) B.V., S.A.R.L. Method and apparatus to convert a logical unit from a first encryption state to a second encryption state using a journal in a continuous data protection environment
US9178785B1 (en) 2008-01-24 2015-11-03 NextAxiom Technology, Inc Accounting for usage and usage-based pricing of runtime engine
US9501542B1 (en) 2008-03-11 2016-11-22 Emc Corporation Methods and apparatus for volume synchronization
US8108634B1 (en) 2008-06-27 2012-01-31 Emc B.V., S.A.R.L. Replicating a thin logical unit
US7840730B2 (en) 2008-06-27 2010-11-23 Microsoft Corporation Cluster shared volumes
US7719443B1 (en) 2008-06-27 2010-05-18 Emc Corporation Compressing data in a continuous data protection environment
US8719473B2 (en) * 2008-09-19 2014-05-06 Microsoft Corporation Resource arbitration for shared-write access via persistent reservation
US7882286B1 (en) 2008-09-26 2011-02-01 EMC (Benelux)B.V., S.A.R.L. Synchronizing volumes for replication
US8060714B1 (en) 2008-09-26 2011-11-15 Emc (Benelux) B.V., S.A.R.L. Initializing volumes in a replication system
WO2010041515A1 (en) 2008-10-06 2010-04-15 International Business Machines Corporation System accessing shared data by a plurality of application servers
US8171337B2 (en) 2009-03-30 2012-05-01 The Boeing Company Computer architectures using shared storage
US8296358B2 (en) * 2009-05-14 2012-10-23 Hewlett-Packard Development Company, L.P. Method and system for journaling data updates in a distributed file system
US20110055494A1 (en) * 2009-08-25 2011-03-03 Yahoo! Inc. Method for distributed direct object access storage
US8055615B2 (en) * 2009-08-25 2011-11-08 Yahoo! Inc. Method for efficient storage node replacement
US9311319B2 (en) * 2009-08-27 2016-04-12 Hewlett Packard Enterprise Development Lp Method and system for administration of storage objects
US20110093745A1 (en) * 2009-10-20 2011-04-21 Aviad Zlotnick Systems and methods for implementing test applications for systems using locks
US8510334B2 (en) 2009-11-05 2013-08-13 Oracle International Corporation Lock manager on disk
US8392680B1 (en) 2010-03-30 2013-03-05 Emc International Company Accessing a volume in a distributed environment
US8103937B1 (en) * 2010-03-31 2012-01-24 Emc Corporation Cas command network replication
US8381014B2 (en) 2010-05-06 2013-02-19 International Business Machines Corporation Node controller first failure error management for a distributed system
US20110276728A1 (en) * 2010-05-06 2011-11-10 Hitachi, Ltd. Method and apparatus for storage i/o path configuration
US8332687B1 (en) 2010-06-23 2012-12-11 Emc Corporation Splitter used in a continuous data protection environment
US9098462B1 (en) 2010-09-14 2015-08-04 The Boeing Company Communications via shared memory
US8433869B1 (en) 2010-09-27 2013-04-30 Emc International Company Virtualized consistency group using an enhanced splitter
US8478955B1 (en) 2010-09-27 2013-07-02 Emc International Company Virtualized consistency group using more than one data protection appliance
US8335771B1 (en) 2010-09-29 2012-12-18 Emc Corporation Storage array snapshots for logged access replication in a continuous data protection system
US8694700B1 (en) 2010-09-29 2014-04-08 Emc Corporation Using I/O track information for continuous push with splitter for storage device
US8589732B2 (en) * 2010-10-25 2013-11-19 Microsoft Corporation Consistent messaging with replication
US8335761B1 (en) 2010-12-02 2012-12-18 Emc International Company Replicating in a multi-copy environment
US8812916B2 (en) 2011-06-02 2014-08-19 International Business Machines Corporation Failure data management for a distributed computer system
US9256605B1 (en) 2011-08-03 2016-02-09 Emc Corporation Reading and writing to an unexposed device
US8973018B2 (en) 2011-08-23 2015-03-03 International Business Machines Corporation Configuring and relaying events from a storage controller to a host server
US8694724B1 (en) * 2011-09-06 2014-04-08 Emc Corporation Managing data storage by provisioning cache as a virtual device
US8898112B1 (en) 2011-09-07 2014-11-25 Emc Corporation Write signature command
US8560662B2 (en) 2011-09-12 2013-10-15 Microsoft Corporation Locking system for cluster updates
US9170852B2 (en) 2012-02-02 2015-10-27 Microsoft Technology Licensing, Llc Self-updating functionality in a distributed system
US20130290385A1 (en) * 2012-04-30 2013-10-31 Charles B. Morrey, III Durably recording events for performing file system operations
US9223659B1 (en) 2012-06-28 2015-12-29 Emc International Company Generating and accessing a virtual volume snapshot in a continuous data protection system
US9218295B2 (en) * 2012-07-13 2015-12-22 Ca, Inc. Methods and systems for implementing time-locks
US9336094B1 (en) 2012-09-13 2016-05-10 Emc International Company Scaleout replication of an application
US10235145B1 (en) 2012-09-13 2019-03-19 Emc International Company Distributed scale-out replication
US9081840B2 (en) * 2012-09-21 2015-07-14 Citigroup Technology, Inc. Methods and systems for modeling a replication topology
US9110914B1 (en) 2013-03-14 2015-08-18 Emc Corporation Continuous data protection using deduplication-based storage
US8996460B1 (en) 2013-03-14 2015-03-31 Emc Corporation Accessing an image in a continuous data protection using deduplication-based storage
US9383937B1 (en) 2013-03-14 2016-07-05 Emc Corporation Journal tiering in a continuous data protection system using deduplication-based storage
US9696939B1 (en) 2013-03-14 2017-07-04 EMC IP Holding Company LLC Replicating data using deduplication-based arrays using network-based replication
US9081842B1 (en) 2013-03-15 2015-07-14 Emc Corporation Synchronous and asymmetric asynchronous active-active-active data access
US9244997B1 (en) 2013-03-15 2016-01-26 Emc Corporation Asymmetric active-active access of asynchronously-protected data storage
US9152339B1 (en) 2013-03-15 2015-10-06 Emc Corporation Synchronization of asymmetric active-active, asynchronously-protected storage
US9069709B1 (en) 2013-06-24 2015-06-30 Emc International Company Dynamic granularity in data replication
US9087112B1 (en) 2013-06-24 2015-07-21 Emc International Company Consistency across snapshot shipping and continuous replication
US9146878B1 (en) 2013-06-25 2015-09-29 Emc Corporation Storage recovery from total cache loss using journal-based replication
US9454485B2 (en) 2013-08-01 2016-09-27 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Sharing local cache from a failover node
US9916243B2 (en) * 2013-10-25 2018-03-13 Advanced Micro Devices, Inc. Method and apparatus for performing a bus lock and translation lookaside buffer invalidation
US9367260B1 (en) 2013-12-13 2016-06-14 Emc Corporation Dynamic replication system
US9405765B1 (en) 2013-12-17 2016-08-02 Emc Corporation Replication of virtual machines
US9158630B1 (en) 2013-12-19 2015-10-13 Emc Corporation Testing integrity of replicated storage
US9372752B2 (en) * 2013-12-27 2016-06-21 Intel Corporation Assisted coherent shared memory
US10140194B2 (en) 2014-03-20 2018-11-27 Hewlett Packard Enterprise Development Lp Storage system transactions
US9189339B1 (en) 2014-03-28 2015-11-17 Emc Corporation Replication of a virtual distributed volume with virtual machine granualarity
US9686206B2 (en) * 2014-04-29 2017-06-20 Silicon Graphics International Corp. Temporal based collaborative mutual exclusion control of a shared resource
US9497140B2 (en) 2014-05-14 2016-11-15 International Business Machines Corporation Autonomous multi-node network configuration and self-awareness through establishment of a switch port group
US9274718B1 (en) 2014-06-20 2016-03-01 Emc Corporation Migration in replication system
US10082980B1 (en) 2014-06-20 2018-09-25 EMC IP Holding Company LLC Migration of snapshot in replication system using a log
US9619543B1 (en) 2014-06-23 2017-04-11 EMC IP Holding Company LLC Replicating in virtual desktop infrastructure
US10237342B2 (en) * 2014-09-17 2019-03-19 Dh2I Company Coordinated and high availability storage access
US10101943B1 (en) 2014-09-25 2018-10-16 EMC IP Holding Company LLC Realigning data in replication system
US10324798B1 (en) 2014-09-25 2019-06-18 EMC IP Holding Company LLC Restoring active areas of a logical unit
US10437783B1 (en) 2014-09-25 2019-10-08 EMC IP Holding Company LLC Recover storage array using remote deduplication device
US9529885B1 (en) 2014-09-29 2016-12-27 EMC IP Holding Company LLC Maintaining consistent point-in-time in asynchronous replication during virtual machine relocation
US9910621B1 (en) 2014-09-29 2018-03-06 EMC IP Holding Company LLC Backlogging I/O metadata utilizing counters to monitor write acknowledgements and no acknowledgements
US9600377B1 (en) 2014-12-03 2017-03-21 EMC IP Holding Company LLC Providing data protection using point-in-time images from multiple types of storage devices
US10496487B1 (en) 2014-12-03 2019-12-03 EMC IP Holding Company LLC Storing snapshot changes with snapshots
US9405481B1 (en) 2014-12-17 2016-08-02 Emc Corporation Replicating using volume multiplexing with consistency group file
US9632881B1 (en) 2015-03-24 2017-04-25 EMC IP Holding Company LLC Replication of a virtual distributed volume
US10296419B1 (en) 2015-03-27 2019-05-21 EMC IP Holding Company LLC Accessing a virtual device using a kernel
US9411535B1 (en) 2015-03-27 2016-08-09 Emc Corporation Accessing multiple virtual devices
US9678680B1 (en) 2015-03-30 2017-06-13 EMC IP Holding Company LLC Forming a protection domain in a storage architecture
US10853181B1 (en) 2015-06-29 2020-12-01 EMC IP Holding Company LLC Backing up volumes using fragment files
US10496538B2 (en) * 2015-06-30 2019-12-03 Veritas Technologies Llc System, method and mechanism to efficiently coordinate cache sharing between cluster nodes operating on the same regions of a file or the file system blocks shared among multiple files
US10360236B2 (en) * 2015-09-25 2019-07-23 International Business Machines Corporation Replicating structured query language (SQL) in a heterogeneous replication environment
US10341252B2 (en) * 2015-09-30 2019-07-02 Veritas Technologies Llc Partition arbitration optimization
US9684576B1 (en) 2015-12-21 2017-06-20 EMC IP Holding Company LLC Replication using a virtual distributed volume
US10067837B1 (en) 2015-12-28 2018-09-04 EMC IP Holding Company LLC Continuous data protection with cloud resources
US10133874B1 (en) 2015-12-28 2018-11-20 EMC IP Holding Company LLC Performing snapshot replication on a storage system not configured to support snapshot replication
US10235196B1 (en) 2015-12-28 2019-03-19 EMC IP Holding Company LLC Virtual machine joining or separating
US10579282B1 (en) 2016-03-30 2020-03-03 EMC IP Holding Company LLC Distributed copy in multi-copy replication where offset and size of I/O requests to replication site is half offset and size of I/O request to production volume
US10152267B1 (en) 2016-03-30 2018-12-11 Emc Corporation Replication data pull
US10235087B1 (en) 2016-03-30 2019-03-19 EMC IP Holding Company LLC Distributing journal data over multiple journals
US10235060B1 (en) 2016-04-14 2019-03-19 EMC IP Holding Company, LLC Multilevel snapshot replication for hot and cold regions of a storage system
CN106055417B (en) * 2016-06-02 2018-09-11 Beijing Baidu Netcom Science and Technology Co., Ltd. Method for message transmission and device for robot operating system
US10210073B1 (en) 2016-09-23 2019-02-19 EMC IP Holding Company, LLC Real time debugging of production replicated data with data obfuscation in a storage system
US10146961B1 (en) 2016-09-23 2018-12-04 EMC IP Holding Company LLC Encrypting replication journals in a storage system
US10346366B1 (en) 2016-09-23 2019-07-09 Amazon Technologies, Inc. Management of a data processing pipeline
US10019194B1 (en) 2016-09-23 2018-07-10 EMC IP Holding Company LLC Eventually consistent synchronous data replication in a storage system
US10666569B1 (en) * 2016-09-23 2020-05-26 Amazon Technologies, Inc. Journal service with named clients
US10235091B1 (en) 2016-09-23 2019-03-19 EMC IP Holding Company LLC Full sweep disk synchronization in a storage system
US10423459B1 (en) 2016-09-23 2019-09-24 Amazon Technologies, Inc. Resource manager
US10805238B1 (en) 2016-09-23 2020-10-13 Amazon Technologies, Inc. Management of alternative resources
US10235090B1 (en) 2016-09-23 2019-03-19 EMC IP Holding Company LLC Validating replication copy consistency using a hash function in a storage system
US10725915B1 (en) 2017-03-31 2020-07-28 Veritas Technologies Llc Methods and systems for maintaining cache coherency between caches of nodes in a clustered environment
US10459810B2 (en) 2017-07-06 2019-10-29 Oracle International Corporation Technique for higher availability in a multi-node system using replicated lock information to determine a set of data blocks for recovery
US11144493B1 (en) 2018-05-02 2021-10-12 Ecosense Lighting Inc. Composite interface circuit
CN109376014B (en) * 2018-10-19 2021-07-02 Zhengzhou Yunhai Information Technology Co., Ltd. Distributed lock manager implementation method and system
US11880350B2 (en) * 2021-06-08 2024-01-23 International Business Machines Corporation Identifying resource lock ownership across a clustered computing environment
US12093144B1 (en) * 2023-04-24 2024-09-17 Dell Products, L.P. Method and system for performing cross platform restoration operations

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5751992A (en) * 1994-09-23 1998-05-12 International Business Machines Corporation Computer program product for continuous destaging of changed data from a shared cache in a multisystem shared disk environment wherein castout interest is established in a hierarchical fashion
US5850507A (en) * 1996-03-19 1998-12-15 Oracle Corporation Method and apparatus for improved transaction recovery
US6108654A (en) * 1997-10-31 2000-08-22 Oracle Corporation Method and system for locking resources in a computer system
US6163855A (en) * 1998-04-17 2000-12-19 Microsoft Corporation Method and system for replicated and consistent modifications in a server cluster

Family Cites Families (70)

Publication number Priority date Publication date Assignee Title
JPH0833857B2 (en) * 1987-02-18 1996-03-29 Hitachi, Ltd. System database sharing system system system
JP2667039B2 1990-05-18 1997-10-22 Toshiba Corporation Data management system and data management method
JPH0827755B2 (en) * 1991-02-15 1996-03-21 International Business Machines Corporation How to access data units at high speed
US5276872A (en) * 1991-06-25 1994-01-04 Digital Equipment Corporation Concurrency and recovery for index trees with nodal updates using multiple atomic actions by which the trees integrity is preserved during undesired system interruptions
US5438464A (en) * 1993-04-23 1995-08-01 Quantum Corporation Synchronization of multiple disk drive spindles
JP3023441B2 (en) * 1993-11-16 2000-03-21 Hitachi, Ltd. Database division management method and parallel database system
DE4341877A1 (en) * 1993-12-08 1995-06-14 Siemens Ag Coordination to access multiple processors to common resource
US5454108A (en) * 1994-01-26 1995-09-26 International Business Machines Corporation Distributed lock manager using a passive, state-full control-server
JP3516362B2 (en) * 1995-03-01 2004-04-05 Fujitsu Limited Shared data processing device and shared data processing system
US5699500A (en) * 1995-06-01 1997-12-16 Ncr Corporation Reliable datagram service provider for fast messaging in a clustered environment
US5594863A (en) * 1995-06-26 1997-01-14 Novell, Inc. Method and apparatus for network file recovery
US6356740B1 (en) * 1995-06-30 2002-03-12 Hughes Electronics Corporation Method and system of frequency stabilization in a mobile satellite communication system
JPH09114721A (en) 1995-10-19 1997-05-02 Nec Corp Device sharing method and device sharing system in local area network
US5678026A (en) 1995-12-28 1997-10-14 Unisys Corporation Multi-processor data processing system with control for granting multiple storage locks in parallel and parallel lock priority and second level cache priority queues
US6026426A (en) * 1996-04-30 2000-02-15 International Business Machines Corporation Application programming interface unifying multiple mechanisms
US6016505A (en) * 1996-04-30 2000-01-18 International Business Machines Corporation Program product to effect barrier synchronization in a distributed computing environment
US5920872A (en) * 1996-06-25 1999-07-06 Oracle Corporation Resource management using resource domains
US6044367A (en) * 1996-08-02 2000-03-28 Hewlett-Packard Company Distributed I/O store
US5875469A (en) * 1996-08-26 1999-02-23 International Business Machines Corporation Apparatus and method of snooping processors and look-aside caches
US6026474A (en) 1996-11-22 2000-02-15 Mangosoft Corporation Shared client-side web caching using globally addressable memory
US5987506A (en) * 1996-11-22 1999-11-16 Mangosoft Corporation Remote access and geographically distributed computers in a globally addressable storage environment
US5909540A (en) * 1996-11-22 1999-06-01 Mangosoft Corporation System and method for providing highly available data storage using globally addressable memory
US5974250A (en) * 1996-12-13 1999-10-26 Compaq Computer Corp. System and method for secure information transmission over a network
US6108757A (en) * 1997-02-28 2000-08-22 Lucent Technologies Inc. Method for locking a shared resource in multiprocessor system
US5913227A (en) * 1997-03-24 1999-06-15 Emc Corporation Agent-implemented locking mechanism
FR2762418B1 (en) * 1997-04-17 1999-06-11 Alsthom Cge Alcatel METHOD FOR MANAGING A SHARED MEMORY
US6237001B1 (en) * 1997-04-23 2001-05-22 Oracle Corporation Managing access to data in a distributed database environment
US6021508A (en) * 1997-07-11 2000-02-01 International Business Machines Corporation Parallel file system and method for independent metadata loggin
US5960446A (en) * 1997-07-11 1999-09-28 International Business Machines Corporation Parallel file system and method with allocation map
US5953719A (en) 1997-09-15 1999-09-14 International Business Machines Corporation Heterogeneous database system with dynamic commit procedure control
US6112281A (en) * 1997-10-07 2000-08-29 Oracle Corporation I/O forwarding in a cache coherent shared disk computer system
US6009466A (en) * 1997-10-31 1999-12-28 International Business Machines Corporation Network management system for enabling a user to configure a network of storage devices via a graphical user interface
JPH11143843A (en) 1997-11-06 1999-05-28 Hitachi Ltd Operation condition management method for plural nodes configuration system
US6199105B1 (en) * 1997-12-09 2001-03-06 Nec Corporation Recovery system for system coupling apparatuses, and recording medium recording recovery program
US6256740B1 (en) 1998-02-06 2001-07-03 Ncr Corporation Name service for multinode system segmented into I/O and compute nodes, generating guid at I/O node and exporting guid to compute nodes via interconnect fabric
US6173293B1 (en) * 1998-03-13 2001-01-09 Digital Equipment Corporation Scalable distributed file system
US6438582B1 (en) * 1998-07-21 2002-08-20 International Business Machines Corporation Method and system for efficiently coordinating commit processing in a parallel or distributed database system
US6272491B1 (en) * 1998-08-24 2001-08-07 Oracle Corporation Method and system for mastering locks in a multiple server database system
US6154512A (en) * 1998-11-19 2000-11-28 Nortel Networks Corporation Digital phase lock loop with control for enabling and disabling synchronization
US6178519B1 (en) * 1998-12-10 2001-01-23 Mci Worldcom, Inc. Cluster-wide database system
US6757277B1 (en) * 1999-01-26 2004-06-29 Siemens Information And Communication Networks, Inc. System and method for coding algorithm policy adjustment in telephony-over-LAN networks
US6226717B1 (en) * 1999-02-04 2001-05-01 Compaq Computer Corporation System and method for exclusive access to shared storage
US6269410B1 (en) * 1999-02-12 2001-07-31 Hewlett-Packard Co Method and apparatus for using system traces to characterize workloads in a data storage system
US6725392B1 (en) * 1999-03-03 2004-04-20 Adaptec, Inc. Controller fault recovery system for a distributed file system
WO2000062502A2 (en) 1999-04-12 2000-10-19 Rainfinity, Inc. Distributed server cluster for controlling network traffic
CA2338025C (en) * 1999-05-20 2004-06-22 Ivan Chung-Shung Hwang A method and apparatus for implementing a workgroup server array
US6421723B1 (en) * 1999-06-11 2002-07-16 Dell Products L.P. Method and system for establishing a storage area network configuration
JP4057201B2 (en) 1999-09-16 2008-03-05 Fujitsu Limited High-speed data exchange method between different computers and extent extraction / conversion program recording medium
US6598058B2 (en) * 1999-09-22 2003-07-22 International Business Machines Corporation Method and apparatus for cross-node sharing of cached dynamic SQL in a multiple relational database management system environment
US6865549B1 (en) * 1999-11-15 2005-03-08 Sun Microsystems, Inc. Method and apparatus for concurrency control in a policy-based management system
US6473819B1 (en) * 1999-12-17 2002-10-29 International Business Machines Corporation Scalable interruptible queue locks for shared-memory multiprocessor
US6618819B1 (en) * 1999-12-23 2003-09-09 Nortel Networks Limited Sparing system and method to accommodate equipment failures in critical systems
US6370625B1 (en) 1999-12-29 2002-04-09 Intel Corporation Method and apparatus for lock synchronization in a microprocessor system
US7062648B2 (en) 2000-02-18 2006-06-13 Avamar Technologies, Inc. System and method for redundant array network storage
US6643748B1 (en) * 2000-04-20 2003-11-04 Microsoft Corporation Programmatic masking of storage units
US20030041138A1 (en) * 2000-05-02 2003-02-27 Sun Microsystems, Inc. Cluster membership monitor
US6530004B1 (en) * 2000-06-20 2003-03-04 International Business Machines Corporation Efficient fault-tolerant preservation of data integrity during dynamic RAID data migration
US7844513B2 (en) 2000-07-17 2010-11-30 Galactic Computing Corporation Bvi/Bc Method and system for operating a commissioned e-commerce service prover
EP1312179B1 (en) 2000-08-17 2012-12-05 Broadcom Corporation Method and system for transmitting isochronous voice in a wireless network
US6665814B2 (en) * 2000-11-29 2003-12-16 International Business Machines Corporation Method and apparatus for providing serialization support for a computer system
US6976060B2 (en) * 2000-12-05 2005-12-13 Agami Sytems, Inc. Symmetric shared file storage system
US8219662B2 (en) * 2000-12-06 2012-07-10 International Business Machines Corporation Redirecting data generated by network devices
US20040213239A1 (en) * 2000-12-15 2004-10-28 Lin Xinming A. Implementation of IP multicast on ATM network with EMCON links
US6804794B1 (en) * 2001-02-28 2004-10-12 Emc Corporation Error condition handling
US7130316B2 (en) 2001-04-11 2006-10-31 Ati Technologies, Inc. System for frame based audio synchronization and method thereof
US7107319B2 (en) * 2001-05-31 2006-09-12 Oracle Corporation Method and apparatus for reducing latency and message traffic during data and lock transfer in a multi-node system
US6708175B2 (en) * 2001-06-06 2004-03-16 International Business Machines Corporation Program support for disk fencing in a shared disk parallel file system across storage area network
US7266722B2 (en) * 2001-09-21 2007-09-04 Hewlett-Packard Development Company, L.P. System and method for efficient lock recovery
US6871268B2 (en) * 2002-03-07 2005-03-22 International Business Machines Corporation Methods and systems for distributed caching in presence of updates and in accordance with holding times
US6862666B2 (en) * 2002-05-16 2005-03-01 Sun Microsystems, Inc. Hardware assisted lease-based access to memory

Also Published As

Publication number Publication date
WO2003025751A9 (en) 2004-05-06
US7437386B2 (en) 2008-10-14
JP2005504369A (en) 2005-02-10
US20040202013A1 (en) 2004-10-14
WO2003027853A1 (en) 2003-04-03
US7149853B2 (en) 2006-12-12
CN1320483C (en) 2007-06-06
WO2003025801A1 (en) 2003-03-27
CN1589448A (en) 2005-03-02
US7266722B2 (en) 2007-09-04
CN1302419C (en) 2007-02-28
CA2460833C (en) 2013-02-26
US7240057B2 (en) 2007-07-03
CN1589447A (en) 2005-03-02
US20050015640A1 (en) 2005-01-20
CA2461015A1 (en) 2003-04-03
US7496646B2 (en) 2009-02-24
WO2003025780A8 (en) 2004-04-01
CA2460833A1 (en) 2003-03-27
AU2002341784A1 (en) 2003-04-01
JP2005534081A (en) 2005-11-10
US20030065686A1 (en) 2003-04-03
US20030079155A1 (en) 2003-04-24
US7467330B2 (en) 2008-12-16
US20030065760A1 (en) 2003-04-03
US20030065896A1 (en) 2003-04-03
WO2003025780A9 (en) 2004-03-04
US20070033436A1 (en) 2007-02-08
US20030065672A1 (en) 2003-04-03
US7111197B2 (en) 2006-09-19
JP4249622B2 (en) 2009-04-02
EP1428149B1 (en) 2012-11-07
EP1428151A4 (en) 2007-08-01
WO2003025751A1 (en) 2003-03-27
WO2003027903A1 (en) 2003-04-03
EP1428149A4 (en) 2007-04-04
EP1428151A1 (en) 2004-06-16
EP1428149A1 (en) 2004-06-16

Similar Documents

Publication Publication Date Title
US7467330B2 (en) System and method for journal recovery for multinode environments
US5193162A (en) Cache memory with data compaction for use in the audit trail of a data processing system having record locking capabilities
KR101833114B1 (en) Fast crash recovery for distributed database systems
US6163856A (en) Method and apparatus for file system disaster recovery
US6144999A (en) Method and apparatus for file system disaster recovery
US6185663B1 (en) Computer method and apparatus for file system block allocation with multiple redo
US6873995B2 (en) Method, system, and program product for transaction management in a distributed content management application
US6185577B1 (en) Method and apparatus for incremental undo
US7376651B2 (en) Virtual storage device that uses volatile memory
US5845292A (en) System and method for restoring a distributed checkpointed database
US6574749B1 (en) Reliable distributed shared memory
Lu et al. Isolation-only transactions for mobile computing
US20140330802A1 (en) Metadata structures and related locking techniques to improve performance and scalability in a cluster file system
US20030208500A1 (en) Multi-level undo of main-memory and volatile resources
US8214377B2 (en) Method, system, and program for managing groups of objects when there are different group types
JPH1097451A (en) Method and device for optimizing log file of client/server computer system
KR20150129839A (en) System-wide checkpoint avoidance for distributed database systems
EP1320802A2 (en) A method and system for highly-parallel logging and recovery operation in main-memory transaction processing systems
US6948093B2 (en) Data processing arrangement and method
Ruffin A survey of logging uses
US8805886B1 (en) Recoverable single-phase logging
KR950011056B1 (en) Method of log/recovery management in transaction processing system
Gustavsson On recovery and consistency preservation in distributed real-time database systems
SE Cloudy Transactions: Cooperative XML Authoring on Amazon S3.

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG UZ VN YU ZA ZM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 1 January 2004)
COP Corrected version of pamphlet

Free format text: PAGES 1/12-12/12, DRAWINGS, REPLACED BY NEW PAGES 1/12-12/12

CFP Corrected version of a pamphlet front page
CR1 Correction of entry in section i

Free format text: IN PCT GAZETTE 13/2003 UNDER (71) THE NATIONALITY AND RESIDENCE OF "POLYSERVE, INC." SHOULD READ "[US/US]"

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP