WO2001059568A2

WO2001059568A2 - Active cooperation deadlock detection system/method in a distributed database network

Info

Publication number: WO2001059568A2
Application number: PCT/SE2001/000265
Authority: WO
Inventors: Ulf T. Wiger; Thomas H. J. J. Arts
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2000-02-11
Filing date: 2001-02-09
Publication date: 2001-08-16
Also published as: GB2374700A; GB0216640D0; AU2001232565A1; WO2001059568A3

Abstract

In a network including a distributed database (e.g. replicated database) which includes data objects, a data object locking/unlocking system/method is provided where clients actively cooperate with one another during lock negotiations so that deadlock may be detected and broken. Deadlock detection (e.g. via WFG analysis) is performed on a local level by clients based in part upon messages that clients receive from one another regarding transactions being executed. For example, a first client (i.e. a process or thread working on a task and seeking to access one or more data objects) may send a message to a second client of the network indicating that the first client is holding a particular data object and waiting for another particular data object. By combining this information with facts already known to it, the second client can determine that it and the first client are involved in a deadlock. Once detected, deadlock may be resolved or broken by one of the two or more clients in the deadlock surrendering a lock on a data object.

Description

ACTIVE COOPERATION DEADLOCK

DETECTION SYSTEM/METHOD IN

A DISTRIBUTED DATABASE NETWORK

1. Field of Invention

This invention pertains to distributed database networks, and more particularly to a system and method for detecting deadlocks in such networks.

2. Related Art and Other Considerations

When executing application programs, computers frequently assign and/or change values of data objects (e.g. a data object may be, for example, a single data value, a set of data values, any parameter(s) to be changed, or an executable set of code with its own parameter(s)). In many instances data objects are stored as part of a database. In more complex systems in which a plurality of computers are networked together, more than one computer may require access to a certain data object, and may change or update the value of that data object. To cater to such a multi-computer networked environment, a distributed database network/system can be established in which a database(s) is made available to a plurality of computers in the network regardless of where the database tables are located. For example, database table Tl may reside on a first computer while database table T2 resides on a second computer in the network; but both tables appear and function the same to all clients on these computers (i.e. location transparency). Thus, in a distributed database system, all accessible data objects need not be replicated and stored on each computer in a network.

A replicated database system is a specific type of distributed database system, and is established where each computer in the network maintains its own version of the database. All other computers are advised of changes to a data object in a database so that consistency is maintained among objects in the replicated databases. Replicated databases have two primary advantages. A first advantage is fault tolerance. A second advantage is that local access to i replicated database is faster and less expensive than remote access to a database at another computer.

Each computer can have one or more programs or processes for accessing its local version of a replicated database in order to perform respective tasks. A task (or transaction) is a stream of activity, and may be, for example, a database operation, file or data record request, register access or use, memory access request or use, etc. When a particular process (i.e. client) needs to exclusively access a local version of a data object in the replicated database, an exclusive "lock" of the data object must be obtained by the accessing process in order to ensure exclusive access to the data object. When a data object is exclusively "locked" by one process, no other process of the network may access it.

In the case of a distributed database network, a client can obtain a lock on a data object that is either located on the same computer or on another computer in the network. In the case of a replicated database network, because a copy of the object is on each computer in the network a lock management system recognizes the fact that an object is replicated, and simultaneously locks all instances of the database object as soon as one local instance of the object is locked. Many so-called concurrency control algorithms or strategies use various forms of locking to achieve such goals.

In furtherance of the aforesaid example, a process may require access to multiple data objects in order to complete a particular task on which it is working.

Unfortunately, the combination of such requirements with exclusive locking of objects on a distributed (e.g. replicated) database network can lead to "deadlock."

A set of processes is "deadlocked" when each process in the set is waiting for an event (e.g. release of a data object) that only another process in the set can cause. An illustrative example of deadlock is shown in Figure 1. Computer system 11 includes process x (client 1) while computer system 13 includes process y (client 2). Each process requires access to data objects A and B (referred to by reference numerals 15 and 17, respectively) to complete their respective transactions. As illustrated in Figure 1, client 1 has an exclusive lock on data object A while client 2 has an exclusive lock on data object B (exclusive locking is illustrated by solid lines). At the same time, client 2 is waiting to access data object A, and client 1 is waiting to access data object B

(requests for access or locking are illustrated by dotted or broken lines indicative of a client waiting). Thus, client 2 cannot access data object A until client 1 releases its lock on data object A. Likewise, client 1 cannot access or lock data object B until client 2 releases its lock on data object B. Deadlock results, as processes x and y (i.e. clients 1 and 2) are each awaiting the object locked by the other. Because the respective locks are not released, the processes cannot complete their respective transactions and the situation may last endlessly with each process unable to carry out further processing (i.e. a circular wait).

Generally speaking, deadlock prevention has often been thought to be better than deadlock detection. When implementing deadlock prevention, transactions are typically restarted when it is determined that a requested operation, if allowed, might cause deadlock. Unfortunately, this may often result in unnecessary transaction restarts and is undesirable for at least this reason.

Deadlock detection, on the other hand, has been difficult to efficiently implement in distributed systems, since it is often based on analysis of wait-for-graphs (WFGs) which must contain all relevant dependencies to be useful. See "Readings in Database Systems", 2^nd Edition, by Michael Stonebreaker, including the paper "Concurrency Control in Distributed Database Systems" by P.A. Bernstein, et. al.

A WFG can be conceptualized as a directed acrylic graph (DAG) in which a node is inserted for each transaction (T). For example, if transaction Ti needs a lock which is held exclusively by transaction Tj, an edge Ti- Tj is created to illustrate that Ti waits for Tj. A deadlock condition exists if and only if the graph contains a cycle. For purposes of example only, assume that Tj is also waiting for Ti, in addition to Ti- Tj. The result is Ti-^Tj-^Ti, which is a deadlock. Deadlock cycles may also be indirect. For example, Ti-^Tj->Tk- Ti (Ti waits for Tj, which waits for Tk, which waits for Ti) is another example of deadlock. For a discussion of certain additional WFG technology, see also U.S. Patent Nos. 5,459,871 and 5,835,766, the disclosures of which are both hereby incorporated herein by reference.

Deadlock detection systems often assume the presence of a central lock manager, and/or central lock table, in which the presence of deadlock can be detected based on global WFG analysis. In other words, one site is designated as the deadlock detector for the system or network. Each scheduler or lock manager of other sites of the network periodically sends its local information (including all objects that its client is locked on and all that it is waiting for) to the one designated site. The deadlock detecting designated site merges the same into a global WFG to determine deadlock cycles. Client processes/threads at other non-central sites in the network are typically unaware of WFG updates.

Such a centralized control (or arbitrator) renders deadlock detection expensive since the central arbitrator must handle a WFG which includes all ongoing transactions in the network or system. The resulting high cost creates the need for either periodic (i.e., non-real-time) analysis or more conservative approaches such as timestamp ordering. Unfortunately, periodic analysis may degrade performance, increase detection cost, and/or introduce "phantom deadlocks" (i.e. incorrect recognition of deadlocks causing transactions to be restarted unnecessarily).

A distributed deadlock detection system is described at slide 68 of 118, in "Transaction Management in Distributed Computing System" by A. Zaslavsky, accessible at www.ct.monash.edu.au. Unfortunately, this system suffers from many of the above-listed problems. For example, analysis of WFGs becomes undesirably expensive if the graph includes all ongoing transactions on a given processor. In Zaslavsky, this is the case because a WFG at a node includes all transactions from all other processors on the network that are in any way related to any transaction currently in process at that node. Client processes throughout the network do not communicate with one another in Zaslavsky regarding locking, and are apparently unaware of WFG messages sent between nodes. A typical scenario may involve hundreds of ongoing transactions at a given node, where none or only a few are in any danger of deadlock. Thus, it is wasteful and expensive to have the node analyze WFGs for all ongoing transactions in the network, most of which have nothing to do with those in danger of deadlock. In such a scenario, a central scanner of the WFG at each node becomes too heavy to be interrupt-driven and therefore is unable to practically detect deadlocks in real-time. The high expense results in periodic forwarding of WFGs and/or periodic WFG analysis, which are undesirable for the above-listed reasons.

What is needed therefore, and an object of the present invention, is an efficient method and apparatus for detecting deadlock in a distributed database network where a plurality of processes (clients) require access to common data objects in order to complete respective transactions/tasks.

SUMMARY A method/system of detecting deadlock in a distributed database network

(including but not limited to a replicated database network) is provided. At least one client (process or thread for executing a transaction) in the network may transmit information to at least one other client so as to enable the other client to detect deadlock. Clients need not communicate with one another absent deadlock. Such active cooperation between client(s) enables each client in the network to have its own deadlock detection system. In certain embodiments, each client's deadlock detection system need only store and analyze information related to the transaction which that client is executing, thereby enabling deadlock to be efficiently detected in approximate real time with minimal communications cost. Moreover, unnecessary transaction/task restarts as well as the need for a centralized deadlock detector may be reduced or even eliminated.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features, and advantages of the invention will be apparent from the following more particular description of preferred embodiments as illustrated in the accompanying drawings in which reference characters refer to the same parts throughout the various views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating principles of the invention.

Fig. 1 is a schematic view of first and second clients in a deadlock scenario.

Fig. 2 is a schematic view of a network including two nodes whereat replicated data objects reside in accordance with an embodiment of this invention.

Fig. 3 is a schematic view of a plurality of clients (processes) accessing a plurality of data objects in a distributed or replicated database network in accordance with an embodiment of this invention.

Fig. 4 is a schematic diagram of three clients (processes) completing respective transactions utilizing three data objects in a manner such that deadlock does not occur.

Fig. 5 is a schematic diagram illustrating a deadlock detection system/method in a distributed or replicated database network in accordance with an embodiment of this invention that enables deadlock to be detected in real time and resolved in an efficient manner. Figs. 6(a) through 6(h) are schematic diagrams illustrating certain basic steps taken in accordance with the deadlock detection and resolution of Fig. 5.

Figs. 7(a) through 7(d) illustrate certain basic steps taken in the updating of client Cl 's WFG during the course of messages 5-1 through 5-16 of Fig. 5.

Figs. 8(a) through 8(c) illustrate certain basic steps taken in the updating of client

C2's WFG during the course of messages 5-1 through 5-16 of Fig. 5.

Figs. 9(a) through 9(c) illustrate certain basic steps taken in the updating of client C3's WFG during the course of messages 5-1 through 5-16 of Fig. 5.

Fig. 10 is a flowchart illustrating how a client may determine whether to send another client a message about a lock in accordance with a particular embodiment of this invention.

Figure 11 is a flowchart illustrating steps taken by an object algorithm in accordance with an embodiment of this invention.

Figure 12 is a flowchart illustrating steps taken by a client algorithm in accordance with an embodiment of this invention.

Figure 13 is a flowchart illustrating steps taken by a client algorithm in accordance with an embodiment of this invention.

DETAILED DESCRIPTION OF THE DRAWINGS

In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular architectures, interfaces, techniques, etc. in order to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known devices, circuits, graphs, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

Fig. 2 shows a replicated database network 20 comprising two illustrative nodes 30A and 30B. Each node has its own version of a replicated database 33 including at least data objects Ol, 02, and 03. Specifically, node 30A includes hard disk 32A whereon its version of the replicated database, referenced as 33A, is stored. Similarly, node 30B includes hard disk 32B whereon its version of the replicated database, referenced as 33B, is stored. While Fig. 2 illustrates a replicated database network, it is noted that this invention is also applicable to other types of distributed database networks including those where accessible data objects are not stored on all computers in the network.

Referring again to Fig. 2, since versions of database 33 (including the respective data objects 01, 02 and 03) are stored both at nodes 30A and 30B, when one node updates the value of a data object, the updated value is communicated to the other node so that the other node can likewise have the updated value, thereby maintaining a coordination of the value of each data object in replicated database 33 across the network. In a similar manner, when an object at one node is locked by a client, that same object if present at other nodes or databases must also be locked at the other locations. Such cooperative locking may be performed by a central lock manager or alternatively the objects themselves in different embodiments of this invention.

Each node 30A and 30B includes a processor or CPU (40A and 40B respectively) which is connected by an internal bus (42A and 42B respectively) to numerous elements. Illustrated ones of the elements connected to internal bus 42 include a read only memory (ROM) (43 A and 43B respectively); a random access memory (RAM) (44A and 44B respectively); a disk drive interface (45 A and 45B respectively); and a network interface (46 A and 46B respectively). Disk drive interfaces 45A and 45B are connected to respective disk drives 50A and 50B at each node. Network interfaces 46 connect to network link 60 over which the nodes 30A and 30B communicate with one another and with other similar nodes of the network.

Hard disks 32A and 32B are one example of a node-status inviolable memory or storage medium. "Node-status inviolable" means that the contents of the memory remain unaffected when the node crashes or assumes a down status. Although the node-status inviolable memory is illustrated in one embodiment as being a hard magnetic disk, it should be understood that other types of memory, e.g., optical disk, magneto-optical disk, magnetic tape, etc., may be utilized for storage by the nodes of the network.

Processors 40A and 40B execute sets of instructions in respective operating system(s), which in turn allow the processors to execute various application programs which are preferably stored on hard disks 32 A and 32B. Optionally, a set of instructions embodied in a computer product and known as a lock manager application program (LOCK MANAGER) (73A and 73B) may also be provided at each node, or alternatively a objects may take care of locking themselves or a centralized locking manager may be provided for the entire network. Processors 40 of the respective nodes execute application programs 70A, 70B. In order to be executed, such programs must be loaded from the medium on which they are stored, e.g., hard disk 32A, 32B, into RAM 44A, 44B.

Fig. 3 is a diagram relating to the network of Fig. 2 or any other type of distributed database network, including clients C1-C3 and database(s) 33 in which data objects 01, 02, and 03 are stored. Each database 33 may contain a complete set of data objects 01-03, or alternatively certain objects (e.g. 01-02) may be stored in a first database at one node of the network and other objects (e.g. 03) may be stored in a second database at another node of the network. The term "client" as used herein means a "process" or "thread" working on a task or transaction. Typically, a client may only work on one transaction at a time, with transactions being performed in a sequential manner (one transaction may not be started by a client until the previous one being executed by that client has been completed). A "thread" is similar to a "process" in this respect, with this term being used by, for example, programming language JANA. Referring to Figs. 2-3 and all other embodiments herein, clients Cl, C2 and C3 may all be at the same node (i.e., run by the same processor), or alternatively may be distributed among plural nodes (e.g. client Cl may be at a first node with a first processor, client C2 at a second node with a second processor, and client C3 at a third node with a third processor).

As shown in Fig. 3, each client Cl, C2 and C3 has a transaction/task to perform involving at least two different database objects. For example, client Cl requires exclusive access to data objects 01 and 02 in order to complete its transaction, while C2 requires exclusive access to objects 02 and 03 to complete its transaction, and C3 requires exclusive access to objects 02 and 03 in order to complete its transaction. Each data object may be either active (handling its own locks) or passive (locking is administered by a lock manager 73). However, a useful abstraction in the embodiments set forth below and in Figs. 4-9 is for clients to think they do in fact communicate with objects 01-03 directly.

In accordance with certain embodiments of this invention, an active cooperation deadlock detection system/method has clients Cl, C2 and C3 sending or volunteering lock and waiting information to one another on a per transaction basis. For example, a first client may inform a second client (which is waiting for the first client to release/surrender a lock on a data object) over network link(s) 60 about other lock operations in which the first client (or another client) is involved. Absent such a message, the second client would have no way of getting the complete picture of what other lock operations the first client (or another client) is involved in and/or what objects for which the first client (or another client) is waiting. Given such knowledge, it is possible for each client to maintain a simplified deadlock detection system including a WFG relating just to its transaction (i.e. the localized WFG includes the transaction upon which the client is working as well as any transaction related thereto). The term "related" is used in a broad sense; for example, if object Ol has a queue AB, object 02 a queue BC, object 03 a queue CA, and object 04 a queue CB, then client A still analyzes and stores information of the transaction on object 02, although it is not executing that transaction (because client A is involved in transaction relating to B and/or C). The WFG at each client is updated by an object (or the client) with data received from other client(s) relating to its transaction, so that each client can detect deadlock relating to its transaction on a substantial real-time basis. There is no need for a network centralized deadlock detector (although one may be used in non-preferred embodiments of this invention), and unnecessary transaction restarts may be reduced or avoided. Thus, deadlock may be more efficiently detected and resolved.

Figure 4 illustrates a sequence of messages involving clients Cl, C2 and C3, and data objects Ol, 02 and 03, in which the sequencing takes place in a manner that avoids deadlock (i.e. deadlock does not occur in this example). In the Fig. 4 example, client Cl requires exclusive access to data objects 01 and 02 in order to complete its transaction, while client C2 requires exclusive access to objects 02 and 03 to complete its transaction, and client C3 requires exclusive access to data objects Ol and 03 in order to complete its transaction.

Client Cl starts by requesting (message 4-1) an exclusive lock on data object Ol .

Data object Ol responds (directly or via a lock manager) to client Cl indicating that the lock on it has been approved (message 4-2). Thus, client Cl has its requested lock on data object 01. All other nodes of the network may or may not be informed of a lock on data object 01 at this time. Client Cl then requests (message 4-3) an exclusive lock on data object 02. Data object 02 responds (message 4-4) indicating that the lock has been approved. Client Cl thus has its requested exclusive locks on data objects 01 and 02, so that only client Cl may access and/or vary data objects 01 and 02 during the locking period (no other client/process on the network may access or vary these objects so long as the exclusive locks remain in place). Client Cl proceeds and completes its transaction/task, after which it unlocks data objects Ol (message 4-5) and 02 (message 4-6) freeing up these data objects for access by other clients of the network. After client Cl finishes, client C2 requests (message 4-7) an exclusive lock on data object 02. Data object 02 responds (message 4-8) indicating that the lock has been approved. Client C2 then requests (message 4-9) an exclusive lock on data object 03. Data object 03 responds (message 4-10) to client C2 indicating that the lock on it has been approved. Client C2 thus has its requested exclusive locks on data objects 02 and 03, so that only client C2 may access and/or vary these two data objects during the locking periods.

Client C2 proceeds and completes its transaction, after which it unlocks data objects 02 (message 4-11) and 03 (message 4-12) thereby freeing up these objects for access by other clients of the network. After client C2 finishes, client C3 begins by requesting (message 4-13) an exclusive lock on data object 01. Data object Ol responds (message 4-14) indicating that the lock has been approved. Client C3 then requests (message 4- 15) an exclusive lock on data object 03. Data object 03 responds (message 4-16) to client C3 indicating that the lock on it has been approved. Client C3 thus has its requested exclusive locks on objects 01 and 03, so that only client C3 may access and/or vary these two data objects during the locking periods. Client C3 proceeds and completes its transaction, after which it unlocks objects Ol (message 4-17) and 03

(message 4-18) thereby freeing up these objects for access by other clients/processes of the network. In Fig. 4, clients C1-C3 were able to complete their transactions with six locking operations and six unlocking operations being requested. Deadlock was avoided in large part due to the fact that each client was able to finish its transaction without any interleaving of other transactions. The total sequence required eighteen messages 4-1 through 4-18 (two messages per each lock operation plus one message per each unlock operation). However, clients are not always so fortunate in avoiding deadlock as will be seen below with reference to Figs. 5-9.

In Fig. 5, given the same clients and data object requirements as in Fig. 4, deadlock occurs due to the illustrated interleaving of transactions. In accordance with an embodiment of this invention, the deadlock is detected by at least one client (as opposed to a centralized deadlock detection system) and rectified in the Fig. 5 scenario using an additional eleven messages (for a total of twenty-nine - the eighteen of Fig. 4 plus the additional eleven) for all transactions to be completed.

While Fig. 5 illustrates sequencing between clients C1-C3 and data objects Oi03, Figs. 6(a) through 6(h) schematically illustrate locking scenarios as they unfold throughout the sequencing of Fig. 5. Here, as in Fig. 4, client Cl requires exclusive access to data objects 01 and 02 in order to complete its transaction TI, while client C2 requires exclusive access to data objects 02 and 03 to complete its transaction T2, and client C3 requires exclusive access to data objects 01 and 03 in order to complete its transaction T3. Typically, T1-T3 are separate and distinct transactions.

The localized WFGs of clients C1-C3, as they exist from a visual perspective at the beginning of the Fig. 5 sequence, are illustrated in Figs. 7(a) [client Cl's transaction TI], 8(a) [client C2's transaction T2], and 9(a) [client C3's transaction T3], respectively. In other words, each localized WFG of a client at the initiation of a transaction includes only the transaction (TI, T2 or T3) to be performed by that process. As shown in Figs. 7-9, the localized WFGs for different clients are typically different for the reasons discussed below, although in certain scenarios more than one WFG at different clients may end up being the same at different points in a message sequencing scenario.

Referring to Fig. 5, client Cl initially requests (message 5-1) an exclusive lock on data object Ol . Data object Ol responds (directly or via a lock manager) to client Cl indicating that the lock on it has been approved (message 5-2). Thus, client Cl has its requested exclusive lock on data object Ol , so that only client Cl may access and/or vary data object Ol during the locking period (no other client or process on the network may access or vary data object 01 so long as this lock remains on it). Once client Cl has its lock on data object Ol, the next event in the Fig. 5 sequence is client C2 requesting (message 5-3) an exclusive lock on data object 02. Thus, in contrast to the scenario of Fig. 4, here client C2 begins its task before client Cl has completed its transaction. Data object 02 responds (message 5-4) to client C2 indicating that the requested exclusive lock on it has been approved as it was not otherwise locked at the time of the request. Client C3 then (before either Cl or C2 have completed their transactions) requests (message 5-5) an exclusive lock on data object 03. Data object 03 responds (message 5-6) to client C3 indicating that the requested exclusive lock on it has been approved. At this point, client Cl has an exclusive lock on data object Ol , client C2 an exclusive lock on data object 02, and client C3 an exclusive lock on data object 03. Thus, the respective localized WFGs of clients C1-C3 remain as in Figs. 7(a), 8(a), and 9(a), respectively.

Cl then requests (message 5-7) an exclusive lock on data object 02. Data object 02 responds (message 5-8) to Cl indicating that the lock request cannot be approved because data object 02 is already locked by client C2, thereby denying client Cl 's request and telling client Cl to wait pending release of client C2's lock on data object 02. Thus, client Cl is pending on data object 02 at this point. Because client Cl is now aware of client C2's lock on data object 02, client Cl updates its WFG to include the fact that client Cl's transaction TI is waiting on client C2's transaction T2, as shown in Fig. 7(b) [i.e. T1- T2]. When denying a lock to client Cl, data object 02 also sends a message (message 5-9) to client C2 (which has a lock on data object 02) indicating that client Cl is waiting for data object 02. Client C2 now knows that client Cl 's transaction is related in some respect to client C2's transaction, as they both require access to data object 02. Client C2 updates its WFG accordingly as shown in Fig. 8(b) [T1 ->T2].

Thus, in general, when a data object (or its lock manager) is locked by a first client and receives a lock request from a second client, the object responds to both of the first and second clients informing each of them of which client has its lock and which is pending, thereby updating clients with regard to other clients executing related transactions.

Referring again to Fig. 5, client C2 then requests (message 5-10) an exclusive lock on data object 03. Data object 03 responds (message 5-11) to client C2 indicating that the lock request cannot be approved because data object 03 is already locked by client C3, thereby denying client C2's request and telling client C2 to wait pending release of the 03 lock. Thus, client C2 is pending on data object 03 at this point. Client C2 updates its WFG accordingly as shown in Fig. 8c, which illustrates client Cl 's transaction TI waiting on client C2's transaction T2, which in turn is waiting on client C3's transaction T3 [i.e. T1-^T2- T3]. When denying a lock to client C2, data object 03 also sends a message (message 5-12) to client C3 (which already has a lock on data object 03) indicating that client C2 is waiting for data object 03. Client C3 updates its localized WFG accordingly as shown in Fig. 9(b) [i.e. T2->T3]. Thus, while client C3 has data object 03 exclusively locked, it also knows that client C2's transaction T2 is waiting for data object 03. Since client C2 was informed by data object 02 that client Cl was waiting for 02 (i.e. client Cl waiting for client C2 to release object 02), and by data object 03 that client C3 holds data object 03, client C2 determines that its transaction T2 is related in some respect to those (TI and T3) of client Cl and client C3, and that it and client Cl are in waiting patterns (a potential for deadlock exists). Since client C2 has now stored in its WFG certain information (i.e. T2- T3) that it determines may not be known to another waiting client Cl, client C2 sends a message (circled message 5-13) to client Cl (e.g. via link 60 or otherwise) informing it that client C2 is waiting for data object 03 which is held by client C3. Client Cl updates its WFG table accordingly with this information, as shown in Fig. 7c, so that it now knows that: (i) it holds or has a lock on data object 01, (ii) it is waiting for data object 02, (iii) client C2 holds data object 02, (iv) client C2 is waiting for data object 03, and (v) client C3 holds data object 03 (i.e. T1 - T2- T3). However, client Cl at this point has not yet learned of anything that client C3 (or its transaction T3) is waiting for, so deadlock has not yet occurred (there is not yet any circular pattern or cycle, so there exists a potential for client C3 to finish its transaction T3 and release data object 03 so that deadlock will not occur). At this point in time, the respective localized WFGs are in states as shown in Figs. 7c, 8c, and 9(b), for clients C1-C3, respectively. Deadlock has not yet occurred.

Client C3 then requests (message 5-14) an exclusive lock on data object 01. Data object 01 responds (message 5-15) to client C3 indicating that the lock request cannot be approved (i.e. data object Ol is already locked by client Cl), thereby denying client C3's request and telling client C3 to wait pending release of the data object 01 lock. Thus, client C3 is pending on data object Ol at this point. Client C3 updates its WFG accordingly as shown in Fig. 9c (i.e. T2^T3- T1). When denying a lock to client C3, data object Ol also sends a message (message 5-16) to client Cl (which already has a lock on data object Ol) indicating that C3 is now waiting for data object 01. Client Cl updates its WFG accordingly as shown in Fig. 7(d). This message (message 5-16) received by client Cl from data object 01 is the last piece of the puzzle needed by client Cl for it to dt tect that it is involved in a deadlock. This is because Cl now knows in addition to facts (i) through (v) above, that (vi) client C3 is waiting for data object 01. Client Cl updates and now has each of items (i), (ii), (iii), (iv), (v), and (vi) in its WFG table as shown in Fig. 7(d). Client Cl 's WFG now shows a complete or circular cycle (i.e. T1- T2- T3- T1). Upon scanning its localized WFG (which includes only its own transaction TI and transactions (T2, T3) related thereto), client Cl detects the circular pattern and thus deadlock. In other words, client Cl determines that each of clients C1-C3 is now waiting for an event (e.g. release of a data object) that only another one of the clients (or transaction) in the client set can cause. The deadlock is illustrated in Fig. 6(a), where solid lines indicate exclusive locks by clients on data objects, and broken lines indicate a client waiting (or pending) for a data object.

Figures 7-9 illustrate that while deadlock is detected after message 5-16, it is only detected by client Cl in this particular embodiment. The respective WFGs of clients C2-C3 are not yet updated in this embodiment with enough related information to enable those clients to detect the deadlock. Client Cl was able to do so due to the active cooperation among the clients (i.e. client C2 having sent message 5-13 to client Cl).

After client Cl detects the deadlock shown in Fig. 6(a), it initiates a solution for the deadlock by surrendering or releasing its lock on data object Ol as shown in Figs. 5 and 6(b). In doing this, client Cl sends an unlock message (message 5-17) to data object 01. Data object Ol responds (message 5-19) to client Cl indicating that the Cl :01 lock has been released, that client Cl is now pending on data object 01 (i.e. client Cl may obtain another lock on object 01 following client C3's lock on data object 01 being released), and that previously pending client C3 now holds a lock on data object 01. As shown in Fig. 5, data object 01 also sends a message (message 5- 18) to client C3 indicating that the Cl lock has been released thereby causing client C3 to have an exclusive hold or lock on data object Ol . Clients Cl and C3 update their WFGs accordingly (not shown). Fig. 6c illustrates this scenario where client C3 holds an exclusive lock on data object Ol, with client Cl pending on data object 01. Since client Cl has now been informed of new information about client C3, and it also knows that the transaction of waiting client C2 is related to its transaction and potentially does not know of the new information, client Cl sends a message (message 5-20) to client C2 informing client C2 that client Cl is now waiting for data object 01 which is held by C3 (this message turns out to be irrelevant; but is sent in accordance with the procedure of clients sharing information with other clients related to their transaction). Client C2 updates its WFG accordingly (not shown). Client C3 now has locks on each of data object Ol and data object 03 thereby allowing client C3 to complete its transaction while client Cl waits for data object 01.Still referring to Fig. 5, once client C3 has completed its transaction, it sends a message (message 5-21) to data object 01 unlocking the same. In response to this message, data object Ol unlocks from client C3 and sends a message (message 5-22) to client Cl indicating that client Cl now holds an exclusive lock on data object 01 (client Cl had been pending on data object 01 during the time client C3 had its lock on data object Ol). Localized WFGs are updated accordingly (not shown).

Fig. 6(d) illustrates the situation in which client Cl holds a lock on data object 01, client Cl is pending on data object 02, client C2 holds a lock on data object 02, client C2 is pending on data object 03, and client C3 still holds its lock on data object 03. Client C3 then sends a message (message 5-23) to data object 03 unlocking the same. In response to this message, data object 03 unlocks from client C3 and sends a message (message 5-24) to client C2 indicating that client C2 now holds an exclusive lock on data object 03 (client C2 had been pending on data object 03 during the time client C3 had its lock on data object 03). Localized WFGs are updated accordingly (not shown). Fig. 6(e) illustrates client Cl holding a lock on data object 01, client Cl pending on data object 02, client C2 holding locks on data object 02 and data object 03, and client C3 no longer holding locks on any of data objects 01-03 because it has completed its transaction T3. Accordingly, client C2 now has locks on each of data object 02 and data object 03 thereby allowing it to complete its transaction while client Cl waits for data object 02. Once client C2 has completed its transaction, it sends a message (message 5-25) to data object 02 unlocking the same. In response to this message, data object 02 unlocks from client C2 and sends a message (message 5-26) to client Cl indicating that client Cl now holds an exclusive lock on data object 02 (client Cl had been pending on data object 02 during the time client C2 had its lock on data object 02).

Fig. 6(f) illustrates client Cl holding locks on data object 01 and data object 02, client C2 still holding a lock on data object 03, and client C3 holding no locks on any of data objects 01-03. Accordingly, client Cl now has locks on each of data object 01 and data object 02 thereby allowing it to complete its transaction. Client C2 then sends a message (message 5-27) to data object 03 unlocking the same. Fig. 6(g) illustrates this status where client Cl holds locks on data objects 01 and 02 and completes its transaction, and clients C2 and C3 no longer hold any locks on any of data objects 01- 03 because they have completed their respective transactions. Localized WFGs are updated accordingly (not shown).

As shown in Fig. 5, once client Cl has completed its transaction, it sends messages to data object Ol (message 5-28) and data object 02 (message 5-29) unlocking them so that other clients of the network (at the same or different nodes of the network) are again free to access them. Fig. 6(h) illustrates this status where none of clients C1-C3 hold any locks on any of data objects 01-03, as they have all completed their transactions after resolving the aforesaid deadlock. After any or all of data objects 01-03 have been modified during the course of the aforesaid transactions, they are updated as to their values and/or other changes across the replicated databases as described above so that each of the replicated databases 33 is the same in this regard. As can be seen above, communications transmitted between clients C1-C3 via link 60 (e.g. the circled messages in Fig. 5) enable clients (e.g. Cl in the Fig. 5 example) to detect the deadlock via localized WFG analysis. In the Fig. 5 example, the first circled message (from client C2 to client Cl) proved to be what would otherwise have been a missing piece of information needed by C 1 to detect the deadlock, while the circled message volunteered from client Cl to client C2 had no effect. Without such active cooperation or communication(s) between clients, client Cl would have never known that client C2 was waiting for data object 03 which was held by client C3, and thus would not have detected the deadlock.

In the Figs. 5-9 example above, a total of twenty-nine messages was required to detect and resolve the deadlock: two messages/lock request + N if N clients are waiting for a lock on that object (6 * 2 + 3=15); one message/indirect dependency (2 * 1 = 2); three messages/lock surrender + N if N other clients are waiting for a lock on that object (1 * 3 + 0 = 3), and one message/unlock + N if N clients are waiting (6 * 1 + 3 = 9).

An advantage of allowing clients to perform deadlock detection, as in the Fig. 5- 9 example, is that once deadlock is detected graceful resolution is possible. One or more of the affected clients simply trade places on a wait list on the object(s) in question. For example, in Figs. 5-6, once client C3 is done with data object Ol, the lock is again granted to client Cl, which had surrendered its lock to client C3 earlier in order to resolve the deadlock. Thus, no transaction had to be restarted. Moreover, by allowing individual clients to detect deadlock via their own localized WFGs, and partitioning WFG information of different clients on a per transaction basis (i.e. a client's WFG may include only information about other clients whose transactions relate to the WFG client's transaction), the graphs may be of reduced complexity thereby enabling them to be scanned in substantially real time so that deadlocks can be more efficiently detected and more easily resolved. In other words, it is possible to limit the size of local client WFGs to include only transactions which operate on the same resources as the locking process/client. Furthermore, since locking processes (or clients) may synchronize with one another, they can exchange resources in an efficient manner so as to avoid and/or reduce transaction restarts.

Once deadlock is detected, it must be determined which client(s) should surrender a lock(s) in order to resolve the deadlock. In making such a determination, some type of globally consistent ordering may be used. In Fig. 5, for example, it was assumed that Cl < C3, meaning that client Cl must surrender its lock to client C3. However, many alternative types of ordering may instead be used, with the particular type of ordering implemented being a mere implementation detail subject to selection based upon the application at issue. In certain embodiments, the orderings are total, since otherwise no unique representative can easily be found.

Different algorithms or programming may be used in different embodiments of this invention so that clients (e.g. C1-C3) may determine when they are to inform other client(s) of information that they have received. For example, in the embodiment shown in the flowchart of Fig. 10 (which slightly differs from the Fig. 5 embodiment), a client C_x determines when to send another client C_y information in the following manner. For each client C_y which is waiting for client C_x, client C_x knows to send each such client C_y all information regarding locks for which any client is waiting for but which client C_y is not involved. Client C_x determines whether this condition is met each time client C_x receives a message from an object indicating that client C_x is to wait. To exemplify this method, reference is made to Figs. 5 and 10 (although the sequence of messages resulting from this Fig. 10 embodiment is slightly different that the sequence shown in Fig. 5).

As shown in Fig. 10, the first query 103 is whether a message received is from a data object. If so, client C_x then determines at 107 whether the received message (M) is telling client C_x to wait for a lock to be released. If not, then no message is sent by client C_x to any other client 105. If so, then client C_x determines whether any other client C_y is waiting for client C_x to release a lock (step 109). If so, then at 111 client C_x determines whether the received message M includes information relating to a particular lock which some client is waiting for but which client C_y is not involved. If so, then client C_x sends a message to client C_y informing it of all or a portion of the information in the received message (M) (step 115). However, if the received message (M) was determined in query 111 to relate to a lock which client C_y is waiting for or otherwise involved in, then client C_x sends no message to client C_y (step 113).

In applying the embodiment of Fig. 10 to the scenario of Fig. 5, the first message in Fig. 5 where a client is told to wait is message 5-8 to client Cl . Because client Cl does not hold any lock for which any other client is waiting, it does not send any volunteered message to another client. When client C2 is told to wait (query 107 satisfied) by message 5-11 received from object 03 (query 103 satisfied), client C2 already had a lock on object 02 for which client Cl was waiting (query 109 satisfied). Moreover, client C2 was not aware of any relationship between client Cl and object 03 (query 11 1 satisfied). Thus, in accordance with the above method of determination illustrated in Fig. 10, client C2 determines that it should send client Cl message 5-13 informing it that client C2 was waiting for a lock on object 03 that was held by client C3 (step 115). In a similar manner, message 5-15 causes client C3 to send a message (not shown) to client C2 (since client C2 is waiting for client C3, and client C3 is unaware of C2 being related to object Ol) telling client C2 that client Cl holds object Ol and client C3 is pending on object Ol . This message results in clients Cl and C2 each being capable of detecting the deadlock. However, if the clients are programmed in a manner such that C1<C2<C3 (Cl surrenders its locks to C2 and/or C3), client C2 does nothing since it knows that client Cl must surrender first. Finally, it is noted that message 5-19 does not result in client Cl sending any message to any other client because no client is waiting for client Cl in any lock. In other embodiments of this invention, a client C may determine to send another client C_y information about lock L when client C determines that (i) C_y >C, and (ii) C_y has a lock but is not involved in lock L. Other methods may also be used for enabling clients to determine when to send other clients such information according to other embodiments of this invention.

In certain embodiments of this invention, all or a large portion of messages sent from objects (e.g. messages 5-2; 5-4; 5-6; 5-8; 5-9; 5-11 ; 5-12; 5-15; 5-16; 5-18; 5-19; etc.) may include therein a sequence or version number. For example, the first message that a particular object O sends may have a sequence number of one, the second message that object O sends may have a sequence number of two therein, the third message that object O sends may have a sequence number of three therein, and so forth. In other words, the sequence number for each message sent from a particular object O increases by one (S=S+1) each time another message is sent by that object. In such a manner, the potential for confusion is reduced (i.e. in a distributed database system, a client may receive messages from objects at much different points in time) as a receiver of message(s) from object(s) can place messages received from that and other objects in a sequence indicative of their time or sequence of origination.

From a programming perspective, the details of the locking algorithm and the optional fact of a client performing lock negotiations, may be hidden behind a functional interface. Specifically, clients may adhere to a two-phase locking protocol, acquiring all necessary locks before performing any work or releasing any locks.

It is noted that in transactions involving a large number of data objects, it may not be cost effective for clients to communicate with each separate object. In such cases, a locking agent(s) may be employed with each locking agent being for administering locks for a number of data objects. Such an agent(s) may behave like the individual objects discussed above, and need not affect the client cooperation protocol. In certain embodiments, the algorithm for implementing certain aspects of this invention may include two entities; one for the object and one for the client. These entities may be seen as interfaces that arrange for client(s) to get access to an object, and ensure that the object is only accessed by clients which are allowed to do so. The algorithm implementing the object interface preferably provides read and write functionality, and also guarantees that such read or write functionality is guaranteed to be done by a client that has the right to do so (i.e., holds the lock). The algorithm implementing the client interface keeps track of the objects that need to be locked for a read and/or write operation. The client interface actively asks objects for read/write permission (a lock) and waits for the results supplied by the objects. The information is gathered, combined with consecutive information provided by the objects and competing clients and action is taken upon the received information. The client interface can either conclude that it has obtained read/write access to all required objects, or, by absence of all information, wait for more information and/or send information to competing clients. Set forth below is a more detailed description of the algorithms for both entities in more detail. The algorithm(s) may be stored, for example, in different memories at a plurality of different computers in the distributed network. At each computer, for example, the algorithm(s) may stored in normal memory (e.g., hard drive, RAM, ROM, EPROM, etc.), secondary flash memory, primary flash memory, and/or in the processor memory in different embodiments of this invention. It is noted that since data communication is involved, a portion of data needed by the algorithm(s) is typically sent overt a network so that at certain points in time it is in a wire or other communication media.

In certain embodiments of this invention, it may be assumed that computers run in a distributed environment, and are connected with a communications network having a slow speed compared to computational speed of the individual computers in the network. Clients and objects have unique references and a total order exists among these references. The uniqueness of the reference is guaranteed. However, the set of possible references is finite. No strong assumption is therefore made about the relationship between creating a chent or object and the assigned reference(s). It may also, in certain embodiments, be assumed that a client created later in time has a larger reference than earlier-created clients, although this need not be the case in all embodiments. It may also be assumed that enough memory is available to store all information about relationships between locks and objects of interest. Another assumption may be the existence of a sequence of numbers as long as is needed to attach version numbers to messages, such that within the lifetime of an object, we do not run out of version numbers.

With regard to the object algorithm, the algorithm implementing the interface to the object stores requests from clients for access to certain locks. The requests are stored in a waiting queue in order of arrival (e.g., in any of the memories listed above, with ordering in a queue being performed, e.g., by links). The first element of a queue is considered to have been granted access to the objects. Upon arrival of a request from a client, the request is put at the end of the respective queue and all clients in the queue are notified about the new situation or scenario of the queue (i.e., notified of the new order of and/or client requests in the queue). Clients are permitted to cancel their requests which have previously been made, in which case they are removed from the queue. Again, all clients remaining in the queue are notified.

Still referring to the object algorithm, a client which holds a lock on a particular object (i.e., the client which is the first in the waiting queue) may request to surrender its lock on that object. Upon surrendering, the client is no longer the first in the queue. In certain embodiments, when surrendering a lock, the surrendering client is thereafter placed last in the queue and all clients are again notified about the new order in the queue. Placing a surrendering client at the end of the queue is deemed to be a safe way of implementing a surrender action. In alternative embodiments of this invention, a surrendering client may independently determine which location in the queue line (e.g., at the end, in the middle, or simply switching places with the request immediately following it) would be the safest place to be located following its surrender to minimize the chance of future deadlock. In such embodiments, the request of the surrendering client may then be inserted into that particular location in the queue following the client's surrendering of its lock on the object.

Each notification to clients of an updated status of a queue is supplied with a version number. In certain embodiments, the version number is increased after every change in a particular queue. Moreover, notifications may also carry the identity of the particular client which induced the notification to be sent (e.g., the client which requested access, the client which canceled, the client which unlocked, or the client which surrendered).

The object algorithm also may implement a read and write option on an object. However, read and write are only permitted whenever a requesting client holds a lock for the object at issue (i.e., is first in the queue).

As mentioned above, the later a client is created, the larger the reference value/number assigned to that particular client (i.e., the reference may reflect the time of creation in certain exemplary embodiments). In certain embodiments, the tail of a waiting queue for an object (i.e., all elements in the queue except for the first one) may be sorted with respect to the total order of the references. In other words, the waiting order in a queue may be dictated by the reference of the clients therein, e.g., so that the oldest client is located at the front and the youngest at the rear. However, this may be unsafe in certain environments such as one where a finite set of references is provided. This potential problem may be overcome where the set of references is large enough and time-out primitives are used to deal with starvation. This particular method is not always preferred, but may be more efficient in cases where negotiation concerning objects is relatively fast as compared to the time it takes to run out of references. However, clients which do not get served in time should be caused to restart their requests, which may take a substantial amount of overhead for certain individual clients, whereas other clients may be served much faster. In other embodiments, the ordering in a queue is simply based upon when the various requests arrived.

Turning to the client algorithm, the interface for the client is activated whenever a client needs some object(s) on which to perform an operation (e.g., read or write). Access to the object(s) is requested by the client interface. After all objects required by a client to perform its transaction have been locked, the client is notified that the operation may take place. After performing the operation and completing the transaction, the client interface is addressed by the client and the interface releases (unlocks) the lock on all of the objects which it had acquired for the client.

Communication between the client interface and object interface has been described above. A lock can be requested, surrendered or canceled. Once a lock has been granted to a client, that client may read/write on the object. Following completion of its transaction, a client typically unlocks all of the locks utilized for the transaction, and the client is removed from the corresponding lock request queues.

With regard to the client algorithm(s), the client interface may be constantly updated when the queue of an object that it has requested changes. This information is stored by the client interface and, as such, the client interface detects easily when all objects are assigned to it (when all objects are assigned to it, the client can begin/complete its transaction). The queue information that objects provide to clients have respective version numbers and the identity of a client attached thereto. Clients keep track of version numbers and whenever information is received with a version number smaller than a version number of earlier received information, this information may be ignored because the version numbers indicate that it is old or outdated. This guards against the potential of a client detecting deadlock based upon old information due to outdated information. However, in other embodiments, older version numbers may still be considered in WFG graphs, so as to guard against the possibility of deadlock not being detected as a result of delays in the network.

When a client sends a surrender request to a particular object, it preferably ignores all information coming from the surrendered object until information received from that object carries the identity of the surrendering client. In other words, the client algorithm may utilize the client information attached to a return message from an object as acknowledgement of the surrender request by that client. This provides for safe operation in a case where a client surrenders and outdated information arrives stating e.g., that the client just received access.

Whenever information about queues of respective objects is such that not all objects are yet assigned to clients, the client interface may compute the so-called wait- for-graph (WFG). This graph expresses which client(s) waits for which other client(s) to get access to an object, as described above and illustrated in the drawings. If a client's graph contains a cycle, then a deadlock situation exists. The client interface for each client checks for cycles in the appropriate graph(s) and if such a cycle exists, it computes which of the clients in this cycle holds the lock for an object which has the largest reference value/number (since there exists a total order of the references, this is uniquely defined). If the client interface does not represent this largest reference itself, it waits for more object information to come. If, however, the client interface represents this largest reference, it sends a surrender message to the object it holds that caused the deadlock situation. Thus, the client with the largest reference is selected among the clients which hold a lock in the deadlock scenario. In other embodiments of this invention, the client with the smallest reference may be selected to surrender a lock in order to resolve the deadlock. Other suitable techniques may also be used. In order to be able to surrender only one object, a client or client algorithm stores the relationship between that object and the clients waiting for that particular object. It is typically clear which object is waited for in a deadlock scenario, since the client that surrenders may very well hold other objects which are not involved in the deadlock scenario. In alternative embodiments where little memory is available, a client may choose to not store all information, but instead surrender all objects that the client holds in order to resolve a deadlock. This latter approach is safe but inefficient.

In certain embodiments, when a client surrenders a lock its request is moved to the rear of the queue line. However, in special circumstances, a client interface may compute that surrendering to the end of the queue is not the most efficient method of surrendering. But this may require extra information from the application for which the algorithm is used. For example, if it is known that every client is at most claiming three objects at a time, this information could very well be used to compute more optimal surrendering positions.

When a client interface does not encounter a cycle in the graph, it reports the graph information to other clients which may be interested in the information (e.g., clients involved in transaction relating to the transaction of the notifying/reporting client), as illustrated at step 357 in Figure 13. Note that all clients only have part of the information of the "global" wait-for-graph. By spreading the information to other clients, one ensures at least one client obtains the "global" wait-for-graph relating to a deadlock (if one occurs) as its local graph. Here, one has several possible ways in which to spread the information around. The efficiency and optimality of the strategy of which a client sends this information to other clients depends upon the way objects and clients relate (e.g., how many clients are demanding the same object, how many objects are demanded by clients, etc.). A first, and simplistic, way of spreading the information is for a client to send its entire wait-for-graph to all clients that occur in its own graph. This may be performed in certain embodiments of this invention. However, in other embodiments, client only sends a portion of its wait-for-graph to another client if that client does not have that information and is not getting it from another client. The latter can be achieved by, for example, clients only sending information to clients having smaller reference numbers, or alternatively, only sending information to clients having larger reference numbers. Another strategy for spreading information, which reduces the number of messages sent and therefore is efficient in nature when computation is cheaper than sending, is for a client to only send its wait- for-graph information (entire wait-for-graph or portions thereof) to other client(s) which the sending client is waiting for and to include only information regarding clients that wait for the sending client. In other embodiments, a client could send all clients which wait for it information regarding all clients which it is waiting for.

Any of the above-listed systems/methods for determining when and how much information to send other clients may be used in different embodiments of this invention. Moreover, any of these information spreading strategies may be improved by computing which information that a client wishes to send to another client is or should already be stored by the another client. In such a manner, a reduced amount of superfluous information is sent.

Wait-for-graphs may be implemented by using standard data structures and computing cycles. Such graphs are known in the art, for which several well-known algorithms exist. Any such well-known algorithm will suffice. However, specific for the client interface algorithm herein, a node in a wait-for-graph may include client identity, the object that the client is locking, and whether the client has obtained the lock to this object. All clients need not be presented in a graph; only those that hold a lock. However, the relationship which expresses a client waiting for another client may take all known clients into account. Whenever a cycle is detected, from the nodes on this cycle, the client with, for example, the largest reference number, surrenders the object it locks. Figure 11 is a flow chart illustrating certain steps taken by the object algorithm in accordance with an embodiment of this invention. The flowchart starts 201 with an incoming communication from a requesting client. If the client desires to read 203 data from the object, then it is determined at 205 whether or not the requesting client has a lock on that object. If so, reading of a value is permitted 207 and a reply sent to the client. If not locked, then reading is not permitted 209 and a corresponding reply is sent to the client indicating the same. Thereafter, the state of the algorithm goes back to start 211. If the client desires to write 213 data to the object, then it is determined at 215 whether or not the requesting client has a lock on that object. If so, writing is permitted 217 and a reply 221 sent to the client indicating the same. If not locked, then reading is not permitted 219 and a corresponding reply is sent to the client indicating the same. Thereafter, the state of the algorithm goes back to start state 211. When the received communication is a request for a lock on that object 225, it is determined whether it is a fresh or new request 227. If not (e.g., an old or outdated request), then the algorithm proceed back to start. If so, then the version number is increased 229 and the lock request is put in the lock request queue for that object 231. Thereafter the object notifies 233 related clients of the updated status of the lock queue and lock status, including the new version number in such notification as discussed above. After notification 233, it system returns to the start state 211, 201. When the received communication is a surrender request 235 where a client is attempting to surrender a lock on the object, it is first determined 237 whether that client does indeed have a lock on that object. If not, proceed to start. If so, then the version number is increased 239 and the client is removed from the front of the lock queue line 241 and placed back into the lock queue line at a different location (e.g., at the end of the lock request line). Thereafter, the object notifies 243 related clients of the updated status of the lock queue and lock status of that object, including the new version number in such notification. After notification 243, the system returns to the start state. When the received communication is an unlock request 245, it is first determined 247 whether the requesting client is in the queue for that object. If not, proceed to start 253. If so, the version number is increased 249 and that client is removed 251 from the queue for that object 251. Thereafter, the object notifies 255 related clients of the updated status of the lock queue and lock status of that object, including the new version number in such notification. After notification, the system returns to the start state. In other embodiments, at each notification step 233, 243, and 255, the object at issue may notify all clients currently in that queue of the updated status; while in other embodiments the determination as to which clients to notify may be made in a different manner.

Figure 12 illustrates steps taken by the client aspect of the algorithm according to an embodiment of this invention. Starting 301 occurs when the client at issue needs an object(s) to perform an operation on (i.e., to complete a transaction or task). When it is determine that the client needs a lock 303 on an object, a lock request is sent out 305 to that object and that object is added to the stored set of objects that are needed by that client 307. After lock requests have been sent out, the client awaits locks on the requested objects 309. A determination 311 is made as to whether the client has all of its requested or needed locks. If not, then proceed again to start. If so, then the client reports 313 that it has all locks necessary to complete its transaction. When lock information is received 315, the client determines 317 whether it is a report that all locks have been obtained. If so, then proceed to start and the client can complete its transaction. If not, then it is determined 319 whether the received lock information is outdated. If so, then proceed to start and the information is ignored. If not, then it is determined 321 whether the client is still waiting for an acknowledgement from that object (e.g., as discussed above, when a client sends a surrender request, it may ignore all communications from that object until receiving acknowledgement of the object's receipt of the same). If so, then proceed to start. If not, then the received lock information is added to a stored list of received lock information 323 (e.g., add to locked list and/or local WFG). Thereafter, it is determined 325 whether the client has all locks that it needs. If not, it proceeds to handle the received lock information 329. If so, it reports 327 that it has all of its needed locks. Figure 13 illustrates steps taken by the client aspect of the algorithm in handling the lock information 329, 341 in accordance with an embodiment of this invention. The locking information in the local WFG is analyzed 343. It is then determined 345 whether a deadlock has been detected (e.g., whether a cycle is detected in the WFG). If not, a computation is performed 355 to identify all related (i.e., indirect) clients and locking information is sent to those clients 357. The various methods for determining which clients to sent information to and how much information to send are described above. If deadlock is detected at 345, then all information is removed on the object to be surrendered 347. The client then determines 349 whether it should be the client that sends out the surrender request on that object. If not, proceed back to start as another client will be doing so in order to resolve the deadlock. If so, then the client sends a surrender request to a target object 351 and waits for acknowledgement 353. In such a manner, the deadlock is broken.

While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that various alterations in form and detail may be made without departing from the scope and spirit of the invention. Additionally, while the sequencing of Figs. 5-9 involves three clients, it will be recognized by those skilled in the art that as few as two clients may be involved in a deadlock detectable in accordance with this invention, and many more than three clients may be involved in a deadlock detectable in accordance with this invention.

Claims

WE CLAIM:

1. A method of detecting deadlock in a distributed database network, said method comprising the steps of:

providing first and second clients in the distributed database network; and

transmitting information from the second client to the first client to enable the first client to detect a deadlock.

2. The method of claim 1, further comprising the steps of each of the first and second clients updating respective localized dependency graphs in response to information received from the other of the first and second clients.

3. The method of claim 1, further comprising steps of:

the distributed database network comprising a replicated database network

including a plurality of databases each inclusive of a plurality of data objects replicated on each of a plurality of computer in the replicated database network;

the first and second clients obtaining locks on first and second of the data objects, respectively;

each of the first and second clients waiting for respective events in a manner such that the first and second clients are in deadlock;

the second client sending a message to the first client including data indicative of at least one event that the second client is waiting for; and

the first client, based at least in part upon the data received in the message from the second client, detecting the deadlock.

4. The method of claim 3, further comprising the step of one of the first and second clients surrendering a lock on a data object in order to initiate resolution of the deadlock.

5. The method of claim 4, further comprising:

providing a third client, and

the first, second, and third clients obtaining exclusive locks on first, second, and third of the data objects, respectively, prior to said detecting step.

6. The method of claim 5, further comprising:

the first client waiting for access to at least one of the second and third data objects, the second client waiting for access to at least one of the first and third data objects, and the third client waiting for access to at least one of the first and second data objects, prior to said detecting step; and

said sending step comprising the second client sending the message to the first client where data in the message is indicative of the second client waiting for access to at least one of the first and third data objects.

7. The method of claim 4, wherein said detecting step includes the first client storing data received from the second client in a wait-for graph (WFG) or table which is for including only information relating to a transaction which the first client is working on, and the first client performing WFG analysis on the WFG or table.

8. The method of claim 4, further comprising the step of resolving the deadlock without having to restart any transaction of any of the first and second clients.

9. The method of claim 4, wherein said step of one of the first and second clients surrendering a lock on a data object in order to initiate resolution of the deadlock includes the first client surrendering its lock on the first data object;

after the first client surrenders its lock on the first data object, the second client obtaining a lock on the first data object so that the second client simultaneously has locks on the first and second data objects;

the second client completing a transaction utilizing each of the first and second data objects;

after said completing step, the second client surrendering its locks on the first and second data objects;

after said surrendering step, the first client again obtaining a lock on the first data object and obtaining a lock on the second data object so that the first chent simultaneously has locks on the first and second data objects;

the first client completing a transaction utilizing each of the first and second data objects; and

after completing its transaction, the first client surrendering its locks on the first and second data objects.

10. The method of claim 1, wherein the steps recited are performed in the order in which they are recited, and wherein each of the clients includes a process or thread implementable by a computer system.

11. The method of claim 1 , wherein said step of transmitting information further includes the first client transmitting information to the second client to enable the second client to detect the deadlock.

12. The method of claim 1, further comprising the step of the second client determining whether to send certain information to the first client based at least in part upon whether the second client is aware of whether the first client has previously been sent the certain information.

13. A method of detecting deadlock comprising the steps of:

providing at least first and second clients in a deadlock where each of the first and second clients is waiting for an event that only another client in the deadlock can cause;

the first client detecting the deadlock based at least in part upon a message received by the first client from the second client including data indicative of a status of the second client; and

one of the first and second clients causing a lock on a data object to be surrendered in order to initiate resolution of the deadlock.

14. The method of claim 13, further comprising the step of

resolving the deadlock without having to restart any transaction involving either of the first and second clients; and wherein the status of the second client includes at least one of (a) an event that the second client is waiting for, and (b) an object exclusively locked by a client.

15. The method of claim 13, further comprising:

wherein said step of one of the first and second clients causing a lock on a data object to be surrendered includes the first client surrendering its lock on a first data object in a replicated database; after the first client surrenders its lock on the first data object, the second client obtaining a lock on the first data object so that the second client simultaneously has locks on the first data object and a second data object;

the first client again obtaining a lock on the first data object and obtaining a lock on the second data object so that the first client simultaneously has locks on the first and second data objects;

after completing its task, the first client surrendering its locks on the first and second data objects.

16. A distributed database network comprising:

first and second clients for performing first and second transactions, respectively;

a plurality of data objects which may be accessed by said first and second clients during performance of their respective transactions; and

said first client including a deadlock detection system for detecting a deadlock in which said first and second clients are involved based at least in part upon a message received by the first client from the second client including data indicative of an event or data object that the second client is waiting for.

17. The network of claim 16, wherein said deadlock detection system causes one of the first and second clients to surrender a lock on a data object in order to initiate resolution of the deadlock.

18. A deadlock detection system comprising:

first and second processes for performing first and second transactions, respectively, each transaction utilizing at least one common data object stored in a replicated database system; and

at least said first process comprising a deadlock detection system including dependency information stored relating to the first and second transactions, for detecting a deadlock in which said first and second clients are involved.

19. The system of claim 18, wherein each of said first and second processes comprises a deadlock detection system, each including a graph or table having information stored therein relating to the first and second transactions, for detecting a deadlock in which said first and second clients are involved, and wherein the graph or table of each process is updated based upon information received from the other client indicative of an event that the other client is waiting for.

20. The network of claim 18, further comprising means for causing one of the first and second clients to surrender a lock on a data object in order to initiate deadlock resolution.

21. The system of claim 18, wherein said dependency information includes only information relating to said first transaction .

22. A method of a first client in a distributed database network determining whether to send information relating to a lock on a data object to a second client, the method comprising the steps of:

the first client receiving a message from the data object, the message including information relating to the lock on the data object;

the first client determining whether a client is waiting for the lock to be released;

the first client determining whether the second client has a lock on the data object or is waiting for the lock on the data object to be released;

the first client sending a message to the second client including information relating to the lock on the data object when it is determined that a client other than the second client is waiting for the lock on the data object to be released.

23. The method of claim 22, wherein the first client does not send any message to the second client when it is determined that the second client holds a lock on the data object.

24. The method of claim 22, wherein the first client does not send any message to the second client when it is determined that no client is waiting for the first client to release a lock on a data object.

25. A method of detecting deadlock in a distributed database network, said method comprising the steps of: providing a client in a distributed database network; the client receiving lock information from a data object; the client storing at least a portion of the received lock information in a localized storage; and the client determining whether a deadlock scenario is present by analyzing the localized storage and determining therefrom whether a deadlock condition exists.

26. The method of claim 25, further comprising: the received lock information including a version indicator; and the client determining whether to ignore the lock information based upon the version value attached thereto.

27. The method of claim 25, wherein the client determines whether it should surrender a lock that it has on an object in order to resolve the deadlock condition; and wherein the localized storage comprises a wait-for-graph.

28. The method of claim 25, further comprising: determining which other clients to send lock information to; and the client sending at least a portion of the received lock information to at least one other client.

29. A method of handling a request from a client relating to an object in a distributed database network, said method comprising the steps of: providing the object and a client in the distributed database network; receiving a request from the client relating to a lock on the object; changing a version indicator value in response to receiving the request from the client, to an updated version indicator value; modifying a lock queue relating to the object in response to the request; and notifying a plurality of other clients that have respective lock requests in the lock queue of an updated status of the lock queue and the updated version indicator value.

30. The method of claim 29, wherein the request is one of a lock request seeking a lock on the object, an unlock request seeking to unlock a lock held on the object, and a surrender request.

31. The method of claim 29, wherein the request is a lock request and wherein said modifying step includes placing the lock request in the lock queue.

32. The method of claim 29, wherein the request is a surrender request, and wherein said modifying step includes changing the position of a lock request of the client that sent the request in the lock queue.

33. The method of claim 29, wherein the request is an unlock request, and wherein said modifying step includes removing a lock request of the client from the lock queue.