US5117352A - Mechanism for fail-over notification - Google Patents

Mechanism for fail-over notification

Publication number
US5117352A
Authority
US
United States
Prior art keywords
request
subroutine
application part
granted
resource
Prior art date
Legal status
Expired - Lifetime
Application number
US07/424,903
Other languages
English (en)
Inventor
Louis H. Falek
Current Assignee
Hewlett Packard Development Co LP
Original Assignee
Digital Equipment Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Equipment Corp
Priority to US07/424,903 (US5117352A)
Priority to GB9021576A (GB2237130B)
Priority to JP2281723A (JPH03194647A)
Priority to DE4033336A (DE4033336A1)
Priority to FR9013032A (FR2655168A1)
Application granted
Publication of US5117352A
Assigned to COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P. Assignment of assignors interest (see document for details). Assignors: COMPAQ COMPUTER CORPORATION, DIGITAL EQUIPMENT CORPORATION
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. Change of name (see document for details). Assignors: COMPAQ INFORMATION TECHNOLOGIES GROUP, LP
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing

Definitions

  • the invention relates to failure notification in a computer network wherein several co-operating parts of an application program are each running on a different node of the computer network.
  • each CPU running one of the parts of the application must function properly throughout the entire processing of the part. If one part of the application program fails due to a CPU crash, it is imperative that notification of the failure be made to enable a network manager to implement appropriate corrective actions. For example, the network manager can transfer the failed part of the application program to another CPU on the network for execution.
  • the present invention provides an automatic fail-over notification mechanism that is largely transparent to application developers and easily implemented by inserting a call subroutine instruction in each part of an application program.
  • the mechanism comprises a set of linked subroutines called by each part of the application program and operating through the use of a distributed lock manager to link and reverse link the various parts of the distributed application program.
  • the fail-over notification mechanism utilizes the links and reverse links to initiate a failure communication upon the crash of a CPU executing one part of the application program.
  • the failure communication contains information that is sufficient to initiate an automatic recovery from the failure without the need of manual intervention by a network operator or manager.
  • a distributed lock manager, such as the VMS Distributed Lock Manager marketed by DIGITAL EQUIPMENT CORPORATION, is a mechanism for coordinating network-wide access to files and their individual records and for synchronizing interprocess events across the entire network.
  • the lock mechanism permits a program developer to name each resource on the network, whether physical or logical, and to require an application program running on a CPU (i.e., a process) to request a lock for that resource prior to access to the named resource.
  • a database or a subset thereof can be a resource and a lock can be in one of several lock modes recognized by the lock manager. Lock modes can include various exclusive or shared write and read access privileges to the resource to share access with other processes or prevent access by other processes.
  • the lock manager also maintains granted and waiting queues for lock requests for each named resource and services waiting requests in a first come, first served manner.
  • a lock request is placed in a granted queue when the request is granted and is placed in a waiting queue when the lock request is incompatible with an already granted lock request.
  • Each process running in a network can enqueue and dequeue (release) lock requests from a lock request queue and, further, specify an asynchronous system trap in the lock request.
  • An asynchronous system trap is a routine that interrupts a running process upon the occurrence of an event and then executes.
  • the events can include the granting of a lock request or the making of a lock request that is incompatible with an already granted lock request.
  • When an asynchronous system trap is invoked by the granting of a lock request, it is referred to as a completion AST.
  • When an asynchronous system trap is invoked by the making of a lock request that is incompatible with an already granted lock request, it is referred to as a blocking AST.
  • a process that specifies an AST with its lock request will be interrupted by the routine relating to the specified AST when the lock request is granted, in the case of a completion AST, or when another process makes a request for a lock mode that is incompatible with the lock mode granted to the process, in the case of a blocking AST.
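  • The two kinds of AST can be sketched with a toy, single-resource lock that knows only one exclusive mode. This is an illustration under stated assumptions, not the VMS interface: `ToyResource`, `request` and `release` are hypothetical names, and the callbacks are invoked synchronously, whereas a real AST interrupts the running process.

```python
from collections import deque


class ToyResource:
    """Toy stand-in for one named resource with exclusive-only locks."""

    def __init__(self, name):
        self.name = name
        self.holder = None      # (owner, blocking_ast) of the granted lock
        self.waiting = deque()  # FIFO of (owner, completion_ast, blocking_ast)

    def request(self, owner, completion_ast=None, blocking_ast=None):
        if self.holder is None:
            self._grant(owner, completion_ast, blocking_ast)
            return
        # The new request is incompatible with the granted lock:
        # fire the holder's blocking AST, then queue the request.
        _, holder_blocking_ast = self.holder
        if holder_blocking_ast is not None:
            holder_blocking_ast(self.name, owner)
        self.waiting.append((owner, completion_ast, blocking_ast))

    def release(self, owner):
        if self.holder is not None and self.holder[0] == owner:
            self.holder = None
            if self.waiting:
                self._grant(*self.waiting.popleft())

    def _grant(self, owner, completion_ast, blocking_ast):
        self.holder = (owner, blocking_ast)
        # Granting the request fires the requester's completion AST.
        if completion_ast is not None:
            completion_ast(self.name, owner)


events = []
res = ToyResource("DEMO")
res.request(
    "A",
    completion_ast=lambda r, o: events.append(("completion", o)),
    blocking_ast=lambda r, o: events.append(("blocking", "A", o)),
)
res.request("B", completion_ast=lambda r, o: events.append(("completion", o)))
res.release("A")
```

  • In this run, process A receives a completion AST when its request is granted, a blocking AST when B asks for the same resource, and B receives its own completion AST once A releases the lock.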
  • a lock manager can include a communication facility which may comprise a lock value block associated with the lock queues for the resource name.
  • a lock value block may contain, e.g., 16 bytes of information which can be input and/or read by a process through the enqueuing and dequeuing of lock requests.
  • the information can comprise interprocess messages.
  • the present invention provides a fail-over notification mechanism by naming certain logical resources relating to processes that constitute the parts of a distributed application program and requiring a certain set of ordered compatible and incompatible lock requests for those resources to link and reverse link the processes through the lock request queues.
  • the present invention also specifies completion and blocking asynchronous system traps in the lock requests to cause the issuance of reverse linking lock requests and provide failure notification upon the crash of a CPU running one of the parts of the application program.
  • the logical resources include a "SPECIAL" resource, a "BROADCAST" resource, a "PERMISSION -- TO -- TALK" resource, and a resource uniquely named for each one of the parts of the application program.
  • the SPECIAL resource is used to select the process that is to receive failure notification whenever one of the other parts of the application fails and to take corrective action.
  • the BROADCAST resource utilizes the communication facility of the lock manager to advise the process holding the SPECIAL resource when a part of the application program enters or leaves the network.
  • the PERMISSION -- TO -- TALK resource operates in conjunction with the BROADCAST resource to assure transmission of messages between processes.
  • each process executing a portion of the application program names a resource that uniquely identifies the part, e.g., the resource name can be based upon the nodename where the process is located within the network.
  • the fail-over mechanism is activated by a single subroutine call instruction that is inserted in each part of an application program by a program developer.
  • the subroutine call instruction calls a SETUP routine.
  • the SETUP routine makes all of the lock requests and associated asynchronous system trap calls necessary to set up the fail-over mechanism.
  • the use of the SETUP routine, which is resident in the network and available to all application programs, makes the fail-over mechanism transparent to the application developer.
  • the SETUP routine makes three lock requests on behalf of the process which called it, as follows:
  • the lock request for the SPECIAL resource in EXCLUSIVE specifies a completion AST called SPECIAL -- AST. Since the request is for an EXCLUSIVE lock mode, only one of the processes of the application program, usually the first to make the request, will be granted the lock request. The other processes of the application program will have their respective lock requests placed in a LOCK REQUEST FOR SPECIAL IN EXCLUSIVE MODE WAITING queue.
  • the process granted the EXCLUSIVE mode request is then interrupted by the SPECIAL -- AST completion routine which makes that process assume responsibility for receiving and acting upon a failure notification (the process granted the EXCLUSIVE lock is referred to as the special server).
  • the SPECIAL -- AST utilizes the lock manager to obtain a list of all of the processes in the LOCK REQUEST FOR SPECIAL IN EXCLUSIVE WAITING queue.
  • the list derived from the LOCK REQUEST FOR SPECIAL IN EXCLUSIVE WAITING queue corresponds to all of the other processes of the application program.
  • the list provides a forward link between the process assigned responsibility for receiving failure notification and the other processes of the application program through the LOCK REQUEST FOR SPECIAL IN EXCLUSIVE WAITING queue.
  • the SPECIAL -- AST will then issue a lock request in EXCLUSIVE for each of the SERVER -- xxx resources corresponding to the processes in the waiting queue list. Since the lock request in EXCLUSIVE is incompatible with the PROTECTED WRITE locks granted to the listed processes corresponding to the SERVER -- xxx resources, each of the lock requests made by the SPECIAL -- AST on behalf of the special server will be placed in respective LOCK REQUEST FOR SERVER -- xxx IN EXCLUSIVE WAITING queues, one for each process in the list.
  • the lock request in EXCLUSIVE for each SERVER -- xxx will also specify a completion AST called a FAIL -- AST.
  • the exclusive lock requests made on behalf of the special server by the SPECIAL -- AST provide reverse links between the special server and all of the other parts of the application program through the waiting queues of the SERVER -- xxx resources.
  • the FAIL -- AST completion routine will interrupt the running of the process on the special server.
  • the FAIL -- AST runs a routine to identify which process has failed through the use of the unique name of the SERVER -- xxx resource for which the special server has been granted the EXCLUSIVE lock and can, for example, either send a message to a specified mailbox or call a routine prepared by the developer of the application program to automatically handle a failure of a part of the program.
  • FIG. 1 is a block diagram of a computer network.
  • FIG. 2 is a block diagram of the computer network of FIG. 1 including an illustration of a program running on several components of the computer network.
  • FIG. 3 is a detail of a job record stored in the disk file of the computer network illustrated in FIG. 2.
  • FIG. 4 is a flow chart of a SETUP routine according to the present invention.
  • FIG. 5 is a flow chart of a SPECIAL -- AST routine according to the present invention.
  • FIG. 6 is a logical block diagram of several parts of an application program linked and reverse linked to one another through lock manager queues according to the present invention.
  • FIG. 7 is a flow chart of a FAIL -- AST routine according to the present invention.
  • FIG. 8 illustrates side-by-side flow charts for the CLUSTER -- BROADCAST and MSG -- AST routines according to the present invention.
  • Referring to FIG. 1, there is illustrated a computer network including several CPU's 10, 11, 12, 13, a user interface 14 and a disk file database 15.
  • the CPU's 10, 11, 12, 13, user interface 14 and disk file 15 are all coupled to one another by a common bus 16.
  • Each of the CPU's 10, 11, 12 and 13 comprises a self-contained data processing unit with its own main memory and can communicate with the other components of the network over the common bus 16.
  • the disk file database 15 is a shared resource that is accessible to all of the CPU's 10, 11, 12, 13 and the user interface 14.
  • the disk file database 15 contains a job record 16 for each application program on the network.
  • the job record 16 is a data field containing information which identifies the program and indicates such pertinent attributes of the program as, for example, a run program instruction, a request to run program, a CPU field to identify the CPU's 10, 11, 12, 13 where the program is to be run, a distribution list to identify which parts of the program are to be run separately on the various CPU's 10, 11, 12, 13, action to be taken upon the crash of any one of the CPU's, a data output file location, a mailbox location to receive messages and any other information required to run the program on the network.
  • a network operator will utilize the user interface 14 to access the job record 16 for the program in the disk file database 15 and insert a request to run the program into the request to run program field of the job record 16.
  • the network operator will then instruct a program executor 17 on, e.g., CPU 10, to look at the job record 16 of the application program.
  • the program executor 17 will read the data in the job record 16, including the request to run the program inserted by the network operator, and implement the execution of the application program in accordance with the information contained in the job record 16.
  • the program may comprise a batch data processing program which specifies, in its distribution field, that three different parts 18, 19, 20 of the program are to be run concurrently, one on each of the CPU's 11, 12, 13.
  • a distributed lock manager 21 and an object library 22 are each located on each CPU 10, 11, 12, 13.
  • the object library 22 contains a series of subroutines that can be called by each part 18, 19, 20 of the application program.
  • the subroutines operate through the use of the lock manager 21 to assign notification responsibility to one of the parts 18, 19, 20 (hereinafter referred to as the special server) and form links and reverse links between the special server and the other parts 18, 19, 20 of the application program.
  • the distributed lock manager 21 can comprise a VMS Distributed Lock Manager which permits a process to name a resource and request a specified "lock" on the resource name.
  • the resource can be any logical or physical resource in the computer network and the lock represents a write and read access privilege to the resource.
  • the specific type of lock is referred to as a lock mode.
  • the various lock modes available on the VMS Distributed Lock Manager are as follows:
  • the lock modes provide a facility to coordinate access to a resource among various processes running throughout a computer network.
  • the EXCLUSIVE lock mode makes a process the owner of a named resource since no other process can either read from or write to the resource while it is in the EXCLUSIVE lock mode.
  • a PROTECTED WRITE lock mode is less restrictive because, while only one process can write to the named resource, other processes are free to read from the named resource.
  • the other lock modes are less restrictive to provide varying degrees of concurrent and exclusive read and write access privileges to a named resource.
  • the lock manager 21 does not actively enforce the lock scheme so that it is important that each process makes an appropriate lock request before attempting to access a resource for proper operation of the lock scheme.
  • Enforcement of the lock scheme is implemented by convention, e.g., by always having a process make the appropriate lock request for a named resource and permitting the process to proceed only if the lock request is granted.
  • For example, if a process wants to read from a database that is a named resource, the process will request a CONCURRENT READ lock on the database. If that database is already subject to a granted EXCLUSIVE lock mode, the request for CONCURRENT READ will not be compatible with the already granted EXCLUSIVE lock mode for the resource. Thus, the lock manager 21 will not grant the request for CONCURRENT READ.
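  • The compatibility check in this example can be sketched as a table lookup. The table below is an assumption: it follows the compatibility matrix conventionally documented for the VMS lock manager's six modes (NULL, CONCURRENT READ, CONCURRENT WRITE, PROTECTED READ, PROTECTED WRITE, EXCLUSIVE), abbreviated here as NL, CR, CW, PR, PW and EX, and is not reproduced from this document. `COMPATIBLE` and `is_compatible` are hypothetical names.

```python
# For each granted mode, the set of requested modes that can be granted
# alongside it (assumed from the conventional VMS lock-mode matrix).
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def is_compatible(requested, granted_modes):
    """True if `requested` is compatible with every already granted mode."""
    return all(requested in COMPATIBLE[granted] for granted in granted_modes)


# A CONCURRENT READ request against a granted EXCLUSIVE lock must wait.
read_vs_exclusive = is_compatible("CR", ["EX"])
# The same request against a granted PROTECTED WRITE lock can be granted.
read_vs_protected_write = is_compatible("CR", ["PW"])
```

  • The same lookup explains why an EXCLUSIVE request on a resource already held in PROTECTED WRITE is queued rather than granted, which is the property the fail-over mechanism relies on.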
  • the lock manager 21 interacts with processes running on the various CPU's of the computer network through certain system services made available to the processes, as follows:
  • the lock manager 21 maintains a lock database that comprises a plurality of queues containing lock request information.
  • Each named resource will have a series of queues for each lock mode including, for each lock mode, a granted queue that lists the processes granted the lock mode, a waiting queue that lists outstanding requests by processes which have not been granted because the lock mode requested is incompatible with another already granted lock request and a conversion queue that lists lock requests granted at one mode and waiting for conversion to another lock mode.
  • the lock manager 21 will process each lock request by referring to the lock database to check for compatibility with already granted requests and then proceed to either grant the lock request or place the request in a waiting queue. When a process releases a granted lock mode, the lock manager 21 will look up the waiting queue for that lock mode and then grant the lock mode, on a first come, first served basis, to the next lock request in the waiting queue.
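  • A minimal sketch of the grant-or-wait decision and the first come, first served hand-off on release follows. `LockQueues`, `enqueue` and `dequeue` are hypothetical names, the compatibility rule is supplied by the caller, and holding back a new request whenever older requests are still waiting is an assumption made here to keep the ordering strictly first come, first served.

```python
from collections import deque


class LockQueues:
    """Granted and waiting queues for one named resource (toy model)."""

    def __init__(self, compatible):
        self.compatible = compatible  # callable(requested, granted) -> bool
        self.granted = []             # list of (owner, mode)
        self.waiting = deque()        # FIFO of (owner, mode)

    def _fits(self, mode):
        return all(self.compatible(mode, g) for _, g in self.granted)

    def enqueue(self, owner, mode):
        # Assumption: a request never overtakes older waiting requests.
        if not self.waiting and self._fits(mode):
            self.granted.append((owner, mode))
            return "granted"
        self.waiting.append((owner, mode))
        return "waiting"

    def dequeue(self, owner):
        # Release the owner's granted lock, then serve waiters in order.
        self.granted = [(o, m) for o, m in self.granted if o != owner]
        while self.waiting and self._fits(self.waiting[0][1]):
            self.granted.append(self.waiting.popleft())


# Illustrative rule: only CONCURRENT READ ("CR") locks can coexist.
queues = LockQueues(lambda requested, granted: requested == granted == "CR")
first = queues.enqueue("A", "EX")
second = queues.enqueue("B", "CR")
third = queues.enqueue("C", "CR")
queues.dequeue("A")
```

  • After A releases its EXCLUSIVE lock, both waiting CONCURRENT READ requests are granted in arrival order.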
  • the lock manager 21 also maintains a lock status block for each lock for a resource name to indicate into which queue the lock requests for that lock have been placed.
  • a lock value block is linked to each lock status block.
  • the lock value block can comprise, e.g., 16 bytes of information that can be either read or written by the processes granted the lock request for that lock.
  • the lock value block will be available to a next process when it is granted the lock request for the lock mode of the named resource.
  • the lock value block can be utilized as a communication facility to pass 16 byte messages between processes.
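  • Passing a short message through a 16-byte lock value block can be sketched as fixed-width packing. The helper names `pack_lvb` and `unpack_lvb`, the ASCII encoding, the NUL padding and the sample message text are all assumptions made for illustration; the document does not specify a message layout.

```python
LVB_SIZE = 16  # bytes available in the lock value block, per the text


def pack_lvb(message: str) -> bytes:
    """Encode a short message into exactly LVB_SIZE bytes (NUL padded)."""
    data = message.encode("ascii")
    if len(data) > LVB_SIZE:
        raise ValueError(f"message exceeds {LVB_SIZE} bytes")
    return data.ljust(LVB_SIZE, b"\0")


def unpack_lvb(block: bytes) -> str:
    """Decode a value block written by pack_lvb."""
    if len(block) != LVB_SIZE:
        raise ValueError(f"value block must be {LVB_SIZE} bytes")
    return block.rstrip(b"\0").decode("ascii")


block = pack_lvb("JOIN CPU12")  # hypothetical message text
message = unpack_lvb(block)
```

  • A writer would store `block` when it holds the lock, and the next process granted the lock would read it back with `unpack_lvb`.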
  • each lock request can specify a completion AST or a blocking AST to synchronize resource access among the processes running on the computer network.
  • a completion AST will interrupt a process when a lock request made by the process is granted and a blocking AST will interrupt the process when another process makes a lock request that is incompatible with the lock request already granted to the process.
  • the object library 22 residing in each of the CPU's 11, 12, 13 contains a set of linked subroutines that can be called directly or indirectly by respective ones of the co-operating parts 18, 19, 20 of the application program running on the CPU's 11, 12, 13, respectively.
  • the subroutines are SETUP, CLUSTER -- BROADCAST, MSG -- AST, SPECIAL -- AST, FAIL -- AST, LIST -- SERVER, SHOW -- SERVER and UPDATE -- INFO.
  • the subroutines utilize the lock manager 21 to link and reverse link the co-operating application parts 18, 19, 20 by naming logical resources relating to the application parts 18, 19, 20 and making a series of compatible and incompatible lock requests on the named resource to link the parts 18, 19, 20 through the lock queues maintained by the lock manager 21.
  • the logical resources include SPECIAL, BROADCAST, PERMISSION -- TO -- TALK, and a SERVER -- xxx resource corresponding to each part 18, 19, 20 of the application program.
  • the developer of the application program can utilize the fail-over mechanism of the present invention by inserting a single call subroutine instruction in each part 18, 19, 20 of the application program to call the SETUP routine from the object library 22 in the CPU 11, 12, 13 where the part is running.
  • the call subroutine instruction will specify the SETUP routine and provide an argument to uniquely identify the application program to which the part calling the subroutine belongs.
  • the call subroutine can also provide an argument identifying an existing mail box channel in the network for receiving messages from the fail-over mechanism. A zero indication from the mailbox channel will tell the SETUP routine to open a mailbox channel for the application program.
  • the argument can specify an asynchronous system trap written by the program developer which is to interrupt the application program and run upon a failure notification.
  • the asynchronous system trap can specify corrective actions.
  • All of the subroutines running on a particular CPU 11, 12, 13 in the network co-operate to provide a common memory space in the main memory of the respective CPU 11, 12, 13 to store information relating to the fail-over mechanism set up for the specific application program.
  • the common memory space is automatically set up by the SETUP routine and is accessible to all of the routines in the object library 22. All of the names of the resources for a specific application program will contain a prefix corresponding to the specific application program, i.e. the prefix passed as an argument in the call to the SETUP routine.
  • the common memory space stores information relating to the part 18, 19, 20 running on the specific CPU 11, 12, 13. The information indicates that the part 18, 19, 20 has called SETUP and is in the fail-over mechanism and, further, whether the part is the special server.
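  • The prefix convention for resource names can be sketched as follows. The text states only that every resource name carries the application prefix and that each part's resource name is unique, e.g. based on its node name; the underscore separator, the exact name layout and the helper names `resource_names` and `node_from_server_resource` are assumptions.

```python
def resource_names(prefix: str, node: str) -> dict:
    """Build the four resource names used by one part of an application."""
    return {
        "special": f"{prefix}_SPECIAL",
        "broadcast": f"{prefix}_BROADCAST",
        "permission_to_talk": f"{prefix}_PERMISSION_TO_TALK",
        "server": f"{prefix}_SERVER_{node}",  # unique per part
    }


def node_from_server_resource(prefix: str, resource_name: str) -> str:
    """Recover the node name from a per-part resource name."""
    head = f"{prefix}_SERVER_"
    if not resource_name.startswith(head):
        raise ValueError("not a server resource of this application")
    return resource_name[len(head):]


names = resource_names("JOB1", "CPU11")  # hypothetical prefix and node
node = node_from_server_resource("JOB1", names["server"])
```

  • Keeping the node name recoverable from the resource name is what later lets a failure handler work out which part has failed from the name of the lock it was granted.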
  • the call SETUP instruction 100 is in each part 18, 19, 20 of the application program such that the routine will be run separately for each part 18, 19, 20.
  • the SETUP routine starts by making a request for the BROADCAST resource in a CONCURRENT READ lock mode 101 on behalf of the part that called the SETUP routine by utilizing the $ENQW service of the lock manager 21.
  • the SETUP routine then waits for the lock mode to be granted 102.
  • the $ENQW service automatically waits for the granting of the lock.
  • the loop illustrated in the flow chart is for illustration purposes to indicate that the routine does not proceed until the lock is granted.
  • the lock will be granted since all of the SETUP routines request the CONCURRENT READ lock mode, and CONCURRENT READ requests are compatible with one another.
  • the request for the BROADCAST resource will also specify the MSG -- AST routine which is a blocking AST, as will be described in more detail below.
  • After the granting of the request for the BROADCAST resource, the SETUP routine makes a request 103 for a resource which it names in the request, e.g., after the node (i.e., the CPU 11, 12, 13 where the part 18, 19, 20 is running), such as SERVER -- CPU 11, in a PROTECTED WRITE lock mode.
  • the VMS system used in the representative embodiment of the present invention provides a service called $GETSYI which can be called by any process to identify the node in the network where the process is running.
  • the SERVER -- xxx resource can be named directly after the part 18, 19, 20 which called the SETUP routine. All that is required is that the resource name be unique to the specific part 18, 19, 20 of the application program.
  • the SETUP routine will again use the $ENQW service of the lock manager 21.
  • the SETUP routine then waits 104 for the request for the SERVER -- xxx to be granted. This request will also be granted since each SETUP routine is requesting a PROTECTED WRITE lock mode on a specific resource named after its corresponding node which is different than the resource names for which a lock is being requested by the other SETUP routines.
  • the SETUP routine will then make a request through the $ENQ service for the SPECIAL resource in an EXCLUSIVE lock mode 105.
  • the request for the SPECIAL resource specifies the SPECIAL -- AST which is a completion AST. Since all of the SETUP routines are making a request in EXCLUSIVE, only one of the SETUP routines, usually the first in time to make the request, will be granted the EXCLUSIVE lock mode. The other requests made by the other SETUP routines running on the network will each be incompatible with the granted EXCLUSIVE lock mode. These requests will then be placed in a REQUEST FOR SPECIAL IN EXCLUSIVE WAITING queue by the distributed lock manager 21. As indicated, the part 18, 19, 20 granted the EXCLUSIVE lock mode is referred to as the special server and the SETUP routine for that part 18, 19, 20 sets a bit in the respective common memory space to indicate special server status.
  • After making the request for the SPECIAL resource in an EXCLUSIVE lock mode, the SETUP routine checks the common memory to see if the corresponding part 18, 19, 20 is the special server 106. If the corresponding part 18, 19, 20 is the special server, the SETUP routine returns control to the part 18, 19, 20 (107). If the corresponding part 18, 19, 20 is not the special server, the SETUP routine proceeds to call the CLUSTER -- BROADCAST routine 108, which will be explained in more detail below together with the related MSG -- AST blocking routine and the BROADCAST resource.
  • Since each SETUP routine specifies the SPECIAL -- AST completion routine in the request for the SPECIAL resource, the one part 18, 19, 20 granted the SPECIAL resource in an EXCLUSIVE lock mode through the called SETUP routine will be interrupted by the SPECIAL -- AST routine.
  • the SPECIAL -- AST routine initially utilizes the $GETLKIW service 201 of the lock manager 21 to obtain a list of all processes currently in the REQUEST FOR SPECIAL IN EXCLUSIVE WAITING queue.
  • the SPECIAL -- AST then makes a request 202 for an EXCLUSIVE lock mode through the $ENQW service of the lock manager 21 for each of the SERVER -- xxx resources corresponding to the processes in the list obtained from the $GETLKIW service.
  • the SPECIAL -- AST can either refer to the job record 16 or use the $GETSYI service to correlate the part 18, 19, 20 to the node name where the part is running.
  • SPECIAL -- AST can formulate the SERVER -- xxx names of the resources for which it makes the requests for an EXCLUSIVE lock.
  • Each request for the SERVER -- xxx will specify the FAIL -- AST completion routine.
  • each request for the SERVER -- xxx resources made by the SPECIAL -- AST will be incompatible with the PROTECTED WRITE lock mode already granted to each process on the list (See 103, 104 FIG. 4).
  • the requests made by SPECIAL -- AST on behalf of the special server will each be placed in a respective REQUEST FOR SERVER -- xxx EXCLUSIVE WAITING queue by the lock manager 21.
  • the SPECIAL -- AST will return control to the part 18, 19, 20 made special server 203.
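  • The SPECIAL -- AST steps (201 to 203 of FIG. 5) can be sketched in the same style. The stub and its method names are hypothetical: `waiting_nodes` stands in for the $GETLKIW query and is assumed to return node names directly, whereas the text has the routine correlate waiting processes to node names through the job record or $GETSYI; `request_lock` only records the request and never blocks.

```python
class StubLockManager:
    """Toy stub holding the nodes whose SPECIAL requests are waiting."""

    def __init__(self, special_waiters):
        self.special_waiters = special_waiters
        self.requests = []

    def waiting_nodes(self, resource):
        # Stands in for the $GETLKIW query on the SPECIAL waiting queue.
        return list(self.special_waiters)

    def request_lock(self, resource, mode, completion_ast):
        # Records the request; in the real mechanism it would sit in the
        # waiting queue behind the part's PROTECTED WRITE lock.
        self.requests.append((resource, mode, completion_ast))


def special_ast(lm, prefix, common, fail_ast):
    common["special_server"] = True                            # mark status
    for node in lm.waiting_nodes(f"{prefix}_SPECIAL"):         # 201
        lm.request_lock(f"{prefix}_SERVER_{node}", "EX", fail_ast)  # 202
    # 203: control returns to the part that became the special server


def fail_ast(resource):
    pass  # placeholder; failure handling is described later in the text


lm = StubLockManager(["CPU12", "CPU13"])
common = {"special_server": False}
special_ast(lm, "JOB1", common, fail_ast)
queued = [(resource, mode) for resource, mode, _ in lm.requests]
```

  • Each queued EXCLUSIVE request is one reverse link: it is granted only if the lock held by the corresponding part disappears.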
  • Referring to FIG. 6, there is illustrated a logical block diagram of the parts 18, 19, 20 of the application program as linked and reverse linked by the running of the SETUP and SPECIAL -- AST routines.
  • each of the several SETUP routines called by the parts 18, 19, 20 stores information regarding the corresponding part 18, 19, 20 in a common memory space in the respective CPU 11, 12, 13.
  • the SPECIAL -- AST routine for the part 18, 19, 20 granted the EXCLUSIVE lock mode for the SPECIAL resource sets a bit in the field of the common memory space associated with the information on the corresponding part 18, 19, 20 to indicate special server status.
  • the corresponding lock request made by part 18 through the SETUP routine in the object library 22 resident in the CPU 11, where part 18 is running, will be in the granted queue 51 for the EXCLUSIVE lock mode of the SPECIAL resource maintained by the distributed lock manager 21.
  • a forward link between the special server and the other parts 19, 20 not granted the EXCLUSIVE lock mode on SPECIAL is provided by the REQUEST FOR SPECIAL IN EXCLUSIVE WAITING queue 53, which is accessible 54 to the special server through the $GETLKI service of the lock manager 21 (See FIG. 5).
  • a reverse link 55 between the special server and each other part 19, 20 of the application program is provided by the REQUEST SERVER -- xxx EXCLUSIVE WAITING queues 56 set up by the lock manager 21 upon the making of the EXCLUSIVE requests on behalf of the special server (part 18) by the SPECIAL -- AST 202 (FIG. 5).
  • each SERVER -- xxx resource will have a corresponding granted queue 57 for the PROTECTED WRITE mode which contains the request made by the SETUP routines on behalf of the corresponding part 18, 19, 20.
  • the granted queues 57 complete the reversed link 55 through the link 58 between each part 19, 20 and the corresponding granted queue 57.
  • the phantom line indication of a fourth part of the application represents an additional part which may enter the network after the running of the SPECIAL -- AST.
  • the setting up of the reverse link between the special server and the new part will be explained later in connection with the BROADCAST resource.
  • the LIST -- SERVER, UPDATE -- INFO and SHOW -- SERVER routines in the object library 22 are all provided as services to any application developer so that an application developer can call any one of the routines to utilize the services of the lock manager 21 in respect of the resources named by the fail-over mechanism.
  • each part 18, 19, 20 can call the UPDATE -- INFO routine to write information regarding the part such as, e.g. software ID, version no., step currently executing, etc., into the lock value block associated with the lock for the corresponding SERVER -- xxx resource.
  • the UPDATE -- INFO routine permits an application developer to utilize the lock value block feature of the VMS lock manager 21.
  • the SHOW -- SERVER routine enables any process running in the network to obtain and read the lock value block of the SERVER -- xxx of any part 18, 19, 20 to obtain the information in the lock value block.
  • the LIST -- SERVER routine enables any process running in the network to list the requests in the lock manager queues to thereby obtain information regarding the parts of a specific application program and where they are running.
  • a feature of the VMS Distributed Lock Manager is a monitoring function covering the operation of the components within the network.
  • the distributed lock manager 21 will invalidate the lock requests made from processes on the failed CPU and reform the queues accordingly.
  • the granted queue for the SERVER -- xxx of the part 19, 20 running on the failed CPU will be emptied and the lock manager 21 will look up and grant the waiting request for EXCLUSIVE on the SERVER -- xxx made by the SPECIAL -- AST on behalf of the special server.
  • the FAIL -- AST completion routine specified by the EXCLUSIVE request (See FIG. 5) will then interrupt the running of part 18 and execute.
  • the FAIL -- AST first calls a wild card $GETLKIW 300 to obtain information on the specific EXCLUSIVE request that has been granted to the special server.
  • a wild card $GETLKIW returns information on all of the lock requests of a process and will therefore return information on all of the SERVER -- xxx resource queues to the part that is the special server.
  • the one wherein the special server is the holder of the EXCLUSIVE lock mode corresponds to the part 19, 20 running on the CPU 12, 13 which has failed.
  • the name of the resource enables the FAIL -- AST to identify the part 19, 20 which has failed due to a CPU malfunction.
  • the FAIL -- AST can look up the job record 16 for the application program and correlate the node name used to name the SERVER -- xxx to the part 18, 19, 20 running on the node.
  • the FAIL -- AST will then notify 301 the special server by either writing a failure message identifying the failed part to the mailbox specified in the call subroutine instruction argument inserted into the part or calling an asynchronous system trap specified by the application developer. Either way, information on the failure of the one part of the application program is automatically provided to the application program in such a manner that corrective action can be taken.
  • the asynchronous system trap specified in the call to SETUP can read the job record 16 field containing the action to be taken on crash information.
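The failure-notification path described above can be sketched with a toy lock manager. Everything here is an illustrative assumption: `ToyLockManager`, `Resource`, `fail_ast` and the node names are hypothetical, the model only knows "one granted request, the rest wait", and the real mechanism relies on the VMS distributed lock manager rather than anything like this class.

```python
class Resource:
    """One SERVER_xxx resource: a granted request plus a waiting queue."""

    def __init__(self, name):
        self.name = name
        self.granted = None   # (owner_node, mode) currently granted
        self.waiting = []     # [(owner_node, mode, completion_callback), ...]


class ToyLockManager:
    def __init__(self):
        self.resources = {}

    def enqueue(self, name, owner, mode, on_grant=None):
        res = self.resources.setdefault(name, Resource(name))
        if res.granted is None:
            res.granted = (owner, mode)
            if on_grant:
                on_grant(name)
        else:
            # Simplification: PROTECTED WRITE and EXCLUSIVE are incompatible,
            # so any request on a held resource waits.
            res.waiting.append((owner, mode, on_grant))

    def node_failed(self, node):
        """Invalidate requests from the failed node and re-form the queues."""
        for res in self.resources.values():
            res.waiting = [w for w in res.waiting if w[0] != node]
            if res.granted and res.granted[0] == node:
                res.granted = None
                if res.waiting:
                    owner, mode, on_grant = res.waiting.pop(0)
                    res.granted = (owner, mode)
                    if on_grant:
                        on_grant(res.name)


failures = []


def fail_ast(resource_name):
    # The resource name carries the node name, which identifies the failed part.
    failures.append(resource_name[len("SERVER_"):])


lm = ToyLockManager()
for node in ("NODEB", "NODEC"):
    lm.enqueue("SERVER_" + node, node, "PW")                 # made by SETUP
    lm.enqueue("SERVER_" + node, "NODEA", "EX", fail_ast)    # made by SPECIAL_AST
lm.node_failed("NODEB")
```

After `node_failed("NODEB")` the special server on the hypothetical NODEA holds EXCLUSIVE on `SERVER_NODEB`, and `failures` records which part was lost; the request on `SERVER_NODEC` is still waiting.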
  • FIG. 8 illustrates side-by-side flow charts for the CLUSTER -- BROADCAST and MSG -- AST routines of the object library 22.
  • the CLUSTER -- BROADCAST and MSG -- AST routines run in conjunction with the locking scheme for the BROADCAST and PERMISSION -- TO -- TALK resources to provide a communication facility between the parts 18, 19, 20 of the application program.
  • the SETUP routine calls CLUSTER -- BROADCAST 106 immediately after making a request for the EXCLUSIVE lock on SPECIAL 105 if the corresponding part 18, 19, 20 is not the special server.
  • In the event a part that is not the special server (See, e.g., phantom line in FIG. 6) starts running after the SPECIAL -- AST has finished, the special server would not have included a request for the SERVER -- xxx resource corresponding to the new part. This is because the SETUP routine called by the new part would not have made a request for the SPECIAL resource in EXCLUSIVE until after the SPECIAL -- AST has run and thus, the request made by the new part would not have been in the REQUEST FOR SPECIAL IN EXCLUSIVE WAITING queue.
  • the CLUSTER -- BROADCAST routine is utilized to send two types of messages.
  • One type is a predefined message that the MSG -- AST routine will recognize and perform certain actions in response thereto.
  • Examples of predefined messages are a message sent by the SETUP routine upon the entry of a new part into the network and a message sent by SPECIAL -- AST when a new part becomes special server.
  • the other type of message can be passed directly to the application program, e.g. passed to the mailbox specified in the call subroutine argument.
  • the CLUSTER -- BROADCAST routine initially makes a request for the PERMISSION -- TO -- TALK resource in the EXCLUSIVE lock mode 400.
  • the CLUSTER -- BROADCAST routine waits 401 until the request has been granted and then looks to see if it already has a lock on the BROADCAST resource 402. This can be accomplished by looking into the common memory set up by the SETUP routine to see if the process calling CLUSTER -- BROADCAST previously called the SETUP routine.
  • If the calling part already holds a lock on BROADCAST, the CLUSTER -- BROADCAST routine will proceed to request a conversion from the CONCURRENT READ lock mode to the EXCLUSIVE lock mode for BROADCAST 403.
  • If the part which called CLUSTER -- BROADCAST does not have a lock on BROADCAST, the CLUSTER -- BROADCAST routine will make a new request for the BROADCAST resource in the EXCLUSIVE lock mode 404.
  • This aspect of the CLUSTER -- BROADCAST routine makes the communication facility available to any process running on the network, whether it is a part of the application program or not. Thus, any process can send a message to the application program even if it is not a part of the program.
  • the MSG -- AST blocking routine specified by each part 18, 19, 20 of the application program will interrupt the respective part 18, 19, 20 and run 500.
  • the blocking routine will run at this time since the EXCLUSIVE request made by the new part is incompatible with the CONCURRENT READ lock mode previously granted to each part (See 102, FIG. 5).
  • the MSG -- AST routine initially makes a request to convert the CONCURRENT READ lock on the BROADCAST resource to the NULL mode, which is compatible with the EXCLUSIVE lock mode sought by the new part (501).
  • the MSG -- AST routine utilizes the $GETLKI service 502 of the lock manager 21 to obtain information on the granted EXCLUSIVE lock queue of the BROADCAST resource to see if the new part has been granted the EXCLUSIVE lock mode 503. If the lock mode has not been granted, the routine loops back 504, e.g., every 100 milliseconds, to 502 until the EXCLUSIVE lock mode has been granted. Once the EXCLUSIVE lock mode has been granted, the MSG -- AST routine makes a request to convert the NULL mode for BROADCAST back to the CONCURRENT READ mode 505.
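The lock-mode reasoning above (EXCLUSIVE is incompatible with CONCURRENT READ but compatible with NULL) can be expressed as a compatibility table. The table below follows the commonly documented six-mode matrix of the VMS lock manager; the text itself relies only on the EXCLUSIVE, PROTECTED WRITE, CONCURRENT READ and NULL entries, and the helper names `can_grant` and `holders_to_interrupt` are hypothetical.

```python
# Lock modes: NL (NULL), CR (CONCURRENT READ), CW (CONCURRENT WRITE),
# PR (PROTECTED READ), PW (PROTECTED WRITE), EX (EXCLUSIVE).
# COMPATIBLE[m] is the set of already-granted modes that mode m can coexist with.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def can_grant(requested, granted_modes):
    """True if `requested` is compatible with every mode already granted."""
    return all(mode in COMPATIBLE[requested] for mode in granted_modes)


def holders_to_interrupt(requested, granted):
    """Given {holder: mode}, return the holders whose blocking routine would run."""
    return sorted(
        holder for holder, mode in granted.items()
        if mode not in COMPATIBLE[requested]
    )
```

With every part holding CONCURRENT READ on BROADCAST, an EXCLUSIVE request blocks and each holder is interrupted; once they have all converted to NULL, the same request can be granted.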
  • the CLUSTER -- BROADCAST waits 405 for the EXCLUSIVE lock mode to be granted. Once it is granted, the CLUSTER -- BROADCAST routine writes a message 407 to the lock value block associated with the BROADCAST lock. If the process which called the CLUSTER -- BROADCAST routine is a new part, the message is directed to the special server and identifies the new part of the application program.
  • the CLUSTER -- BROADCAST routine looks to see if the calling process had a lock on BROADCAST prior to the calling of the CLUSTER -- BROADCAST routine 408, i.e., whether the process is a part of the application program and made a request for a BROADCAST CONCURRENT READ lock through the SETUP routine, or is an unrelated process sending a message to the application program. If the process did not initially have a lock on BROADCAST, the CLUSTER -- BROADCAST routine dequeues the EXCLUSIVE request 409. If the process did have a CONCURRENT READ lock, the CLUSTER -- BROADCAST routine makes a request to convert the lock mode from EXCLUSIVE back to CONCURRENT READ 410.
  • the MSG -- AST routine having made a request to convert the lock mode from NULL to CONCURRENT READ 505, then waits for the conversion 506, which will be granted after the CLUSTER -- BROADCAST request to dequeue or convert EXCLUSIVE 409, 410 is granted 507.
  • the MSG -- AST routine will then obtain and read 508 the lock value block.
  • a convention can be used in the lock value block to address a message.
  • an address field contained in the lock value block can indicate a node name to specify a message to a particular part of the application program, an asterisk in the address field can indicate that the message is directed to all of the parts 18, 19, 20 and a blank in the address field can indicate that the message is for the special server. If the message 509 is not for the part 18, 19, 20 which has been interrupted by the MSG -- AST routine, the MSG -- AST will return 510 control to the corresponding part 18, 19, 20.
  • the MSG -- AST will look at the special server status bit in the common memory and when it sees that it is set proceed 510 to read the message.
  • the message will be the predefined message sent by SETUP that indicates that a new part has entered the network.
  • the MSG -- AST responds to the predefined message by making a request for an EXCLUSIVE lock mode for the SERVER -- xxx resource corresponding to the new part 511. Before making the request, however, the MSG -- AST can utilize the $GETLKI service of the lock manager 21 to verify whether an EXCLUSIVE lock mode request for the SERVER -- xxx has already been made by the special server.
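The addressing convention described above (node name, asterisk, or blank) reduces to a small predicate. This is a minimal sketch: the function name `message_is_for`, the use of plain strings for the address field, and the whitespace trimming are assumptions made for illustration.

```python
def message_is_for(address, node_name, is_special_server):
    """Decide whether a message read from the lock value block concerns this part.

    address            -- contents of the address field
    node_name          -- node name of the part running this check
    is_special_server  -- the special server status bit from common memory
    """
    address = address.strip()
    if address == "*":
        return True                  # message for all parts
    if address == "":
        return is_special_server     # blank: message for the special server only
    return address == node_name      # otherwise addressed to one named node
```

A part for which this returns `False` would simply return control to the interrupted code, as the text describes.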
  • the CLUSTER -- BROADCAST routine completes by calling $GETLKI for the BROADCAST NULL mode 411.
  • the CLUSTER -- BROADCAST routine then checks to see if any of the other processes are still in the NULL mode 412, i.e. the request to convert has not yet been granted (505, 506) 413. If any of the requests have not yet been granted, the CLUSTER -- BROADCAST routine loops 414 back to the $GETLKI instruction 411, e.g. every 100 milliseconds, until there is no part still waiting conversion from the NULL mode to the CONCURRENT READ mode. At that time, 415, the CLUSTER -- BROADCAST routine dequeues the PERMISSION -- TO -- TALK EXCLUSIVE lock mode 416 and returns control to the new part 417.
  • the CLUSTER -- BROADCAST routine, through the EXCLUSIVE lock on the PERMISSION -- TO -- TALK resource 400 and the waiting loops 414, 504, ensures that the message written into the lock value block is received by the intended part of the application program, e.g., the special server.
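The waiting loop at 411 to 414 can be sketched as a polling helper. This is an assumption-laden illustration: `get_modes` is a hypothetical stand-in for the $GETLKI query on BROADCAST, the injectable `sleep` and the `max_polls` cut-off are conveniences added here, and the 100 millisecond interval is only the example value the text mentions.

```python
import time


def wait_until_no_null(get_modes, interval=0.1, sleep=time.sleep, max_polls=None):
    """Poll until no holder of BROADCAST is still in NULL mode.

    get_modes -- callable returning the current lock modes of all holders
    Returns the number of polls that were needed.
    """
    polls = 0
    while True:
        polls += 1
        if "NL" not in get_modes():
            return polls             # every receiver is back in CONCURRENT READ
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("receivers still in NULL mode")
        sleep(interval)              # e.g. every 100 milliseconds
```

Only after this loop finishes would the sender release PERMISSION -- TO -- TALK, which is what keeps a second message from overwriting the value block before every part has read the first.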
  • the PERMISSION -- TO -- TALK resource also orders messages since only one process at a time can send a message due to the EXCLUSIVE lock requirement for the PERMISSION -- TO -- TALK resource. All other processes wanting to send a message at the same time will be placed in a waiting queue for the EXCLUSIVE lock on the PERMISSION -- TO -- TALK resource. The requests will be granted subsequently, one at a time, on a first come, first served basis until each message is sent.
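The ordering property described above can be illustrated with a first-come, first-served queue. The class below is a toy model, not the lock manager: `PermissionToTalk`, `request` and `release` are hypothetical names, and a message is treated as sent at the moment its sender releases the lock.

```python
from collections import deque


class PermissionToTalk:
    """Toy model of an EXCLUSIVE-only resource with a FIFO waiting queue."""

    def __init__(self):
        self.holder = None        # (sender, message) currently allowed to send
        self.waiting = deque()    # senders queued behind the holder
        self.delivered = []       # messages in the order they were sent

    def request(self, sender, message):
        if self.holder is None:
            self.holder = (sender, message)
        else:
            self.waiting.append((sender, message))

    def release(self):
        """The holder has finished sending; grant the next waiter, if any."""
        if self.holder is None:
            return
        self.delivered.append(self.holder)
        self.holder = self.waiting.popleft() if self.waiting else None
```

Three senders that request at the same time are served strictly in request order, one per `release` call.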
  • the CLUSTER -- BROADCAST routine can be used to notify the special server when a part is to leave the network prior to the rest of the parts of the application program.
  • the application developer would have to insert a call CLUSTER -- BROADCAST instruction into the part that will leave the network to send a message to the special server to, e.g. dequeue the EXCLUSIVE request for the SERVER -- xxx corresponding to the departing part.
  • the CLUSTER -- BROADCAST routine can also be used by the SPECIAL -- AST to write a message to the other parts 19, 20 of the application program to indicate that it is the special server. This will be another predefined message using the asterisk (message for all parts) that causes each MSG -- AST to write the identification of the special server in the respective common memory spaces.
  • the lock manager 21 will remove the EXCLUSIVE lock on SPECIAL held by the special server and grant the EXCLUSIVE lock to the first part 19, 20 in the REQUEST FOR SPECIAL IN EXCLUSIVE WAITING queue.
  • the SPECIAL -- AST completion routine will then interrupt the part 19, 20 granted the EXCLUSIVE lock and run, as described above, to form the reverse links between the new special server and the other parts. The SPECIAL -- AST can also call CLUSTER -- BROADCAST to send a message to each of the remaining parts indicating that it is now the special server for update of the respective common memory space.
  • the SPECIAL -- AST can look into the common memory space for the corresponding part that is now special server to see if it contains information on a previous special server, i.e. a message from a previous special server, as discussed above. If the common memory space for the new special server does have information on a special server, this will indicate to the SPECIAL -- AST that there was a previous special server and thus indicate a failure. The SPECIAL -- AST would then call a routine to act on the failure information as is done by the FAIL -- AST.
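The check described above, in which a new special server infers a failure from a record of a previous special server, might look as follows. This is a sketch under stated assumptions: the common memory is modeled as a dictionary, and the keys `special_server` and `is_special_server`, as well as the `on_failure` callback, are hypothetical names.

```python
def special_ast(common_memory, my_node, on_failure):
    """Run when this part is granted EXCLUSIVE on SPECIAL.

    If common memory already names a different special server, that server
    must have gone away, so the failure handler is invoked with its name.
    """
    previous = common_memory.get("special_server")
    common_memory["special_server"] = my_node
    common_memory["is_special_server"] = True
    if previous is not None and previous != my_node:
        on_failure(previous)
    return previous
```

On a first start-up the memory holds no earlier record and no failure is reported; after a fail-over the earlier broadcast from the old special server is what triggers the callback.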
  • An application developer can also specify a particular part 18, 19, 20 to be the special server through the use of another predefined message.
  • the application developer would send the predefined message to each part (18, 19, 20) of the application program by calling CLUSTER -- BROADCAST.
  • the predefined message specifies which part is to be the special server and the MSG -- ASTs act to implement the specified special server, as follows.
  • the MSG -- AST's will simply return control to the part.
  • the MSG -- AST will dequeue the EXCLUSIVE lock on the SPECIAL resource and dequeue all of the locks for the SERVER -- xxx resources. The MSG -- AST will then once again request a lock in EXCLUSIVE for the SPECIAL resource and then exit.
  • the MSG -- AST will dequeue the request for SPECIAL in EXCLUSIVE and then again enqueue the request for SPECIAL in EXCLUSIVE before returning control to that part.
  • the releasing of the lock request for SPECIAL by each of the parts 18, 19, 20 not to be the special server will leave the one part designated by the application developer either already holding the EXCLUSIVE lock for the SPECIAL resource or alone in the waiting queue with no request in the granted queue.
  • the lock manager 21 will then grant the EXCLUSIVE lock to the part 18, 19, 20 designated by the application developer.
  • the making of the request for SPECIAL in EXCLUSIVE by the other parts 18, 19, 20 after releasing any lock requests for SPECIAL will be after the part 18, 19, 20 designated by the application developer has obtained the EXCLUSIVE lock for SPECIAL.
  • the other parts will be placed in the REQUEST FOR SPECIAL IN EXCLUSIVE WAITING queue.
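The designation procedure above can be traced with a toy FIFO queue on the SPECIAL resource. The sketch is illustrative only: `SpecialResource` and `designate` are hypothetical names, parts are represented by their reference numerals, and the release of the SERVER -- xxx locks by a former special server is left out.

```python
from collections import deque


class SpecialResource:
    """Toy model of SPECIAL: one EXCLUSIVE holder and a FIFO waiting queue."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, part):
        if self.holder is None and not self.waiting:
            self.holder = part
        else:
            self.waiting.append(part)

    def dequeue(self, part):
        if self.holder == part:
            self.holder = None
        elif part in self.waiting:
            self.waiting.remove(part)
        # Grant the lock to the first waiter whenever it becomes free.
        if self.holder is None and self.waiting:
            self.holder = self.waiting.popleft()


def designate(special, parts, chosen):
    """Apply the predefined 'make <chosen> the special server' message."""
    others = [p for p in parts if p != chosen]
    # Every part other than the designated one releases its lock or request...
    for part in others:
        special.dequeue(part)
    # ...and re-queues only after the designated part has obtained SPECIAL.
    for part in others:
        special.request(part)
```

Starting with part 18 as special server and parts 19 and 20 waiting, `designate(special, [18, 19, 20], 20)` leaves part 20 holding the lock with 18 and 19 in the waiting queue, which matches the outcome the text describes.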

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Multi Processors (AREA)
  • Executing Machine-Instructions (AREA)
  • Hardware Redundancy (AREA)
US07/424,903 1989-10-20 1989-10-20 Mechanism for fail-over notification Expired - Lifetime US5117352A (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US07/424,903 US5117352A (en) 1989-10-20 1989-10-20 Mechanism for fail-over notification
GB9021576A GB2237130B (en) 1989-10-20 1990-10-04 Mechanism for failure notification
JP2281723A JPH03194647A (ja) 1989-10-20 1990-10-19 故障通告方法
DE4033336A DE4033336A1 (de) 1989-10-20 1990-10-19 Verfahren zum erzeugen einer ausfallmeldung und mechanismus fuer ausfallmeldung
FR9013032A FR2655168A1 (fr) 1989-10-20 1990-10-22 Mecanisme pour notification de panne.

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US07/424,903 US5117352A (en) 1989-10-20 1989-10-20 Mechanism for fail-over notification

Publications (1)

Publication Number Publication Date
US5117352A true US5117352A (en) 1992-05-26

Family

ID=23684356

Family Applications (1)

Application Number Title Priority Date Filing Date
US07/424,903 Expired - Lifetime US5117352A (en) 1989-10-20 1989-10-20 Mechanism for fail-over notification

Country Status (5)

Country Link
US (1) US5117352A (fr)
JP (1) JPH03194647A (fr)
DE (1) DE4033336A1 (fr)
FR (1) FR2655168A1 (fr)
GB (1) GB2237130B (fr)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4399504A (en) * 1980-10-06 1983-08-16 International Business Machines Corporation Method and means for the sharing of data resources in a multiprocessing, multiprogramming environment
US4480304A (en) * 1980-10-06 1984-10-30 International Business Machines Corporation Method and means for the retention of locks across system, subsystem, and communication failures in a multiprocessing, multiprogramming, shared data environment
US4646298A (en) * 1984-05-01 1987-02-24 Texas Instruments Incorporated Self testing data processing system with system test master arbitration
US4660201A (en) * 1984-03-13 1987-04-21 Nec Corporation Failure notice system in a data transmission system
US4665520A (en) * 1985-02-01 1987-05-12 International Business Machines Corporation Optimistic recovery in a distributed processing system
US4768150A (en) * 1986-09-17 1988-08-30 International Business Machines Corporation Application program interface to networking functions
US4803683A (en) * 1985-08-30 1989-02-07 Hitachi, Ltd. Method and apparatus for testing a distributed computer system
US4815076A (en) * 1987-02-17 1989-03-21 Schlumberger Technology Corporation Reconfiguration advisor
US4827411A (en) * 1987-06-15 1989-05-02 International Business Machines Corporation Method of maintaining a topology database
US4965719A (en) * 1988-02-16 1990-10-23 International Business Machines Corporation Method for lock management, page coherency, and asynchronous writing of changed pages to shared external store in a distributed computing system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5023779A (en) * 1982-09-21 1991-06-11 Xerox Corporation Distributed processing environment fault isolation
JPS62197858A (ja) * 1986-02-26 1987-09-01 Hitachi Ltd システム間デ−タベ−ス共用方式

Cited By (86)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994018621A1 (fr) * 1993-02-10 1994-08-18 Telefonaktiebolaget Lm Ericsson Procede et systeme dans un systeme d'exploitation reparti
CN1043093C (zh) * 1993-02-10 1999-04-21 艾利森电话股份有限公司 在分布式操作系统中用于卸下链接进程链的系统
US5606659A (en) * 1993-02-10 1997-02-25 Telefonaktiebolaget Lm Ericsson Method and system for demounting a chain of linked processes in a distributed operating system
US5553292A (en) * 1993-03-12 1996-09-03 International Business Machines Corporation Method and system for minimizing the effects of disruptive hardware actions in a data processing system
US5621884A (en) * 1993-04-30 1997-04-15 Quotron Systems, Inc. Distributed data access system including a plurality of database access processors with one-for-N redundancy
US5408649A (en) * 1993-04-30 1995-04-18 Quotron Systems, Inc. Distributed data access system including a plurality of database access processors with one-for-N redundancy
US7412707B2 (en) 1994-02-28 2008-08-12 Peters Michael S No-reset option in a batch billing system
US5999916A (en) * 1994-02-28 1999-12-07 Teleflex Information Systems, Inc. No-reset option in a batch billing system
US5668993A (en) * 1994-02-28 1997-09-16 Teleflex Information Systems, Inc. Multithreaded batch processing system
US6658488B2 (en) 1994-02-28 2003-12-02 Teleflex Information Systems, Inc. No-reset option in a batch billing system
US6708226B2 (en) 1994-02-28 2004-03-16 At&T Wireless Services, Inc. Multithreaded batch processing system
US5724584A (en) * 1994-02-28 1998-03-03 Teleflex Information Systems, Inc. Method and apparatus for processing discrete billing events
US20040138907A1 (en) * 1994-02-28 2004-07-15 Peters Michael S. No-reset option in a batch billing system
US6332167B1 (en) 1994-02-28 2001-12-18 Teleflex Information Systems, Inc. Multithreaded batch processing system
US6282519B1 (en) 1994-02-28 2001-08-28 Teleflex Information Systems, Inc. No-reset option in a batch billing system
US5566297A (en) * 1994-06-16 1996-10-15 International Business Machines Corporation Non-disruptive recovery from file server failure in a highly available file system for clustered computing environments
US5740359A (en) * 1994-12-27 1998-04-14 Kabushiki Kaisha Toshiba Program execution system having a plurality of program versions
US5745747A (en) * 1995-02-06 1998-04-28 International Business Machines Corporation Method and system of lock request management in a data processing system having multiple processes per transaction
US5699500A (en) * 1995-06-01 1997-12-16 Ncr Corporation Reliable datagram service provider for fast messaging in a clustered environment
US5612865A (en) * 1995-06-01 1997-03-18 Ncr Corporation Dynamic hashing method for optimal distribution of locks within a clustered system
US5594861A (en) * 1995-08-18 1997-01-14 Telefonaktiebolaget L M Ericsson Method and apparatus for handling processing errors in telecommunications exchanges
US5682537A (en) * 1995-08-31 1997-10-28 Unisys Corporation Object lock management system with improved local lock management and global deadlock detection in a parallel data processing system
US5748884A (en) * 1996-06-13 1998-05-05 Mci Corporation Autonotification system for notifying recipients of detected events in a network environment
US6574654B1 (en) * 1996-06-24 2003-06-03 Oracle Corporation Method and apparatus for lock caching
US6529982B2 (en) 1997-01-23 2003-03-04 Sun Microsystems, Inc. Locking of computer resources
US20030070021A1 (en) * 1997-01-23 2003-04-10 Sun Microsystems, Inc. Locking of computer resources
US5968189A (en) * 1997-04-08 1999-10-19 International Business Machines Corporation System of reporting errors by a hardware element of a distributed computer system
US5923840A (en) * 1997-04-08 1999-07-13 International Business Machines Corporation Method of reporting errors by a hardware element of a distributed computer system
US5996086A (en) * 1997-10-14 1999-11-30 Lsi Logic Corporation Context-based failover architecture for redundant servers
US6178529B1 (en) 1997-11-03 2001-01-23 Microsoft Corporation Method and system for resource monitoring of disparate resources in a server cluster
US6243825B1 (en) * 1998-04-17 2001-06-05 Microsoft Corporation Method and system for transparently failing over a computer name in a server cluster
US6449734B1 (en) 1998-04-17 2002-09-10 Microsoft Corporation Method and system for discarding locally committed transactions to ensure consistency in a server cluster
US6360331B2 (en) 1998-04-17 2002-03-19 Microsoft Corporation Method and system for transparently failing over application configuration information in a server cluster
US6230230B1 (en) * 1998-12-03 2001-05-08 Sun Microsystems, Inc. Elimination of traps and atomics in thread synchronization
US7444374B1 (en) * 1998-12-10 2008-10-28 Michelle Baker Electronic mail software with modular integrated authoring/reading software components including methods and apparatus for controlling the interactivity between mail authors and recipients
SG90111A1 (en) * 1999-03-30 2002-07-23 Ibm Cluster node distress signal
US6442713B1 (en) 1999-03-30 2002-08-27 International Business Machines Corporation Cluster node distress signal
US6412034B1 (en) * 1999-04-16 2002-06-25 Oracle Corporation Transaction-based locking approach
US6523078B1 (en) 1999-11-23 2003-02-18 Steeleye Technology, Inc. Distributed locking system and method for a clustered system having a distributed system for storing cluster configuration information
US20040220774A1 (en) * 1999-12-29 2004-11-04 Anant Agarwal Early warning mechanism for enhancing enterprise availability
US7823134B2 (en) * 1999-12-29 2010-10-26 Symantec Operating Corporation Early warning mechanism for enhancing enterprise availability
US7058667B2 (en) 2000-12-27 2006-06-06 Microsoft Corporation Method and system for creating and maintaining version-specific properties in a file
US7849054B2 (en) 2000-12-27 2010-12-07 Microsoft Corporation Method and system for creating and maintaining version-specific properties in a file
US20020165929A1 (en) * 2001-04-23 2002-11-07 Mclaughlin Richard J. Method and protocol for assuring synchronous access to critical facilitites in a multi-system cluster
US6959337B2 (en) * 2001-04-23 2005-10-25 Hewlett-Packard Development Company, L.P. Networked system for assuring synchronous access to critical facilities
US20030005350A1 (en) * 2001-06-29 2003-01-02 Maarten Koning Failover management system
US20060136926A1 (en) * 2001-11-13 2006-06-22 Microsoft Corporation Allocating locks in a distributed environment
US20030101300A1 (en) * 2001-11-13 2003-05-29 Microsoft Corporation Method and system for locking multiple resources in a distributed environment
US7406519B2 (en) 2001-11-13 2008-07-29 Microsoft Corporation Method and system for locking resources in a distributed environment
US20030105871A1 (en) * 2001-11-13 2003-06-05 Microsoft Corporation Method and system for modifying lock properties in a distributed environment
US7159056B2 (en) 2001-11-13 2007-01-02 Microsoft Corporation Method and system for locking multiple resources in a distributed environment
US7487278B2 (en) 2001-11-13 2009-02-03 Microsoft Corporation Locking multiple resources in a distributed environment
US6748470B2 (en) * 2001-11-13 2004-06-08 Microsoft Corporation Method and system for locking multiple resources in a distributed environment
US20080307138A1 (en) * 2001-11-13 2008-12-11 Microsoft Corporation Method and system for locking resources in a distributed environment
US20060136637A1 (en) * 2001-11-13 2006-06-22 Microsoft Corporation Locking multiple resources in a distributed environment
US20030093524A1 (en) * 2001-11-13 2003-05-15 Microsoft Corporation Method and system for locking resources in a distributed environment
US20040221079A1 (en) * 2001-11-13 2004-11-04 Microsoft Corporation Method and system for locking multiple resources in a distributed environment
US20030097610A1 (en) * 2001-11-21 2003-05-22 Exanet, Inc. Functional fail-over apparatus and method of operation thereof
US6934880B2 (en) 2001-11-21 2005-08-23 Exanet, Inc. Functional fail-over apparatus and method of operation thereof
US20030206536A1 (en) * 2002-05-06 2003-11-06 Mark Maggenti System and method for registering IP address of wireless communication device
US8064450B2 (en) * 2002-05-06 2011-11-22 Qualcomm Incorporated System and method for registering IP address of wireless communication device
US20070294709A1 (en) * 2002-05-31 2007-12-20 International Business Machines Corporation Locally providing globally consistent information to communications layers
US7302692B2 (en) 2002-05-31 2007-11-27 International Business Machines Corporation Locally providing globally consistent information to communications layers
US8091092B2 (en) 2002-05-31 2012-01-03 International Business Machines Corporation Locally providing globally consistent information to communications layers
US20050132193A1 (en) * 2003-12-05 2005-06-16 Buffalo Inc. Cipher key setting system, access point, wireless LAN terminal, and cipher key setting method
US8566299B2 (en) 2004-06-23 2013-10-22 Dell Global B.V.-Singapore Branch Method for managing lock resources in a distributed storage system
US20090094243A1 (en) * 2004-06-23 2009-04-09 Exanet Ltd. Method for managing lock resources in a distributed storage system
US8086581B2 (en) * 2004-06-23 2011-12-27 Dell Global B.V. - Singapore Branch Method for managing lock resources in a distributed storage system
US8898246B2 (en) * 2004-07-29 2014-11-25 Hewlett-Packard Development Company, L.P. Communication among partitioned devices
US20060026299A1 (en) * 2004-07-29 2006-02-02 Gostin Gary B Communication among partitioned devices
US20060102159A1 (en) * 2004-11-18 2006-05-18 Hommes Daniel J Protruding oil separation baffle holes
US20060123003A1 (en) * 2004-12-08 2006-06-08 International Business Machines Corporation Method, system and program for enabling non-self actuated database transactions to lock onto a database component
US20060206901A1 (en) * 2005-03-08 2006-09-14 Oracle International Corporation Method and system for deadlock detection in a distributed environment
US7735089B2 (en) * 2005-03-08 2010-06-08 Oracle International Corporation Method and system for deadlock detection in a distributed environment
US8010608B2 (en) * 2005-06-07 2011-08-30 Microsoft Corporation Locked receive locations
US20060277261A1 (en) * 2005-06-07 2006-12-07 Microsoft Corporation Locked receive locations
US7613742B2 (en) * 2006-05-02 2009-11-03 Mypoints.Com Inc. System and method for providing three-way failover for a transactional database
US20070260696A1 (en) * 2006-05-02 2007-11-08 Mypoints.Com Inc. System and method for providing three-way failover for a transactional database
US8533331B1 (en) * 2007-02-05 2013-09-10 Symantec Corporation Method and apparatus for preventing concurrency violation among resources
US20080266302A1 (en) * 2007-04-30 2008-10-30 Advanced Micro Devices, Inc. Mechanism for granting controlled access to a shared resource
US8576236B2 (en) * 2007-04-30 2013-11-05 Advanced Micro Devices, Inc. Mechanism for granting controlled access to a shared resource
US8068114B2 (en) * 2007-04-30 2011-11-29 Advanced Micro Devices, Inc. Mechanism for granting controlled access to a shared resource
US8229961B2 (en) 2010-05-05 2012-07-24 Red Hat, Inc. Management of latency and throughput in a cluster file system
US9389926B2 (en) 2010-05-05 2016-07-12 Red Hat, Inc. Distributed resource contention detection
US9870369B2 (en) 2010-05-05 2018-01-16 Red Hat, Inc. Distributed resource contention detection and handling
US9081653B2 (en) 2011-11-16 2015-07-14 Flextronics Ap, Llc Duplicated processing in vehicles

Also Published As

Publication number Publication date
GB9021576D0 (en) 1990-11-21
JPH03194647A (ja) 1991-08-26
GB2237130B (en) 1994-01-26
FR2655168A1 (fr) 1991-05-31
DE4033336A1 (de) 1991-04-25
GB2237130A (en) 1991-04-24

Similar Documents

Publication Publication Date Title
US5117352A (en) Mechanism for fail-over notification
US5454108A (en) Distributed lock manager using a passive, state-full control-server
US5805900A (en) Method and apparatus for serializing resource access requests in a multisystem complex
US5706516A (en) System for communicating messages among agent processes
US5613139A (en) Hardware implemented locking mechanism for handling both single and plural lock requests in a lock message
US5161227A (en) Multilevel locking system and method
US6622155B1 (en) Distributed monitor concurrency control
US5499364A (en) System and method for optimizing message flows between agents in distributed computations
US5109515A (en) User and application program transparent resource sharing multiple computer interface architecture with kernel process level transfer of user requested services
US6108654A (en) Method and system for locking resources in a computer system
Huang et al. On Using Priority Inheritance In Real-Time Databases.
US6138168A (en) Support for application programs in a distributed environment
EP0444376B1 (en) Mechanism for passing messages between several processors coupled through a shared intelligent memory
US5956712A (en) Byte range locking in a distributed environment
US5687372A (en) Customer information control system and method in a loosely coupled parallel processing environment
JPS63201860A (en) Network management system
US6865741B1 (en) Determining completion of transactions processing in a dynamically changing network
US5682507A (en) Plurality of servers having identical customer information control procedure functions using temporary storage file of a predetermined server for centrally storing temporary data records
EP0747814A1 (en) Customer information control system and method with transaction serialization control functions in a loosely coupled parallel processing environment
US6587889B1 (en) Junction manager program object interconnection and method
EP0747812A2 (en) CICS system and method with API functions for starting and cancelling transactions in a loosely coupled parallel processing environment
US20030172195A1 (en) Method and system for guaranteeing sequential consistency in distributed computations
US6704765B1 (en) System for allocating resources among agent processes
Cvijović et al. An approach to the design of distributed real-time operating systems
JPH09330241A (en) Exclusive control method for deadlock prevention

Legal Events

Date Code Title Description

STCF Information on status: patent grant
Free format text: PATENTED CASE

FPAY Fee payment
Year of fee payment: 4

FPAY Fee payment
Year of fee payment: 8

AS Assignment
Owner name: COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P., TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DIGITAL EQUIPMENT CORPORATION;COMPAQ COMPUTER CORPORATION;REEL/FRAME:012447/0903;SIGNING DATES FROM 19991209 TO 20010620

FPAY Fee payment
Year of fee payment: 12

AS Assignment
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS
Free format text: CHANGE OF NAME;ASSIGNOR:COMPAQ INFORMATION TECHNOLOGIES GROUP, LP;REEL/FRAME:015000/0305
Effective date: 20021001