CA1213064A

CA1213064A - Asynchronous checkpointing system for error recovery

Info

Publication number: CA1213064A
Application number: CA000449475A
Authority: CA
Inventors: Rufus E. Finley
Original assignee: Burroughs Corp
Current assignee: Unisys Corp
Priority date: 1983-03-14
Filing date: 1984-03-13
Publication date: 1986-10-21

Abstract

ASYNCHRONOUS CHECKPOINTING SYSTEM FOR ERROR RECOVERY
A method of recovering from an error condition during operation of a program that is modifying a data base without corrupting the data base, wherein the program includes calls to record the progress of the operation in a table in memory. On the occurrence of an error condition, the tables for all programs in operation are transferred to a disk. During error recovery, the tables are returned to memory where the information stored in the respective tables is used by each active program to restore operation of the particular program to a point where the operation can be completed without corrupting the data base. Each program is designed to interrogate its own recovery table following the occurrence of an error condition to restore operation at a point where the integrity of the data base is assured.

Description


ASYNCHRONOUS CHECKPOINTING SYSTEM FOR ERROR RECOVERY
Field of the Invention
This invention relates to a data processing system for controlling a data base and, more particularly, is directed to a method and means for preventing data base corruption as a result of an unexpected system shutdown.
Background of the Invention
In a data base management system, for example, for storing, updating and retrieval of information, such as data items stored in the form of records in one or more files, unexpected system shutdown may result in corruption of the data base and cause problems in restarting the data base management system. One well-known technique is to make a permanent updated record of the data base at fixed intervals of time, such as the beginning of a day, or the start of a new shift, or other convenient time. If the system experiences a shutdown due to power failure or some other problem, an uncorrupted data base can be duplicated using the backup recording and then repeating the operations that modify the data base from the time of the last backup recording. Such a system has obvious drawbacks in that duplicating a single day's or single shift operation to update the data base at best may involve many man-hours of effort and at worst may be impossible to reconstruct.

The concept of checkpointing has been proposed which provides for automatically halting processing at controlled intervals to make a magnetic tape or disk recording of the condition of all variables of the machine run. In the event of an error or interruption, restart procedures make it possible to continue processing from the last checkpoint rather than from the beginning of the machine run. Such known checkpointing techniques are under automatic system control in which checkpoints are established at processing intervals based on a certain number of items, transactions, or records having been processed. At each checkpoint, input and output records must be recorded along with the contents of storage areas and memory, as well as the contents of counters and registers in the processor. After an error or other interruption, the accuracy of processing up to that point must be verified and a restart procedure selected which re-enters the main routine at that point.
According to the present invention there is provided in a data base management system having a plurality of application programs capable of concurrently executing a plurality of tasks with respect to a data base located in a non-volatile bulk memory, a method of providing said system with the capability of recovering from an unexpected interruption in system operation without corrupting the data base, said method comprising allocating a task recovery area in a random access memory during the initial processing of each task which may modify the data base; independently recording task recovery data during the execution of each such task in its respective task recovery area to the extent required to provide recovery for the task without data base corruption in the event of an unexpected interruption in system operation while the task is executing, said independently recording task recovery data occurring in response to instructions embedded in the application programs used for executing the task; de-allocating each task recovery area in response to deactivation of the task as a result of its successful completion; transferring recovery data in said task recovery areas to said bulk memory in response to the occurrence of an unexpected interruption in system operation while a task is executing; during system recovery operations, returning the recovery data transferred to said bulk memory in response to said interruption back to said task recovery areas in said random access memory; and then independently performing task recovery operations for each task which was active when the unexpected interruption occurred using the recovery data returned to its respective task recovery area, said task recovery operations being performed in a manner such that the integrity of the data base is maintained, said independently performing task recovery operations occurring in response to instructions embedded in the application programs used for executing the task.
In one embodiment there is provided an improved method and means for error recovery to prevent data base corruption as a result of unexpected system shutdown. The error recovery system may be called "distributed asynchronous checkpointing" since it is under software application control rather than automatic system control. The application software provides for the continuous recording of information needed to resolve data base inconsistencies or restart in the event of a system shutdown. The application software determines what data is temporarily stored in memory by the current task program and when it is to be recorded based on the particular function the software is currently performing.
Even though several tasks may be active concurrently, each task checkpoints itself independently. A task as used herein is a collection of individual programs that operate as a single transaction performing a pre-specified function. Each task records this error recovery information in an area in memory which is identified by a unique identification number. If a task or particular activity within a task is completed, its recorded error recovery information is erased.
However, if the data management system experiences an unexpected shutdown, all currently recorded error recovery information is transferred to the permanent storage, such as a disk. During a subsequent error recovery operation, the recorded information is returned to the random access storage; each task that was currently active when the system shut down can then interrogate the recorded information and take appropriate action to correct any inconsistencies based on the recorded information.
An embodiment of the present invention will now be described by way of example, with reference to the accompanying drawings in which:
FIG. 1 is a block diagram of a digital data processing system embodying means and methods in accordance with the present embodiment;
FIGS. 2, 3 and 4 are flow diagrams illustrating the overall operation of the data processing system of FIG. 1;
FIG. 5 is a chart of the recovery tables in memory;
FIG. 6 is a flow chart of the system operation under normal operation;
FIG. 7 is a flow chart of system operation with error recovery in progress;
FIG. 8 is a block diagram showing schematically the operation of a multi-program system having error recovery recording in memory;
FIGS. 9-19 are flow diagrams of subroutine calls used to implement the error recovery operation; and
FIGS. 20-23 are flow diagrams showing an example of a task using the error recovery system.


Detailed Description
The error recovery system of the present embodiment may be incorporated in any data processing system in which programs are executed that request block allocations in a bulk memory used for storing a data base, or that change data stored in the bulk memory. Such operations, if unexpectedly interrupted before completion, may corrupt the data base. Each program involving these types of activities contains calls to "activity recording services" to record sufficient information to support an orderly recovery following any unexpected interruption.
FIG. 1 shows an example of a typical hardware system which supports a data base stored in a bulk memory 16 such as a magnetic disk memory. The setting up of files on the disk, the storage of data in such files, or the changing of data in such files is under user control from a plurality of terminals, three of which are indicated at 10, 12, and 14 in FIG. 1. These terminals allow the users to communicate with the data base stored on the disk memory 16 by means of keyboards 18 at each terminal. Interaction with the user is provided by a CRT display 20 at each terminal.
Transfer of digitally coded information into and out of the user terminals is controlled by a processor 22 over a common bus 24. The terminals are connected to the bus through an interface 26. The processor 22 is controlled by programs stored in a read only memory (ROM) 28. The processor uses a random access memory (RAM) 30 as a temporary storage during execution of the program. The RAM has an auxiliary power source 32 which maintains power to the RAM in case of power failure to the system. Transfer of information to and from the disk memory 16 is through a disk controller 34 connected to the main bus 24. It will be seen that FIG. 1 represents conventional architecture of a digital data processing system.
Referring to FIGS. 2, 3 and 4, the system operation under control of programs stored in the ROM 28 may be summarized as follows:
Once power is turned on to the system, as indicated at 36, a check is made (see 38 on FIG. 2) to determine whether there was a prior power failure which interrupted operation of the system. A power failure is one of the unexpected interruptions which causes shutdown of the system and invokes the error recovery procedure on restoration of power. Whenever there is a power failure during system operation, a Power Failure flag is set in a specific location in RAM 30 which is retained in RAM 30 by the auxiliary power source 32. The manner in which the Power Failure flag is set is described below. Assuming that the system is in an initial startup mode, the Power Failure flag will be false.
The system then enters an initial startup phase, indicated at 40, which "boots" the system and starts reading into the RAM 30 from the disk memory 16. This places the system in an initial operating phase.
The first operation of the system is to read two Status flags stored at a predetermined location on the disk and load these flags into predetermined locations in the RAM 30. As will be described in detail below, one flag indicates the status of the data base, namely, good or bad, and the other flag indicates whether or not a recovery table is stored on the disk. Following an initial power-up, both of these flags will be false. However, if there has been a prior unexpected interruption in the operation of the system due, for example, to a software detected error or a power failure, both of these flags will be true. Once the Data Base Status flag and the Recovery Table Status flag are stored in the RAM 30, these flags as stored on the disk are both reset to false in preparation for a possible uncontrolled shutdown of the system, as indicated at 44.
If the Data Base Status flag transferred from the disk to RAM 30 is true, indicating that the data base is corrupted due to an unexpected shutdown, a warning light is turned on at each of the user terminals, as indicated at 46 and 48 in FIG. 2. As shown at 50 in FIG. 3, the Recovery Table Status flag as stored in memory 30 is then checked. If a recovery table is present on the disk, as indicated by the Recovery Table Status flag being true, an error recovery procedure is required. However, assuming for the moment that the system is in an initial power-up phase, the Recovery Table Status flag will be false. This sets the Recovery in Progress flag in the RAM 30 to false, as indicated at 52. The system then causes all defined application and system tasks to be activated in anticipation of a user request from one of the terminals, as indicated at 54.
Assuming that the Recovery in Progress flag is false, as determined at 56 in the flow diagram of FIG. 3, the operating program then calls an "activity anchor initialize" subroutine. The function of this subroutine is to set up a table of task anchors in memory for each of the tasks performed by any application programs called in response to user requests of a type that involve an error recovery procedure. User requests that only interrogate the data base do not corrupt the data base and so do not invoke any error recovery procedure.
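The startup decisions described above can be sketched as follows. This is a minimal illustration in Python and not the patent's implementation: the names `StatusFlags`, `load_status_flags` and `startup_plan` are hypothetical, and the disk and RAM copies of the flags are modeled as plain objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatusFlags:
    data_base_bad: bool = False           # Data Base Status flag (true = bad)
    recovery_table_present: bool = False  # Recovery Table Status flag


def load_status_flags(disk: StatusFlags) -> StatusFlags:
    """Copy both flags from disk into RAM, then reset the disk copies to false."""
    ram = StatusFlags(disk.data_base_bad, disk.recovery_table_present)
    disk.data_base_bad = False
    disk.recovery_table_present = False
    return ram


def startup_plan(ram: StatusFlags) -> List[str]:
    """List the startup steps implied by the flags now held in RAM."""
    steps = []
    if ram.data_base_bad:
        steps.append("turn on warning light at each terminal")
    if ram.recovery_table_present:
        steps.append("error recovery procedure required")
    else:
        steps.append("set Recovery in Progress to false")
        steps.append("activate all defined tasks")
        steps.append("call activity anchor initialize")
    return steps
```

On a first power-up both flags are false, so `startup_plan(StatusFlags())` yields only the three normal startup steps.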
As seen in FIG. 5, task anchors for up to 50 tasks, for example, are established in RAM 30 at known locations. Each task anchor includes a forward link address and a backward link address which later are set to point to the first and last activity recording areas for the associated task. These activity recording areas are used to store error recovery data during execution of the task programs. A task may have one or more activities which modify the data base. If only one activity recording area is required for a particular task, the forward link and backward link in the task anchor will point to the same activity recording area. The initialized task anchors have the links pointing to the task anchor address. A Size word in the task anchor is set to the size of the anchor table, a Lower Status word is set to zero, and a Checksum is calculated and stored. This is done for all the task anchors during initialization. At this point, the operating program indicates to the terminal users that the system is ready to operate.
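The anchor initialization just described can be illustrated with a short Python sketch. Several details are assumptions made only for illustration: the base address, the anchor size `ANCHOR_SIZE`, and the checksum algorithm (a 16-bit sum of the header fields) are not specified in the text.

```python
from dataclasses import dataclass

MAX_TASKS = 50     # "up to 50 tasks, for example"
ANCHOR_SIZE = 16   # hypothetical anchor size in bytes


@dataclass
class Anchor:
    address: int
    forward: int       # forward link address
    backward: int      # backward link address
    size: int          # Size word
    lower_status: int  # Lower Status word
    checksum: int = 0


def header_checksum(a: Anchor) -> int:
    # Assumed algorithm: 16-bit sum of the header fields.
    return (a.forward + a.backward + a.size + a.lower_status) & 0xFFFF


def activity_anchor_initialize(base_address: int = 0x1000):
    """Build the table of task anchors, each linked to its own address."""
    anchors = []
    for task_id in range(MAX_TASKS):
        addr = base_address + task_id * ANCHOR_SIZE
        anchor = Anchor(address=addr, forward=addr, backward=addr,
                        size=ANCHOR_SIZE, lower_status=0)
        anchor.checksum = header_checksum(anchor)
        anchors.append(anchor)
    return anchors
```

An anchor whose links still point to its own address has no activity recording areas attached, which gives recovery code a simple test for "nothing recorded".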
Referring to the flow diagram of FIG. 4, as indicated at 6~, the data processing system now functions in a normal manner to allow the users to allocate new files in the data base, read out data, add additional information to the file in the data base, or initiate any other transactions which the system is programmed to perform in managing the data base files.
Referring to FIG. 6, normal operation of the system with or without error recovery is summarized. Tasks 1-N operate at the input level to provide input from the respective terminals. Thus task 1 at the input level causes User #1 terminal to display a menu from which the user selects a particular option, e.g., create a file. Task 2 does the same at User #2 terminal and task N, of course, to the User #N terminal. Each task responds to the particular option selected by the user and prompts the user to enter additional information, depending on the option selected. For example, if User #1 selects an option to create a new file in the data base, task 1 will then prompt User #1 to enter additional information such as the file name, description and fields. The input task is completed by sending the new transaction to a transaction scheduler, which is a program task for initiating the transactions called by the several user terminals.
As noted above, each task is a collection of individual programs that operate as a single transaction performing a prespecified function. In FIG. 6, broken lines separate the tasks to indicate that each task may be active concurrently with other tasks. The processor switches between tasks using interrupts to provide a conventional multi-programming operation. The scheduler, which is a separate task, receives the specified transactions from the input tasks and saves them in a transaction queue on the disk. The queue identifies each transaction called for by a user and stores the associated input data from the user terminal. The scheduler selects a transaction in the queue to process and uses the transaction to activate an appropriate task to be executed by the processor. The scheduler removes the transaction from the queue when the scheduled task is completed. The scheduler causes the particular task to be executed, as indicated at 150 of FIG. 6. Under normal conditions, the Recovery in Progress flag is false since the task has not yet been initiated. When a particular transaction is complete, as indicated at 152, the scheduler is notified and the transaction is removed from the disk queue.
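The scheduler behavior described above can be sketched in Python. This is an illustrative model only: `TransactionScheduler`, `submit`, `run_next` and `create_file` are hypothetical names, an in-memory `deque` stands in for the transaction queue on disk, and first-in first-out selection is an assumption, since the text does not say how the scheduler picks a transaction.

```python
from collections import deque


class TransactionScheduler:
    def __init__(self, tasks):
        self.tasks = tasks    # maps a transaction kind to a task callable
        self.queue = deque()  # stands in for the transaction queue on disk

    def submit(self, kind, data):
        """Called by an input task to save a new transaction in the queue."""
        self.queue.append((kind, data))

    def run_next(self, recovery_in_progress=False):
        """Activate the task for the oldest queued transaction."""
        if not self.queue:
            return None
        kind, data = self.queue[0]
        result = self.tasks[kind](data, recovery_in_progress)
        # The transaction leaves the queue only after its task completes,
        # so an interruption during the task leaves it queued.
        self.queue.popleft()
        return result


def create_file(data, recovery_in_progress):
    """Hypothetical task: pretend to create a file in the data base."""
    return "created " + data["name"]
```

Removing the entry only after completion is the point of interest: a transaction that was in flight at the time of an interruption is still on record.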
During normal operation in the execution of any assigned task by the data processing system, the system continuously monitors three conditions which result in termination of operation. As shown in FIG. 4 at 64, if a user turns the key off on the system, all currently requested transactions are completed, the Data Base Status flag is set to true and stored in the disk, as indicated at 68, and then the power is shut off. A second condition which is monitored is a software detected error. The programs are written to contain ample checks to detect common types of program interface errors, unexpected changes to sensitive data in memory, inconsistent links in data base storage, out of range offset values and unexpected returns from call procedures. If any such condition is detected, as indicated at 70, the status of a Recovery in Progress flag is checked to determine whether the system is currently operating in a normal mode or in a recovery mode. Assuming that the system is currently operating in the normal mode so that the Recovery in Progress flag is false, the recovery tables and the active task table in memory are transferred to the disk, and the Recovery Table Present Status flag on the disk is set to true, as indicated at 94 and 96. The Data Base Status flag, which may be either good or bad at this stage, is also stored on the disk (see 98) and the system returns to the startup mode starting at "A" in FIG. 2. If the system is already in a recovery mode at the time a software error is detected, the system sets the Data Base Status flag in memory true, indicating that the data base status is bad. The Data Base Status flag is then written to disk and the system returns to the startup mode at "A".
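The handling of a software detected error can be summarized in a short Python sketch. The dictionaries `ram` and `disk` and the function name `on_software_error` are hypothetical stand-ins for the memory and disk state; only the branching follows the description above.

```python
def on_software_error(ram, disk):
    """Handle a software detected error; return the restart entry point."""
    if not ram["recovery_in_progress"]:
        # Normal mode: save the recovery tables and the active task table.
        disk["recovery_tables"] = dict(ram["recovery_tables"])
        disk["active_task_table"] = list(ram["active_task_table"])
        disk["recovery_table_present"] = True
    else:
        # The error occurred during recovery itself: mark the data base bad.
        ram["data_base_bad"] = True
    # In both cases the Data Base Status flag is written to disk.
    disk["data_base_bad"] = ram["data_base_bad"]
    return "A"  # startup mode at "A" in FIG. 2
```

A second failure during recovery does not save the tables again; it only marks the data base as bad before restarting.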
A third unexpected event which is detected, as indicated at 102 in FIG. 4, is a complete power failure. A power failure does not result in an immediate loss of information stored in RAM because the RAM is provided with a backup battery power supply 32. A power failure causes the Power Failure flag stored in the RAM to be set to true, and the system remains in this condition until power is restored. Once power is restored to the system, system operation returns to the startup mode, indicated at "A" on the flow diagram of FIG. 2.
Since the Power Failure flag is now true, the system checks to see if a recovery was in progress. If not, as indicated at 110 in FIG. 2, the recovery table and active task table in RAM are stored on the disk along with the Data Base Status flag. The Recovery Table Present Status flag is set to true and written on the disk as well. After the Power Failure flag is reset, the system operation returns to the startup mode.
The startup operation is identical to that described above in connection with FIG. 2. When system operation reaches the point where it checks to determine whether a recovery table is present, as indicated at 50 in FIG. 3, if operating in a recovery mode, it will find that this flag is true. In this case, the recovery table and the active task table which were stored on the disk as the result of a power failure or a software detected error, are transferred from the disk back into RAM, as indicated at 1~0 in FIG. 3. A Recovery in Progress flag in memory is then set to true.

When this flag is checked, as indicated at 56 in the flow diagram of FIG. 3, the system will now find that the flag is true, indicating that a recovery is in progress. This condition is displayed on all the terminals to tell the users to stand by. The system then schedules a task to perform application error recovery for each active task identified in the active task table, as indicated at 126 and described in detail in connection with the flow diagram of FIG. 7. When application error recovery has been completed for each of the active tasks, the Recovery in Progress flag is reset to false and the system is ready to resume normal operation.
Referring to FIG. 7 in detail, if a recovery is in progress, as determined at 56, N is set to the number of transactions in the active task table in memory. If N is not equal to zero, the first transaction identified in the active task table is sent to the scheduler. The transaction scheduler, just as in normal operation described in connection with FIG. 6, saves the transaction in the disk queue. The scheduler then checks to determine whether the recovery and the transaction are complete, as indicated at 131. If so, the complete status causes the scheduler to remove the transaction from the queue and returns operation to the system. The value of N is then decremented by 1, as indicated at 133. If N is still not zero, the next transaction in the active task table is sent to the scheduler. If a transaction is not complete, the transaction scheduler selects a transaction in the disk queue to process and sends the transaction to the appropriate task to be processed, as indicated at 135 and 137 in FIG. 7. The particular task receives and processes the transaction with the Recovery in Progress flag set true, as indicated at 139. When the task is complete, it sends the transaction back to the scheduler with a complete status, as indicated at 141. It will be seen that the operations of FIGS. 6 and 7 are similar except that the transactions are received from the active task table rather than from the terminals, and the particular task called for by the transaction is processed with the Recovery in Progress flag set to true rather than being set to false.
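The recovery loop of FIG. 7 can be sketched as follows. This is a simplified Python model under stated assumptions: `run_recovery` is a hypothetical name, a list stands in for the disk queue, and each task is a callable that returns the string "complete" when it finishes; the separate scheduler check at 131 is folded into the loop.

```python
def run_recovery(active_task_table, tasks):
    """Replay each transaction in the active task table in recovery mode."""
    disk_queue = []
    recovered = []
    recovery_in_progress = True
    n = len(active_task_table)           # N in FIG. 7
    position = 0
    while n != 0:
        transaction = active_task_table[position]
        disk_queue.append(transaction)   # scheduler saves it in the queue
        task = tasks[transaction["task"]]
        status = task(transaction, recovery_in_progress)
        if status == "complete":
            disk_queue.remove(transaction)
            recovered.append(transaction["task"])
        n -= 1
        position += 1
    recovery_in_progress = False         # ready to resume normal operation
    return recovered, disk_queue, recovery_in_progress
```

The only differences from normal scheduling are the source of the transactions (the active task table instead of the terminals) and the value of the flag passed to each task.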
As pointed out above, each task involving modification of the data base for processing a transaction initiated by a user terminal incorporates its own recovery procedures within the task program. Each program is responsible for recording the information that is necessary for error recovery to be successful. These recordings occur to identify the phase in which the program is currently processing, and the critical information is collected and grouped by task in the error recovery table. While this procedure can be implemented for each task as a whole, it may be more convenient in designing the error recovery procedure for a given task to be divided into a number of separate nested activities with each activity recording data in memory. If an activity is completed successfully, the recorded recovery data for that activity may be deleted from memory. Thus, as shown in FIG. 8, the data base management system uses the scheduler, indicated at 150, to activate the tasks necessary to process the transactions initiated by the various user terminals. Each task has a recording area in RAM, indicated at 152, in which error recovery data is stored. Each task may be subdivided into nested subroutines, indicated as program A, program B, and program C under task No. 1. Each subroutine, referred to as an activity level, provides error recording with an associated area in RAM.
Referring again to FIG. 5, each task has an area set aside in memory, referred to as a task anchor. The anchor for each task comprises a heading for storing link addresses or pointers to first and last activity recording areas in RAM. These activity recording areas are set aside by each activity within a task. Thus, as shown in FIG. 5, the first or highest level activity sets aside a first recording area. A forward link address pointing to the location of the first recording area is stored in the task anchor. A backward link pointing to the same recording area is also stored in the task anchor.
An activity header for the first recording area for each task is now provided with a forward link and a backward link, both of which point to the task anchor, assuming there is only a single recording area for that task. If there are two or more recording areas, such as would be required for program B and program C, these additional recording areas are provided with forward and backward links which provide both forward and backward loops for linking the task anchor and each of the activity recording areas for the particular task in a "chain". The activity header for the anchor and each activity, in addition to the forward and backward links, includes the size of the recording area, a Lower Status flag, which indicates the status of the next lower activity level in the task, and a Checksum value. In addition, a Data Ready flag is provided in each recording area and set true when the activity recovery data is initialized. This flag is checked during recovery as a precaution against use of incomplete data. A recovery identification indicating the activity level within the task is also recorded.
Individual data items stored in an activity recording area are set by calls embedded in the application program at the particular activity level to "activity set byte", "activity set word" and "activity set double word". Each of these calls is used to assign a value to a byte, word or double word variable in the recovery data structure. An "activity start" call procedure is used to mark the start of an activity for recovery purposes. It establishes the number of bytes of local data to be reserved for recording information for the particular activity level and establishes a pointer variable for the base structure which defines the local data for the particular activity level. Two other procedure calls for the error recovery recording services are required, an "activity data ready" call which is used to mark the fact that the variables in the current recovery data structure have been assigned initial values, and an "activity end" call which marks the exit from an activity and operates to discard all data associated with the current activity level. Each of the above listed call procedures used for recording error recovery data during execution of a task is described in more detail in connection with the flow diagrams shown in FIGS. 9-16.
The "activity start" call, shown in detail in FIG. 9, is used to establish a recording area during execution of an activity within a particular task. As pointed out above, a single task may consist of a single activity or a series of activities, as outlined above in connection with FIG. 8. When an "activity start" call procedure is encountered during execution of a program, a subroutine is executed which first checks to make sure that the Recovery in Progress flag is false, since the recording services are not invoked during a recovery. If a recovery is in progress, the "activity start" call procedure immediately terminates and returns to the activity program. However, if a recovery is not in progress, the "activity start" program then builds a recording activity area in memory by first calculating the record size required. This is determined by adding the number of bytes required for the heading of the recording area to the amount of memory defined by the application program that is required to store the required error recovery data. The operating system then sets aside a buffer area in memory of the required size, and the memory area is initialized. The "lower status" is initialized to a Normal status to indicate no lower level activity has been called.
After the activity recording area is built, the recording activity area is linked to the task anchor through the forward link and to the next higher level recording area through the backward link, as illustrated in FIG. 5.
After the recording area is established for the particular activity by the activity start call and linked to the task anchor and to the next higher activity level recording area, the checksums for the new recording area, the previous recording area and the anchor are all recalculated and stored in the respective headings to complete the activity start operation. Also the "lower status" in the previous or next higher activity level heading is set to No Data to indicate no data has yet been stored in the current activity recording area.
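A minimal Python sketch of the "activity start" linking follows. It models memory as a dictionary from addresses to header dictionaries; the header size `HEADER_BYTES`, the numeric status codes, the checksum algorithm and the explicit `new_addr` argument (standing in for the buffer the operating system sets aside) are all assumptions made for illustration.

```python
HEADER_BYTES = 16        # hypothetical header size
NORMAL, NO_DATA = 0, 1   # hypothetical codes for the "lower status" values


def checksum(h):
    # Assumed algorithm: 16-bit sum of the header fields.
    return (h["forward"] + h["backward"] + h["size"] + h["lower_status"]) & 0xFFFF


def new_header(memory, address, size):
    """Create a header whose links point at itself, like a fresh anchor."""
    h = {"forward": address, "backward": address, "size": size,
         "lower_status": NORMAL, "data_ready": False}
    h["checksum"] = checksum(h)
    memory[address] = h
    return h


def activity_start(memory, anchor_addr, new_addr, data_bytes,
                   recovery_in_progress=False):
    if recovery_in_progress:
        return None                     # recording is not invoked in recovery
    anchor = memory[anchor_addr]
    prev_addr = anchor["backward"]      # last area, or the anchor itself
    prev = memory[prev_addr]
    area = new_header(memory, new_addr, HEADER_BYTES + data_bytes)
    area["forward"] = anchor_addr       # linked to the task anchor
    area["backward"] = prev_addr        # linked to the next higher level
    prev["forward"] = new_addr
    anchor["backward"] = new_addr       # anchor now points at the new last area
    prev["lower_status"] = NO_DATA      # lower level started, no data yet
    for h in (area, prev, anchor):
        h["checksum"] = checksum(h)
    return new_addr
```

With a single area, both anchor links point to that area and both of its links point back to the anchor, which matches the single-area case described for FIG. 5.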
Once a recording area is established in memory for use by a particular activity, the application program within that activity may store a byte, a word (two bytes) or a double word (four bytes) into the recording area in response to a set byte call, a set word call or a set double word call, shown respectively in FIGS. 10, 11, and 12. In each instance, a check is first made of the Recovery in Progress flag. If it is not true, the program finds the activity recording area, as shown in FIG. 15, finds the activity data item location in the data recording area, as shown in FIG. 16, and stores the byte, word, or double word, as the case may be, in the item location. To find the activity record, as shown in FIG. 15, the operating system first gets the task I.D. from a predetermined location in memory where the operating system stores the I.D. of the active task. The task I.D. determines the location of the task anchor, and the task anchor provides a backward link pointing to the last activity recording area for the particular task.
To find the address of the activity data item where the byte, word or double word is to be stored, the offset of the data address is computed knowing the base address of the recording area and the number of bytes used for the header of the recording area. A new checksum is determined and stored in the header after the byte, word or double word has been stored in the memory. Thus by using one of these three calls, an application program can provide for error recovery by storing information generated during the execution of the program in a specified recording area set aside in memory.
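The three store calls can be illustrated with Python's `struct` module. The sketch below is an assumption-laden model, not the patent's code: the class `RecordingArea`, the header size, little-endian byte order and the checksum (a 16-bit sum of the data bytes) are chosen only to make the offset arithmetic concrete.

```python
import struct

HEADER_BYTES = 16  # hypothetical number of bytes used by the header


class RecordingArea:
    def __init__(self, data_bytes):
        self.buf = bytearray(HEADER_BYTES + data_bytes)
        self.checksum = 0  # kept as an attribute here instead of in the header

    def _store(self, fmt, offset, value, recovery_in_progress):
        if recovery_in_progress:
            return False   # recording services do nothing during recovery
        # Item address = base of the area + header bytes + item offset.
        struct.pack_into(fmt, self.buf, HEADER_BYTES + offset, value)
        # Assumed checksum: 16-bit sum of the data bytes.
        self.checksum = sum(self.buf[HEADER_BYTES:]) & 0xFFFF
        return True

    def set_byte(self, offset, value, recovery_in_progress=False):
        return self._store("<B", offset, value, recovery_in_progress)

    def set_word(self, offset, value, recovery_in_progress=False):
        return self._store("<H", offset, value, recovery_in_progress)

    def set_double_word(self, offset, value, recovery_in_progress=False):
        return self._store("<I", offset, value, recovery_in_progress)
```

For example, `RecordingArea(8).set_word(0, 0x1234)` writes the two bytes immediately after the header and refreshes the checksum in the same call.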
When the application program has stored enough information in the recording area in memory to be able to make a recovery following a software error or power failure, the Data Ready flag in the recording area is set using an "activity data ready" call, as shown in FIG. 13. Again, after checking to see that the Recovery in Progress flag has not been set true, the particular activity recording area in memory is located and the Data Ready flag in the heading is set true. After calculating a new checksum, the "lower status" in the header of the previous activity recording area in the memory is set to Incomplete, indicating that the next current activity level has completed the storage of recovery data but not called "activity end". A new checksum for the previous or higher level recording area is recalculated before exiting the "activity data ready" call.
Before exiting a completed activity program, an "activity end" call is used to remove the associated recovery recording area in the memory. The "activity end" call, as shown in detail in FIG. 14, again checks to determine whether the program is being executed in the recovery mode. If not, the "activity end" call then finds the activity record in memory and unlinks the recording area from the chain which links the anchor to the recording areas for the other activity levels. Since this involves changing the forward and backward link addresses in the anchor and in the previous or next higher order activity level recording area, a new checksum is computed for both the previous recording area heading and for the anchor heading. The recording area is then released so that that area of memory can be available for other uses. Also the "lower status" of the higher level recording area heading is changed to an End of Data status, indicating the lower level program routine has been completed.
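The unlinking performed by "activity end" might be sketched as below. The sketch assumes that the anchor and the recording areas form a circular, doubly linked chain in which the anchor's backward link points at the lowest activity level; that layout, the header field names, the CRC checksum, and the `link_after` helper (standing in for "activity start") are assumptions, not details taken from FIG. 14.

```python
import zlib

class Header:
    """Anchor or recording-area header in an assumed circular chain."""
    def __init__(self, name: str):
        self.name = name
        self.forward = self       # next lower level (self when the chain is empty)
        self.backward = self      # previous, higher level
        self.lower_status = "none"
        self.released = False
        self.checksum = 0
        self.reseal()

    def _digest(self) -> int:
        text = f"{self.forward.name}|{self.backward.name}|{self.lower_status}"
        return zlib.crc32(text.encode())

    def reseal(self) -> None:
        self.checksum = self._digest()

    def is_valid(self) -> bool:
        return self.checksum == self._digest()

def link_after(prev: Header, new: Header) -> None:
    """Hypothetical helper: chain a new, lower level area after prev."""
    new.backward, new.forward = prev, prev.forward
    prev.forward.backward = new
    prev.forward = new
    for header in (prev, new, new.forward):
        header.reseal()

def activity_end(anchor: Header, area: Header, recovery_in_progress: bool = False) -> bool:
    if recovery_in_progress:
        return False                          # assumed: no unlinking in recovery mode
    prev, nxt = area.backward, area.forward
    prev.forward = nxt                        # bypass the completed level
    nxt.backward = prev
    if prev is not anchor:
        prev.lower_status = "end of data"     # lower level routine has completed
    for header in (prev, nxt, anchor):
        header.reseal()                       # links changed, so new checksums
    area.released = True                      # memory is free for other uses
    return True
```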
Special calls are also used during the error recovery operation to locate the data recorded by the application program prior to the occurrence of the error condition. These include the "activity restart" call, described in connection with the flow diagram shown in FIG. 17. When the "activity restart" call is encountered, it gets the task identification from the table in memory. The task identification points to the task anchor and is used to obtain the forward link address from the corresponding task anchor in memory. The forward link points to the first or highest activity level error recovery recording area for the particular task, as described above in connection with FIG. 5. The recovery identification recorded in the header of the first recording area is then checked to make sure there is a match with the user recovery identification previously stored in the header of the recording area. If there is a match and the checksum is valid, the Data Ready flag in the header is checked to make sure active data has been recorded. If so, a pointer to the beginning address of the recorded data is set and returned to the main program.
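The sequence of checks made by "activity restart" might be sketched as follows. The task table is modelled as a dictionary from task identification to anchor, and the task identification is passed in as a parameter; these choices, the field names, and the CRC checksum are assumptions for the sketch. Returning None stands in for the case where no usable recovery data is found.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Area:
    recovery_id: str
    data_ready: bool = False
    data: bytearray = field(default_factory=bytearray)
    checksum: int = 0

    def _digest(self) -> int:
        header = f"{self.recovery_id}|{self.data_ready}|".encode()
        return zlib.crc32(header + bytes(self.data))

    def reseal(self) -> None:
        self.checksum = self._digest()

    def is_valid(self) -> bool:
        return self.checksum == self._digest()

@dataclass
class Anchor:
    forward: Optional[Area] = None   # first (highest level) recording area

def activity_restart(task_table: Dict[int, Anchor], task_id: int,
                     user_recovery_id: str) -> Optional[bytearray]:
    anchor = task_table.get(task_id)
    if anchor is None or anchor.forward is None:
        return None                          # no recording area for this task
    area = anchor.forward
    if area.recovery_id != user_recovery_id:
        return None                          # recovery identification mismatch
    if not area.is_valid():
        return None                          # checksum failure
    if not area.data_ready:
        return None                          # no active data was recorded
    return area.data                         # "pointer" to the recorded data
```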
An "activity lower status" call is also provided for use by the application program which operates to return the lower status information stored in the header of an activity recording area in memory. As shown in FIG. 18, this call routine gets the task identification and then obtains the forward link address from the anchor header. From the highest level activity record, it obtains the recovery identification stored in the header; and if it matches the desired activity level, a checksum is made and the lower status information in the header is returned to the main program.
An "activity restart end" call, shown in FIG. 19, is used to erase the activity record in memory. It functions to erase all activity levels associated with the particular task in memory. The "activity restart end" call gets the task identification which points to the anchor header. From the anchor header, it gets the forward link address which points to the header of the first recording area for the particular task. If there is a record, it updates the anchor links to bypass the activity level and releases the record area in memory. This loop is repeated until all of the recording areas associated with a particular task have been released. A new checksum for the task anchor is then computed, and the call returns to the main program.
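The release loop of "activity restart end" might be sketched as follows, using a simplified, forward-linked model of the chain. The class names, the task table dictionary, the returned count, and the CRC checksum are assumptions made for illustration.

```python
import zlib

class Area:
    def __init__(self, name, forward=None):
        self.name = name
        self.forward = forward      # next lower activity level, if any
        self.released = False

class Anchor:
    def __init__(self, forward=None):
        self.forward = forward      # first recording area for the task
        self.checksum = 0
        self.reseal()

    def reseal(self):
        first = self.forward.name if self.forward else ""
        self.checksum = zlib.crc32(first.encode())

def activity_restart_end(task_table, task_id):
    """Release every recording area chained to the task's anchor."""
    anchor = task_table[task_id]
    released = 0
    while anchor.forward is not None:
        area = anchor.forward
        anchor.forward = area.forward   # anchor now bypasses this activity level
        area.forward = None
        area.released = True            # record area returned to free memory
        released += 1
    anchor.reseal()                     # new checksum for the task anchor
    return released
```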
To better understand the invention, operation of the system in executing the user task for creating a new file in the data base is summarized by the flow diagrams of FIGS. 20, 21, 22, and 23. As described in connection with FIG. ~, the scheduler initiates the appropriate task to be processed. In the example, the task to be processed is the Create File task. When the Create File transaction is received from the scheduler, the Recovery in Progress flag is checked. Assuming it is false, so that the normal operation is in progress, the "activity start" call, described above in connection with FIG. 9, is initiated by the task program. As a result, the recording area for storing error recovery data is set aside in memory and the heading information of the first or highest activity level recording area for the particular task is initialized in the manner described above. In the present example, it is assumed that the Create File task does not have any nested activity levels and so no more than one recording area for error recovery is required.
Once the recording area is set aside in memory, linked to the task anchor, and initialized, an "activity set word" call is used to set a progress word in the recording area to a null condition. An "activity set double word" call is used to set a detail block number to zero in the recording area, and an "activity set word" call is used to set the user file number in the recording area to zero. An "activity data ready" call is then executed (see FIG. 13) to set the Data Ready flag in the heading of the recording area to true. The Create File task then proceeds to reserve the next user file number and create a user file description block on the disk. The block number is then recorded in the error recovery recording area by an "activity set double word" call. An "activity set" call is then used to change the progress word to indicate the progress made by the task.
The Create File task then creates an entry in the index block on a disk for this new user file, and an "activity set word" call records a progress word in the recovery recording area to indicate that the index has been inserted. The Create New File task then sets a catalog entry on the disk to indicate that this new user file is active. An "activity set word" call then records as data in the activity recording area that the task has progressed to setting the catalog entry to "active". An "activity end" call is then executed (see FIG. 14). This removes the error recovery recording area from memory and completes the execution of the Create File task. The transaction is returned to the scheduler as completed.
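The normal, error-free flow of the Create File task, with its embedded progress recording, might be sketched as follows. The `Disk` class, the dictionary used as a recording area, the numeric encoding of the progress word, and the `stop_at` parameter (which merely simulates an interruption at a chosen progress point) are hypothetical. The description does not say when the user file number is written to the recording area; the sketch assumes it is recorded immediately after the number is reserved.

```python
from enum import IntEnum

class Progress(IntEnum):            # assumed encoding of the progress word
    NULL = 0
    DETAIL_INSERTED = 1
    INDEX_INSERTED = 2
    CATALOG_ACTIVE = 3

class Disk:
    """Hypothetical stand-in for the data base on disk."""
    def __init__(self):
        self.reserve_number = 1     # assumed: next user file number to hand out
        self.next_free_block = 1
        self.description_blocks = {}   # block number -> user file number
        self.index = set()
        self.active_catalog = set()

def create_file(disk, areas, task_id, stop_at=None):
    def reached(point):
        area["progress"] = point                    # "activity set word"
        return stop_at == point                     # simulated interruption

    area = areas[task_id] = {"data_ready": False}   # "activity start"
    area["progress"] = Progress.NULL                # "activity set word"
    area["block"] = 0                               # "activity set double word"
    area["file"] = 0                                # "activity set word"
    area["data_ready"] = True                       # "activity data ready"

    file_no = disk.reserve_number                   # reserve next user file number
    disk.reserve_number += 1
    area["file"] = file_no                          # assumed point of recording
    block = disk.next_free_block                    # create description block
    disk.next_free_block += 1
    disk.description_blocks[block] = file_no
    area["block"] = block                           # "activity set double word"
    if reached(Progress.DETAIL_INSERTED):
        return None

    disk.index.add(file_no)                         # entry in the index block
    if reached(Progress.INDEX_INSERTED):
        return None

    disk.active_catalog.add(file_no)                # catalog entry set active
    if reached(Progress.CATALOG_ACTIVE):
        return None

    del areas[task_id]                              # "activity end"
    return file_no
```

Each change to the disk is followed by an update of the progress word, so the recording area left behind by an interruption always identifies the last step known to be complete.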
It will be seen that each particular application program for executing a particular task involving modification of the data base has embedded in the program calls which set up the recording area in memory for the recovery. Then progress information is recorded in the recording area in memory and the Data Ready flag is set. If the task is completed without any error condition arising, the recording area is released before the task returns operation to the scheduler.
In the event that an error condition arises before a task is completed, the task is again scheduled but with the Recovery in Progress flag set to true, as described above in connection with FIG. 7. If the Recovery in Progress flag is true, the program calls a Create File Recovery program, as shown in the flow diagram of FIG. 20. The recovery subroutine is shown in detail in the flow diagram of FIGS. 22 and 23. The subroutine first calls an "activity restart" which, as described above in connection with FIG. 17, returns a pointer to the associated recording area in memory where the recovery data was recorded prior to the occurrence of the error condition. Using the pointer, the program checks the status of the recording area to determine whether there is active data recorded. If no data is recorded, the program checks to see whether the catalog entry on the disk is active for the particular user file. If the catalog entry on the disk is active, this indicates that the transaction was completed at the time the error was encountered. The program therefore calls an "activity restart end". This call returns a "normal" status which causes the transaction to be sent to the scheduler as complete. If the catalog entry on the disk is not active, the return status is set to "incomplete" to indicate the transaction has not started. Under these conditions, the program returns to the main program of the Create File task, as indicated in FIG. 20, to be rescheduled.
If there has been data recorded in the recording area, this indicates that the task had been partially executed at the time the error condition occurred. In this case, as shown by the flow diagram of FIG. 22, the program checks to determine if the recorded user file number is still zero. If so, the recovery operation is terminated by setting the return status to "incomplete" and returning the transaction to the scheduler. However, if a file has been created so that the file number is other than zero, the recorded data is used to determine if the task reached the progress point where "detail inserted" was completed before the error condition occurred. If the task had reached this point, the program branches to "C" in the flow diagram, as shown in FIG. 23. However, assuming the task had not progressed to this point in the program, the subroutine compares the user file number with the highest reserve user file number. If the reserve number is greater than the file number, the reserve number is decremented by 1. If a description block number has been allocated so that it is not equal to zero, the block number is deallocated, the return status is set to "incomplete", the "activity restart end" call is executed, and the operation is returned to the scheduler to be rescheduled. Thus, in effect, the error recovery procedure has backed out of the task, leaving the data base in the same status it was in before the Create File task was initiated prior to the error condition.
If the Create File task has proceeded to the point where it had progressed to the "detail inserted" status, the recovery program goes forward, as shown in FIG. 23, and determines whether the program had progressed to the point where an index had been inserted on the disk. If it had not progressed to this point, the program creates an entry in the index block on the disk for the user file. The recovery program then checks to determine if the task had progressed to the point where the catalog name was activated. If not, the program sets the catalog entry on the disk to an active state for this user file. The return status is set to "normal" since the Create File task is now a completed transaction.
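The choice between backing out and going forward, for the case in which recovery data was recorded, might be sketched as follows. The sketch covers only that case and is not the routine of FIGS. 22 and 23: the `Disk` class, the dictionary used as a recording area, and the numeric encoding of the progress word are hypothetical, and the reserve number is assumed to be the next user file number to be handed out. The description does not state whether the recording area is released on the forward path, so the sketch leaves it in place there.

```python
from enum import IntEnum

class Progress(IntEnum):            # assumed encoding of the progress word
    NULL = 0
    DETAIL_INSERTED = 1
    INDEX_INSERTED = 2
    CATALOG_ACTIVE = 3

class Disk:
    """Hypothetical stand-in for the data base on disk."""
    def __init__(self, reserve_number=1):
        self.reserve_number = reserve_number   # assumed: next number to hand out
        self.description_blocks = {}           # block number -> user file number
        self.index = set()
        self.active_catalog = set()

def recover_create_file(disk, areas, task_id):
    """Return "normal" or "incomplete" for a task whose area holds data."""
    area = areas[task_id]
    file_no = area["file"]
    if file_no == 0:
        return "incomplete"                    # nothing was created; reschedule

    if area["progress"] < Progress.DETAIL_INSERTED:
        # Back out: undo the reservation and any allocated description block.
        if disk.reserve_number > file_no:
            disk.reserve_number -= 1
        if area["block"] != 0:
            disk.description_blocks.pop(area["block"], None)
        del areas[task_id]                     # "activity restart end"
        return "incomplete"

    # Go forward: finish the steps that had not been reached.
    if area["progress"] < Progress.INDEX_INSERTED:
        disk.index.add(file_no)                # create the index entry
    if area["progress"] < Progress.CATALOG_ACTIVE:
        disk.active_catalog.add(file_no)       # set the catalog entry active
    return "normal"
```

Both forward steps are written so that repeating them is harmless, which matters if a second interruption occurs during recovery itself.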
From the above description, it will be seen that each task includes procedures for recording progress data in a specified area or areas in memory during execution. The data stored in these areas are retained on the disk in the event of an error that interrupts the execution of a task. During a subsequent error recovery operation, the data is restored to the memory and used by the task to either complete the task or back out of the task in a way that does not corrupt the data base. This allows automatic system recovery from software errors or power loss. Recovery data is recorded in memory and then discarded to allow new data to be recorded as task execution progresses without an error condition occurring.

Claims

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:-

1. In a data base management system having a plurality of application programs capable of concurrently executing a plurality of tasks with respect to a data base located in a non-volatile bulk memory, a method of providing said system with the capability of recovering from an unexpected interruption in system operation without corrupting the data base, said method comprising:
allocating a task recovery area in a random access memory during the initial processing of each task which may modify the data base;
independently recording task recovery data during the execution of each such task in its respective task recovery area to the extent required to provide recovery for the task without data base corruption in the event of an unexpected interruption in system operation while the task is executing, said independently recording task recovery data occurring in response to instructions embedded in the application programs used for executing the task;
de-allocating each task recovery area in response to deactivation of the task as a result of its successful completion;
transferring recovery data in said task recovery areas to said bulk memory in response to the occurrence of an unexpected interruption in system operation while a task is executing;
during system recovery operations, returning the recovery data transferred to said bulk memory in response to said interruption back to said task recovery areas in said random access memory; and then independently performing task recovery operations for each task which was active when the unexpected interruption occurred using the recovery data returned to its respective task recovery area, said task recovery operations being performed in a manner such that the integrity of the data base is maintained, said independently performing task recovery operations occurring in response to instructions embedded in the application programs used for executing the task.

2. The invention in accordance with claim 1, wherein the recovery data stored by each active task in its task recovery area during task execution includes task progress data.

3. The invention in accordance with claim 2, wherein the step of independently performing task recovery operations for each task includes interrogating the respective task recovery area and in response to said progress data either completing the task or backing out of the task and then re-executing the task.

4. The invention in accordance with claim 1, including the steps of recording an active task identification in an active task table in said random access memory when a task is activated, transferring said table to the bulk memory in response to said interruption along with said task recovery areas, and returning said active task table to said random access memory along with said task recovery areas during said system recovery operations.

5. The invention in accordance with claim 4, including the step of interrogating said table to determine the tasks for which the step of independently performing task recovery operations are to be performed.

6. The invention in accordance with claim 5, wherein the step of allocating includes storing data identifying the locations of allocated task recording areas in said table, and wherein the steps of independently recording task recovery data and independently performing task recovery operations refer to said table for determining the location of the respective recovery area allocated for each task.

7. The invention in accordance with claim 6, further including the steps of recording a flag in each task recovery area indicating if the data recorded therein is ready for use during recovery, testing the flag in response to said interruption, and inhibiting the transfer of a task recovery area to the bulk memory until the flag indicates that the data is ready for use during recovery operations.

8. The invention in accordance with claim 7, further including the steps of determining when said task recovery tables have been stored in said bulk memory, and in response thereto initiating system recovery operations.

9. The invention in accordance with claim 1, wherein a task may comprise a plurality of nested activities, wherein the step of allocating includes dividing the task recovery area of such a task into linked activity recovery areas, and wherein the step of independently recording for such a task includes recording recovery data during the performance of each nested activity in a respective activity recovery area.