JPH05303523A - In-fault process restore system

Info

Publication number: JPH05303523A
Application number: JP4109552A
Authority: JP
Inventors: Hiroyuki Yoshida; 浩幸吉田; Kenji Yokoyama; 憲治横山
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1992-04-28
Filing date: 1992-04-28
Publication date: 1993-11-16

Abstract

PURPOSE: To process update requests from multiple tasks in parallel and, when a failure is detected, to restore data efficiently with only a short stop in processing. CONSTITUTION: An update request by task 1 to block B1 is accepted (S9), and an update request by task 2 to B1 is accepted (S10). When task 1 issues commit 1, the update information (update journal, pre-update data, and post-update data) established by S9 and S10 is saved to a magnetic disk device (MD). An update request by task 1 is then accepted (S14), followed by an update request by task 2 (S15). When task 2 issues commit 2, the update information established by S14 and S15 is saved to the MD. If a failure occurs at this point, the update information on the MD is loaded into the corresponding storage areas of the main storage device, restoring the storage state as of commit 1, the most recently finalized commit, and CPU processing is restarted.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a failure recovery method, and in particular to a failure recovery method suitable for recovering processing after a failure in, for example, a computer system.

[0002]

[Prior Art] Online information systems have become widespread in recent years. In a bank online system, for example, if a failure of some kind occurs and CPU processing stops, data in the middle of processing may be lost, or the processing performed so far may have to be redone from the beginning. This can cause great confusion for users and large losses for the bank. Various methods have therefore been developed to cope with failures (for example, power failures, hardware faults in devices, and program faults).

[0003] One such method stores the contents of each update, at the time the data is updated, in a file separate from the updated data (called, for example, a non-volatile update history file). If a failure occurs during some processing, the update contents in this update history file are consulted to reproduce the data as it was before the update, and processing is restarted from the reproduced data. This is referred to below as the first failure recovery method.

[0004] Another method stores the pre-update data, at the time the data is updated, in a file separate from the updated data (called, for example, a non-volatile log file). If a failure occurs during some processing, the pre-update data in this log file is used to restart processing. This is referred to below as the second failure recovery method.

[0005]

[Problems to be Solved by the Invention] In the first failure recovery method, however, the amount of data in a single update varies with the update contents, so the amount of update data saved to the update history file may be large or small. This makes it difficult to decide the capacity of the update history file, and when the amount of update data is large, recovery takes time. Furthermore, when the data of the same block has been updated many times, recovery must restore the pre-update state by working back through multiple updates, so recovery takes even longer. The longer recovery takes, the longer it is before application processing, which is the system's actual purpose, can start, and the longer the system is unavailable to users.

[0006] In the second failure recovery method, updating data requires writing the pre-update data from main memory to the log file in order to save it. However, while a task has declared a commit (COMMIT: a command that finalizes an update), is updating a block, and is writing from main memory to the magnetic disk device, other tasks cannot update that block. Until the write finishes and the task that issued the commit issues another commit or the like to end the transaction, no other task can update the block. Updating the same block from different tasks therefore takes a long time.

[0007] The present invention has been made in view of the above problems. Its object is to provide a failure recovery method that, in a computer system for example, allows multiple tasks to update the data of the same block in parallel, and that recovers data efficiently with only a short stop in processing when a failure is detected.

[0008]

[Means for Solving the Problems] To achieve the above object, the present invention comprises the following characteristic means: update journal storage means that stores an update journal in volatile storage each time an update request for the target data is issued; post-update data storage means that updates the target data in response to the update request and stores the updated data in volatile storage; pre-update data storage means that stores the pre-update data in volatile storage each time an update is finalized; and update information saving means that finalizes one transaction each time one or more update requests have been issued by the same task, and that saves the update journal, the pre-update data, and the post-update data in non-volatile storage each time a transaction is finalized.

[0009] When a failure is detected, the update journal that was updated by the transaction finalized most recently before the failure, and that was saved by the update information saving means, is loaded into the update journal storage means. The saved pre-update data is likewise loaded into the pre-update data storage means, and the saved post-update data into the post-update data storage means. The storage state is thereby restored to that of the transaction finalized most recently before the failure.

[0010]

[Operation] According to the present invention, even when different tasks issue multiple update requests for the data of the same block, the update information is held in the update journal storage means, the post-update data storage means, and the pre-update data storage means, so update requests from multiple tasks can be accepted in parallel. Each time one or more update requests from the same task have been issued, that task's update information is finalized and saved by the update information saving means, so update information can be saved more efficiently than before.

[0011] When a failure occurs, for example because of a power loss, the failure is detected after power-on and IPL. The update journal, post-update data, and pre-update data that were updated by the transaction finalized most recently before the failure and saved by the update information saving means are then read into the corresponding update journal storage means, post-update data storage means, and pre-update data storage means. Because this reproduces the storage state at the time of that finalized transaction, processing can resume automatically from that state. Thus, even when update requests have been issued by multiple tasks, the recovery processing resumes efficiently and automatically from the state of the last finalized transaction.

[0012]

[Embodiment] A preferred embodiment in which the present invention is applied to a failure recovery method for a computer system is described below with reference to the drawings.

[0013] This embodiment describes an example in which processing in a general computer system is recovered efficiently after a failure, with only a short stop (lock).

[0014] The example described below works as follows. One or more tasks issue update requests for a target block. On each update, the update journal and the pre-update data of the target block are stored in the main storage device as failure recovery data. Each time a task that issued update requests issues a commit and finalizes its updates, the update contents and pre-update data stored so far are saved to a magnetic disk device. If a failure occurs during CPU processing, an IPL is performed and the contents of the magnetic disk are consulted to restore the pre-failure data.

[0015] FIG. 1 is a processing flowchart (part 1: an example of the overall processing flow) of an example of the failure recovery method for the computer system according to this embodiment. FIG. 2 is a processing flowchart (part 2: an example of the detailed flow of the failure recovery processing portion of FIG. 1) of the same method.

[0016] Before the example flowcharts of FIG. 1 and FIG. 2 are described, the configuration of an example computer system that implements this failure recovery method is described with reference to FIG. 3.

[0017] FIG. 3 is a hardware configuration diagram of an example computer system that implements the failure recovery method according to this embodiment.

[0018] In FIG. 3, the example computer system comprises a CPU 7, a main storage device 1, magnetic disk devices 2 to 5, and an input/output device 8.

[0019] The magnetic disk device 4 mainly stores the main program (MP) and the like. At power-on, on an instruction from the CPU 7, the main program is loaded by IPL (Initial Program Loading) into the main program (MP) storage area of the main storage device 1. The magnetic disk device 5 mainly stores the application program (AP), which is stored into the application program (AP) storage area 16 of the main storage device 1 on the basis of an instruction supplied from the CPU 7 to the input/output device 8.

[0020] The CPU 7 stores the application program (AP) supplied from the magnetic disk device 5 in the predetermined application program (AP) storage area 16 of the main storage device 1 and executes it as tasks. When the application program (AP) supplies a commit or the like, for example for a program change, the corresponding update to the data file (DF), such as a program update, is carried out on an instruction from the CPU 7.

[0021] The magnetic disk device 3 stores the failure recovery log file (SKF), built from the failure recovery data supplied from the main storage device 1 on an instruction from the CPU 7, and the checkpoint file (CPF), built from the supplied checkpoints. These files are read into predetermined storage areas of the main storage device 1 on an instruction from the CPU 7. The magnetic disk device 2 stores data obtained in the main storage device 1 into the data file (DF), and reads it back, on instructions from the CPU 7.

[0022] The main storage device 1 is a volatile storage device. It stores the main program (MP) supplied from the magnetic disk device 4 in the main program (MP) storage area, journal logs J in the journal log storage area, failure recovery data in the failure recovery data storage area, input/output data in the input/output data storage area, and checkpoints in the checkpoint storage area. On instructions from the CPU 7, the journal logs J in the journal log storage area are edited into and stored in the failure recovery log file (SKF) on the magnetic disk device 3; the failure recovery data in the failure recovery data storage area is likewise edited into and stored in the failure recovery log file (SKF) on the magnetic disk device 3; the checkpoints in the checkpoint storage area are edited into and stored in the checkpoint file (CPF) on the magnetic disk device 3; and the input/output data in the input/output data storage area is stored on the magnetic disk device 2.

[0023] The input/output device 8 comprises, for example, an input unit (not shown) for supplying various instructions and data to the system and an output unit (not shown) for outputting data generated in the system.

[0024] FIG. 4 is an explanatory diagram showing an example time sequence of data updates by task 1 and task 2 to a block B1 in the failure recovery method of this embodiment.

[0025] An example of data updates by task 1 and task 2 of the application program (AP) is described with reference to FIG. 4. The application program (AP) issues a commit in task 1 (T1) and then a commit in task 2 (T2). Next, task 1 performs ADD(R1), which adds record R1, on the data of block B1 (T3). Task 2 then performs ADD(R2), which adds record R2, on the data of block B1 (T4). Task 1 then issues a commit (number 1) (T5). The span in task 1 from the previous commit (T1) to this commit (T5) is called transaction 1 (TR1). Issuing commit number 1 finalizes transaction 1 (TR1), so ADD(R1) is finalized and saved.

[0026] Next, task 1 performs DELETE(R1), which deletes record R1, on block B1 (T6). Task 2 then performs ADD(R3), which adds record R3, on block B1 (T7). Task 2 then issues a commit (number 2) (T8). The span in task 2 from the previous commit (T2) to this commit (T8) is called transaction 2 (TR2). Issuing commit number 2 finalizes transaction 2 (TR2), so ADD(R2) and ADD(R3) are finalized and saved. The DELETE(R1) (T6) performed in task 1 after its commit at T5 is called transaction 3 (TR3). However, because task 1 has not issued a commit for DELETE(R1) (T6), transaction 3 (TR3) remains unfinalized.
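The FIG. 4 sequence can be replayed in a few lines to check which operations end up finalized. This is only an illustrative sketch: the event tuples and the `replay` function are hypothetical names, and the rule that a commit finalizes every pending operation of the issuing task is taken from the description of T1 to T8 above.

```python
# Replay of the FIG. 4 sequence. The tuple layout (time, task, op, record)
# and the function name are illustrative assumptions, not from the patent.
events = [
    ("T1", 1, "COMMIT", None),
    ("T2", 2, "COMMIT", None),
    ("T3", 1, "ADD", "R1"),
    ("T4", 2, "ADD", "R2"),
    ("T5", 1, "COMMIT", None),   # commit number 1: finalizes TR1
    ("T6", 1, "DELETE", "R1"),
    ("T7", 2, "ADD", "R3"),
    ("T8", 2, "COMMIT", None),   # commit number 2: finalizes TR2
]


def replay(events):
    """Return (finalized, unfinalized) operations after all events."""
    pending = {}     # task number -> operations not yet committed
    finalized = []   # operations finalized by a commit, in commit order
    for _time, task, op, record in events:
        if op == "COMMIT":
            # A commit finalizes everything the same task did since its
            # previous commit; other tasks' pending work is untouched.
            finalized.extend(pending.pop(task, []))
        else:
            pending.setdefault(task, []).append((task, op, record))
    unfinalized = [o for ops in pending.values() for o in ops]
    return finalized, unfinalized


finalized, unfinalized = replay(events)
```

After the replay, `finalized` holds ADD(R1), ADD(R2), and ADD(R3), while the DELETE(R1) of transaction 3 stays in `unfinalized`.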

[0027] FIG. 5 shows the saved and stored state of the various files and data in the computer system according to this embodiment, that is, the state finally stored in the main storage device 1 and saved on the magnetic disk devices 2 and 3 as a result of the processing in FIG. 4.

[0028] The main storage device 1 comprises, for example, a main program (MP) storage area 11, a journal log storage area 12, a failure recovery data storage area 13, an input/output data storage area 14, a checkpoint storage area 15, and an application program (AP) storage area 16. The journal log storage area 12 holds, for example, journal logs J1 to J4, which record the contents of the data updates made to block B1. Each journal log J consists of a task number, a transaction number, the update content (for example, ADD(R1)), and the number of the updated block. The failure recovery data storage area 13 stores, for example, the pre-update block data L1 when the data of block B1 is updated.

[0029] The input/output data storage area 14 holds the updated block data B1 in the form of a cache. The checkpoint storage area 15 holds the task number, transaction number, valid commit number, and so on for a task whose updates have been made valid by the issuing of a commit.

[0030] The magnetic disk device 2 holds, in the data file DF, the data of blocks updated in the main storage device 1. The magnetic disk device 3 holds the failure recovery log file SKF and the checkpoint file CPF. The failure recovery log file SKF consists of the journal logs J supplied from the main storage device 1, the failure recovery data L, and the valid commit number. The checkpoint file CPF stores the commit number whose update is valid, together with the corresponding task number and transaction number.
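The record layouts described for the journal log J, the checkpoint file CPF, and the failure recovery log file SKF can be written down as plain data classes. This is a minimal sketch: the field names, the per-block dictionary used for the failure recovery data L, and the `matches` helper are assumptions made for illustration, since the text lists only which items each structure contains.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JournalLog:
    """One journal log J held in journal log storage area 12."""
    task: int          # task number
    transaction: int   # transaction number
    update: str        # update content, e.g. "ADD(R1)"
    block: str         # number of the updated block, e.g. "B1"


@dataclass
class Checkpoint:
    """One entry of the checkpoint file CPF."""
    valid_commit: int  # commit number whose update is valid
    task: int          # task number for that commit
    transaction: int   # transaction number for that commit


@dataclass
class RecoveryLogFile:
    """Failure recovery log file SKF."""
    journals: List[JournalLog]
    recovery_data: Dict[str, List[str]]  # failure recovery data L (layout assumed)
    valid_commit: int


def matches(skf: RecoveryLogFile, cp: Checkpoint) -> bool:
    """Hypothetical helper: does this log belong to the checkpointed commit?"""
    return skf.valid_commit == cp.valid_commit


# Illustrative values only.
j1 = JournalLog(task=1, transaction=1, update="ADD(R1)", block="B1")
skf = RecoveryLogFile(journals=[j1], recovery_data={"B1": []}, valid_commit=1)
cp = Checkpoint(valid_commit=1, task=1, transaction=1)
```

Keeping the valid commit number in both SKF and CPF is what lets the two files be paired by that number.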

[0031] FIG. 1 is a processing flowchart (part 1: an example of the overall processing flow) of an example of the failure recovery method for the computer system according to this embodiment. FIG. 2 is a processing flowchart (part 2: an example of the detailed flow of the failure recovery processing portion of FIG. 1) of the same method.

[0032] In FIG. 1, when power is turned on, the computer system first IPLs the main program stored on the magnetic disk device 4 into the main storage device 1 (S1). Next, the CPU 7 refers to the checkpoint files CPF1 and CPF2 stored on the magnetic disk device 3 (S2) and determines whether a valid commit number exists (S3). If one exists, only those entries in the data file DF and the failure recovery log file (SKF) on the magnetic disk devices 2 and 3 whose commit number matches are read into the corresponding storage areas 12, 13, and 14 of the main storage device 1 (S4). The stored data is treated as the recovered data (S5), and CPU processing starts from this storage state (S6). Steps S2 to S5 can be regarded as the processing performed when, for example, the computer system stopped because of a power loss during use and was then re-IPLed.

[0033] On the other hand, when power is turned off with no failure having occurred and is later turned on again, the contents of the data file DF and the failure recovery log file (SKF) are consistent, and no failure recovery data corresponding to a valid commit number exists on the magnetic disk device 3. In this case S3 is followed directly by S6.
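The power-on path S1 to S6 can be sketched as a single function. The dictionary layouts, the function name `startup`, and the choice of the largest commit number as the most recent one are assumptions for illustration; the text states only that matching entries are read into areas 12, 13, and 14 and that a clean shutdown leaves nothing to recover.

```python
def startup(checkpoints, recovery_log, data_file):
    """Sketch of S1 to S6: return the in-memory state the CPU starts from.

    checkpoints  : list of dicts with a "valid_commit" key (CPF1, CPF2)
    recovery_log : dict commit number -> {"journal": [...], "before_image": [...]} (SKF)
    data_file    : dict commit number -> list of records (DF)
    """
    # S1: IPL. Volatile main storage starts out empty.
    memory = {"journal": [], "before_image": None, "io_data": None}

    # S2: refer to the checkpoint files.
    commits = [cp["valid_commit"] for cp in checkpoints]

    # S3: no valid commit number, so go straight to S6.
    if not commits:
        return memory

    commit = max(commits)  # assumption: the largest number is the most recent
    entry = recovery_log.get(commit)
    if entry is None or commit not in data_file:
        # Clean shutdown case: no recovery data for this commit number.
        return memory

    # S4: read only the entries whose commit number matches.
    memory["journal"] = list(entry["journal"])            # area 12
    memory["before_image"] = list(entry["before_image"])  # area 13
    memory["io_data"] = list(data_file[commit])           # area 14

    # S5: this state is the recovered data; S6 starts from it.
    return memory


# Usage with illustrative values.
state = startup(
    checkpoints=[{"valid_commit": 1}],
    recovery_log={1: {"journal": ["ADD(R2)"], "before_image": []}},
    data_file={1: ["R1", "R2"]},
)
```

Returning the empty state in both early exits mirrors the text: without matching recovery data, processing simply starts at S6.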

[0034] In S6, the CPU 7 executes the application program (AP). The CPU 7 then monitors whether a data update request is issued by the application program (AP) loaded, for example, from the magnetic disk device 5 into the main storage device 1 (S7). If there is no update request, the CPU processing of S6 continues. When, for example, the application program (AP) issues commits for task 1 and task 2 (S8), the update content of task 1 for, say, block B1 is stored as journal log J1 (J1 (ADD(R1)) in FIG. 5) in the journal log storage area 12 of the main storage device 1 (S9). R1 is thereby also set in the input/output data storage area 14. The failure recovery data storage area 13 holds L0, the state of block B1 before the update.

[0035] Next, the update content of task 2 for, say, block B1 is stored as journal log J2 (J2 (ADD(R2)) in FIG. 5) in the journal log storage area 12 of the main storage device 1 (S10). R2 is thereby set in the input/output data storage area 14 in addition to the previously set R1. At this point L0, the pre-update state of block B1, already exists in the failure recovery data storage area 13, so no new recovery data for B1 is stored.
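Steps S9 and S10 can be illustrated with a small in-memory model in which two tasks update the same block and only the first pre-update image is kept. The class and attribute names are hypothetical, and the sketch covers just these two steps; it does not model the commit or the save to disk.

```python
class BlockUpdater:
    """Toy model of storage areas 12, 13, and 14 for steps S9 and S10."""

    def __init__(self, blocks):
        # Area 14: cached, post-update block data.
        self.io_data = {name: list(recs) for name, recs in blocks.items()}
        self.journal = []        # area 12: journal logs in arrival order
        self.before_image = {}   # area 13: pre-update state per block

    def update(self, task, op, record, block):
        # Store the pre-update state only if none exists for this block yet,
        # so a second task updating the same block adds no new recovery data.
        if block not in self.before_image:
            self.before_image[block] = list(self.io_data[block])
        self.journal.append((task, op, record, block))
        if op == "ADD":
            self.io_data[block].append(record)
        elif op == "DELETE":
            self.io_data[block].remove(record)
        else:
            raise ValueError(f"unknown operation: {op}")


updater = BlockUpdater({"B1": []})       # L0: block B1 starts empty
updater.update(1, "ADD", "R1", "B1")     # S9: task 1, journal log J1
updater.update(2, "ADD", "R2", "B1")     # S10: task 2, journal log J2
```

Neither call waits for the other task to commit, which is the point of holding the journal, the pre-update image, and the post-update data separately in main storage.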

[0036] Next, when the application program (AP) issues a commit for task 1 (commit number 1) (S11), the journal log J2 (ADD(R2)) stored at that point in the journal log storage area 12 and the pre-update block data L0 (empty data) stored in the failure recovery data storage area 13 are saved to the failure recovery log file SKF on the magnetic disk device 3. The post-change data B0 (R1, R2) stored in the input/output data storage area 14 is saved to the data file DF on the magnetic disk device 2. Checkpoint CP0 (task 1: TR1 (finalized), task 2: TR0 (unfinalized), valid commit number 1) is generated in the checkpoint storage area 15 and saved to the checkpoint file CPF1 on the magnetic disk device 3 (S12). If no failure occurs during this save (S13), the data update to block B1 by transaction TR1 under commit number 1 is judged to be finalized.

[0037] However, if a failure such as a power loss occurs so that the save cannot complete, and the data in the main storage device 1 is judged to have been lost, failure recovery processing is performed using the failure recovery log file SKF and the checkpoint file CPF1 or CPF2 saved on the magnetic disk device 3 (S20).

[0038] The details of this failure recovery processing (S20) are shown in FIG. 2. First, a re-IPL is performed (S201). The checkpoint files CPF1 and CPF2 on the magnetic disk device 3 are then consulted (S202) to determine whether a valid commit number exists (S203). If one exists, the failure recovery log for that number is read from the failure recovery log file SKF into the main storage device 1 (S204), the data is recovered on the basis of the failure recovery data L and the journal log J (S205), and the result is saved to the data file DF on the magnetic disk device 2 (S206). Whether a failure occurred during this processing is then checked (S207); if one did, S201 to S206 are performed again. In the example up to S13 of FIG. 1, no update contents have yet been completely saved, so S203 determines that no valid commit number has been saved, the failure recovery processing ends, and processing resumes from the post-IPL state.
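The control flow of S201 to S207 amounts to a retry loop around the recovery steps. In the sketch below the four callables, the `StorageFailure` exception, and the `max_attempts` limit are all assumptions; in particular, how S205 combines the failure recovery data L with the journal log J is left to a caller-supplied `rebuild` function because the text does not spell it out.

```python
class StorageFailure(Exception):
    """Hypothetical signal that a failure interrupted recovery (S207)."""


def recover(read_checkpoints, read_log, rebuild, write_data_file, max_attempts=3):
    """Sketch of S20: returns the recovered data, or None if nothing to recover."""
    for _attempt in range(max_attempts):
        try:
            # S201: re-IPL, modelled here as simply starting the pass afresh.
            checkpoints = read_checkpoints()                      # S202
            commits = [cp["valid_commit"] for cp in checkpoints]
            if not commits:                                       # S203
                return None  # nothing saved: resume from the post-IPL state
            commit = max(commits)  # assumption: largest number is most recent
            log = read_log(commit)                                # S204
            restored = rebuild(log)                               # S205
            write_data_file(restored)                             # S206
            return restored
        except StorageFailure:
            # S207: a failure occurred during recovery, so repeat S201 to S206.
            continue
    raise RuntimeError("recovery did not complete within max_attempts")


# Usage: the first write fails, the second pass succeeds.
attempts = {"writes": 0}
saved = []


def flaky_write(data):
    attempts["writes"] += 1
    if attempts["writes"] == 1:
        raise StorageFailure("power lost during S206")
    saved.append(data)


result = recover(
    read_checkpoints=lambda: [{"valid_commit": 1}],
    read_log=lambda commit: {"L": [], "J": ["R1", "R2"]},
    rebuild=lambda log: log["L"] + log["J"],   # placeholder rule, not the patent's
    write_data_file=flaky_write,
)
```

Because every pass restarts from the checkpoint files on disk, a failure during recovery costs only a repeat of the same steps.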

[0039] If, for example, S13 completes with no failure and no failure has occurred in any of the processing so far, the main storage device 1 holds the data in the state of S11, and the magnetic disk devices 2 and 3 hold the data saved in the state of S12.

[0040] Whether the processing from S14 onward is performed after S20 depends on the processing of the application program (AP). This example describes the case in which the processing from S14 onward is performed next.

[0041] After S13 ends with no failure, for example, in S14 the update content of task 1 for block B1 is stored as journal log J3 (J3 (DELETE(R1)) in FIG. 5) in the journal log storage area 12 of the main storage device 1. R1 is thereby deleted from the input/output data storage area 14, leaving R2. R1 is set in the failure recovery data storage area 13. The checkpoint storage area 15 retains its previous state.

[0042] Next, for example, the update by task 2 to block B1 is stored as journal log J4 (J4 (ADD (R3)) in FIG. 5) in journal log storage area 12 of main storage device 1 (S15). As a result, R3 is set in input/output data storage area 14 in addition to R2, and R2 is set in failure recovery data storage area 13 in addition to R1. Checkpoint storage area 15 retains its previous state.

[0043] Next, when the application program (AP) issues a commit for task 2 (commit number 2) (S16), the journal log J3 (DELETE (R1)) held at that point in journal log storage area 12 and the pre-update block data L1 (R1, R2) held in failure recovery data storage area 13 are saved in the failure recovery processing log file SKF on magnetic disk device 3. The post-update data B1 (R2, R3) held in input/output data storage area 14 is written to the data file DF on magnetic disk device 2, overwriting the previously saved data. A checkpoint CP1 (task 1: TR1, task 2: TR2, valid commit number 2) is generated in checkpoint storage area 15 and saved in the checkpoint file CPF2 on magnetic disk device 3 (S17). If no failure occurs during this save (S18), the data update to block B1 by transaction 2 (TR2) under commit number 2 is judged to be confirmed.
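
The save performed at S17 can be sketched as below. This is only an illustration under stated assumptions: the names `Disk`, `commit`, `skf`, `df`, and `cpf`, and the use of in-memory dictionaries in place of magnetic disk devices 2 and 3, are hypothetical. The text specifies only what is saved and where.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Stand-in for magnetic disk devices 2 and 3 (hypothetical layout)."""
    skf: dict = field(default_factory=dict)  # failure recovery log file SKF
    df: list = field(default_factory=list)   # data file DF
    cpf: dict = field(default_factory=dict)  # checkpoint files, keyed by commit number


def commit(disk, number, journal, before, after, transactions):
    """Save the update information for one commit (S17)."""
    # Journal log and pre-update data go to SKF under the commit number.
    disk.skf[number] = {"journal": list(journal), "before": list(before)}
    # Post-update data overwrites what DF held from the previous commit.
    disk.df = list(after)
    # Assumption: the checkpoint is written last, following the order in
    # which the text lists the three saves.
    disk.cpf[number] = {"transactions": dict(transactions),
                        "valid_commit": number}


disk = Disk()
commit(
    disk,
    number=2,
    journal=[("DELETE", "R1")],   # J3
    before=["R1", "R2"],          # L1
    after=["R2", "R3"],           # B1
    transactions={"task1": "TR1", "task2": "TR2"},
)
print(disk.df)                      # ['R2', 'R3']
print(disk.cpf[2]["valid_commit"])  # 2
```

Keying the log by commit number mirrors how the recovery step later looks up the log that matches the valid commit number found in the checkpoint file.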

[0044] However, if a failure such as a power cut prevents the save and the data in main storage device 1 is judged to have been lost, failure recovery processing is performed using the failure recovery processing log file SKF and the checkpoint file CPF1 or CPF2 saved on magnetic disk device 3 (S21).

[0045] The failure recovery processing (S21) follows FIG. 2 in the same way as described above. First, a re-IPL is performed (S201), and the checkpoint files CPF1 and CPF2 on magnetic disk device 3 are consulted (S202) to determine whether a valid commit number exists (S203). Because valid commit number 1 has been saved, the failure recovery processing log for that number (commit number 1: J2, L0) is read from the failure recovery processing log file SKF into the corresponding storage areas 12 and 13 of main storage device 1, and the data file DF (R1, R2) saved on magnetic disk device 2 is read into input/output data storage area 14 of main storage device 1 (S204). Based on journal log J2 (ADD (R2)), the inverse operation, that is, DELETE (R2), is applied to the data (R1, R2) read into input/output data storage area 14, recovering (reproducing) the data (R1) (S205), and the data (R1) is saved in the data file DF on magnetic disk device 2 (S206).
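
Steps S202 to S206 can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout of the checkpoint files and the log file, the function names `inverse` and `recover`, and the rule of taking the largest saved commit number are all assumptions made for the example.

```python
def inverse(entry):
    """Return the journal entry that undoes `entry` (ADD <-> DELETE)."""
    op, record = entry
    return ("DELETE" if op == "ADD" else "ADD", record)


def recover(checkpoint_files, log_file, data_file):
    """Rebuild the block as of the last confirmed commit (S202 to S206).

    checkpoint_files: list of dicts such as {"valid_commit": 1}, or None
                      where nothing was saved (hypothetical layout).
    log_file:         dict mapping commit number -> {"journal": [...]}.
    data_file:        list of records currently held in data file DF.
    """
    # S202/S203: look for a valid commit number in CPF1 and CPF2.
    numbers = [cp["valid_commit"] for cp in checkpoint_files if cp]
    if not numbers:
        return list(data_file)  # nothing saved: resume as-is
    commit = max(numbers)  # assumption: the latest saved number wins

    # S204: read the log for that commit and the data file into memory.
    journal = log_file[commit]["journal"]
    block = list(data_file)

    # S205: undo the journal entries, newest first.
    for entry in reversed(journal):
        op, record = inverse(entry)
        if op == "ADD":
            block.append(record)
        else:
            block.remove(record)

    # S206: the caller writes `block` back to data file DF.
    return block


# Example from the text: DF holds (R1, R2); commit 1 logged J2 = ADD(R2).
restored = recover(
    checkpoint_files=[{"valid_commit": 1}, None],
    log_file={1: {"journal": [("ADD", "R2")]}},
    data_file=["R1", "R2"],
)
print(restored)  # ['R1']
```

Applying the inverse of ADD (R2) to (R1, R2) yields (R1), which matches the state the text says is written back to DF at S206.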

[0046] As a result, the data is restored to the state of transaction 1 (TR1) confirmed by the previous commit number 1. It is then determined whether a failure occurred during this processing (S207); if one did, S201 to S206 are performed again.
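
The check at S207 amounts to repeating the whole recovery pass until one pass finishes without a failure. A minimal sketch, in which `recover_once`, the `RecoveryFailure` exception, and the `max_attempts` bound are hypothetical (the text sets no retry limit):

```python
class RecoveryFailure(Exception):
    """Raised when a failure interrupts a recovery pass (hypothetical)."""


def recover_with_retry(recover_once, max_attempts=3):
    """Run S201 to S206 repeatedly until a pass finishes without failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return recover_once()  # S201 to S206
        except RecoveryFailure:
            # S207: a failure occurred, so start again from S201.
            if attempt == max_attempts:
                raise


# Usage: the first pass fails, the second succeeds.
calls = []


def flaky_pass():
    calls.append(1)
    if len(calls) == 1:
        raise RecoveryFailure("power lost during recovery")
    return ["R1"]


result = recover_with_retry(flaky_pass)
print(result, len(calls))  # ['R1'] 2
```

Because every pass starts again from the saved checkpoint and log files, repeating it does not depend on anything a failed pass left in main storage.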

[0047] When S18 or S21 finishes, CPU 7 can carry out the prescribed processing (S19) based on the updated or recovered data. Whether processing proceeds to S19 depends on the instructions of the application program (AP); this example describes the case in which S19 can follow S21.

[0048] Through the update to block B1 by task 1 in S14, task 1 is processing transaction 3 (TR3), and transaction 3 (TR3) can be confirmed when the next commit for task 1 is issued. Even if another task issues an update request before transaction TR3 is confirmed, that request can be held in journal log storage area 12, failure recovery data storage area 13, and input/output data storage area 14.

[0049] As described above, if no failure occurs at commit number 1 (S11) or commit number 2 (S16) in the flowchart of FIG. 1, the storage and save states shown in FIG. 5 are finally reached. That is, the logs (L, J) for commit numbers 1 and 2 are saved in the failure recovery processing log file SKF on magnetic disk device 3, valid commit number 1 is saved in checkpoint file CPF1, and valid commit number 2 is saved in checkpoint file CPF2.

[0050] Even if the processing for commit number 2 is confirmed without failure at S18 and a failure such as a power cut then occurs, for example at S19, so that the data in main storage device 1 is lost, recovery proceeds in the same way as in FIG. 2. Valid commit number 2 is obtained from checkpoint file CPF2; L1 (R1, R2) and J3 (DELETE (R1)) for the same number 2 are read from the failure recovery processing log file SKF into the corresponding storage areas 12 and 13 of main storage device 1; and the data (R2, R3) is read from the data file DF into input/output data storage area 14. With the data (R2, R3) set, the storage state of the previously confirmed transaction 2 (TR2) is restored.

[0051] According to the embodiment described above, even if CPU processing stops because of, for example, a power failure, the pre-update data is automatically reproduced on re-IPL so that processing can resume. Moreover, the time from when a failure stops CPU processing until processing can be restarted is shorter than with conventional methods.

[0052] This is because multiple update requests for the data of the same update target block can be accepted and held during one transaction TR, and each time a transaction is confirmed, the update journals, the pre-update data, and the post-update data are saved. The failure recovery processing can therefore use this data to restore (reproduce) the storage state of main storage device 1 to the state of the transaction confirmed most recently before the failure.

[0053] Furthermore, update processing by the application program (AP) can be handled with short-term locks, which improves the processing efficiency of the application program (AP). Multiple tasks can therefore be processed in parallel, which also makes processing more efficient. Such a computer system is effective when applied to systems that suffer serious damage from long downtime, such as online systems and various database search systems.

[0054] In the embodiment above, the configuration shown in FIG. 3 was used as an example of the computer system, but the configuration is not limited to this. For example, the invention also applies where an input/output device 8 with a communication function is configured to go online with other systems over an external wired or wireless line. Main storage device 1 is volatile, and a storage device with a high access speed is desirable.

[0055] In FIG. 3 of the embodiment above, magnetic disk devices 2 to 5 are used, but any non-volatile storage will do; for example, a magnetic tape device may be used if access speed is not a constraint. Magnetic disk devices 2 to 5 also need not be separate and may be a single magnetic disk device.

[0056] The embodiment above was described using a computer system as an example, but the invention is not limited to this. It is basically applicable to any system, such as a dedicated system, that is provided with an MPU, a storage device, and the like.

[0057] In the flowchart of FIG. 1 of the embodiment above, the update contents were described for journal logs J1 to J4, but the update contents are not limited to these. Likewise, commit number 1 was issued between journal logs J2 and J3 and commit number 2 after journal log J4, but the invention applies whatever the timing at which commits are issued. Also, in the flowchart of FIG. 1, when the commit for task 1 is issued, the input/output data B0 (R1, R2) at that time is saved as it is on magnetic disk device 2 in S12, but this is not the only option. For example, at S12 transaction 1 (TR1) of task 1 has been confirmed, so ADD (R1), which belongs to transaction 1 (TR1), is confirmed while ADD (R2) by task 2 is still unconfirmed; only R1 may therefore be saved. The rest of the processing flow is likewise not limited to the flow of FIG. 1.

【００５８】[0058]

[Effects of the Invention] As described above, the present invention comprises the update journal storage means, the post-update data storage means, the pre-update data storage means, and the update information saving means. When a failure is detected, the update journal, the post-update data, and the pre-update data that were updated by the transaction confirmed most recently before the failure and saved by the update information saving means are read into the corresponding update journal storage means, post-update data storage means, and pre-update data storage means, putting them in the storage state as of that confirmed transaction. Processing can therefore be resumed automatically from this storage state.

[0059] Therefore, even when update requests have been issued by multiple tasks and a failure occurs, processing can be resumed automatically and efficiently from the state of the previously confirmed transaction. Furthermore, update processing by an application program can be handled with short-term locks, which improves the processing efficiency of the application program.

[Brief description of drawings]

FIG. 1 is a processing flowchart (part 1) of the failure process recovery system for a computer system according to an embodiment of the present invention.

FIG. 2 is a processing flowchart (part 2) of the failure process recovery system for a computer system according to an embodiment of the present invention.

FIG. 3 is a hardware configuration diagram for realizing the failure process recovery system for a computer system according to an embodiment of the present invention.

FIG. 4 is an explanatory diagram showing an example of the order of data updates to a block in the failure process recovery system for a computer system according to an embodiment of the present invention.

FIG. 5 is an explanatory diagram showing an example of the storage state of files and data in the main storage device and the magnetic disk devices in the failure process recovery system for a computer system according to an embodiment of the present invention.

[Explanation of Reference Numerals] 1: main storage device; 2 to 5: magnetic disk devices; 7: CPU; 12: journal log storage area; 13: failure recovery data storage area; 14: input/output data storage area; 15: checkpoint storage area; 16: application program (AP) storage area.

Claims

[Claims]

1. A failure process recovery system comprising: update journal storage means for storing an update journal in a volatile manner each time an update request for update target data is issued; post-update data storage means for updating the update target data in response to the update request and storing the updated data in a volatile manner; pre-update data storage means for storing the pre-update data in a volatile manner each time the update is confirmed; and update information saving means for confirming one transaction each time at least one update request by the same task has been issued and, each time one such transaction is confirmed, saving the update journal, the pre-update data, and the post-update data in a non-volatile manner; wherein, when a failure is detected, the update journal that was updated by the transaction confirmed most recently before the failure and saved by the update information saving means is stored in the update journal storage means, the pre-update data saved in the same way is stored in the pre-update data storage means, and the post-update data saved in the same way is stored in the post-update data storage means, thereby recovering the storage state as of the transaction confirmed most recently before the failure.