JPH06139087A

JPH06139087A - Check point restart system

Info

Publication number: JPH06139087A
Application number: JP4291523A
Authority: JP
Inventors: Namiko Hayashi; 奈美子林
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-10-29
Filing date: 1992-10-29
Publication date: 1994-05-20
Anticipated expiration: 2016-02-19
Also published as: JP3135714B2

Abstract

PURPOSE: To improve job execution performance by executing a job asynchronously with the writing of its execution environment to a secondary storage device. CONSTITUTION: Checkpoint information to be acquired is written to a magnetic disk device 14 through an extended memory device 12. The extended memory device 12 is a nonvolatile memory backed up by a power supply, so even if a fault occurs while writing to the magnetic disk device 14, the checkpoint information remains held in the extended memory device 12. When a fault occurs, the validity of the checkpoint information in the extended memory device 12 is judged by referring to the flag in a management information area 3A. If the information is valid, recovery processing uses the checkpoint information in the extended memory device 12; if it is invalid, recovery processing uses the checkpoint information on the magnetic disk device 14.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a checkpoint restart method that restores the execution environment of a computer system based on the contents of a checkpoint file.

[0002]

[Prior Art] Computer systems are generally provided with various fault countermeasure functions. Restart processing is one of the most commonly used of these functions. It is executed when a running job, or the entire system, has stopped because of a failure.

[0003] Restart processing either restarts only a specific job (job restart) or restarts the entire system (system restart). In general, job restart is used more often because it is faster.

[0004] Checkpoint restart is a well-known job restart technique. Checkpoints are set in advance at key points in a job, and each time job processing reaches a checkpoint, its execution environment is recorded as status information in a checkpoint file on a secondary storage device. If job execution is interrupted by a failure, the job is restarted from the most recent checkpoint.

[0005] Thus, conventional checkpoint restart saves the execution environment of the running program as a checkpoint file on an external secondary storage device. When program execution is interrupted by a failure, the old environment is restored from the saved checkpoint file so that the program can be re-executed. Using this restart method enables fast failure recovery.

[0006] However, in this conventional checkpoint restart method, job execution and the writing of the execution environment to the secondary storage device are performed synchronously. As shown in FIG. 6, after the CPU issues a write request to the secondary storage device, job execution must wait until the response indicating write completion is returned.

[0007] This is because, if the next job processing were executed without waiting for the write to the secondary storage device to finish, the execution environment in main memory would change. If a failure then occurred partway through the write to the secondary storage device, the original execution environment would be lost.

[0008] Consequently, the conventional checkpoint restart method inserts a wait at every checkpoint while the execution environment is written to the secondary storage device, which degrades job execution performance.
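The synchronous behaviour described above can be illustrated with a minimal Python sketch. It is only a model under stated assumptions: the class `SecondaryStorage`, the function `run_job_synchronously`, and the abstract `write_cost` time unit are hypothetical names introduced for illustration and do not appear in the specification.

```python
class SecondaryStorage:
    """Hypothetical stand-in for the secondary storage device."""

    def __init__(self, write_cost):
        self.write_cost = write_cost  # assumed time units per checkpoint write
        self.checkpoint_file = None

    def write(self, environment):
        # Store a copy of the environment and report how long the write took.
        self.checkpoint_file = dict(environment)
        return self.write_cost


def run_job_synchronously(work_items, storage):
    """Checkpoint after every work item and block on each write."""
    environment = {"step": 0, "total": 0}
    waited = 0
    for value in work_items:
        environment["total"] += value
        environment["step"] += 1
        # The job cannot continue until the write completes (the FIG. 6 case).
        waited += storage.write(environment)
    return environment, waited


storage = SecondaryStorage(write_cost=5)
environment, waited = run_job_synchronously([1, 2, 3], storage)
print(environment, waited)
```

In this model the accumulated wait grows linearly with the number of checkpoints, which is the performance penalty the paragraph above points out.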

[0009]

[Problems to Be Solved by the Invention] Conventionally, job execution and the writing of the execution environment to the secondary storage device are performed synchronously, so a wait for writing the execution environment to the secondary storage device is inserted at every checkpoint, and job execution performance is degraded.

[0010] The present invention has been made in view of this point. Its object is to provide a checkpoint restart method in which job execution and the writing of the execution environment to the secondary storage device can be performed asynchronously, thereby improving job execution performance.

[0011]

[Means for Solving the Problems and Operation] The present invention is a checkpoint restart method for restoring the execution environment of a computer system based on the contents of a checkpoint file, characterized by comprising: a secondary storage device in which the checkpoint file is stored; a nonvolatile memory having a buffer area that stores checkpoint information to be written to the checkpoint file and a management information area in which a flag indicating the validity of the checkpoint information in that buffer area is set; input/output means for transferring data from the nonvolatile memory to the secondary storage device; means for collecting, at each checkpoint, the execution environment of a job running on the computer system, storing the collected information in the buffer area, and setting the flag in the management information area; means for issuing to the input/output means a request to write the contents of the buffer area of the nonvolatile memory to the checkpoint file on the secondary storage device; means for resetting the flag in the management information area and releasing the corresponding buffer area of the nonvolatile memory in response to a write completion notification from the input/output means; and means for referring, when a failure occurs, to the flag in the management information area of the nonvolatile memory and restoring the execution environment of the computer system by using either the contents of the buffer area of the nonvolatile memory or the checkpoint file on the secondary storage device, according to the set/reset state of the flag.

[0012] In this checkpoint restart method, the checkpoint information to be collected is written to the secondary storage device via the nonvolatile memory. Because the memory is nonvolatile, the checkpoint information remains held in it even if a failure occurs partway through the write to the secondary storage device. Therefore, when a failure occurs, the flag is referred to in order to judge whether the checkpoint information in the nonvolatile memory is valid. If it is valid, restoration uses the checkpoint information in the nonvolatile memory; if it is invalid, restoration uses the checkpoint information on the secondary storage device. This makes checkpoint restart using the nonvolatile memory possible. As a result, job execution no longer has to wait until the write to the secondary storage device completes, job execution and the writing of the execution environment to the secondary storage device can be performed asynchronously, and job execution performance is improved.

[0013]

[Embodiment] An embodiment of the present invention will be described below with reference to the drawings.

[0014] FIG. 1 shows the configuration of a computer system according to an embodiment of the present invention. The computer system comprises a computer main body 11, an extended memory device 12, a power supply backup device 13, and a magnetic disk device 14. The computer main body 11 has the same configuration as an ordinary computer system and comprises a CPU 111, a main memory 112, an I/O channel 113, and other components interconnected via a system bus.

[0015] The CPU 111 controls the entire computer system and executes various jobs. The CPU 111 also has a function for implementing checkpoint restart. In checkpoint restart, checkpoints are set in advance at key points in a job, and each time job processing reaches a checkpoint, its execution environment is recorded as checkpoint information in the checkpoint file on the magnetic disk device 14. If job execution is interrupted by a failure, the job is restarted from the most recent checkpoint.

[0016] In the checkpoint collection process, the CPU 111 transfers the checkpoint information from the main memory 112 to the extended memory device 12 and then issues a request to write the checkpoint information from the extended memory device 12 to the magnetic disk device 14. This checkpoint collection processing by the CPU 111 is executed according to a checkpoint management program in the main memory 112.

[0017] The main memory 112 consists of ordinary volatile RAM, such as dynamic RAM, and stores the checkpoint management program. The main memory 112 is also used as a work area for job execution, and information indicating the execution environment of the job is set in it. In the figure, 2A, 2B, and 2C represent parts of the program's execution environment, and these are collected as checkpoint information.

[0018] The I/O channel 113 is an input/output device that reads from and writes to the magnetic disk device 14 in response to requests from the CPU 111. In the checkpoint collection process, it writes the checkpoint information in the extended memory device 12 to the checkpoint file 141 on the magnetic disk device 14 according to a write request from the CPU 111. When the write is complete, the I/O channel 113 passes a write completion notification to the CPU 111.

[0019] The extended memory device 12 is a volatile memory, such as dynamic RAM, that is connected to the computer main body 11 as needed. Here, however, it is configured to be used as a nonvolatile memory by means of backup power from the power supply backup device 13. The extended memory device 12 is allocated buffer areas 3B1 and 3B2, in which collected checkpoint information is temporarily stored, and a management information area 3A, in which management information for generation management of the checkpoint information is set.

[0020] The magnetic disk device 14 is used as the secondary storage device of this computer system and holds the checkpoint file 141 required for checkpoint restart. The checkpoint file 141 consists of a management information area 6A and checkpoint information areas 6B1 and 6B2. Management information for generation management and similar handling of the checkpoint information is set in the management information area 6A. Here, the contents of the management information area 3A of the extended memory device 12 are reflected in the management information area 6A as needed. The contents of the buffer areas 3B1 and 3B2 of the extended memory device 12 are written to the checkpoint information areas 6B1 and 6B2.

Next, the checkpoint information collection processing will be described with reference to the flowchart of FIG. 2.

[0021] Here, a case is described in which two buffers are used so that two generations of checkpoint information are kept in the extended memory device 12, in order to support rollback processing using the extended memory device 12.

[0022] Assume that, at a checkpoint of the user program (program quiescent point 1), the information to be saved as checkpoint information (generation 1) is 2A, 2B, and 2C. In this case, the CPU 111 secures a buffer 3B1 in the extended memory device 12 for storing the information 2A, 2B, and 2C (step S11).

[0023] Next, the CPU 111 transfers the information 2A, 2B, and 2C to be collected to the buffer 3B1 (step S12). At this point, to establish this checkpoint information, the CPU 111 sets in the management information area 3A the IDs (identifiers) of the information 2A, 2B, and 2C and a valid flag indicating that valid information is set in the extended memory device 12 (step S13).

[0024] Next, the CPU 111 issues to the I/O channel 113 a write request (W1) for writing the contents of the buffer 3B1 in the extended memory device 12 to the checkpoint file 141 on the magnetic disk device 14, and returns to execution of the user program (step S14).

[0025] Job execution then continues. When the next checkpoint arrives, the CPU 111 secures a buffer 3B2 in the extended memory device 12 for storing the information to be collected at that point (generation 2) (step S15). Next, the CPU 111 transfers the information to be collected to the buffer 3B2 (step S16). At this point, to establish this checkpoint information, the CPU 111 sets in the management information area 3A the ID (identifier) of that information and a valid flag indicating that it is set in the extended memory device 12 (step S17).

[0026] Next, the CPU 111 issues to the I/O channel 113 a write request (W2) for writing the contents of the buffer 3B2 in the extended memory device 12 to the checkpoint file 141 on the magnetic disk device 14, and returns to execution of the user program (step S18).
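Steps S11 to S18 can be sketched in Python as follows. This is a minimal model, not the implementation in the specification: the names `ExtendedMemory`, `IOChannel`, and `take_checkpoint` are hypothetical, the checkpoint ID is assumed to be a generation number starting at 1, and the two buffers are assumed to be used alternately. Waiting for an earlier write to finish before a buffer is reused is not modelled here.

```python
from collections import deque


class ExtendedMemory:
    """Hypothetical model of the extended memory device 12."""

    def __init__(self):
        self.buffers = [None, None]  # buffer areas 3B1 and 3B2
        # Management information area 3A: one entry per buffer.
        self.management = [{"id": None, "valid": False} for _ in range(2)]


class IOChannel:
    """Hypothetical model of the I/O channel 113: it only queues requests."""

    def __init__(self):
        self.pending = deque()

    def request_write(self, slot):
        self.pending.append(slot)


def take_checkpoint(environment, checkpoint_id, memory, channel):
    slot = (checkpoint_id - 1) % 2              # S11 / S15: pick 3B1 or 3B2
    memory.buffers[slot] = dict(environment)    # S12 / S16: copy the environment
    memory.management[slot] = {"id": checkpoint_id, "valid": True}  # S13 / S17
    channel.request_write(slot)                 # S14 / S18: issue W1 / W2
    return slot                                 # the job resumes immediately


memory, channel = ExtendedMemory(), IOChannel()
take_checkpoint({"pc": 10}, 1, memory, channel)
take_checkpoint({"pc": 20}, 2, memory, channel)
print(memory.management, list(channel.pending))
```

The point of the sketch is the ordering: the valid flag is set only after the buffer has been filled, and the write request is issued without waiting for its completion.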

[0027] After this, when a further checkpoint arrives, the buffer 3B1 is to be used again, so the CPU 111 waits for final I/O completion before that checkpoint arrives (step S19). In response to the write completion notification for the write request W1, the buffer 3B1 is released and is then newly secured for the next checkpoint information (step S20). FIG. 3 shows the processing performed on write completion. When a write completion notification is issued from the I/O channel 113 at step S19 of FIG. 2, the CPU 111 executes the processing of FIG. 3.

[0028] For example, in the case of a completion notification for the request W1 to write the contents of the buffer 3B1 to the magnetic disk device 14, the CPU 111 first resets the flag corresponding to the buffer 3B1 in the management information area 3A (step S21). The CPU 111 then releases the buffer 3B1 for the next checkpoint information (step S22).

Next, the restoration processing performed when a failure occurs will be described with reference to the flowchart of FIG. 4. When the computer system goes down because of some failure, the CPU 111 executes the following restart processing after the computer system has been started up again.

[0029] That is, the CPU 111 first refers to the management information area 3A of the extended memory device 12 and checks the state of the flag corresponding to the checkpoint information with the latest ID (step S31). The CPU 111 judges whether the flag is set or reset (step S32) and selects the information to be used for restoration accordingly.

[0030] If the flag is set, the checkpoint information in the extended memory device 12 is valid and has not yet been written to the magnetic disk device 14. In this case, the CPU 111 executes restoration processing using the checkpoint information in the extended memory device 12 (buffer 3B1 or 3B2) (step S33).

[0031] If the flag is reset, the checkpoint information in the extended memory device 12 is invalid and has already been written to the magnetic disk device 14. In this case, the CPU 111 executes restoration processing using the checkpoint information (6B1 or 6B2) in the checkpoint file 141 on the magnetic disk device 14 (step S34).
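The flag handling on write completion (steps S21 and S22) and the selection made at restart (steps S31 to S34) can be sketched together in Python. This is a minimal model under stated assumptions: the dictionary layout and the function names `complete_write` and `restore` are hypothetical, the ID is assumed to be an increasing generation number, and at least one checkpoint is assumed to have been taken.

```python
def complete_write(memory, disk, slot):
    """Model the moment the I/O channel reports that buffer `slot` is on disk."""
    entry = memory["management"][slot]
    # The checkpoint file now holds this generation (area 6B1 or 6B2).
    disk["areas"][slot] = {"id": entry["id"], "env": memory["buffers"][slot]}
    entry["valid"] = False          # S21: reset the flag in area 3A
    memory["buffers"][slot] = None  # S22: release the buffer


def restore(memory, disk):
    """Choose the source of the latest checkpoint after a restart."""
    # S31: find the management entry with the latest ID.
    slot = max(range(2), key=lambda i: memory["management"][i]["id"] or 0)
    entry = memory["management"][slot]
    if entry["valid"]:                                    # S32
        return "extended memory", memory["buffers"][slot]  # S33
    return "disk", disk["areas"][slot]["env"]              # S34


memory = {
    "buffers": [{"pc": 10}, {"pc": 20}],
    "management": [{"id": 1, "valid": True}, {"id": 2, "valid": True}],
}
disk = {"areas": [None, None]}

complete_write(memory, disk, 0)  # W1 finished, W2 still in flight
print(restore(memory, disk))
complete_write(memory, disk, 1)  # W2 finished as well
print(restore(memory, disk))
```

In both calls the restored environment is generation 2; only the place it is read from changes, depending on whether the write for that generation had completed.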

[0032] As described above, in the checkpoint restart method of this embodiment, the checkpoint information to be collected is written to the magnetic disk device 14 via the extended memory device 12. Because the extended memory device 12 is a nonvolatile memory with power supply backup, the checkpoint information remains held in the extended memory device 12 even if a failure occurs partway through the write to the magnetic disk device 14. Therefore, when a failure occurs, the flag in the management information area 3A is referred to in order to judge whether the checkpoint information in the extended memory device 12 is valid. If it is valid, restoration uses the checkpoint information in the extended memory device 12; if it is invalid, restoration uses the checkpoint information on the magnetic disk device 14. This makes checkpoint restart using the extended memory device 12 possible.

[0033] Accordingly, as shown in FIG. 5, job execution no longer has to wait until the write to the magnetic disk device 14 completes. The job can be executed asynchronously with the writing of the execution environment to the magnetic disk device 14, and job execution performance is improved.
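The difference between the synchronous case of FIG. 6 and the asynchronous case of FIG. 5 can be illustrated with a small timing model in Python. The figures used are hypothetical and are not taken from the specification. The model assumes that each checkpoint is preceded by a fixed amount of computation, that writes are serialized on a single I/O channel, and that the job waits only when it needs a buffer whose previous write has not yet completed (the wait of step S19).

```python
def synchronous_time(checkpoints, compute, write):
    """Job time when every checkpoint write blocks the job."""
    return checkpoints * (compute + write)


def asynchronous_time(checkpoints, compute, write, buffers=2):
    """Job time when writes overlap with computation using `buffers` buffers."""
    now = 0
    channel_free = 0  # time at which the I/O channel becomes idle
    finish = []       # completion time of each write
    for k in range(checkpoints):
        now += compute
        if k >= buffers:
            # The buffer being reused must have been written out first.
            now = max(now, finish[k - buffers])
        start = max(now, channel_free)
        channel_free = start + write
        finish.append(channel_free)
    return now  # time at which the job passes its last checkpoint


print(synchronous_time(4, 10, 3), asynchronous_time(4, 10, 3))
```

When a write is shorter than the computation between checkpoints, the model shows no waiting at all in the asynchronous case; when writes are slower, the job still stalls, but only on buffer reuse.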

[0034] In this embodiment, the power supply backup device 13 is used so that the extended memory device 12 can serve as a nonvolatile memory, but nonvolatile memory elements such as EEPROM may also be used in the extended memory device 12.

[0035]

[Effect of the Invention] As described in detail above, according to the present invention, job execution and the writing of the execution environment to the secondary storage device can be performed asynchronously, and job execution performance can be improved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a computer system according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating the checkpoint information collection processing in the embodiment.

FIG. 3 is a flowchart illustrating the operation on completion of a checkpoint information write in the embodiment.

FIG. 4 is a flowchart illustrating the restart processing in the embodiment.

FIG. 5 is a diagram showing how a job is executed asynchronously with the checkpoint information write operation in the embodiment.

FIG. 6 is a diagram showing how a job is executed in synchronization with the checkpoint information write operation in the conventional checkpoint processing method.

[Explanation of Symbols]

11 ... computer main body, 12 ... extended memory device, 13 ... power supply backup device, 14 ... magnetic disk device, 111 ... CPU, 112 ... main memory, 113 ... I/O channel, 3A ... management information area, 3B1, 3B2 ... buffer areas, 141 ... checkpoint file.

Claims

[Claims]

1. A checkpoint restart method for restoring an execution environment of a computer system based on the contents of a checkpoint file, comprising: a secondary storage device in which the checkpoint file is stored; a nonvolatile memory having a buffer area that stores checkpoint information to be written to the checkpoint file and a management information area in which a flag indicating the validity of the checkpoint information in the buffer area is set; input/output means for transferring data from the nonvolatile memory to the secondary storage device; means for collecting, at each checkpoint, the execution environment of a job being executed on the computer system, storing the collected information in the buffer area, and setting the flag in the management information area; means for issuing to the input/output means a request to write the contents of the buffer area of the nonvolatile memory to the checkpoint file on the secondary storage device; means for resetting the flag in the management information area and releasing the corresponding buffer area of the nonvolatile memory in response to a write completion notification from the input/output means; and means for referring, when a failure occurs, to the flag in the management information area of the nonvolatile memory and restoring the execution environment of the computer system by using either the contents of the buffer area of the nonvolatile memory or the checkpoint file on the secondary storage device, according to the set/reset state of the flag.