JPH05108388A

JPH05108388A - Process restoration system

Info

Publication number: JPH05108388A
Application number: JP3266668A
Authority: JP
Inventors: Toshio Shirokibara; 敏雄白木原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-10-16
Filing date: 1991-10-16
Publication date: 1993-04-30

Abstract

PURPOSE: To reduce the overhead of process restoration by copying only the revised part, rather than the entire file image, at a checkpoint. CONSTITUTION: An operating system (OS) 101 is provided with a write trap processing part 102, a checkpoint processing part 103, and a process restoration part 104. A write during process execution is detected by the write-protect function of an MMU (memory management unit) 105, a copy of the written part is created, and the revision is applied to the copy. By manipulating a page table 107 in the MM (main storage) 106, the copy is reflected in the copy source when a checkpoint is passed during process execution. When data revised by the process is to be restored to its state at the latest checkpoint, the process is restored by invalidating the copy.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a process recovery method for an operating system, in which data updated by a process is restored to its state at the most recent checkpoint.

[0002]

[Prior Art] When a process can no longer continue executing, for example because of an error, it may be necessary to restore its data to the most recent checkpoint and re-execute the process from that checkpoint. For this purpose, checkpoints are set at suitable points during execution of the process.

[0003] For example, in a database system, when a deadlock occurs during execution of a transaction, the data updated between the most recent commit point and the moment the deadlock occurred must be restored to its original state, and execution must be resumed from that commit point.

[0004] Likewise, in a multiprocessor system in which a plurality of processors take processes from a single run queue and execute them, when a processor can no longer continue executing a process, for example because of a hardware failure, the data changed since the process was dispatched is restored, the process state is returned to what it was when execution began, and the process is placed back on the run queue so that another processor can execute it.

[0005] In such a process recovery method, there are the following two ways of restoring the data updated by a process to its state at the most recent checkpoint.

(1) At each checkpoint, the entire data image (data area) held by the process is copied. When recovery becomes necessary, the current data image is invalidated and the copied data image is made valid.

[0006] (2) A history of data updates is kept, recording which data the process changed and how, and the data image as of the checkpoint is restored from that history.

[0007] However, method (1) requires the data image to be saved at every checkpoint. This needs a memory area for the saved image and incurs a large copying overhead.

[0008] Method (2) requires a history of the process's data updates to be kept. In addition to the overhead of copying the history, recovery takes extra work because the data must be reconstructed from that history.
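The two conventional approaches described in paragraphs [0005] to [0008] can be illustrated with a minimal Python sketch. The class names `FullCopyCheckpoint` and `UndoLogCheckpoint` and the use of a dictionary as the "data image" are assumptions made for illustration only and do not appear in the original text.

```python
import copy


class FullCopyCheckpoint:
    """Method (1): copy the whole data image at every checkpoint."""

    def __init__(self, data):
        self.data = data
        self.snapshot = copy.deepcopy(data)

    def write(self, key, value):
        self.data[key] = value

    def checkpoint(self):
        # Cost grows with the size of the whole image, even if little changed.
        self.snapshot = copy.deepcopy(self.data)

    def recover(self):
        # Invalidate the current image and make the saved image valid.
        self.data = copy.deepcopy(self.snapshot)


class UndoLogCheckpoint:
    """Method (2): keep a history of updates and undo them on recovery."""

    def __init__(self, data):
        self.data = data
        self.log = []  # (key, previous value) for every single write

    def write(self, key, value):
        self.log.append((key, self.data[key]))
        self.data[key] = value

    def checkpoint(self):
        self.log.clear()

    def recover(self):
        # Replay the history backwards; work grows with the number of writes.
        while self.log:
            key, old = self.log.pop()
            self.data[key] = old


full = FullCopyCheckpoint({"a": 1, "b": 2})
full.write("a", 10)
full.recover()

undo = UndoLogCheckpoint({"a": 1, "b": 2})
undo.write("a", 10)
undo.write("a", 20)
undo.write("b", 5)
log_length_before_recovery = len(undo.log)
undo.recover()
```

In this sketch the first class pays its cost at checkpoint time (a full copy), while the second pays on every write and again during recovery, which corresponds to the overheads the text attributes to methods (1) and (2).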

[0009]

[Problem to Be Solved by the Invention] As described above, the conventional methods incur a large overhead for saving the data image, or for saving the process's data update history and restoring the data from that history. An object of the present invention is to eliminate these drawbacks of the prior art and to provide a process recovery method that reduces the overhead of process recovery.

[0010]

[Means for Solving the Problem] To achieve the above object, the present invention uses the write-protect function of an MMU (memory management unit) to detect writes during process execution, creates a copy of the part where a write occurred, and applies the update to that copy. In addition, by manipulating the page table of the MMU, the copy is reflected in the copy source when a checkpoint is passed during process execution, and when the data updated by the process is to be restored to its state at the most recent checkpoint, the process is recovered by invalidating the copy.

[0011]

[Operation] In this method, the entire file image is not copied at a checkpoint; only the updated parts are copied. Furthermore, recovery needs no history of the process's execution and only requires the copies to be invalidated, so the overhead of process recovery can be reduced.

[0012]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0013] FIG. 1 is a block diagram of this embodiment. The operating system (OS) 101 contains a write trap processing unit 102, a checkpoint processing unit 103, and a process recovery unit 104. The operating system 101 implements virtual memory by means of the MMU 105. The main memory (MM) 106 holds a page table 107 and a copy management table 108.

[0014] The operation of FIG. 1 is described using the flow of FIG. 2. FIG. 2(a) shows the flow of processing for a "write" event, FIG. 2(b) for a "checkpoint" event, and FIG. 2(c) for a "process recovery" event. Write protection is applied in advance through the MMU 105, so that when a "write" event occurs the MMU raises an exception (interrupt) (step 1). The exception is delivered to the write trap processing unit 102 as a write trap from the MMU 105. The write trap processing unit 102 examines the copy management table 108 to determine whether a copy already exists (step 2). If no copy exists, it creates a copy of the page where the write occurred (step 3) and updates the page table 107 (step 4). This update changes the entry in the page table 107 that pointed to the written page so that it points to the newly created copy. The change is then recorded and managed in the copy management table 108 (step 5). Finally, in step 6, the update is applied to the copy.
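Steps 2 to 6 of the write trap flow can be sketched in Python as follows. This is a toy model under stated assumptions, not the patented implementation: the page table, copy management table, and real pages are modelled as dictionaries, the function name `write_trap` is hypothetical, the copy is named by appending `'` to the original page name, and step 1 (the MMU raising the exception) is assumed to have already happened when the function is called.

```python
def write_trap(page_table, copy_table, frames, vpage, new_content):
    """Handle a trapped write to virtual page `vpage` (steps 2 to 6)."""
    # Step 2: consult the copy management table to see whether a copy exists.
    if vpage not in copy_table:
        original = page_table[vpage]
        # Step 3: create a copy of the page where the write occurred.
        # (Naming the copy by appending "'" is an illustrative assumption.)
        copy_frame = original + "'"
        frames[copy_frame] = frames[original]
        # Step 4: make the page table entry point to the new copy.
        page_table[vpage] = copy_frame
        # Step 5: record which real page the entry pointed to before the change.
        copy_table[vpage] = original
    # Step 6: apply the update to the copy; the original page is untouched.
    frames[page_table[vpage]] = new_content


# Hypothetical initial state: virtual page Va mapped to real page Ra holding "a".
page_table = {"Va": "Ra"}
copy_table = {}
frames = {"Ra": "a"}

write_trap(page_table, copy_table, frames, "Va", "a'")
```

After the call, `page_table` maps `Va` to `Ra'`, `copy_table` records `Ra` for `Va`, and the original page `Ra` still holds `a`. A second write to `Va` would find the entry in `copy_table` and update the existing copy without creating another one.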

[0015] Next, the "checkpoint" event is described. Checkpoint processing is performed by the checkpoint processing unit 103, which makes the copies valid by clearing the copy management table 108. For "process recovery", a recovery command from the processor is input to the process recovery unit 104, which writes the contents of the copy management table 108 back to the page table 107. This invalidates the copies, and the process is then re-executed from the most recent checkpoint.
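The checkpoint and recovery operations can be sketched in the same toy dictionary model. The function names `checkpoint` and `recover` are hypothetical. Discarding the copied page and emptying the copy management table after recovery are assumptions of this sketch; the text only states that the copies become invalid. Re-applying write protection after a checkpoint is not modelled.

```python
def checkpoint(copy_table):
    """Make the copies valid by clearing the copy management table."""
    copy_table.clear()


def recover(page_table, copy_table, frames):
    """Invalidate the copies by writing the saved mappings back to the page table."""
    for vpage, original in copy_table.items():
        # Assumption: the now-invalid copy is simply discarded.
        frames.pop(page_table[vpage], None)
        page_table[vpage] = original
    # Assumption: the table is emptied once its contents are written back.
    copy_table.clear()


# Path 1: a checkpoint is passed after Va was updated to a'.
pt, ct, fr = {"Va": "Ra'"}, {"Va": "Ra"}, {"Ra": "a", "Ra'": "a'"}
checkpoint(ct)
committed = fr[pt["Va"]]

# Path 2: recovery is requested from the same starting state.
pt2, ct2, fr2 = {"Va": "Ra'"}, {"Va": "Ra"}, {"Ra": "a", "Ra'": "a'"}
recover(pt2, ct2, fr2)
restored = fr2[pt2["Va"]]
```

In this model neither operation copies page contents: a checkpoint only empties a table, and recovery only rewrites page table entries, which is the source of the reduced overhead claimed in the text.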

[0016] FIG. 3 shows the correspondence between the virtual space and the real space (main memory), together with the contents of the page table and the copy management table. Consider the moment at which a write has occurred to page number Va, which is mapped to page number Ra in the real space 302 (a → a' in the virtual space 301), and a copy Ra' of page Ra in the real space 302 has been created and updated. In this case, as shown in FIG. 3(b), the page table entry for page number Va has been updated to point to Ra', and the copy management table records Ra for page number Va.

[0017] The above is now explained in more detail for the case in which a process P updates the contents of a file F across several pages. FIG. 4 shows the flow of process P. Suppose that, after passing checkpoint 1 and before reaching checkpoint 2, page a is updated to a', page b to b', and page c to c'.

[0018] FIG. 5 shows the virtual-space and real-space images of file F, on which process P operates. The virtual pages numbered Pa, Pb, and Pc contain a, b, and c respectively and are mapped to real pages Ra, Rb, and Rc. FIG. 6 shows part of the page table for file F. After checkpoint 1, process P changes the content a of page Pa to a'. It likewise changes the content b of page Pb to b' and the content c of page Pc to c', and then reaches checkpoint 2. The file image is write-protected after commit point 1. When a write occurs after that, the trap routine that handles it creates a copy of the written page and links the copy into the page table, and the update is applied to it. The trap routine also registers in the copy management table which real page the updated page was mapped to before the update. Thus, immediately before checkpoint 2, the file image is as shown in FIG. 7, and the page table and copy management table are as shown in FIG. 8 and FIG. 9 respectively. When checkpoint 2 is then passed, clearing the copy management table makes the updates valid. FIG. 10 shows the file image at time t in FIG. 4, FIG. 11 the page table at time t, and FIG. 12 the copy management table at time t. If a deadlock or similar failure occurs at this time, writing the contents of the copy management table back to the page table returns the page table to the state of FIG. 6, and the copied pages Ra' and Rb' become invalid. Recovery is then achieved by resuming execution of process P from checkpoint 1.
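The scenario of paragraph [0018] can be replayed end to end with a compact Python model. The class `CheckpointedFile` and its method names are hypothetical, the tables are dictionaries, copies are named by appending `'`, and the assumption that only Pa and Pb have been written at time t is inferred from the statement that pages Ra' and Rb' are the ones invalidated.

```python
class CheckpointedFile:
    """Toy model of file F with a page table and a copy management table."""

    def __init__(self, page_table, frames):
        self.page_table = dict(page_table)  # virtual page -> real page
        self.frames = dict(frames)          # real page -> content
        self.copy_table = {}                # virtual page -> pre-update real page

    def write(self, vpage, content):
        if vpage not in self.copy_table:    # first write since the last checkpoint
            old = self.page_table[vpage]
            new = old + "'"                 # illustrative naming of the copy
            self.frames[new] = self.frames[old]
            self.page_table[vpage] = new
            self.copy_table[vpage] = old
        self.frames[self.page_table[vpage]] = content

    def checkpoint(self):
        self.copy_table.clear()             # the copies become the valid pages

    def recover(self):
        for vpage, old in self.copy_table.items():
            del self.frames[self.page_table[vpage]]  # assumption: drop the copy
            self.page_table[vpage] = old
        self.copy_table.clear()

    def image(self):
        return {v: self.frames[r] for v, r in self.page_table.items()}


# State at checkpoint 1 (FIG. 5 and FIG. 6).
f = CheckpointedFile({"Pa": "Ra", "Pb": "Rb", "Pc": "Rc"},
                     {"Ra": "a", "Rb": "b", "Rc": "c"})

# Time t: Pa and Pb have been updated, then a deadlock occurs.
f.write("Pa", "a'")
f.write("Pb", "b'")
page_table_at_t = dict(f.page_table)
copy_table_at_t = dict(f.copy_table)
f.recover()
image_after_recovery = f.image()

# Re-execution from checkpoint 1, this time reaching checkpoint 2.
for vpage, content in [("Pa", "a'"), ("Pb", "b'"), ("Pc", "c'")]:
    f.write(vpage, content)
f.checkpoint()
image_after_checkpoint_2 = f.image()
```

In this model, recovery returns the page table to its checkpoint 1 mapping without touching the contents of Ra, Rb, or Rc, and passing checkpoint 2 commits all three updates by emptying the copy management table.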

[0019]

[Effects of the Invention] As described above, according to the present invention, the entire file image (entire area) is not copied at a checkpoint; only the updated parts (pages) are copied. Furthermore, recovery needs no history of the process's execution and only requires the copies to be invalidated. The overhead of process recovery can therefore be reduced.

[Brief description of drawings]

FIG. 1 is a configuration diagram showing an embodiment of the present invention.

FIG. 2 is a flowchart showing the flow of processing.

FIG. 3 is a diagram showing the state when a write has occurred.

FIG. 4 is a diagram showing the flow of execution of process P.

FIG. 5 is a diagram showing the file image of file F.

FIG. 6 is a diagram showing part of the page table of file F.

FIG. 7 is a diagram showing the file image immediately before checkpoint 2.

FIG. 8 is a diagram showing the page table immediately before checkpoint 2.

FIG. 9 is a diagram showing the copy management table immediately before checkpoint 2.

FIG. 10 is a diagram showing the file image at time t.

FIG. 11 is a diagram showing the page table at time t.

FIG. 12 is a diagram showing the copy management table at time t.

[Explanation of symbols]

101 ... Operating system
102 ... Write trap processing routine
103 ... Checkpoint processing unit
104 ... Process recovery unit
105 ... MMU
106 ... Main memory
107 ... Page table
108 ... Copy management table

Claims

[Claims]

1. A process recovery method in which a given process is executed with checkpoints set as appropriate and, when the process can no longer continue during execution, data is restored from the point at which execution could not continue back to the nearest checkpoint and the process is re-executed from that checkpoint, the method being characterized in that: a memory management unit is used to detect writes to memory during process execution; a copy of the page where a write occurred is created and the update is applied to it; the page table used by the memory management unit is manipulated to make the copy valid or invalid with respect to the copy source; the copy is reflected in the copy source at a checkpoint during process execution; and the copy is invalidated when the data updated by the process is to be restored to its state at the checkpoint.