JP4095139B2

JP4095139B2 - Computer system and file management method

Info

Publication number: JP4095139B2
Application number: JP23293097A
Authority: JP
Inventors: Hideaki Hirayama (平山秀昭); Toshio Shirakihara (白木原敏雄)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1996-09-03
Filing date: 1997-08-28
Publication date: 2008-06-04
Anticipated expiration: 2017-08-28
Also published as: JPH10133927A

Description

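[0001]
[Technical Field of the Invention]
The present invention relates to a computer system and a file management method suitable for group computing, database processing, transaction processing, and other processing that requires high reliability, for example in a network computing environment made up of a plurality of networked computers.
[0002]
[Prior Art]
Some systems recover from failures by periodically capturing the state of a process executed by the CPU, such as its address space, its context, and its files (this capture is called a checkpoint), and, when a failure occurs, restoring the state of the most recent checkpoint and restarting the process from that point. Such systems have long had a problem with external input/output: when a process is re-executed from the most recent checkpoint after a failure, the process address space and the processor context can be restored, but the state of external devices cannot easily be restored.
[0003]
For example, because a write to a file is difficult to cancel, these systems handle each file write by first reading and saving the data about to be overwritten, and only then writing the new data to the file.
[0004]
FIG. 15 illustrates this conventional mechanism, in which the data about to be overwritten is read and saved before the new data is written to the file.
[0005]
In this example, the file holds the four bytes "ABCD" when the checkpoint is taken at time t1, and at time t2 "X" is written to byte 1 of the file (counting from 0, so the byte holding "B") (1). Conventionally, before "X" is written, the current contents of byte 1, "B", are read and saved (this saved copy is also called an undo log) (2), and only then is "X" written to byte 1 of the file (3).
[0006]
A failure then occurs at time t3, so the process is rolled back to the state of the most recent checkpoint (t1). Byte 1 of the file was changed to "X" after checkpoint t1, but the undo log captured at the time of that update is used to restore the file to its state at checkpoint t1. The undo log becomes unnecessary at the next checkpoint and is discarded.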
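The conventional undo-log scheme of FIG. 15 can be sketched in Python as follows. This is a minimal in-memory illustration, not the code of any actual system: the class `UndoLogFile` and its methods are hypothetical, and writes are assumed to overwrite bytes within the current file size. The point to notice is that `write` must read the old bytes before it can write the new ones.

```python
class UndoLogFile:
    """Conventional scheme of FIG. 15 (hypothetical names): save old bytes before each write."""

    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        # (offset, old bytes) pairs captured since the last checkpoint.
        self.undo_log: list[tuple[int, bytes]] = []

    def write(self, offset: int, new: bytes) -> None:
        # The extra read on the write path: capture the pre-update bytes first.
        old = bytes(self.data[offset:offset + len(new)])
        self.undo_log.append((offset, old))
        self.data[offset:offset + len(new)] = new

    def checkpoint(self) -> None:
        # The undo log is no longer needed once a new checkpoint exists.
        self.undo_log.clear()

    def rollback(self) -> None:
        # Undo newest-first so overlapping writes end at the oldest contents.
        for offset, old in reversed(self.undo_log):
            self.data[offset:offset + len(old)] = old
        self.undo_log.clear()


f = UndoLogFile(b"ABCD")
f.checkpoint()          # t1: checkpoint
f.write(1, b"X")        # t2: reads "B" into the undo log, then writes "X"
print(bytes(f.data))    # b'AXCD'
f.rollback()            # t3: failure, roll back to t1
print(bytes(f.data))    # b'ABCD'
```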
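[0007]
There are also systems built from two computers, one assigned as the active system (the primary computer) and the other as the standby system (the backup computer), in which the backup computer takes over processing when the primary computer fails, thereby raising system availability. Periodically taking checkpoints in such a system, as described above, can raise reliability further.
[0008]
[Problem to Be Solved by the Invention]
A system that periodically takes checkpoints of the process address space, context, files, and so on, and that recovers from a failure by restoring the most recent checkpoint and restarting the process from that point (whether or not the system is duplexed), gains reliability. However, each file update (for example, a write) must first read the pre-update data from the file before updating it, which degrades file update performance.
[0009]
The present invention was made in view of these circumstances. Its object is to provide a computer system and a file management method that, in a system which periodically takes checkpoints and recovers from a failure by restoring the most recent checkpoint and restarting the process from that point, eliminate the need to read pre-update data from a file when the file is updated, and thereby greatly improve file update performance.
[0010]
[Means for Solving the Problem]
The computer system of the present invention is duplexed by two computers, an active system and a standby system. It periodically takes checkpoints, each of which saves checkpoint information, including the address space and the processor context, used to restart an interrupted process, and it saves that checkpoint information on both the active and the standby computers. In this computer system, a file updated by a process running on the active computer is duplexed on both computers, and the system comprises: first file management means which, when the process requests a file update, saves the update information on the standby computer, updates only the active file, and notifies the requester of completion as soon as that update finishes, thereby keeping the standby file in its state as of the most recent checkpoint, which is the state required when the process is restored to the most recent checkpoint from the checkpoint information including the address space and processor context; and second file management means which, each time a checkpoint is taken, applies the updates described by the update information to the standby file, thereby bringing the standby file up to its state as of that checkpoint.
[0011]
In the computer system of the present invention, when a process requests a file update, update information describing the update is obtained and saved, only the file on the active computer (the active file) is updated immediately, and the result is returned to the requesting process. After a checkpoint is taken, the updates described by the saved update information are applied to the file on the standby computer (the standby file).
[0012]
When, for example, the process aborts, the saved update information is used to read from the standby file all of the pre-update data corresponding to data updated since the most recent checkpoint, and this pre-update data is used to restore the active file to its state at the checkpoint.
[0013]
In other words, this computer system recovers files after a failure without making normal processing wait for the conventional step of reading and saving pre-update data on every file update. File update performance can therefore be improved dramatically without compromising reliability.
[0014]
Instead of restoring the active file, the process can also be re-executed from the checkpoint using the standby file, which reflects all updates described by the update information saved before the most recent checkpoint. This ensures that processing can continue even when restarting with the active file is impossible, for example because the active computer has failed, and thus improves system availability. In that case, allocating a new standby file on a third computer improves availability further.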
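[0015]
[Embodiments of the Invention]
First, the basic principle of the present invention will be described with reference to FIG. 1. As shown in FIG. 1, the computer system of the present invention presupposes a system multiplexed into an active system 10 and a standby system 20. The operation of each is described below.
[0016]
(Normal processing)
(1) In the active system 10, the application program 11 issues a Write system call.
[0017]
(2) The jacket routine 12 hooks the Write system call, issues the Write system call to the active operating system, and sends the Write request to the standby system 20. The Write request need not be sent to the standby system 20 immediately; it only has to arrive before the next checkpoint. The standby system 20 does not execute a received Write request immediately but first stores it in the unconfirmed queue 211.
[0018]
(3) When checkpoint processing is requested, the active system 10 must finish sending all accumulated Write requests to the standby system 20.
[0019]
(4) The standby system 20, for its part, moves the Write requests stored in the unconfirmed queue 211 to the confirmed queue 212.
[0020]
(5) The Write requests moved to the confirmed queue 212 are processed in order by the operating system of the standby system 20.
[0021]
In other words, a file update during normal processing never waits for pre-update data to be read and saved.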
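Normal processing, steps (1) to (5), can be sketched in Python as below. This is a minimal sketch under stated assumptions: each file is an in-memory byte array, a direct method call stands in for the network, and the classes `ActiveSystem` and `StandbySystem` and their methods are hypothetical names, not taken from the patent. Note that `ActiveSystem.write` performs no read before updating the active file.

```python
from collections import deque


class StandbySystem:
    """Standby side of FIG. 1 (hypothetical): parks Write requests until a checkpoint."""

    def __init__(self, data: bytes) -> None:
        self.file = bytearray(data)                              # standby file
        self.unconfirmed: deque[tuple[int, bytes]] = deque()     # unconfirmed queue 211
        self.confirmed: deque[tuple[int, bytes]] = deque()       # confirmed queue 212

    def receive(self, offset: int, payload: bytes) -> None:
        # Step (2): a received request is queued, not executed.
        self.unconfirmed.append((offset, payload))

    def on_checkpoint(self) -> None:
        # Step (4): everything received before the checkpoint becomes confirmed.
        self.confirmed.extend(self.unconfirmed)
        self.unconfirmed.clear()

    def apply_confirmed(self) -> None:
        # Step (5): apply confirmed requests to the standby file in order.
        while self.confirmed:
            offset, payload = self.confirmed.popleft()
            self.file[offset:offset + len(payload)] = payload


class ActiveSystem:
    """Active side (hypothetical): the hooked Write updates the local file with no pre-read."""

    def __init__(self, data: bytes, standby: StandbySystem) -> None:
        self.file = bytearray(data)                              # active file
        self.standby = standby
        self.outbox: list[tuple[int, bytes]] = []                # requests not yet sent

    def write(self, offset: int, payload: bytes) -> None:
        # Steps (1)-(2): update the active file at once; forwarding may be deferred.
        self.file[offset:offset + len(payload)] = payload
        self.outbox.append((offset, payload))

    def checkpoint(self) -> None:
        # Step (3): all buffered requests must reach the standby before the checkpoint.
        for offset, payload in self.outbox:
            self.standby.receive(offset, payload)
        self.outbox.clear()
        self.standby.on_checkpoint()


standby = StandbySystem(b"ABCD")
active = ActiveSystem(b"ABCD", standby)
active.write(1, b"X")
print(bytes(active.file), bytes(standby.file))  # b'AXCD' b'ABCD'
active.checkpoint()
standby.apply_confirmed()
print(bytes(standby.file))                      # b'AXCD'
```

Forwarding is deferred until `checkpoint` here, which the text permits; the requests could equally be sent earlier.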
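[0022]
(Rollback processing)
(3)' When a failure occurs, rollback processing is requested on both the active system 10 and the standby system 20.
[0023]
(4)' At this point, all Write requests still held by the active system 10 are sent to the standby system 20. Because every Write request in the standby unconfirmed queue 211 was issued after the most recent checkpoint, these requests are used in the opposite direction: the pre-update data at the locations they name is read from the standby file 23, and the active file 14 is rolled back with that data. As a result, both the active file 14 and the standby file 23 are in their state as of the most recent checkpoint.
[0024]
(5)' The standby system 20 then cancels all Write requests remaining in the unconfirmed queue 211.
[0025]
This makes it possible to restart from the checkpoint.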
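Rollback steps (3)' to (5)' can be sketched as a single Python function. It assumes that the standby's confirmed queue has already been applied, so the standby file holds exactly the checkpoint-time contents, and that writes overwrite in place without extending the file; plain lists stand in for the queues, and the function name `rollback` is hypothetical.

```python
def rollback(active_file: bytearray, standby_file: bytearray,
             outbox: list[tuple[int, bytes]],
             unconfirmed: list[tuple[int, bytes]]) -> None:
    """Return the active file to the last checkpoint (hypothetical sketch of (3)'-(5)')."""
    # (4)': requests still held by the active side are sent to the standby first.
    unconfirmed.extend(outbox)
    outbox.clear()
    # Each post-checkpoint request names a range whose checkpoint-time bytes
    # are still in the standby file; copy them back over the active file.
    for offset, payload in unconfirmed:
        end = offset + len(payload)
        active_file[offset:end] = standby_file[offset:end]
    # (5)': cancel the post-checkpoint requests.
    unconfirmed.clear()


active = bytearray(b"AXCY")   # two writes made after the checkpoint
standby = bytearray(b"ABCD")  # standby file, still at the checkpoint state
outbox = [(3, b"Y")]          # not yet sent when the failure hit
unconfirmed = [(1, b"X")]     # already queued on the standby
rollback(active, standby, outbox, unconfirmed)
print(bytes(active))          # b'ABCD'
```

Because every range is copied from a source that does not change during the loop, the order of the entries does not matter, unlike the undo log of FIG. 15, which must be replayed newest-first.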
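[0026]
Next, embodiments of the present invention will be described.
[0027]
(First Embodiment)
First, a first embodiment of the present invention will be described. FIG. 2 shows the system configuration of the computer system according to the first embodiment. As shown in FIG. 2, the computer in this embodiment is duplexed into a primary computer 30 and a backup computer 40, which are connected by a network 50. The primary computer 30 and the backup computer 40 each include both the active system 10 and the standby system 20 described above; when the active system 10 runs on one of them, the standby system 20 runs on the other. The following description assumes that the active system 10 runs on the primary computer 30 and the standby system 20 runs on the backup computer 40.
[0028]
The process 35 runs on the primary computer 30 and updates a file duplexed as a primary file 39 and a backup file 41. The primary file 39 resides on the primary computer 30 and the backup file 41 on the backup computer 40; they are updated through the file system 36 on the primary computer 30 and the file system 48 on the backup computer 40.
[0029]
The file system 36 on the primary computer 30 includes a primary file operation unit 38 and a primary file restoration unit 37. The file system 48 on the backup computer 40 includes a backup file operation unit 43, an unconfirmed queue 431, a confirmed queue 432, a backup file update unit 44, and a primary file restoration information reading unit 42.
[0030]
The process 35 updates the duplexed file through the primary file operation unit 38 and the backup file operation unit 43. When the process 35 writes to the duplexed file, the primary file 39 is updated immediately, but the backup file 41 is not updated at that time; instead, "file write information" is stored, via the backup file operation unit 43, in the unconfirmed queue 431 on the backup computer 40.
[0031]
When the process 35 takes a checkpoint, the checkpoint control unit 31 instructs the checkpoint information saving unit 32 and the primary file operation unit 38. On receiving this instruction, the checkpoint information saving unit 32 saves the checkpoint information (address space and processor context) on both the primary computer 30 and the backup computer 40 (checkpoint information 34 on the primary computer 30 and checkpoint information 45 on the backup computer 40).
[0032]
On receiving the checkpoint instruction, the primary file operation unit 38 moves the "file write information" stored in the unconfirmed queue 431 to the confirmed queue 432 via the backup file operation unit 43. After the checkpoint has been taken, the backup file update unit 44 uses the "file write information" in the confirmed queue 432 to update the backup file 41, and discards it once the backup file 41 has been updated. As a result, the same write operations that were applied to the primary file 39 since the previous checkpoint are also applied to the backup file 41.
[0033]
When the process 35 fails, for example by aborting, and is re-executed on the primary computer 30 from the most recent checkpoint, the address space and processor context are restored by the checkpoint information restoration unit 33 on the primary computer 30.
[0034]
As for the file, the backup file 41 needs no restoration: updates made since the checkpoint exist only as "file write information" in the unconfirmed queue 431 and have not actually been applied. The primary file 39, however, has already been updated since the checkpoint and must be restored. Therefore, based on the "file write information" in the unconfirmed queue 431, the pre-update data of the primary file 39 is read from the backup file 41 and written back to the primary file 39. The "file write information" in the unconfirmed queue 431 is then discarded. If the confirmed queue 432 still holds "file write information", the restoration described above starts only after that information has been fully applied to the backup file 41.
[0035]
On the other hand, when the primary computer 30 or the operating system controlling it fails, for example through a system crash, and the process 35 is re-executed on the backup computer 40 from the most recent checkpoint, the address space and processor context are restored into the process 47 by the checkpoint information restoration unit 46.
[0036]
As for the file, the backup file 41 needs no restoration, because updates made since the checkpoint exist only as "file write information" in the unconfirmed queue 431 and have not actually been applied.
[0037]
The transfer of "file write information" from the primary computer 30 to the backup computer 40 can be optimized. If the primary computer 30 stays up when a failure occurs, the primary file 39 is restored and processing resumes from the checkpoint using the primary file 39. If the primary computer 30 goes down, processing resumes from the checkpoint using the backup file 41. In the first case, information not yet sent is still available on the primary computer 30; in the second case, it is not needed, because the backup file 41 was never updated with it.
[0038]
Therefore, "file write information" need not be sent immediately from the primary file operation unit 38 to the backup file operation unit 43. Since it only has to be sent before the next checkpoint, for transfer efficiency it can be accumulated in the primary file operation unit 38 and sent to the backup file operation unit 43 in a batch whenever a trigger event occurs, such as "a fixed amount has accumulated", "a fixed time has elapsed", or "a checkpoint has been requested".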
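The batching optimization of [0038] might look like the following Python sketch. The three triggers come from the text; the class name `WriteInfoBatcher`, the `send` callback, and the threshold values are assumptions for illustration. The `clock` parameter is injectable so that the time trigger can be tested.

```python
import time
from typing import Callable


class WriteInfoBatcher:
    """Hypothetical buffer for "file write information" on the primary side.

    Thresholds are illustrative; the text names the triggers but gives no values.
    """

    def __init__(self, send: Callable[[list], None], max_bytes: int = 64 * 1024,
                 max_age_s: float = 0.05,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send            # delivers one batch to the backup file operation unit
        self.max_bytes = max_bytes  # trigger: a fixed amount has accumulated
        self.max_age_s = max_age_s  # trigger: a fixed time has elapsed
        self.clock = clock
        self.batch: list[tuple[int, bytes]] = []
        self.batch_bytes = 0
        self.oldest = 0.0

    def add(self, offset: int, payload: bytes) -> None:
        if not self.batch:
            self.oldest = self.clock()
        self.batch.append((offset, payload))
        self.batch_bytes += len(payload)
        # Age is only checked when a write arrives; a real system would also use a timer.
        if (self.batch_bytes >= self.max_bytes
                or self.clock() - self.oldest >= self.max_age_s):
            self.flush()

    def on_checkpoint(self) -> None:
        # Trigger: a checkpoint was requested. Nothing may stay unsent past this point.
        self.flush()

    def flush(self) -> None:
        if self.batch:
            self.send(self.batch)
            self.batch = []
            self.batch_bytes = 0


sent = []
b = WriteInfoBatcher(sent.append, max_bytes=4, max_age_s=60.0)
b.add(0, b"ab")
b.add(2, b"cd")     # 4 bytes accumulated: flushed
b.add(4, b"e")
b.on_checkpoint()   # checkpoint requested: remainder flushed
print(sent)         # [[(0, b'ab'), (2, b'cd')], [(4, b'e')]]
```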
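[0039]
FIG. 3 shows the schematic configuration of a computer system to which this embodiment is applied. The computer is duplexed into the primary computer 30 and the backup computer 40; a disk device 60a is connected to the primary computer 30 and a disk device 60b to the backup computer 40. The process 35 runs on the primary computer, and the file it accesses is duplexed as the primary file 39 and the backup file 41, located on the disk devices 60a and 60b, respectively.
[0040]
Checkpoint information is held on both the primary computer 30 (primary checkpoint information 34) and the backup computer 40 (backup checkpoint information 45). Although the figure shows the checkpoints held on disk devices, they may also be held in memory.
[0041]
If the primary computer 30 or the operating system controlling it fails, for example through a system crash, the process 47 is re-executed on the backup computer 40 using the checkpoint information 45. In this case, the process 47 uses the backup file 41.
[0042]
A file system with triple or higher redundancy can also be built by providing multiple primary files 39 or backup files 41. For a triplexed file system, for example, the following combinations are possible:
(1) two primary files and one backup file
(2) one primary file and two backup files
[0043]
FIG. 4 shows how a file is updated in this embodiment. In this example, the process 35 running on the primary computer 30 writes "X" at time t1 to byte 1 of a duplexed file holding the four bytes "ABCD" (the primary file 39 on the primary computer 30 and the backup file 41 on the backup computer 40) (1). The primary file 39 is updated immediately, but the backup file 41 is not; only the "file write information" is saved.
[0044]
Then, at time t2, a checkpoint is taken, which confirms the earlier "file write information" (2). After time t2, the backup file 41 is updated according to the confirmed "file write information".
[0045]
FIG. 5 shows how the primary file is restored when a failure occurs in this embodiment. In this example, the process 35 running on the primary computer 30 writes "X" at time t1 to byte 1 of a duplexed file holding the four bytes "ABCD" (the primary file 39 on the primary computer 30 and the backup file 41 on the backup computer 40) (1). The primary file 39 is updated immediately, but the backup file 41 is not; only the "file write information" is saved.
[0046]
Then, at time t2, a failure occurs (2). The primary file 39 has been updated by the "file write information" from time t1 and must be restored, whereas the backup file 41 has not yet been updated and needs no restoration. The "file write information" saved at time t1 identifies which part of the primary file 39 was updated. To restore the primary file 39, therefore, the data at the positions given by the unconfirmed "file write information" is read from the backup file 41 and written to the primary file 39.
[0047]
The process 35 is then re-executed on the primary computer 30 from the checkpoint taken on the primary computer 30. The re-executed process 35 uses the restored primary file 39.
[0048]
FIG. 6 is a flowchart of the processing performed when the file operation unit is instructed to perform a "file write". First, the "file write information" is saved and linked into the unconfirmed queue 431 (step A1). Next, the primary file 39 is updated according to the "file write information" (step A2). At this point, the "file write" operation is considered complete, and completion is reported to the requester (step A3).
[0049]
FIG. 7 is a flowchart of the processing performed when the file operation unit is instructed to "take a checkpoint". The saved "file write information" is moved from the unconfirmed queue 431 to the confirmed queue 432 (step B1).
[0050]
FIG. 8 is a flowchart of the processing of the backup file update unit. First, it checks whether any "file write information" is linked into the confirmed queue 432 (step C1). If none is (No in step C1), the backup file update unit 44 keeps checking. If there is (Yes in step C1), the backup file 41 is updated according to the "file write information" linked into the confirmed queue 432 (step C2), and the applied "file write information" is then removed from the confirmed queue 432 (step C3).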
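The flowcharts of FIGS. 6 to 8 can be sketched together as one hypothetical Python class, `DuplexedFile`. It keeps both copies in memory and runs the backup file update unit as an explicit polling step rather than as a separate task; those choices, and all names, are assumptions for illustration. The usage at the end replays the FIG. 4 scenario.

```python
from collections import deque


class DuplexedFile:
    """Hypothetical sketch of FIGS. 6-8 for one duplexed file.

    primary and backup stand in for primary file 39 and backup file 41;
    the two deques stand in for queues 431 and 432.
    """

    def __init__(self, data: bytes) -> None:
        self.primary = bytearray(data)
        self.backup = bytearray(data)
        self.unconfirmed: deque = deque()  # unconfirmed queue 431
        self.confirmed: deque = deque()    # confirmed queue 432

    def file_write(self, offset: int, payload: bytes) -> str:
        # A1: save the file write information and link it into the unconfirmed queue.
        self.unconfirmed.append((offset, payload))
        # A2: update the primary file according to that information.
        self.primary[offset:offset + len(payload)] = payload
        # A3: the write counts as complete; report to the requester.
        return "done"

    def take_checkpoint(self) -> None:
        # B1: move every saved entry from the unconfirmed to the confirmed queue.
        self.confirmed.extend(self.unconfirmed)
        self.unconfirmed.clear()

    def backup_update_step(self) -> bool:
        # C1: is anything linked into the confirmed queue?
        if not self.confirmed:
            return False  # "No": the caller simply checks again later
        # C2: update the backup file from the oldest confirmed entry.
        offset, payload = self.confirmed[0]
        self.backup[offset:offset + len(payload)] = payload
        # C3: unlink the entry that has just been applied.
        self.confirmed.popleft()
        return True


# FIG. 4: write "X" at t1, checkpoint at t2, then update the backup file.
f = DuplexedFile(b"ABCD")
f.file_write(1, b"X")                         # t1
print(bytes(f.primary), bytes(f.backup))      # b'AXCD' b'ABCD'
f.take_checkpoint()                           # t2
while f.backup_update_step():
    pass
print(bytes(f.backup))                        # b'AXCD'
```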
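[0051]
FIG. 9 is a flowchart of the processing performed when the process 35 fails, for example by aborting, and is re-executed on the primary computer 30 from the most recent checkpoint.
[0052]
When the process 35 fails, the checkpoint information restoration unit 33 on the primary computer 30 is first instructed to "restore the address space and processor context" (step D1). Next, the primary file restoration unit 37 is instructed to "restore the primary file" (step D2).
[0053]
FIG. 10 is a flowchart of the processing performed when the checkpoint information restoration unit on the primary computer 30 is instructed to "restore the address space and processor context". First, the address space of the process 35 is restored (step E1). Next, the processor context of the process 35 as of the time the checkpoint was taken is restored (step E2).
[0054]
FIG. 11 is a flowchart of the processing performed when the primary file restoration unit 37 is instructed to "restore the primary file". First, it checks whether any "file write information" is linked into the unconfirmed queue 431 (step F1). If there is (Yes in step F1), then, following the "file write information" linked into the unconfirmed queue 431, the data of the updated part of the primary file 39 is read from the backup file 41 and written to the primary file 39, restoring that part of the primary file 39 (step F2). The "file write information" used for the restoration is then removed from the unconfirmed queue 431 and discarded (step F3). This is repeated until no "file write information" remains linked into the unconfirmed queue 431.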
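The FIG. 11 restoration loop (steps F1 to F3) can be sketched as a hypothetical Python function, `restore_primary`. It also applies the condition from [0034] that entries in the confirmed queue reach the backup file before restoration begins. Files are in-memory byte arrays, the queues are `collections.deque` objects, and writes are assumed to overwrite in place; all of these are assumptions for illustration. The usage replays the FIG. 5 scenario.

```python
from collections import deque


def restore_primary(primary: bytearray, backup: bytearray,
                    unconfirmed: deque, confirmed: deque) -> None:
    """Hypothetical sketch of FIG. 11 (F1-F3) with the precondition from [0034]."""
    # Precondition: confirmed entries belong to the checkpoint state, so they
    # must reach the backup file before it is used as the restore source.
    while confirmed:
        offset, payload = confirmed.popleft()
        backup[offset:offset + len(payload)] = payload
    # F1: loop while file write information is linked into the unconfirmed queue.
    while unconfirmed:
        offset, payload = unconfirmed[0]
        end = offset + len(payload)
        # F2: read the updated range from the backup file, write it to the primary.
        primary[offset:end] = backup[offset:end]
        # F3: unlink (discard) the entry used for the restoration.
        unconfirmed.popleft()


# FIG. 5: "X" written at t1, failure at t2 before the next checkpoint.
primary = bytearray(b"AXCD")
backup = bytearray(b"ABCD")
unconfirmed = deque([(1, b"X")])
restore_primary(primary, backup, unconfirmed, deque())
print(bytes(primary))  # b'ABCD'
```

The precondition matters when a confirmed entry and an unconfirmed entry touch the same bytes: the primary must go back to the value written before the checkpoint, and only the updated backup file holds that value.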
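[0055]
If the primary computer 30 or the operating system controlling it fails, for example through a system crash, the process 35 is re-executed on the backup computer 40 from the most recent checkpoint. In this case, processing is taken over with the backup file 41. FIG. 12 shows how processing is taken over with the backup file 41 when a failure occurs.
[0056]
In this example, the process 35 running on the primary computer 30 writes "X" at time t1 to byte 1 of a duplexed file holding the four bytes "ABCD" (the primary file 39 on the primary computer 30 and the backup file 41 on the backup computer 40) (1). The primary file 39 is updated immediately, but the backup file 41 is not; only the "file write information" is saved.
[0057]
Then, at time t2, the primary computer 30 fails (2). In this case, the process 47 is re-executed on the backup computer 40 using the checkpoint taken on the backup computer 40. The process 47 continues processing with the backup file 41. Although the primary file 39 was updated at time t1, the backup file 41 has not yet been updated, so the backup file 41 can be used as-is when the process 47 is re-executed on the backup computer 40.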
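What the backup side might do before the process 47 resumes in FIG. 12 can be sketched as below. The text states only that the backup file can be used as-is; applying any entries still in the confirmed queue first is an assumption carried over from the rule in [0034], and the function name `take_over_on_backup` is hypothetical.

```python
def take_over_on_backup(backup: bytearray,
                        confirmed: list[tuple[int, bytes]],
                        unconfirmed: list[tuple[int, bytes]]) -> bytearray:
    """Hypothetical: prepare backup file 41 for process 47 after the primary fails."""
    # Entries confirmed by the last checkpoint are part of the checkpoint state.
    for offset, payload in confirmed:
        backup[offset:offset + len(payload)] = payload
    confirmed.clear()
    # Entries after the last checkpoint were never applied to the backup file,
    # so they are dropped; re-execution from the checkpoint is expected to reissue them.
    unconfirmed.clear()
    return backup


# FIG. 12: "X" written at t1, the primary computer fails at t2.
backup = take_over_on_backup(bytearray(b"ABCD"), [], [(1, b"X")])
print(bytes(backup))  # b'ABCD'
```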
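[0058]
If the backup file is detached because of a failure, creating a new backup file afterwards restores the initial configuration of FIG. 1, so that recovery is again possible if another failure occurs.
[0059]
Likewise, if processing is taken over with the backup file and re-executed from the checkpoint after a failure, the backup file can then serve as the primary file and a new backup file can be created, again restoring the initial configuration of FIG. 1 so that recovery from a further failure is possible. There are two ways to create the new backup file:
[0060]
(1) Save the update information and data of the primary file from after the backup file was detached; when the backup file is reconnected, apply that saved update information and data to it.
[0061]
(2) Copy the primary file to the backup file. If the primary file continues to be updated during the copy, apply its update information and data to the backup file as well, starting from the moment the copy begins.
[0062]
Combining these two methods, as follows, is also effective:
[0063]
(3) On the assumption that the detached backup file (or the primary file from before the failure) will be reconnected, save the update information and data of the primary file from after the detachment, so that method (1) remains available until a fixed time has elapsed. Once that time has elapsed, abandon method (1), stop saving the post-detachment update information and data, and use method (2). Likewise, if a file other than the detached backup file is to be connected, stop saving the post-detachment update information and data and use method (2).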
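Combined method (3) can be sketched as a hypothetical Python class, `BackupResync`. Assumptions not in the text: the detached file is taken to have matched the primary at the moment of detachment, updates are in-place overwrites, the window length is a parameter, and method (2) is a plain full copy that does not model updates arriving during the copy. The `clock` parameter is injectable for testing.

```python
import time
from typing import Callable, Optional


class BackupResync:
    """Hypothetical sketch of method (3): a delta log for a while, then a full copy."""

    def __init__(self, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.deadline = clock() + window_s  # detachment happens at construction time
        # Post-detachment updates; None once method (1) has been abandoned.
        self.delta: Optional[list[tuple[int, bytes]]] = []

    def _expire(self) -> None:
        # After the window, stop saving updates: method (1) is abandoned.
        if self.delta is not None and self.clock() >= self.deadline:
            self.delta = None

    def record(self, offset: int, payload: bytes) -> None:
        """Call for every update made to the primary file after detachment."""
        self._expire()
        if self.delta is not None:
            self.delta.append((offset, payload))

    def resync(self, primary: bytearray, candidate: bytearray,
               is_detached_file: bool) -> str:
        self._expire()
        if is_detached_file and self.delta is not None:
            # Method (1): replay only the updates made since detachment.
            for offset, payload in self.delta:
                candidate[offset:offset + len(payload)] = payload
            return "delta"
        # Method (2): full copy of the primary file.
        candidate[:] = primary
        return "copy"


now = [0.0]
sync = BackupResync(window_s=10.0, clock=lambda: now[0])
primary = bytearray(b"ABCD")
old_backup = bytearray(b"ABCD")  # detached copy, assumed equal to the primary when detached
primary[1:2] = b"X"
sync.record(1, b"X")
print(sync.resync(primary, old_backup, is_detached_file=True))  # delta
print(bytes(old_backup))                                        # b'AXCD'
```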
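[0064]
(Second Embodiment)
Next, a second embodiment of the present invention will be described. The first embodiment described a duplexed computer system, but the present invention is also effective when applied to a file system on a computer that is not duplexed. This embodiment describes such an application. FIG. 13 is a configuration diagram of the present invention applied to a file system on a non-duplexed computer. In this system the computer is not duplexed; only the computer 30 exists. The process 35 runs on the computer 30 and updates a file duplexed as the primary file 39 and the backup file 41. Both the primary file 39 and the backup file 41 reside on the computer 30 and are updated through the file system 36.
[0065]
The file system 36 on the computer 30 includes the primary file operation unit 38, the primary file restoration unit 37, the backup file operation unit 43, the unconfirmed queue 431, the confirmed queue 432, the backup file update unit 44, and the primary file restoration information reading unit 42.
[0066]
The process 35 updates the duplexed file through the primary file operation unit 38 and the backup file operation unit 43. When the process 35 writes to the duplexed file, the primary file 39 is updated immediately, but the backup file 41 is not; instead, "file write information" is stored in the unconfirmed queue 431 via the backup file operation unit 43.
[0067]
When the process 35 takes a checkpoint, the checkpoint control unit 31 instructs the checkpoint information saving unit 32 and the primary file operation unit 38. On receiving this instruction, the checkpoint information saving unit 32 saves the address space and processor context on the computer 30 (checkpoint information 34).
[0068]
On receiving the checkpoint instruction, the primary file operation unit 38 moves the "file write information" stored in the unconfirmed queue 431 to the confirmed queue 432 via the backup file operation unit 43. After the checkpoint has been taken, the backup file update unit 44 uses the "file write information" in the confirmed queue 432 to update the backup file 41, and discards it once the backup file 41 has been updated. As a result, the same write operations that were applied to the primary file 39 since the previous checkpoint are also applied to the backup file 41.
[0069]
When the process 35 fails, for example by aborting, and is re-executed on the computer 30 from the most recent checkpoint, the address space and processor context are restored by the checkpoint information restoration unit 33 on the computer 30.
[0070]
As for the file, the backup file 41 needs no restoration: updates made since the checkpoint exist only as "file write information" in the unconfirmed queue 431 and have not actually been applied. The primary file 39, however, has already been updated since the checkpoint and must be restored. Therefore, based on the "file write information" in the unconfirmed queue 431, the pre-update data of the primary file 39 is read from the backup file 41 and written back to the primary file 39. The "file write information" in the unconfirmed queue 431 is then discarded. If the confirmed queue 432 still holds "file write information", the restoration described above starts only after that information has been fully applied to the backup file 41.
[0071]
FIG. 14 shows the schematic configuration of a computer system to which this embodiment is applied. The system of this embodiment runs on the computer 30 alone and is not duplexed. Disk devices 60a and 60b are connected to the computer 30. The process 35 runs on the computer 30, and the file it accesses is duplexed as the primary file 39 and the backup file 41, located on the disk devices 60a and 60b, respectively.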
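For the single-computer arrangement of FIG. 14, the scheme can be sketched with ordinary files. The two paths stand in for disk devices 60a and 60b. The class name `LocalDuplexedFile` is hypothetical, the queues are in-memory lists (so, unlike a real implementation, they would not survive a crash of the computer 30), writes are assumed to overwrite in place, and the backup update is done synchronously inside `checkpoint` rather than afterwards by a separate backup file update unit.

```python
import os
import tempfile


class LocalDuplexedFile:
    """Hypothetical sketch: primary and backup copies on one computer."""

    def __init__(self, primary_path: str, backup_path: str) -> None:
        self.primary_path = primary_path  # stands in for a file on disk device 60a
        self.backup_path = backup_path    # stands in for a file on disk device 60b
        self.unconfirmed: list[tuple[int, bytes]] = []
        self.confirmed: list[tuple[int, bytes]] = []

    @staticmethod
    def _pwrite(path: str, offset: int, data: bytes) -> None:
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())

    @staticmethod
    def _pread(path: str, offset: int, size: int) -> bytes:
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(size)

    def write(self, offset: int, data: bytes) -> None:
        self.unconfirmed.append((offset, data))        # file write information
        self._pwrite(self.primary_path, offset, data)  # only the primary copy is touched

    def checkpoint(self) -> None:
        self.confirmed += self.unconfirmed
        self.unconfirmed = []
        for offset, data in self.confirmed:            # role of backup file update unit 44
            self._pwrite(self.backup_path, offset, data)
        self.confirmed = []

    def abort(self) -> None:
        # Restore every range written since the checkpoint from the backup copy.
        for offset, data in self.unconfirmed:
            old = self._pread(self.backup_path, offset, len(data))
            self._pwrite(self.primary_path, offset, old)
        self.unconfirmed = []


with tempfile.TemporaryDirectory() as d:
    paths = [os.path.join(d, name) for name in ("disk_a.dat", "disk_b.dat")]
    for p in paths:
        with open(p, "wb") as fh:
            fh.write(b"ABCD")
    df = LocalDuplexedFile(*paths)
    df.write(1, b"X")
    df.abort()
    with open(paths[0], "rb") as fh:
        print(fh.read())  # b'ABCD'
```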
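[0072]
As described above, applying the present invention to a system that guards against failures by continuing execution while periodically saving the state of a process (checkpoint information), such as its address space and processor context, and by re-executing the process from the most recently saved checkpoint when a failure occurs, eliminates the need to read pre-update data from a file before updating it, and therefore greatly improves file update performance.
[0073]
The file management method described in the above embodiments can be distributed as a computer-executable program stored on a recording medium such as a floppy disk, an optical disc, or semiconductor memory.
[0074]
[Effects of the Invention]
As detailed above, according to the present invention, when a process requests a file update, update information describing the update is obtained and saved, only the primary file is updated immediately, and after a checkpoint is taken, the updates described by the saved update information are applied to the backup file. When, for example, the process aborts, the saved update information is used to read from the backup file all of the pre-update data corresponding to data updated since the most recent checkpoint; this data restores the primary file to its state at the checkpoint, and re-execution of the process begins (re-execution can also begin using the backup file).
[0075]
In other words, this computer system recovers files after a failure without making normal processing wait for the conventional step of reading and saving pre-update data on every file update. File update performance can therefore be improved dramatically without compromising reliability.
[Brief Description of the Drawings]
[FIG. 1] Conceptual diagram explaining the basic principle of the present invention.
[FIG. 2] Diagram showing the system configuration of the computer system according to the first embodiment of the present invention.
[FIG. 3] Diagram showing the schematic configuration of a computer system to which the same embodiment is applied.
[FIG. 4] Diagram showing how a file is updated in the same embodiment.
[FIG. 5] Diagram showing how the primary file is restored when a failure occurs in the same embodiment.
[FIG. 6] Flowchart of the processing performed when the file operation unit of the same embodiment is instructed to perform a "file write".
[FIG. 7] Flowchart of the processing performed when the file operation unit of the same embodiment is instructed to "take a checkpoint".
[FIG. 8] Flowchart of the processing of the backup file update unit of the same embodiment.
[FIG. 9] Flowchart of the processing performed when a failure such as an abort occurs in the process of the same embodiment and the process is re-executed from the most recent checkpoint taken on the primary computer 30.
[FIG. 10] Flowchart of the processing performed when the checkpoint information restoration unit on the primary computer of the same embodiment is instructed to "restore the address space and processor context".
[FIG. 11] Flowchart of the processing performed when the primary file restoration unit of the same embodiment is instructed to "restore the primary file".
[FIG. 12] Diagram showing how processing is taken over with the backup file when a failure occurs in the same embodiment.
[FIG. 13] Diagram showing the system configuration of the computer system according to the second embodiment of the present invention.
[FIG. 14] Diagram showing the schematic configuration of a computer system to which the same embodiment is applied.
[FIG. 15] Diagram explaining the mechanism of a conventional system that, because a file write is difficult to cancel, reads and saves the data about to be overwritten before writing new data to the file.
[Description of Reference Numerals]
10: active system, 11: application program, 12: jacket routine, 13: OS buffer cache, 14: disk device, 20: standby system, 21: daemon, 211: unconfirmed queue, 212: confirmed queue, 22: OS buffer cache, 23: disk device, 30: primary computer, 31: checkpoint control unit, 32: checkpoint information saving unit, 33: checkpoint information restoration unit, 34: checkpoint information, 35: process, 36: file system, 37: primary file restoration unit, 38: primary file operation unit, 39: primary file, 40: backup computer, 41: backup file, 42: primary file restoration information reading unit, 43: backup file operation unit, 431: unconfirmed queue, 432: confirmed queue, 44: backup file update unit, 45: checkpoint information, 46: checkpoint information restoration unit, 47: process, 50: network, 60a, 60b: disk devices.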
BACKGROUND OF THE INVENTION
The present invention is a computer system suitable for application to group computing processing, database processing, transaction processing, etc. that require high reliability in a network computing environment composed of a plurality of computers connected to a network, for example. And a file management method.
[0002]
[Prior art]
Periodically collect the address space, context, and file status of processes executed by the CPU (this is called a checkpoint), and restore the last collected checkpoint status when a failure occurs However, in a system having a recovery function from a failure such as restarting the execution of a process from that point, there has been a problem regarding external input / output processing. In other words, when a failure occurs, when the process is re-executed from the last collected checkpoint, the state of the process address space and processor context can be restored, but the state of the external input device cannot be easily restored. It was.
[0003]
For example, since it is difficult to cancel writing to a file, when writing to a file, before writing the data to the file, read and save the data before writing, and then write to the file Data was being written.
[0004]
In FIG. 15, since it is difficult to cancel writing to the file, when writing to the file, before writing the data to the file, the data before writing is read in advance and saved, and then to the file. It is a figure explaining the mechanism of the conventional system which performs data writing of.
[0005]
In this example, “X” is written to the first byte at time t2 in the file consisting of 4-byte data “ABCD” at the time when the checkpoint at time t1 is taken (1). In this case, conventionally, before writing “X” to the first byte of the file, the data “B” of the first byte of the file is read (this is also referred to as undo log) (2), and thereafter "X" is written to the first byte of the file (3).
[0006]
Thereafter, because a failure has occurred at time t3, the process is rolled back to the last checkpoint state (t1). The first byte of the file is updated to “X” after checkpoint t1, but the state of checkpoint t1 is restored by using an undo log collected at the time of update. Note that this undo log becomes unnecessary and discarded at the next checkpoint.
[0007]
Also, for example, it is constructed by two computers, and one (primary computer) is assigned as the active system and the other (backup computer) is assigned as the standby system, and the backup computer performs processing when a failure occurs in the primary computer. Some systems increase the availability of the system by taking over. Then, with such a system, if checkpoints are periodically collected as described above, the reliability can be further improved.
[0008]
[Problems to be solved by the invention]
In this way, the status of the process address space, context, and files, that is, checkpoints are collected periodically, and when a failure occurs, the last collected checkpoint state is restored. In a system with a function of recovering from a failure such as restarting the execution of a process from the point in time (whether it is duplicated or not), its reliability is improved, but on the other hand, file update ( For example, when data is written), the data before update must be read once from the file and then updated to the file, which causes a problem of reducing the file update performance.
[0009]
The present invention has been made in view of such circumstances, and periodically collects checkpoints, restores the status of the last collected checkpoint when a failure occurs, and executes the process from that point. In a system that has a recovery function from failure such as restarting, it is not necessary to read the data before update from the file when updating the file, and greatly improve the file update performance. It is an object of the present invention to provide a computer system and a file management method that make possible.
[0010]
[Means for Solving the Problems]
The computer system of the present invention is a computer system that is duplicated by two computers of an active system and a standby system, and is interrupted. Process To restart the process Checkpoint information including address space and processor context is stored Checkpoints are periodically collected and placed on both the active and standby computers. The checkpoint information In the computer system to be stored, a file to be updated by a process executed on the active computer is provided in duplicate on both the active computer and the standby computer, and the process is instructed to update the file. The update information is saved on the standby computer, and only the active file is updated. When the update is completed, the update request source is notified of the update completion. Thus, it is necessary when the process is restored to the state at the last collected checkpoint based on the checkpoint information including the address space and the processor context. First file management to maintain files Means and the checkpoint is taken Every time Reflecting the update contents indicated in the update information in the standby file The second file management for updating the standby file to the state at the collected checkpoint Means.
[0011]
In the computer system of the present invention, when a process requests a file update, the update information indicating the update contents is acquired and stored, and only the file (operating file) placed on the operating computer is stored. Update immediately and return the result to the requesting process. After the checkpoint is collected, the update content indicated by the stored update information is reflected in the file (standby file) arranged in the standby computer.
[0012]
On the other hand, for example, when the process is aborted, all the pre-update data corresponding to the data updated after the last collected checkpoint is read from the standby file based on the saved update information, Using the read data before update, the operational file is restored to the checkpoint time.
[0013]
In other words, in this computer system, when a file is updated as in the prior art, file recovery in the event of failure can be performed without waiting for normal processing to complete processing such as reading and saving data before update. As a result, the file update performance can be dramatically improved without degrading the reliability.
[0014]
Instead of restoring the active file, it is also effective to re-execute the process from the checkpoint using the standby file that reflects all the update contents indicated by the update information saved before the last checkpoint. . In other words, continuation of processing is ensured in the case where restart using the operation system file is impossible due to a failure of the operation system computer, etc., and the system availability is improved. In this case, if a standby file is newly secured in the third computer, the availability of the system can be further improved.
[0015]
DETAILED DESCRIPTION OF THE INVENTION
First, the basic principle of the present invention will be described with reference to FIG. As shown in FIG. 1, the computer system of the present invention is premised on a system that is multiplexed with an active system 10 and a standby system 20. Each operation will be described below.
[0016]
(Normal processing)
(1) In the operational system 10, the application program 11 issues a Write system call.
[0017]
(2) The jacket routine 12 hooks the Write system call, issues a Write system call to the operating operating system, and transmits the Write request to the standby system 20. However, it is not necessary to immediately send a write request to the standby system 20, and it may be sent by the next checkpoint. Further, in the standby system 20, the received Write request is not immediately executed, but temporarily stored in the indeterminate queue 211.
[0018]
(3) When the checkpoint process is instructed, the active system 10 must finish transmitting all the accumulated write requests to the standby system 20.
[0019]
(4) On the other hand, in the standby system 20, the write request stored in the unconfirmed queue 211 is moved to the confirmed queue 212.
[0020]
(5) Write requests transferred to the confirmation queue 212 are sequentially processed by the operating system of the standby system 20.
[0021]
That is, in the file update that occurs in the normal process, there is no waiting for the completion of the process of reading and saving the data before the update.
[0022]
(Rollback processing)
(3) When a failure occurs, rollback processing is instructed to both the active system 10 and the standby system 20.
[0023]
(4) ′ At this time, all the Write requests remaining in the active system 10 are transmitted to the standby system 20. Also, since the Write request stored in the standby unconfirmed queue 211 is issued after the last checkpoint, the data before update is read from the standby file 23 by referring to this, The active file 14 is rolled back using the read data before update. As a result, both the active file 14 and the standby file 23 are in the state at the time of the last checkpoint.
[0024]
(5) ′ The standby system 20 then cancels all write requests remaining in the indeterminate queue 211.
[0025]
Thereby, it is possible to restart from the check point.
[0026]
Next, an embodiment of the present invention will be described.
[0027]
(First embodiment)
First, a first embodiment of the present invention will be described. FIG. 2 shows a system configuration of a computer system according to the first embodiment of the present invention. As shown in FIG. 2, in the computer system of this embodiment, the computer is duplicated by a primary computer 30 and a backup computer 40, and these are connected by a network 50. Each of the primary computer 30 and the backup computer 40 includes both the operation system 10 and the standby system 20 described above. When the operation system 10 operates in either one, the standby system 20 is on the other side. Operate. Here, the active system 10 will be described on the primary computer 30 side, and the standby system 20 will be described on the backup computer 40 side.
[0028]
The process 35 is executed on the primary computer 30 and updates a duplicated file of the primary file 39 and the backup file 41. Here, the primary file 39 is arranged on the primary computer 30 and the backup file 41 is arranged on the backup computer 40, and is updated via the file system 36 on the primary computer 30 and the file system 48 on the backup computer 40.
[0029]
The file system 36 on the primary computer 30 includes a primary file operation unit 38 and a primary file restoration unit 37. On the other hand, the file system 48 on the backup computer 40 includes a backup file operation unit 43, an unconfirmed queue 431, a confirmed queue 432, a backup file update unit 44, and a primary file restoration information reading unit 42.
[0030]
When the process 35 updates the duplicated file, it is performed via the primary file operation unit 38 and the backup file operation unit 43. When the process 35 performs a write corresponding to the duplicated file, the primary file 39 is immediately updated as it is, but the backup file 41 is not updated at that time, and the “file write information” is backed up. The file is stored in an indeterminate queue 431 on the backup computer 40 via the file operation unit 43.
[0031]
When the process 35 collects checkpoints, the checkpoint control unit 31 gives an instruction to the checkpoint information storage unit 32 and the primary file operation unit 38. Upon receiving the checkpoint collection instruction, the checkpoint information storage unit 32 stores the checkpoint information (address space and processor context) on the primary computer 30 and the backup computer 40 (checkpoint information 34 on the primary computer 30). And checkpoint information 45 on the backup computer 40).
[0032]
On the other hand, when receiving the checkpoint collection instruction, the primary file operation unit 38 moves the “file write information” stored in the unconfirmed queue 431 to the confirmation queue 432 via the backup file operation unit 43. The “file writing information” moved to the confirmation queue 432 is used for updating the backup file 41 by the backup file updating unit 44 after collecting the checkpoint, and discarded after the backup file 41 is updated. As a result, the same write operation as that performed on the primary file 39 after the checkpoint is also performed on the backup file 41.
[0033]
When the process 35 generates a failure such as an abort and re-executes the process 35 from the last checkpoint collected on the primary computer 30, the address space and the processor context are the checkpoint information restoration unit 37 on the primary computer 30. Restored by.
[0034]
Regarding the file, the backup file 41 is not yet restored since the “file write information” is only stored in the undetermined queue 431 and the update after the check point is not yet updated. However, since the primary file 39 has already been updated after the checkpoint, it needs to be restored. Therefore, based on the “file write information” stored in the indeterminate queue 431, the pre-update data of the primary file 39 is read from the backup file 41, and the read pre-update data is written to the primary file 39 to be restored. . Thereafter, the “file write information” stored in the unconfirmed queue 431 is discarded. If “file write information” is stored in the confirmation queue 432, the restoration process described above is started after the reflection of the “file write information” to the backup file 41 is completed.
[0035]
On the other hand, when the primary computer 30 or the operating system that controls the primary computer 30 causes a failure such as a system failure and the process 35 is re-executed from the last checkpoint collected on the backup computer 40, the address space and the processor The context is restored to the process 47 by the checkpoint information restoration unit 46.
[0036]
Regarding the file, the backup file 41 is not yet restored since the “file write information” is only stored in the undetermined queue 431 and the update after the check point is not yet updated.
[0037]
The transfer of the “file writing information” from the primary computer 30 to the backup computer 40 can be optimized. If the primary computer 30 does not go down when a failure occurs, the primary file 39 is restored and processing from the checkpoint is resumed using the primary file 39. On the other hand, if the primary computer 30 goes down when a failure occurs, the backup file 41 is used to resume processing from the checkpoint.
[0038]
Therefore, the “file write information” does not need to be sent immediately from the primary file operation unit 38 to the backup file operation unit 43. In other words, these “file write information” may be sent until the next check point. Therefore, in consideration of transfer efficiency, the “file write information” is temporarily stored in the primary file operation unit 38, and is stored “fixed capacity”, “fixed capacity”. It is possible to send to the backup file operation unit 43 collectively using the occurrence of events such as “time has passed” and “checkpoint collection requested” as a trigger.
[0039]
FIG. 3 shows a schematic configuration of a computer system to which this embodiment is applied. The computer is duplexed into a primary computer 30 and a backup computer 40. A disk device 60a is connected to the primary computer 30, and a disk device 60b to the backup computer 40. The process 35 runs on the primary computer, and the files it accesses are duplicated as the primary file 39 and the backup file 41, placed on the disk device 60a and the disk device 60b, respectively.
[0040]
Checkpoint information is held on both the primary computer 30 side (primary checkpoint information 34) and the backup computer 40 side (backup checkpoint information 45). In this figure the checkpoint information is held on a disk device, but it may also be held in memory.
[0041]
If a failure such as a system failure occurs in the primary computer 30 or the operating system that controls the primary computer 30, the process 47 is re-executed using the checkpoint information 45 on the backup computer 40 side. In this case, the process 47 uses the backup file 41.
[0042]
It is also possible to build a file system with three or more copies by providing a plurality of primary files 39 or backup files 41. For a triplexed file system, for example, the following combinations are conceivable:
(1) Two primary files and one backup file
(2) One primary file and two backup files
[0043]
FIG. 4 is a diagram showing how a file is updated in this embodiment. In this example, at time t1 a process 35 running on the primary computer 30 writes “X” to the first byte of a duplicated file holding the 4-byte data “ABCD” (a primary file 39 on the primary computer 30 and a backup file 41 on the backup computer 40) (1). The primary file 39 is updated immediately, but the backup file 41 is not; only the “file write information” is stored.
[0044]
A checkpoint is then taken at time t2, which commits the “file write information” recorded so far (2). After time t2, the backup file 41 is updated based on the committed “file write information”.
[0045]
FIG. 5 is a diagram showing how the primary file is restored when a failure occurs in this embodiment. In this example, at time t1 a process 35 running on the primary computer 30 writes “X” to the first byte of a duplicated file holding the 4-byte data “ABCD” (a primary file 39 on the primary computer 30 and a backup file 41 on the backup computer 40) (1). The primary file 39 is updated immediately, but the backup file 41 is not; only the “file write information” is stored.
[0046]
A failure then occurs at time t2 (2). The primary file 39 was updated with the “file write information” at time t1 and therefore needs to be restored, whereas the backup file 41 has not yet been updated and needs no restoration. The changed portion of the primary file 39 is exactly the portion described by the “file write information” stored at time t1. The primary file 39 is therefore restored by reading, from the backup file 41, the data at the positions given in the unconfirmed “file write information” and writing that data to the primary file 39.
[0047]
Then, the process 35 is re-executed on the primary computer 30 using the checkpoint taken on the primary computer 30. This re-executed process 35 uses the restored primary file 39.
[0048]
FIG. 6 is a flowchart showing the processing performed when the file operation unit is instructed to “write file”. First, the “file write information” is stored and linked to the unconfirmed queue 431 (step A1). Next, the primary file 39 is updated according to the “file write information” (step A2). At this point the “file write” operation is regarded as complete, and a completion notification is sent to the requester (step A3).
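Steps A1 to A3 can be sketched with an in-memory model of the duplicated file. This is an illustration under stated assumptions, not the patent's implementation: `DuplexedFile` and `FileWriteInfo` are hypothetical names, the files are `bytearray`s, and writes are assumed to stay inside the current file size. Offset 1 is used for the “first byte” because the prior-art example in [0005] gives that byte of “ABCD” as “B”, i.e. counting starts at 0.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FileWriteInfo:
    """Hypothetical 'file write information': where a write landed and what it wrote."""
    offset: int
    data: bytes


class DuplexedFile:
    """In-memory model of a primary file 39 / backup file 41 pair (sketch only)."""

    def __init__(self, initial: bytes):
        self.primary = bytearray(initial)  # primary file 39: updated immediately
        self.backup = bytearray(initial)   # backup file 41: updated after a checkpoint
        self.unconfirmed = deque()         # unconfirmed queue 431
        self.confirmed = deque()           # confirmed queue 432

    def write(self, offset: int, data: bytes) -> str:
        # Assumption: writes never extend or overrun the file.
        if offset < 0 or offset + len(data) > len(self.primary):
            raise ValueError("write outside file bounds")
        # Step A1: store the write information and link it to queue 431.
        self.unconfirmed.append(FileWriteInfo(offset, bytes(data)))
        # Step A2: update only the primary file; no pre-update read is done.
        self.primary[offset:offset + len(data)] = data
        # Step A3: report completion to the requester.
        return "completed"
```

Calling `DuplexedFile(b"ABCD").write(1, b"X")` leaves the primary as `AXCD` and the backup as `ABCD`, the state FIG. 4 shows just after time t1. The write path performs no read of the old data, which is the saving the invention claims over the undo-log scheme of FIG. 15.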
[0049]
FIG. 7 is a flowchart showing a flow of processing when the file operation unit is instructed to “collect checkpoint”. In this case, the stored “file writing information” is moved from the unconfirmed queue 431 to the confirmed queue 432 (step B1).
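Step B1 amounts to an ordered move between two queues. The sketch below assumes `(offset, data)` tuples as the “file write information” and a module-level `threading.Lock` (hypothetical `queue_lock`) so that writers never see a half-moved queue while the checkpoint is being collected. Order is preserved because the backup file update unit 44 must replay overlapping writes in the order they were made.

```python
import threading
from collections import deque

# Hypothetical lock guarding both queues during checkpoint collection.
queue_lock = threading.Lock()


def collect_checkpoint(unconfirmed: deque, confirmed: deque) -> int:
    """Step B1: move every entry from the unconfirmed queue 431 to the
    confirmed queue 432, oldest first. Returns the number of entries moved."""
    with queue_lock:
        moved = len(unconfirmed)
        confirmed.extend(unconfirmed)  # appends in the original order
        unconfirmed.clear()
    return moved
```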
[0050]
FIG. 8 is a flowchart showing the processing of the backup file update unit 44. First, it checks whether any “file write information” is linked to the confirmed queue 432 (step C1). If none is linked (N in step C1), the backup file update unit 44 keeps checking. If some is linked (Y in step C1), the backup file 41 is updated based on that “file write information” (step C2), and the applied “file write information” is then removed from the confirmed queue 432 (step C3).
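A polling loop for steps C1 to C3 might look as follows, assuming `(offset, data)` tuples, a `bytearray` backup file, and a `threading.Event` for shutdown; `backup_update_step`, `backup_update_loop` and `poll_s` are hypothetical names. Each entry is removed only after it has been applied (C3 after C2), and because rewriting the same bytes at the same offset is idempotent, an interrupted pass can safely be retried.

```python
import threading
import time
from collections import deque


def backup_update_step(confirmed: deque, backup: bytearray) -> bool:
    """One pass of FIG. 8. Returns False when nothing was waiting."""
    # Step C1: is any write information linked to the confirmed queue 432?
    if not confirmed:
        return False
    offset, data = confirmed[0]
    # Step C2: apply the write to the backup file 41.
    backup[offset:offset + len(data)] = data
    # Step C3: remove the entry only now that it has been applied.
    confirmed.popleft()
    return True


def backup_update_loop(confirmed: deque, backup: bytearray,
                       stop: threading.Event, poll_s: float = 0.01) -> None:
    while not stop.is_set():
        if not backup_update_step(confirmed, backup):
            time.sleep(poll_s)  # N in step C1: keep checking
```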
[0051]
FIG. 9 is a flowchart showing a processing flow when a failure such as an abort occurs in the process 35 and the process 35 is re-executed from the last checkpoint collected on the primary computer 30.
[0052]
When a failure occurs in the process 35, the checkpoint information restoration unit 33 on the primary computer 30 is first instructed to “restore address space and processor context” (step D1). Next, the primary file restoration unit 37 is instructed to “restore primary file” (step D2).
[0053]
FIG. 10 is a flowchart showing a processing flow when the checkpoint information restoration unit on the primary computer 30 is instructed to “restore address space and processor context”. In this case, first, the address space of the process 35 is restored (step E1). Next, the state of the processor context at the time of checkpoint collection of the process 35 is restored (step E2).
[0054]
FIG. 11 is a flowchart showing the processing performed when the primary file restoration unit 37 is instructed to “restore primary file”. First, it checks whether any “file write information” is linked to the unconfirmed queue 431 (step F1). If so (Y in step F1), the pre-update data for the updated portion of the primary file 39 is read from the backup file 41 according to that “file write information” and written to the primary file 39, restoring the updated portion (step F2). The “file write information” used for the restoration is then removed (discarded) from the unconfirmed queue 431 (step F3). This is repeated until no “file write information” remains linked to the unconfirmed queue 431.
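The restoration of FIG. 11, together with the precondition from [0034] that confirmed writes are reflected in the backup first, can be sketched as follows. Assumptions: files are `bytearray`s, write information is `(offset, data)` tuples, and `apply_writes` and `restore_primary` are hypothetical names. Since every copied range comes from the backup, which holds the checkpoint image, the order in which unconfirmed entries are processed does not affect the result.

```python
from collections import deque


def apply_writes(queue: deque, target: bytearray) -> None:
    """Replay (offset, data) entries onto target in order, consuming the queue."""
    while queue:
        offset, data = queue[0]
        target[offset:offset + len(data)] = data
        queue.popleft()  # remove only after the write has been applied


def restore_primary(unconfirmed: deque, confirmed: deque,
                    primary: bytearray, backup: bytearray) -> None:
    # Precondition from [0034]: finish reflecting confirmed writes into the
    # backup first, so that it holds exactly the last checkpoint's contents.
    apply_writes(confirmed, backup)
    # Steps F1-F3: for each unconfirmed write, copy the pre-update bytes of
    # the same range back from the backup, then discard the entry.
    while unconfirmed:                                # F1
        offset, data = unconfirmed[0]
        end = offset + len(data)
        primary[offset:end] = backup[offset:end]      # F2
        unconfirmed.popleft()                         # F3
```

With the FIG. 5 data (primary `AXCD`, backup `ABCD`, one unconfirmed write at offset 1), the primary comes back as `ABCD` without any undo log having been taken at write time.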
[0055]
When a failure such as a system failure occurs in the primary computer 30 or the operating system that controls the primary computer 30, the process 35 is re-executed from the last checkpoint collected on the backup computer 40. In this case, the backup file 41 takes over the processing. FIG. 12 is a diagram showing how the backup file 41 takes over processing when a failure occurs.
[0056]
In this example, at time t1 a process 35 running on the primary computer 30 writes “X” to the first byte of a duplicated file holding the 4-byte data “ABCD” (a primary file 39 on the primary computer 30 and a backup file 41 on the backup computer 40) (1). The primary file 39 is updated immediately, but the backup file 41 is not; only the “file write information” is stored.
[0057]
A failure then occurs in the primary computer 30 at time t2 (2). In this case, the process 47 is re-executed on the backup computer 40 from the checkpoint taken on the backup computer 40 and continues processing with the backup file 41. Although the primary file 39 was updated at time t1, the backup file 41 has not yet been updated, so the re-executed process 47 can use the backup file 41 as it is.
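The takeover step can be sketched along the lines of claims 4 and 5: discard the update information stored after the last checkpoint, reflect the updates from before it, and then use the backup file as is. Assumptions: the queues hold `(offset, data)` tuples, the backup file is a `bytearray`, and `take_over_with_backup` is a hypothetical name.

```python
from collections import deque


def take_over_with_backup(unconfirmed: deque, confirmed: deque,
                          backup: bytearray) -> bytearray:
    # Writes made after the last checkpoint are simply dropped: the process
    # re-executed from that checkpoint will issue them again.
    unconfirmed.clear()
    # Writes made before the checkpoint (already in the confirmed queue)
    # must still reach the backup before it is used.
    while confirmed:
        offset, data = confirmed[0]
        backup[offset:offset + len(data)] = data
        confirmed.popleft()
    # The backup now matches the checkpoint and can be used as it is.
    return backup
```

In the FIG. 12 scenario only the unconfirmed write of “X” exists, so the backup stays `ABCD`; no data has to be read back from the failed primary.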
[0058]
If a backup file is disconnected because of a failure, a new backup file can be created afterwards. This reproduces the initial state shown in FIG. 1, so recovery processing is possible even if another failure occurs.
[0059]
If the backup file takes over after a failure and the process is re-executed from the checkpoint, the former backup file becomes the primary file and a new backup file is created. This reproduces the initial state as shown in FIG. …, so recovery processing is again possible if another failure occurs. There are two methods for re-creating the backup file:
[0060]
(1) After the backup file is detached, save the update information and data of the primary file; when the backup file is reconnected, reflect the saved updates in it.
[0061]
(2) Copy the primary file to the backup file. If the primary file continues to be updated during the copy, its update information and data are also reflected in the backup file from the moment copying starts.
[0062]
Further, the following method combining these two methods is also effective.
[0063]
(3) On the assumption that the detached backup file (or the primary file from before the failure) will be reconnected, save the update information and data of the primary file after the backup file is detached, so that method (1) remains available until a predetermined time has elapsed. Once that time has elapsed, give up method (1): stop saving the update information and data, and use method (2). Likewise, if a file other than the detached backup file is to be reconnected, stop saving the update information and data and use method (2).
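Method (3) is essentially a time-bounded choice between log replay and a full copy. The sketch below assumes in-memory `bytearray` files, `(offset, data)` log entries and caller-supplied timestamps; `ResyncPolicy`, `window_s` and `is_detached_copy` are hypothetical names, and the concurrent-write mirroring that method (2) requires during the copy is deliberately omitted.

```python
class ResyncPolicy:
    """Hypothetical helper for method (3): log primary updates after the
    backup is detached, for at most window_s seconds (method (1)), and fall
    back to a full copy (method (2)) otherwise."""

    def __init__(self, detached_at: float, window_s: float):
        self.detached_at = detached_at
        self.window_s = window_s
        self.log = []        # (offset, data) updates saved for method (1)
        self.logging = True

    def _expire_if_due(self, now: float) -> None:
        if self.logging and now - self.detached_at > self.window_s:
            # Window elapsed: abandon method (1) and stop saving updates.
            self.logging = False
            self.log.clear()

    def record(self, offset: int, data: bytes, now: float) -> None:
        """Call for every primary-file write made while the backup is detached."""
        self._expire_if_due(now)
        if self.logging:
            self.log.append((offset, bytes(data)))

    def reconnect(self, target: bytearray, is_detached_copy: bool,
                  primary: bytes, now: float) -> str:
        self._expire_if_due(now)
        if self.logging and is_detached_copy:
            # Method (1): replay the saved updates onto the old backup.
            for offset, data in self.log:
                target[offset:offset + len(data)] = data
            method = "replay"
        else:
            # Method (2): full copy. A real system would also mirror writes
            # that arrive while the copy is in progress; omitted here.
            target[:] = primary
            method = "copy"
        self.logging = False
        self.log.clear()
        return method
```

Replay is cheap when the detachment was brief; the time limit bounds how much log the primary must keep, which is the trade-off method (3) describes.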
[0064]
(Second Embodiment)
Next, the second embodiment of the invention is explained. The first embodiment described a duplexed computer system, but the present invention is also effective when applied to a file system on a computer that is not duplexed. This embodiment therefore describes, as an example, its application to such a file system. FIG. 13 is a configuration diagram of this case. The computer is not duplexed; only the computer 30 exists. The process 35 runs on the computer 30 and updates the duplicated file consisting of the primary file 39 and the backup file 41. That is, the primary file 39 and the backup file 41 both reside on the computer 30 and are updated via the file system 36.
[0065]
The file system 36 on the computer 30 contains a primary file operation unit 38, a primary file restoration unit 37, a backup file operation unit 43, an unconfirmed queue 431, a confirmed queue 432, a backup file update unit 44, and a primary file restoration information reading unit 42.
[0066]
The process 35 updates the duplicated file via the primary file operation unit 38 and the backup file operation unit 43. When the process 35 writes to the duplicated file, the primary file 39 is updated directly, but the backup file 41 is not; instead, the “file write information” is stored in the unconfirmed queue 431 via the backup file operation unit 43.
[0067]
When the process 35 collects a checkpoint, the checkpoint control unit 31 issues an instruction to the checkpoint information storage unit 32 and the primary file operation unit 38. On receiving the checkpoint collection instruction, the checkpoint information storage unit 32 saves the address space and the processor context on the computer 30 (checkpoint information 34).
[0068]
When the primary file operation unit 38 receives a checkpoint collection instruction, it moves the “file write information” stored in the unconfirmed queue 431 to the confirmed queue 432 via the backup file operation unit 43. After the checkpoint is collected, the backup file update unit 44 uses the “file write information” in the confirmed queue 432 to update the backup file 41, and discards it once the update is done. In this way, after the checkpoint, the backup file 41 receives the same writes that were applied to the primary file 39.
[0069]
When a failure such as an abort occurs in the process 35 and the process 35 is re-executed from the last checkpoint collected on the computer 30, the checkpoint information restoration unit 33 on the computer 30 restores the address space and the processor context.
[0070]
Regarding the file, the backup file 41 does not need to be restored: the “file write information” for updates made after the checkpoint is only held in the unconfirmed queue 431 and has not yet been applied to it. The primary file 39, however, has already been updated after the checkpoint and must be restored. Therefore, based on the “file write information” stored in the unconfirmed queue 431, the pre-update data of the primary file 39 is read from the backup file 41 and written back to the primary file 39. The “file write information” stored in the unconfirmed queue 431 is then discarded. If any “file write information” is stored in the confirmed queue 432, this restoration starts only after that information has been fully reflected in the backup file 41.
[0071]
FIG. 14 shows a schematic configuration of a computer system to which this embodiment is applied. The system operates with the computer 30 alone and is not duplexed. Disk devices 60a and 60b are both connected to the computer 30. The process 35 runs on the computer 30, and the files it accesses are duplicated as the primary file 39 and the backup file 41, placed on the disk device 60a and the disk device 60b, respectively.
[0072]
By applying the present invention in this way to a system that periodically saves the state of a process, such as its address space and processor context (checkpoint information), and copes with a failure by re-executing the process from the last saved checkpoint, there is no longer any need to read the pre-update data from a file before updating it, so file update performance is greatly improved.
[0073]
Note that the file management method described in the above embodiment can be stored and distributed in a recording medium such as a floppy disk, an optical disk, and a semiconductor memory as a program that can be executed by a computer.
[0074]
【The invention's effect】
As described in detail above, according to the present invention, when a process requests a file update, update information describing the update is acquired and saved, and only the primary file is updated immediately. After each checkpoint is collected, the updates indicated by the saved update information are reflected in the backup file. When a process aborts, for example, all the pre-update data corresponding to data updated after the last collected checkpoint is read from the backup file based on the saved update information, the primary file is restored to its checkpoint state using that data, and the process is re-executed (the process can also be re-executed using the backup file).
[0075]
In other words, unlike the prior art, this computer system makes file recovery after a failure possible without making normal file updates wait for processing such as reading and saving the pre-update data. As a result, file update performance can be improved dramatically without reducing reliability.
[Brief description of the drawings]
FIG. 1 is a conceptual diagram for explaining the basic principle of the present invention.
FIG. 2 is a diagram showing a system configuration of a computer system according to the first embodiment of the present invention.
FIG. 3 is an exemplary diagram showing a schematic configuration of a computer system to which the embodiment is applied.
FIG. 4 is a view showing a state in which a file is updated in the embodiment.
FIG. 5 is a view showing a state in which a primary file is restored when a failure occurs in the embodiment.
FIG. 6 is an exemplary flowchart illustrating a process flow when the file operation unit according to the embodiment is instructed to write a file.
FIG. 7 is an exemplary flowchart illustrating a processing flow when the file operation unit according to the embodiment is instructed to “collect checkpoints”.
FIG. 8 is an exemplary flowchart showing the flow of processing of a backup file update unit of the embodiment.
FIG. 9 is an exemplary flowchart showing the flow of processing when a failure such as an abort occurs in the process of the embodiment and the process is re-executed from the last checkpoint collected on the primary computer 30.
FIG. 10 is an exemplary flowchart showing a processing flow when the checkpoint information restoring unit on the primary computer of the embodiment is instructed to “restore address space and processor context”.
FIG. 11 is an exemplary flowchart illustrating a processing flow when the primary file restoring unit according to the embodiment is instructed to restore the primary file.
FIG. 12 is a view showing a state in which processing is taken over by a backup file when a failure according to the embodiment occurs.
FIG. 13 is a diagram showing a system configuration of a computer system according to a second embodiment of the present invention.
FIG. 14 is a diagram showing a schematic configuration of a computer system to which the embodiment is applied.
FIG. 15 is a diagram explaining the mechanism of a conventional system in which, because a write to a file is difficult to cancel, the data about to be overwritten is read and saved before new data is written to the file.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 10 ... Active system, 11 ... Application program, 12 ... Jacket routine, 13 ... OS buffer cache, 14 ... Disk device, 20 ... Standby system, 21 ... Daemon, 211 ... Unconfirmed queue, 212 ... Confirmed queue, 22 ... OS buffer cache, 23 ... Disk device, 30 ... Primary computer, 31 ... Checkpoint control unit, 32 ... Checkpoint information storage unit, 33 ... Checkpoint information restoration unit, 34 ... Checkpoint information, 35 ... Process, 36 ... File system, 37 ... Primary file restoration unit, 38 ... Primary file operation unit, 39 ... Primary file, 40 ... Backup computer, 41 ... Backup file, 42 ... Primary file restoration information reading unit, 43 ... Backup file operation unit, 431 ... Unconfirmed queue, 432 ... Confirmed queue, 44 ... Backup file update unit, 45 ... Checkpoint information, 46 ... Checkpoint information restoration unit, 47 ... Process, 50 ... Network, 60a, 60b ... Disk device.

Claims

A computer system that is duplexed by two computers, an active system and a standby system, that periodically collects checkpoints storing checkpoint information, including an address space and a processor context, for restarting the processing of an interrupted process, and that stores the checkpoint information on both the active and standby computers, wherein:
a file to be updated by a process executed on the active computer is duplicated on both the active computer and the standby computer; and the computer system comprises:
first file management means which, when the process instructs a file update, stores the update information on the standby computer, updates only the active file, and notifies the update requester of completion when that update is finished, thereby keeping the standby file in its state at the last collected checkpoint, as required for restoring the process to that checkpoint based on the checkpoint information including the address space and processor context; and
second file management means which, each time a checkpoint is collected, reflects the update contents indicated in the update information in the standby file, thereby updating the standby file to its state at the collected checkpoint.

2. The system according to claim 1, further comprising means for buffering the update information on the active computer and transferring the update information to the standby computer at a time until the checkpoint is collected. Computer system.

When the process aborts, the data before update for the file update executed after the last checkpoint is read from the standby file according to the update information, and the active file is in the state at the time of the checkpoint 3. The computer system according to claim 1, further comprising means for re-execution of the process from the checkpoint after the restoration.

When the process is aborted, the update information stored after the last checkpoint is deleted, and the update indicated by the update information before the checkpoint is reflected in the standby file, and then the process is 3. The computer system according to claim 1, further comprising means for re-execution from the checkpoint on a standby computer.

When a failure occurs in the active computer or the operating system that controls the active computer, the update information stored after the last checkpoint is deleted, and the update indicated by the update information before the checkpoint 3. The computer system according to claim 1, further comprising means for re-executing the process from the checkpoint on the standby computer after reflecting the file in the standby file. .

And a means for stopping transfer of the checkpoint and update information to the standby computer when a failure occurs in the standby computer or an operating system that controls the standby computer. The computer system according to claim 1 or 2, characterized in that

After a failure occurs in the active file, the update information stored after the last checkpoint is deleted, and the update indicated by the update information before the checkpoint is reflected in the standby file 3. The computer system according to claim 1, further comprising means for re-executing the process from the checkpoint on the standby computer.

3. The computer according to claim 1, further comprising means for stopping transfer of the checkpoint and update information to the standby computer when a failure occurs in the standby file. system.

3. The computer system according to claim 1, further comprising means for newly securing a standby file on the third computer when the standby file is disconnected.

When the process is re-executed from the checkpoint using the standby file, the standby file is switched to the active system, and a new standby file is created on the active computer. 3. The computer system according to claim 1, further comprising means for ensuring.

A file management method for a computer system that is duplexed by two computers, an active system and a standby system, that periodically collects checkpoints storing checkpoint information, including an address space and a processor context, for restarting processing, that stores the checkpoint information on both the active and standby computers, and in which a file updated by a process executed on the active computer is duplicated on both the active and standby computers, the method comprising:
a first file management step of, when the process instructs a file update, storing the update information on the standby computer, updating only the active file, and notifying the update requester of completion when that update is finished, thereby keeping the standby file in its state at the last collected checkpoint, as required for restoring the process to that checkpoint based on the checkpoint information including the address space and processor context; and
a second file management step of, each time a checkpoint is collected, reflecting the update contents indicated in the update information in the standby file, thereby updating the standby file to its state at the collected checkpoint.

Data before update for a file update executed after the last checkpoint is read from the standby file according to the update information, and the active file is restored to the state at the time of the checkpoint. 12. The file management method according to claim 11, further comprising a step of re-execution from the checkpoint.

The update information stored after the last checkpoint is deleted, the update indicated by the update information before the checkpoint is reflected in the standby file, and then the process is checked on the standby computer. 12. The file management method according to claim 11, further comprising a step of re-execution from the point.

A computer-readable storage medium storing a program for managing files of a computer system that is duplexed by two computers, an active system and a standby system, that periodically collects checkpoints storing checkpoint information, including an address space and a processor context, for restarting processing, that stores the checkpoint information on both the active and standby computers, and in which a file updated by a process executed on the active computer is multiplexed on both the active and standby computers, the program causing the computer to:
when the process instructs a file update, store the update information on the standby computer, update only the active file, and notify the update requester of completion when that update is finished, thereby keeping the standby file in its state at the last collected checkpoint, as required for restoring the process to that checkpoint based on the checkpoint information including the address space and processor context; and
each time a checkpoint is collected, reflect the update contents indicated in the update information in the standby file, thereby updating the standby file to its state at the collected checkpoint.

The program reads the data before update for the file update executed after the last checkpoint from the standby file using the update information, and restores the active file to the state at the time of the checkpoint. The computer readable storage medium of claim 14, further operating the computer to re-execute the process from the checkpoint.

The program deletes update information stored after the last checkpoint, reflects the update indicated by the update information before the checkpoint in the standby file, and then transfers the process to the standby computer. The computer readable storage medium of claim 14, further operating the computer to re-execute from the checkpoint above.