JP6891545B2

JP6891545B2 - Data management equipment, information processing systems, data management methods, and programs

Info

Publication number: JP6891545B2
Application number: JP2017041855A
Authority: JP
Inventors: 草野　和寛; 和寛草野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2017-03-06
Filing date: 2017-03-06
Publication date: 2021-06-18
Anticipated expiration: 2037-03-06
Also published as: JP2018147242A

Description

The present invention relates to a data management device, an information processing system, a data management method, and a program.

For programs that run for a long time on a large-scale system, there are techniques that execute checkpoint processing at suitable intervals to save data and, when a failure occurs, resume execution from the point saved at the checkpoint. The checkpoint interval can be determined with reference to, for example, the failure rates of the systems that make up the large-scale system.

This interval becomes shorter as the system becomes larger. Therefore, in a large-scale system where the checkpoint interval is short, it is important to shorten the time required for checkpoint processing.
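
As a rough illustration of why the interval shrinks with system size, the sketch below uses Young's first-order approximation of the checkpoint interval. This formula is not part of this patent and is substituted here purely as an example. It assumes independent nodes with identical, constant failure rates, so the system mean time between failures (MTBF) is the per-node value divided by the node count. The function names and the numbers used (a 50,000-hour node MTBF and a 0.1-hour checkpoint cost) are hypothetical.

```python
import math


def system_mtbf(node_mtbf_hours: float, node_count: int) -> float:
    """System MTBF, assuming independent nodes with equal constant failure rates."""
    return node_mtbf_hours / node_count


def young_interval(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's first-order approximation: sqrt(2 * checkpoint cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    # Hypothetical values: the interval drops as the node count grows.
    for nodes in (100, 10_000):
        mtbf = system_mtbf(node_mtbf_hours=50_000.0, node_count=nodes)
        print(nodes, "nodes ->", young_interval(0.1, mtbf), "hours")
```

Under these assumptions, growing the system by a factor of 100 shortens the suggested interval by a factor of 10, which is why the cost of each checkpoint matters more on larger systems.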

One example of a checkpoint technique is described in Patent Document 1. In this technique, a physical machine hosting an active virtual machine creates snapshot difference information for that virtual machine at each checkpoint and sends it to a standby physical machine. The active virtual machine is then generated on the standby physical machine.

Another example of a checkpoint technique is described in Non-Patent Document 1. In this technique, at each checkpoint, the difference from the data saved at the previous checkpoint is created using a page-based method and saved as checkpoint data.

In this technique, restored data reflecting the checkpoint data is created only after checkpoint data has been saved several times.

Japanese Unexamined Patent Publication No. 2014-102724

C. Wang, F. Mueller, C. Engelmann, and S. L. Scott, "Hybrid Full/Incremental Checkpoint/Restart for MPI Jobs in HPC Environment," Technical Report TR-2009-14, North Carolina State University, 2009

In the technique of Patent Document 1, the standby physical machine generates the active virtual machine at each checkpoint. Consequently, in a configuration where the active and standby systems are compute nodes, the functions of the compute nodes become complicated.

In the technique of Non-Patent Document 1, restored data reflecting the checkpoint data is created only after checkpoint data has been saved several times. Consequently, the time overhead of restoring the data is large.

An object of the present invention is to provide a data management device, an information processing system, a data management method, and a program that solve the above problems.

The data management device of the present invention includes: data updating means for creating restored data in global storage, or updating the restored data in the global storage, based on checkpoint data of an application program stored in the global storage by a compute node; and
data position management means for sending, to a substitute compute node that takes over execution of the application program, the location information of the restored data of the replaced compute node if the update of that restored data is complete, the location information of the restored data after the update completes if the update is in progress, and the location information of the restored data together with the location information of the corresponding checkpoint data if the update has not started.

The data management method of the present invention includes: creating restored data in global storage, or updating the restored data in the global storage, based on checkpoint data of an application program stored in the global storage by a compute node; and
sending, to a substitute compute node that takes over execution of the application program, the location information of the restored data of the replaced compute node if the update of that restored data is complete, the location information of the restored data after the update completes if the update is in progress, and the location information of the restored data together with the location information of the corresponding checkpoint data if the update has not started.

The program of the present invention causes a computer to execute: a process of creating restored data in global storage, or updating the restored data in the global storage, based on checkpoint data of an application program stored in the global storage by a compute node; and
a process of sending, to a substitute compute node that takes over execution of the application program, the location information of the restored data of the replaced compute node if the update of that restored data is complete, the location information of the restored data after the update completes if the update is in progress, and the location information of the restored data together with the location information of the corresponding checkpoint data if the update has not started.

The present invention has the effect that the functions of the compute nodes do not become complicated and that the time overhead of restoring data is not large.

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.
FIG. 2 is an operation diagram showing the operation of the first embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the second embodiment of the present invention.
FIG. 4 is an operation diagram showing the operation of the second embodiment of the present invention.
FIG. 5 is an operation diagram showing the operation of the second embodiment of the present invention.
FIG. 6 is an explanatory diagram showing how the checkpoint data and the restored data change over time.
FIG. 7 is an operation diagram showing the operation of the second embodiment of the present invention.

Next, a first embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the first embodiment. Referring to FIG. 1, the data management device 100 of the first embodiment includes data updating means 110 and data position management means 120.

The data management device 100 is connected to global storage and to compute nodes. It may be connected to them via a network.

Next, the operation of the first embodiment will be described in detail with reference to the drawings.

FIG. 2 is an operation diagram showing the operation of the first embodiment. Referring to FIG. 2, the data updating means 110 creates restored data in the global storage, or updates the restored data, based on the checkpoint data stored in the global storage by a compute node (step S601).

When, for example, a compute node (the replaced node) fails, the data position management means 120 receives a notification from the substitute compute node that takes over execution of the application program (step S602). If the restored data of the replaced compute node has been updated (that is, the update is complete and the data is the latest) (step S603/Y), the data position management means 120 sends the location information of the restored data to the compute node that issued the notification (step S604). If the restored data is in the middle of being updated, the process may proceed to step S604 after the update by the data updating means 110 completes.

If the restored data of the replaced compute node has not been updated (step S603/N), the data position management means 120 sends the location information of the restored data and the location information of the checkpoint data to the compute node that issued the notification (step S605). Here, "not updated" means that the update has not yet started.

Here, location information is, for example, an address on the global storage. On receiving the location information of the restored data or of the checkpoint data, the compute node that issued the notification at the time of the failure or the like can retrieve the restored data or the checkpoint data from the storage and recover from the failure.
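
The decision in steps S603 to S605 can be sketched as a small function. This is a minimal illustration, not the patented implementation: the names `UpdateState`, `NodeRecord`, `LocationReply`, and `locate` are hypothetical, addresses are modeled as plain strings, and the "update in progress" case simply returns `None` so that the caller can ask again once the update has completed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UpdateState(Enum):
    UPDATED = "updated"          # latest checkpoint data already applied
    IN_PROGRESS = "in_progress"  # update started but not finished
    NOT_STARTED = "not_started"  # latest checkpoint data not yet applied


@dataclass
class NodeRecord:
    """What the device is assumed to track for one replaced compute node."""
    restored_addr: str
    checkpoint_addr: str
    state: UpdateState


@dataclass
class LocationReply:
    restored_addr: str
    checkpoint_addr: Optional[str] = None


def locate(record: NodeRecord) -> Optional[LocationReply]:
    """Return the location information to send to the substitute node."""
    if record.state is UpdateState.UPDATED:
        # S603/Y -> S604: restored data alone is sufficient.
        return LocationReply(record.restored_addr)
    if record.state is UpdateState.NOT_STARTED:
        # S603/N -> S605: the node must apply the checkpoint data itself.
        return LocationReply(record.restored_addr, record.checkpoint_addr)
    # Update in progress: reply only after the update completes.
    return None
```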

Next, the effects of the first embodiment will be described.

In the first embodiment, the data management device 100, which is independent of the compute nodes, creates the restored data in the global storage, or updates it, based on the checkpoint data. The first embodiment therefore has the effect that the configuration of the compute nodes does not become complicated.

Also, in the first embodiment, the data management device 100, which is independent of the compute nodes, updates the restored data at every checkpoint based on the checkpoint data. As a result, the latest restored data is very likely to be already saved when data must be restored, so the first embodiment has the effect of a small time overhead.

Next, a second embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 3 is a block diagram showing the configuration of the second embodiment. Referring to FIG. 3, in the information processing system 500 of the second embodiment, the data management device 100, one or more compute nodes 210 to 2n0 (n is an integer), and the global storage 300 are connected by a network 400.

Compute nodes 210 to 2n0 have the same configuration, so compute node 210 is described below. Compute node 210 includes a processor 211 and local storage 212. An operating system 213, an application program 214, and a local checkpoint management program 215 run on the processor 211.

The local storage 212 stores the operating system 213, the application program 214, and the local checkpoint management program 215, which are read out and executed by the processor 211. The local storage 212 also stores the data used by the application program 214.

The operating system 213, the application program 214, and the local checkpoint management program 215 may instead be implemented as hardware within the processor 211. In that case, for example, the local checkpoint management program 215 is a hardware local checkpoint management means.

The global storage 300 stores checkpoint data 301 and restored data 302.

Next, the operation of the second embodiment will be described.

The following describes the case where the application program 214 runs under the control of the operating system 213 of compute node 210 and checkpoint processing for the application program 214 is performed during that execution.

First, an overview of the operation is given.

The application program 214 suspends its execution each time a fixed period elapses during execution. The local checkpoint management program 215 then saves checkpoint data 301 to the global storage 300, after which the application program 214 resumes execution.

The data management device 100 applies the saved checkpoint data 301 to the restored data 302 to update the restored data 302.

In restart processing that handles a failure, for example, the local checkpoint management program 215 of the substitute compute node 210 queries the data management device 100 about the restored data 302. The data management device 100 then sends the location information of the restored data 302 to compute node 210. The local checkpoint management program 215 receives the location information of the restored data 302 and retrieves the restored data 302 from the global storage 300. The application program 214 resumes execution using that restored data 302.

The data management device 100 may also send both the location information of the restored data 302 and the location information of the checkpoint data 301. The location information of the checkpoint data 301 is needed when checkpoint data 301 was saved at the preceding checkpoint but the overwrite that reflects it in the restored data 302 has not yet been performed.

In this case, the local checkpoint management program 215 retrieves the checkpoint data 301 based on its location information and reflects it in (overwrites it onto) the restored data 302. The application program 214 then resumes execution using that restored data 302.

Next, the operation of the second embodiment will be described in more detail with reference to the drawings.

First, the operation of saving checkpoint data 301 is described. FIG. 4 is an operation diagram showing the operation of saving checkpoint data 301 on compute node 210 of the second embodiment.

Referring to FIG. 4, the operating system 213 of compute node 210 detects that a fixed period has elapsed and instructs the running application program 214 to suspend execution. On receiving this instruction, the application program 214 suspends execution (step S621).

If the application program 214 is distributed across several of compute nodes 210 to 2n0 and executed in parallel, the operating system 213 performs synchronization (step S622). That is, the operating systems 213 of all of compute nodes 210 to 2n0 cooperate to suspend the distributed, parallel application program 214. This synchronization includes confirming that communication among compute nodes 210 to 2n0 has completed.

Next, on each of compute nodes 210 to 2n0 hosting the suspended application program 214, the local checkpoint management program 215 is started by the respective operating system 213 (step S623).

Each local checkpoint management program 215 then creates checkpoint data 301 from the data created and edited by the application program 214 and saves it to the global storage 300 (step S624).

To reduce the amount of data saved, a method can be applied in which the checkpoint data 301 is the difference from the data that includes the checkpoint data 301 saved at the previous checkpoint (that is, the equivalent of the restored data 302). Methods for taking the difference between the data at the previous checkpoint and the current data at each checkpoint include, for example, page-based methods and hash-based methods.
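
One way to picture a hash-based difference is the sketch below. It is an illustrative assumption rather than the method used in the patent: the data is split into fixed-size pages, each page is hashed with SHA-256, and only pages whose hash differs from the previous checkpoint (or that are new) are kept as checkpoint data. `PAGE_SIZE`, `page_hashes`, and `diff_pages` are hypothetical names, and the sketch does not handle data that shrinks between checkpoints.

```python
import hashlib
from typing import Dict, List

PAGE_SIZE = 4  # deliberately tiny so the example is easy to follow


def page_hashes(data: bytes, page_size: int = PAGE_SIZE) -> List[str]:
    """Hash every fixed-size page of the data."""
    return [
        hashlib.sha256(data[offset:offset + page_size]).hexdigest()
        for offset in range(0, len(data), page_size)
    ]


def diff_pages(prev_hashes: List[str], current: bytes,
               page_size: int = PAGE_SIZE) -> Dict[int, bytes]:
    """Return {page index: page bytes} for pages that changed or were added."""
    changed: Dict[int, bytes] = {}
    for index, offset in enumerate(range(0, len(current), page_size)):
        page = current[offset:offset + page_size]
        digest = hashlib.sha256(page).hexdigest()
        if index >= len(prev_hashes) or prev_hashes[index] != digest:
            changed[index] = page
    return changed


if __name__ == "__main__":
    previous = page_hashes(b"aaaabbbbcccc")
    print(diff_pages(previous, b"aaaaBBBBccccdd"))
```

Only the hashes from the previous checkpoint need to be retained on the compute node, so the previous data itself does not have to be kept locally for the comparison.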

Next, each local checkpoint management program 215 sends the data management device 100 a completion notification indicating that saving of the checkpoint data 301 has finished (step S625). Each local checkpoint management program 215 then performs synchronization to confirm that saving of the checkpoint data 301 has completed on all of compute nodes 210 to 2n0 hosting the application program 214 (step S626).

When this synchronization completes, each local checkpoint management program 215 notifies its operating system 213 that synchronization is complete (step S627). On receiving the notification, each operating system 213 instructs its suspended application program 214 to resume, and each application program 214 resumes execution on receiving this instruction (step S628).
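
The sequence of steps S621 to S628 can be mimicked in a single process, with one thread standing in for each compute node. This is only a sketch under simplifying assumptions: `threading.Barrier` stands in for the two synchronization steps, a dictionary stands in for the global storage 300, a list stands in for the completion notifications sent to the data management device 100, and `run_checkpoint` is a hypothetical name.

```python
import threading
from typing import Dict, List, Tuple


def run_checkpoint(node_ids: List[str], app_state: Dict[str, bytes]
                   ) -> Tuple[Dict[str, bytes], List[str], List[str]]:
    global_storage: Dict[str, bytes] = {}  # stand-in for global storage 300
    notifications: List[str] = []          # stand-in for completion notifications
    log: List[str] = []
    lock = threading.Lock()
    barrier = threading.Barrier(len(node_ids))

    def node(node_id: str) -> None:
        with lock:
            log.append(f"{node_id}: suspended")               # S621
        barrier.wait()                                        # S622: all suspended
        with lock:
            # S623-S624: checkpoint program saves the checkpoint data.
            global_storage[f"ckpt/{node_id}"] = app_state[node_id]
            notifications.append(node_id)                     # S625
        barrier.wait()                                        # S626: all saved
        with lock:
            log.append(f"{node_id}: resumed")                 # S627-S628

    threads = [threading.Thread(target=node, args=(n,)) for n in node_ids]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return global_storage, notifications, log
```

Because of the two barriers, no node resumes until every node has saved its checkpoint data, which is the property the synchronization in step S626 is meant to guarantee.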

Next, the operation of updating the restored data 302 by combining the saved checkpoint data 301 with the restored data 302 is described. FIG. 5 is an operation diagram showing the operation of the data management device 100 of the second embodiment.

As shown in step S625 of FIG. 4, each local checkpoint management program 215 sends the data management device 100 a completion notification indicating that saving of the checkpoint data 301 has finished.

Referring to FIG. 5, the data updating means 110 of the data management device 100 checks whether completion notifications have been received from all of compute nodes 210 to 2n0 on which the application program 214 is running (step S641).

If they have not all been received yet (step S641/N), the data updating means 110 repeats the check (step S641). Once completion notifications have been received from all of compute nodes 210 to 2n0 on which the application program 214 is running (step S641/Y), the data updating means 110 proceeds to step S642 and the subsequent steps. That is, in response to the completion notifications, the data updating means 110 updates the restored data 302 one after another.

The data updating means 110 checks whether there is any compute node among 210 to 2n0 from which a completion notification has been received and whose corresponding restored data 302 has not yet been updated (is not the latest) (step S642).

If there is a compute node among 210 to 2n0 whose corresponding restored data 302 has not been updated (step S642/Y), the data updating means 110 selects that compute node (step S643). The data updating means 110 then overwrites the restored data 302 of the selected compute node with the corresponding checkpoint data 301, bringing the restored data 302 up to date as of the latest checkpoint (step S644). The data updating means 110 then returns to step S642.

If there is no compute node among 210 to 2n0 whose corresponding restored data 302 has not been updated (step S642/N), the data updating means 110 ends the process.

In the above, the update of the restored data 302 from step S642 onward was performed only after confirming that completion notifications had been received from all of compute nodes 210 to 2n0 on which the application program 214 is running. Alternatively, the data updating means 110 may update the restored data 302 corresponding to a compute node each time it receives a completion notification from that node.
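
The loop of steps S641 to S644 might be sketched as follows. This is an assumption-laden illustration: restored data and checkpoint data are modeled as dictionaries mapping a page index to bytes, the "overwrite" is a dictionary update, and `update_restored_data` with all of its parameters is a hypothetical name. The function returns `False` when notifications are still missing, so a caller would invoke it again later rather than busy-wait inside it.

```python
from typing import Dict, List, Set

Pages = Dict[int, bytes]


def update_restored_data(job_nodes: List[str],
                         notified: Set[str],
                         checkpoints: Dict[str, Pages],
                         restored: Dict[str, Pages],
                         up_to_date: Set[str]) -> bool:
    """Apply saved checkpoint data to the restored data of every node."""
    # S641: proceed only when every node of the job has sent its notification.
    if not set(job_nodes) <= notified:
        return False
    while True:
        # S642: is there a notified node whose restored data is not the latest?
        pending = [n for n in job_nodes if n not in up_to_date]
        if not pending:
            return True                      # S642/N: nothing left to update
        node = pending[0]                    # S643: select one such node
        # S644: overwrite the restored data with the checkpoint data.
        restored.setdefault(node, {}).update(checkpoints[node])
        up_to_date.add(node)
```

For the alternative scheme described above, the first check would be dropped and the body of the loop would run for a single node as soon as its notification arrives.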

The above processing of the data management device 100 can run in parallel with the execution of the application program 214. Because updating the restored data 302 is a write operation that overwrites the restored data 302 with the saved checkpoint data 301, it can normally finish well within the execution time of the application program 214, which performs complex processing.

Next, the changes over time in the checkpoint data 301 and the restored data 302 described above are explained.

FIG. 6 is an explanatory diagram showing how the checkpoint data 301 and the restored data 302 change over time. Referring to FIG. 6, execution of the application program 214 starts, and the checkpoint data 301 at the first checkpoint is F0. In this case, F0, which is the entire data, is saved to the global storage 300 as checkpoint data 301. This checkpoint data 301 (F0) becomes the restored data 302 (F0) as is.

At the next checkpoint, the difference D1 between the data F0 at the previous checkpoint and the current data F1 is saved to the global storage 300 as checkpoint data 301. The data management device 100 overwrites F0, the restored data 302 at that point, with the difference D1, making the restored data 302 equivalent to F1.

From then on, the data management device 100 likewise uses the checkpoint data 301 (D2, ..., Dm) to update the restored data 302 (F2, ..., Fm), where m is an integer. In this way, the restored data 302 as of the latest checkpoint can be kept in the global storage 300.
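
The progression F0, F1, ..., Fm can be traced with a small worked example. The representation is an assumption made for illustration: data is a dictionary from page index to bytes, a difference Dk holds only the pages that changed, and `overwrite` and `replay` are hypothetical helper names.

```python
from typing import Dict, List

Pages = Dict[int, bytes]


def overwrite(restored: Pages, diff: Pages) -> Pages:
    """Return F(k) obtained by overwriting F(k-1) with the difference D(k)."""
    merged = dict(restored)
    merged.update(diff)
    return merged


def replay(full: Pages, diffs: List[Pages]) -> List[Pages]:
    """Return [F0, F1, ..., Fm] given F0 and the differences [D1, ..., Dm]."""
    history = [dict(full)]
    for diff in diffs:
        history.append(overwrite(history[-1], diff))
    return history


if __name__ == "__main__":
    f0 = {0: b"a", 1: b"b", 2: b"c"}   # first checkpoint: all data
    d1 = {1: b"B"}                      # second checkpoint: page 1 changed
    d2 = {2: b"C", 3: b"d"}             # third: page 2 changed, page 3 added
    for step, state in enumerate(replay(f0, [d1, d2])):
        print(f"F{step} =", state)
```

In the scheme described above only the latest state needs to be kept in the global storage; the full history is returned here solely to make each step visible.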

Next, the operation of restart processing, which resumes execution after a failure or the like, is described. If a compute node 2i0 (i is an integer) has become inoperable because of a failure or the like, a substitute compute node 2j0 (j is an integer) is assigned and execution of the application program 214 is resumed.

The case where the substitute compute node 2j0 is compute node 210 is described. That is, the application program 214 of the substitute compute node 210 takes over using the restored data 302 held in the global storage 300.

First, to obtain the data for resuming execution, the local checkpoint management program 215 of compute node 210 queries the data management device 100 about the restored data 302.

FIG. 7 is an operation diagram showing the operation of the data management device 100 on receiving this query. Referring to FIG. 7, the data position management means 120 of the data management device 100 receives the query from compute node 210 (step S651) and checks the update state of the restored data 302 of the failed compute node 2i0 (step S652).

If the restored data 302 has not been updated (step S653/Y), the data position management means 120 returns the location information of the restored data 302 of compute node 2i0 and the location information of the checkpoint data 301 to compute node 210 (step S654), and then ends the process. Here, "not updated" does not include the case where an update is in progress.

If the restored data 302 is being updated or has been updated (step S653/N), the data position management means 120 asks the data updating means 110 whether an update is in progress (step S655). If an update is in progress (step S655/Y), the data position management means 120 repeats the check (step S655). That is, it waits until the update by the data updating means 110 completes.

If no update is in progress (step S655/N), the data position management means 120 returns the location information of the restored data 302 of the failed compute node 2i0 to compute node 210 (step S656) and ends the process.
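
The waiting behavior of steps S653 to S656 can be sketched with a condition variable. Note the substitution: the text above describes repeating the check, whereas this sketch blocks on `threading.Condition` until the updater signals completion, which has the same observable result. The class `RestoredDataStatus`, its methods, and the string state names are hypothetical.

```python
import threading
from typing import Optional, Tuple


class RestoredDataStatus:
    """Tracks the update state of one node's restored data (hypothetical)."""

    def __init__(self, restored_addr: str, checkpoint_addr: str) -> None:
        self._cond = threading.Condition()
        self._state = "not_started"  # "not_started", "in_progress", "updated"
        self._restored_addr = restored_addr
        self._checkpoint_addr = checkpoint_addr

    def begin_update(self) -> None:
        """Called by the updater when it starts applying checkpoint data."""
        with self._cond:
            self._state = "in_progress"

    def finish_update(self) -> None:
        """Called by the updater when the restored data is up to date."""
        with self._cond:
            self._state = "updated"
            self._cond.notify_all()

    def query(self, timeout: Optional[float] = None
              ) -> Tuple[str, Optional[str]]:
        """Return (restored address, checkpoint address or None)."""
        with self._cond:
            if self._state == "not_started":                      # S653/Y
                return self._restored_addr, self._checkpoint_addr  # S654
            # S655: wait while an update is in progress.
            if not self._cond.wait_for(
                    lambda: self._state == "updated", timeout):
                raise TimeoutError("update did not complete in time")
            return self._restored_addr, None                      # S656
```

A query that arrives during an update returns only after `finish_update` is called, so the substitute node never reads restored data that is half written.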

As described above, depending on when the failure occurs, the restored data 302 is in one of three states. In the first state, the restored data 302 has already been updated with the latest saved checkpoint data 301.

In the second state, the update with the latest saved checkpoint data 301 has started but has not yet completed. In the third state, the update with the latest saved checkpoint data 301 has not yet started.

In the first or second state, the calculation node 210 can use the returned position information to retrieve the restored data 302 from the global storage 300 and take over the processing of the failed calculation node 2i0.

For example, the local checkpoint management program 215 of the calculation node 210 retrieves the restored data 302 into the local storage 212, and the application program 214 uses that restored data 302 to begin the takeover execution.

In the third state, the calculation node 210 must overwrite the restored data 302 with the checkpoint data 301. If the checkpoint data 301 is a difference from the previous checkpoint (for example, one obtained by a hash-based method), the overhead is small.

In this case, the local checkpoint management program 215 overwrites the restored data 302 retrieved into the local storage 212 with the checkpoint data 301. The application program 214 then uses the overwritten restored data 302 to begin the takeover execution.
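
The takeover-side handling of the three states can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the global storage is modeled as a Python dictionary, the restored data and the checkpoint data are modeled as mappings from a block index to bytes, and the function name restore_for_takeover is hypothetical.

```python
from typing import Dict, Optional

# Hypothetical model: data is a mapping from block index to block contents.
Blocks = Dict[int, bytes]


def restore_for_takeover(
    global_storage: Dict[str, Blocks],
    restored_loc: str,
    checkpoint_loc: Optional[str] = None,
) -> Blocks:
    # Copy the restored data into (simulated) local storage.
    local = dict(global_storage[restored_loc])
    if checkpoint_loc is not None:
        # Third state: the update never started, so overwrite the local copy
        # of the restored data with the checkpoint data.
        local.update(global_storage[checkpoint_loc])
    # First or second state: the restored data is already up to date.
    return local
```

Because only the local copy is overwritten, the restored data held in the simulated global storage is left unchanged in this sketch.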

Next, the effects of the second embodiment will be described.

Because the second embodiment is one example of the first embodiment, it has the same effects as the first embodiment. In addition, because the second embodiment uses a difference as the checkpoint data 301, it has the effect that updating the restored data 302 takes little time.
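
To make the benefit of differential checkpoint data concrete, the following sketch shows one possible hash-based way to compute and apply a difference. It is an assumption-laden illustration: the block size, the use of SHA-256, the fixed data length, and the function names block_hashes, make_diff, and apply_diff are all hypothetical and are not specified by the embodiment.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical, kept tiny so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def make_diff(
    previous_hashes: List[str], current: bytes, block_size: int = BLOCK_SIZE
) -> Dict[int, bytes]:
    """Return only the blocks whose hash differs from the previous checkpoint."""
    diff: Dict[int, bytes] = {}
    for index, digest in enumerate(block_hashes(current, block_size)):
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            start = index * block_size
            diff[index] = current[start:start + block_size]
    return diff


def apply_diff(
    restored: bytes, diff: Dict[int, bytes], block_size: int = BLOCK_SIZE
) -> bytes:
    """Overwrite the changed blocks of the restored data with the difference."""
    buffer = bytearray(restored)
    for index, block in diff.items():
        start = index * block_size
        buffer[start:start + len(block)] = block
    return bytes(buffer)
```

When only one block out of many changes between checkpoints, the difference holds a single block, so the amount of data written during the update is correspondingly small.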

Some or all of the above embodiments may also be described as in the following appendices, but are not limited to them.

[Appendix 1]
A data management device comprising:
a data update means that, based on checkpoint data of an application program stored in global storage by a calculation node, creates restored data in the global storage or updates the restored data in the global storage; and
a data position management means that sends, to a substitute calculation node that takes over execution of the application program: the position information of the restored data, if the update of the restored data of the calculation node being replaced is complete; the position information of the restored data after the update completes, if the update is in progress; and the position information of the restored data together with the position information of the corresponding checkpoint data, if the update has not started.

[Appendix 2]
The data management device according to Appendix 1, wherein the checkpoint data is the difference between the data of the application program at the previous checkpoint and the current data.

[Appendix 3]
The data management device according to Appendix 1 or 2, wherein, when the application program is distributed across a plurality of the calculation nodes and executed in parallel, the data update means overwrites the restored data with the checkpoint data of all of the plurality of calculation nodes, thereby updating the restored data to its state at the latest checkpoint.

[Appendix 4]
The data management device according to Appendix 3, wherein the data update means updates the restored data to its state at the latest checkpoint after receiving, from all of the calculation nodes on which the application program is distributed and executed in parallel, an end notification indicating that saving of the checkpoint data has finished.

[Appendix 5]
The data management device according to Appendix 3, wherein the data update means updates the restored data to its state at the latest checkpoint each time it receives, from one of the calculation nodes on which the application program is distributed and executed in parallel, an end notification indicating that saving of the checkpoint data has finished.

[Appendix 6]
An information processing system comprising: the data management device according to any one of Appendices 1 to 5; a calculation node; and global storage.

[Appendix 7]
The information processing system according to Appendix 6, wherein the calculation node executes a local checkpoint management program that, at each checkpoint, creates the checkpoint data, saves it in the global storage, and sends the end notification to the data management device.

[Appendix 8]
A data management method comprising:
creating restored data in global storage, or updating the restored data in the global storage, based on checkpoint data of an application program stored in the global storage by a calculation node; and
sending, to a substitute calculation node that takes over execution of the application program: the position information of the restored data, if the update of the restored data of the calculation node being replaced is complete; the position information of the restored data after the update completes, if the update is in progress; and the position information of the restored data together with the position information of the corresponding checkpoint data, if the update has not started.

[Appendix 9]
The data management method according to Appendix 8, wherein the checkpoint data is the difference between the data of the application program at the previous checkpoint and the current data.

[Appendix 10]
The data management method according to Appendix 8 or 9, wherein, when the application program is distributed across a plurality of the calculation nodes and executed in parallel, the restored data is overwritten with the checkpoint data of all of the plurality of calculation nodes and thereby updated to its state at the latest checkpoint.

[Appendix 11]
The data management method according to Appendix 10, wherein the restored data is updated to its state at the latest checkpoint after an end notification indicating that saving of the checkpoint data has finished is received from all of the calculation nodes on which the application program is distributed and executed in parallel.

[Appendix 12]
The data management method according to Appendix 10, wherein the restored data is updated to its state at the latest checkpoint each time an end notification indicating that saving of the checkpoint data has finished is received from one of the calculation nodes on which the application program is distributed and executed in parallel.

[Appendix 13]
A program that causes a computer to execute:
a process of creating restored data in global storage, or updating the restored data in the global storage, based on checkpoint data of an application program stored in the global storage by a calculation node; and
a process of sending, to a substitute calculation node that takes over execution of the application program: the position information of the restored data, if the update of the restored data of the calculation node being replaced is complete; the position information of the restored data after the update completes, if the update is in progress; and the position information of the restored data together with the position information of the corresponding checkpoint data, if the update has not started.

[Appendix 14]
The program according to Appendix 13, wherein the checkpoint data is the difference between the data of the application program at the previous checkpoint and the current data.

[Appendix 15]
The program according to Appendix 13 or 14, which causes the computer to execute a process of, when the application program is distributed across a plurality of the calculation nodes and executed in parallel, overwriting the restored data with the checkpoint data of all of the plurality of calculation nodes and thereby updating the restored data to its state at the latest checkpoint.

[Appendix 16]
The program according to Appendix 15, which causes the computer to execute a process of updating the restored data to its state at the latest checkpoint after receiving, from all of the calculation nodes on which the application program is distributed and executed in parallel, an end notification indicating that saving of the checkpoint data has finished.

[Appendix 17]
The program according to Appendix 15, which causes the computer to execute a process of updating the restored data to its state at the latest checkpoint each time an end notification indicating that saving of the checkpoint data has finished is received from one of the calculation nodes on which the application program is distributed and executed in parallel.

100 Data management device
110 Data update means
120 Data position management means
210 Calculation node
2i0 Calculation node
2j0 Calculation node
2n0 Calculation node
211 Processor
212 Local storage
213 Operating system
214 Application program
215 Local checkpoint management program
300 Global storage
400 Network
500 Information processing system

Claims

A data management device comprising:
a data update means that, based on checkpoint data of an application program stored in global storage by a calculation node, creates restored data in the global storage or updates the restored data in the global storage; and
a data position management means that sends, to a substitute calculation node that takes over execution of the application program: the position information of the restored data, if the update of the restored data of the calculation node being replaced is complete; the position information of the restored data after the update completes, if the update is in progress; and the position information of the restored data together with the position information of the corresponding checkpoint data, if the update has not started.

The data management device according to claim 1, wherein the checkpoint data is the difference between the data of the application program at the previous checkpoint and the current data.

The data management device according to claim 1 or 2, wherein, when the application program is distributed across a plurality of the calculation nodes and executed in parallel, the data update means overwrites the restored data with the checkpoint data of all of the plurality of calculation nodes, thereby updating the restored data to its state at the latest checkpoint.

The data management device according to claim 3, wherein the data update means updates the restored data to its state at the latest checkpoint after receiving, from all of the calculation nodes on which the application program is distributed and executed in parallel, an end notification indicating that saving of the checkpoint data has finished.

The data management device according to claim 3, wherein the data update means updates the restored data to its state at the latest checkpoint each time it receives, from one of the calculation nodes on which the application program is distributed and executed in parallel, an end notification indicating that saving of the checkpoint data has finished.

The data management device according to claim 4 or 5, wherein the calculation node executes a local checkpoint management program that, at each checkpoint, creates the checkpoint data, saves it in the global storage, and sends the end notification to the data management device.

An information processing system comprising: the data management device according to any one of claims 1 to 6; a calculation node; and global storage.

A data management method in which a data management device:
creates restored data in global storage, or updates the restored data in the global storage, based on checkpoint data of an application program stored in the global storage by a calculation node; and
sends, to a substitute calculation node that takes over execution of the application program: the position information of the restored data, if the update of the restored data of the calculation node being replaced is complete; the position information of the restored data after the update completes, if the update is in progress; and the position information of the restored data together with the position information of the corresponding checkpoint data, if the update has not started.

The data management method according to claim 8, wherein the checkpoint data is the difference between the data of the application program at the previous checkpoint and the current data.

A program that causes a computer operating as a data management device to execute:
a process of creating restored data in global storage, or updating the restored data in the global storage, based on checkpoint data of an application program stored in the global storage by a calculation node; and
a process of sending, to a substitute calculation node that takes over execution of the application program: the position information of the restored data, if the update of the restored data of the calculation node being replaced is complete; the position information of the restored data after the update completes, if the update is in progress; and the position information of the restored data together with the position information of the corresponding checkpoint data, if the update has not started.