JP2010257335A

JP2010257335A - Computer system

Info

Publication number: JP2010257335A
Application number: JP2009108290A
Authority: JP
Inventors: Soichi Furuya; 聡一古屋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2009-04-27
Filing date: 2009-04-27
Publication date: 2010-11-11

Abstract

PROBLEM TO BE SOLVED: To provide a rollback system that reduces the resources kept constantly active and does not rely on centralized control.

SOLUTION: A computer system includes a first computer, a second computer, and a third computer that are connected through a network. The first computer, the second computer, and the third computer process jobs by transmitting and receiving them in sequence. The first computer manages a dependency indicating the relation between the processing of the second computer and the processing of the other computers, and determines whether the second computer is operating normally. When it determines that the second computer is not operating normally, the first computer obtains from the third computer the identifier of the latest job processed by the second computer and performs a rollback based on the obtained job identifier.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a computer system, and more particularly to a computer system in which a plurality of hosts carry out processing in cooperation by transmitting and receiving data.

Systems in which hosts cooperate by exchanging data provide a wide variety of functions, including databases and network routing. However, when one host in such a system goes down, the other hosts that were cooperating with it cannot complete their traffic, and traffic saturation spreads from host to host. As a result, the failure of a single host can cause a large-scale failure that brings down the entire system. Methods for avoiding large-scale failures that originate from such a local failure have therefore long been needed and studied.

The representative methods studied so far fall broadly into two types.

The first method is hot standby: a static backup of the data held by a host in the system is periodically taken on an alternative host, and when the backed-up host is detected to be down, the alternative host continues the operation. Techniques proposed for implementing this first method effectively include a method for specifying the rollback point and a method for specifying the alternative host (see, for example, Patent Document 1).

The second method manages jobs by centralized control (see, for example, Patent Document 2). Each host successively reports (by messaging) its own processing status to a centralized control host such as a resource manager, so that the centralized control host manages the job status in one place. When a failure occurs in a host, the centralized control host determines the rollback point based on the status of the jobs processed in the failed host.

Patent Document 1: JP 2001-43105 A
Patent Document 2: JP-A-7-262073

With the first method described above, each host must multiplex its resources and data areas and keep the hardware corresponding to the multiplexed data areas running at all times. A system that uses hot standby therefore has a very high running cost.

With the second method, on the other hand, the system does not need to keep the resources for periodic backups running at all times, so its running cost is lower than that of the first method. However, the availability of the resource manager directly affects the availability of the system, so the second method does not fundamentally improve the availability of the system as a whole.

Accordingly, an object of the present invention is to provide a rollback system that reduces the resources kept constantly in operation and does not use centralized control.

According to a representative aspect of the present invention, there is provided a computer system comprising a first computer, a second computer, and a third computer that are connected by a network. The first computer, the second computer, and the third computer process jobs by transmitting and receiving the jobs in sequence. The first computer manages a dependency relationship indicating the relation between the processing of the second computer and the processing of the other computers, and determines whether the second computer is operating normally. When it determines that the second computer is not operating normally, the first computer acquires from the third computer the identifier of the latest job processed in the second computer and performs a rollback based on the acquired job identifier.

According to an embodiment of the present invention, a rollback system with high availability can be provided.

FIG. 1 is an explanatory diagram showing an example of processing according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the overall configuration of the system according to the first embodiment of the present invention.
FIG. 3 is a sequence diagram showing the processing of the first embodiment of the present invention.
FIG. 4 is a flowchart showing the dependency information distribution processing of the first embodiment of the present invention.
FIG. 5 is a flowchart showing the rollback processing of the first embodiment of the present invention.
FIG. 6 is an explanatory diagram showing the relationship between the hosts of the first embodiment of the present invention.
FIG. 7 is an explanatory diagram showing the relationship between the hosts of the second embodiment of the present invention.
FIG. 8 is an explanatory diagram showing the configuration of the entire system of the second embodiment of the present invention.
FIG. 9 is an explanatory diagram showing the configuration of the entire system of the third embodiment of the present invention.
FIG. 10 is a sequence diagram showing the processing of the third embodiment of the present invention.
FIG. 11 is an explanatory diagram showing the configuration of the entire system of the fourth embodiment of the present invention.
FIG. 12 is a sequence diagram showing the processing of the fourth embodiment of the present invention.
FIG. 13 is an explanatory diagram showing combinations of the processing flows of the embodiments of the present invention.

FIG. 1 is an explanatory diagram showing a specific example of processing according to an embodiment of the present invention.

The processing shown in FIG. 1 belongs to a material-ordering operation: it obtains the approvals required to order materials and transmits the data used to place the order with a dealer.

Host A 1100 shown in FIG. 1 executes process A 1114, host B 1200 executes process B 1205, host C 1300 executes process C 1305, and host D 1600 executes process D 1601. Process A 1114 receives purchase information from outside and outputs an approval request sheet for the department manager to process B 1205. Process B 1205 receives the approval request sheet for the department manager from process A 1114 and outputs an approval request sheet for the center manager to process C 1305. Process C 1305 receives the approval request sheet for the center manager from process B 1205 and outputs an order sheet for the dealer to process D 1601. Process D 1601 receives the order sheet for the dealer from process C 1305 and processes it.

Process A 1114, process B 1205, process C 1305, and process D 1601 are executed one after another in series. The first, third, and fourth embodiments described later cover the processing up to process C 1305, and the second embodiment covers the processing up to process D 1601.
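The serial chain of processes described above can be sketched roughly as follows. This is only an illustration: the function names, the `run_chain` helper, and the string formats standing in for the sheets are assumptions and do not appear in the specification.

```python
from typing import Callable, List

# Hypothetical stand-ins for processes A to D. The string formats are
# illustrative only; the specification does not define the sheet contents.
def process_a(purchase_info: str) -> str:
    return f"manager-approval-request({purchase_info})"

def process_b(sheet: str) -> str:
    return f"center-manager-approval-request({sheet})"

def process_c(sheet: str) -> str:
    return f"dealer-order({sheet})"

def process_d(sheet: str) -> str:
    return f"processed({sheet})"

def run_chain(purchase_info: str, stages: List[Callable[[str], str]]) -> str:
    """Feed the output of each stage into the next one, in series."""
    data = purchase_info
    for stage in stages:
        data = stage(data)
    return data

# First, third, and fourth embodiments: processing up to process C.
up_to_c = [process_a, process_b, process_c]
# Second embodiment: processing up to process D.
up_to_d = up_to_c + [process_d]
```

Each stage here runs in the same Python process purely for brevity; in the described system each stage runs on its own host and the data travels over the network.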

FIG. 2 is a block diagram showing the overall configuration of the system according to the first embodiment of the present invention.

In the system shown in FIG. 2, host A 1100, host B 1200, host B' 1400, and host C 1300 are connected by a network 1500. Each of these hosts is a computer including a processor, a memory, an input device, an output device, and a network interface.

Host A 1100 includes a storage device 1101 and a ring buffer 1103 in its memory. The storage device 1101 contains middleware A 1102, rollback A 1112, buffer push A 1113, process A 1114, a scheduled execution daemon 1115, and a dependency table 1116.

Middleware A 1102, rollback A 1112, buffer push A 1113, process A 1114, and the scheduled execution daemon 1115 are programs that are referenced and executed by the processor. The dependency table 1116 is a set of data; it contains data, transmitted from host B 1200 described later, that indicates the dependency relationships between hosts.

Middleware A 1102 includes dependency table update 1110, diagnosis 1111, and determination 1119, each of which is a program.

The ring buffer 1103 includes an area 1117 that stores the head pointer and an area 1118 that stores data.

Host B 1200 includes a storage device 1201 in its memory. The storage device 1201 contains middleware B 1202, process B 1205, and a process B dependency table 1206. Middleware B 1202 includes dependency information distribution 1203 and diagnosis reception 1204, both of which are programs. Process B 1205 is also a program. The process B dependency table 1206 is a set of data indicating the hosts that input data to process B 1205 (the dependency sources of process B 1205) and the hosts that receive data from process B 1205 (the dependency destinations of process B 1205).

Host C 1300 includes a storage device 1301 in its memory. The storage device 1301 contains middleware C 1302, process C 1305, and a latest job list 1306. Middleware C 1302 and process C 1305 are programs. Middleware C 1302 includes list update 1303 and latest job search 1304.

Host B' 1400 substitutes for host B 1200 when a failure occurs in host B 1200. Host B' 1400 holds the same programs as host B 1200, but the dynamic data of host B 1200 is not synchronized to host B' 1400. Host B' 1400 is not running while host B 1200 is operating normally; it starts when another host requests it to start.

FIG. 3 is a sequence diagram showing the processing of the first embodiment of the present invention.

FIG. 3 shows the processing executed by the computer system shown in FIG. 2.

Through the processing shown in FIG. 3, when a failure occurs in host B 1200, the computer system of this embodiment causes host B' 1400 to execute the processing that was being executed on host B 1200.

After host B 1200 starts (step 2001), it uses dependency information distribution 1203 to transmit the dependency of process B 1205 on host B 1200 to host A 1100, which is the dependency source of process B 1205. Specifically, host B 1200 transmits to host A 1100 the process B dependency table 1206, which contains the dependency [B->C] indicating that the output of host B 1200 is input to host C 1300.

When host A 1100 receives the process B dependency table 1206, it uses dependency table update 1110 to enter into the dependency table 1116 the received dependency [B->C] for process B 1205 together with a type ID uniquely assigned to that dependency. The type ID is an arbitrary alphanumeric string. The dependency table 1116 shown in FIG. 2 is the table after dependency table update 1110 has run: its type column shows "T1" and its data column shows the dependency [B->C].

The details of dependency information distribution 1203 are described later with reference to FIG. 4.

After host A 1100 updates the dependency table 1116 by dependency table update 1110, host A 1100 executes process A 1114. Following process A 1114, host B 1200 executes process B 1205. Following process B 1205, host C 1300 executes process C 1305.

Process A 1114, process B 1205, and process C 1305 cooperate by communicating data between hosts. In the embodiments of the present invention, the unit of a series of data communicated between hosts is called a job. Process A 1114 assigns a unique identifier to each series of jobs running from process A 1114 to process C 1305. The identifier assigned by process A 1114 is called a job ID.

The job ID includes information indicating on which hosts the data has been processed. Host A 1100, host B 1200, and host C 1300 therefore always have information indicating on which hosts the data was processed in the past.

After process A 1114 finishes and its output has been transmitted to process B 1205, host A 1100 creates rollback data for restoring process A 1114 and stores it in the ring buffer 1103 by buffer push A 1113. The rollback data is identified by the job ID. Buffer push A 1113 writes the rollback data at the address indicated by the head pointer 1117 and then updates the head pointer 1117 to the address that holds the oldest job ID.

When job IDs are entered into the ring buffer 1103 in ascending address order, the oldest job ID in the ring buffer 1103 is the one at the address one greater than the address indicated by the head pointer. When job IDs are entered in descending address order, the oldest job ID is the one at the address one less than the address indicated by the head pointer. In this embodiment, job IDs are entered into the ring buffer 1103 in ascending address order.
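A minimal sketch of the ring buffer behaviour described above, assuming ascending address order. The class and method names are hypothetical, and the handling of a buffer that has not yet wrapped around is an assumption, since the specification does not discuss that case.

```python
from typing import Any, List, Optional, Tuple

class RollbackRingBuffer:
    """Illustrative stand-in for ring buffer 1103 (names are assumptions)."""

    def __init__(self, capacity: int) -> None:
        # Data area: each slot holds (job_id, rollback_data) or None.
        self.slots: List[Optional[Tuple[str, Any]]] = [None] * capacity
        # Head pointer area: the address that the next push writes to.
        self.head = 0

    def push(self, job_id: str, rollback_data: Any) -> None:
        # Write at the address indicated by the head pointer ...
        self.slots[self.head] = (job_id, rollback_data)
        # ... then advance the head pointer by one address (ascending order),
        # which is where the oldest entry sits once the buffer is full.
        self.head = (self.head + 1) % len(self.slots)

    def oldest_job_id(self) -> Optional[str]:
        # Assumption: before the buffer wraps, the oldest entry is at address 0.
        entry = self.slots[self.head] or self.slots[0]
        return entry[0] if entry else None

    def latest_job_id(self) -> Optional[str]:
        # The latest entry is one address below the head pointer.
        entry = self.slots[(self.head - 1) % len(self.slots)]
        return entry[0] if entry else None
```

With a capacity of 3, pushing four jobs overwrites the first one, so the oldest surviving job is the second and the latest is the fourth.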

After process B 1205 on host B 1200 finishes, host C 1300 uses list update 1303 to record "A", indicating host A 1100, in the host column of the latest job list 1306 and, in the job ID column, the job ID indicating that the job was processed on host A 1100. So that the latest job list 1306 always shows the latest job ID, list update 1303 overwrites the previous job ID in the latest job list 1306 with the latest job ID each time process B 1205 on host B 1200 finishes.

Host A 1100 periodically diagnoses whether host B 1200 is operating normally (diagnosis 1111). Host A 1100 runs diagnosis 1111 periodically by means of the scheduled execution daemon 1115. Diagnosis 1111 includes checking the operating status of the network interface with the ping command and checking whether processing can be executed normally. After receiving diagnosis 1111 from host A 1100, host B 1200 returns its operating status to host A 1100 by diagnosis reception 1204.

On receiving the operating status from host B 1200, host A 1100 uses determination 1119 to decide whether host B 1200 is operating normally, based on the execution status of the processes included in the operating status, the time taken to return the operating status, and the content of the operating status.
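One possible shape for determination 1119 is sketched below. The specification names the inputs (process execution status, reply time, reply content) but not the decision rule, so the `StatusReply` fields, the 5-second timeout, and the treatment of a missing reply as abnormal are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StatusReply:
    """Hypothetical operating-status reply returned by host B."""
    processing_ok: bool      # whether host B reports it can process normally
    elapsed_seconds: float   # time taken for the reply to arrive

def is_operating_normally(reply: Optional[StatusReply],
                          timeout_seconds: float = 5.0) -> bool:
    # Assumption: no reply at all is treated as abnormal.
    if reply is None:
        return False
    # Assumption: a reply slower than the timeout is treated as abnormal.
    if reply.elapsed_seconds > timeout_seconds:
        return False
    # Otherwise rely on the execution status reported in the reply.
    return reply.processing_ok
```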

If determination 1119 finds that host B 1200 is operating normally, no rollback is needed, so host A 1100 repeats diagnosis 1111 in accordance with the scheduled execution daemon 1115.

If determination 1119 finds that host B 1200 is abnormal, host A 1100, in order to roll back, requests host C 1300 to transmit the latest job ID and starts host B' 1400. Alternatively, host C 1300 may start host B' 1400 after it receives the request to transmit the latest job ID to host A 1100.

After host A 1100 requests the latest job ID, host C 1300 executes latest job search 1304. By latest job search 1304, host C 1300 searches for the ID of the latest job among the jobs executed by host A 1100 and returns the resulting latest job ID to host A 1100. Specifically, host C 1300 returns to host A 1100 the job ID whose host column in the latest job list 1306 shows "A" (hereinafter, the latest job ID).
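The latest job list on host C, with its update and search operations, can be sketched as a small keyed table. The class and method names below are hypothetical; only the host column, the job ID column, and the overwrite-on-update behaviour come from the description.

```python
from typing import Dict, Optional

class LatestJobList:
    """Illustrative stand-in for latest job list 1306 (names are assumptions)."""

    def __init__(self) -> None:
        # Maps the host column (e.g. "A") to the job ID column.
        self._latest: Dict[str, str] = {}

    def update(self, host: str, job_id: str) -> None:
        # List update: overwrite the previous job ID with the latest one.
        self._latest[host] = job_id

    def search(self, host: str) -> Optional[str]:
        # Latest job search: return the job ID recorded for the given host.
        return self._latest.get(host)
```

Keeping a single entry per host means the list stays small no matter how many jobs have passed through.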

When host A 1100 receives the latest job ID, it executes rollback A 1112. The details of rollback A 1112 are described later with reference to FIG. 5.

Host B' 1400 starts in accordance with the start instruction transmitted from host A 1100 (step 2002). Host B' 1400 then transmits to host A 1100 the dependency information for process B 1205 executed on host B' 1400 (dependency information distribution 2003). Host A 1100 may switch the dependency destination of process A 1114 from process B 1205 on host B 1200 to process B 1205 on host B' 1400 at any time between determination 1119 finding host B 1200 abnormal and the execution of process A 1114.

Host C 1300 may be notified by host A 1100, at determination 1119, that host B' 1400 has been started in place of host B 1200, and then switch the dependency source of process C 1305 from process B 1205 on host B 1200 to process B 1205 on host B' 1400. Alternatively, host C 1300 may be notified when process B 1205 on host B' 1400 finishes that host B' 1400 has been started in place of host B 1200, and switch the dependency source of process C 1305 at that point.

FIG. 4 is a flowchart showing dependency information distribution 1203 according to the first embodiment of the present invention.

Host B 1200 refers to the process B dependency table 1206 of process B 1205 (step 11001). Based on the result of this reference, host B 1200 transmits information indicating the dependency to the dependency source of process B 1205 (step 11002). Specifically, host B 1200 transmits an identifier indicating host C 1300, to which the result of process B 1205 is output, to host A 1100, which inputs data to process B 1205. In step 11002 of the first embodiment, host B 1200 transmits to host A 1100 the dependency [B->C], which means "the result of process B 1205 on host B 1200 is passed to host C 1300".
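The two sides of this exchange can be sketched as follows: the central host derives the messages to send from its own dependency table, and the upstream host records each received dependency under a type ID. The function and class names are hypothetical, and generating type IDs as "T1", "T2", and so on is an assumption; the specification only says the type ID is an arbitrary alphanumeric string.

```python
from typing import Dict, List, Tuple

Dependency = Tuple[str, str]  # (source host, destination host), e.g. ("B", "C")

def dependencies_to_send(host: str,
                         sources: List[str],
                         destinations: List[str]) -> Dict[str, List[Dependency]]:
    """Steps 11001 and 11002: read the host's own table, then address the
    [host->destination] dependencies to every dependency source."""
    deps = [(host, dest) for dest in destinations]
    return {src: list(deps) for src in sources}

class DependencyTable:
    """Illustrative stand-in for dependency table 1116 on the upstream host."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dependency] = {}  # type column -> data column

    def update(self, dep: Dependency) -> str:
        # Assumption: type IDs are generated sequentially as T1, T2, ...
        type_id = f"T{len(self.rows) + 1}"
        self.rows[type_id] = dep
        return type_id
```

For the first embodiment, host B has source A and destination C, so the only message is [B->C] addressed to host A.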

FIG. 5 is a flowchart showing rollback A 1112 according to the first embodiment of the present invention.

After receiving the latest job ID from host C 1300 (step 13001), host A 1100 searches the job ID column of the ring buffer 1103 for an entry that matches the received latest job ID and determines whether such an entry exists (step 13002).

The ring buffer 1103 discards old job IDs when more jobs are entered than a predetermined amount, so the ring buffer 1103 does not necessarily contain the received latest job ID. If in step 13002 the received latest job ID is not in the ring buffer 1103, host A 1100 selects the oldest job ID, that is, the job ID at the address indicated by the head pointer 1117 (step 13003).

If the received latest job ID is in the ring buffer 1103, host A 1100 sets the job ID to be executed next after the received latest job ID as the rollback point (step 13004).

After step 13003 or step 13004, host A 1100 determines whether host B' 1400 is able to process (step 13005). If host B' 1400 cannot process, host A 1100 waits for a predetermined time (step 13006) and repeats this until host B' 1400 can process.

If step 13005 determines that host B' 1400 can process, host A 1100 re-executes the jobs from the rollback point selected in step 13004 up to the latest job on host A 1100, that is, the job at the address one less than the address indicated by the head pointer 1117 in the ring buffer 1103 (rollback: step 13007), and ends rollback A 1112.
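The flow of FIG. 5 can be sketched with two small helpers. Both function names are hypothetical, and representing the ring buffer contents as a list ordered from oldest to newest is a simplification made for illustration.

```python
import time
from typing import Callable, List, Optional

def select_rollback_jobs(buffered_job_ids: List[str],
                         latest_job_id: Optional[str]) -> List[str]:
    """Return the job IDs to re-execute, oldest first.

    buffered_job_ids is assumed to be ordered from oldest to newest.
    """
    if latest_job_id in buffered_job_ids:                   # step 13002
        # Step 13004: start from the job after the latest one seen downstream.
        start = buffered_job_ids.index(latest_job_id) + 1
    else:
        # Step 13003: the ID has already been overwritten, start from the oldest.
        start = 0
    # Step 13007: everything up to the latest job held on the upstream host.
    return buffered_job_ids[start:]

def wait_until_ready(is_ready: Callable[[], bool],
                     interval_seconds: float = 1.0) -> None:
    """Steps 13005 and 13006: poll the standby host until it can process."""
    while not is_ready():
        time.sleep(interval_seconds)
```

If the downstream host has already seen the newest buffered job, the returned list is empty and nothing needs to be re-executed.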

The first embodiment showed host A 1100, host B 1200, and host C 1300 cooperating in that order and, when host B 1200 fails, the cooperation among host A 1100, host B 1200, host C 1300, and host B' 1400 to roll back. In the first embodiment, host B' 1400 is a cool standby for host B 1200.

As described above, the first embodiment makes it possible to roll back while keeping the backup resource in cool standby and without using a resource manager that performs centralized control.

FIG. 6 is an explanatory diagram showing the processing relationships of the first embodiment of the present invention.

A 4001, B 4002, C 4003, and B' 4004 shown in FIG. 6 correspond to host A 1100, host B 1200, host C 1300, and host B' 1400 in the first embodiment and play the upstream, central, downstream, and substitute roles in the processing, respectively. When a failure occurs in B 4002, B' 4004 becomes the alternative host for B 4002.

When four or more hosts cooperate, the number of cool standby hosts need not be increased; a single cool standby host may serve as the substitute for a plurality of hosts. When three or more hosts cooperate, the first embodiment may be applied multiple times. The second embodiment is a computer system in which the first embodiment is applied multiple times.

FIG. 7 is an explanatory diagram showing the processing relationships of the second embodiment of the present invention.

In the second embodiment, A 5001, B 5002, C 5003, and D 5004 shown in FIG. 7 combine the relationship of A 4001, B 4002, and C 4003 shown in FIG. 6 twice. In the group A 5001, B 5002, and C 5003, and in the group B 5002, C 5003, and D 5004, the members play the upstream, central, and downstream roles, respectively. In the relationship among A 5001, B 5002, and C 5003, the substitute for B 5002 is B' 5005. In the relationship among B 5002, C 5003, and D 5004, the substitute for C 5003 is also B' 5005.

When a failure occurs in either B 5002 or C 5003, B' 5005 can be shared as the substitute resource for B 5002 or C 5003. B' 5005 serves as the cool standby for B 5002 or C 5003 by holding, separately for each, the static software and settings provided in B 5002 and C 5003.

FIG. 8 is an explanatory diagram showing the configuration of the entire system according to the second embodiment of the present invention.

Host A 1100, host B 1200, host C 1300, host D 1600, and host B' 1400 shown in FIG. 8 play the same roles as A5001, B5002, C5003, D5004, and B'5005 shown in FIG. 7. When a failure occurs in either host B 1200 or host C 1300, host B' 1400 becomes the substitute for the failed host.

Host A 1100 shown in FIG. 8 is the same as host A 1100 shown in FIG. 2. Host B 1200 and host C 1300 shown in FIG. 8 are host B 1200 and host C 1300 shown in FIG. 2 with a set of software and data added so that they can play the upstream and central roles, respectively, shown in FIG. 7.

Host B 1200 gains functions equivalent to those that host A 1100 shown in FIG. 2 has in order to play the upstream role shown in FIG. 6. That is, middleware A 1102, rollback B 3001, buffer push B 3002, scheduled execution daemon 1115, dependency table 1116, and ring buffer 1103 are added to host B 1200 shown in FIG. 8.

Host C 1300 gains functions equivalent to those that host B 1200 shown in FIG. 2 has in order to play the central role shown in FIG. 6. That is, middleware B 1202 and process C dependency table 3306 are added to host C 1300 shown in FIG. 8.

Host D 1600 has functions equivalent to those that host C 1300 shown in FIG. 2 has in order to play the downstream role shown in FIG. 6. Host D 1600 is a computer including a processor, a memory, an input device, an output device, and a network interface. Host D 1600 includes storage device 1601 in its memory, and storage device 1601 holds middleware C 1302, process D 1601, and latest job list 1602.

The computer system of the second embodiment executes process A 1114, process B 1205, process C 1305, and process D 1601 shown in FIG. 1 on host A 1100, host B 1200, host C 1300, and host D 1600 shown in FIG. 8. When a failure occurs in host B 1200 shown in FIG. 8, the system of the second embodiment rolls back using the same processing flow as that shown in FIG. 3. When a failure occurs in host C 1300, the system rolls back using the processing flow shown in FIG. 3 with the roles shifted by one host: host B 1200 performs the processing of host A 1100, host C 1300 performs the processing of host B 1200, and host D 1600 performs the processing of host C 1300.
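
The role shift described above can be pictured as a sliding window of three consecutive hosts over the processing chain. The following is only a minimal sketch under that reading; the function names `role_triples` and `recovery_roles` and the dictionary keys are hypothetical and do not appear in the specification.

```python
from typing import Dict, List, Optional, Tuple


def role_triples(chain: List[str]) -> List[Tuple[str, str, str]]:
    """Return every (upstream, central, downstream) triple of consecutive hosts."""
    return [(chain[i], chain[i + 1], chain[i + 2]) for i in range(len(chain) - 2)]


def recovery_roles(chain: List[str], failed: str) -> Optional[Dict[str, str]]:
    """Find the triple whose central host failed and name each host's part.

    Hypothetical helper: the upstream host rolls back, the central host is
    replaced by the cool standby, and the downstream host reports the latest job.
    """
    for upstream, central, downstream in role_triples(chain):
        if central == failed:
            return {
                "rolls_back": upstream,
                "replaced": central,
                "reports_latest_job": downstream,
            }
    return None  # the failed host is not the central member of any triple


chain = ["A", "B", "C", "D"]
print(role_triples(chain))
print(recovery_roles(chain, "C"))
```

With the chain A, B, C, D, a failure of C selects the triple (B, C, D), which matches the substitution of roles described for FIG. 3.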

According to the second embodiment, rollback based on the first embodiment is possible even when processing is performed by a larger number of hosts.

In the first embodiment, host A 1100, which plays the upstream role, tracks the dependencies and performs diagnosis 1111 of host B 1200. In the third embodiment, the downstream host C 1300 tracks the dependencies and diagnoses host B 1200.

FIG. 9 is an explanatory diagram showing the configuration of the entire system according to the third embodiment of the present invention.

Host A 1100 shown in FIG. 9 includes middleware AA 6001 instead of middleware A 1102 shown in FIG. 2. Middleware AA 6001 includes rollback daemon 6002.

Host B 1200 shown in FIG. 9 includes middleware BB 6003 instead of middleware B 1202 shown in FIG. 2. Middleware BB 6003 includes dependency information distribution 6004.

Host C 1300 shown in FIG. 9 includes middleware CC 6005 instead of middleware C 1302 shown in FIG. 2. Scheduled execution daemon 1115, which is provided in host A 1100 in FIG. 2, is provided in host C 1300 in FIG. 9. Dependency table 1116, which is provided in host A 1100 in FIG. 2, is the same as dependency table 6007 provided in host C 1300 in FIG. 9. Middleware CC 6005 includes dependency table update 6006, diagnosis 1111, determination 1119, list update 1303, and latest job search 1304.

FIG. 10 is a sequence diagram showing the processing of the third embodiment of the present invention.

In the third embodiment, host B 1200 performs dependency information distribution 6004 toward host C 1300. Based on process B dependency table 1206 of process B 1205, host B 1200 sends the dependency to host C 1300, which is the dependency destination of process B 1205. Specifically, host B 1200 sends the identifier of host A 1100, the dependency source of process B 1205, to host C 1300, the dependency destination of process B 1205. In other words, host B 1200 sends host C 1300 the dependency [A->B], which means "the result of host A 1100 is passed to host B 1200".

After receiving the dependency through dependency information distribution 6004, host C 1300 updates dependency table 6007 with the received dependency [A->B] (dependency table update 6006).
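
As an illustration of dependency table update 6006, the sketch below stores each received dependency such as [A->B] keyed by the monitored host. The class name `DependencyTable` and its methods are assumptions made for this example; the specification does not define the internal layout of dependency table 6007.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class DependencyTable:
    """Hypothetical model of a dependency table held by the downstream host."""

    def __init__(self) -> None:
        # monitored host -> hosts whose results are passed to it
        self._sources: DefaultDict[str, Set[str]] = defaultdict(set)

    def update(self, source: str, target: str) -> None:
        """Record a dependency [source->target]."""
        self._sources[target].add(source)

    def sources_of(self, target: str) -> List[str]:
        """Return the dependency sources of a host, sorted for stable output."""
        return sorted(self._sources.get(target, ()))


table = DependencyTable()
table.update("A", "B")  # host C receives [A->B] from host B
print(table.sources_of("B"))
```

When host B later fails, host C can look up the sources of B in such a table to learn that the rollback point must be sent to host A.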

Host A 1100, host B 1200, and host C 1300 execute process A 1114, process B 1205, and process C 1305, respectively.

Host C 1300 periodically diagnoses whether host B 1200 is operating normally. Host C 1300 executes diagnosis 1111 periodically by means of scheduled execution daemon 1115. Diagnosis 1111 is the same as diagnosis 1111 in the first embodiment. After receiving diagnosis 1111 from host C 1300, host B 1200 returns its operating status to host C 1300 through diagnosis reception 1204.

On receiving the operating status from host B 1200, host C 1300 determines, through determination 1119, whether host B 1200 is operating normally.

If determination 1119 finds that host B 1200 is operating normally, no rollback is needed, so host C 1300 repeats diagnosis 1111 according to scheduled execution daemon 1115.

If determination 1119 finds that host B 1200 is abnormal, host C 1300 starts host B' 1400 (step 2001). Host C 1300 then searches latest job list 1306 for the latest job ID (latest job search 1304) and sends the latest job ID to host A 1100.

The communication performed in the first embodiment, in which host A 1100 requests host C 1300 to send the latest job ID, is unnecessary in the third embodiment. However, in the third embodiment host A 1100 cannot use a diagnosis result for host B 1200 as the trigger for rollback, so host A 1100 constantly monitors its own network port with rollback daemon 6002. When host C 1300 sends the rollback point, that is, the latest job ID found by latest job search 1304, host A 1100 executes rollback 1112 based on the received rollback point.
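
One diagnosis cycle on the downstream host might be sketched as follows. This is not the implementation in the specification: the function `diagnose_once` and its callback parameters are hypothetical, and the sketch assumes that the newest job ID is the last element of the latest job list.

```python
from typing import Callable, List, Optional


def diagnose_once(
    is_operating_normally: Callable[[], bool],
    start_standby: Callable[[], None],
    latest_job_list: List[int],
    send_rollback_point: Callable[[int], None],
) -> Optional[int]:
    """Run one diagnosis cycle on the downstream host (host C in the text)."""
    if is_operating_normally():
        return None  # healthy: wait for the next scheduled diagnosis
    start_standby()  # corresponds to starting host B' (step 2001)
    if not latest_job_list:
        return None  # assumption: with no recorded job there is nothing to send
    rollback_point = latest_job_list[-1]  # assumption: newest job ID is last
    send_rollback_point(rollback_point)  # pushed to host A without a request
    return rollback_point


events: list = []
result = diagnose_once(
    lambda: False,
    lambda: events.append("start B'"),
    [3, 5, 7],
    lambda job_id: events.append(("rollback point", job_id)),
)
print(result, events)
```

The upstream host only needs a listener that waits for the rollback point, which is the part played by rollback daemon 6002 in the text.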

Host B' 1400 receives the start instruction from host C 1300 and starts up (step 2002). It then sends the dependency information for process B 1205 to host C 1300. Host C 1300 may change the dependency source of process C 1305 from process B 1205 on host B 1200 to process B 1205 on host B' 1400 at any time between determining in determination 1119 that host B 1200 is abnormal and executing process C 1305.

When host A 1100 receives the latest job ID from host C 1300 as a result of latest job search 1304, it may also be notified that host B' 1400 has been started in place of host B 1200. Host A 1100 may then change the dependency destination of process A 1114 to process B 1205 on host B' 1400.

In the third embodiment, host A 1100 never requests host C 1300 to send the latest job ID. The third embodiment therefore involves less communication from host A 1100 to host C 1300 than the first embodiment.

On the other hand, in the third embodiment host A 1100 keeps rollback daemon 6002 running at all times, so the load on host A 1100 is higher than in the first embodiment.

The fourth embodiment is a computer system in which the dependency source and the dependency destination of host B 1200 recognize the completion of process B 1205 on host B 1200, and in which the dependency source manages the jobs not yet processed by host B 1200 with a list buffer instead of a ring buffer.

FIG. 11 is an explanatory diagram showing the configuration of the entire system according to the fourth embodiment of the present invention.

Host A 1100 shown in FIG. 11 includes rollback A 8112, buffer push A 8113, buffer update 8114, and list buffer 8103 in place of rollback A 1112, buffer push A 1113, and ring buffer 1103 of host A 1100 shown in FIG. 2. Buffer push A 8113 adds an entry to list buffer 8103 and sets the flag of the added entry to "○". Buffer update 8114 sets to "×" the flag of the job ID that identifies a job processed by host B 1200. List buffer 8103 contains an address column, a flag column, a job ID column, and a pointer column.

Host B 1200 shown in FIG. 11 includes dependency information distribution 8203 instead of dependency information distribution 1203 shown in FIG. 2.

Host C 1300 shown in FIG. 11 includes dependency table update 5110, token back 8305, and dependency table 5116 in addition to the functions of host C 1300 shown in FIG. 2.

FIG. 12 is a sequence diagram showing the processing of the fourth embodiment of the present invention.

Host B 1200 sends the dependency information of process B 1205 to both host A 1100 and host C 1300 (dependency information distribution 8203). In dependency information distribution 8203, the dependency information sent to host A 1100 is the same as the dependency sent to host A 1100 by dependency information distribution 1203 of the first embodiment. The dependency information sent to host C 1300 is the same as the dependency sent to host C 1300 by dependency information distribution 6004 of the third embodiment.

Host A 1100 and host C 1300 hold dependency table 1116 and dependency table 5116, respectively. Host A 1100, host B 1200, and host C 1300 execute process A 1114, process B 1205, and process C 1305. At this time, process A 1114 sends the job ID to process B 1205. After process B 1205 finishes and before process C 1305 is executed, host C 1300 notifies host A 1100 that the job that came from host A 1100 has been processed by host B 1200 (token back 8305). When host A 1100 receives this notification from host C 1300, the job has been processed by host B 1200, so host A 1100 deletes the job from list buffer 8103 through buffer update 8114.

List buffer 8103 is a buffer with a list structure. Each entry contains a flag, a job ID, and a pointer to the next entry. List buffer 8103 is updated at two points in time.

The first is when process A 1114 finishes on host A 1100. When process A 1114 finishes and host A 1100 sends the resulting data to host B 1200, host A 1100 adds a new job ID to list buffer 8103 through buffer push A 8113. Buffer push A 8113 appends, at the tail of the pointer chain of list buffer 8103, the job ID of the job sent from host A 1100 to be processed on host B 1200, and sets the flag of the added job ID to "○".

The second is when process B 1205 finishes on host B 1200, the job processed by process B 1205 arrives at host C 1300, and host C 1300 sends the job ID of the arrived job to host A 1100. When host A 1100 is notified by host C 1300 that the job has been processed by process B 1205 on host B 1200, host A 1100 updates the flag of the job ID of the finished job to "×" through buffer update 8114.

When host A 1100 adds a job ID to list buffer 8103 through buffer push A 8113, it may place the job ID in an entry whose flag is "×".
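
The two update points of the list buffer, together with the optional reuse of "×" entries, can be sketched as below. The class `ListBuffer` and its method names are hypothetical. For brevity the sketch keeps entries in a Python list and omits the address and pointer columns, which is a simplification of the structure described in the text.

```python
PENDING = "○"  # job sent to host B, completion not yet confirmed
DONE = "×"     # job confirmed as processed by host B


class ListBuffer:
    """Simplified stand-in for a list buffer (address/pointer columns omitted)."""

    def __init__(self) -> None:
        self.entries = []  # each entry is [flag, job_id]

    def push(self, job_id: int) -> None:
        """Buffer push: record a job sent downstream, reusing a DONE slot if any."""
        for entry in self.entries:
            if entry[0] == DONE:
                entry[0], entry[1] = PENDING, job_id
                return
        self.entries.append([PENDING, job_id])

    def update(self, job_id: int) -> bool:
        """Buffer update: mark a job as processed; return False if it is unknown."""
        for entry in self.entries:
            if entry[0] == PENDING and entry[1] == job_id:
                entry[0] = DONE
                return True
        return False


buf = ListBuffer()
for job in (1, 2, 3):
    buf.push(job)
buf.update(2)  # the downstream host reported that job 2 was processed
buf.push(4)    # reuses the slot that held job 2
print(buf.entries)
```

Because each entry carries its own flag, completion reports may arrive in any order without affecting the other entries.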

After buffer update 8114 finishes, host A 1100 performs diagnosis 1111 on host B 1200 to determine whether host B 1200 is operating normally. Host B 1200 handles diagnosis 1111 from host A 1100 through diagnosis reception 1204 and replies whether it is operating normally. Host A 1100 determines, through determination 1119, whether host B 1200 is operating normally. If so, host A 1100 executes the next diagnosis 1111 according to scheduled execution daemon 1115.

If determination 1119 finds that host B 1200 is abnormal, host A 1100 executes rollback A 8112. Rollback A 8112 is the process of the flowchart shown in FIG. 5 with steps 13001 to 13004 replaced by a step that selects from list buffer 8103 the job IDs whose flag is "○" and uses them as rollback points.
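
The selection step that replaces steps 13001 to 13004 amounts to filtering the list buffer by flag. A minimal sketch follows, assuming entries are given as (flag, job ID) pairs; the function name `select_rollback_points` is hypothetical.

```python
from typing import Iterable, List, Tuple


def select_rollback_points(entries: Iterable[Tuple[str, int]]) -> List[int]:
    """Return the job IDs still flagged "○", i.e. not confirmed as processed."""
    return [job_id for flag, job_id in entries if flag == "○"]


# Jobs 5 and 8 were sent to host B but never confirmed, so they are re-sent.
entries = [("○", 5), ("×", 6), ("○", 8)]
print(select_rollback_points(entries))
```

Only unconfirmed jobs are selected, so a job already reported as processed is not submitted a second time.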

In addition, if determination 1119 finds that host B 1200 is abnormal, host A 1100 starts host B' 1400 (start 2002).

After being started by host A 1100 (start 2002), host B' 1400 sends the dependency of process B 1205 on host B' 1400 to host A 1100 through dependency information distribution 2003.

The timing at which host A 1100 and host C 1300 change the dependency destination of process A 1114 and the dependency source of process C 1305, in response to host B' 1400 being started in place of host B 1200, is the same as in the first embodiment.

According to the fourth embodiment, process A 1114 can recognize whether process B 1205 has finished. Unlike ring buffer 1103, list buffer 8103 can manage jobs individually. Therefore, in a system whose processing flow allows the order of jobs to change (out-of-order processing), the fourth embodiment can roll back without duplicating jobs.

The third and fourth embodiments can also be applied to a system with four or more hosts, in the same way as the second embodiment described above. In this case, sets of three hosts that process jobs consecutively are selected, and each selected set operates according to the third or fourth embodiment, which provides the effects of that embodiment.

As described above, in the embodiments of the present invention, process A 1114, process B 1205, process C 1305, and process D 1601 are each executed by physically separate hosts. However, the hosts need not be physically separate and may be a plurality of virtual machines created on a single computer.

FIG. 13 is an explanatory diagram showing patterns of the processing flow in the embodiments of the present invention.

Patterns 1 and 2 shown in FIG. 13 are processing-flow patterns for the first embodiment. Patterns 3 and 4 shown in FIG. 13 are processing-flow patterns for the third embodiment. The dependencies shown in FIG. 13 are expressed as relationships among host A, host B, host C, and host D.

In pattern 1, processing flows from host A to host B, and from host B to both host C and host D. The dependency destinations of host B are host C and host D. Host B in pattern 1 sends the dependency [B->(C, D)] to host A. When host B stops operating, host A sends a latest job search request to both host C and host D, and determines the rollback point from the results of the latest job searches by host C and host D.

In pattern 2, processing flows from host A to host B, and from host B to either host C or host D. The dependency destination of host B is either host C or host D, depending on the result of the processing executed on host B. Host B in pattern 2 also sends the dependency [B->(C, D)] to host A. As in pattern 1, when host B stops operating, host A sends a latest job search request to both host C and host D, and determines the rollback point from the results of the latest job searches by host C and host D.

In pattern 3, processing flows from host A or host B to host C, and from host C to host D. The dependency source of host C is host A or host B. Host C in pattern 3 sends the dependency [A->C] or the dependency [B->C] to host D. When host C stops operating, host D searches for the latest job. If the latest job came from host A, host D sends the latest job ID to host A; if it came from host B, host D sends the latest job ID to host B. Host A or host B then determines the rollback point according to the latest job ID received from host D.

In pattern 4, processing on host C starts on condition that the processing on both host A and host B has finished, and processing then flows from host C to host D. The dependency sources of host C are host A and host B. Host C in pattern 4 sends the dependency [(A, B)->C] to host D. When host C stops operating, host D searches for the latest job and sends the latest job ID to both host A and host B. Host A and host B then determine their rollback points according to the received latest job ID.

As patterns 1 to 4 shown in FIG. 13 illustrate, the systems of the embodiments described above can be applied even when the input to a process comes from a plurality of processes or its output goes to a plurality of processes.
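
The four patterns differ only in which hosts are contacted when the central host fails. The sketch below encodes a dependency such as [B->(C, D)] as a pair of host lists. The type alias `Dependency` and both function names are assumptions introduced for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# (dependency sources, dependency destinations), e.g. [B->(C, D)] = (["B"], ["C", "D"])
Dependency = Tuple[Sequence[str], Sequence[str]]


def downstream_to_query(dep: Dependency, failed: str) -> List[str]:
    """Patterns 1 and 2: the upstream host asks every destination of the failed host."""
    sources, destinations = dep
    return list(destinations) if failed in sources else []


def upstream_to_notify(
    dep: Dependency, failed: str, job_origin: Optional[str] = None
) -> List[str]:
    """Patterns 3 and 4: the downstream host chooses who receives the latest job ID.

    Pattern 3 passes job_origin (the host the latest job came from);
    pattern 4 leaves it as None and notifies every dependency source.
    """
    sources, destinations = dep
    if failed not in destinations:
        return []
    if job_origin is not None:
        return [job_origin] if job_origin in sources else []
    return list(sources)


print(downstream_to_query((["B"], ["C", "D"]), "B"))      # patterns 1 and 2
print(upstream_to_notify((["A", "B"], ["C"]), "C", "A"))  # pattern 3
print(upstream_to_notify((["A", "B"], ["C"]), "C"))       # pattern 4
```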

The computer system proposed in the embodiments described above includes a plurality of hosts and provides high availability at low cost.

1100, 1200, 1300, 1400 Host
1500 Network
1101, 1201, 1301, 1601 Storage device
1103 Ring buffer
8103 List buffer

Claims

A computer system comprising a first computer, a second computer, and a third computer, wherein the computers are connected by a network,
The first computer, the second computer, and the third computer process the job by transmitting and receiving the job in order,
and the first computer:
manages a dependency indicating the relationship between the processing of the second computer and the processing of the other computers;
determines whether the second computer is operating normally;
when it determines that the second computer is not operating normally, acquires from the third computer the identifier of the latest job processed in the second computer; and
performs rollback based on the acquired identifier of the job.

The first computer requests an identifier of the latest job processed in the second computer from the third computer,
The computer system according to claim 1, wherein the third computer transmits the requested latest job to the first computer.

The computer system according to claim 1 or 2, further comprising a fourth computer, wherein
when it is determined that the second computer is not operating normally, the first computer starts the fourth computer so that it executes processing in place of the second computer, and
the fourth computer transmits to the first computer a dependency indicating the relationship between the processing of the fourth computer and the processing of the other computers.

The computer system according to claim 1, wherein the first computer includes a ring buffer for managing an identifier of the job.

The computer system according to claim 1, further comprising a fifth computer, wherein
the first computer, the second computer, the third computer, and the fifth computer process the job by transmitting and receiving the job in order,
the first computer is rolled back within the combination of the first computer, the second computer, and the third computer, and
the second computer is rolled back within the combination of the second computer, the third computer, and the fifth computer.

A computer system comprising a first computer, a second computer, and a third computer, wherein the computers are connected by a network,
The first computer, the second computer, and the third computer process the job by transmitting and receiving the job in order,
the third computer:
manages a dependency indicating the relationship between the processing of the second computer and the processing of the other computers;
determines whether the second computer is operating normally; and
when it determines that the second computer is not operating normally, transmits to the first computer the identifier of the latest job processed in the second computer,
and the first computer rolls back based on the transmitted identifier of the job.

The computer system according to claim 6, further comprising a fourth computer, wherein
when it is determined that the second computer is not operating normally, the third computer starts the fourth computer so that it executes processing in place of the second computer, and
the fourth computer transmits to the third computer a dependency indicating the relationship between the processing of the fourth computer and the processing of the other computers.

The computer system according to claim 6 or 7, wherein the third computer includes a ring buffer for managing an identifier of the job.

The computer system according to claim 6, further comprising a fifth computer, wherein
the first computer, the second computer, the third computer, and the fifth computer process the job by transmitting and receiving the job in order,
the first computer is rolled back within the combination of the first computer, the second computer, and the third computer, and
the second computer is rolled back within the combination of the second computer, the third computer, and the fifth computer.

A computer system comprising a first computer, a second computer, and a third computer, wherein the computers are connected by a network,
The first computer, the second computer, and the third computer process the job by transmitting and receiving the job in order,
The first computer and the third computer manage a dependency indicating a relationship between the processing of the second computer and the processing of the other computer,
The third computer sends an end status indicating that the job has been processed in the second computer to the first computer;
and the first computer:
determines whether the second computer is operating normally; and
when it determines that the second computer is not operating normally, performs rollback based on the transmitted end status.

The computer system further includes a fourth computer,
When it is determined that the second computer is not operating normally, the first computer starts the fourth computer to execute processing instead of the second computer,
The computer system according to claim 10, wherein the fourth computer transmits a dependency relationship of processing in the fourth computer to the third computer.

The computer system according to claim 10 or 11, wherein the first computer includes a list buffer for managing an identifier of the job.