JP4856561B2

JP4856561B2 - Node control method, node control program, and node

Info

Publication number: JP4856561B2
Application number: JP2007022174A
Authority: JP
Inventors: 忠城吉田; 史和小西; 啓敏須賀; 清次冨田; 東潮日高
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-01-31
Filing date: 2007-01-31
Publication date: 2012-01-18
Anticipated expiration: 2027-01-31
Also published as: JP2008191705A

Description

The present invention relates to a node control technique for a redundant configuration system.

Conventionally, a single node records, on a storage medium such as a hard disk (an auxiliary storage unit), a snapshot of the database it manages taken at an arbitrary point in time (hereinafter, a checkpoint) and the update difference information accumulated since that checkpoint (hereinafter, a journal). When a failure occurs in the database, the node uses the checkpoint and the journal to recover the database to its state immediately before the failure.
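The recovery described above can be sketched as replaying the journal on top of the checkpoint. This is only an illustration: the key-value database, the `JournalEntry` layout, and the function name `recover` are assumptions made for the example and do not come from the patent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical journal entry: (key, new_value). A value of None means "delete key".
JournalEntry = Tuple[str, Optional[str]]


def recover(checkpoint: Dict[str, str], journal: List[JournalEntry]) -> Dict[str, str]:
    """Rebuild the database by replaying the journal on top of the checkpoint."""
    db = dict(checkpoint)  # start from a copy of the snapshot
    for key, value in journal:  # apply updates in the order they were recorded
        if value is None:
            db.pop(key, None)
        else:
            db[key] = value
    return db


checkpoint = {"a": "1", "b": "2"}
journal = [("b", "3"), ("c", "4"), ("a", None)]
restored = recover(checkpoint, journal)
```

Because the checkpoint is copied before replay, the same checkpoint and journal can be replayed again if recovery itself is interrupted.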

There are also systems in which the nodes that manage a database are made redundant and the database is shared among them (redundant configuration systems). Such a system has the problem that, when a failure occurs in the node providing the service (hereinafter, the Active node), switching over to a waiting node (hereinafter, the Standby node) takes a long time.

To solve this problem, a redundant configuration system has been proposed in which the database is not shared between the Active node and the Standby node. This system is described with reference to FIG. 7, which illustrates a conventional redundant configuration system. In FIG. 7, node 3 is the Active node and node 4 is the Standby node.

As shown in FIG. 7, when node 4 starts up as the Standby node, it reads the checkpoint and journal of node 3, the Active node, rather than its own (S1). After node 4 has started, node 3 accumulates the journal entries generated on node 3 in its own auxiliary storage unit (S2) and also transmits them to node 4 (S3). Node 4 then applies the journal received from node 3 to its own storage unit and accumulates it in the journal in its own auxiliary storage unit (S4).
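Steps S1 to S4 can be sketched with two in-process objects standing in for node 3 and node 4. The `Node` class, its fields, and `write_on_active` are hypothetical names introduced for illustration, and the network transmission in S3 is modeled as a direct method call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Update = Tuple[str, str]  # hypothetical journal entry: (key, value)


@dataclass
class Node:
    name: str
    checkpoint: Dict[str, str] = field(default_factory=dict)  # auxiliary storage
    journal: List[Update] = field(default_factory=list)       # auxiliary storage
    db: Dict[str, str] = field(default_factory=dict)          # storage unit (memory)

    def start_as_standby(self, active: "Node") -> None:
        # S1: load the Active node's checkpoint and journal, not our own.
        self.checkpoint = dict(active.checkpoint)
        self.journal = list(active.journal)
        self.db = dict(self.checkpoint)
        for key, value in self.journal:
            self.db[key] = value

    def apply(self, update: Update) -> None:
        # Reflect the update in memory and accumulate it in the local journal.
        key, value = update
        self.db[key] = value
        self.journal.append(update)


def write_on_active(active: Node, standby: Node, update: Update) -> None:
    active.apply(update)   # S2: the Active node accumulates its own journal
    standby.apply(update)  # S3 + S4: send to the Standby node, which applies and stores it


node3 = Node("node3", checkpoint={"x": "0"}, db={"x": "0"})
node4 = Node("node4")
node4.start_as_standby(node3)
write_on_active(node3, node4, ("x", "1"))
```

After the write, both nodes hold the same database and the same journal, which is what lets node 4 take over without consulting node 3.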

By keeping the database, the checkpoint, and the journal synchronized between the Standby node (node 4) and the Active node (node 3) in this way, the Standby node (node 4) can start up by itself even if a failure occurs in node 3. In other words, even when a failure occurs in the Active node, the resulting service downtime can be kept short.

Non-Patent Document 1 below discloses a technique for controlling the transition of each node to Active or Standby in such a redundant configuration system.
Non-Patent Document 1: Times Ten Replication Guide, Release 5.0, Times Ten, Tokyo Electron Limited, August 2003, pp. 27-32

However, in the conventional redundant configuration system described above, the other nodes did not read data (the checkpoint and journal) from the auxiliary storage unit until one of the nodes had completed startup as the Active node. Consequently, when a failure occurred in such a system and all nodes had to be restarted, it took a long time before the service could start.

This problem is described in detail with reference to FIGS. 8 and 9, which are diagrams used to explain the problem of the prior art. The example considered here is one in which nodes 3 and 4 of the redundant configuration system start in the order node 3, then node 4, and node 3 fails to start. The system management units 320 and 420 select (determine) the Active node and the Standby node. In this example, the system management units 320 and 420 select, from nodes 3 and 4 constituting the redundant configuration system, the node that started first as the Active node. The data management units 321 and 421 manage the data in the auxiliary storage unit of each node. In the following description, reading data from the auxiliary storage unit (loading the checkpoint and journal) means that the data management unit 321 or 421 constructs the database in the storage unit using the checkpoint and journal in the auxiliary storage unit.

As shown in FIG. 8, the system management unit 320 of node 3 outputs a startup instruction to the data management unit 321, triggered by, for example, node 3 being powered on (S801). On receiving the startup instruction, the data management unit 321 starts up and returns a startup completion notification to the system management unit 320 (S802). The system management unit 320 then determines the Active node and the Standby node (S803: Active/Standby determination). Here, for example, the system management unit 320 determines the node that started first (that is, its own node, node 3) to be the Active node.

The system management unit 320 then issues an activation instruction to the data management unit 321 (S804). In response, the data management unit 321 starts activation processing (S805) and begins reading data (the checkpoint and journal) from the auxiliary storage unit of its own node (S814).

Similarly, on node 4, the system management unit 420 outputs a startup instruction to the data management unit 421, triggered by, for example, node 4 being powered on (S811). On completing startup, the data management unit 421 returns a startup completion notification to the system management unit 420 (S812). The system management unit 320 of node 3 then determines the Active node and the Standby node (S813: Active/Standby determination). As noted above, node 3 started first in this example, so node 3 is likely to become the Active node. However, node 4 cannot become the Standby node until it has confirmed that node 3 has completed startup as the Active node, so the system management unit 420 of node 4 waits for node 3, the Active node, to finish processing. That is, it waits to receive the activation completion notification from node 3.

Turning to FIG. 9: when the system management unit 320 of node 3 receives an abnormality notification for its own node (for example, notification that a failure has occurred in the database of node 3) (S901), the system management unit 320 of node 3 and the system management unit 420 of node 4 change the Active node (S902: Active/Standby change). For example, the system management unit 320 of node 3 notifies the system management unit 420 of node 4 that a failure has occurred on node 3, and on that basis the system management unit 420 decides to make its own node the Active node. Having made this decision, the system management unit 420 outputs an activation instruction to the data management unit 421 (S903). In response, the data management unit 421 starts activation processing (S904) and reads data from its own node (S905).

Meanwhile, node 3, in which the failure occurred, performs post-processing for its own abnormality (S911). The subsequent processing in S912 to S914 is the same as S811 to S813 in FIG. 8 described above, so its description is omitted. After this processing, the system management unit 320 of node 3 waits for node 4, the Active node, to finish processing.

When the data management unit 421 of node 4 completes activation, it outputs an activation completion notification to the system management unit 420 (S915), and the system management unit 420 then starts the service (S916). For example, it enables reading and writing of the data stored in the database of its own node.

After this, the system management unit 420 of node 4 transmits an activation completion notification for its own node to the system management unit 320 of node 3 (S917). On receiving this notification, the system management unit 320 determines its own node to be the Standby node (S918) and outputs a standby instruction for its own node to the data management unit 321 (S919). In response, the data management unit 321 starts standby processing (S920) and reads data from the Active node (node 4) (S921). When the transition to Standby is complete, the data management unit 321 outputs a standby completion notification to the system management unit 320 (S922) and ends the processing. As a result, node 4 becomes the Active node and node 3 becomes the Standby node.

Thus, when a failure occurs in a conventional redundant configuration system and all nodes of the system are restarted, if a failure occurs in the node determined to be the Active node (node 3 in FIGS. 8 and 9), the other node (node 4 in FIGS. 8 and 9) begins reading data only after that failure has occurred. Reading data stored on a storage medium such as a hard disk involves I/O (input/output) processing. Consequently, if the node determined to be the Active node fails to start, it takes a long time before the service can start.

An object of the present invention is to solve the above problem so that, when all nodes are restarted because of a failure or the like in a redundant configuration system, the service can be started promptly even if one of the nodes fails to restart.

To solve the above problem, the present invention is configured so that, when a node of the redundant configuration system is started, the data management unit of that node reads the checkpoint and journal (the data) from the auxiliary storage unit into the storage unit without waiting for an instruction from the system management unit. Before reading the data, the node checks whether another node (the data management unit of another node) is already in the Active state. If no other node's data management unit is in the Active state yet, the node reads the data from its own auxiliary storage unit in advance; that is, it enters a provisional Active state. If, on the other hand, the data management unit of another node is already in the Active state, the node reads the data from the auxiliary storage unit of that Active node in advance; that is, it enters a provisional Standby state. As a result, when the data management unit receives an activation instruction for its own node from the system management unit after startup, it can move its own node to the Active state immediately. Therefore, even if a failure occurs in another node that is in the middle of starting up as the Active node, the node can become the Active node immediately. The redundant configuration system can thus start the service promptly, which shortens the service downtime.

The data management unit reads data from its own auxiliary storage unit only after confirming that no other node (no other node's data management unit) is in the Active state because, if another node is already in the Active state, its own node is unlikely to become the Active node. In that case its own node is likely to become the Standby node, so it reads the data from the auxiliary storage unit of the other node and prepares to enter the Standby state. Conversely, if the data management unit can confirm that no other node is in the Active state yet, its own node may become the Active node, so it reads the data from its own auxiliary storage unit and prepares to enter the Active state.
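The preload decision described above can be sketched as a single function. The `State` enum, the function name `preload`, and the two loader callables are hypothetical; the patent specifies the behavior (ask the other nodes, then load from the appropriate auxiliary storage unit) but not an interface.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, Tuple

Database = Dict[str, str]


class State(Enum):
    PROVISIONAL_ACTIVE = "provisional_active"
    PROVISIONAL_STANDBY = "provisional_standby"


def preload(
    peers_active: Iterable[bool],
    load_own: Callable[[], Database],
    load_from_active: Callable[[], Database],
) -> Tuple[State, Database]:
    """Load data at startup without waiting for the system management unit.

    peers_active holds one answer per other node to the inquiry
    "is your data management unit in the Active state?".
    """
    if any(peers_active):
        # Another node is already Active: prepare to become Standby.
        return State.PROVISIONAL_STANDBY, load_from_active()
    # No other node is Active yet: prepare to become Active.
    return State.PROVISIONAL_ACTIVE, load_own()


state_a, db_a = preload([False], lambda: {"src": "own"}, lambda: {"src": "peer"})
state_b, db_b = preload([True], lambda: {"src": "own"}, lambda: {"src": "peer"})
```

Only one of the two loaders is called per startup, so the I/O cost is paid once, and it is paid before any Active or Standby instruction arrives.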

The auxiliary storage unit here is a non-volatile storage medium such as an HDD (hard disk drive) or flash memory. The storage unit is, for example, RAM (random access memory), an HDD, or flash memory.

As described above, a checkpoint here is a snapshot, taken at an arbitrary point in time, of the data in the database managed by the node, and a journal is the update difference information accumulated since that checkpoint. Also as described above, reading data from the auxiliary storage unit (loading the checkpoint and journal) means that the node constructs the database in the storage unit using the checkpoint and journal in the auxiliary storage unit.

That is, the invention according to claim 1 is a node control method for a redundant configuration system comprising a plurality of nodes, in which, when the plurality of nodes start up, each of the nodes checks the state of the others via a network, one of the nodes is made an Active node that provides a service, and the nodes other than the Active node are made Standby nodes, wherein: the data management unit of a node, on receiving a startup instruction for the node from the system management unit of the node, transmits via the network, to another node in the redundant configuration system, inquiry information asking whether the data management unit of the other node is in an Active state, and, when it determines from the other node's response to the inquiry information that the data management unit of the other node is not in the Active state, reads the checkpoint and journal stored in the auxiliary storage unit of its own node from that auxiliary storage unit into the storage unit of its own node; the system management unit of the node selects, from among the plurality of nodes, the node that first completed reading the checkpoint and journal as the Active node and selects the other nodes as Standby nodes; and the data management unit of the node, on receiving an activation instruction for its own node from the system management unit of the node as a result of the system management unit having selected its own node as the Active node, activates the data management unit of its own node.
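The selection rule in claim 1 (the first node to finish loading becomes Active, the rest become Standby) can be sketched as follows. The function name `select_roles` and the representation of load completion as an ordered list of node names are assumptions for illustration; a node that failed to load simply does not appear in the list.

```python
from typing import Dict, List


def select_roles(load_completion_order: List[str]) -> Dict[str, str]:
    """Assign roles from the order in which nodes finished loading their data."""
    if not load_completion_order:
        raise ValueError("no node has completed loading yet")
    # Every node that finished loading starts as Standby ...
    roles = {name: "Standby" for name in load_completion_order}
    # ... except the one that finished first, which becomes Active.
    roles[load_completion_order[0]] = "Active"
    return roles


# node3 failed during startup, so only node4 reports completion.
roles_after_failure = select_roles(["node4"])
roles_normal = select_roles(["node3", "node4"])
```

Keying the choice to load completion rather than to startup order means a node that started first but failed partway through is never waited on.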

The invention according to claim 7 is a node used in a redundant configuration system comprising an Active node that provides a service and a Standby node that is a standby for the Active node, the node comprising: an other-node state management unit that, on receiving a startup instruction for the node from the system management unit of the node, transmits via a network, to another node in the redundant configuration system, inquiry information asking whether the data management unit of the other node is in an Active state, and determines from the other node's response to the inquiry information whether the data management unit of the other node is in the Active state; an own-node data load unit that, when the other-node state management unit determines that the data management unit of the other node is not in the Active state, reads the checkpoint and journal stored in the auxiliary storage unit of its own node from that auxiliary storage unit into the storage unit of its own node; a system management unit that selects, from among the nodes of the redundant configuration system, the node that first completed reading the checkpoint and journal as the Active node and selects the other nodes as Standby nodes; and an own-node state management unit that, on receiving an activation instruction for its own node from the system management unit as a result of the system management unit having selected its own node as the Active node, activates the data management unit of its own node.

With this configuration, a node of the redundant configuration system that has confirmed that no other node's data management unit is in the Active state yet reads the data from its own auxiliary storage unit; that is, it enters the provisional Active state. When the data management unit of the node then receives an activation instruction for its own node from the system management unit, it can move its own node to the Active state immediately. Therefore, even if a failure occurs in another node that is in the middle of starting up as the Active node, the node can enter the Active state immediately and the service can be started promptly.

The invention according to claim 2 is the node control method according to claim 1, wherein the data management unit of the node, on receiving a standby instruction for its own node from the system management unit of the node as a result of the system management unit having selected its own node as the Standby node, discards the checkpoint and journal in the storage unit of its own node, reads the checkpoint and journal via the network from the auxiliary storage unit of the other node selected as the Active node into the storage unit of its own node, and places the data management unit of its own node in the Standby state.

The invention according to claim 8 is the node according to claim 7, wherein, when the other-node state management unit receives a standby instruction for its own node from the system management unit of the node as a result of the system management unit having selected its own node as the Standby node, the other-node data load unit discards the checkpoint and journal in the storage unit of its own node and reads, via the network, the checkpoint and journal stored in the auxiliary storage unit of the other node selected as the Active node from that auxiliary storage unit into the storage unit of its own node, and the own-node state management unit places the data management unit of its own node in the Standby state.

With this configuration, a node of the redundant configuration system that has confirmed that no other node's data management unit is in the Active state yet reads the data from its own auxiliary storage unit (enters the provisional Active state). When the data management unit of this node then receives a standby instruction for its own node from the system management unit, it first discards the data in the storage unit of its own node and then re-reads the data via the network from the auxiliary storage unit of the Active node. In other words, a node in the provisional Active state can be made a Standby node.
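The discard-and-reload step can be sketched as below. `DataManager`, its `state` strings, and the `load_from_active` callable are hypothetical names for illustration; the network read from the Active node is represented by whatever the callable returns.

```python
from typing import Callable, Dict

Database = Dict[str, str]


class DataManager:
    """Hypothetical stand-in for a node's data management unit."""

    def __init__(self, preloaded: Database) -> None:
        # The node preloaded from its own auxiliary storage at startup.
        self.state = "provisional_active"
        self.db: Database = dict(preloaded)

    def on_standby_instruction(self, load_from_active: Callable[[], Database]) -> None:
        fresh = load_from_active()  # read over the network from the Active node
        self.db.clear()             # discard the locally loaded checkpoint and journal
        self.db.update(fresh)       # rebuild the database from the Active node's data
        self.state = "standby"


dm = DataManager({"x": "stale"})
dm.on_standby_instruction(lambda: {"x": "current", "y": "1"})
```

The local data is discarded rather than merged because the Active node's checkpoint and journal are the authoritative copy once a different node has been selected as Active.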

The invention according to claim 3 is the node control method according to claim 1 or claim 2, wherein the data management unit of the node, when it determines from the other node's response to the inquiry information that the data management unit of the other node is in the Active state, reads the checkpoint and journal via the network from the auxiliary storage unit of the other node selected as the Active node into the storage unit of its own node, and the data management unit of the node, on receiving a standby instruction for its own node from the system management unit of the node as a result of the system management unit having selected its own node as the Standby node, places the data management unit of its own node in the Standby state.

The invention according to claim 9 is the node according to claim 7 or claim 8, wherein the data management unit of the node determines from the other node's response to the inquiry information that the data management unit of the other node is in the Active state and reads the checkpoint and journal via the network from the auxiliary storage unit of the other node selected as the Active node into the storage unit of its own node, and the data management unit of the node, on receiving a standby instruction for its own node from the system management unit of the node as a result of the system management unit having selected its own node as the Standby node, places the data management unit of its own node in the Standby state.

With this configuration, a node of the redundant configuration system that has started up and confirmed that the data management unit of another node is in the Active state reads the data from the auxiliary storage unit of that other node (enters the provisional Standby state). When the data management unit of this node then receives a standby instruction for its own node from the system management unit, it places the data management unit of its own node in the Standby state. In other words, a node in the provisional Standby state can be made a Standby node.

The invention according to claim 4 is the node control method according to claim 1 or claim 2, wherein the data management unit of the node, when it determines from the other node's response to the inquiry information that the data management unit of the other node is in the Active state, reads the checkpoint and journal via the network from the auxiliary storage unit of the other node selected as the Active node into the storage unit of its own node, and the data management unit of the node, on receiving an activation instruction for its own node from the system management unit of the node as a result of the system management unit having detected, after the checkpoint and journal were read from the auxiliary storage unit of the other node into the storage unit of its own node, that the other node is no longer in the Active state, activates the data management unit of its own node.

The invention according to claim 10 is the node according to claim 7 or claim 8, wherein the data management unit of the node determines from the other node's response to the inquiry information that the data management unit of the other node is in the Active state and reads the checkpoint and journal via the network from the auxiliary storage unit of the other node selected as the Active node into the storage unit of its own node, and the data management unit of the node, on receiving an activation instruction for its own node from the system management unit of the node as a result of the system management unit having detected, after the checkpoint and journal were read from the auxiliary storage unit of the other node into the storage unit of its own node, that the other node is no longer in the Active state, activates the data management unit of its own node.

With this configuration, a node of the redundant configuration system that has started up and confirmed that the data management unit of another node is in the Active state reads the data from the auxiliary storage unit of that other node (enters the provisional Standby state). When the data management unit of this node then receives an activation instruction for its own node from the system management unit, it activates the data management unit of its own node. In other words, a node in the provisional Standby state can be made the Active node.
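Taken together, claims 1 to 4 describe four transitions out of the two provisional states. One way to read them is as a lookup table, sketched below. The state and instruction strings and the `reload` flag are illustrative assumptions; in particular, the "no reload" entry for a provisional Standby node that is activated reflects the fact that claim 4 mentions no further read, not an explicit statement in the text.

```python
from typing import Dict, Tuple

# (provisional state, instruction) -> (final state, reload from the Active node?)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, bool]] = {
    ("provisional_active", "activate"): ("active", False),    # claim 1
    ("provisional_active", "standby"): ("standby", True),     # claim 2: discard and re-read
    ("provisional_standby", "standby"): ("standby", False),   # claim 3
    ("provisional_standby", "activate"): ("active", False),   # claim 4: former Active node is gone
}


def apply_instruction(state: str, instruction: str) -> Tuple[str, bool]:
    """Return the final state and whether the storage unit must be reloaded."""
    try:
        return TRANSITIONS[(state, instruction)]
    except KeyError:
        raise ValueError(f"no transition for {state!r} on {instruction!r}") from None


final_state, reload_needed = apply_instruction("provisional_active", "standby")
```

Only one of the four transitions requires further I/O, which is why a node can usually act on the system management unit's instruction immediately.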

The invention according to claim 5 is the node control method according to claim 3 or claim 4, wherein, when the data management unit of the node determines from the response to the inquiry information from the other node that the data management unit of the other node is in the Active state, and then fails to read the checkpoint and journal while reading them via the network from the auxiliary storage unit of the other node selected as the Active node onto the storage unit of the own node, the data management unit reads the checkpoint and journal from the auxiliary storage unit of the own node.

In this way, when a node of the redundant configuration system tries to read data from another node in order to enter the temporary Standby state or to become the Standby node, and the read fails, the node can enter the temporary Active state instead.

請求項６に記載の発明は、請求項１ないし請求項５のいずれか１項に記載のノード制御方法を、コンピュータであるノードに実行させるためのノード制御プログラムとした。 The invention described in claim 6 is a node control program for causing a node which is a computer to execute the node control method according to any one of claims 1 to 5.

このような制御プログラムによれば、請求項１ないし請求項５のいずれか１項に記載の制御方法をコンピュータであるノードに実行させることができる。 According to such a control program, a control method according to any one of claims 1 to 5 can be executed by a node which is a computer.

本発明によれば、冗長構成のノードを含むシステムの障害等により、すべてのノードを再起動する場合において、この再起動に失敗するノードがあったとしても、速やかにサービスを開始できる。 According to the present invention, when all nodes are restarted due to a failure or the like of a system including redundantly configured nodes, even if there is a node that fails to restart, the service can be started promptly.

以下、本発明を実施するための最良の形態（以下、実施の形態という）を、図面を参照しながら説明する。 Hereinafter, the best mode for carrying out the present invention (hereinafter referred to as an embodiment) will be described with reference to the drawings.

<Overview of redundant configuration system>
First, an overview of the redundant configuration system of the present embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating the overview of the redundant configuration system of the present embodiment. Here, the case where the redundant configuration system of FIG. 1 consists of two nodes, nodes 1 and 2, which do not share an auxiliary storage unit, will be described as an example.

The node 1 includes a system management unit 120, a data management unit 121, an auxiliary storage unit 13, and a storage unit 14. The system management unit 120 exchanges information with the other node (node 2) to determine the Active node and the Standby node. The data management unit 121 manages the data stored in the auxiliary storage unit 13 (or the auxiliary storage unit 23). The data management unit 121 also checks whether the data management unit 221 of the other node (node 2) is in the Active state, and transitions the own node to a temporary Active state or a temporary Standby state (both described later). In addition, the data management unit 121 reads the data stored in the auxiliary storage unit 13 (or the auxiliary storage unit 23) onto the storage unit 14. The auxiliary storage unit 13 stores the checkpoint and journal of the database, and is composed of a nonvolatile storage medium such as an HDD or a flash memory. The storage unit 14 is composed of a storage medium such as a RAM, an HDD, or a flash memory. In the following description, reading data from the auxiliary storage units 13 and 23 (loading the checkpoint and journal) means that the data management unit 121 uses the checkpoint and journal in the auxiliary storage unit 13 or 23 to construct the database on the storage unit 14 of the own node (or the storage unit 24).
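The data read defined above (rebuilding the database from a checkpoint and a journal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in this document: the key-value data model, the `JournalEntry` record, and the `put`/`delete` operation names are all hypothetical and were introduced only for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class JournalEntry:
    """One update recorded after the checkpoint (hypothetical format)."""
    op: str          # assumed operations: "put" or "delete"
    key: str
    value: Any = None


def build_database(checkpoint: Dict[str, Any],
                   journal: Iterable[JournalEntry]) -> Dict[str, Any]:
    """Rebuild the in-memory database: start from the snapshot, then
    replay the journal entries in the order they were recorded."""
    db = dict(checkpoint)              # copy so the checkpoint stays intact
    for entry in journal:
        if entry.op == "put":
            db[entry.key] = entry.value
        elif entry.op == "delete":
            db.pop(entry.key, None)    # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown journal operation: {entry.op!r}")
    return db
```

Replaying in recorded order matters: a later `put` must overwrite an earlier one for the rebuilt database to match the state immediately before the failure.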

また、このノード１のペアとなるノード２も、システム管理部２２０と、データ管理部２２１と、補助記憶部２３と、記憶部２４とを備える。このシステム管理部２２０と、データ管理部２２１と、補助記憶部２３とは、それぞれ、ノード１のシステム管理部１２０と、データ管理部１２１と、補助記憶部１３と同じ機能なので説明を省略する。このノード１とノード２とはクロスケーブル等により接続され、お互いにデータの送受信を行う。ここでは、冗長構成システムの障害発生後、ノード１,２両方のリセットのため、ノード１,２に電源を投入して再起動させる場合を例に説明する。 Further, the node 2 that is a pair of the node 1 also includes a system management unit 220, a data management unit 221, an auxiliary storage unit 23, and a storage unit 24. Since the system management unit 220, the data management unit 221, and the auxiliary storage unit 23 have the same functions as the system management unit 120, the data management unit 121, and the auxiliary storage unit 13 of the node 1, description thereof will be omitted. The node 1 and the node 2 are connected by a cross cable or the like, and transmit / receive data to / from each other. Here, a case will be described as an example in which, after a failure of the redundant configuration system, both the nodes 1 and 2 are reset and then the nodes 1 and 2 are powered on and restarted.

電源投入等により、ノード１が起動すると、データ管理部１２１は、ノード２のデータ管理部２２１がActive状態か否かを確認する状態確認を行う。そして、データ管理部１２１は、ノード２のデータ管理部２２１がActive状態でないことを確認すると、自ノードの補助記憶部１３から記憶部１４上にデータ（チェックポイントとジャーナル）を読み込む。つまり、ノード２のデータ管理部２２１がActive状態でなければ、自ノードがActiveノードになる可能性があるので、事前に自身の補助記憶部１３からデータを読み込んで、記憶部１４上にデータベースを構築しておく。以下、このようなノードの状態を仮Active状態と呼ぶ。 When the node 1 is activated due to power-on or the like, the data management unit 121 performs state confirmation to confirm whether or not the data management unit 221 of the node 2 is in the active state. When the data management unit 121 confirms that the data management unit 221 of the node 2 is not in the active state, the data management unit 121 reads data (checkpoint and journal) from the auxiliary storage unit 13 of the own node onto the storage unit 14. That is, if the data management unit 221 of the node 2 is not in the active state, the own node may become an active node. Therefore, the data is read from the auxiliary storage unit 13 in advance and the database is stored on the storage unit 14. Build it. Hereinafter, such a node state is referred to as a temporary active state.

The node 2 is likewise started by power-on or the like. When the data management unit 221 of the node 2 confirms that the data management unit 121 of the node 1 is not in the Active state, it reads the data (checkpoint and journal) from the auxiliary storage unit 23 of the own node and enters the temporary Active state. In this way, if the data management unit of the other node is not in the Active state, each of the nodes 1 and 2 may become the Active node, so it reads the data from its own auxiliary storage unit 13 or 23 in advance and constructs the database on its own storage unit 14 or 24.

Although a detailed description is omitted here, when the data management unit 221 or 121 of the other node is in the Active state, the node 1 or 2 reads the data from the auxiliary storage unit 23 or 13 of that other node. That is, if the data management unit of the other node is in the Active state, the own node may become the Standby node, so it reads the data from the auxiliary storage unit 23 or 13 of the other node in advance and constructs the database on its own storage unit 14 or 24. Hereinafter, this node state is referred to as the temporary Standby state.

このようにして仮Active状態になったノード１に障害が発生すると、ノード２は以下のような処理を行う。すなわち、ノード２のシステム管理部２２０において、ノード１に障害が発生したことを検知すると、自ノード（ノード２）をActiveノードにすることを決定する。次に、システム管理部２２０からのActive化指示を受けたデータ管理部２２１は自身のActive化処理を行う。そして、Active化処理を完了すると、ノード２はサービスを開始する。このようにノード２は、既にデータの読み込みを済ませてあるので、ノード１が起動に失敗した場合でも、ノード２は速やかにActiveノードに切り替わり、データベースを用いたサービスを開始できる。 When a failure occurs in the node 1 in the temporary active state in this way, the node 2 performs the following processing. That is, when the system management unit 220 of the node 2 detects that a failure has occurred in the node 1, it decides to make its own node (node 2) an active node. Next, the data management unit 221 that has received an activation instruction from the system management unit 220 performs its own activation process. When the activation process is completed, the node 2 starts the service. As described above, since the node 2 has already read the data, even if the node 1 fails to start, the node 2 can quickly switch to the active node and start the service using the database.

なお、起動に失敗したノード１は、その後の電源再投入等により、再起動を開始する。そして、ノード１は、ノード２のデータ管理部２２１がActive状態であることを確認すると、このノード２からデータを読み込み、仮Standby状態になる。この後、システム管理部１２０からのStandby化指示を受信すると、これを受けてデータ管理部１２１はStandby状態になる。 Note that the node 1 that has failed to start up restarts when the power is turned on again thereafter. When the node 1 confirms that the data management unit 221 of the node 2 is in the active state, the node 1 reads data from the node 2 and enters a temporary standby state. Thereafter, when a standby instruction is received from the system management unit 120, the data management unit 121 enters the standby state in response to the reception.

<Node details>
Next, the nodes 1 and 2 of the redundant configuration system of the present embodiment will be described in detail with reference to FIG. 2. FIG. 2 is a block diagram showing the configuration of each node in FIG. 1. Since the nodes 1 and 2 have the same configuration, the node 1 is described here as a representative.

The node 1 is implemented by, for example, a computer, and includes: an input/output unit 11 for exchanging data with external devices such as the node 2; a processing unit 12 that, among other things, manages the data in the auxiliary storage unit 13 and the storage unit 14 and determines the Active node and the Standby node; a storage unit 14 on which a database 140 is constructed; and an auxiliary storage unit 13 that stores a checkpoint 131, a journal 132, and the like of the database 140.

入出力部１１は、外部装置と各種データの入出力を行うための入出力インタフェースや、通信インタフェース等から構成される。 The input / output unit 11 includes an input / output interface for inputting / outputting various data to / from an external device, a communication interface, and the like.

また、処理部１２は、このノード１が備えるＣＰＵ（Central Processing Unit）によるプログラム実行処理や、専用回路等により実現される。 The processing unit 12 is realized by a program execution process by a CPU (Central Processing Unit) included in the node 1 or a dedicated circuit.

補助記憶部１３は、ＨＤＤ、フラッシュメモリ等の記憶媒体から構成される。また、記憶部１４は、ＲＡＭ、ＨＤＤ、フラッシュメモリ等の記憶媒体から構成される。なお、処理部１２の機能をソフトウェア的に実現する場合、この補助記憶部１３には、ノード１の機能を実現するためのプログラムが記憶される。つまり、ＣＰＵが、この補助記憶部１３に記憶されるプログラムを記憶部１４上に展開し、実行することで前記した処理部１２の機能を実現する。 The auxiliary storage unit 13 includes a storage medium such as an HDD or a flash memory. The storage unit 14 includes a storage medium such as a RAM, an HDD, or a flash memory. When the function of the processing unit 12 is realized by software, the auxiliary storage unit 13 stores a program for realizing the function of the node 1. That is, the CPU develops the program stored in the auxiliary storage unit 13 on the storage unit 14 and executes the program, thereby realizing the function of the processing unit 12 described above.

なお、以下の説明において、冗長構成システムのStandbyノードが１台である場合を例に説明するが、Standbyノードは複数台であってもよい。また、このノード１,２はＬＡＮ(Local Area Network)、インターネット等のネットワークに接続されていてもよい。さらに、このノード１には、キーボードやマウス等の入力装置、液晶ディスプレイ等の出力装置等が接続されていてもよい。 In the following description, a case where there is one standby node in the redundant configuration system will be described as an example, but there may be a plurality of standby nodes. The nodes 1 and 2 may be connected to a network such as a LAN (Local Area Network) or the Internet. Further, an input device such as a keyboard and a mouse, an output device such as a liquid crystal display, and the like may be connected to the node 1.

<Input/output unit>
As described above, the input/output unit 11 handles the input and output of various data to and from external devices. For example, it handles status checks of other nodes (described later), activation completion notifications, and the input and output of data and the like to and from other nodes.

<Processing unit>
The processing unit 12 includes a system management unit 120 and a data management unit 121.

このシステム管理部１２０は、Activeノードになるノード、Standbyノードになるノードを決定する。つまり、システム管理部１２０は、冗長構成システムにおいて、矛盾がないよう、データ管理部１２１の状態（Active状態か、Standby状態か）を管理する。なお、このシステム管理部１２０は、冗長構成システムのノードに実装される公知のミドルウェア等を用いてよい。 The system management unit 120 determines a node to be an active node and a node to be a standby node. That is, the system management unit 120 manages the state of the data management unit 121 (active state or standby state) so that there is no contradiction in the redundant configuration system. The system management unit 120 may use known middleware or the like mounted on a node of the redundant configuration system.

Upon receiving a start-up instruction from the system management unit 120, the data management unit 121 checks the state of the data management unit 221 of the other node (node 2) and determines whether the own node should enter the temporary Active state or the temporary Standby state. It also transitions the data management unit 121 of the own node to the Active state or the Standby state based on instructions from the system management unit 120. The data management unit 121 includes an interface unit 122, a state control unit 123, and a data load unit 126. Putting the data management unit 121 into the Active state means making it ready to accept application processing from outside; putting it into the Standby state means making it ready to apply the journal transmitted from the Active node to the auxiliary storage unit 13 of the own node.

The interface unit 122 is the interface for exchanging data between the system management unit 120 and the data management unit 121. For example, it receives start-up instructions and state transition instructions to the Active or Standby state (activation instructions or Standby instructions) from the system management unit 120, and outputs start-up completion notifications from the data management unit 121.

The state control unit 123 controls the state (Active state or Standby state) of the data management unit 121 of the own node. The state control unit 123 includes an other node state management unit 124 and an own node state management unit 125.

The other node state management unit 124 sends another node (for example, the node 2) information inquiring whether the data management unit of that node (for example, the data management unit 221 of the node 2) is in the Active state. Based on the response, it determines whether the data management unit 221 is in the Active state and stores the result in a predetermined area of the storage unit 14. For example, if no response indicating that the data management unit 221 of the node 2 is in the Active state arrives from the node 2 within a predetermined time after the inquiry is sent, it determines that the node 2 is not in the Active state. Conversely, if such a response arrives from the node 2 within the predetermined time, it determines that the node 2 is in the Active state. In either case, the result is stored in a predetermined area of the storage unit 14.

The own node state management unit 125 determines the state of the own node (for example, the temporary Active state, the temporary Standby state, the Active state, or the Standby state) based on the state of the data management unit 221 of the other node stored in the storage unit 14 (Active or not) and on state transition instructions from the system management unit 120, and stores the determined state in the storage unit 14. In addition, when it receives from another node (such as the node 2) information inquiring whether the data management unit of the own node (for example, the data management unit 121 of the node 1) is in the Active state, it returns a response to the node 2.
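The inquiry and reply handled by the two state management units can be sketched as follows. This is only an illustration under assumptions: the transport is abstracted as a caller-supplied `send_inquiry` function that either returns the peer's reported state or raises `TimeoutError` when the predetermined time elapses, and the state strings, the `state_store` dictionary, and all function names are hypothetical.

```python
from typing import Callable, Dict, Optional

ACTIVE = "Active"


def reply_to_inquiry(own_state: str) -> str:
    """Responder side (own node state management): report the current state."""
    return own_state


def check_peer_active(send_inquiry: Callable[[float], Optional[str]],
                      state_store: Dict[str, bool],
                      timeout_s: float = 1.0) -> bool:
    """Inquirer side (other node state management).

    The peer counts as Active only if it answers "Active" within the
    timeout; silence or any other answer counts as not Active.
    The result is recorded in state_store (standing in for the
    predetermined area of the storage unit)."""
    try:
        reply = send_inquiry(timeout_s)
    except TimeoutError:
        reply = None                   # no response within the time limit
    peer_active = (reply == ACTIVE)
    state_store["peer_active"] = peer_active
    return peer_active


def no_reply(timeout_s: float) -> Optional[str]:
    """Stand-in for a peer that is down: the wait simply times out."""
    raise TimeoutError(f"no response within {timeout_s:.1f} s")
```

Treating a timeout the same as a "not Active" answer is what lets a node proceed to the temporary Active state when its peer has not started or has failed.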

The data load unit 126 constructs the database 140 on the storage unit 14 by reading data (the checkpoints 131, 231 and the journals 132, 232) from the auxiliary storage unit 13 of the own node or the auxiliary storage unit 23 of another node onto the storage unit 14 of the own node. The data load unit 126 includes an own node data load unit 127 and an other node data load unit 128.

Based on an instruction from the state control unit 123, the own node data load unit 127 reads data from the auxiliary storage unit 13 of the own node onto the storage unit 14 of the own node, thereby constructing the database 140 on the storage unit 14. The other node data load unit 128 reads data from the auxiliary storage unit 23 of another node (for example, the node 2) via the input/output units 11 and 21.

なお、前記したデータ管理部１２１の機能は、システム管理部１２０の一部に組み込むようにしてもよい。 Note that the functions of the data management unit 121 described above may be incorporated into a part of the system management unit 120.

<Storage unit>
The storage unit 14 stores the database 140 constructed by the data management unit 121. Although not shown, it also stores the states of the own node and other nodes (for example, the temporary Active state, the temporary Standby state, the Active state, and the Standby state).

<Auxiliary storage unit>
The auxiliary storage unit 13 stores the checkpoint 131 and the journal 132 of the database 140 in the storage unit 14.

これらの構成の詳細は、動作手順の説明の項で後記する。 Details of these configurations will be described later in the description of the operation procedure.

<Operation procedure>
Next, the operation procedure of the nodes 1 and 2 will be described using FIG. 3, with reference to FIG. 2 as needed. FIG. 3 is a flowchart illustrating the operation procedure of each node in FIG. 2. Here too, the operation procedure of the node 1 is described as a representative.

When the node 1 starts up due to power-on or the like, the interface unit 122 of the data management unit 121 receives a start-up instruction from the system management unit 120 (S301). On receiving this instruction, the data management unit 121 checks the state of the other node (S302). That is, the other node state management unit 124 of the node 1 sends information inquiring whether the data management unit 221 of the other node (node 2) is in the Active state, and waits for the response. When the other node state management unit 124 determines from the response that the data management unit 221 of the node 2 is not in the Active state (other than Active in S303), the own node state management unit 125 decides to put the own node into the temporary Active state (S307) and instructs the data load unit 126 to read data from the own node (node 1). The own node data load unit 127 then starts reading data from the auxiliary storage unit 13 of the node 1 (S308). When the data read and the start-up of the own node are complete, the own node state management unit 125 records in the storage unit 14 or the like that the own node is now in the temporary Active state, and outputs a start-up completion notification to the system management unit 120 via the interface unit 122 (S309).

On the other hand, when the other node state management unit 124 determines in S303 that the data management unit 221 of the node 2 is in the Active state (Active in S303), the own node state management unit 125 decides to put the own node into the temporary Standby state (S304) and instructs the data load unit 126 to read data from the other node (node 2). The other node data load unit 128 then starts reading data from the auxiliary storage unit 23 of the node 2 via the input/output unit 11 (S305). When the read from the auxiliary storage unit 23 of the node 2 succeeds (No in S306) and the start-up of the own node is complete, the own node state management unit 125 records in the storage unit 14 or the like that the own node is now in the temporary Standby state, and outputs a start-up completion notification to the system management unit 120 via the interface unit 122 (S309).

On the other hand, when the other node data load unit 128 fails to read data from the auxiliary storage unit 23 of the node 2 in S306 (Yes in S306), the own node state management unit 125 decides to put the own node into the temporary Active state (S307) and instructs the data load unit 126 to read data from the own node (node 1). The own node data load unit 127 then reads the data from the auxiliary storage unit 13 (S308).

このようにノード１は、他のノードの状態確認を行った上で、自ノードからデータを読み込んで仮Active状態になったり、他ノードからデータを読み込んで仮Standby状態になったりする。 As described above, the node 1 confirms the state of the other node, and then reads data from its own node to enter a temporary active state, or reads data from another node to enter a temporary standby state.
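The start-up flow of FIG. 3 (S303 to S308) can be summarized in a short sketch. It is an illustration under assumptions rather than the actual implementation: the two loaders are passed in as plain functions, a failed read from the other node is modeled as an `OSError`, and the names `NodeState`, `start_up`, and `failing_remote` are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, Tuple

Database = Dict[str, str]


class NodeState(Enum):
    TEMP_ACTIVE = "temporary Active"
    TEMP_STANDBY = "temporary Standby"


def start_up(peer_active: bool,
             load_local: Callable[[], Database],
             load_remote: Callable[[], Database]) -> Tuple[NodeState, Database]:
    """Decide the temporary state at start-up and load the database."""
    if peer_active:                                    # S303: Active
        try:
            # S304-S305: temporary Standby, read from the other node
            return NodeState.TEMP_STANDBY, load_remote()
        except OSError:
            pass                                       # S306: Yes (read failed)
    # S307-S308: temporary Active, read from the own auxiliary storage
    return NodeState.TEMP_ACTIVE, load_local()


def failing_remote() -> Database:
    """Stand-in for an unreachable peer (ConnectionError is an OSError)."""
    raise ConnectionError("peer unreachable")
```

Whichever branch is taken, the node finishes start-up with a database already built in memory, which is why the later transition to Active or Standby can be fast.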

なお、ここでは説明を省略したが、前記した処理によりデータ管理部１２１から起動完了通知を受け付けると、システム管理部１２０は自ノードをActiveノードとするか、Standbyノードとするかを決定する。すなわち、システム管理部１２０は、他のノード（ノード２）のシステム管理部２２０に、このノード２がActiveノードであるかStandbyノードであるかを確認し、その結果、他のノードがActiveノードであるとき、自ノードをStandbyノードにすると決定する。そして、状態遷移指示としてデータ管理部１２１に対し、Standby化指示を出力する。一方、他のノードがStandbyノードであるとき、または障害等が発生しているとき、システム管理部１２０は、自ノードをActiveノードにすると決定する。そして、状態遷移指示として、データ管理部１２１に対しActive化指示を出力する。 Although not described here, when the activation completion notification is received from the data management unit 121 by the above-described processing, the system management unit 120 determines whether the own node is an Active node or a Standby node. That is, the system management unit 120 checks with the system management unit 220 of another node (node 2) whether the node 2 is an active node or a standby node, and as a result, the other node is an active node. At some point, it decides to make its own node a Standby node. Then, a standby conversion instruction is output to the data management unit 121 as a state transition instruction. On the other hand, when the other node is a standby node, or when a failure or the like occurs, the system management unit 120 determines that the own node is an active node. Then, an activation instruction is output to the data management unit 121 as a state transition instruction.
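The decision rule stated above is small enough to write down directly. The sketch below is an assumption-laden illustration: the role strings and the `decide_role` name are hypothetical, and the case where the other node has not yet decided its role is deliberately left unresolved, since the text leaves that arbitration to the system management unit (for example, known middleware).

```python
def decide_role(peer_role: str) -> str:
    """Return the role the own node should take, given the other node's role.

    peer_role is assumed to be one of "Active", "Standby", or "Failed".
    """
    if peer_role == "Active":
        return "Standby"               # the other node already serves requests
    if peer_role in ("Standby", "Failed"):
        return "Active"                # nobody else can serve, so take over
    # Any other situation (e.g. both nodes still undecided) is not covered
    # by the rule in the text and needs arbitration outside this sketch.
    raise ValueError(f"cannot decide role for peer state {peer_role!r}")
```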

The subsequent processing will be described using FIG. 4, with reference to FIG. 2 as needed. FIG. 4 is a flowchart illustrating the operation procedure of each node in FIG. 2. Here too, the operation procedure of the node 1 is described as a representative.

First, when the state control unit 123 of the data management unit 121 receives a state transition instruction from the system management unit 120 via the interface unit 122 (S401), it determines the type of the instruction (S402). If the instruction is an activation instruction (activation in S402), the data management unit 121 activates the own node (the data management unit 121 of the own node) (S407) and then outputs a state transition completion notification indicating that activation is complete to the system management unit 120 (S408). In response, the system management unit 120 performs activation processing for the own node as a whole (for example, activating the application modules of the own node) and starts the service. In other words, the data management unit 121, which has already read the data from the own node or another node after start-up, becomes able to handle data search processing and the like from the application 13 once the system management unit 120 outputs the activation instruction.

On the other hand, when the instruction from the system management unit 120 is a Standby instruction (Standby in S402) and the own node state management unit 125 determines that the state of the own node (that is, the state of the data management unit 121) is the temporary Standby state (temporary Standby state in S403), the data management unit 121 puts the own node (the data management unit 121 of the own node) into the Standby state (S406) and then outputs a state transition completion notification indicating that the transition to Standby is complete to the system management unit 120 (S408).

また、Ｓ４０３において、自ノード状態管理部１２５が自ノードの状態（つまり、データ管理部１２１の状態）を仮Active状態と判断したとき（Ｓ４０３で仮Active状態）、データ管理部１２１は自ノードの状態をStandby化するため以下の処理を行う。すなわち、まず、自ノード状態管理部１２５は、自ノードの補助記憶部１３から読み込んだデータを破棄する（Ｓ４０４）。そして、自ノード状態管理部１２５は、他ノードデータロード部１２８へ他のノードの補助記憶部２３からのデータ読み込みを指示する。これを受けて、他ノードデータロード部１２８は、入出力部１１経由で、他のノードの補助記憶部２３からデータを読み込む（Ｓ４０５）。そして、データ読み込みを完了し、データ管理部１２１は自ノード（自ノードのデータ管理部１２１）をStandby化した後（Ｓ４０６）、Standby化完了を示す状態遷移完了通知をシステム管理部１２０へ出力する（Ｓ４０８）。このような状態遷移完了通知を受けたシステム管理部１２０は自ノード全体のStandby化処理を行う。 In S403, when the own node state management unit 125 determines that the state of the own node (that is, the state of the data management unit 121) is the temporary Active state (temporary Active state in S403), the data management unit 121 The following processing is performed to change the state to Standby. That is, first, the own node state management unit 125 discards the data read from the auxiliary storage unit 13 of the own node (S404). Then, the own node state management unit 125 instructs the other node data load unit 128 to read data from the auxiliary storage unit 23 of another node. In response, the other node data load unit 128 reads data from the auxiliary storage unit 23 of the other node via the input / output unit 11 (S405). Then, after the data reading is completed, the data management unit 121 sets the own node (the data management unit 121 of the own node) in the standby mode (S406), and then outputs a state transition completion notification indicating the completion of the standby mode to the system management unit 120. (S408). Upon receiving such a state transition completion notification, the system management unit 120 performs a standby process for the entire node.
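The state transition handling of FIG. 4 (S402 to S407) can be sketched as a single function. This is a minimal illustration under assumptions: the instruction strings `"activate"` and `"standby"`, the `NodeState` enum, and the `handle_transition` name are hypothetical, and the database is modeled as a plain dictionary.

```python
from enum import Enum
from typing import Callable, Dict, Tuple

Database = Dict[str, str]


class NodeState(Enum):
    TEMP_ACTIVE = "temporary Active"
    TEMP_STANDBY = "temporary Standby"
    ACTIVE = "Active"
    STANDBY = "Standby"


def handle_transition(instruction: str,
                      state: NodeState,
                      db: Database,
                      load_remote: Callable[[], Database]
                      ) -> Tuple[NodeState, Database]:
    """Apply a state transition instruction from the system management unit."""
    if instruction == "activate":                # S402: activation
        return NodeState.ACTIVE, db              # S407: data is already loaded
    if instruction == "standby":                 # S402: Standby
        if state is NodeState.TEMP_ACTIVE:       # S403: temporary Active
            # S404-S405: discard the locally loaded data and reload it
            # from the other node's auxiliary storage.
            db = load_remote()
        return NodeState.STANDBY, db             # S406
    raise ValueError(f"unknown instruction: {instruction!r}")
```

Only the temporary Active to Standby path costs a reload; the other three paths reuse the database that was built at start-up.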

<Example of operation procedure>
Next, an example of the operation procedure of nodes 1 and 2 will be described using FIG. 5 and FIG. 6, with reference to FIG. 2. FIG. 5 and FIG. 6 are flowcharts illustrating the operation procedure of each node in FIG. 2. The example described here is a case in which nodes 1 and 2 are powered on a short time apart, a failure occurs in node 1, which started first, and node 2, which started later, becomes the Active node.

In node 1, the interface unit 122 receives a startup instruction from the system management unit 120 (S501). On receiving this startup instruction, the other node state management unit 124 of the state control unit 123 checks the state of the data management unit 221 of node 2 (S502). That is, the other node state management unit 124 of node 1 transmits information inquiring whether the data management unit 221 of node 2 is in the Active state, and receives the response (S503). At this stage, node 2 has not yet been started, so its state is indeterminate. The own node state management unit 125 of node 1 therefore decides to place the own node in the temporary Active state (S504) and instructs the own node data load unit 127 to read data from the own node. The own node data load unit 127 then starts reading data from the auxiliary storage unit 13 of the own node (S505).

Meanwhile, in node 2, when the interface unit 222 receives a startup instruction from the system management unit 220 (S511), the other node state management unit 224 checks the state of the data management unit 121 of node 1 (S512), in the same way as in node 1. That is, the other node state management unit 224 of node 2 transmits information inquiring whether the data management unit 121 of node 1 is in the Active state, and receives the response (S513). At this stage, node 1 has not yet fully become an Active node. Based on the response from node 1, the other node state management unit 224 of node 2 therefore determines that node 1 is not in the Active state. The own node state management unit 225 then decides to place the own node in the temporary Active state (S514) and instructs the own node data load unit 227 to read data from the own node. The own node data load unit 227 then starts reading data from the auxiliary storage unit 23 of the own node (S515).

We now turn to FIG. 6. Normally, in such a redundant configuration system, the node that begins starting first also completes startup first. However, if some failure occurs while node 1, which began starting first, is starting up, the data management unit 121 outputs an abnormality notification (indicating that a failure has occurred) to the system management unit 120 (S601 in FIG. 6).

The system management unit 120 that has received the abnormality notification performs post-processing (S602) and restarts node 1. Depending on the nature of the failure, the data management unit 121 may be unable to output an abnormality notification to the system management unit 120. In that case, the abnormality may instead be detected by a process management function of the system management unit 120 or by a timeout of a startup monitoring process.
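The two detection paths described above (an explicit abnormality notification, or a timeout of a startup monitoring process) might be combined as in the following sketch. The function name, its parameters, and the idea of comparing timestamps in seconds are assumptions made for illustration; the patent does not specify how the timeout is measured.

```python
def startup_failed(started_at, now, timeout_s, completed, abnormality_reported):
    """Decide whether the system management unit should treat startup as failed.

    started_at, now: timestamps in seconds (hypothetical monitoring clock).
    timeout_s: hypothetical startup monitoring timeout in seconds.
    completed: True once a startup completion notification has arrived.
    abnormality_reported: True if the data management unit sent an
        abnormality notification (S601).
    """
    if abnormality_reported:
        # Explicit notification path.
        return True
    # Fallback path: no notification could be sent, so rely on the timeout.
    return (not completed) and (now - started_at) > timeout_s
```

The timeout path covers failures in which the data management unit can no longer report anything at all.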

Meanwhile, when the own node data load unit 227 of node 2 finishes reading data from the own node, the data management unit 221 outputs a startup completion notification to the system management unit 220 through the interface unit 222 (S611). On receiving this, the system management unit 220 determines which node of the redundant configuration system is to be the Active node (S612). A known technique may be used for this determination; to keep the service outage as short as possible, however, the node that completes startup first is, for example, determined to be the Active node. That is, the system management unit 220 checks whether the other node (node 1) is an Active node or a Standby node. If the other node (node 1) is already an Active node, it decides to make the own node (node 2) a Standby node; if the other node is not yet an Active node, it decides to make the own node (node 2) an Active node. Here, because the other node (node 1) is not yet an Active node, the system management unit 220 of the own node (node 2) decides to make the own node (node 2) the Active node.
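The example policy in S612 (the node that completes startup first becomes the Active node) can be illustrated with a small simulation. The functions `select_role` and `simulate` are hypothetical, and the simulation assumes that each node learns the other node's role reliably and that nodes complete startup one after another; simultaneous completion is not modeled.

```python
def select_role(other_node_is_active):
    """S612: become Standby if the other node is already Active, else Active."""
    return "Standby" if other_node_is_active else "Active"


def simulate(completion_order):
    """Assign roles to nodes in the order in which they complete startup."""
    roles = {}
    for node in completion_order:
        other_node_is_active = "Active" in roles.values()
        roles[node] = select_role(other_node_is_active)
    return roles
```

For the scenario in the text, where node 2 completes startup before the restarted node 1, `simulate(["node 2", "node 1"])` assigns Active to node 2 and Standby to node 1.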

Having made this decision, the system management unit 220 outputs an Activation instruction to the data management unit 221 (S613). The own node state management unit 225 of the data management unit 221 then executes the Activation processing of the own node (the data management unit 221 of the own node) (S614). When the own node state management unit 225 completes the Activation processing, it outputs an Activation completion notification (a state transition completion notification indicating completion of Activation) to the system management unit 220 (S615). In response, the system management unit 220 performs the Activation processing for the entire own node (for example, Activation of the application modules of the own node) and starts the service (S616).

In this way, the service can be started promptly even if node 1 fails to start in the redundant configuration system. Although not illustrated, when the system management unit 220 of node 2 completes the Activation processing for the entire own node, it sends an Activation completion notification for the own node to the system management unit 120 of node 1 via the input/output unit 21.

Meanwhile, node 1, which failed to start and was restarted, performs the following processing in the same manner as S501 to S503 of FIG. 5 described above. That is, when the data management unit 121 receives a startup instruction from the system management unit 120 (S621), the other node state management unit 124 checks the state of the data management unit 221 of node 2 (S622) and receives the response from the data management unit 221 (S623).

At this stage, the data management unit 221 of node 2 is in the Active state. The other node state management unit 124 therefore receives from node 2 a response indicating that the data management unit 221 is in the Active state. Accordingly, the other node state management unit 124 determines that the data management unit 221 of node 2 is in the Active state, and the own node state management unit 125 decides to place the own node in the temporary Standby state (S624). It then instructs the data load unit 126 to read data from node 2, and the other node data load unit 128 reads the data from the auxiliary storage unit 23 of node 2 (S625).
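The startup-time decision made in S502 to S504, S512 to S514, and S622 to S624 reduces to one rule: if the other node answers that it is Active, become temporary Standby and load from the other node; otherwise become temporary Active and load from the own node. The sketch below assumes a hypothetical response encoding in which `None` stands for "no usable reply" (for example, the other node has not started yet).

```python
def decide_startup(other_node_response):
    """Return (temporary_state, data_source) from the other node's reply.

    other_node_response: "Active", any other status string, or None when
    the other node's state is indeterminate (hypothetical encoding).
    """
    if other_node_response == "Active":
        # S624 and S625: read the data from the other node's auxiliary storage.
        return ("temporary Standby", "other node")
    # S504/S514 and S505/S515: read the data from the own node's auxiliary storage.
    return ("temporary Active", "own node")
```

Treating every non-Active reply the same way is what lets both nodes proceed in parallel as temporary Active nodes when they are powered on at nearly the same time.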

Thereafter, when the data management unit 121 of node 1 completes startup, it outputs a startup completion notification to the system management unit 120 (S626). On receiving this notification, the system management unit 120 of node 1 queries the other node (node 2) for its state and decides to make the own node (node 1) a Standby node (S627). The system management unit 120 of node 1 then outputs a Standby instruction to the data management unit 121 (S628). In response, the data management unit 121 executes, in the own node state management unit 125, the Standby processing of the own node (the data management unit 121 of the own node) (S629). When the own node state management unit 125 completes the Standby processing of the own node (the data management unit 121 of the own node), it outputs a Standby completion notification to the system management unit 120 (S630) and ends the processing.

In this way, when node 1 fails to start and node 2 becomes the Active node, node 1 first enters the temporary Standby state and then starts as the Standby node.

Although the description is omitted here, node 2, which has started as the Active node through the processing described above, begins sending the journal to node 1, which has started as the Standby node.
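How a node might combine a checkpoint with journal entries received afterwards can be sketched as a simple replay. The patent does not specify the record format; the `(operation, key, value)` tuples and the dictionary-based database below are assumptions made purely for illustration.

```python
def apply_journal(checkpoint, journal):
    """Rebuild the database state from a checkpoint plus journal entries.

    checkpoint: dict snapshot of the database (hypothetical representation).
    journal: iterable of (operation, key, value) updates made after the
        checkpoint, in the order they occurred (hypothetical format).
    """
    database = dict(checkpoint)  # leave the checkpoint itself untouched
    for operation, key, value in journal:
        if operation == "put":
            database[key] = value
        elif operation == "delete":
            database.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation: {operation}")
    return database
```

Replaying entries in order is what keeps the Standby node's copy equal to the Active node's database as updates continue to arrive.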

The embodiment described above covers the case in which a node in the temporary Standby state starts as the Standby node, but such a node may also start as the Active node. For example, if node 2 ceases to be in the Active state because of a failure or the like after node 1 has entered the temporary Standby state, node 1 detects this and transitions from the temporary Standby state to the Active state. That is, node 1 becomes the Active node and starts the service.
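The state transitions discussed so far, including the promotion of a temporary Standby node to Active, can be summarized as a lookup table. The instruction names `"activate"` and `"standby"` and the function `next_state` are labels chosen here for illustration, not identifiers from the patent.

```python
# (current state, instruction from the system management unit) -> next state
TRANSITIONS = {
    ("temporary Active", "activate"): "Active",    # e.g. node 2 in S613 to S614
    ("temporary Active", "standby"): "Standby",    # after discard and reload (S404 to S406)
    ("temporary Standby", "standby"): "Standby",   # e.g. node 1 in S628 to S629
    ("temporary Standby", "activate"): "Active",   # other node stopped being Active
}


def next_state(state, instruction):
    """Look up the state a node's data management unit moves to."""
    try:
        return TRANSITIONS[(state, instruction)]
    except KeyError:
        raise ValueError(f"no transition for {state!r} on {instruction!r}") from None
```

Only the second row requires reloading data; the other three transitions can complete without further reads.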

Because nodes 1 and 2 of the redundant configuration system perform the processing described above, the service can be started promptly even if a failure occurs while nodes 1 and 2 are starting up.

The redundant configuration system according to the present embodiment can be realized by having nodes 1 and 2, which are computers, execute a predetermined program. The program can be provided stored on a computer-readable storage medium (such as a CD-ROM), or provided over a network such as the Internet.

FIG. 1 is a diagram explaining the outline of the redundant configuration system of the present embodiment.
FIG. 2 is a block diagram showing the configuration of each node in FIG. 1.
FIG. 3 is a flowchart illustrating an operation procedure of each node in FIG. 2.
FIG. 4 is a flowchart illustrating an operation procedure of each node in FIG. 2.
FIG. 5 is a flowchart illustrating an operation procedure of each node in FIG. 2.
FIG. 6 is a flowchart illustrating an operation procedure of each node in FIG. 2.
FIG. 7 is a diagram illustrating a conventional redundant configuration system.
FIG. 8 is a diagram cited to explain a problem of the prior art.
FIG. 9 is a diagram cited to explain a problem of the prior art.

Explanation of symbols

1, 2, 3, 4 Node
11, 21 Input/output unit
12, 22 Processing unit
13, 23 Auxiliary storage unit
14, 24 Storage unit
120, 220, 320, 420 System management unit
121, 221, 321, 421 Data management unit
122, 222 Interface unit
123, 223 State control unit
124, 224 Other node state management unit
125, 225 Own node state management unit
126, 226 Data load unit
127, 227 Own node data load unit
128, 228 Other node data load unit
140, 240 Database
131, 231 Checkpoint
132, 232 Journal

Claims

A node control method for a redundant configuration system including a plurality of nodes, in which, when the plurality of nodes are started, the nodes confirm each other's states via a network, one of the plurality of nodes is made an Active node that provides a service, and the nodes other than the Active node are made Standby nodes, wherein
the data management unit of the node,
upon receiving a startup instruction for the own node from the system management unit of the node, transmits, via the network to another node in the redundant configuration system, inquiry information asking whether the data management unit of the other node is in the Active state, and,
when it determines from the response to the inquiry information from the other node that the data management unit of the other node is not in the Active state,
reads the checkpoint and journal stored in the auxiliary storage unit of the own node from that auxiliary storage unit onto the storage unit of the own node;
the system management unit of the node
selects, from among the plurality of nodes, the node that first completed reading of the checkpoint and journal as the Active node, and selects the other nodes as Standby nodes; and
the data management unit of the node,
when the system management unit of the node has selected the own node as the Active node and an Activation instruction for the own node is received from the system management unit of the node,
places the data management unit of the own node in the Active state.

The node control method according to claim 1, wherein the data management unit of the node,
when the system management unit of the node has selected the own node as the Standby node and a Standby instruction for the own node is received from the system management unit of the node,
discards the checkpoint and journal on the storage unit of the own node,
reads, via the network, the checkpoint and journal from the auxiliary storage unit of the other node selected as the Active node onto the storage unit of the own node, and
places the data management unit of the own node in the Standby state.

The node control method according to claim 1 or 2, wherein the data management unit of the node,
when it determines from the response to the inquiry information from the other node that the data management unit of the other node is in the Active state,
reads, via the network, the checkpoint and journal from the auxiliary storage unit of the other node selected as the Active node onto the storage unit of the own node, and,
when the system management unit of the node has selected the own node as the Standby node and a Standby instruction for the own node is received from the system management unit of the node,
places the data management unit of the own node in the Standby state.

The node control method according to claim 1 or 2, wherein the data management unit of the node,
when it determines from the response to the inquiry information from the other node that the data management unit of the other node is in the Active state,
reads, via the network, the checkpoint and journal from the auxiliary storage unit of the other node selected as the Active node onto the storage unit of the own node, and,
when, after the checkpoint and journal have been read from the auxiliary storage unit of the other node onto the storage unit of the own node, an Activation instruction for the own node is received from the system management unit of the node because the system management unit of the node has detected that the other node is no longer in the Active state,
places the data management unit of the own node in the Active state.

The node control method according to claim 3 or 4, wherein the data management unit of the node,
when it determines from the response to the inquiry information from the other node that the data management unit of the other node is in the Active state,
and the checkpoint and journal cannot be read when reading them, via the network, from the auxiliary storage unit of the other node selected as the Active node onto the storage unit of the own node,
reads the checkpoint and journal from the auxiliary storage unit of the own node.

A node control program for causing a node which is a computer to execute the node control method according to any one of claims 1 to 5.

A node used in a redundant configuration system including an Active node that provides a service and a Standby node that stands by for the Active node, the node comprising:
an other node state management unit that, upon receiving a startup instruction for the own node from the system management unit of the node, transmits, via a network to another node in the redundant configuration system, inquiry information asking whether the data management unit of the other node is in the Active state, and determines from the response to the inquiry information from the other node whether the data management unit of the other node is in the Active state;
an own node data load unit that, when the other node state management unit determines that the data management unit of the other node is not in the Active state, reads the checkpoint and journal stored in the auxiliary storage unit of the own node from that auxiliary storage unit onto the storage unit of the own node;
a system management unit that selects, from among the nodes of the redundant configuration system, the node that first completed reading of the checkpoint and journal as the Active node, and selects the other nodes as Standby nodes; and
an own node state management unit that, when the system management unit of the node has selected the own node as the Active node and an Activation instruction for the own node is received from the system management unit, places the data management unit of the own node in the Active state.

In the other node state management unit, when the system management unit of the node has selected the local node as the standby node, when the standby instruction of the local node is received from the system management unit of the node,
The other node data load unit
discards the checkpoint and journal stored on the storage unit of the own node, and reads, via the network, the checkpoint and journal stored in the auxiliary storage unit of the other node selected as the Active node from that auxiliary storage unit onto the storage unit of the own node,
The own node state management unit
The node according to claim 7, wherein the data management unit of the own node is set to Standby.

The node according to claim 7 or 8, wherein the data management unit of the node,
when it determines from the response to the inquiry information from the other node that the data management unit of the other node is in the Active state,
reads, via the network, the checkpoint and journal from the auxiliary storage unit of the other node selected as the Active node onto the storage unit of the own node, and,
when the system management unit of the node has selected the own node as the Standby node and a Standby instruction for the own node is received from the system management unit of the node,
places the data management unit of the own node in the Standby state.

The node according to claim 7 or 8, wherein the data management unit of the node,
when it determines from the response to the inquiry information from the other node that the data management unit of the other node is in the Active state,
reads, via the network, the checkpoint and journal from the auxiliary storage unit of the other node selected as the Active node onto the storage unit of the own node, and,
when, after the checkpoint and journal have been read from the auxiliary storage unit of the other node onto the storage unit of the own node, an Activation instruction for the own node is received from the system management unit of the node because the system management unit of the node has detected that the other node is no longer in the Active state,
places the data management unit of the own node in the Active state.