JPH07311748A

JPH07311748A - Fault recovery system of decentralized data base system

Info

Publication number: JPH07311748A
Application number: JP6102784A
Authority: JP
Inventors: Takashi Ozaki (尾崎 たかし)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1994-05-17
Filing date: 1994-05-17
Publication date: 1995-11-28

Abstract

PURPOSE: To provide a failure recovery method for a distributed database that minimizes the time required for failure recovery even when the distributed database is large and complex, reduces the number of detour paths needed to avoid unrecoverable states caused by communication path failures, and allows the failure recovery state of the entire distributed database to be confirmed at each node.

CONSTITUTION: The plural nodes 11 that constitute the distributed database are logically divided into recovery node groups 13. A recovery control node 15, positioned above the nodes 11, controls the failure recovery of a recovery node group 13 and is connected to each of its nodes 11 by communication paths 14 over which recovery information and confirmation information are exchanged. The plural recovery control nodes 15 are in turn logically divided into recovery node groups 17 and connected through communication paths 18 to recovery control nodes positioned one layer above, forming a hierarchical structure in which each recovery control node controls the failure recovery of the nodes positioned below it.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a failure recovery method that quickly recovers the entire system when a system failure, including a failure of a communication path or a node, occurs in a large-scale distributed database system composed of network nodes connected by a communication network or of loosely coupled parallel processors.

[0002]

[Prior Art] FIG. 10 is a configuration diagram of a distributed database disclosed in, for example, Japanese Patent Laid-Open No. 1-194040. Processors 3a and 3b of a client node and a server node are each connected to a communication network 4. Each node has a database management system 2a, 2b, a database 5a, 5b, history information 6a, 6b, and a transaction state management table 1a, 1b.

[0003] In the conventional recovery method for a distributed database system, the end state of each transaction is held in the transaction state management tables 1a and 1b, and at failure recovery the transaction state management tables 1a and 1b are restored. After communication resumes, the nodes exchange messages based on the contents of the transaction state management tables 1a and 1b, and each node recovers using its history information 6a, 6b.

[0004]

[Problems to Be Solved by the Invention] In the conventional recovery method, however, messages are exchanged between each pair of nodes that were directly engaged in database access at the time of the failure. When a system failure occurs in a large-scale distributed database, the volume of message transfer therefore grows with the number of nodes that were accessing the database, and the time until failure recovery of the entire system completes increases accordingly.

[0005] Consider the time required for failure recovery in terms of the number of message exchanges. With n nodes constituting the distributed database, the conventional method requires between 2n exchanges at minimum, when the pairs of nodes engaged in access are all distinct, and n(n-1) exchanges at maximum, when every node is engaged in access with the other nodes. The larger and more complex the access pattern, the more enormous the number of message exchanges becomes.
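
As a rough illustration of these bounds, the following sketch simply evaluates the two quoted formulas, 2n and n(n-1), for a few system sizes. The function name is hypothetical, and the sketch assumes n ≥ 3, the range in which the quoted minimum does not exceed the quoted maximum.

```python
def conventional_exchange_bounds(n: int) -> tuple:
    """Return (minimum, maximum) message exchanges for pairwise recovery.

    Restates the figures quoted in the text: 2n at minimum, n(n-1) at maximum.
    """
    if n < 3:
        raise ValueError("the quoted bounds are only ordered for n >= 3")
    return 2 * n, n * (n - 1)


if __name__ == "__main__":
    for n in (4, 16, 256):
        low, high = conventional_exchange_bounds(n)
        print(f"n={n}: between {low} and {high} exchanges")
```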

[0006] Furthermore, because the time required for recovery depends on both the number of nodes and the complexity of the distributed database access, it varies widely, and the recovery time cannot be determined until the recovery process has actually finished.

[0007] In addition, in the conventional method, in which recovery is carried out between the two nodes that were directly engaged in database access, a detour for bypassing a failure of the communication path between two nodes must be provided between every pair of nodes that might perform distributed database access. Completely avoiding communication failures by means of detours therefore requires an enormous number of detours, n(n-1)/2.

[0008] Moreover, because conventional failure recovery is carried out by message exchange between the two nodes that were directly engaged in database access at the time of the failure, the parties to that access can confirm each other's recovery, but there is no means of checking the state of the distributed database as a whole, for example whether recovery has completed for the entire distributed database or whether inaccessible nodes remain because recovery has not completed.

[0009] The present invention has been made to solve these problems. Its object is to provide a failure recovery method for a distributed database that minimizes the time required for failure recovery even in a large-scale, complex distributed database, reduces the number of detours needed to avoid unrecoverable states caused by communication path failures, and allows the failure recovery state of the entire distributed database to be confirmed at each node.

[0010]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 is characterized in that, in a distributed database system in which a plurality of nodes each having a processor and a database are connected by a network and processing at each node accesses the databases of other nodes, the plurality of nodes are logically divided into groups and a recovery control node that controls failure recovery is provided for each group, so that failure recovery control of the nodes is performed in a tree structure.

[0011] The invention according to claim 2 is characterized in that, in the failure recovery method for a distributed database system according to claim 1, the plurality of recovery control nodes are logically divided into groups and a higher-level recovery control node that controls failure recovery is provided for each of these groups, so that failure recovery control of the nodes is performed hierarchically.

[0012] The invention according to claim 3 is characterized in that, in the failure recovery method for a distributed database system according to claim 2, layers are formed until the higher-level recovery control nodes are finally reduced to a single node.

[0013] The invention according to claim 4 is characterized in that, in the failure recovery method for a distributed database system according to claim 1 or 2, a detour is provided between nodes located at upper and lower levels of the tree structure.

[0014] The invention according to claim 5 is characterized in that, in the failure recovery method for a distributed database system according to any one of claims 1 to 4, the recovery control node has a mechanism for inquiring of a communication partner node when the recovery information needed to determine the state of an uncertain transaction cannot be obtained from that node.

[0015] The invention according to claim 6 is characterized in that, in the failure recovery method for a distributed database system according to any one of claims 1 to 5, the recovery control node has a mechanism for notifying the lower-level nodes of the failure recovery state.

[0016]

[Operation] In the failure recovery method for a distributed database configured as described above, when a system failure of the distributed database occurs, each node sends information on the recovery of the distributed database it was accessing to the recovery control node logically positioned above it. For failures that can be resolved within the group whose failure recovery it controls, the higher-level recovery control node notifies each node in the group of the result. For failures that cannot be resolved within the group, it sends the recovery information on to a still higher recovery control node. By repeating this hierarchically up to the top-level recovery control node, all recovery information for uncertain transactions is collected and the transaction states are determined. The determined transaction states are reported to each node by tracing the hierarchy in reverse, and the nodes affected by the failure carry out recovery.

[0017] Even when the communication path between upper- and lower-level nodes fails and messages can no longer be exchanged over it, the provision of detours allows messages such as recovery information reports and confirmation information to be exchanged between the nodes.

[0018] Furthermore, even when a higher-level recovery control node cannot obtain from a communication partner node the recovery information needed to determine the state of an uncertain transaction, it can send that node a message inquiring about the transaction state and obtain a response message indicating the transaction state, so that the state of the uncertain transaction can be determined and recovery carried out.

[0019] In addition, upon full or partial recovery from a failure, the higher-level recovery control node notifies the lower-level nodes constituting the distributed database of the recovery state of the distributed database system, so that users can be informed that distributed database service has resumed.

[0020]

[Embodiments] Preferred embodiments of the present invention are described below with reference to the drawings.

[0021] Embodiment 1. FIG. 1 is a conceptual diagram of a distributed database for explaining the gist of the first embodiment. The plurality of nodes 11, each having a processor and a database 12 and together constituting the distributed database, are logically divided into recovery node groups 13. A recovery control node 15, which has a processor and a database 16 and is positioned above the nodes 11, controls failure recovery of a recovery node group 13. Although the recovery control node 15 is logically positioned above the nodes 11 in the tree structure, it may be one of the nodes 11 included in the recovery node group 13. The recovery control node 15 and each node 11 in the recovery node group 13 are connected by a communication path 14 over which recovery-related information is exchanged.

[0022] Furthermore, like the nodes 11 described above, the plurality of recovery control nodes 15 are logically divided into recovery node groups 17, and failure recovery of each recovery control node 15 is controlled, via a communication path 18, by a recovery control node positioned one layer above. Layers are formed in this way until a single higher-level recovery control node remains.
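
The layering described here, grouping nodes and then grouping their recovery control nodes until one remains, can be sketched as follows. This is a minimal illustration under stated assumptions: the fixed `group_size`, the generated identifiers of the form `C<layer>-<index>`, and the use of separate control nodes (rather than promoting a group member) are all hypothetical choices, not details taken from the patent.

```python
from typing import Dict, List, Optional


def build_hierarchy(nodes: List[str], group_size: int) -> Dict[str, Optional[str]]:
    """Return a map from each node to its recovery control node (None at the top)."""
    if not nodes:
        raise ValueError("at least one node is required")
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so the layers shrink")
    parent: Dict[str, Optional[str]] = {}
    current = list(nodes)
    layer = 0
    while True:
        layer += 1
        upper = []
        # Split the current layer into groups and give each group a control node.
        for start in range(0, len(current), group_size):
            control = f"C{layer}-{start // group_size}"  # hypothetical identifier
            for member in current[start:start + group_size]:
                parent[member] = control
            upper.append(control)
        current = upper
        if len(current) == 1:  # a single top-level recovery control node remains
            break
    parent[current[0]] = None
    return parent
```

With eight nodes and a group size of two, this produces three layers of control nodes, the same depth as the configuration of FIG. 2.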

[0023] The characteristic feature of this embodiment is as follows. When a system failure of the distributed database occurs, each node 11 sends information on the recovery of the distributed database it was accessing to the recovery control node 15 positioned above it. For failures that can be resolved within the recovery node group 13, the recovery control node 15 notifies each node 11 of the result; for failures that cannot be resolved within the recovery node group 13, it sends the recovery information to a higher-level recovery control node. By repeating this hierarchically up to the top-level recovery control node, all information on the recovery of uncertain transactions is collected and the transaction states are determined. The determined transaction states are then reported to the nodes concerned by tracing the hierarchy in reverse, and recovery is carried out. Because recovery control of the nodes is performed hierarchically in this way, the increase in the number of messages exchanged within the system is kept to a minimum even when the distributed database system is large.

[0024] FIG. 2 is a configuration diagram of a distributed database for explaining the first embodiment. The plurality of lowest-level nodes 21 constituting the distributed database are divided into lowest-layer recovery node groups 22. Each node 21 is connected via a communication path 23 to a first-layer recovery control node 24 that controls recovery of a lowest-layer recovery node group 22. The plurality of first-layer recovery control nodes 24 are divided into first-layer recovery node groups 25. Each first-layer recovery control node 24 is connected via a communication path 26 to a second-layer recovery control node 27 that controls recovery of a recovery node group 25. The plurality of second-layer recovery control nodes 27 are divided into a second-layer recovery node group 28. Each second-layer recovery control node 27 is connected via a communication path 29 to the recovery control node 30, which belongs to the third layer, is the top-level node in this embodiment, and controls recovery of the recovery node group 28.

[0025] FIGS. 3, 4, and 5 are flowcharts of the recovery processing at the time of a failure in the lowest-level nodes 21, the intermediate-layer recovery control nodes 24 and 27, and the top-level recovery control node 30, respectively. The recovery operation of the first embodiment is described below with reference to these figures.

[0026] First, the recovery processing at the time of a failure in a lowest-level node 21 is described with reference to FIG. 3.

[0027] When a system failure of the distributed database occurs, a lowest-level node 21 determines in step 31 whether it was performing distributed database access with another node at the time of the failure. If so, in step 32 it sends to the first-layer recovery control node 24, through the communication path 23, a message that contains, for every transaction being accessed, identifiers that specify the addresses of the own node and the partner node and the transaction, together with information indicating the own node's transaction state, which is one of commit, rollback, or uncertain under the recovery protocol (hereinafter, recovery information). If step 33 finds that the node has an uncertain transaction, the node waits in step 34 for notification of transaction confirmation information from the recovery control node 24 and, after receiving it, commits or rolls back the database in accordance with the confirmation information to settle the transaction.
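
One possible shape for the recovery information assembled in step 32 is sketched below. The class, field, and function names are hypothetical; the patent specifies only that the message identifies the own node, the partner node, and the transaction, and carries a state of commit, rollback, or uncertain.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class TxState(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class RecoveryInfo:
    own_address: str
    partner_address: str
    transaction_id: str
    state: TxState


def build_recovery_report(
    own_address: str, active: List[Tuple[str, str, TxState]]
) -> List[RecoveryInfo]:
    """Steps 31-32: report every transaction that involved another node.

    `active` holds (partner_address, transaction_id, state) tuples.
    """
    return [
        RecoveryInfo(own_address, partner, tx_id, state)
        for partner, tx_id, state in active
        if partner != own_address  # purely local work is not reported
    ]


def must_wait_for_confirmation(report: List[RecoveryInfo]) -> bool:
    """Step 33: the node waits only if it holds an uncertain transaction."""
    return any(info.state is TxState.UNCERTAIN for info in report)
```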

[0028] Next, the recovery processing at the time of a failure in the intermediate-layer recovery control nodes 24 and 27 is described with reference to FIG. 4.

[0029] The first-layer recovery control node 24 performs reception processing in accordance with step 41 and decision 42. If decision 43 finds that the node that was the communication partner, at the time of the failure, for a transaction reported as uncertain is either the recovery control node 24 itself or a node in the lowest-layer recovery node group 22 that it controls, and the commit or rollback state of that transaction can therefore be determined, the recovery control node 24 notifies the node holding the uncertain transaction of the determined transaction state in step 46. If decision 43 finds that the communication partner is neither the recovery control node 24 nor a node in the recovery node group 22, then in step 44 the recovery control node 24 sends to the second-layer recovery control node 27 above it, through the communication path 26, a message containing all the recovery information of those nodes, with its own recovery information added if the recovery control node 24 was itself performing distributed database access with a node outside the recovery node group 22. In step 45 it waits for notification of transaction confirmation information from the second-layer recovery control node 27 and, after receiving it, notifies the node that reported the uncertain transaction of the determined transaction state in step 46.
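
The decision at an intermediate recovery control node, resolve within the group or pass upward, might be sketched as below. The dictionary layout and the resolution rule (an uncertain transaction takes the partner's commit or rollback state) are assumptions made for illustration; the patent leaves the rule to the recovery protocol.

```python
DEFINITE = {"commit", "rollback"}


def triage(reports, group_members):
    """Split reports into locally resolved outcomes and reports to forward upward.

    Each report is a dict with keys "node", "partner", "tx", and "state".
    """
    by_key = {(r["node"], r["tx"]): r for r in reports}
    resolved, forwarded = {}, []
    for r in reports:
        partner_report = by_key.get((r["partner"], r["tx"]))
        partner_is_local = r["partner"] in group_members
        if r["state"] == "uncertain":
            if (partner_is_local and partner_report is not None
                    and partner_report["state"] in DEFINITE):
                # Decision 43 -> step 46: the group itself can settle this one.
                resolved[(r["node"], r["tx"])] = partner_report["state"]
            else:
                forwarded.append(r)  # step 44: send to the upper control node
        elif not partner_is_local:
            # A definite state that a node in another group may be waiting for.
            forwarded.append(r)
    return resolved, forwarded
```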

[0030] The second-layer recovery control node 27 performs the same operations that the first-layer recovery control node 24 performs for the nodes 21 of the recovery node group 22. That is, if the node that was each node's communication partner at the time of the failure is the recovery control node 27 itself, or is in the recovery node group 25 below the recovery control node 27 or in a recovery node group on the same layer as the recovery node group 22 further below, and the commit or rollback state of the transaction can be determined, the recovery control node 27 notifies the node holding the uncertain transaction of the determined transaction state in accordance with step 46. If, on the other hand, the communication partner is not found anywhere below the recovery control node 27, then in step 44 the recovery control node 27 sends to the top-level recovery control node 30, through the communication path 29, a message containing the recovery information of all the nodes below it together with its own recovery information.

[0031] Next, the recovery processing at the time of a failure in the top-level recovery control node 30 is described with reference to FIG. 5.

[0032] In the first embodiment the recovery control node 30 is the recovery control node of the top layer, so all recovery node groups lie hierarchically below it. In steps 51 and 52 it therefore receives, within a predetermined waiting time, the recovery information from all nodes 27, 24, and 21 belonging to the lower-layer recovery node groups 28, 25, and 22. In step 53, if the recovery information has been reported within the waiting time, the communication partners of every node are in principle all identified and the uncertain state of every transaction is resolved, and the node that reported the uncertain transaction is notified of the determined transaction state. However, the recovery information of a communication partner may be unobtainable because of, for example, a failure of the communication paths 23, 26, or 29. In that case, in step 54, for any transaction for which recovery information has been reported but the partner node's recovery information has not, the reported recovery information is retained so that confirmation information can be sent when the missing recovery information is obtained later. For uncertain transactions whose state has been determined, confirmation information is sent in step 55 via the communication path 29 to the nodes holding those transactions. In steps 45 and 46, the intermediate-layer recovery control nodes 24 and 27 pass the confirmation information from the top-level recovery control node 30 down to the nodes holding uncertain transactions, following the hierarchical route in the reverse direction to that used for reporting the recovery information. Each node holding an uncertain transaction settles the transaction on receiving the confirmation information in step 34.
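
The retention behaviour of step 54 can be illustrated with a small sketch of a top-level node that holds a report until the partner's report arrives. The class name, the dictionary layout, and the rule that a definite state from either side settles the transaction are hypothetical; two uncertain reports are simply kept on hold here because the patent does not say how they would be settled.

```python
DEFINITE_STATES = {"commit", "rollback"}


class TopRecoveryNode:
    def __init__(self):
        self.held = {}  # transaction id -> reports received so far

    def receive(self, report):
        """Return confirmations as (node, tx, outcome) tuples, or [] while holding."""
        reports = self.held.setdefault(report["tx"], [])
        reports.append(report)
        outcome = next(
            (r["state"] for r in reports if r["state"] in DEFINITE_STATES), None
        )
        if len(reports) < 2 or outcome is None:
            return []  # step 54: partner information missing, keep the report
        del self.held[report["tx"]]
        # Step 55: confirm only to the nodes whose transaction was uncertain.
        return [
            (r["node"], r["tx"], outcome)
            for r in reports
            if r["state"] == "uncertain"
        ]
```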

[0033] With the recovery method of this embodiment, even when large-scale access spans the entire distributed database, recovery always completes with no more than 2n message exchanges, where n is the number of nodes constituting the distributed database, regardless of the complexity of the distributed database access. Conventional recovery between pairs of nodes required, depending on the complexity of the access, at least 2n exchanges and at most n(n-1) exchanges, a quadratic function of the number of nodes, so the number of message exchanges is greatly reduced.
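
One way to see where a bound of this kind comes from is to count the links of the tree. The sketch below assumes that each link carries at most one upward report and one downward confirmation, and that n counts every node in the tree including the recovery control nodes; both are simplifying assumptions of this illustration, and the node names are invented.

```python
from typing import Dict, Optional


def hierarchical_exchange_count(parent: Dict[str, Optional[str]]) -> int:
    """Upper bound on messages: one report up and one confirmation down per link."""
    links = sum(1 for upper in parent.values() if upper is not None)
    return 2 * links


def conventional_worst_case(n: int) -> int:
    """Worst case quoted for pairwise recovery: n(n-1)."""
    return n * (n - 1)


# A hypothetical seven-node tree: one top node, two control nodes, four leaves.
example_tree = {
    "top": None,
    "c1": "top", "c2": "top",
    "a": "c1", "b": "c1", "c": "c2", "d": "c2",
}
```

For `example_tree`, the count is 2 × 6 links, which stays below 2n for n = 7, while the pairwise worst case for the same n is 7 × 6.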

[0034] Embodiment 2. FIG. 6 is a conceptual diagram of a distributed database for explaining the gist of the second embodiment. Elements that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

[0035] The characteristic feature of the second embodiment is that detours are provided between nodes located at upper and lower levels of the tree structure. The second embodiment has a detour communication path 19 that directly connects a node 11 and the recovery control node 15, and a detour communication path 20 that connects them indirectly via another node 11 in the same recovery node group 13.

[0036] As a result, even if some communication failure occurs on the communication path 14 to the recovery control node 15 during the recovery operation of the first embodiment and messages can no longer be exchanged with the recovery control node 15 over the communication path 14, recovery information can still be reported to, and confirmation information received from, the recovery control node 15 via the direct detour communication path 19, or via the indirect detour communication path 20 and an adjacent node 11. The failure recovery control processing itself follows the same procedure as in the first embodiment.

[0037] With detours set up in this way, where n is the number of nodes constituting the distributed database, conventional recovery between pairs of nodes required n(n-1)/2 detours, whereas in this embodiment n direct detour communication paths 19 suffice. The larger the distributed database system, the more pronounced this difference becomes.
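
The difference in detour counts can be made concrete by enumerating the detours each scheme would need. This is only a counting sketch; the function names and the single shared control node are assumptions for illustration.

```python
from itertools import combinations
from typing import List, Tuple


def pairwise_detours(nodes: List[str]) -> List[Tuple[str, str]]:
    """Conventional scheme: one detour per pair of nodes, n(n-1)/2 in total."""
    return list(combinations(nodes, 2))


def hierarchical_detours(nodes: List[str], control: str) -> List[Tuple[str, str]]:
    """Second embodiment: one direct detour (path 19) per node, n in total."""
    return [(node, control) for node in nodes]
```

For five nodes the conventional scheme needs ten detours against five in the hierarchical scheme, and the gap widens quadratically as nodes are added.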

[0038] In this embodiment, the detour configuration that characterizes the embodiment has been described using the relationship between a lowest-level node 11 and the recovery control node 15 above it, but detours can be set up in the same way between other layers or within other recovery node groups.

[0039] Embodiment 3. FIG. 7 is a configuration diagram of a distributed database for explaining the third embodiment. Elements that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

[0040] The characteristic feature of this embodiment is that a node is given detours to nodes other than the recovery control node that controls its recovery. In the third embodiment, for a given node, for example node 24a, a detour communication path 71 is set up to another recovery control node 27b belonging to the same layer as the node's higher-level recovery control node 27a, and a detour communication path 72 is set up to the recovery control node 30, which belongs to a layer higher still than the recovery control node 27a.

[0041] As a result, even if the communication path 26 between the nodes 24a and 27a fails, or the recovery control node 27a of the recovery node group 25a to which the node 24a belongs goes down, the node can exchange messages with the other recovery control node 27b or with the recovery control node 30 via the detour communication path 71 or 72 and have that recovery control node perform the recovery control operation on behalf of 27a, so that recovery can still be carried out. The failure recovery control processing follows the same procedure as in the second embodiment.
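
The fallback from the regular path to a detour could be expressed as an ordered route list, as in the sketch below. The preference order (path 26, then 71, then 72), the availability callback, and the function name are assumptions for illustration; the patent does not state which detour is tried first.

```python
from typing import Callable, List, Optional, Tuple

Route = Tuple[str, str]  # (communication path id, recovery control node id)


def choose_route(
    routes: List[Route], is_available: Callable[[str, str], bool]
) -> Optional[Route]:
    """Return the first route whose path and control node are both usable."""
    for path_id, control_node in routes:
        if is_available(path_id, control_node):
            return path_id, control_node
    return None  # no path left: recovery must wait


# Routes for node 24a, using the reference numerals of FIG. 7.
routes_24a = [("26", "27a"), ("71", "27b"), ("72", "30")]
```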

[0042] In this embodiment, the detour configuration that characterizes the embodiment has been described using the relationship between the recovery control node 24 located in the first layer and the recovery control nodes 27 and 30 located above it. The configuration is not limited to this, however, and detours can be set in the same way between other layers or within a recovery node group.

[0043] Embodiment 4. The characteristic feature of this embodiment is that, in the configuration of FIG. 2 or FIG. 7, the top-level recovery control node 30 has a mechanism for querying a communication-partner recovery control node 27 when it cannot obtain from that node the recovery information needed to resolve the state of an indeterminate transaction. This makes it possible to obtain later the recovery information that could not be obtained during normal processing because of, for example, a failure of the communication path 29.

[0044] FIG. 8 is a flowchart of the recovery processing of the top-level node in the fourth embodiment. The inquiry processing of this embodiment is described below with reference to FIG. 8.

[0045] As in the processing shown in FIG. 5, in steps 51, 52, and 55 the top-level recovery control node 30 in principle determines all communication partners of each node, in principle also resolves the indeterminate state of each transaction, and notifies each node that reported an indeterminate transaction of the resolved transaction state.

[0046] However, the recovery information of a communication partner may be unobtainable because of a communication path failure or a delay in the partner's processing. In that case, following the determination in step 53, an inquiry message is sent in step 84 to each node whose recovery information could not be obtained, either directly or via the recovery control nodes 27 and other nodes of the tree structure. A node that receives the inquiry message sends its recovery information, so that the recovery control node 30 can obtain recovery information from the nodes from which it had not been obtained. If a node does not respond to the inquiry, that fact is recorded and the node is treated as unrecoverable.
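The step 84 inquiry can be sketched as below. This is a minimal illustration under assumptions: `collect_missing_recovery_info` and the `inquire` callback are hypothetical names, `inquire` stands for "send an inquiry message directly or through the tree and return the reply, or `None` if the node does not answer", and node "27c" and the transaction-state strings are invented for the example.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

# Hypothetical callback: returns a node's recovery information,
# or None when the node does not respond to the inquiry.
Inquire = Callable[[str], Optional[dict]]


def collect_missing_recovery_info(
    expected_nodes: Iterable[str],
    received: Dict[str, dict],
    inquire: Inquire,
) -> Tuple[Dict[str, dict], Set[str]]:
    """Query every node whose recovery information has not arrived.

    Returns the completed recovery information and the set of nodes
    that did not respond, which are recorded as unrecoverable.
    """
    info = dict(received)              # copy so the caller's mapping is untouched
    unrecoverable: Set[str] = set()
    for node in expected_nodes:
        if node in info:
            continue                   # recovery information already obtained
        reply = inquire(node)          # direct, or relayed via the tree
        if reply is None:
            unrecoverable.add(node)    # no response: hold as unrecoverable
        else:
            info[node] = reply
    return info, unrecoverable


# Node 27a reported normally; 27b answers the inquiry; 27c stays silent.
replies = {"27b": {"T2": "abort"}}
info, unrecoverable = collect_missing_recovery_info(
    ["27a", "27b", "27c"],
    {"27a": {"T1": "commit"}},
    replies.get,
)
```

A real implementation would also need a timeout to decide when a node counts as not responding; the description does not specify one.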

[0047] As a result, except when a communication path including the detours has failed or a communication-partner node is down, the recovery information of the entire distributed database system can be obtained with minimal delay when processing is merely late because of, for example, a delay in the partner's processing, and the failure recovery processing completes in the minimum time. In addition, the nodes that cannot be recovered at that point can be identified reliably.

[0048] This embodiment has been described for the top-level recovery control node 30, which obtains the recovery information of the entire distributed database system, but the same mechanism may also be implemented in recovery control nodes of other layers.

[0049] Embodiment 5. The characteristic feature of this embodiment is that, in the configuration of FIG. 2 or FIG. 7, the top-level recovery control node 30 has a mechanism for notifying the nodes located below it, such as the nodes 27, of the failure recovery status. This makes it possible to inform all nodes of the recovery status of the distributed database system.

[0050] FIG. 9 is a flowchart of the recovery processing of the top-level node in the fifth embodiment. The processing for notifying the failure recovery status in this embodiment is described below with reference to FIG. 9.

[0051] As in the processing shown in FIG. 8, in steps 51, 52, and 55 the top-level recovery control node 30 in principle determines all communication partners of each node, in principle also resolves the indeterminate state of each transaction, and notifies each node that reported an indeterminate transaction of the resolved transaction state. In step 84, it obtains recovery information by sending an inquiry message to each node whose recovery information could not be obtained.

[0052] In step 95, a message containing system recovery status information is sent to all nodes through each layer. This information indicates the recovery status of the entire distributed database system, such as whether any unrecoverable nodes were identified in step 84. To nodes that had reported indeterminate transactions, the message is sent together with the resolution information.
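The step 95 notification can be sketched as a walk down the tree. This is a minimal illustration under assumptions: the function name `notify_recovery_status`, the dictionary message layout, and the node identifiers other than 30, 27a, 27b, and 24a are hypothetical, and message delivery is modeled simply as storing the message per node.

```python
from typing import Dict, List


def notify_recovery_status(
    tree: Dict[str, List[str]],
    root: str,
    status: dict,
    resolved: Dict[str, dict],
) -> Dict[str, dict]:
    """Deliver the system recovery status to every node below `root`.

    `tree` maps each recovery control node to the nodes directly below it.
    Nodes listed in `resolved` reported indeterminate transactions, so
    their message also carries the resolved transaction states.
    """
    delivered: Dict[str, dict] = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            message = {"system_recovery_status": status}
            if child in resolved:
                message["resolved_transactions"] = resolved[child]
            delivered[child] = message   # stands in for sending over the path
            stack.append(child)          # the child forwards to its own group
    return delivered


tree = {"30": ["27a", "27b"], "27a": ["24a", "24b"], "27b": ["24c"]}
status = {"unrecoverable_nodes": []}
delivered = notify_recovery_status(tree, "30", status, {"24a": {"T1": "commit"}})
```

Each node receives exactly one message from the node directly above it, so the number of notifications equals the number of links in the tree.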

[0053] As a result, the recovery status of the entire distributed database system, which conventionally could not be grasped at individual nodes, can be confirmed at every node.

[0054] The above embodiments have described a distributed database system built on a network such as communication lines, but it goes without saying that the invention can also be applied to a distributed database system built within a loosely coupled parallel computer system.

[0055]

[Effects of the Invention] As described above, the present invention provides the following effects.

[0056] Let n be the number of nodes constituting the distributed database. When large-scale access spanning the entire distributed database system was in progress at the time of the failure, then no matter how complex that access was, the number of message exchanges required for recovery is always at most 2n, a linear function of n, so the recovery time can be reduced substantially.
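One way to see why a bound of this form is linear is the following sketch. It rests on assumptions that are not taken from the text above: that n counts every node in the tree, including recovery control nodes, and that each link carries exactly one report upward and one notification downward. The counting convention actually used in the description may differ; the function name `recovery_message_count` is hypothetical.

```python
from typing import Dict, List, Tuple


def recovery_message_count(tree: Dict[str, List[str]], root: str) -> Tuple[int, int]:
    """Return (number of nodes, number of messages) for a recovery tree.

    Assumed model: every non-root node sends one report to the node
    above it and receives one notification back, i.e. two messages
    per link.
    """
    nodes = 1          # the root itself
    messages = 0
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            nodes += 1
            messages += 2   # one report up, one notification down
            stack.append(child)
    return nodes, messages


# Illustrative three-layer tree (node names below the top are examples).
example_tree = {"30": ["27a", "27b"], "27a": ["24a", "24b"], "27b": ["24c"]}
n, messages = recovery_message_count(example_tree, "30")
```

A tree with n nodes has n - 1 links, so under this model the count is 2(n - 1), which stays below 2n regardless of how the transactions were spread across nodes at the time of the failure.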

[0057] In addition, because the time required for recovery depends on a linear function of the number of nodes, the completion time of the recovery processing can be estimated.

[0058] Further, when the number of nodes constituting the distributed database is n, the number of detours that must be set so that recovery is not hindered at all even during a communication path failure is n, so the number of detours can be reduced substantially.

[0059] Furthermore, even when the information needed for recovery cannot be obtained, the inquiry mechanism makes it possible to obtain the recovery information with minimal delay when processing is merely late because of, for example, a delay in the communication partner's processing, except when a communication path including the detours has failed or the partner node is down. The failure recovery processing can therefore be completed in the minimum time. In addition, the nodes that cannot be recovered at that point can be identified reliably.

[0060] Also, because a mechanism is provided for notifying each lower-level node of the failure recovery status, the recovery status of the entire distributed database system, which conventionally could not be grasped at individual nodes, can be confirmed at every node. This makes it possible to announce reliably to users the full or partial resumption of operations that use the distributed database.

[Brief description of drawings]

FIG. 1 is a conceptual diagram of a distributed database for explaining the gist of the first embodiment of the present invention.

FIG. 2 is a configuration diagram of a distributed database for explaining the first embodiment.

FIG. 3 is a flowchart of the recovery processing performed at the time of a failure by a node located at the lowest level in the first embodiment.

FIG. 4 is a flowchart of the recovery processing performed at the time of a failure by a recovery control node located in an intermediate layer in the first embodiment.

FIG. 5 is a flowchart of the recovery processing performed at the time of a failure by the recovery control node located at the top level in the first embodiment.

FIG. 6 is a conceptual diagram of a distributed database for explaining the gist of the second embodiment of the present invention.

FIG. 7 is a configuration diagram of a distributed database for explaining the third embodiment of the present invention.

FIG. 8 is a flowchart of the recovery processing of the top-level node in the fourth embodiment of the present invention.

FIG. 9 is a flowchart of the recovery processing of the top-level node in the fifth embodiment of the present invention.

FIG. 10 is a configuration diagram of a conventional distributed database.

[Explanation of symbols]

11, 21: nodes; 12, 16: databases; 13, 17, 22, 25, 28: recovery node groups; 14, 18, 23, 26, 29: communication paths; 15, 24, 27, 30: recovery control nodes; 19, 20, 71, 72: detour communication paths.

Claims

[Claims]

1. A failure recovery method for a distributed database system in which a plurality of nodes, each having a processor and a database, are connected by a network and a process in each node accesses the database of another node, characterized in that the plurality of nodes are logically divided into groups and a recovery control node for controlling failure recovery is provided for each group, so that failure recovery control of the nodes is performed in a tree structure.

2. The failure recovery method for a distributed database system according to claim 1, characterized in that a plurality of said recovery control nodes are logically divided into groups and an upper recovery control node for controlling failure recovery is provided for each group, so that failure recovery control of the nodes is performed hierarchically.

3. The failure recovery method for a distributed database system according to claim 2, wherein a hierarchy is formed until the number of the upper recovery control nodes finally becomes one.

4. The failure recovery method for a distributed database system according to claim 1, wherein a detour is provided between nodes located at an upper level and a lower level of the tree structure.

5. The failure recovery method for a distributed database system according to any one of claims 1 to 4, characterized in that the recovery control node has a mechanism for querying a communication-partner node when it cannot obtain from that node the recovery information for resolving the state of an indeterminate transaction.

6. The failure recovery method for a distributed database system according to claim 1, wherein the recovery control node has a mechanism for notifying a lower-level node of the failure recovery status.