JP2008097155A - Data storage controller and data storage device

Info

Publication number: JP2008097155A
Application number: JP2006275820A
Authority: JP
Inventors: Xiangyong Ouyang; 湘勇欧▲やん▼; Haruo Yokota; 治夫横田; Tomohiro Yoshihara; 朋宏吉原; Masaru Kobayashi; 大小林; Kaku Ejiri; 革江尻; Mitsuhiko Ota; 光彦太田
Original assignee: Fujitsu Ltd; Tokyo Institute of Technology NUC
Current assignee: Fujitsu Ltd; Tokyo Institute of Technology NUC
Priority date: 2006-10-06
Filing date: 2006-10-06
Publication date: 2008-04-24

Abstract

PROBLEM TO BE SOLVED: To improve the efficiency of data processing by suppressing the increase in overhead associated with transactions.

SOLUTION: A primary master transmits a request message to execute an operation. On receiving the execution request message, a primary cohort executes the operation, transmits an ACK message to the primary master, and transmits a message including log information on the operation; a backup cohort receives that message and stores the log information in memory. After all operations have completed, the primary master transmits a request message to execute decision processing. On receiving that request message, the primary cohort executes the decision processing and transmits a message including log information on the decision processing. The backup cohort receives the message and executes the decision processing based on the log information stored in memory.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a data storage control device that performs control for storing data and backup data of that data in different data storage units, and to a data storage device that has a plurality of data storage units and stores data and backup data of that data in different data storage units. In particular, it relates to a data storage control device and a data storage device that can suppress an increase in the overhead associated with transactions and improve data-processing efficiency.

Conventionally, autonomous disks have been proposed in which each disk connected via a network such as a SAN (Storage Area Network) manages data autonomously using the processor and memory on its disk controller, and communicates with the other disks to execute the data processing requested by a host (see Non-Patent Document 1).

To make data management more reliable, an autonomous disk stores a backup of the data held on one disk on a different disk. If a disk fails and its data can no longer be read, the backed-up data is read instead and data processing continues.

In such a system, guaranteeing the atomicity of transactions is extremely important. Commit protocols that achieve this include the two-phase commit (2PC) protocol and the early prepare (ER) protocol (see Non-Patent Document 2).

Non-Patent Document 1: Haruo Yokota, “Autonomous Disks for Advanced Database Applications”, in Proc. of International Symposium on Database Applications in Non-traditional Environments (DANTE'99), pp. 448-457.

Non-Patent Document 2: J. Stamos and F. Cristian, “A low-cost atomic commit protocol”, in Proceedings of the Ninth Symposium on Reliable Distributed Systems, October 1990.

However, when the conventional commit protocols described above are applied to autonomous disks that back up data stored on one disk to another disk, satisfactory performance cannot be obtained.

Specifically, the two-phase commit protocol and the early prepare protocol require that, for each operation, a log be written to the disk device and a message indicating that the write completed normally be returned to the device that requested the operation. This increases the overhead associated with a transaction and lowers the processing efficiency of the autonomous disks.

Thus, for autonomous disks, an important issue is how to suppress the increase in transaction overhead and improve data-processing efficiency.

The present invention has been made to solve the above problems of the prior art. Its object is to provide a data storage control device and a data storage device that can suppress an increase in the overhead associated with transactions and improve data-processing efficiency.

To solve the above problems and achieve the object, the present invention is a data storage control device that performs control for storing data and backup data of that data in different data storage units (disk devices 40a to 40d in the embodiment), comprising: a first storage control unit (primary master in the embodiment) that transmits a plurality of execution request messages for operations on data; one or more second storage control units (primary cohorts in the embodiment) that, on receiving an operation execution request message, execute the operation, transmit a response message to the execution request message to the first storage control unit each time execution of an operation completes, and generate log information on the operation and transmit a message containing that log information; and one or more third storage control units (backup cohorts in the embodiment) that receive the messages transmitted by the one or more second storage control units and store the log information on each operation in memory; wherein the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all operations has completed, and, after execution of all operations has completed, transmits an execution request message for decision processing on the operations; the one or more second storage control units, on receiving the decision-processing execution request message transmitted by the first storage control unit, execute the decision processing to control updating of the data stored in the data storage unit connected to each of them, and generate log information on the decision processing and transmit a message containing that log information; and the one or more third storage control units receive the message transmitted by the one or more second storage control units and execute decision processing based on the log information stored in the memory, thereby controlling updating of the backup data stored in the data storage unit connected to each of them.
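The message flow in the paragraph above can be sketched in a few lines of Python. This is an illustrative model only, not the patent's implementation: the class and method names, the use of direct method calls in place of network messages, and the `"commit"`/`"abort"` values for the decision are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class BackupCohort:
    """Third storage control unit: buffers operation logs in memory."""
    disk: dict = field(default_factory=dict)        # backup data storage unit
    log_buffer: list = field(default_factory=list)  # in-memory log records

    def on_operation_log(self, record):
        # Keep the log record in memory only; no disk write at this point.
        self.log_buffer.append(record)

    def on_decision_log(self, decision):
        # Apply the buffered records to the backup data only on commit.
        if decision == "commit":
            self.disk.update(self.log_buffer)
        self.log_buffer.clear()


@dataclass
class PrimaryCohort:
    """Second storage control unit: executes operations on primary data."""
    backup: BackupCohort
    disk: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def on_operation(self, key, value):
        self.pending[key] = value                    # execute the operation
        self.backup.on_operation_log((key, value))   # ship the log record
        return "ACK"                                 # response to the master

    def on_decision(self, decision):
        if decision == "commit":
            self.disk.update(self.pending)           # update primary data
        self.pending.clear()
        self.backup.on_decision_log(decision)        # ship the decision log


class PrimaryMaster:
    """First storage control unit: drives the whole transaction."""

    def __init__(self, cohorts):
        self.cohorts = cohorts

    def run(self, operations):
        # operations: list of (cohort_index, key, value) tuples
        acks = [self.cohorts[i].on_operation(k, v) for i, k, v in operations]
        # The decision request goes out only after every response is in.
        decision = "commit" if all(a == "ACK" for a in acks) else "abort"
        for cohort in self.cohorts:
            cohort.on_decision(decision)
        return decision
```

In this sketch the backup cohort touches its disk only once per transaction, during decision processing, which is the point the paragraph makes about holding per-operation logs in memory.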

Further, in the above invention, the one or more third storage control units, after executing the decision processing, transmit a decision completion message indicating that the decision processing has completed, and the one or more second storage control units, on receiving the decision completion message from the one or more third storage control units, transmit to the first storage control unit a synchronization completion message indicating that data synchronization has completed.
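The completion handshake described above can be modelled as a chain of callbacks. The following is a minimal sketch under assumed names (`Master`, `PrimaryCohort`, `BackupCohort`, and their methods are hypothetical), with method calls standing in for messages.

```python
class Master:
    """First storage control unit: collects synchronization completions."""

    def __init__(self):
        self.synced = set()

    def on_sync_complete(self, cohort_id):
        self.synced.add(cohort_id)


class BackupCohort:
    """Third storage control unit."""

    def __init__(self):
        self.decided = False

    def on_decision_log(self, primary):
        self.decided = True               # run decision processing
        primary.on_decision_complete()    # decision completion message


class PrimaryCohort:
    """Second storage control unit."""

    def __init__(self, cohort_id, master, backup):
        self.cohort_id = cohort_id
        self.master = master
        self.backup = backup

    def on_decision(self):
        # Forward the decision log to the backup cohort.
        self.backup.on_decision_log(self)

    def on_decision_complete(self):
        # The backup has applied the decision, so data and backup agree;
        # report this to the master as a synchronization completion message.
        self.master.on_sync_complete(self.cohort_id)


def all_synced(master, cohort_ids):
    """True once every listed cohort has reported synchronization."""
    return master.synced == set(cohort_ids)
```

The master learns that a cohort is synchronized only after that cohort's backup has finished its own decision processing.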

Further, in the above invention, the device further comprises a fourth storage control unit (backup master in the embodiment) that stores in memory membership log information indicating the one or more second storage control units to which the operation execution request messages are sent. The first storage control unit generates the membership log information and transmits it to the fourth storage control unit, and, when it decides to execute the decision processing, generates decision log information indicating that decision and transmits it to the fourth storage control unit. If the fourth storage control unit detects a failure of the first storage control unit after receiving the decision log information, it transmits the execution request message for decision processing on the operations to the one or more second storage control units based on the membership log information.
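A small sketch may make the takeover rule concrete. The names below are hypothetical, transactions are keyed by an assumed `txn_id`, and the function returns the messages it would send rather than sending them. The passage does not say what happens to a transaction whose decision log never arrived, so the sketch simply leaves such transactions untouched.

```python
class BackupMaster:
    """Fourth storage control unit: holds the master's logs in memory."""

    def __init__(self):
        self.membership = {}  # txn_id -> cohorts that were sent operations
        self.decisions = {}   # txn_id -> decision logged by the master

    def on_membership_log(self, txn_id, cohort_ids):
        self.membership[txn_id] = list(cohort_ids)

    def on_decision_log(self, txn_id, decision):
        self.decisions[txn_id] = decision

    def on_master_failure(self):
        """Return the decision requests to send in place of the master."""
        requests = []
        for txn_id, decision in self.decisions.items():
            # Only the cohorts named in the membership log are contacted.
            for cohort_id in self.membership.get(txn_id, []):
                requests.append((cohort_id, txn_id, decision))
        return requests
```

For example, with membership logs for transactions `t1` and `t2` but a decision log only for `t1`, a master failure produces decision requests for the cohorts of `t1` alone.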

Further, in the above invention, if the one or more second storage control units detect a failure of the one or more third storage control units after receiving the decision-processing execution request message from the first storage control unit, they transmit the decision completion message to the first storage control unit. When the failure is recovered, the one or more third storage control units execute decision processing based on the log information stored in the memory, thereby controlling updating of the backup data stored in the data storage unit connected to each of them.
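The failure path above can be sketched as follows. Several points are assumptions made for illustration: the names are hypothetical, the backup cohort's in-memory log is assumed to survive the failure (as the text implies by referring to the log stored in memory), and the primary cohort is assumed to hold the decision until the backup is reachable again.

```python
class BackupCohort:
    """Third storage control unit with an in-memory operation log."""

    def __init__(self, log_buffer):
        self.alive = True
        self.log_buffer = list(log_buffer)  # (key, value) records
        self.disk = {}                      # backup data storage unit

    def on_decision_log(self, decision):
        if decision == "commit":
            self.disk.update(self.log_buffer)
        self.log_buffer.clear()


class PrimaryCohort:
    """Second storage control unit."""

    def __init__(self, backup, notify_master):
        self.backup = backup
        self.notify_master = notify_master  # callable taking a message name
        self.deferred = []                  # decisions the backup has missed

    def on_decision(self, decision):
        if self.backup.alive:
            self.backup.on_decision_log(decision)
            self.notify_master("sync-complete")
        else:
            # Backup failure detected: answer the master without waiting.
            self.deferred.append(decision)
            self.notify_master("decision-complete")

    def on_backup_recovered(self):
        self.backup.alive = True
        for decision in self.deferred:
            self.backup.on_decision_log(decision)  # catch up from the log
        self.deferred.clear()
```

The master is not blocked by the failed backup; the backup data catches up from the buffered log once the failure is recovered.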

Further, in the above invention, if the first storage control unit detects a failure of any of the one or more second storage control units after receiving all the response messages transmitted by the one or more second storage control units, it transmits the execution request message for decision processing on the operations to the one or more third storage control units. The one or more third storage control units, on receiving the decision-processing execution request message transmitted by the first storage control unit, execute the decision processing and, when the failure is recovered, transmit log information on the decision processing to the second storage control unit that has recovered from the failure. The recovered second storage control unit receives the log information on the decision processing and executes decision processing based on it.
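The recovery path for a failed primary cohort can be sketched like this. The names are hypothetical, and the sketch assumes that the decision log forwarded on recovery carries the operation records the recovered unit needs to redo the update; the text only says it is log information on the decision processing.

```python
class BackupCohort:
    """Third storage control unit acting for a failed primary cohort."""

    def __init__(self, log_buffer):
        self.log_buffer = list(log_buffer)  # (key, value) records in memory
        self.disk = {}                      # backup data storage unit
        self.decision_log = None            # kept for the failed primary

    def on_decision(self, decision):
        # The decision request comes directly from the master.
        if decision == "commit":
            self.disk.update(self.log_buffer)
        self.decision_log = (decision, list(self.log_buffer))
        self.log_buffer.clear()

    def on_primary_recovered(self, primary):
        # Hand the decision log to the recovered primary cohort.
        if self.decision_log is not None:
            primary.on_decision_log(*self.decision_log)


class RecoveredPrimaryCohort:
    """Second storage control unit after recovering from a failure."""

    def __init__(self):
        self.disk = {}

    def on_decision_log(self, decision, records):
        # Redo the decision processing from the received log.
        if decision == "commit":
            self.disk.update(records)
```

After recovery the primary data and the backup data hold the same committed values, even though the primary cohort never saw the master's decision request.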

Further, the present invention is a data storage device that has a plurality of data storage units (disk devices 40a to 40d in the embodiment) and stores data and backup data of that data in different data storage units, comprising: a first storage control unit (primary master in the embodiment) that transmits a plurality of execution request messages for operations on data; one or more second storage control units (primary cohorts in the embodiment) that, on receiving an operation execution request message, execute the operation, transmit a response message to the execution request message to the first storage control unit each time execution of an operation completes, and generate log information on the operation and transmit a message containing that log information; and one or more third storage control units (backup cohorts in the embodiment) that receive the messages transmitted by the one or more second storage control units and store the log information on each operation in memory; wherein the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all operations has completed, and, after execution of all operations has completed, transmits an execution request message for decision processing on the operations; the one or more second storage control units, on receiving the decision-processing execution request message transmitted by the first storage control unit, execute the decision processing to update the data stored in the data storage unit connected to each of them, and generate log information on the decision processing and transmit a message containing that log information; and the one or more third storage control units receive the message transmitted by the one or more second storage control units and execute decision processing based on the log information stored in the memory, thereby updating the backup data stored in the data storage unit connected to each of them.

Further, in the above invention, if the one or more second storage control units detect a failure of the one or more third storage control units after receiving the decision-processing execution request message from the first storage control unit, they transmit the decision completion message to the first storage control unit. When the failure is recovered, the one or more third storage control units execute decision processing based on the log information stored in the memory, thereby updating the backup data stored in the data storage unit connected to each of them.

Further, the present invention is a data storage method for storing data and backup data of that data in different data storage units (disk devices 40a to 40d in the embodiment), comprising: an operation execution request message transmission step in which a first storage control unit (primary master in the embodiment) transmits an execution request message for an operation on data; an operation execution step in which one or more second storage control units (primary cohorts in the embodiment), on receiving the operation execution request message, execute the operation, transmit a response message to the execution request message to the first storage control unit each time execution of an operation completes, and generate log information on the operation and transmit a message containing that log information, and in which one or more third storage control units (backup cohorts in the embodiment) receive the messages transmitted by the one or more second storage control units and store the log information on each operation in memory; a decision execution request message transmission step in which the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all operations has completed, and, after execution of all operations has completed, transmits an execution request message for decision processing on the operations; and a decision execution step in which the one or more second storage control units, on receiving the decision-processing execution request message transmitted by the first storage control unit, execute the decision processing to update the data stored in the data storage unit connected to each of them, and generate log information on the decision processing and transmit a message containing that log information, and in which the one or more third storage control units receive the message transmitted by the one or more second storage control units and execute decision processing based on the log information stored in the memory, thereby updating the backup data stored in the data storage unit connected to each of them.

Further, in the above invention, the method further comprises: a decision completion message transmission step in which the one or more third storage control units, after executing the decision processing, transmit a decision completion message indicating that the decision processing has completed; and a synchronization completion message transmission step in which the one or more second storage control units, on receiving the decision completion message from the one or more third storage control units, transmit to the first storage control unit a synchronization completion message indicating that data synchronization has completed.

Further, in the above invention, in the operation execution request message transmission step, the first storage control unit generates membership log information indicating the one or more second storage control units to which the operation execution request messages are sent and transmits it to a fourth storage control unit (backup master in the embodiment), and the fourth storage control unit stores the membership log information in memory. In the decision execution request message transmission step, the first storage control unit, when it decides to execute the decision processing, generates decision log information indicating that decision and transmits it to the fourth storage control unit; if the fourth storage control unit detects a failure of the first storage control unit after receiving the decision log information, it transmits the execution request message for decision processing on the operations to the one or more second storage control units based on the membership log information.

Further, in the above invention, in the decision execution step, if the one or more second storage control units detect a failure of the one or more third storage control units after receiving the decision-processing execution request message from the first storage control unit, they transmit the decision completion message to the first storage control unit; when the failure is recovered, the one or more third storage control units execute decision processing based on the log information stored in the memory, thereby updating the backup data stored in the data storage unit connected to each of them.

Further, in the above invention, in the decision execution request message transmission step, if the first storage control unit detects a failure of any of the one or more second storage control units after receiving all the response messages transmitted by the one or more second storage control units, it transmits the execution request message for decision processing on the operations to the one or more third storage control units. In the decision execution step, the one or more third storage control units, on receiving the decision-processing execution request message transmitted by the first storage control unit, execute the decision processing and, when the failure is recovered, transmit log information on the decision processing to the second storage control unit that has recovered from the failure; the recovered second storage control unit receives the log information on the decision processing and executes decision processing based on it.

Further, the present invention is a data storage control program that performs control for storing data and backup data of that data in different data storage units (disk devices 40a to 40d in the embodiment), the program causing a computer to execute: an operation execution request message transmission procedure in which a first storage control unit (primary master in the embodiment) transmits an execution request message for an operation on data; an operation execution procedure in which one or more second storage control units (primary cohorts in the embodiment), on receiving the operation execution request message, execute the operation, transmit a response message to the execution request message to the first storage control unit each time execution of an operation completes, and generate log information on the operation and transmit a message containing that log information, and in which one or more third storage control units (backup cohorts in the embodiment) receive the messages transmitted by the one or more second storage control units and store the log information on each operation in memory; a decision execution request message transmission procedure in which the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all operations has completed, and, after execution of all operations has completed, transmits an execution request message for decision processing on the operations; and a decision execution procedure in which the one or more second storage control units, on receiving the decision-processing execution request message transmitted by the first storage control unit, execute the decision processing to update the data stored in the data storage unit connected to each of them, and generate log information on the decision processing and transmit a message containing that log information, and in which the one or more third storage control units receive the message transmitted by the one or more second storage control units and execute decision processing based on the log information stored in the memory, thereby updating the backup data stored in the data storage unit connected to each of them.

According to the present invention, the first storage control unit transmits an execution request message for an operation on data. The one or more second storage control units, on receiving the operation execution request message, execute the operation, transmit a response message to the execution request message to the first storage control unit each time execution of an operation completes, and generate log information on the operation and transmit a message containing that log information. The one or more third storage control units receive the messages transmitted by the one or more second storage control units and store the log information on each operation in memory. The first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all operations has completed, and, after execution of all operations has completed, transmits an execution request message for decision processing on the operations. The one or more second storage control units, on receiving the decision-processing execution request message transmitted by the first storage control unit, execute the decision processing to update the data stored in the data storage unit connected to each of them, and generate log information on the decision processing and transmit a message containing that log information. The one or more third storage control units receive that message and execute decision processing based on the log information stored in the memory, thereby updating the backup data stored in the data storage unit connected to each of them. As a result, an increase in the overhead associated with transactions can be suppressed and data-processing efficiency can be improved.

Further, according to the present invention, the one or more third storage control units, after executing the decision processing, transmit a decision completion message indicating that the decision processing has completed, and the one or more second storage control units, on receiving the decision completion message from the one or more third storage control units, transmit to the first storage control unit a synchronization completion message indicating that data synchronization has completed. As a result, the data and the backup data can be reliably synchronized.

Further, according to the present invention, the first storage control unit generates membership log information indicating the one or more second storage control units to which the operation execution request messages are sent and transmits it to the fourth storage control unit, and the fourth storage control unit stores the membership log information in memory. When the first storage control unit decides to execute the decision processing, it generates decision log information indicating that decision and transmits it to the fourth storage control unit. If the fourth storage control unit detects a failure of the first storage control unit after receiving the decision log information, it transmits the execution request message for decision processing on the operations to the one or more second storage control units based on the membership log information. As a result, decision processing can be executed promptly even when a failure occurs in the first storage control unit.

Further, according to the present invention, if the one or more second storage control units detect a failure of the one or more third storage control units after receiving the decision process execution request message from the first storage control unit, they transmit a decision completion message to the first storage control unit. When the failure is recovered, the one or more third storage control units execute the decision process based on the log information stored in memory and update the backup data stored in the data storage unit connected to each of them. The decision process can therefore be executed promptly even when a failure occurs in a third storage control unit.

Further, according to the present invention, if the first storage control unit detects a failure of any of the one or more second storage control units after receiving all the response messages transmitted by the one or more second storage control units, it transmits an execution request message for the decision process on the operations to the one or more third storage control units. On receiving that message, the one or more third storage control units execute the decision process and, when the failure is recovered, transmit log information on the decision process to the second storage control unit that has recovered from the failure. The recovered second storage control unit receives this log information and executes the decision process based on it. The decision process can therefore be executed promptly even when a failure occurs in a second storage control unit.

According to the present invention, the efficiency of data processing can be improved by suppressing the increase in overhead associated with transactions. As a result, responses to data processing requests from hosts can be made faster, and growth in the number of storage nodes in an autonomous distributed storage system such as autonomous disks can be handled effectively.

Preferred embodiments of the data storage control device and the data storage device according to the present invention are described in detail below with reference to the accompanying drawings.

First, the functional configuration of the autonomous disk system according to the present invention will be described. FIG. 1 is a diagram illustrating the functional configuration of the autonomous disk system according to the present embodiment. As shown in FIG. 1, in this autonomous disk system, clients 10a to 10c and processing elements 20a to 20d are connected via a network 50 such as a SAN (Storage Area Network).

The clients 10a to 10c are client devices that transmit execution requests for various kinds of data processing, such as data search, insertion, and deletion, to the processing elements 20a to 20d.

The processing elements 20a to 20d are processing devices that receive the data processing execution requests transmitted by the clients 10a to 10c and execute the requested data processing. The processing elements 20a to 20d store the data to be processed in a distributed manner, and each of them searches for data using a directory structure called Fat-Btree. The present invention is not limited to the Fat-Btree method as its directory method; it is applicable to distributed directory methods in general, in which a directory is shared by a plurality of processing elements.

FIG. 2 is an explanatory diagram of Fat-Btree. Fat-Btree is an improvement on the conventional parallel Btree. It is a directory structure developed to allow fast access to data in a distributed database system such as autonomous disks.

In Fat-Btree, as shown in FIG. 2, the data pages of a B-tree are stored evenly across the processing elements 20a to 20d, and each processing element 20a to 20d stores only the index pages that lie above its own data pages. That is, in Fat-Btree, each processing element 20a to 20d stores a subtree of the B-tree.

Thus, in Fat-Btree, index pages that are closer to the root page, and therefore accessed more frequently, are copied to more of the processing elements 20a to 20d, which speeds up searches.

Conversely, lower-level index pages, which are updated more frequently, have copies on fewer of the processing elements 20a to 20d. Fewer processing elements therefore need to be synchronized when a copy is updated, which reduces overhead.
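The replication property described above can be illustrated with a minimal Python sketch. The page names, the three-level index shape, and the function `fat_btree_placement` are hypothetical and chosen only for illustration; they are not taken from the embodiment. The sketch assumes that each processing element keeps a copy of every index page on the root-to-leaf path of each data page it owns.

```python
from collections import defaultdict

def fat_btree_placement(leaf_paths, leaf_owner):
    """Return {index_page: set of processing elements holding a copy}.

    leaf_paths maps each data page to the index pages on its root-to-leaf
    path; leaf_owner maps each data page to the element that stores it.
    """
    copies = defaultdict(set)
    for leaf, path in leaf_paths.items():
        for index_page in path:
            # An element needs every index page above its own data pages.
            copies[index_page].add(leaf_owner[leaf])
    return dict(copies)

# Hypothetical tree: eight data pages d0..d7, spread over four elements.
leaf_paths = {f"d{i}": ["root", f"m{i // 4}", f"i{i // 2}"] for i in range(8)}
leaf_owner = {f"d{i}": f"PE{i // 2}" for i in range(8)}

placement = fat_btree_placement(leaf_paths, leaf_owner)
for page in ("root", "m0", "i0"):
    print(page, sorted(placement[page]))
```

In this toy layout the root page is held by all four elements, a mid-level page by two, and a lowest-level index page by one, so an update to a lowest-level index page involves no other element.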

Returning to FIG. 1, the processing elements 20a to 20d each include one of the controllers 30a to 30d and one of the disk devices 40a to 40d.

The controllers 30a to 30d are processing units that control data access to the disk devices 40a to 40d. The controllers 30a to 30d search for data using the Fat-Btree shown in FIG. 2 and execute various kinds of data processing on the data stored in the disk devices 40a to 40d. The detailed functional configuration of the controllers 30a to 30d is described later with reference to FIG. 3.

The disk devices 40a to 40d are storage devices such as hard disk devices. Specifically, each of the disk devices 40a to 40d stores the Fat-Btree directory information shown in FIG. 2, backup data of the directory information stored in other disk devices 40a to 40d, the data subject to the various kinds of data processing, backup data of the data stored in other disk devices 40a to 40d, and so on.

Next, the functional configuration of the processing elements 20a to 20d shown in FIG. 1 will be described in detail. FIG. 3 is a diagram showing the functional configuration of the processing elements 20a to 20d shown in FIG. 1. All of the processing elements 20a to 20d have the same functional configuration.

As described with reference to FIG. 1, the processing element 20a (20b, 20c, 20d) consists of a controller 30a (30b, 30c, 30d) and a disk device 40a (40b, 40c, 40d). The controller 30a (30b, 30c, 30d) includes a network interface unit 31a (31b, 31c, 31d), a disk interface unit 32a (32b, 32c, 32d), a memory 33a (33b, 33c, 33d), and a control unit 34a (34b, 34c, 34d).

The network interface unit 31a (31b, 31c, 31d) is a network interface that exchanges data with the clients 10a to 10c via the network 50. The disk interface unit 32a (32b, 32c, 32d) is a disk interface that exchanges data with the disk device 40a (40b, 40c, 40d).

The memory 33a (33b, 33c, 33d) is a semiconductor main memory that the control unit 34a (34b, 34c, 34d) reads from and writes to. The memory 33a (33b, 33c, 33d) stores various data such as a log 330a (330b, 330c, 330d), backup disk management data 331a (331b, 331c, 331d), and ACK management data 332a (332b, 332c, 332d).

The log 330a (330b, 330c, 330d) is log data recording information on the contents of the series of operations performed in each transaction. The backup disk management data 331a (331b, 331c, 331d) records the correspondence between the disk devices 40a to 40d that store original data and the disk devices 40a to 40d that store the backup data of that original data.

FIG. 4 is a diagram showing an example of the backup disk management data 331a (331b, 331c, 331d) shown in FIG. 3. As shown in FIG. 4, the backup disk management data 331a (331b, 331c, 331d) stores a primary disk ID and a backup disk ID.

The primary disk ID is identification information that identifies the disk device 40a to 40d storing the original data. The backup disk ID is identification information that identifies the disk device 40a to 40d storing the backup data of that original data.
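As a rough illustration, the backup disk management data can be thought of as a table from primary disk ID to backup disk ID. The Python sketch below is only an assumption about its shape. The concrete pairs (each disk backed up on the next one, in a ring) and the helper names `backup_disk_for` and `primaries_backed_up_on` are hypothetical, since the contents of FIG. 4 are not reproduced in this text.

```python
# Hypothetical table contents: primary disk ID -> backup disk ID.
BACKUP_DISK_TABLE = {
    "40a": "40b",
    "40b": "40c",
    "40c": "40d",
    "40d": "40a",
}

def backup_disk_for(primary_disk_id):
    """Look up the disk that holds the backup of the given primary disk."""
    return BACKUP_DISK_TABLE[primary_disk_id]

def primaries_backed_up_on(backup_disk_id):
    """List the primary disks whose backup data lives on the given disk."""
    return sorted(p for p, b in BACKUP_DISK_TABLE.items() if b == backup_disk_id)

print(backup_disk_for("40c"))
print(primaries_backed_up_on("40a"))
```

A controller could use the forward lookup to decide where to send log messages, and the reverse lookup during recovery to find which primary disks depend on a failed disk.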

Returning to FIG. 3, the ACK management data 332a (332b, 332c, 332d) indicates, for each operation execution request that the control unit 34a (34b, 34c, 34d) has transmitted to the other processing elements 20b to 20d, whether an ACK message indicating completion of the operation has been received from those processing elements.

FIG. 5 is a diagram showing an example of the ACK management data 332a (332b, 332c, 332d) shown in FIG. 3. As shown in FIG. 5, the ACK management data 332a (332b, 332c, 332d) stores an operation ID, an operation target disk ID, and an ACK flag.

The operation ID is identification information that identifies each operation for which an execution request has been transmitted to the other processing elements 20b to 20d. The operation target disk ID is identification information that identifies the disk device 40b to 40d of the processing element 20b to 20d to which the operation execution request was transmitted.

The ACK flag indicates whether an ACK message has been received from the processing element 20b to 20d to which the operation execution request was transmitted. A value of "1" indicates that an ACK message has been received from that processing element, and a value of "0" indicates that no ACK message has been received yet.
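The ACK management data can be sketched as a small table keyed by operation ID and target disk ID. The following Python is a minimal sketch under that assumption. The class and method names (`AckEntry`, `AckTable`, `record_request`, `record_ack`, `all_acked`) and the example IDs are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass

@dataclass
class AckEntry:
    operation_id: str
    target_disk_id: str
    ack_flag: int = 0  # 0: no ACK received yet, 1: ACK received

class AckTable:
    """Tracks which operation requests have been acknowledged."""

    def __init__(self):
        self._entries = {}

    def record_request(self, operation_id, target_disk_id):
        # Called when an operation execution request is sent out.
        key = (operation_id, target_disk_id)
        self._entries[key] = AckEntry(operation_id, target_disk_id)

    def record_ack(self, operation_id, target_disk_id):
        # Called when the matching ACK message arrives.
        self._entries[(operation_id, target_disk_id)].ack_flag = 1

    def all_acked(self):
        return all(e.ack_flag == 1 for e in self._entries.values())

table = AckTable()
table.record_request("op1", "40c")
table.record_request("op2", "40c")
table.record_ack("op1", "40c")
pending = not table.all_acked()   # op2 is still outstanding
table.record_ack("op2", "40c")
done = table.all_acked()
```

A check like `all_acked` corresponds to the condition the requesting element must confirm before it moves on to the decision step of a transaction.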

Returning to FIG. 3, the control unit 34a (34b, 34c, 34d) controls the controller 30a (30b, 30c, 30d) as a whole and manages the exchange of data between the functional units. The control unit 34a (34b, 34c, 34d) includes a transaction execution unit 340a (340b, 340c, 340d) and a failure recovery processing unit 341a (341b, 341c, 341d).

The transaction execution unit 340a (340b, 340c, 340d) is a functional unit that, on receiving an execution request for data processing related to a transaction from the clients 10a to 10c or from the other processing elements 20b to 20d, searches for the target data based on the Fat-Btree directory information and executes the requested data processing on that data.

The transaction execution unit 340a (340b, 340c, 340d) executes transactions using the backup-assist 1.5-phase commit protocol (BA-1.5PC). BA-1.5PC is described in detail later.

The failure recovery processing unit 341a (341b, 341c, 341d) is a functional unit that performs recovery processing when a failure occurs in any of the processing elements 20a to 20d during execution of a transaction.

The failure recovery processing unit 341a (341b, 341c, 341d) executes recovery processing by a different method depending on the role of the failed processing element 20a to 20d (for example, whether the failed processing element stores original data or backup data). This recovery processing is also described in detail later.

The disk device 40a (40b, 40c, 40d) stores directory data 400a (400b, 400c, 400d), directory backup data 401a (401b, 401c, 401d), primary data 402a (402b, 402c, 402d), and backup data 403a (403b, 403c, 403d).

The directory data 400a (400b, 400c, 400d) is the data of the directory organized as a Fat-Btree. The directory data 400a (400b, 400c, 400d) corresponds to the B-tree subtree shown in FIG. 2.

The directory backup data 401a (401b, 401c, 401d) is backup data of the directory data stored in the disk devices 40b to 40d of the other processing elements 20b to 20d.

The primary data 402a (402b, 402c, 402d) is the data on which the various kinds of data processing in a transaction are executed. The backup data 403a (403b, 403c, 403d) is backup data of the primary data stored in the disk devices 40b to 40d of the other processing elements 20b to 20d.

Next, the asynchronous neighbor-WAL protocol will be described. FIG. 6 is a sequence diagram illustrating the asynchronous neighbor-WAL protocol. In this embodiment, the asynchronous neighbor-WAL protocol is used to suppress the logging overhead of transactions.

In the asynchronous neighbor-WAL protocol, when the primary processing element, which stores the original data, receives a log flush request (step S101), it writes the log to the memory provided in its own controller (step S102).

The primary processing element then transmits a log message containing the log information written to memory to the backup processing element, which stores the backup data (step S103).

On receiving this log message, the backup processing element writes the log to the memory provided in its own controller (step S104), which completes the log write processing under the asynchronous neighbor-WAL protocol.

In the asynchronous neighbor-WAL protocol, the backup processing element does not send the primary processing element an ACK message indicating completion of the log write, so log writes are not synchronized. In addition, the log is written to memory rather than to a disk device, which greatly reduces logging overhead. By contrast, the variant in which the backup processing element sends an ACK message to the primary processing element so that log writes are synchronized is called the synchronous neighbor-WAL protocol.
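The difference between the asynchronous and synchronous variants can be shown with a minimal Python sketch. It models both controllers as in-process objects and simply counts messages, so it is an illustration of the message pattern rather than a networked implementation. The names `Controller` and `neighbor_wal_flush` are hypothetical.

```python
class Controller:
    """Stand-in for a controller whose memory holds the log."""

    def __init__(self, name):
        self.name = name
        self.memory_log = []  # kept in memory, not written to disk

def neighbor_wal_flush(primary, backup, record, synchronous=False):
    """Flush one log record; return the number of messages exchanged."""
    primary.memory_log.append(record)   # S102: primary writes to its memory
    backup.memory_log.append(record)    # S103-S104: log message delivered
    messages = 1                        # the log message itself
    if synchronous:
        messages += 1                   # synchronous variant: backup ACKs
    return messages

primary, backup = Controller("primary"), Controller("backup")
async_msgs = neighbor_wal_flush(primary, backup, "op-1")
sync_msgs = neighbor_wal_flush(primary, backup, "commit", synchronous=True)
print(async_msgs, sync_msgs, backup.memory_log)
```

In the asynchronous case the primary can continue as soon as the log message is sent, whereas in the synchronous case it would wait for the extra ACK message before proceeding.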

Next, the BA-1.5 phase commit protocol according to this embodiment will be described. FIG. 7 is a sequence diagram illustrating the BA-1.5 phase commit protocol according to this embodiment.

In FIG. 7, the processing elements 20a to 20d each take on a different role: primary master, backup master, primary cohort, or backup cohort.

The primary master is the processing element that accepts a transaction execution request from the clients 10a to 10c and transmits execution requests for the operations in that transaction to the other processing elements 20a to 20d. The backup master is the processing element that obtains the logs related to the transaction from the primary master and stores them as a backup.

A primary cohort is a processing element that stores original data updated in the transaction. A backup cohort is a processing element that stores backup data of the data stored by a primary cohort.

In the following description, the primary master, backup master, primary cohort, and backup cohort are assumed to be the processing element 20a, the processing element 20b, the processing element 20c, and the processing element 20d, respectively. It is also assumed that there are a plurality of processing elements 20c (primary cohorts) and a plurality of processing elements 20d (backup cohorts), not just one of each.

First, when the transaction execution unit 340a of the processing element 20a accepts a transaction execution request from the clients 10a to 10c, it generates a membership log and stores it in the memory 33a as the log 330a. It also transmits the membership log to the processing element 20b so that it is stored in the memory 33b of the processing element 20b as the log 330b (step S201).

Here, the membership log is information on the processing elements 20c that store the target data of the operations executed in the transaction. It is generated based on the directory data 400a.

The transaction execution unit 340a of the processing element 20a determines the processing element 20b to which the membership log should be sent by referring to the backup disk management data 331b, and transmits the membership log to that processing element 20b.

The transaction execution unit 340b of the processing element 20b receives the membership log from the processing element 20a and stores it in the memory 33b as the log 330b.

After transmitting the membership log to the processing element 20b, the transaction execution unit 340a of the processing element 20a transmits execution request messages for the operations of the transaction to the plurality of processing elements 20c (step S202). The processing elements 20c to which the operation execution request messages are sent are identified by the membership log.

The transaction execution unit 340c of each processing element 20c receives the operation execution request message and executes the operation. The transaction execution unit 340c then generates a log for the operation and, using the asynchronous neighbor-WAL protocol described with reference to FIG. 6, stores the log in the memory 33c as the log 330c and transmits it to the processing element 20d that stores the backup data of that processing element 20c (step S203). The transaction execution unit 340d of the processing element 20d that receives this log stores it in the memory 33d as the log 330d.

When execution of the operation is complete, the transaction execution unit 340c of the processing element 20c transmits an ACK message indicating completion of the operation to the processing element 20a, which sent the operation execution request message (step S204).

The transaction execution unit 340a of the processing element 20a receives the ACK message from the transaction execution unit 340c of the processing element 20c and updates the ACK flag in the ACK management data 332a to "1".

If there are further operations to be requested of the processing element 20c, the transaction execution unit 340a then transmits the execution request message for the next operation to the processing element 20c (step S205).

Thereafter, the transaction execution unit 340a of the processing element 20a, the transaction execution unit 340c of the processing element 20c, and the transaction execution unit 340d of the processing element 20d repeat the same processing as in steps S202 to S204 until all operations are complete (steps S206 to S210).

After confirming that ACK messages for all operations have been received from all the processing elements 20c registered in the membership log, the transaction execution unit 340a of the processing element 20a stores a decision log, containing information on the request to execute commit processing (or abort processing) for the transaction, in the memory 33a as the log 330a, and transmits the decision log to the processing element 20b (step S211).

The transaction execution unit 340b of the processing element 20b receives the decision log from the processing element 20a and stores it in the memory 33b as the log 330b.

The transaction execution unit 340a of the processing element 20a then transmits a decision message, requesting execution of commit processing (or abort processing) for the transaction, to all the processing elements 20c registered in the membership log (step S212).

The transaction execution unit 340c of the processing element 20c receives the decision message, executes commit processing (or abort processing) for the transaction, and enters a pending state. In the pending state, the transaction execution unit 340c generates a log for the decision and, using the synchronous neighbor-WAL protocol, stores the log in the memory 33c as the log 330c and transmits it to the processing element 20d (step S213).

The transaction execution unit 340d of the processing element 20d that receives this log stores it in the memory 33d as the log 330d. The transaction execution unit 340d then executes all the operations that it received from the processing element 20c and whose information is stored in the memory 33d as the log 330d.

After executing the operations, the transaction execution unit 340d transmits an ACK message indicating their completion to the processing element 20c (step S214). After transmitting the ACK message, the transaction execution unit 340d erases all logs related to this transaction from the memory 33d and frees the memory area.

On receiving this ACK message, the transaction execution unit 340c of the processing element 20c transmits to the processing element 20a a decision ACK message indicating that decision processing is complete at the processing element 20c and the processing element 20d (step S215). After transmitting the decision ACK message, the transaction execution unit 340c erases all logs related to this transaction from the memory 33c and frees the memory area.

When the transaction execution unit 340a of the processing element 20a has received decision ACK messages from all the processing elements 20c registered in the membership log, it generates an end log indicating that decision processing for the transaction is complete. Using the asynchronous neighbor-WAL protocol, it stores the end log in the memory 33a as the log 330a and transmits the end log to the processing element 20b (step S216).

The transaction execution unit 340b of the processing element 20b that receives the end log stores it in the memory 33b as the log 330b, which completes the transaction processing under the BA-1.5 phase commit protocol.
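The failure-free message sequence of steps S201 to S216 can be summarized with a minimal Python sketch that lists the messages in order. This is a simplification made for illustration: it assumes every operation is sent to every primary cohort, it serializes messages that could in practice be sent to several cohorts concurrently, and the labels `PM`, `BM`, `PC`, and `BC` (primary master, backup master, primary cohort, backup cohort) and the function `ba15pc_trace` are hypothetical.

```python
def ba15pc_trace(operations, cohorts):
    """Return the ordered (sender, receiver, message) list of one run."""
    trace = [("PM", "BM", "membership_log")]                   # S201
    for op in operations:
        for c in cohorts:
            trace.append(("PM", f"PC{c}", f"exec:{op}"))       # S202
            # Asynchronous neighbor-WAL: the backup cohort sends no ACK.
            trace.append((f"PC{c}", f"BC{c}", f"log:{op}"))    # S203
            trace.append((f"PC{c}", "PM", f"ack:{op}"))        # S204
    trace.append(("PM", "BM", "decision_log"))                 # S211
    for c in cohorts:
        trace.append(("PM", f"PC{c}", "decision"))             # S212
        # Synchronous neighbor-WAL: the backup cohort replies with an ACK.
        trace.append((f"PC{c}", f"BC{c}", "decision_log"))     # S213
        trace.append((f"BC{c}", f"PC{c}", "ack"))              # S214
        trace.append((f"PC{c}", "PM", "decision_ack"))         # S215
    trace.append(("PM", "BM", "end_log"))                      # S216
    return trace

trace = ba15pc_trace(["op1", "op2"], cohorts=[1, 2])
backup_acks = [m for m in trace if m[0].startswith("BC")]
for sender, receiver, message in trace:
    print(f"{sender} -> {receiver}: {message}")
```

In this trace the backup cohorts reply only once per transaction, at the decision step, which is where the saving over acknowledging every operation log comes from.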

In step S213, the processing element 20c, as a primary cohort, transmits the log for the decision to the processing element 20d, its backup cohort, and waits for the corresponding ACK message. If some failure occurs in the processing element 20d and the ACK message cannot be received, the processing element 20c transmits the decision ACK message to the processing element 20a, the primary master, without erasing the log stored in its memory. This log is used later when recovery processing is performed for the processing element 20d.

このように、プライマリコホートであるプロセッシングエレメント２０ｃは、ディシジョン処理においてペンディング状態を経るが、バックアップコホートであるプロセッシングエレメント２０ｄのペンディング状態における処理結果はトランザクションの最終結果に影響を及ぼさない。 In this way, the processing element 20c that is the primary cohort goes through the pending state in the decision processing, but the processing result in the pending state of the processing element 20d that is the backup cohort does not affect the final result of the transaction.

ＢＡ−１．５フェーズコミットプロトコルという名称は、このようなペンディング状態におけるプライマリコホートのディシジョン処理を０．５相とカウントし、プライマリマスターのディシジョン処理を１相とカウントすることにより命名されたものである。 The name "BA-1.5 phase commit protocol" comes from counting the decision processing of the primary cohort in this pending state as 0.5 phases and the decision processing of the primary master as 1 phase.

また、上記トランザクション処理では、トランザクション処理が終了したログをメモリ３３ｃ、３３ｄから消去していくため、メモリオーバーフローの発生を抑制することができる。また、メモリオーバーフローが発生する場合には、ログの一部をディスク装置４０ｃ、４０ｄに書き込むこととしてもよい。 Further, in the transaction process, the log after the transaction process is completed is deleted from the memories 33c and 33d, so that the occurrence of memory overflow can be suppressed. If a memory overflow occurs, a part of the log may be written to the disk devices 40c and 40d.

つぎに、プライマリマスター、バックアップマスター、プライマリコホート、バックアップコホートのいずれかに障害が発生した場合の復旧処理について説明する。なお、プライマリコホートであるプロセッシングエレメント２０ｃ、および、バックアップコホートであるプロセッシングエレメント２０ｄはそれぞれ１台だけでなく、複数台あるものとする。 Next, recovery processing when a failure occurs in any of the primary master, the backup master, the primary cohort, and the backup cohort will be described. It is assumed that there are a plurality of processing elements 20c as primary cohorts and a plurality of processing elements 20d as backup cohorts.

（１）バックアップマスターに障害が発生した場合 (1) When a failure occurs in the backup master
バックアップマスターであるプロセッシングエレメント２０ｂの障害は、トランザクション処理には何の影響も及ぼさない。プライマリマスターであるプロセッシングエレメント２０ａの障害復旧処理部３４１ａは、プロセッシングエレメント２０ｂに障害が発生したことを検出する。 A failure of the processing element 20b, which is the backup master, has no effect on transaction processing. The failure recovery processing unit 341a of the processing element 20a, which is the primary master, detects that a failure has occurred in the processing element 20b.

障害の発生は、常時接続が確立されているソケット通信がタイムアウトなどにより切断されたか否かや、プロセッシングエレメント２０ｂから障害が発生したことを示す障害発生メッセージを受信したか否かなどを調べることにより検出される。 The occurrence of a failure is detected by checking, for example, whether a socket connection that is kept permanently established has been disconnected due to a timeout or the like, or whether a failure occurrence message indicating that a failure has occurred has been received from the processing element 20b.

そして、障害が検出された場合には、プロセッシングエレメント２０ａのトランザクション実行部３４０ａは、非同期ｎｅｉｇｈｂｏｒ−ＷＡＬプロトコルによりプロセッシングエレメント２０ｂのメモリ３３ｂにログを書き込むことはせず、プロセッシングエレメント２０ａのメモリ３３ａにだけログを書き込み、その後は通常の処理を継続する。 If a failure is detected, the transaction execution unit 340a of the processing element 20a does not write a log to the memory 33b of the processing element 20b by the asynchronous neighbor-WAL protocol, but only to the memory 33a of the processing element 20a. Write the log, then continue normal processing.

（２）プライマリマスターに障害が発生した場合 (2) When a failure occurs in the primary master
（２−１）トランザクションのディシジョン前に障害が発生した場合 (2-1) When a failure occurs before the decision of the transaction
図８は、ディシジョン前にプライマリマスターに障害が検出された場合の復帰処理について説明するシーケンス図である。図８に示すように、バックアップマスターであるプロセッシングエレメント２０ｂの障害復旧処理部３４１ｂは、プライマリマスターであるプロセッシングエレメント２０ａに発生した障害を検出する（ステップＳ３０１）。 FIG. 8 is a sequence diagram explaining the recovery processing performed when a failure is detected in the primary master before the decision. As illustrated in FIG. 8, the failure recovery processing unit 341b of the processing element 20b, which is the backup master, detects a failure that has occurred in the processing element 20a, which is the primary master (step S301).

そして、障害復旧処理部３４１ｂは、メンバーシップログに登録されたプライマリコホートであるすべてのプロセッシングエレメント２０ｃおよびバックアップコホートであるプロセッシングエレメント２０ｄにトランザクションの実行を中止するアボートメッセージを送信する（ステップＳ３０２、ステップＳ３０３）。プロセッシングエレメント２０ｃの障害復旧処理部３４１ｃおよびプロセッシングエレメント２０ｄの障害復旧処理部３４１ｄは、このアボートメッセージを受信して、トランザクションの実行を中止する。 The failure recovery processing unit 341b then transmits an abort message, which cancels execution of the transaction, to all the processing elements 20c that are the primary cohorts registered in the membership log and to the processing elements 20d that are the backup cohorts (steps S302 and S303). The failure recovery processing unit 341c of the processing element 20c and the failure recovery processing unit 341d of the processing element 20d receive this abort message and stop executing the transaction.

（２−２）トランザクションのディシジョン処理実行決定後で、すべてのプライマリコホートからＡＣＫメッセージを受信する前に障害が発生した場合 (2-2) When a failure occurs after execution of the decision processing of the transaction has been decided and before ACK messages have been received from all primary cohorts
図９は、ディシジョン処理実行決定後、すべてのプライマリコホートからＡＣＫメッセージを受信する前にプライマリマスターに障害が検出された場合の復帰処理について説明するシーケンス図である。 FIG. 9 is a sequence diagram explaining the recovery processing performed when a failure is detected in the primary master after execution of the decision processing has been decided and before ACK messages have been received from all primary cohorts.

ここで、バックアップマスターであるプロセッシングエレメント２０ｂは、プライマリマスターであるプロセッシングエレメント２０ａがディシジョン処理の実行を決定した後、プロセッシングエレメント２０ａからディシジョンログを受信するので、ディシジョン処理の実行決定を検知することができる。 Here, the processing element 20b, which is the backup master, receives the decision log from the processing element 20a, which is the primary master, after the processing element 20a has decided to execute the decision processing. The processing element 20b can therefore detect that execution of the decision processing has been decided.

図９に示すように、バックアップマスターであるプロセッシングエレメント２０ｂの障害復旧処理部３４１ｂは、プライマリマスターであるプロセッシングエレメント２０ａに発生した障害を検出する（ステップＳ４０１）。そして、障害復旧処理部３４１ｂは、メンバーシップログに登録されたすべてのプロセッシングエレメント２０ｃに対してディシジョンメッセージを送信する（ステップＳ４０２）。 As illustrated in FIG. 9, the failure recovery processing unit 341b of the processing element 20b that is the backup master detects a failure that has occurred in the processing element 20a that is the primary master (step S401). Then, the failure recovery processing unit 341b transmits a decision message to all the processing elements 20c registered in the membership log (step S402).

ここで、プロセッシングエレメント２０ｂのメモリ３３ｂには、メンバーシップログがすでに記憶されているので、障害復旧処理部３４１ｂは、ディシジョンメッセージを送信すべきすべてのプロセッシングエレメント２０ｃをすべて検出することができる。 Here, since the membership log is already stored in the memory 33b of the processing element 20b, the failure recovery processing unit 341b can detect all the processing elements 20c to which the decision message should be transmitted.

このディシジョンメッセージを受信したプロセッシングエレメント２０ｃのトランザクション実行部３４０ｃは、ディシジョン処理を実行するとともに、当該ディシジョン処理に係るログを生成し、同期ｎｅｉｇｈｂｏｒ−ＷＡＬプロトコルを用いてバックアップコホートであるプロセッシングエレメント２０ｄにログを送信する（ステップＳ４０３）。 The transaction execution unit 340c of the processing element 20c that has received this decision message executes the decision processing, generates a log related to the decision processing, and transmits the log to the processing element 20d, which is the backup cohort, using the synchronous neighbor-WAL protocol (step S403).

このログを受信したプロセッシングエレメント２０ｄのトランザクション実行部３４０ｄは、プロセッシングエレメント２０ｃから受信したログに情報が含まれるすべてのオペレーションを実行し、オペレーションの実行完了を示すＡＣＫメッセージをプロセッシングエレメント２０ｃに送信する（ステップＳ４０４）。 The transaction execution unit 340d of the processing element 20d that has received this log executes all operations whose information is included in the log received from the processing element 20c, and transmits an ACK message indicating completion of the operations to the processing element 20c (step S404).

プロセッシングエレメント２０ｃのトランザクション実行部３４０ｃは、このＡＣＫメッセージを受信すると、プロセッシングエレメント２０ｂに対して、プロセッシングエレメント２０ｃおよびプロセッシングエレメント２０ｄにおいてディシジョン処理が完了したことを示すディシジョンＡＣＫメッセージを送信する（ステップＳ４０５）。 Upon receiving this ACK message, the transaction execution unit 340c of the processing element 20c transmits to the processing element 20b a decision ACK message indicating that the decision processing has been completed in the processing element 20c and the processing element 20d (step S405).

プロセッシングエレメント２０ｂの障害復旧処理部３４１ｂは、ディシジョンＡＣＫメッセージを受信すると、ディシジョン処理が完了したことを示すエンドログを生成し、エンドログをメモリ３３ｂにログ３３０ｂとして記憶してトランザクション処理を終了する。 Upon receiving the decision ACK message, the failure recovery processing unit 341b of the processing element 20b generates an end log indicating that the decision process is completed, stores the end log in the memory 33b as the log 330b, and ends the transaction process.
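
The two primary-master failure cases (2-1) and (2-2) differ only in whether the backup master already holds the decision log. The following Python fragment is a minimal sketch of that branch, not the patented implementation. The function name `plan_backup_master_recovery`, the string labels, and the dictionary layout of the membership log are all assumptions introduced for illustration.

```python
def plan_backup_master_recovery(received_log_types, membership):
    """Decide what the backup master sends after detecting a primary-master failure.

    received_log_types: set of log kinds already stored in the backup master's
        memory (hypothetical labels such as "membership" and "decision").
    membership: hypothetical dict with "primary" and "backup" cohort id lists,
        standing in for the membership log.
    Returns (message_kind, recipients).
    """
    if "decision" in received_log_types:
        # Case (2-2): the decision was already made, so re-send the decision
        # message to every primary cohort in the membership log (step S402).
        return ("decision", list(membership["primary"]))
    # Case (2-1): no decision log yet, so abort the transaction on all
    # primary and backup cohorts (steps S302 and S303).
    return ("abort", list(membership["primary"]) + list(membership["backup"]))


membership = {"primary": ["20c-1", "20c-2"], "backup": ["20d-1", "20d-2"]}
before = plan_backup_master_recovery({"membership"}, membership)
after = plan_backup_master_recovery({"membership", "decision"}, membership)
```

In this sketch the presence of the decision log is the only input that changes the outcome, which mirrors why the primary master forwards the decision log to the backup master before contacting the cohorts.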

（３）バックアップコホートに障害が発生した場合 (3) When a failure occurs in the backup cohort
トランザクション処理においては、バックアップコホートの障害はプライマリマスターおよびバックアップマスターには影響を及ぼさない。実際、プライマリマスターは、ディシジョン処理の実行以前には、プライマリコホートとバックアップコホートとの間で同期処理がおこなわれないので、バックアップコホートの障害を検出できない。 In transaction processing, a failure of the backup cohort does not affect the primary master or the backup master. In fact, because no synchronization is performed between the primary cohort and the backup cohort before the decision processing is executed, the primary master cannot detect a failure of the backup cohort before that point.

図１０は、バックアップコホートに異常が検出された場合の復帰処理について説明するシーケンス図である。図１０に示すように、まず、プライマリコホートであるプロセッシングエレメント２０ｃの障害復旧処理部３４１ｃは、バックアップコホートであるプロセッシングエレメント２０ｄに発生した障害を検出する（ステップＳ５０１）。 FIG. 10 is a sequence diagram illustrating a return process when an abnormality is detected in the backup cohort. As shown in FIG. 10, first, the failure recovery processing unit 341c of the processing element 20c that is the primary cohort detects a failure that has occurred in the processing element 20d that is the backup cohort (step S501).

障害復旧処理部３４１ｃは、障害を検出すると、プライマリマスターであるプロセッシングエレメント２０ａにＡＣＫメッセージを送信する（ステップＳ５０２）。ここで、トランザクション実行部３４０ｃは、メモリ３３ｃに記憶しているログを消去せず、保持しておく。 When the failure recovery processing unit 341c detects a failure, the failure recovery processing unit 341c transmits an ACK message to the processing element 20a that is the primary master (step S502). Here, the transaction execution unit 340c retains the log stored in the memory 33c without erasing it.

その後、障害復旧処理部３４１ｃは、プロセッシングエレメント２０ｄの復旧を検出する（ステップＳ５０３）。そして、障害復旧処理部３４１ｃは、復旧の検出後、メモリ３３ｃに保持していたログをプロセッシングエレメント２０ｄに送信する（ステップＳ５０４）。プロセッシングエレメント２０ｄのトランザクション実行部３４０ｄは、このログを受信してメモリ３３ｄに記憶するとともに、ログを用いてデータを最新の状態に更新する。 Thereafter, the failure recovery processing unit 341c detects the recovery of the processing element 20d (step S503). Then, after detecting the recovery, the failure recovery processing unit 341c transmits the log held in the memory 33c to the processing element 20d (step S504). The transaction execution unit 340d of the processing element 20d receives this log and stores it in the memory 33d, and updates the data to the latest state using the log.
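
Steps S501 to S504 amount to "acknowledge the master anyway, keep the log, and replay it once the backup returns". The following Python sketch illustrates that behavior under simplifying assumptions: the class and method names are hypothetical, log records are plain `(key, value)` pairs, and the primary cohort releasing its log after a successful hand-off during recovery is an assumption that the description does not state explicitly.

```python
class BackupCohort:
    """Hypothetical stand-in for processing element 20d."""

    def __init__(self):
        self.alive = True
        self.log = []    # log records held in memory (memory 33d)
        self.data = {}   # backup data (disk device 40d)

    def receive_log(self, records):
        # Store the records and bring the backup data up to date.
        self.log.extend(records)
        for key, value in records:
            self.data[key] = value


class PrimaryCohort:
    """Hypothetical stand-in for processing element 20c."""

    def __init__(self, backup):
        self.backup = backup
        self.log = []    # log records held in memory (memory 33c)

    def on_decision(self, records):
        self.log.extend(records)
        if self.backup.alive:
            self.backup.receive_log(list(self.log))
            self.log.clear()          # normal case: release the memory area
        # If the backup is down, the log is retained (step S502), but the
        # decision ACK is still returned to the primary master.
        return "decision_ack"

    def on_backup_recovered(self):
        # Steps S503 and S504: hand the retained log to the recovered backup.
        self.backup.alive = True
        self.backup.receive_log(list(self.log))
        self.log.clear()              # assumption: log released after hand-off


backup = BackupCohort()
primary = PrimaryCohort(backup)
backup.alive = False
ack = primary.on_decision([("k", 1)])
retained = list(primary.log)
primary.on_backup_recovered()
```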

（４）プライマリコホートに障害が発生した場合 (4) When a failure occurs in the primary cohort
（４−１）最後のオペレーションのプリペアード状態前に障害が発生した場合 (4-1) When a failure occurs before the prepared state of the last operation
プライマリマスターであるプロセッシングエレメント２０ａの障害復旧処理部３４１ａが、最後のオペレーションのプリペアード状態前（図７のログライト（ログｎ）以前）に、プライマリコホートであるプロセッシングエレメント２０ｃの１つに障害を検出した場合、障害復旧処理部３４１ａは、障害を検出したプロセッシングエレメント２０ｃ以外のすべてのプライマリコホートおよびバックアップコホートに対して、トランザクションの中止を要求するアボートメッセージを送信する。 When the failure recovery processing unit 341a of the processing element 20a, which is the primary master, detects a failure in one of the processing elements 20c, which are the primary cohorts, before the prepared state of the last operation (that is, before the log write (log n) in FIG. 7), the failure recovery processing unit 341a transmits an abort message requesting that the transaction be aborted to all primary cohorts and backup cohorts other than the processing element 20c in which the failure was detected.

（４−２）最後のオペレーションのプリペアード状態以降に障害が発生した場合 (4-2) When a failure occurs at or after the prepared state of the last operation
この状況においては、プライマリマスターであるプロセッシングエレメント２０ａは、プライマリコホートであるすべてのプロセッシングエレメント２０ｃからオペレーションの実行要求に対するＡＣＫメッセージを収集し、すべてのプロセッシングエレメント２０ｃに対してディシジョンメッセージを送信する。 In this situation, the processing element 20a, which is the primary master, collects the ACK messages for the operation execution requests from all the processing elements 20c, which are the primary cohorts, and transmits a decision message to all the processing elements 20c.

各プロセッシングエレメント２０ｃにおいてはオペレーションの実行は完了しているので、この段階で障害が発生した場合には、プロセッシングエレメント２０ａは、トランザクションを中止するのではなく、そのまま継続させる。 Since the execution of the operation is completed in each processing element 20c, if a failure occurs at this stage, the processing element 20a does not abort the transaction but continues it.

図１１は、最後のオペレーションのプリペアード状態以降にプライマリコホートに障害が検出された場合の復帰処理について説明するシーケンス図である。図１１に示すように、まず、プライマリマスターであるプロセッシングエレメント２０ａの障害復旧処理部３４１ａは、プライマリコホートであるプロセッシングエレメント２０ｃに発生した障害を検出する（ステップＳ６０１）。 FIG. 11 is a sequence diagram for explaining return processing when a failure is detected in the primary cohort after the prepared state of the last operation. As shown in FIG. 11, first, the failure recovery processing unit 341a of the processing element 20a that is the primary master detects a failure that has occurred in the processing element 20c that is the primary cohort (step S601).

そして、障害復旧処理部３４１ａは、バックアップコホートであるプロセッシングエレメント２０ｄに、トランザクションに対するディシジョン処理の実行を要求するディシジョンメッセージを送信する（ステップＳ６０２）。 Then, the failure recovery processing unit 341a transmits a decision message requesting execution of the decision process for the transaction to the processing element 20d that is the backup cohort (step S602).

プロセッシングエレメント２０ｄのトランザクション実行部３４０ｄは、このディシジョンメッセージを受信し、ディシジョン処理を実行して、ディシジョンＡＣＫメッセージをプロセッシングエレメント２０ａに送信する（ステップＳ６０３）。ここで、トランザクション実行部３４０ｄは、メモリ３３ｄに記憶しているログを消去せず、保持しておく。 The transaction execution unit 340d of the processing element 20d receives this decision message, executes a decision process, and transmits a decision ACK message to the processing element 20a (step S603). Here, the transaction execution unit 340d retains the log stored in the memory 33d without erasing it.

プロセッシングエレメント２０ｃの障害復旧処理部３４１ｃは、障害から復旧した後、プロセッシングエレメント２０ｄに対して、ログの問い合わせをおこなう（ステップＳ６０４）。 The failure recovery processing unit 341c of the processing element 20c makes a log inquiry to the processing element 20d after recovering from the failure (step S604).

プロセッシングエレメント２０ｄの障害復旧処理部３４１ｄは、この問い合わせを受け付けると、プロセッシングエレメント２０ｃにログを送信する（ステップＳ６０５）。プロセッシングエレメント２０ｃの障害復旧処理部３４１ｃは、このログを用いて、データを最新の状態に更新する。 When the failure recovery processing unit 341d of the processing element 20d receives this inquiry, it transmits a log to the processing element 20c (step S605). The failure recovery processing unit 341c of the processing element 20c uses this log to update the data to the latest state.
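
Cases (4-1) and (4-2) can be summarized as a single decision made by the primary master when it detects a primary-cohort failure. The sketch below is illustrative only. The function name, the `all_prepared` flag (true once the last operation has reached the prepared state), and the `backup_of` mapping from each primary cohort to its backup cohort are assumptions made for the example.

```python
def plan_primary_cohort_failure(failed, all_prepared, primaries, backup_of):
    """Return (message_kind, recipients) for a failed primary cohort.

    failed: id of the primary cohort in which the failure was detected.
    all_prepared: True if the last operation already reached the prepared state.
    primaries: ids of all primary cohorts in the transaction.
    backup_of: hypothetical mapping from primary cohort id to backup cohort id.
    """
    if not all_prepared:
        # Case (4-1): abort on every primary cohort except the failed one,
        # and on the backup cohorts.
        recipients = [p for p in primaries if p != failed]
        recipients += [backup_of[p] for p in primaries]
        return ("abort", recipients)
    # Case (4-2): the operations are complete, so the transaction continues;
    # the decision message goes to the failed cohort's backup (step S602).
    return ("decision", [backup_of[failed]])


primaries = ["c1", "c2"]
backup_of = {"c1": "d1", "c2": "d2"}
early = plan_primary_cohort_failure("c1", False, primaries, backup_of)
late = plan_primary_cohort_failure("c1", True, primaries, backup_of)
```

After the decision in case (4-2), the backup cohort keeps its log so that the recovered primary cohort can request it and catch up (steps S604 and S605).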

本実施例にて説明してきた各種の処理は、あらかじめ用意されたプログラムをコンピュータで実行することによって実現することができる。そこで、以下では、上記各種処理を実現するプログラムを実行するコンピュータの一例について説明する。 The various processes described in the present embodiment can be realized by executing a program prepared in advance on a computer. In the following, an example of a computer that executes a program that implements the various processes will be described.

図１２は、図３に示したコントローラ３０ａ〜３０ｄとなるコンピュータのハードウェア構成を示す図である。このコンピュータは、ネットワークを介して他のコンピュータとの間でデータの授受をおこなうネットワークインターフェース１００、ディスク装置との間でデータの授受をおこなうディスクインターフェース１０１、ＣＰＵ（Central Processing Unit）１０２、ＲＯＭ（Read Only Memory）１０３、ＲＡＭ（Random Access Memory）１０４をバス１０５で接続して構成される。 FIG. 12 is a diagram illustrating the hardware configuration of a computer serving as the controllers 30a to 30d illustrated in FIG. 3. This computer is configured by connecting, via a bus 105, a network interface 100 that exchanges data with other computers over a network, a disk interface 101 that exchanges data with a disk device, a CPU (Central Processing Unit) 102, a ROM (Read Only Memory) 103, and a RAM (Random Access Memory) 104.

そして、ＲＯＭ１０３には、コントローラ３０ａ〜３０ｄの機能と同様の機能を発揮するデータ記憶制御プログラム１０３ａが記憶されている。なお、このデータ記憶制御プログラム１０３ａについては、適宜分散して記憶することとしてもよい。 The ROM 103 stores a data storage control program 103a that exhibits the same functions as the functions of the controllers 30a to 30d. The data storage control program 103a may be stored in a distributed manner as appropriate.

そして、ＣＰＵ１０２が、データ記憶制御プログラム１０３ａをＲＯＭ１０３から読み出して実行することにより、データ記憶制御プロセス１０２ａが起動され、機能するようになる。なお、ここではＣＰＵ１０２がデータ記憶制御プログラム１０３ａを実行することしているが、ＭＣＵ（Micro Controller Unit）やＭＰＵ（Micro Processing Unit）がデータ記憶制御プログラム１０３ａを実行することとしてもよい。 Then, when the CPU 102 reads out the data storage control program 103a from the ROM 103 and executes it, the data storage control process 102a is started and functions. Although the CPU 102 executes the data storage control program 103a here, an MCU (Micro Controller Unit) or an MPU (Micro Processing Unit) may execute the data storage control program 103a.

このデータ記憶制御プログラム１０３ａは、図３に示したトランザクション実行部３４０ａ（３４０ｂ，３４０ｃ，３４０ｄ）、障害復旧処理部３４１ａ（３４１ｂ，３４１ｃ，３４１ｄ）に対応する。 The data storage control program 103a corresponds to the transaction execution unit 340a (340b, 340c, 340d) and the failure recovery processing unit 341a (341b, 341c, 341d) shown in FIG.

また、ＣＰＵ１０２は、ＲＡＭ１０４にログ１０４ａ、バックアップディスク管理データ１０４ｂ、ＡＣＫ管理データ１０４ｃを記憶させるとともに、ＲＡＭ１０４からそれらのデータを読み出して各種データ処理を実行する。 Further, the CPU 102 stores the log 104a, the backup disk management data 104b, and the ACK management data 104c in the RAM 104, and reads out the data from the RAM 104 and executes various data processing.

ログ１０４ａ、バックアップディスク管理データ１０４ｂ、ＡＣＫ管理データ１０４ｃは、図３に示したログ３３０ａ（３３０ｂ，３３０ｃ，３３０ｄ）、バックアップディスク管理データ３３１ａ（３３１ｂ，３３１ｃ，３３１ｄ）、ＡＣＫ管理データ３３２ａ（３３２ｂ，３３２ｃ，３３２ｄ）にそれぞれ対応する。 The log 104a, the backup disk management data 104b, and the ACK management data 104c are the log 330a (330b, 330c, 330d), the backup disk management data 331a (331b, 331c, 331d), and the ACK management data 332a (332b, 332b, 332d) illustrated in FIG. 332c, 332d).

つぎに、本実施例に係るＢＡ−１．５フェーズコミットプロトコル（ＢＡ−１．５ＰＣ）、２フェーズコミットプロトコル（２ＰＣ）、および、アーリープリペアプロトコル（ＥＰ）間でのオーバーヘッドの比較について述べる。ここでいうオーバーヘッドには、トランザクションのオペレーション処理、コミット処理におけるメッセージ交換と、強制ディスク書き込みとの両方を含んでいる。 Next, the overhead of the BA-1.5 phase commit protocol (BA-1.5PC) according to the present embodiment is compared with that of the two-phase commit protocol (2PC) and the early prepare protocol (EP). The overhead considered here includes both the messages exchanged during the operation processing and commit processing of a transaction and the forced disk writes.

図１３は、各プロトコルのコミットされたトランザクションのオーバーヘッドを示す図である。ここで、Ｐは、トランザクションにおいて実行されるオペレーションの数であり、Ｎは、オペレーションを実行するコホートの数である。 FIG. 13 is a diagram illustrating the overhead of committed transactions for each protocol. Here, P is the number of operations performed in the transaction, and N is the number of cohorts that perform the operation.

図１３に示されるように、ＢＡ−１．５フェーズコミットプロトコル、２フェーズコミットプロトコル、アーリープリペアプロトコルにおいて送信されるメッセージ数はそれぞれ、３＋３Ｐ＋４Ｎ、２Ｐ＋８Ｎ、４Ｐ＋２Ｎとなる。 As shown in FIG. 13, the numbers of messages transmitted in the BA-1.5 phase commit protocol, the two phase commit protocol, and the early prepare protocol are 3 + 3P + 4N, 2P + 8N, and 4P + 2N, respectively.

また、ＢＡ−１．５フェーズコミットプロトコル、２フェーズコミットプロトコル、アーリープリペアプロトコルにおいて実行されるディスク装置への強制ログライト数はそれぞれ、０、４Ｎ＋１、２Ｐ＋２となる。 The number of forced log writes to the disk device executed in the BA-1.5 phase commit protocol, the two phase commit protocol, and the early prepare protocol is 0, 4N + 1, and 2P + 2, respectively.

ＢＡ−１．５フェーズコミットプロトコルのメッセージ数は、図７から得られる。また、ＢＡ−１．５フェーズコミットプロトコルでは、非同期ｎｅｉｇｈｂｏｒ−ＷＡＬプロトコルを用いるため、ディスク装置への強制ログライトは不要であり、強制ログライト数は０となる。 The number of messages in the BA-1.5 phase commit protocol can be obtained from FIG. Further, in the BA-1.5 phase commit protocol, the asynchronous neighbor-WAL protocol is used, so that the forced log write to the disk device is unnecessary, and the number of forced log writes becomes zero.

２フェーズコミットプロトコルでは、コホートは、コミット処理時に強制ディスクライトを２回実行する必要がある（１回は投票時、もう１回はディシジョン時）。一方、マスターは、ディシジョン時に１回強制ディスクライトを実行する必要がある。 In the two-phase commit protocol, the cohort needs to execute a forced disk write twice during commit processing (once at voting and once at decision). On the other hand, the master needs to execute a forced disk write once during the decision.

また、マスターは、各コホートとの間でメッセージを２往復分交換する必要がある。したがって、トランザクションにおいてＰ個のオペレーションをおこなう場合には、交換されるメッセージの数は４Ｎ＋Ｐとなり、強制ディスクライトの数は２Ｎ＋１となる。 The master also needs to exchange two round-trip messages with each cohort. Therefore, when performing P operations in a transaction, the number of exchanged messages is 4N + P, and the number of forced disk writes is 2N + 1.

この２フェーズコミットプロトコルを自律ディスクに適用した場合、プライマリコホートは、オペレーションおよびコミット処理時にバックアップコホートにメッセージを送信してその応答を受信する。また、コミット処理時には強制ログライトが２回発生する。したがって、２フェーズコミットプロトコルにおけるメッセージ数は８Ｎ＋２Ｐとなり、強制ログライト数は４Ｎ＋１となる。 When this two-phase commit protocol is applied to an autonomous disk, the primary cohort sends a message to the backup cohort and receives a response during operation and commit processing. In addition, a forced log write occurs twice during the commit process. Therefore, the number of messages in the two-phase commit protocol is 8N + 2P, and the number of forced log writes is 4N + 1.

アーリープリペアプロトコルでは、コホートは、オペレーションごとに更新データとともにログをディスク装置に強制ライトする。そして、コホートは、マスターにオペレーションに対するＡＣＫメッセージを送信する。 In the early prepare protocol, the cohort forcibly writes the log to the disk device together with the update data for each operation. The cohort then sends an ACK message for the operation to the master.

コミット時には、コホートは、中止されたトランザクションに対してのみ強制ログライトをおこない、中止されたトランザクションに対してのみＡＣＫメッセージを送信する。マスターがおこなう強制ログライトは、トランザクションの開始時におけるメンバーシップログの強制ログライトと、コミット時におけるディシジョンログの強制ログライトである。したがって、アーリープリペアプロトコルにおけるメッセージ数は２Ｐ＋Ｎとなり、強制ログライト数はＰ＋２となる。 At commit time, the cohort performs forced log write only for the aborted transaction and transmits an ACK message only for the aborted transaction. The forced log write performed by the master is a forced log write of the membership log at the start of the transaction and a forced log write of the decision log at the commit time. Therefore, the number of messages in the early prepare protocol is 2P + N, and the number of forced log writes is P + 2.

このアーリープリペアプロトコルを自律ディスクに適用した場合、プライマリコホートはバックアップコホートに対してオペレーションおよびディシジョンメッセージを送信する必要があるため、メッセージ数は４Ｐ＋２Ｎとなり、強制ログライト数は２Ｐ＋２となる。 When this early prepare protocol is applied to an autonomous disk, the primary cohort needs to send operation and decision messages to the backup cohort, so the number of messages is 4P + 2N, and the number of forced log writes is 2P + 2.

図１３からわかるように、ＢＡ−１．５フェーズコミットプロトコルでは強制ログライトが発生せず、２フェーズコミットプロトコル、アーリープリペアプロトコルと比較してデータ処理を高速におこなうことができる。また、Ｎ＞３で、２Ｎ＋３＜Ｐ＜４Ｎ−３の関係をＰが満足する場合には、メッセージ数も最小となり、他の２つのプロトコルよりもデータ処理を効率的におこなうことができる。 As can be seen from FIG. 13, forced log write does not occur in the BA-1.5 phase commit protocol, and data processing can be performed at a higher speed than the two-phase commit protocol and the early prepare protocol. When N> 3 and P satisfies the relationship 2N + 3 <P <4N−3, the number of messages is minimized, and data processing can be performed more efficiently than the other two protocols.
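
The condition 2N+3 < P < 4N−3 follows directly from the message counts given for FIG. 13: 3+3P+4N < 2P+8N reduces to P < 4N−3, and 3+3P+4N < 4P+2N reduces to P > 2N+3, and the two bounds leave room for an integer P only when N > 3. The following Python sketch simply evaluates the formulas stated above so that the comparison can be checked numerically. The function names and dictionary keys are chosen for illustration and are not part of the original description.

```python
def message_counts(p, n):
    """Messages per committed transaction on an autonomous disk (per FIG. 13).

    p: number of operations in the transaction; n: number of cohorts.
    """
    return {
        "BA-1.5PC": 3 + 3 * p + 4 * n,
        "2PC": 2 * p + 8 * n,
        "EP": 4 * p + 2 * n,
    }


def forced_log_writes(p, n):
    """Forced log writes to disk per committed transaction (per FIG. 13)."""
    return {"BA-1.5PC": 0, "2PC": 4 * n + 1, "EP": 2 * p + 2}


def ba_has_fewest_messages(p, n):
    """True if BA-1.5PC sends strictly fewer messages than both 2PC and EP."""
    counts = message_counts(p, n)
    return counts["BA-1.5PC"] < counts["2PC"] and counts["BA-1.5PC"] < counts["EP"]


# Example: with N = 4 cohorts the window 2N+3 < P < 4N-3 contains only P = 12.
example = message_counts(12, 4)
```

For N = 4 and P = 12 the counts are 55 messages for BA-1.5PC against 56 for each of the other two protocols, while the forced log writes are 0, 17, and 26 respectively.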

つぎに、ＢＡ−１．５フェーズコミットプロトコル、２フェーズコミットプロトコル、および、アーリープリペアプロトコル間でのスループットの比較について述べる。図１４は、各プロトコルのスループットの比較結果を示す図である。図１４には、２４台のディスク装置を有する自律ディスクにおけるさまざまなサイズのデータのインサート処理に係るスループットが示されている。 Next, a comparison of throughput between the BA-1.5 phase commit protocol, the two-phase commit protocol, and the early prepare protocol will be described. FIG. 14 is a diagram showing a comparison result of the throughput of each protocol. FIG. 14 shows throughputs related to insert processing of data of various sizes in an autonomous disk having 24 disk devices.

図１４に示すように、ＢＡ−１．５フェーズコミットプロトコルのスループットは、すべてのデータサイズにおいて、２フェーズコミットプロトコルおよびアーリープリペアプロトコルのスループットを上回っている。これは、ＢＡ−１．５フェーズコミットプロトコルでは、コミット処理におけるメッセージ交換の複雑さを解消し、強制ログライトを不要としたためである。 As shown in FIG. 14, the throughput of the BA-1.5 phase commit protocol exceeds the throughputs of the two-phase commit protocol and the early prepare protocol for all data sizes. This is because the BA-1.5 phase commit protocol eliminates the complexity of message exchange in the commit process and eliminates the need for forced log writing.

図１５、１６および１７はそれぞれ、ＢＡ−１．５フェーズコミットプロトコル、２ＰＣプロトコル、および、ＥＰプロトコルのスループットを示す図である。図１５、１６および１７に示されるように、いずれのプロトコルにおいてもディスク装置（ノード）の数が増えるにつれてスループットが増大することがわかる。 FIGS. 15, 16 and 17 are diagrams illustrating the throughput of the BA-1.5 phase commit protocol, the 2PC protocol, and the EP protocol, respectively. As shown in FIGS. 15, 16 and 17, it can be seen that in any protocol, the throughput increases as the number of disk devices (nodes) increases.

そして、ＢＡ−１．５フェーズコミットプロトコルにおけるスループットは、ディスク装置の数がいくつであっても、２ＰＣプロトコルおよびＥＰプロトコルのスループットよりも大きくなっている。このように、自律ディスクにＢＡ−１．５フェーズコミットプロトコルを適用することにより、データ処理を高速におこなうことができる。 The throughput in the BA-1.5 phase commit protocol is larger than the throughput of the 2PC protocol and the EP protocol regardless of the number of disk devices. In this manner, data processing can be performed at high speed by applying the BA-1.5 phase commit protocol to the autonomous disk.

上述してきたように、本実施例では、プライマリマスターであるプロセッシングエレメント２０ａが、データに対するオペレーションの実行要求メッセージを送信し、プライマリコホートである１つ以上のプロセッシングエレメント２０ｃが、オペレーションの実行要求メッセージを受信した場合に、当該オペレーションを実行するとともに、オペレーションの実行が完了するたびにオペレーションの実行要求メッセージに対するＡＣＫメッセージをプロセッシングエレメント２０ａに送信し、オペレーションに係るログ情報を生成して当該ログ情報を含んだメッセージを送信し、バックアップコホートである１つ以上のプロセッシングエレメント２０ｄが、１つ以上のプロセッシングエレメント２０ｃにより送信されたメッセージを受信して各オペレーションに係るログ情報をメモリ３３ｄに記憶し、プロセッシングエレメント２０ａが、１つ以上のプロセッシングエレメント２０ｃにより送信されたＡＣＫメッセージを受信してすべてのオペレーションの実行が完了したか否かを判定し、すべてのオペレーションの実行完了後、オペレーションに対するディシジョン処理の実行要求メッセージを送信し、１つ以上のプロセッシングエレメント２０ｃは、プロセッシングエレメント２０ａにより送信されたディシジョン処理の実行要求メッセージを受信した場合にディシジョン処理を実行して、１つ以上のプロセッシングエレメント２０ｃのそれぞれに接続されたディスク装置４０ｃに記憶されたデータを更新するとともに、ディシジョン処理に係るログ情報を生成して当該ログ情報を含んだメッセージを送信し、１つ以上のプロセッシングエレメント２０ｄが、１つ以上のプロセッシングエレメント２０ｃにより送信されたメッセージを受信し、メモリ３３ｄに記憶されたログ情報に基づいてディシジョン処理を実行して、１つ以上のプロセッシングエレメント２０ｄのそれぞれに接続されたディスク装置４０ｄに記憶されたバックアップデータの更新をおこなうこととしたので、トランザクションに係るオーバーヘッドの増大を抑制し、データ処理の効率を向上させることができる。 As described above, in this embodiment, the processing element 20a, which is the primary master, transmits an operation execution request message for data. When the one or more processing elements 20c, which are the primary cohorts, receive the operation execution request message, they execute the operation, transmit an ACK message for the operation execution request message to the processing element 20a each time execution of an operation is completed, generate log information related to the operation, and transmit a message including that log information. The one or more processing elements 20d, which are the backup cohorts, receive the messages transmitted by the one or more processing elements 20c and store the log information related to each operation in the memory 33d. The processing element 20a receives the ACK messages transmitted by the one or more processing elements 20c, determines whether execution of all operations has been completed, and, after execution of all operations has been completed, transmits a decision processing execution request message for the operations. When the one or more processing elements 20c receive the decision processing execution request message transmitted by the processing element 20a, they execute the decision processing, update the data stored in the disk device 40c connected to each of the one or more processing elements 20c, generate log information related to the decision processing, and transmit a message including that log information. The one or more processing elements 20d receive the message transmitted by the one or more processing elements 20c, execute the decision processing based on the log information stored in the memory 33d, and update the backup data stored in the disk device 40d connected to each of the one or more processing elements 20d. As a result, an increase in the overhead associated with transactions can be suppressed and data processing efficiency can be improved.

また、本実施例では、バックアップコホートである１つ以上のプロセッシングエレメント２０ｄが、ディシジョン処理を実行した後、ディシジョン処理の完了を示すディシジョン完了メッセージを送信し、プライマリコホートである１つ以上のプロセッシングエレメント２０ｃが、１つ以上のプロセッシングエレメント２０ｄからディシジョン完了メッセージを受信した場合に、プライマリマスターであるプロセッシングエレメント２０ａに対してデータの同期処理が完了したことを示す同期完了メッセージを送信することとしたので、データとバックアップデータとの間で同期を確実にとることができる。 Further, in this embodiment, one or more processing elements 20d that are backup cohorts execute a decision process, and then transmit a decision completion message indicating the completion of the decision process, thereby one or more processing elements that are the primary cohort. When 20c receives a decision completion message from one or more processing elements 20d, it sends a synchronization completion message indicating that the data synchronization processing has been completed to the processing element 20a as the primary master. Thus, synchronization can be ensured between the data and the backup data.

また、本実施例では、プライマリマスターであるプロセッシングエレメント２０ａが、オペレーションの実行要求メッセージを送信する１つ以上のプロセッシングエレメント２０ｃを示すメンバーシップログ情報を生成してバックアップマスターであるプロセッシングエレメント２０ｂに送信し、プロセッシングエレメント２０ｂが、メンバーシップログ情報をメモリ３３ｂに記憶し、プロセッシングエレメント２０ａが、ディシジョン処理の実行を決定した場合にディシジョン処理の実行決定を示すディシジョンログ情報を生成してプロセッシングエレメント２０ｂに送信し、プロセッシングエレメント２０ｂが、ディシジョンログ情報を受信した後、プロセッシングエレメント２０ａに係る障害を検出した場合に、メンバーシップログ情報に基づいてオペレーションに対するディシジョン処理の実行要求メッセージを１つ以上のプロセッシングエレメント２０ｃに送信することとしたので、プロセッシングエレメント２０ａに係る障害が発生した場合でも、ディシジョン処理を迅速に実行することができる。 In this embodiment, the processing element 20a as the primary master generates membership log information indicating one or more processing elements 20c for transmitting the operation execution request message, and transmits the membership log information to the processing element 20b as the backup master. Then, the processing element 20b stores the membership log information in the memory 33b, and when the processing element 20a determines execution of the decision processing, it generates decision log information indicating the execution determination of the decision processing, and sends it to the processing element 20b. And when the processing element 20b detects a failure related to the processing element 20a after receiving the decision log information, the membership log information Since it was decided to send the execution request message Decision process to one or more processing elements 20c for operations based on, even if a failure of the processing elements 20a occurs, it is possible to quickly execute the decision process.

また、本実施例では、プライマリコホートである１つ以上のプロセッシングエレメント２０ｃが、プライマリマスターであるプロセッシングエレメント２０ａからディシジョン処理の実行要求メッセージを受信した後、バックアップコホートである１つ以上のプロセッシングエレメント２０ｄに係る障害を検出した場合に、プロセッシングエレメント２０ａに対してディシジョン完了メッセージを送信し、障害が復旧した場合に、１つ以上のプロセッシングエレメント２０ｄが、メモリ３３ｄに記憶されたログ情報に基づいてディシジョン処理を実行して、１つ以上のプロセッシングエレメント２０ｄのそれぞれに接続されたディスク装置４０ｄに記憶されたバックアップデータを更新することとしたので、プロセッシングエレメント２０ｄに係る障害が発生した場合でも、ディシジョン処理を迅速に実行することができる。 Further, in this embodiment, after one or more processing elements 20c that are primary cohorts receive a decision processing execution request message from the processing element 20a that is primary master, one or more processing elements 20d that are backup cohorts. When a failure is detected, a decision completion message is transmitted to the processing element 20a. When the failure is recovered, one or more processing elements 20d are determined based on the log information stored in the memory 33d. Since the process is executed to update the backup data stored in the disk device 40d connected to each of the one or more processing elements 20d, the processing element 20d Even if according failure can be quickly execute the decision process.

また、本実施例では、プライマリマスターであるプロセッシングエレメント２０ａが、プライマリコホートである１つ以上のプロセッシングエレメント２０ｃにより送信されたすべての応答メッセージを受信した後、１つ以上のプロセッシングエレメント２０ｃのうちのいずれかに係る障害を検出した場合に、１つ以上のプロセッシングエレメント２０ｄにオペレーションに対するディシジョン処理の実行要求メッセージを送信し、１つ以上のプロセッシングエレメント２０ｄが、プロセッシングエレメント２０ａにより送信されたディシジョン処理の実行要求メッセージを受信した場合に、ディシジョン処理を実行し、障害が復旧した場合にディシジョン処理に係るログ情報を障害が発生したプロセッシングエレメント２０ｃに送信し、プロセッシングエレメント２０ｃは、ディシジョン処理に係るログ情報を受信し、受信したログ情報に基づいてディシジョン処理を実行することとしたので、プロセッシングエレメント２０ｃに係る障害が発生した場合でも、ディシジョン処理を迅速に実行することができる。 Further, in the present embodiment, after the processing element 20a that is the primary master receives all the response messages transmitted by the one or more processing elements 20c that are the primary cohort, one of the one or more processing elements 20c When a failure related to any of the above is detected, a decision processing execution message for the operation is transmitted to one or more processing elements 20d, and the one or more processing elements 20d transmit the decision processing transmitted by the processing element 20a. When the execution request message is received, the decision process is executed, and when the failure is recovered, the log information related to the decision process is transmitted to the processing element 20c where the failure has occurred, Since the processing element 20c receives the log information related to the decision processing, and executes the decision processing based on the received log information, the decision processing can be quickly performed even when a failure related to the processing element 20c occurs. Can be executed.

Although an embodiment of the present invention has been described above, the present invention may also be implemented in various other embodiments within the scope of the technical idea set forth in the claims.

For example, in step S215 of FIG. 7, the primary cohort transmits the decision ACK message to the primary master after synchronizing data with the backup cohort. However, the primary cohort may instead transmit the decision ACK message without waiting for data synchronization with the backup cohort. This further shortens the time required for the decision processing.
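
The difference between the two orderings can be illustrated with a small event trace. This is only a sketch: the function `decision_events`, the flag `wait_for_backup`, and the event strings are hypothetical names introduced for illustration and do not appear in the embodiment.

```python
ACK = "primary cohort: decision ACK to primary master"
BACKUP_DONE = "backup cohort: decision applied to backup data"


def decision_events(wait_for_backup: bool) -> list:
    """Return the order of events at the decision phase for one cohort pair."""
    events = [
        "primary cohort: decision applied to primary data",
        "primary cohort: decision log sent to backup cohort",
    ]
    if wait_for_backup:
        # Ordering of step S215 in FIG. 7: synchronize first, then ACK.
        events += [BACKUP_DONE, ACK]
    else:
        # Variant: ACK without waiting for data synchronization.
        events += [ACK, BACKUP_DONE]
    return events
```

Both orderings contain the same events; only the position of the decision ACK changes, which is why the variant can shorten the time the primary master waits.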

Of the processes described in this embodiment, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by a known method.

In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and in the drawings may be changed arbitrarily unless otherwise specified.

The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to that illustrated, and all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.

Furthermore, all or any part of the processing functions performed in each device may be realized by a CPU (or an MCU or MPU) and a program analyzed and executed by that CPU (or MCU or MPU), or may be realized as hardware using wired logic.

(Supplementary Note 1) A data storage control device that performs control to store data and backup data of the data in different data storage units, comprising:
a first storage control unit that transmits a plurality of operation execution request messages for data;
one or more second storage control units that, on receiving an operation execution request message, execute the operation, transmit a response message to the operation execution request message to the first storage control unit each time execution of an operation is completed, generate log information on the operation, and transmit a message including the log information; and
one or more third storage control units that receive the messages transmitted by the one or more second storage control units and store the log information on each operation in a memory,
wherein the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all the operations has been completed, and, after execution of all the operations is complete, transmits a decision processing execution request message for the operations,
the one or more second storage control units, on receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing to control updating of the data stored in the data storage unit connected to each of the one or more second storage control units, generate log information on the decision processing, and transmit a message including that log information, and
the one or more third storage control units receive the message transmitted by the one or more second storage control units, execute the decision processing based on the log information stored in the memory, and control updating of the backup data stored in the data storage unit connected to each of the one or more third storage control units.
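
The message flow recited above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the claimed device: `Master`, `Primary`, and `Backup` are hypothetical names for the first, second, and third storage control units, messages are modeled as direct method calls, and the decision is assumed to be either "commit" or "abort".

```python
class Backup:
    """Third storage control unit: keeps operation logs in memory until the decision."""

    def __init__(self):
        self.memory_log = []
        self.disk = {}  # backup data

    def on_operation_log(self, key, value):
        self.memory_log.append((key, value))

    def on_decision_log(self, decision):
        if decision == "commit":
            self.disk.update(self.memory_log)
        self.memory_log.clear()


class Primary:
    """Second storage control unit: executes operations and forwards log information."""

    def __init__(self, backup):
        self.backup = backup
        self.pending = []
        self.disk = {}  # primary data

    def on_operation(self, key, value):
        self.pending.append((key, value))
        self.backup.on_operation_log(key, value)
        return "ACK"  # response message for this operation

    def on_decision(self, decision):
        if decision == "commit":
            self.disk.update(self.pending)
        self.pending.clear()
        self.backup.on_decision_log(decision)


class Master:
    """First storage control unit: sends operations, then one decision request."""

    def __init__(self, primaries):
        self.primaries = primaries  # mapping: cohort name -> Primary

    def run(self, operations):
        acks = [self.primaries[n].on_operation(k, v) for n, k, v in operations]
        decision = "commit" if all(a == "ACK" for a in acks) else "abort"
        # One decision request per cohort that took part in the transaction.
        for name in dict.fromkeys(n for n, _, _ in operations):
            self.primaries[name].on_decision(decision)
        return decision
```

Note that the backup side writes nothing to its data storage unit until the decision arrives; it only buffers log information in memory, which is the point of the scheme.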

(Supplementary Note 2) The data storage control device according to Supplementary Note 1, wherein the one or more third storage control units, after executing the decision processing, transmit a decision completion message indicating completion of the decision processing, and the one or more second storage control units, on receiving the decision completion message from the one or more third storage control units, transmit to the first storage control unit a synchronization completion message indicating that data synchronization processing has been completed.

(Supplementary Note 3) The data storage control device according to Supplementary Note 1 or 2, further comprising a fourth storage control unit that stores, in a memory, membership log information indicating the one or more second storage control units to which the operation execution request messages are transmitted, wherein the first storage control unit generates the membership log information and transmits it to the fourth storage control unit, and, on deciding to execute the decision processing, generates decision log information indicating that decision and transmits it to the fourth storage control unit, and the fourth storage control unit, on detecting a failure in the first storage control unit after receiving the decision log information, transmits a decision processing execution request message for the operations to the one or more second storage control units based on the membership log information.
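
The takeover by the fourth storage control unit can be sketched as follows. This is an illustrative sketch only: the class name `BackupMaster`, its method names, and the `send` callback are hypothetical, and failure detection itself is outside the sketch.

```python
from typing import Callable, List, Optional


class BackupMaster:
    """Hypothetical stand-in for the fourth storage control unit."""

    def __init__(self) -> None:
        self.membership: List[str] = []      # membership log information
        self.decision: Optional[str] = None  # decision log information

    def on_membership_log(self, cohort_ids: List[str]) -> None:
        self.membership = list(cohort_ids)

    def on_decision_log(self, decision: str) -> None:
        self.decision = decision

    def on_master_failure(self, send: Callable[[str, str], None]) -> int:
        """Re-issue the decision request to every logged cohort.

        Returns the number of requests sent. Nothing is sent when no
        decision log was received before the failure.
        """
        if self.decision is None:
            return 0
        for cohort_id in self.membership:
            send(cohort_id, self.decision)
        return len(self.membership)
```

The sketch covers only the case recited in this note, where the decision log information has already been received; the case where the first storage control unit fails before deciding is handled separately in the embodiment.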

(Supplementary Note 4) The data storage control device according to Supplementary Note 1, 2, or 3, wherein the one or more second storage control units, on detecting a failure in the one or more third storage control units after receiving the decision processing execution request message from the first storage control unit, transmit the decision completion message to the first storage control unit, and, when the failure is recovered, the one or more third storage control units execute the decision processing based on the log information stored in the memory and control updating of the backup data stored in the data storage unit connected to each of the one or more third storage control units.

(Supplementary Note 5) The data storage control device according to any one of Supplementary Notes 1 to 4, wherein the first storage control unit, on detecting a failure in any of the one or more second storage control units after receiving all the response messages transmitted by the one or more second storage control units, transmits a decision processing execution request message for the operations to the one or more third storage control units, the one or more third storage control units, on receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing and, when the failure is recovered, transmit log information on the decision processing to the second storage control unit that has recovered from the failure, and the second storage control unit that has recovered from the failure receives the log information on the decision processing and executes the decision processing based on the received log information.

(Supplementary Note 6) A data storage device that has a plurality of data storage units and stores data and backup data of the data in different data storage units, comprising:
a first storage control unit that transmits a plurality of operation execution request messages for data;
one or more second storage control units that, on receiving an operation execution request message, execute the operation, transmit a response message to the operation execution request message to the first storage control unit each time execution of an operation is completed, generate log information on the operation, and transmit a message including the log information; and
one or more third storage control units that receive the messages transmitted by the one or more second storage control units and store the log information on each operation in a memory,
wherein the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all the operations has been completed, and, after execution of all the operations is complete, transmits a decision processing execution request message for the operations,
the one or more second storage control units, on receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing to update the data stored in the data storage unit connected to each of the one or more second storage control units, generate log information on the decision processing, and transmit a message including that log information, and
the one or more third storage control units receive the message transmitted by the one or more second storage control units, execute the decision processing based on the log information stored in the memory, and update the backup data stored in the data storage unit connected to each of the one or more third storage control units.

(Supplementary Note 7) The data storage device according to Supplementary Note 6, wherein the one or more third storage control units, after executing the decision processing, transmit a decision completion message indicating completion of the decision processing, and the one or more second storage control units, on receiving the decision completion message from the one or more third storage control units, transmit to the first storage control unit a synchronization completion message indicating that data synchronization processing has been completed.

(Supplementary Note 8) The data storage device according to Supplementary Note 6 or 7, further comprising a fourth storage control unit that stores, in a memory, membership log information indicating the one or more second storage control units to which the operation execution request messages are transmitted, wherein the first storage control unit generates the membership log information and transmits it to the fourth storage control unit, and, on deciding to execute the decision processing, generates decision log information indicating that decision and transmits it to the fourth storage control unit, and the fourth storage control unit, on detecting a failure in the first storage control unit after receiving the decision log information, transmits a decision processing execution request message for the operations to the one or more second storage control units based on the membership log information.

(Supplementary Note 9) The data storage device according to Supplementary Note 6, 7, or 8, wherein the one or more second storage control units, on detecting a failure in the one or more third storage control units after receiving the decision processing execution request message from the first storage control unit, transmit the decision completion message to the first storage control unit, and, when the failure is recovered, the one or more third storage control units execute the decision processing based on the log information stored in the memory and update the backup data stored in the data storage unit connected to each of the one or more third storage control units.

(Supplementary Note 10) The data storage device according to any one of Supplementary Notes 6 to 9, wherein the first storage control unit, on detecting a failure in any of the one or more second storage control units after receiving all the response messages transmitted by the one or more second storage control units, transmits a decision processing execution request message for the operations to the one or more third storage control units, the one or more third storage control units, on receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing and, when the failure is recovered, transmit log information on the decision processing to the second storage control unit that has recovered from the failure, and the second storage control unit that has recovered from the failure receives the log information on the decision processing and executes the decision processing based on the received log information.

(Supplementary Note 11) A data storage method for storing data and backup data of the data in different data storage units, comprising:
an operation execution request message transmission step in which a first storage control unit transmits an operation execution request message for data;
an operation execution step in which one or more second storage control units, on receiving the operation execution request message, execute the operation, transmit a response message to the operation execution request message to the first storage control unit each time execution of an operation is completed, generate log information on the operation, and transmit a message including the log information, and one or more third storage control units receive the messages transmitted by the one or more second storage control units and store the log information on each operation in a memory;
a decision execution request message transmission step in which the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all the operations has been completed, and, after execution of all the operations is complete, transmits a decision processing execution request message for the operations; and
a decision execution step in which the one or more second storage control units, on receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing to update the data stored in the data storage unit connected to each of the one or more second storage control units, generate log information on the decision processing, and transmit a message including that log information, and the one or more third storage control units receive the message transmitted by the one or more second storage control units, execute the decision processing based on the log information stored in the memory, and update the backup data stored in the data storage unit connected to each of the one or more third storage control units.

(Supplementary Note 12) The data storage method according to Supplementary Note 11, further comprising: a decision completion message transmission step in which the one or more third storage control units, after executing the decision processing, transmit a decision completion message indicating completion of the decision processing; and a synchronization completion message transmission step in which the one or more second storage control units, on receiving the decision completion message from the one or more third storage control units, transmit to the first storage control unit a synchronization completion message indicating that data synchronization processing has been completed.

(Supplementary Note 13) The data storage method according to Supplementary Note 11 or 12, wherein, in the operation execution request message transmission step, the first storage control unit generates membership log information indicating the one or more second storage control units to which the operation execution request message is transmitted and transmits it to a fourth storage control unit, and the fourth storage control unit stores the membership log information in a memory, and, in the decision execution request message transmission step, the first storage control unit, on deciding to execute the decision processing, generates decision log information indicating that decision and transmits it to the fourth storage control unit, and the fourth storage control unit, on detecting a failure in the first storage control unit after receiving the decision log information, transmits a decision processing execution request message for the operations to the one or more second storage control units based on the membership log information.

(Supplementary Note 14) The data storage method according to Supplementary Note 11, 12, or 13, wherein, in the decision execution step, the one or more second storage control units, on detecting a failure in the one or more third storage control units after receiving the decision processing execution request message from the first storage control unit, transmit the decision completion message to the first storage control unit, and, when the failure is recovered, the one or more third storage control units execute the decision processing based on the log information stored in the memory and update the backup data stored in the data storage unit connected to each of the one or more third storage control units.

(Supplementary Note 15) The data storage method according to any one of Supplementary Notes 11 to 14, wherein, in the decision execution request message transmission step, the first storage control unit, on detecting a failure in any of the one or more second storage control units after receiving all the response messages transmitted by the one or more second storage control units, transmits a decision processing execution request message for the operations to the one or more third storage control units, and, in the decision execution step, the one or more third storage control units, on receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing and, when the failure is recovered, transmit log information on the decision processing to the second storage control unit that has recovered from the failure, and the second storage control unit that has recovered from the failure receives the log information on the decision processing and executes the decision processing based on the received log information.

(Supplementary Note 16) A data storage control program that performs control to store data and backup data of the data in different data storage units, the program causing a computer to execute:
an operation execution request message transmission procedure in which a first storage control unit transmits an operation execution request message for data;
an operation execution procedure in which one or more second storage control units, on receiving the operation execution request message, execute the operation, transmit a response message to the operation execution request message to the first storage control unit each time execution of an operation is completed, generate log information on the operation, and transmit a message including the log information, and one or more third storage control units receive the messages transmitted by the one or more second storage control units and store the log information on each operation in a memory;
a decision execution request message transmission procedure in which the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all the operations has been completed, and, after execution of all the operations is complete, transmits a decision processing execution request message for the operations; and
a decision execution procedure in which the one or more second storage control units, on receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing to update the data stored in the data storage unit connected to each of the one or more second storage control units, generate log information on the decision processing, and transmit a message including that log information, and the one or more third storage control units receive the message transmitted by the one or more second storage control units, execute the decision processing based on the log information stored in the memory, and update the backup data stored in the data storage unit connected to each of the one or more third storage control units.

As described above, the data storage control device and the data storage device according to the present invention are useful for data storage systems that suppress an increase in the overhead associated with transactions and improve the efficiency of data processing.

A diagram showing the functional configuration of the autonomous disk system according to the embodiment.
An explanatory diagram of the Fat-Btree.
A diagram showing the functional configuration of the processing elements 20a to 20d shown in FIG. 1.
A diagram showing an example of the backup disk management data 331a to 331d shown in FIG. 3.
A diagram showing an example of the ACK management data 332a to 332d shown in FIG. 3.
A sequence diagram of the asynchronous neighbor-WAL protocol.
A sequence diagram of the BA-1.5 phase commit protocol according to the embodiment.
A sequence diagram of the recovery processing performed when a failure is detected in the primary master before the decision.
A sequence diagram of the recovery processing performed when a failure is detected in the primary master after execution of the decision processing has been decided and before ACK messages have been received from all primary cohorts.
A sequence diagram of the recovery processing performed when a failure is detected in a backup cohort.
A sequence diagram of the recovery processing performed when a failure is detected in a primary cohort at or after the prepared state of the last operation.
A diagram showing the hardware configuration of a computer serving as the controllers 30a to 30d shown in FIG. 3.
A diagram showing the overhead of committed transactions for each protocol.
A diagram comparing the throughput of each protocol.
A diagram showing the throughput of the BA-1.5 phase commit protocol.
A diagram showing the throughput of the 2PC protocol.
A diagram showing the throughput of the EP protocol.

Explanation of symbols

10a to 10c Client
100 Network interface
101 Disk interface
102 CPU
103 ROM
104 RAM
105 Bus
20a to 20d Processing element
30a to 30d Controller
31a to 31d Network interface unit
32a to 32d Disk interface unit
33a to 33d Memory
330a to 330d, 104a Log
331a to 331d, 104b Backup disk management data
332a to 332d, 104c ACK management data
34a to 34d Control unit
340a to 340d Transaction execution unit
341a to 341d Failure recovery processing unit
40a to 40d Disk device
400a to 400d Directory data
401a to 401d Directory backup data
402a to 402d Primary data
403a to 403d Backup data
50 Network

Claims

A data storage control device that performs control to store data and backup data of the data in different data storage units, the device comprising:
a first storage control unit that transmits a plurality of operation execution request messages for data;
one or more second storage control units that, upon receiving an operation execution request message, execute the operation, transmit a response message to the operation execution request message to the first storage control unit each time execution of an operation is completed, generate log information related to the operation, and transmit a message including the log information; and
one or more third storage control units that receive the messages transmitted by the one or more second storage control units and store the log information related to each operation in a memory,
wherein
the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all operations has been completed, and, after execution of all operations has been completed, transmits a decision processing execution request message for the operations,
the one or more second storage control units, upon receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing, control updating of the data stored in the data storage unit connected to each of the one or more second storage control units, generate log information related to the decision processing, and transmit a message including the log information, and
the one or more third storage control units receive the message transmitted by the one or more second storage control units, execute the decision processing based on the log information stored in the memory, and control updating of the backup data stored in the data storage unit connected to each of the one or more third storage control units.
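
The message flow recited in this claim can be sketched as a small in-memory simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the class names `PrimaryMaster`, `PrimaryCohort`, and `BackupCohort` follow the roles named in the abstract (first, second, and third storage control units), while the method names, the key/value form of an operation, and the use of synchronous method calls in place of network messages are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BackupCohort:
    """Third storage control unit: keeps operation logs in memory."""
    memory_log: list = field(default_factory=list)    # buffered log information
    backup_data: dict = field(default_factory=dict)   # stands in for the backup disk

    def receive_operation_log(self, entry):
        self.memory_log.append(entry)

    def receive_decision_log(self, decision):
        # Apply the buffered operations only if the decision is "commit".
        if decision == "commit":
            self.backup_data.update(self.memory_log)
        self.memory_log.clear()


@dataclass
class PrimaryCohort:
    """Second storage control unit: executes operations and forwards logs."""
    backup: BackupCohort
    pending: list = field(default_factory=list)       # executed but undecided operations
    data: dict = field(default_factory=dict)          # stands in for the primary disk

    def execute(self, key, value):
        self.pending.append((key, value))
        self.backup.receive_operation_log((key, value))  # log for this operation
        return "ACK"                                      # response to the master

    def decide(self, decision):
        if decision == "commit":
            self.data.update(self.pending)
        self.pending.clear()
        self.backup.receive_decision_log(decision)        # log for the decision


class PrimaryMaster:
    """First storage control unit: issues operations, then one decision round."""

    def __init__(self, cohorts):
        self.cohorts = cohorts

    def run(self, operations):
        # operations: list of (cohort_index, key, value) tuples (assumed format)
        acks = [self.cohorts[i].execute(k, v) for i, k, v in operations]
        all_done = len(acks) == len(operations) and all(a == "ACK" for a in acks)
        decision = "commit" if all_done else "abort"
        for i in sorted({i for i, _, _ in operations}):
            self.cohorts[i].decide(decision)
        return decision
```

In this sketch the decision request is the only message the master sends after the last operation; there is no separate voting round, because each ACK already reports that the corresponding operation was executed.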

The data storage control device according to claim 1, wherein the one or more third storage control units, after executing the decision processing, transmit a decision completion message indicating completion of the decision processing, and the one or more second storage control units, upon receiving the decision completion message from the one or more third storage control units, transmit to the first storage control unit a synchronization completion message indicating that data synchronization processing has been completed.

The data storage control device according to claim 1, further comprising a fourth storage control unit that stores, in a memory, membership log information indicating the one or more second storage control units to which the operation execution request messages are transmitted, wherein the first storage control unit generates the membership log information and transmits it to the fourth storage control unit, and, when execution of the decision processing is determined, generates decision log information indicating that execution of the decision processing has been determined and transmits it to the fourth storage control unit, and the fourth storage control unit, upon detecting a failure of the first storage control unit after receiving the decision log information, transmits a decision processing execution request message for the operations to the one or more second storage control units based on the membership log information.
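
The takeover behavior of the fourth storage control unit can be illustrated with a short sketch. The class name `BackupMaster`, its method names, and the `send` callback are hypothetical and serve only to show how the membership log and the decision log together allow the decision request to be reissued; the case in which no decision log has been received is outside this claim and is simply left as a no-op here.

```python
class BackupMaster:
    """Fourth storage control unit (hypothetical name): mirrors the master's logs."""

    def __init__(self):
        self.membership_log = set()   # cohorts that were sent operation requests
        self.decision_log = None      # set once the master has determined a decision

    def record_member(self, cohort_id):
        self.membership_log.add(cohort_id)

    def record_decision(self, decision):
        self.decision_log = decision

    def on_master_failure(self, send):
        """Reissue the decision request after a master failure.

        `send(cohort_id, decision)` is an assumed callback that delivers a
        decision processing execution request message to one cohort.
        """
        if self.decision_log is None:
            # No decision log was received; this sketch does not model that case.
            return []
        notified = sorted(self.membership_log)
        for cohort_id in notified:
            send(cohort_id, self.decision_log)
        return notified
```

Because the decision log is recorded before any cohort is asked to decide, a master failure after that point cannot leave the cohorts without a decision request in this model.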

The data storage control device according to claim 3, wherein the one or more second storage control units, upon detecting a failure of the one or more third storage control units after receiving the decision processing execution request message from the first storage control unit, transmit the decision completion message to the first storage control unit, and the one or more third storage control units, when recovered from the failure, execute the decision processing based on the log information stored in the memory and control updating of the backup data stored in the data storage unit connected to each of the one or more third storage control units.

The data storage control device according to any one of claims 1 to 4, wherein the first storage control unit, upon detecting a failure of any one of the one or more second storage control units after receiving all of the response messages transmitted by the one or more second storage control units, transmits a decision processing execution request message for the operations to the one or more third storage control units, the one or more third storage control units, upon receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing and, when the failure is recovered, transmit log information related to the decision processing to the second storage control unit that has recovered from the failure, and the second storage control unit that has recovered from the failure receives the log information related to the decision processing and executes the decision processing based on the received log information.
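
The recovery path in this claim, in which a backup cohort decides on behalf of a failed primary cohort and later hands the decision log back, might look roughly as follows. This is a hedged sketch: the class and method names are hypothetical, and it assumes that the log information handed to the recovered unit contains both the decision and the affected operations, which the claim does not spell out.

```python
class BackupCohort:
    """Third storage control unit in the failure case (hypothetical names)."""

    def __init__(self):
        self.op_log = []           # operation logs received before the failure
        self.backup_data = {}      # stands in for the backup disk
        self.decision_log = None   # kept for the failed primary cohort

    def store_log(self, key, value):
        self.op_log.append((key, value))

    def decide(self, decision):
        # Called when the master routes the decision request to the backup.
        ops = list(self.op_log)
        if decision == "commit":
            self.backup_data.update(ops)
        self.decision_log = (decision, ops)   # assumed log layout
        self.op_log.clear()


class PrimaryCohort:
    """Second storage control unit that failed after sending all of its ACKs."""

    def __init__(self):
        self.data = {}             # stands in for the primary disk
        self.failed = False

    def recover(self, decision_log):
        # Replay the decision from the log information sent by the backup cohort.
        decision, ops = decision_log
        if decision == "commit":
            self.data.update(ops)
        self.failed = False
```

A usage sequence under these assumptions would be: the backup stores the operation logs, the primary fails, the master calls `decide("commit")` on the backup, and the primary later calls `recover(backup.decision_log)` to bring its own data in line with the backup.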

A data storage device that has a plurality of data storage units and stores data and backup data of the data in different data storage units, the device comprising:
a first storage control unit that transmits a plurality of operation execution request messages for data;
one or more second storage control units that, upon receiving an operation execution request message, execute the operation, transmit a response message to the operation execution request message to the first storage control unit each time execution of an operation is completed, generate log information related to the operation, and transmit a message including the log information; and
one or more third storage control units that receive the messages transmitted by the one or more second storage control units and store the log information related to each operation in a memory,
wherein
the first storage control unit receives the response messages transmitted by the one or more second storage control units, determines whether execution of all operations has been completed, and, after execution of all operations has been completed, transmits a decision processing execution request message for the operations,
the one or more second storage control units, upon receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing, update the data stored in the data storage unit connected to each of the one or more second storage control units, generate log information related to the decision processing, and transmit a message including the log information, and
the one or more third storage control units receive the message transmitted by the one or more second storage control units, execute the decision processing based on the log information stored in the memory, and update the backup data stored in the data storage unit connected to each of the one or more third storage control units.

The data storage device according to claim 6, wherein the one or more third storage control units, after executing the decision processing, transmit a decision completion message indicating completion of the decision processing, and the one or more second storage control units, upon receiving the decision completion message from the one or more third storage control units, transmit to the first storage control unit a synchronization completion message indicating that data synchronization processing has been completed.

The data storage device according to claim 6, further comprising a fourth storage control unit that stores, in a memory, membership log information indicating the one or more second storage control units to which the operation execution request messages are transmitted, wherein the first storage control unit generates the membership log information and transmits it to the fourth storage control unit, and, when execution of the decision processing is determined, generates decision log information indicating that execution of the decision processing has been determined and transmits it to the fourth storage control unit, and the fourth storage control unit, upon detecting a failure of the first storage control unit after receiving the decision log information, transmits a decision processing execution request message for the operations to the one or more second storage control units based on the membership log information.

The data storage device according to a preceding claim, wherein the one or more second storage control units, upon detecting a failure of the one or more third storage control units after receiving the decision processing execution request message from the first storage control unit, transmit the decision completion message to the first storage control unit, and the one or more third storage control units, when recovered from the failure, execute the decision processing based on the log information stored in the memory and update the backup data stored in the data storage unit connected to each of the one or more third storage control units.

The data storage device according to any one of claims 6 to 9, wherein the first storage control unit, upon detecting a failure of any one of the one or more second storage control units after receiving all of the response messages transmitted by the one or more second storage control units, transmits a decision processing execution request message for the operations to the one or more third storage control units, the one or more third storage control units, upon receiving the decision processing execution request message transmitted by the first storage control unit, execute the decision processing and, when the failure is recovered, transmit log information related to the decision processing to the second storage control unit that has recovered from the failure, and the second storage control unit that has recovered from the failure receives the log information related to the decision processing and executes the decision processing based on the received log information.