JP2013033441A

JP2013033441A - Database management method

Info

Publication number: JP2013033441A
Application number: JP2012013839A
Authority: JP
Inventors: Hiroyuki Yamada; 浩之山田
Original assignee: Murakumo Corp
Current assignee: Murakumo Corp
Priority date: 2011-07-05
Filing date: 2012-01-26
Publication date: 2013-02-14
Anticipated expiration: 2032-01-20
Also published as: JP5283769B2; JP2013033439A; JP2013033440A; JP5283768B2; JP5283766B2; JP2013037669A; JP5283767B2

Abstract

PROBLEM TO BE SOLVED: To provide a multi-master database management technique that can use resources efficiently.

SOLUTION: An upper master node receives lower database update information. If the transaction minimum value is equal to or greater than the cluster minimum value, it updates its own database. If the transaction minimum value is smaller than the cluster minimum value, it aborts the lower database update information. The upper master node records its update as a transaction log and distributes that log to the lower master nodes. Each lower master node updates its own database based on the received log. Any transaction whose transaction minimum value is smaller than the cluster minimum value is discarded.

Description

The present invention relates to a database management method, and more particularly to a method for managing a database in which a plurality of master nodes are hierarchically connected over a network.

Even in an append-only database such as PostgreSQL, it is desirable to physically remove (vacuum) a deleted tuple once a certain period has elapsed after its deletion, so that resources are used efficiently.

However, a transaction sees the database through a single snapshot from start to finish. Deleting a tuple in one transaction therefore does not immediately make that tuple invisible to every other transaction. In principle, the tuple cannot be physically removed until all the transactions that may still reference it have committed. Even after the tuple has been deleted, it must remain visible in the other snapshots that are still in use.

PostgreSQL therefore uses the transaction ID assigned to every transaction to decide how far tuples can be physically removed. It does so by checking which transaction IDs are already committed as seen from each transaction's snapshot. Concretely, it computes the smallest transaction ID that any snapshot still regards as in progress, and vacuums only up to that minimum ID.
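The single-node rule just described can be sketched as follows. This is a minimal sketch under a simplified model. Each snapshot is reduced to its xmin, the smallest transaction ID it still treats as in progress. Each deleted tuple records the ID of its committed deleting transaction as `xmax`. The names `Snapshot`, `DeadTuple`, `oldest_xmin` and `removable` are hypothetical. This is not PostgreSQL's code: the real horizon computation also handles in-progress transaction lists, aborted deleters and transaction ID wraparound, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int  # smallest transaction ID this snapshot still treats as in progress


@dataclass(frozen=True)
class DeadTuple:
    tuple_id: str
    xmax: int  # ID of the committed transaction that deleted the tuple (simplified)


def oldest_xmin(snapshots: list[Snapshot], next_xid: int) -> int:
    """Smallest xmin over all live snapshots, or next_xid if none is active."""
    return min((s.xmin for s in snapshots), default=next_xid)


def removable(dead: list[DeadTuple], snapshots: list[Snapshot], next_xid: int) -> list[str]:
    """A deleted tuple may be vacuumed only if its deleter precedes every
    snapshot's xmin, i.e. no live snapshot can still see the tuple."""
    horizon = oldest_xmin(snapshots, next_xid)
    return [t.tuple_id for t in dead if t.xmax < horizon]


snaps = [Snapshot(105), Snapshot(98)]
dead = [DeadTuple("a", 90), DeadTuple("b", 100)]
print(removable(dead, snaps, next_xid=110))  # ['a']
```

Tuple "b" survives because the snapshot with xmin 98 may still see it. A single long-running snapshot holds back the horizon for the whole node.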

JP 2006-293910 A

The operation described above applies to a single-node database. The present inventor found that applying it to a multi-master configuration gives rise to the following problems, which could not even be anticipated in a single-node system.

A multi-master system consists of a very large number of hierarchically organized nodes, on which a very large number of transactions run concurrently. When processing capacity varies between nodes, a delay at a single node therefore affects the throughput of the whole system.

As a result, in a multi-master database the minimum ID up to which tuples can be physically removed may never advance. The vacuum process therefore cannot be made effective across all the nodes.

One conceivable approach is therefore to forcibly set the minimum ID to a reasonably large value and perform the vacuum anyway.

For this kind of multi-master database, the present inventor has separately proposed a technique for performing updates efficiently and without conflicts (Japanese Patent Application No. 2010-239713). In that technique, a lower master node sends the upper master node a write set containing a shadow copy of its database and a heap tuple map built in its own memory. This unifies database updates across all the nodes.

A problem can arise while a lower node is generating a write set and sending it to the upper node. During that time, the upper node may send the lower node a vacuum instruction for a tuple that the write set targets for update. When the upper node later receives the write set, it may then use the write set's contents to update a tuple that should already have been physically removed.

Therefore, even in a multi-master database, transactions must be discarded without compromising the consistency of the database as a whole.

To solve the problems described above, the present invention adopts the following means.

A first aspect of the present invention is a method for managing an append-only database having, hierarchically, upper and lower master nodes capable of updating records. The method comprises the steps of: (a) in a session on any lower master node, sending to the upper master node, as a write set, a shadow copy of that lower master node's database, a heap tuple map built in its own memory, and the transaction minimum value of the snapshot referenced by the executing transaction; (b) at the upper master node, comparing the transaction minimum value in the write set received from the lower master node with the cluster minimum value held by the upper master node and, if the transaction minimum value is equal to or greater than the cluster minimum value, comparing the heap tuple map in the write set with its own database to verify whether the tuples registered as targets have been updated in that database, aborting the write set if they have been updated, and, if they have not, updating its own database using the shadow copy and generating a record of this update as a transaction log; (c) aborting the write set if the transaction minimum value is smaller than the cluster minimum value; (d) distributing the transaction log to the lower master nodes, including the lower master node that sent the write set; (e) at each lower master node, updating its own database based on the received transaction log; and (f) notifying the lower master nodes of the cluster minimum value held by the upper master node, causing them to discard any transaction whose transaction minimum value is smaller than the cluster minimum value.
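Steps (b) and (c) of the first aspect, as performed at the upper master node, can be sketched as follows. This is a minimal sketch under simplifying assumptions that do not come from the specification. The database is a dictionary mapping tuple IDs to (version, value) pairs. The heap tuple map records the version of each target tuple that the lower node read. "Updated in the meantime" is detected as a version mismatch. The class, field and method names are hypothetical, and distribution of the log (step (d)) is only indicated by appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class WriteSet:
    shadow_copy: dict[str, str]     # tuple_id -> new value written by the lower node
    heap_tuple_map: dict[str, int]  # tuple_id -> version the lower node read before updating
    txn_min: int                    # transaction minimum value of the snapshot used


@dataclass
class UpperMaster:
    cluster_min: int
    rows: dict[str, tuple[int, str]] = field(default_factory=dict)  # tuple_id -> (version, value)
    log: list[dict[str, tuple[int, str]]] = field(default_factory=list)

    def apply_write_set(self, ws: WriteSet) -> str:
        # Step (c): the snapshot predates the cluster minimum, so tuples it relied
        # on may already have been vacuumed; abort the write set.
        if ws.txn_min < self.cluster_min:
            return "abort: below cluster minimum"
        # Step (b): verify that no target tuple changed since the lower node read it.
        for tid, seen_version in ws.heap_tuple_map.items():
            current = self.rows.get(tid)
            if current is None or current[0] != seen_version:
                return "abort: conflict"
        # Apply the shadow copy and record the result as one transaction-log entry.
        entry: dict[str, tuple[int, str]] = {}
        for tid, value in ws.shadow_copy.items():
            version = self.rows[tid][0] + 1 if tid in self.rows else 1
            self.rows[tid] = (version, value)
            entry[tid] = (version, value)
        self.log.append(entry)  # this entry would then be distributed to the lower nodes
        return "commit"


upper = UpperMaster(cluster_min=100, rows={"t1": (1, "old")})
ws = WriteSet(shadow_copy={"t1": "new"}, heap_tuple_map={"t1": 1}, txn_min=105)
print(upper.apply_write_set(ws))     # commit
print(upper.apply_write_set(ws))     # abort: conflict
stale = WriteSet(shadow_copy={"t1": "x"}, heap_tuple_map={"t1": 2}, txn_min=90)
print(upper.apply_write_set(stale))  # abort: below cluster minimum
```

The cluster-minimum check runs before the conflict check. A write set built on a snapshot older than the cluster minimum is therefore rejected even if its target tuples look unchanged. This is the case where those tuples may already have been vacuumed.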

A second aspect of the present invention is the method according to the first aspect, wherein the lower master node compares the transaction minimum values of the plurality of snapshots in which its running transactions are recorded in time series, takes the smallest value as its node minimum value, and notifies the upper master node of this node minimum value as its tree minimum value, and the upper master node selectively determines and holds the cluster minimum value from its own node minimum value and the tree minimum values notified by the one or more lower master nodes.
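The computation of the node minimum and the cluster minimum in the second aspect can be sketched as follows. This is a minimal sketch. The specification says only that the cluster minimum is determined "selectively". Elsewhere, in the sixteenth aspect, it says the value lies "within the range" of the node minimum values. The default here is the smallest reported value. The optional `forced` argument is a hypothetical policy knob reflecting one reading of that wording: a larger value may be requested, but it is clamped to the range of reported values. Function names are hypothetical.

```python
from typing import Optional


def node_minimum(snapshot_txn_mins: list[int]) -> int:
    """Node minimum value: the smallest transaction minimum value among the
    snapshots recorded for the transactions running on this node."""
    return min(snapshot_txn_mins)


def cluster_minimum(own_node_min: int, tree_mins: list[int],
                    forced: Optional[int] = None) -> int:
    """Choose the cluster minimum from the upper node's own node minimum and
    the tree minimums reported by its lower nodes.

    Default: the smallest value, so no running snapshot is invalidated.
    `forced` (hypothetical) requests a larger value; it is clamped to the range
    of the reported values. Transactions below the result are later discarded.
    """
    candidates = [own_node_min, *tree_mins]
    lowest, highest = min(candidates), max(candidates)
    if forced is None:
        return lowest
    return max(lowest, min(forced, highest))


print(node_minimum([120, 103, 110]))                 # 103
print(cluster_minimum(115, [103, 108]))              # 103
print(cluster_minimum(115, [103, 108], forced=110))  # 110
```

Choosing a value above the true minimum lets vacuuming proceed further. The cost is that transactions below the chosen value must then be aborted or discarded, which is exactly what the check in the first aspect enforces.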

A third aspect of the present invention is the method according to the first aspect, further having an intermediate master node at a level between the upper master node and the lower master node. The lower master node compares the transaction minimum values of the plurality of snapshots in which its running transactions are recorded in time series, takes the smallest value as its node minimum value, and notifies the intermediate master node of this node minimum value as its tree minimum value. The intermediate master node compares its own node minimum value with the tree minimum values notified by the one or more lower master nodes, and notifies the upper master node of the smallest value as the tree minimum value of that intermediate master node. The upper master node selectively determines and holds the cluster minimum value from its own node minimum value and the tree minimum values notified by the one or more intermediate master nodes.

A fourth aspect of the present invention is the method according to the third aspect, wherein the intermediate master nodes themselves form a tree structure of two or more levels.
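The upward propagation of tree minimum values in the third and fourth aspects can be sketched as a recursive aggregation. This is a minimal sketch. The `MasterNode` class and its methods are hypothetical. A node with no running transactions is assumed to report no constraint (`None`), a case the specification does not address. The node labels are borrowed from FIG. 1 for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MasterNode:
    name: str
    snapshot_txn_mins: list[int]  # transaction minimums of snapshots running here
    children: list["MasterNode"] = field(default_factory=list)

    def node_min(self) -> Optional[int]:
        # None: no running transaction on this node constrains the minimum.
        return min(self.snapshot_txn_mins, default=None)

    def tree_min(self) -> Optional[int]:
        """Smallest of this node's own node minimum and the tree minimums
        reported by its children. The recursion handles any number of
        intermediate levels."""
        values = [self.node_min(), *(child.tree_min() for child in self.children)]
        present = [v for v in values if v is not None]
        return min(present, default=None)


lower1 = MasterNode("MS301", [140, 132])
lower2 = MasterNode("MS302", [])
mid1 = MasterNode("MS201", [150], [lower1, lower2])
mid2 = MasterNode("MS202", [128])
top = MasterNode("MS101", [160], [mid1, mid2])
print(mid1.tree_min(), top.tree_min())  # 132 128
```

Each node reports only one number upward, so the message size does not grow with the depth or width of the tree.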

A fifth aspect of the present invention is the method according to the first aspect, wherein the upper master node notifies the lower master nodes of the cluster minimum value it holds asynchronously with the distribution of the transaction log to the lower master nodes.

With this method, notifying the lower master nodes of the cluster minimum value asynchronously with the replication management process allows transaction IDs to be discarded efficiently.

A sixth aspect of the present invention is the method according to the first aspect, wherein the upper master node notifies the lower master nodes of the cluster minimum value it holds by including the cluster minimum value in the transaction log.

With this method, carrying the cluster minimum value within the replication management process enforces the order of notification, so transaction IDs can be discarded without inconsistency.
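The ordering guarantee of the sixth aspect can be sketched as follows. Because the cluster minimum travels inside the transaction log, a lower node adopts it only after applying every change logged before it. This is a minimal sketch. The record layout and class names are hypothetical. The rule that the local cluster minimum only ever increases is an assumption not stated in the specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogRecord:
    changes: dict[str, str]            # tuple_id -> new value
    cluster_min: Optional[int] = None  # cluster minimum carried inside the log


@dataclass
class LowerMaster:
    rows: dict[str, str] = field(default_factory=dict)
    cluster_min: int = 0
    advanced_at: list[int] = field(default_factory=list)  # log positions where it advanced

    def replay(self, log: list[LogRecord]) -> None:
        # Records are applied strictly in log order, so a new cluster minimum
        # takes effect only after every change logged before it is applied.
        for position, record in enumerate(log):
            self.rows.update(record.changes)
            # Assumption: the local cluster minimum never moves backwards.
            if record.cluster_min is not None and record.cluster_min > self.cluster_min:
                self.cluster_min = record.cluster_min
                self.advanced_at.append(position)


node = LowerMaster()
node.replay([
    LogRecord({"t1": "a"}),
    LogRecord({"t2": "b"}, cluster_min=120),
    LogRecord({"t1": "c"}),
])
print(node.rows, node.cluster_min, node.advanced_at)
# {'t1': 'c', 't2': 'b'} 120 [1]
```

In the asynchronous variant of the fifth aspect, the cluster minimum would arrive on a separate channel. It could then overtake log records that were sent earlier, which is the trade-off between efficiency and enforced ordering that the two aspects describe.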

A seventh aspect of the present invention is the method according to the third or fourth aspect, wherein the write set is generated at an intermediate master node instead of a lower master node and contains at least the transaction minimum value of the snapshot referenced by the transaction running on that intermediate master node.

An eighth aspect of the present invention is a method for managing an append-only database in an upper master node that has lower master nodes arranged hierarchically below it. The method comprises the steps of: (a) causing, in a session on any lower master node, a shadow copy of that lower master node's database, a heap tuple map built in the lower master node's memory, and the transaction minimum value of the snapshot referenced by the executing transaction to be sent to the upper master node as a write set; (b) comparing the transaction minimum value in the received write set with the cluster minimum value held by the upper master node and, if the transaction minimum value is equal to or greater than the cluster minimum value, comparing the heap tuple map in the write set with the upper master node's own database to verify whether the tuples registered as targets have been updated in that database, aborting the write set if they have been updated, and updating its own database using the shadow copy if they have not; (c) aborting the write set if the transaction minimum value is smaller than the cluster minimum value; and (d) notifying the lower master nodes of the cluster minimum value held by the upper master node, causing them to discard any transaction whose transaction minimum value is smaller than that value.

A ninth aspect of the present invention is a database management method for a lower master node in an append-only database having, hierarchically, upper and lower master nodes capable of updating records. The method comprises the steps of: (a) when an update instruction for the database arises at any lower master node, generating, by the database processing unit of that lower master node, a write set consisting of a shadow copy of the database developed in its own memory, a heap tuple map, and the transaction minimum value of the snapshot referenced by the executing transaction, and sending the write set to the upper master node, so that the upper master node compares the transaction minimum value in the received write set with the cluster minimum value it holds, updates its own database using the shadow copy when the transaction minimum value is equal to or greater than the cluster minimum value, generates a record of this update as a transaction log, and distributes the transaction log to the lower master nodes, including the lower master node that sent the write set; (b) upon receiving the transaction log, updating, by the transaction log processing unit of the lower master node, its own database based on the transaction log; and (c) at the lower master node, receiving the cluster minimum value held by the upper master node and discarding any transaction whose transaction minimum value is smaller than that value.
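Step (c) of the ninth aspect, the discard performed at a lower master node when it receives the cluster minimum value, can be sketched as a simple partition of the running transactions. This is a minimal sketch. `RunningTxn` and `discard_below` are hypothetical names, and how a discarded transaction is actually rolled back is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningTxn:
    xid: int
    txn_min: int  # transaction minimum value of the snapshot this transaction uses


def discard_below(running: list[RunningTxn],
                  cluster_min: int) -> tuple[list[RunningTxn], list[RunningTxn]]:
    """Return (kept, discarded). A transaction whose snapshot minimum is below
    the cluster minimum may still reference tuples the cluster is allowed to
    vacuum, so it is discarded."""
    kept = [t for t in running if t.txn_min >= cluster_min]
    discarded = [t for t in running if t.txn_min < cluster_min]
    return kept, discarded


running = [RunningTxn(201, 95), RunningTxn(202, 130), RunningTxn(203, 120)]
kept, discarded = discard_below(running, cluster_min=120)
print([t.xid for t in kept], [t.xid for t in discarded])  # [202, 203] [201]
```

The comparison uses the same boundary as the upper node's write-set check. A transaction whose minimum equals the cluster minimum is kept on both sides.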

A tenth aspect of the present invention is the method according to the eighth or ninth aspect, wherein the lower master node compares the transaction minimum values of the plurality of snapshots in which its running transactions are recorded in time series, takes the smallest value as its node minimum value, and notifies the upper master node of this node minimum value as its tree minimum value, and the upper master node selectively determines and holds the cluster minimum value from its own node minimum value and the tree minimum values notified by the one or more lower master nodes.

An eleventh aspect of the present invention is the method according to the eighth or ninth aspect, further having an intermediate master node at a level between the upper master node and the lower master node. The lower master node compares the transaction minimum values of the plurality of snapshots in which its running transactions are recorded in time series, takes the smallest value as its node minimum value, and notifies the intermediate master node of this node minimum value as its tree minimum value. The intermediate master node compares its own node minimum value with the tree minimum values notified by the one or more lower master nodes, and notifies the upper master node of the smallest value as the tree minimum value of that intermediate master node. The upper master node selectively determines and holds the cluster minimum value from its own node minimum value and the tree minimum values notified by the one or more intermediate master nodes.

A twelfth aspect of the present invention is the method according to the eleventh aspect, wherein the intermediate master nodes themselves form a tree structure of two or more levels.

A thirteenth aspect of the present invention is the method according to the eighth aspect, wherein the upper master node notifies the lower master nodes of the cluster minimum value it holds asynchronously with the distribution of the transaction log to the lower master nodes.

A fourteenth aspect of the present invention is the method according to the eighth aspect, wherein the upper master node notifies the lower master nodes of the cluster minimum value it holds by including the cluster minimum value in the transaction log.

A fifteenth aspect of the present invention is the method according to the eleventh or twelfth aspect, wherein the write set is generated at an intermediate master node instead of a lower master node and contains at least the transaction minimum value of the snapshot referenced by the transaction running on that intermediate master node.

A sixteenth aspect of the present invention is a method for managing an append-only database having, hierarchically, an upper master node and one or more lower master nodes capable of updates. The method comprises the steps of: (a) at each lower master node, comparing the transaction minimum values of the snapshots referenced by the running transactions and sending the smallest value, the node minimum value, to the upper master node; (b) at the upper master node that receives the node minimum values from the lower master nodes, reading its own node minimum value, comparing it with the received node minimum values, and determining and updating the cluster minimum value it holds within the range of these node minimum values; and (c) notifying each lower master node of the cluster minimum value updated at the upper master node, causing it to discard any transaction whose transaction minimum value is smaller than that value.

A seventeenth aspect of the present invention is the method according to the sixteenth aspect, wherein the hierarchy consists of two levels: the upper master node and the lower master nodes.

An eighteenth aspect of the present invention is the method according to the sixteenth aspect, wherein the hierarchy has at least three levels: the upper master node, intermediate master nodes in one or more levels, and the lower master nodes. The intermediate master nodes also perform the step of sending their own node minimum values to the upper master node. The upper master node performs the step of receiving the node minimum values from the lower and intermediate master nodes, reading its own node minimum value, comparing it with the received node minimum values, and determining and updating the cluster minimum value it holds within the range of these node minimum values.

A nineteenth aspect of the present invention is a method for managing an append-only database having, hierarchically, upper and lower master nodes capable of updating records. The method comprises the steps of: (a) in a session on any lower master node, sending to the upper master node, as a write set, a shadow copy of that lower master node's database, a heap tuple map built in its own memory, the transaction minimum value of the snapshot referenced by the executing transaction, and the node minimum value, which is the smallest of the transaction minimum values within the node; (b) at the upper master node, comparing the node minimum values read from the write sets with the node minimum value it holds itself, and determining and updating the cluster minimum value held by the upper master node to a value within the range of these node minimum values; (c) at the upper master node, comparing the transaction minimum value in the write set received from the lower master node with the cluster minimum value held by the upper master node and, if the transaction minimum value is equal to or greater than the cluster minimum value, comparing the heap tuple map in the write set with its own database to verify whether the tuples registered as targets have been updated in that database, aborting the write set if they have been updated, and, if they have not, updating its own database using the shadow copy and generating a record of this update as a transaction log; (d) aborting the write set if the transaction minimum value is smaller than the cluster minimum value; (e) distributing the transaction log to the lower master nodes, including the lower master node that sent the write set; (f) at each lower master node, updating its own database based on the received transaction log; and (g) notifying the lower master nodes of the cluster minimum value held by the upper master node, causing them to discard any transaction whose transaction minimum value is smaller than that value.

A twentieth aspect of the present invention is the method according to the nineteenth aspect, having intermediate master nodes in one or more levels between the upper master node and the lower master nodes. The method further comprises the step of, in a session on any intermediate master node, sending to the upper master node, as a write set, a shadow copy of that intermediate master node's database, a heap tuple map built in its own memory, the transaction minimum value of the snapshot referenced by the executing transaction, and the node minimum value, which is the smallest of the transaction minimum values within the node.

A twenty-first aspect of the present invention is a method for managing an append-only database having, hierarchically, upper and lower master nodes capable of updates. The method comprises the steps of: (a) in a session on any lower master node, sending to the upper master node lower database update information that includes the transaction minimum value of the snapshot referenced by the executing transaction; (b) at the upper master node that receives the lower database update information, comparing the transaction minimum value with the cluster minimum value held by the upper master node and, if the transaction minimum value is equal to or greater than the cluster minimum value, updating the upper master node's database based on the lower database update information; (c) aborting the lower database update information if the transaction minimum value is smaller than the cluster minimum value; (d) generating a record of the upper master node's update as a transaction log; (e) distributing the transaction log to the lower master nodes, including the lower master node that sent the information; (f) at each lower master node, updating its own database based on the received transaction log; and (g) notifying the lower master nodes of the cluster minimum value held by the upper master node, causing them to discard any transaction whose transaction minimum value is smaller than that value.

A twenty-second aspect of the present invention is the method according to the twenty-first aspect, wherein the lower database update information is a write set consisting of a shadow copy of the lower master node's database, a heap tuple map built in its own memory, and the transaction minimum value of the snapshot referenced by the executing transaction. The method comprises the steps of: when the upper master node receives the write set, comparing the transaction minimum value with the cluster minimum value held by the upper master node and, if the transaction minimum value is equal to or greater than the cluster minimum value, comparing the heap tuple map in the write set with its own database to verify whether the tuples registered as targets have been updated in that database, aborting the write set if they have been updated, and updating its own database using the shadow copy if they have not; and aborting the write set if the transaction minimum value is smaller than the cluster minimum value.

A twenty-third aspect of the present invention is the method according to the twenty-first or twenty-second aspect, further having intermediate master nodes in one or more levels between the upper and lower master nodes. The method comprises the steps of: in a session on any intermediate master node, sending to the upper master node intermediate database update information that includes the transaction minimum value associated with the update to that intermediate master node's database; and, at the upper master node that receives the intermediate database update information, comparing the transaction minimum value with the cluster minimum value held by the upper master node and, if the transaction minimum value is equal to or greater than the cluster minimum value, updating the upper master node's database based on the intermediate database update information.

According to the present invention, even in a multi-master database, tuples can be completely erased across the entire tree at a higher minimum-ID level without compromising the consistency of the database as a whole.

FIG. 1 is an explanatory diagram showing the multi-node database structure of the present invention.
FIG. 2 is a block diagram showing the configuration of the database controller.
FIG. 3 is a diagram showing the contents of a write set.
FIG. 4 is an explanatory diagram showing the processing of a write set in the upper master node.
FIG. 5 is an explanatory diagram showing the structure of transaction log data.
FIG. 6 is a diagram showing the relationship among the multi-node database tree structure, the transaction minimum value, the tree minimum value, and the node minimum value.
FIG. 7 is an explanatory diagram of updating the cluster minimum value and of discarding transactions using that cluster minimum value.
FIG. 8 is a diagram showing the contents of a snapshot.
FIG. 9 is a diagram showing a modification of the write set.

Embodiments of the present invention will now be described with reference to the drawings.

FIG. 1 shows the structure of the hierarchical master nodes of this embodiment. As shown in the figure, the node configuration has middle master nodes (MS201, MS202, ..., MS20n) and lower master nodes (MS301, MS302, ..., MS30n) arranged hierarchically under the upper master node (MS101). Each node (information processing apparatus) has its own database. The upper master node (MS101) has a slave, and the other, lower master nodes may also have slaves. In such a master/slave configuration, the update management technique described in the applicant's PCT/JP2010/054311 (an unpublished prior application by the applicant) can be applied to database updates between master and slave.

FIG. 2 is a functional block diagram of a lower master node (MS201); the upper master node (MS101) has the same functions.
As shown in the figure, when a database update command is input from a client (CL), the database processing unit (11b) generates, in the back-end memory (BEM) built on the memory (MM), a write set representing the lower database update information. As shown in FIG. 4, this write set consists of a heap tuple map (HTM), a shadow copy (SC), and a transaction minimum value (Xmin). Here, assume that an update command has been input that deletes (DELETE) row number 4 of the master database (11a) and rewrites (UPDATE) row number 5 to a new value (sc1).

At this time, the database processing unit (11b) refers to the master database (11a) but does not write to it directly; instead, it sends the write set generated in the back-end memory (BEM) to the higher master node through the communication module (11d).
This processing is the same in the upper master node (MS101) and in the lower master nodes (MS201, MS202, ..., MS20n and MS301, MS302, ..., MS30n).

FIG. 8 shows the relationship between transactions and their snapshots in each node (upper, middle, or lower node). Each node executes multiple transactions in parallel and holds, in time series, multiple snapshots, each indicating the execution state of the transactions at a particular point in time.

In the figure, transactions with IDs 2, 4, 5, 6, and 9 are running on this node, transactions 1, 3, 7, 8, and 10 have committed, and transaction 11 has not yet started. The snapshot taken at this point records the largest ID among running or committed transactions as the transaction maximum value (here, Xmax = 10) and the smallest ID among running transactions as the transaction minimum value (here, Xmin = 2).

Such snapshots are recorded in the back-end memory (BEM) described above.
The smallest of the transaction minimum values across these snapshots is the node minimum value. In the figure, the transaction minimum value of snapshot 701 is 2 (Xmin = 2) and that of snapshot 702 is 4 (Xmin = 4), so the node minimum value is 2 (Nmin = 2). If the lower master node has no nodes below it, this node minimum value (Nmin = 2) is also its tree minimum value (Tmin = 2).
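The snapshot and node-minimum rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Snapshot` class and the function names are hypothetical, and transaction IDs are plain integers. It reproduces the FIG. 8 values (Xmin = 2, Xmax = 10) and the node minimum Nmin = 2 obtained from snapshots 701 and 702.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int  # transaction minimum: smallest ID still running
    xmax: int  # transaction maximum: largest ID running or committed


def take_snapshot(running: set[int], committed: set[int]) -> Snapshot:
    """Build a snapshot as FIG. 8 describes it."""
    if not running:
        # Simplification: the text does not cover a node with no running transactions.
        raise ValueError("sketch assumes at least one running transaction")
    return Snapshot(xmin=min(running), xmax=max(running | committed))


def node_minimum(snapshots: list[Snapshot]) -> int:
    """Nmin: the smallest Xmin among the snapshots held on one node."""
    return min(s.xmin for s in snapshots)


s701 = take_snapshot(running={2, 4, 5, 6, 9}, committed={1, 3, 7, 8, 10})
s702 = Snapshot(xmin=4, xmax=10)  # Xmax is hypothetical; the text gives only Xmin = 4
print(s701)                        # Snapshot(xmin=2, xmax=10)
print(node_minimum([s701, s702]))  # 2
```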

The node minimum value (Nmin) generated by a lower master node (for example, MS301), which for such a node is also its tree minimum value (Tmin), is reported from time to time to the middle node above it (for example, MS201), asynchronously with the write sets described later. The middle master node (MS201) compares the tree minimum values (Tmin) received from each of its lower master nodes (MS301 to MS30n) with the node minimum value (Nmin) in its own snapshots and adopts the smallest as its own tree minimum value (Tmin).

Next, the tree minimum value (Tmin) of each middle master node (MS201 to MS20n) is reported to the upper master node (MS101).
When the upper master node (MS101) receives the tree minimum values (Tmin) from the middle master nodes (MS201 to MS20n), it compares them with the node minimum value (Nmin) in its own snapshots and updates its own tree minimum value (Tmin) to one of these minimum values or to a value between them. It then compares this tree minimum value (Tmin) with the current cluster minimum value (Cmin) and determines a new cluster minimum value (Cmin) by taking a value between the two.

Next, with reference to FIG. 6, the following describes how the node minimum values (Nmin), i.e. the tree minimum values (Tmin), of the lower master nodes F and G, the tree minimum values (Tmin) of the middle master nodes B, C, D, and E, and the tree minimum value (Tmin) and cluster minimum value (Cmin) of the upper master node A are updated.

Suppose that comparing the transaction minimum values (Xmin) within lower master node F yields a node minimum value of 3 (Nmin = 3), and that the node minimum value of lower master node G is 4 (Nmin = 4). Because F and G have no lower layer beneath them, these node minimum values become their respective tree minimum values (Tmin).

These tree minimum values (Tmin = 3 and Tmin = 4) are each reported to middle master node D.
Middle master node D has a node minimum value of 4. These values are compared, and the smallest (here, 3) is set as the tree minimum value of middle node D (Tmin = 3).

In middle master node E, on the other hand, the node minimum value (Nmin) is 5, and because this node has no lower master nodes, its node minimum value (Nmin = 5) is set as its tree minimum value (Tmin = 5).

Middle master node C receives the tree minimum value from middle master node D (Tmin = 3) and that from middle master node E (Tmin = 5). Node C's own node minimum value (Nmin) is 4; comparing these, it adopts the smallest value, 3, as its own tree minimum value (Tmin = 3). This tree minimum value is then reported to the upper master node A.

The figure also shows a middle master node B at the same level as middle master node C. Because B has no master nodes below it, its own node minimum value (Nmin = 6) is reported unchanged to the upper master node A as its tree minimum value (Tmin = 6).

Upper master node A receives the tree minimum value from middle master node B (Tmin = 6) and that from middle master node C (Tmin = 3), compares them with its own node minimum value (Nmin = 7), and adopts the smallest as its tree minimum value (Tmin = 3).
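The bottom-up propagation walked through for FIG. 6 amounts to a recursive minimum over the node hierarchy. The sketch below assumes a simple in-memory tree; the `MasterNode` class and `tree_minimum` function are hypothetical, and the asynchronous notifications between nodes are replaced by a recursive call. It reproduces every tree minimum value in the walk-through above.

```python
from dataclasses import dataclass, field


@dataclass
class MasterNode:
    name: str
    nmin: int  # node minimum: smallest Xmin among this node's snapshots
    children: list["MasterNode"] = field(default_factory=list)


def tree_minimum(node: MasterNode, report: dict[str, int]) -> int:
    """Tmin = min(own Nmin, Tmin of every child); a leaf's Tmin is its Nmin.

    Each node's Tmin is recorded in `report`, standing in for the
    notification sent from each node to the node above it.
    """
    tmin = min([node.nmin] + [tree_minimum(c, report) for c in node.children])
    report[node.name] = tmin
    return tmin


# Tree shape and Nmin values from FIG. 6
f, g = MasterNode("F", 3), MasterNode("G", 4)
d = MasterNode("D", 4, [f, g])
e = MasterNode("E", 5)
c = MasterNode("C", 4, [d, e])
b = MasterNode("B", 6)
a = MasterNode("A", 7, [b, c])

tmins: dict[str, int] = {}
print(tree_minimum(a, tmins))  # 3
print(tmins)
```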

Master node A then sets a value between its own node minimum value (Nmin = 7) and the updated tree minimum value (Tmin = 3) as the cluster minimum value (Cmin).
Choosing this cluster minimum value (Cmin) involves a trade-off. If it is set to a smaller value (for example, Cmin = 3), then, because the transaction minimum values on the snapshots of all nodes are 3 or greater, write sets generated in the lower and middle master nodes are less likely to be aborted when they reach the upper master node; fewer write sets are wasted, and the load on each node is reduced. On the other hand, the threshold for complete erasure is then low, which makes it harder to run the whole cluster efficiently through complete erasure.

Conversely, if the cluster minimum value (Cmin) is set to a larger value (for example, Cmin = 7), complete erasure becomes more efficient, but write sets generated in the middle and lower master nodes are more likely to be aborted when they arrive at the upper master node, which increases the write-set load on the middle and lower master nodes.

One way to set the cluster minimum value (Cmin) is to define in advance a prescribed difference between the node minimum value (Nmin) and the tree minimum value (Tmin) and, whenever that difference is exceeded, to force the reference value back to within the prescribed difference.
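The trade-off in choosing Cmin can be made concrete with a small policy function. Both knobs are hypothetical: `bias` interpolates between the safe bound (Tmin) and the aggressive bound (Nmin), and `max_lag` is only one possible reading of the prescribed-difference rule, namely that Cmin is never allowed to trail Nmin by more than a fixed amount.

```python
from typing import Optional


def choose_cluster_minimum(tmin: int, nmin: int, bias: float = 0.0,
                           max_lag: Optional[int] = None) -> int:
    """Pick Cmin in [Tmin, Nmin] on the upper master node.

    bias (hypothetical knob): 0.0 -> Cmin = Tmin (fewest aborted write sets,
    least erasure); 1.0 -> Cmin = Nmin (most erasure, more aborts).
    max_lag (hypothetical): if Cmin would trail Nmin by more than max_lag,
    force it up to Nmin - max_lag.
    """
    if tmin > nmin:
        raise ValueError("Tmin already includes the node's own Nmin, so Tmin <= Nmin")
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must be between 0 and 1")
    cmin = tmin + int(bias * (nmin - tmin))
    if max_lag is not None:
        cmin = min(nmin, max(cmin, nmin - max_lag))
    return cmin


print(choose_cluster_minimum(3, 7))             # 3
print(choose_cluster_minimum(3, 7, bias=0.75))  # 6
print(choose_cluster_minimum(3, 7, max_lag=2))  # 5
```

With the FIG. 6 values (Tmin = 3, Nmin = 7), `bias=0.75` gives the Cmin = 6 used in the FIG. 7 example.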

Once the cluster minimum value (Cmin) has been set in this way, tuples are completely erased on the basis of it: in the upper master node, tuples deleted by a transaction whose transaction ID is smaller than the cluster minimum value (Cmin) can be completely erased. Tuples completely erased in the upper master node can then also be safely erased in the middle and lower master nodes, thanks to the transaction discarding and write-set abort processing described below.
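The erasure rule itself is a filter on the ID of the deleting transaction. In this sketch (hypothetical `HeapTuple` and `vacuum` names), a tuple is physically removed when the transaction that deleted it has an ID below Cmin. As an added assumption, the deleting transaction is treated as committed; a real engine would also have to verify its commit status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeapTuple:
    row: int
    value: str
    deleted_by: Optional[int] = None  # ID of the deleting transaction, if any


def vacuum(table: list[HeapTuple], cmin: int) -> list[HeapTuple]:
    """Keep live tuples and tuples deleted by a transaction with ID >= Cmin."""
    # Assumption: the deleting transaction is committed (not checked here).
    return [t for t in table if t.deleted_by is None or t.deleted_by >= cmin]


table = [HeapTuple(3, "c"), HeapTuple(4, "d", deleted_by=2),
         HeapTuple(5, "e", deleted_by=8), HeapTuple(7, "sc1")]
print([t.row for t in vacuum(table, cmin=6)])  # [3, 5, 7]
```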

In the above description, the middle master nodes B to E and the lower master nodes F and G compare node minimum values and update tree minimum values as the values are passed up to the upper master node A. The invention is not limited to this, however: each middle master node B to E and each lower master node F, G may instead report its own node minimum value (Nmin) directly to the upper master node A. Upper master node A compares the node minimum values (Nmin) reported in this way with its own node minimum value (Nmin), determines a new value within that range, and adopts it as the cluster minimum value (Cmin).

Having each middle and lower master node B to G report its node minimum value (Nmin) directly to the upper master node A simplifies the notification system, because the per-layer processing that updates tree minimum values (Tmin) on the way up to A is no longer needed.
Furthermore, when each middle master node B to E or lower master node F, G generates a write set, it may register its node minimum value (Nmin) in the write set together with the heap tuple map (HTM), shadow copy (SC), and transaction minimum value (Xmin) (see FIG. 9). Carrying the node minimum value (Nmin) to the upper master node A inside the write set in this way removes the need for any notification mechanism toward A other than the write sets themselves, which further simplifies the system.

In this case, upper master node A may read the node minimum value (Nmin) from each write set sent directly by the middle master nodes B to E and the lower master nodes F and G, compare it with its own node minimum value (Nmin), determine a new value within that range, and adopt it as the cluster minimum value (Cmin).
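In the variant where each node piggybacks its Nmin on its write sets (FIG. 9), the upper master node only needs the most recent Nmin from each sender; it then picks Cmin between that overall minimum and its own Nmin. A minimal sketch, assuming write sets arrive in the order they were sent; `ReportedWriteSet` and `flat_tree_minimum` are hypothetical names, the HTM and shadow copy are omitted, and the `xmin` values in the example are made up. With the FIG. 6 node minimums it yields the same value, 3, as the hierarchical propagation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedWriteSet:
    origin: str  # name of the sending master node
    xmin: int    # transaction minimum of the write set
    nmin: int    # node minimum piggybacked on the write set (FIG. 9 variant)
    # HTM and shadow copy omitted for brevity


def flat_tree_minimum(own_nmin: int, write_sets: list[ReportedWriteSet]) -> int:
    """Upper-node minimum when every node reports its Nmin directly."""
    latest: dict[str, int] = {}
    for ws in write_sets:  # later reports from the same node replace earlier ones
        latest[ws.origin] = ws.nmin
    return min([own_nmin, *latest.values()])


fig6 = [("B", 6), ("C", 4), ("D", 4), ("E", 5), ("F", 3), ("G", 4)]
reports = [ReportedWriteSet(name, xmin=n, nmin=n) for name, n in fig6]
print(flat_tree_minimum(7, reports))  # 3
print(flat_tree_minimum(7, reports + [ReportedWriteSet("F", xmin=5, nmin=5)]))  # 4
```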

After upper master node A has updated the cluster minimum value (Cmin) as described above, the same processing as before is performed. That is, upper master node A compares the transaction minimum value (Xmin) in a write set received from a middle master node B to E or a lower master node F, G with the cluster minimum value (Cmin) it holds. If the transaction minimum value (Xmin) is equal to or greater than the cluster minimum value (Cmin), A compares the heap tuple map (HTM) in the write set with its own database to check whether the tuples registered as targets have been updated in the database. If they have, it aborts the write set; if they have not, it updates its own database using the shadow copy (SC) and records this update as a transaction log. If, on the other hand, the transaction minimum value (Xmin) is smaller than the cluster minimum value (Cmin), the write set is aborted.

The cluster minimum value set and updated by upper master node A (for example, Cmin = 6) is reported to the middle master nodes B, C, D, and E and the lower master nodes F and G, as shown in FIG. 7. Each node that receives this cluster minimum value (Cmin = 6) discards every transaction that references a snapshot whose transaction minimum value is smaller than 6. As a result, the transaction minimum values (Xmin), the node minimum values (Nmin), and the tree minimum values (Tmin) derived from them are 6 or greater at every node. Therefore, once the cluster minimum value (Cmin) has been reported, no transaction remains that references a completely erased tuple.
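The discard step on receipt of Cmin can be sketched as partitioning a node's active transactions by the Xmin of the snapshot each one uses. The `Transaction` class and `apply_cluster_minimum` function are hypothetical, and the transaction IDs in the example are made up; the point is that every surviving transaction has a snapshot Xmin of at least Cmin.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    xid: int
    snapshot_xmin: int  # Xmin of the snapshot this transaction reads from


def apply_cluster_minimum(active: list[Transaction],
                          cmin: int) -> tuple[list[Transaction], list[Transaction]]:
    """Split active transactions into (kept, discarded) for a received Cmin."""
    kept = [t for t in active if t.snapshot_xmin >= cmin]
    discarded = [t for t in active if t.snapshot_xmin < cmin]
    return kept, discarded


active = [Transaction(11, 3), Transaction(12, 6), Transaction(13, 9)]
kept, discarded = apply_cluster_minimum(active, cmin=6)
print([t.xid for t in kept], [t.xid for t in discarded])  # [12, 13] [11]
print(min(t.snapshot_xmin for t in kept))                 # 6
```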

FIGS. 6 and 7 describe a multi-master configuration with a four-level tree structure consisting of the upper master node A, two levels of middle master nodes B, C, D, and E, and the lower master nodes F and G. The configuration may instead be a two-level structure of upper and lower master nodes, or a tree with three or more levels of middle master nodes, that is, five or more levels in total.

Next, the generation of a write set and updating with it will be described with reference to FIG. 3.
The figure shows the relationship between the master database (11a) and a write set in a lower master node (MS301, as an example). The master database (11a) consists of row numbers, instruction contents, and pointers, and is an append-only database to which a new row number is added each time a new instruction is issued from the client terminal (CL). The figure shows the case described above, in which row number 4 is deleted (DELETE) and row number 5 is rewritten with new instruction contents (UPDATE to sc1).

When such an update command is issued against the master database in the lower master node (MS301) at the request of the client terminal (CL), a write set consisting of a heap tuple map (HTM, heap file), a shadow copy (SC), and a transaction minimum value (Xmin) is generated in the back-end memory (BEM), as described above.

In the heap tuple map (HTM), each original row number (ctid) is registered in association with the row number of its new row (sctid), so an HTM entry is added for each database update. Because the row number at which the new contents of row 5 (sc1) will be written is not yet known at this stage, the new instruction (sc1) is written into sctid as a placeholder.

For the shadow copy (SC), a shadow copy of each row to be rewritten is generated by referring to the master database (11a). Because the row number of a newly added row is unknown at this stage, the new instruction (sc1) is written in place of the row number.

At this stage, the database processing unit (11b) of the lower master node (MS301) already knows from the heap tuple map (HTM) that row 4, to which the DELETE instruction applies, and old row 5, to which the UPDATE instruction applies, will be deleted. The shadow copy (SC) may therefore contain only the new instruction (sc1).

The transaction minimum value (Xmin) added to the write set is, as explained above, the transaction minimum value recorded in the snapshot referenced by the executing transaction, copied as is.
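The write-set construction for the FIG. 3 example can be sketched as follows, assuming the master database is a dict from row number to contents and commands are tuples such as `("UPDATE", 5, "sc1")`. The `WriteSet` layout (including `None` as the HTM placeholder for a DELETE), `build_write_set`, the row contents, and the snapshot Xmin of 6 are all hypothetical. The shadow copy holds only the new contents, as in the optimization described above, and the master database is read but never modified.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WriteSet:
    # heap tuple map: (ctid = original row, sctid placeholder; None for DELETE)
    htm: list[tuple[int, Optional[str]]] = field(default_factory=list)
    shadow_copy: list[str] = field(default_factory=list)  # new contents only
    xmin: int = 0  # Xmin of the snapshot the transaction is using


def build_write_set(db: dict[int, str], commands, snapshot_xmin: int) -> WriteSet:
    """Record DELETE/UPDATE commands without touching the master database `db`."""
    ws = WriteSet(xmin=snapshot_xmin)
    for cmd in commands:
        op, row = cmd[0], cmd[1]
        if row not in db:
            raise KeyError(f"row {row} not found")
        if op == "DELETE":
            ws.htm.append((row, None))
        elif op == "UPDATE":
            new_value = cmd[2]
            # The new row number is unknown until the upper node applies the set,
            # so the new contents stand in for sctid.
            ws.htm.append((row, new_value))
            ws.shadow_copy.append(new_value)
        else:
            raise ValueError(f"unsupported command {op}")
    return ws


master_db = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f"}
ws = build_write_set(master_db, [("DELETE", 4), ("UPDATE", 5, "sc1")], snapshot_xmin=6)
print(ws)  # WriteSet(htm=[(4, None), (5, 'sc1')], shadow_copy=['sc1'], xmin=6)
```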

The write set generated in this way is sent from the lower master node (MS301) to a middle master node (for example, MS201) and then on to the upper master node (MS101).

In the upper master node (MS101), the database processing unit 11b (central processing unit (CPU)) reads the transaction minimum value (Xmin) from the received write set and compares it with the cluster minimum value (Cmin) held in the back-end memory (BEM) of the upper master node (MS101). If the transaction minimum value (Xmin) is equal to or greater than the cluster minimum value (Cmin), it starts the transaction log processing unit (11c) to begin generating transaction log data and then performs the following processing.

First, it reads the heap tuple map (HTM) and compares it with its own master database 11a, verifying whether the contents of the target tuples (here, row numbers 4, 5, and 7) have been updated in the database (11a). In FIG. 4, rows 4 to 6 have not been updated, so a deletion pointer is assigned to row 4 and also to old row 5, which is being rewritten. The new instruction (sc1) is then written to the new row number 7.

If comparing the heap tuple map (HTM) in the write set from the lower master node (MS301) with its own database shows that the row has already been updated in the upper master node (MS101) by another write set, processing of this write set is aborted.

If, on the other hand, the transaction minimum value (Xmin) in the write set is smaller than the cluster minimum value (Cmin), the write set is aborted.
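The upper master node's handling of a write set (Cmin check, HTM conflict check, application through the shadow copy, log generation) can be sketched as below. Assumptions: rows are `Row` objects whose `deleted` flag plays the role of the deletion pointer; a target row counts as already updated if it is missing or already flagged; only the original rows (ctid) in the HTM are checked. Log records are tuples mirroring the FIG. 5 entries, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    content: str
    deleted: bool = False  # deletion pointer: row superseded or deleted


@dataclass
class WriteSet:
    htm: list[tuple[int, Optional[str]]]  # (ctid, placeholder; None means DELETE)
    shadow_copy: list[str]                # new contents, in HTM order
    xmin: int


def apply_write_set(db: dict[int, Row], ws: WriteSet, cmin: int,
                    log: list[tuple], txn: int) -> bool:
    """Validate and apply one write set on the upper master; True means committed."""
    if ws.xmin < cmin:
        return False  # abort: target tuples may already have been erased
    for ctid, _ in ws.htm:
        row = db.get(ctid)
        if row is None or row.deleted:
            return False  # abort: another write set already changed this row
    new_contents = iter(ws.shadow_copy)
    next_row = max(db) + 1
    log.append(("XB", txn))
    for ctid, placeholder in ws.htm:
        db[ctid].deleted = True
        if placeholder is None:
            log.append(("D", txn, ctid))
        else:
            # The placeholder is resolved to a real row number only now.
            db[next_row] = Row(next(new_contents))
            log.append(("U", txn, ctid, next_row))
            next_row += 1
    log.append(("XC", txn))
    return True


db = {r: Row(c) for r, c in enumerate("abcdef", start=1)}
ws = WriteSet(htm=[(4, None), (5, "sc1")], shadow_copy=["sc1"], xmin=6)
log: list[tuple] = []
print(apply_write_set(db, ws, cmin=6, log=log, txn=1))    # True
print(log)  # [('XB', 1), ('D', 1, 4), ('U', 1, 5, 7), ('XC', 1)]
print(apply_write_set(db, ws, cmin=6, log=[], txn=2))     # False
stale = WriteSet(htm=[(3, None)], shadow_copy=[], xmin=2)
print(apply_write_set(db, stale, cmin=6, log=[], txn=3))  # False
```

All validation runs before any mutation, so an aborted write set leaves the database untouched: the second call fails because rows 4 and 5 are already flagged, and the third because its Xmin (2) is below Cmin (6).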

The reason for aborting such write sets is as follows.
A write set may target a tuple that was deleted by a transaction whose transaction ID is smaller than the cluster minimum value (Cmin). Such a tuple has very likely already been discarded on the basis of the upper master node's cluster minimum value (Cmin). In that case, the heap tuple map cannot be used to detect conflicts for that tuple, and an inconsistency would arise in the upper master node. Aborting write sets whose transaction minimum value (Xmin) is smaller than the cluster minimum value (Cmin) prevents such inconsistencies.

Once the middle and lower master nodes have finished discarding transactions in response to the cluster minimum value (Cmin) notification, such write sets are no longer generated. However, a write set generated in a middle or lower master node while the cluster minimum value (Cmin) is still being propagated from the upper master node down to the lower master nodes may carry a transaction minimum value smaller than the cluster minimum value (Cmin), so it must be checked when it is applied at the upper master node. As described above, this is resolved by comparing the transaction minimum value (Xmin) registered in the write set with the cluster minimum value (Cmin) and aborting the write set if the transaction minimum value (Xmin) is smaller.

In this way, the cluster minimum value (Cmin) notification allows every transaction that references a completely erased tuple to be discarded, and write sets created concurrently with that notification can also be aborted on the basis of the cluster minimum value (Cmin). Tuples can therefore be completely erased in the middle and lower nodes as well, without compromising database consistency.

FIG. 5 shows an example of the transaction log data generated by the transaction log processing unit (11c) when the master database (11a) of the upper master node (MS101) is updated as described above. This transaction log is a file in which at least the instructions and the transaction contents (row numbers and the processing executed on them) are recorded continuously in time series.

In the figure, a transaction start instruction (XB1) is followed by log entries, each pairing an instruction identifier with row numbers. First, DELETE instruction D1 deletes row 4 (D14); next, UPDATE instruction U1 deletes row 5 and adds row 7 (U157); then the commit instruction (XC1) is issued. The cluster minimum value (Cmin = 6) may also be added to this transaction log data. Including the cluster minimum value (Cmin = 6) in this way makes it possible to control the order in which transactions are discarded in the middle master nodes B, C, D, and E or the lower master nodes F and G.
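The FIG. 5 notation concatenates an instruction code, an instruction number, and row numbers. The sketch below renders tuple records into that notation, with an optional trailing Cmin field; `render_log` is a hypothetical name, and where the Cmin value sits in the log and its `Cmin=` spelling are assumptions, since the text only says it may be added.

```python
from typing import Optional


def render_log(records: list[tuple], cmin: Optional[int] = None) -> list[str]:
    """Render log records in the compact notation used in FIG. 5."""
    out = ["".join(str(part) for part in rec) for rec in records]
    if cmin is not None:
        out.append(f"Cmin={cmin}")  # hypothetical encoding of the optional field
    return out


records = [("XB", 1), ("D", 1, 4), ("U", 1, 5, 7), ("XC", 1)]
print(render_log(records))          # ['XB1', 'D14', 'U157', 'XC1']
print(render_log(records, cmin=6))  # ['XB1', 'D14', 'U157', 'XC1', 'Cmin=6']
```

Keeping records as tuples internally avoids the ambiguity of the compact form once numbers exceed one digit: `U11213` could be instruction 1 replacing row 12 with row 13, or instruction 11 replacing row 2 with row 13.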

Alternatively, as described above, when the cluster minimum value (Cmin = 6) is reported to the middle master nodes B, C, D, and E and the lower master nodes F and G asynchronously with the transaction log data, transactions can be discarded efficiently without waiting for the transaction log data to be generated and sent.

This transaction log data is distributed from the communication module (11d) to the middle master node (MS201) and to all the lower master nodes (MS302, ..., MS30n).

Each middle or lower master node that receives the transaction log data replicates it to its own database.
Specifically, when a lower master node (for example, MS302) receives the transaction log data shown in FIG. 5 through its communication module 11d, it starts its transaction log processing unit 11c and replicates the transaction log data to its own master database 11a. As a result, deletion pointers are assigned to rows 4 and 5 and a new row 7 is added. If the cluster minimum value (Cmin = 6) has been appended to the transaction log data, each master node discards every transaction that references a snapshot whose transaction minimum value (Xmin) is smaller than this cluster minimum value (Cmin = 6).
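Replay on a middle or lower master node can be sketched as follows. Assumptions beyond the text: each UPDATE record also carries the new contents (`("U", 1, 5, 7, "sc1")`), since the replica needs them; records are applied one by one rather than atomically at the commit record; and local transactions are tracked as a map from transaction ID to snapshot Xmin. `Row`, `replicate`, and the example IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    content: str
    deleted: bool = False  # deletion pointer


def replicate(db: dict[int, Row], records: list[tuple],
              active: dict[int, int], cmin: Optional[int] = None) -> list[int]:
    """Replay one transaction's log records on a replica.

    `active` maps each local running transaction's ID to its snapshot Xmin.
    Returns the IDs discarded because of the Cmin carried with the log.
    """
    for rec in records:
        if rec[0] == "D":
            _, _, row = rec
            db[row].deleted = True
        elif rec[0] == "U":
            _, _, old_row, new_row, content = rec
            db[old_row].deleted = True
            db[new_row] = Row(content)
        # "XB" and "XC" only delimit the transaction in this sketch.
    discarded: list[int] = []
    if cmin is not None:
        discarded = sorted(xid for xid, xmin in active.items() if xmin < cmin)
        for xid in discarded:
            del active[xid]
    return discarded


replica = {r: Row(c) for r, c in enumerate("abcdef", start=1)}
log = [("XB", 1), ("D", 1, 4), ("U", 1, 5, 7, "sc1"), ("XC", 1)]
local = {21: 3, 22: 6, 23: 8}  # hypothetical transaction ID -> snapshot Xmin
print(replicate(replica, log, local, cmin=6))  # [21]
print(sorted(r for r, row in replica.items() if row.deleted), replica[7].content)  # [4, 5] sc1
```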

In this way, the middle and lower master nodes manage their databases uniformly by replicating the transaction log data sent from the upper master node.

Although the present invention has been described above on the basis of an embodiment, the invention is not limited to it. Modifications are described below.
(When the database is updated in the upper master node MS101)
As described with reference to FIG. 2, an update instruction for the master database may be issued at a lower master node (for example, MS301). In that case, a write set consisting of a heap tuple map (HTM, heap file) and a shadow copy (SC) is generated on the back-end memory (BEM). When the update instruction is issued at the upper master node (MS101), however, no write set is generated, because there is no higher node to notify. Instead, the upper node (MS101) writes the update data directly to the master database (11a), as shown in the left diagram of FIG. 4, and generates the transaction log data shown in FIG. 5. This transaction log data is distributed to the lower master nodes. Each lower master node that receives it replicates it to its own master database.
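The sketch below contrasts the two update paths. It is a minimal illustration, assuming that a node with no parent is the upper master node. It also assumes a hypothetical change set that maps each row number to a new value, or to `None` for a deletion. For brevity it overwrites rows instead of modelling append-only versions. The `outbox` list only records what each node would transmit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical change set: row number -> new value, or None to delete the row.
Changes = Dict[int, Optional[str]]


@dataclass
class WriteSet:
    xmin: int                                  # Xmin of the snapshot the transaction used
    heap_tuple_map: Dict[int, Optional[int]]   # row -> index into shadow_copies; None = deleted
    shadow_copies: List[str]


@dataclass
class MasterNode:
    name: str
    parent: Optional["MasterNode"] = None
    table: Dict[int, str] = field(default_factory=dict)
    outbox: List[Tuple[str, object]] = field(default_factory=list)  # messages to be sent

    def execute_update(self, xmin: int, changes: Changes) -> None:
        if self.parent is None:
            # Top node (MS101): no node above it to notify, so no write set is built.
            # Write straight into the master database ...
            for row, value in changes.items():
                if value is None:
                    self.table.pop(row, None)
                else:
                    self.table[row] = value
            # ... and publish the same changes as transaction log data for the lower nodes.
            self.outbox.append(("log", dict(changes)))
        else:
            # Lower node: leave the master database as is and stage a write set (HTM + SC).
            htm: Dict[int, Optional[int]] = {}
            shadows: List[str] = []
            for row, value in changes.items():
                if value is None:
                    htm[row] = None
                else:
                    htm[row] = len(shadows)
                    shadows.append(value)
            self.outbox.append(("write_set", WriteSet(xmin, htm, shadows)))


top = MasterNode("MS101", table={4: "d", 5: "e"})
low = MasterNode("MS301", parent=top, table={4: "d", 5: "e"})
for node in (top, low):
    node.execute_update(xmin=6, changes={4: None, 5: "e2"})
print(top.outbox[0][0], low.outbox[0][0])  # log write_set
```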

(When a search is executed while a write set is being generated in a lower master node)
Suppose a search is executed against the master database of a lower master node (for example, MS301) while that node is generating a write set such as the one shown in FIG. 3. A search that targets rows outside the write set poses no problem. A search that targets the affected rows (here, row numbers 4 and 5), however, cannot obtain an accurate result from the master database. Substantive update processing has already been performed on those rows.

In such a case, the database processing unit (11b) first refers to the heap tuple map (HTM) to check whether the row number being searched has an entry. Take the example in FIG. 3 with row number 3 as the search target. The database processing unit (11b) refers to the heap tuple map (HTM) in the write set on the back-end memory (BEM), which is built in memory (MM). It then determines whether row number 3 has an entry. In the example of FIG. 3 it does not, so the unit accesses the master database 11a directly and searches for row number 3.

If the search target in FIG. 3 is row number 4, the database processing unit (11b) finds an entry for that row when it refers to the heap tuple map (HTM) in the write set. Accessing the master database 11a would serve no purpose here. An update instruction that deletes the row still present there has already been executed. The database processing unit (11b) therefore uses the heap tuple map (HTM) to detect that row number 4 has been deleted, and excludes the row from the search.

If the search target in FIG. 3 is row number 5, the unit likewise refers to the heap tuple map (HTM). It detects that a shadow copy (SC1) corresponding to row number 5 has been created.

In this case, the database processing unit (11b) refers to the shadow copy (SC1) and searches the rewritten contents of row number 5.

The description above consults the heap tuple map (HTM) row by row, but the invention is not limited to this processing method. The database processing unit (11b) may instead consult the heap tuple map (HTM) in bulk at the start of a search. It treats every row number entered there as deleted and searches the master database (11a) with those rows excluded. It then consults the heap tuple map (HTM) again. Based on each entry, it either excludes the row from the search (as with row number 4) or searches the shadow copy (SC1) instead.
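Both search methods can be sketched as follows. The sketch assumes a hypothetical representation in which the heap tuple map maps a row number to an index into a list of shadow copies, with `None` marking a deletion. Rows 3, 4 and 5 mirror the FIG. 3 example, but their values are invented.

```python
from typing import Dict, List, Optional

HeapTupleMap = Dict[int, Optional[int]]  # row -> index into shadow copies; None = deleted


def read_row(row: int, master_db: Dict[int, str], htm: HeapTupleMap,
             shadow_copies: List[str]) -> Optional[str]:
    """Row-by-row method: consult the HTM before touching the master database."""
    if row not in htm:
        return master_db.get(row)      # no entry (row 3): read the master database directly
    slot = htm[row]
    if slot is None:
        return None                    # deleted by the pending write set (row 4): not a target
    return shadow_copies[slot]         # rewritten (row 5): search the shadow copy instead


def scan(master_db: Dict[int, str], htm: HeapTupleMap,
         shadow_copies: List[str]) -> Dict[int, str]:
    """Bulk method: exclude every HTM row first, then add back the shadow copies."""
    result = {row: v for row, v in master_db.items() if row not in htm}
    for row, slot in htm.items():
        if slot is not None:
            result[row] = shadow_copies[slot]
    return result


db = {3: "c", 4: "d", 5: "e"}
htm: HeapTupleMap = {4: None, 5: 0}
shadows = ["e (rewritten)"]
print([read_row(r, db, htm, shadows) for r in (3, 4, 5)])  # ['c', None, 'e (rewritten)']
print(scan(db, htm, shadows))  # {3: 'c', 5: 'e (rewritten)'}
```

The two methods return the same rows. The bulk method trades per-row lookups for one pass over the master database plus one pass over the heap tuple map.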

(When a conflict arises in a lower master node while it is being updated with transaction log data from the upper master node)
Transaction log data distributed from the upper master node may be replicating into a lower master node's database while an update instruction on the same row of that database is being executed. In that case, a conflict arises.

Specifically, this happens when row numbers 4 and 5 are being updated in response to an update instruction issued to the lower master node, and transaction log data concerning row number 5 arrives from the upper master node.

In such a case, the write set created in the lower master node may still be transmitted to the upper master node. Because the upper master node has already distributed transaction log data for the corresponding row, it detects the conflict and aborts the write set. The conflict at the lower master node can therefore be ignored. With this method, a write set that is known to be aborted at the upper master node must still be transmitted to it, which increases the load on both the upper and the lower master nodes. In return, the lower master node need not check for conflicts row by row while replicating transaction log data from the upper master node, which speeds up processing.
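The scenario for row numbers 4 and 5 can be traced with a minimal sketch. It assumes append-only rows whose deletion pointer is modelled as a hypothetical `xmax` transaction id. Transaction id 10 and the row values are illustrative only. The lower node replays the log blindly, and the conflict surfaces only when the upper node examines the write set's target rows.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class HeapRow:
    value: str
    xmax: Optional[int] = None  # deletion pointer: txid that superseded this row version


def apply_log(db: Dict[int, HeapRow], deletes: Iterable[int],
              inserts: Dict[int, str], txid: int) -> None:
    """Apply a log record without checking any pending local write set."""
    for row in deletes:
        db[row].xmax = txid
    for row, value in inserts.items():
        db[row] = HeapRow(value)


def upper_accepts(db: Dict[int, HeapRow], target_rows: Iterable[int]) -> bool:
    """Upper node: reject a write set if any target row version was already superseded."""
    return all(db[row].xmax is None for row in target_rows)


upper = {4: HeapRow("d"), 5: HeapRow("e")}
lower = {4: HeapRow("d"), 5: HeapRow("e")}
pending_rows = [4, 5]          # the lower node is building a write set on rows 4 and 5

# The upper node commits transaction 10, which updates row 5 (row 7 is its new version) ...
apply_log(upper, deletes=[5], inserts={7: "e'"}, txid=10)
# ... and the lower node replays the same log blindly, ignoring the overlap with rows 4-5.
apply_log(lower, deletes=[5], inserts={7: "e'"}, txid=10)

# The lower node still sends its write set; the upper node detects the conflict and aborts it.
print(upper_accepts(upper, pending_rows))  # False
```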

Another way to resolve such conflicts at the lower master nodes is as follows. When a lower master node generates a heap tuple map (HTM), it registers the map in its own memory and makes that memory accessible to the other lower master nodes as shared memory. Access between lower master nodes is performed by the database processing unit 11b of each node. Because conflicts are checked when the heap tuple map (HTM) is generated, replication of the transaction log data may be delayed. The conflict problem, however, can be resolved among the lower master nodes alone, without placing a burden on the upper master node.
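The shared-memory variant can be illustrated with the following sketch. It makes two assumptions. First, the HTMs that the lower nodes expose to one another are modelled as a single in-process registry guarded by `threading.Lock`. Second, a conflict means another lower node already holds the same row in a pending HTM. The `SharedHTMRegistry` class is hypothetical and does not represent a real cross-node shared-memory mechanism.

```python
import threading
from typing import Dict, Iterable


class SharedHTMRegistry:
    """Stand-in for the HTMs that lower master nodes expose to one another as shared memory."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Dict[int, str] = {}   # row number -> lower node whose pending HTM has it

    def try_register(self, node: str, rows: Iterable[int]) -> bool:
        """Register a new HTM; refuse it if another lower node already holds any of the rows."""
        row_list = list(rows)
        with self._lock:
            if any(self._owner.get(r, node) != node for r in row_list):
                return False
            for r in row_list:
                self._owner[r] = node
            return True

    def release(self, node: str) -> None:
        """Drop a node's entries once its write set has been committed or aborted."""
        with self._lock:
            self._owner = {r: n for r, n in self._owner.items() if n != node}


registry = SharedHTMRegistry()
print(registry.try_register("MS301", [4, 5]))  # True
print(registry.try_register("MS302", [5]))     # False: conflict found among lower nodes
registry.release("MS301")
print(registry.try_register("MS302", [5]))     # True
```

The upper node never sees the refused HTM, which matches the stated benefit. The cost is the extra check and waiting at HTM generation time.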

Although the present invention has been described above on the basis of embodiments, it is not limited to them.
For example, the write set has been described as being generated in a lower master node (MS301). It may, of course, also be generated in a middle master node (MS201).

The present invention can be used in a database management system with a hierarchical multi-master node structure.

MS101 Upper master node
SL Slave
MS201, MS202 ... MS20n Middle master nodes
MS301, MS302 ... MS30n Lower master nodes
CL Client terminal
11a Master database
11b Database processing unit
11c Transaction log processing unit
11d Communication module
BEM Back-end memory
HTM Heap tuple map
SC Shadow copy
Cmin Cluster minimum value
Xmin Transaction minimum value

Claims

A method for managing a write-once database in an upper master node that has hierarchically subordinate lower master nodes, the upper and lower master nodes each being updatable, the method comprising:
sending to the upper master node, in a session of any lower master node, lower database update information that includes the transaction minimum value of the snapshot referred to by the transaction being executed;
in the upper master node that has received the lower database update information, comparing the transaction minimum value with the cluster minimum value held by the upper master node and, if the transaction minimum value is equal to or greater than the cluster minimum value, updating the database of the upper master node based on the lower database update information;
aborting the lower database update information if the transaction minimum value is smaller than the cluster minimum value;
generating an update record of the upper master node as a transaction log;
delivering the transaction log to the lower master nodes, including the lower master node that sent the lower database update information;
in each lower master node, updating its own database based on the received transaction log; and
notifying the lower master nodes of the cluster minimum value held by the upper master node, and discarding any transaction whose transaction minimum value is smaller than that cluster minimum value.
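The upper-node side of these steps can be sketched as follows. The sketch assumes hypothetical `UpdateInfo` and `UpperNode` types and a log entry shaped as a plain dict carrying the changes and Cmin. The claim does not prescribe any data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UpdateInfo:
    source: str                 # lower master node that sent the update
    xmin: int                   # Xmin of the snapshot its transaction refers to
    changes: Dict[int, str]


@dataclass
class UpperNode:
    cmin: int                   # cluster minimum value held by the upper master node
    lower_nodes: List[str]
    table: Dict[int, str] = field(default_factory=dict)
    log: List[dict] = field(default_factory=list)

    def receive(self, info: UpdateInfo) -> Optional[Dict[str, dict]]:
        """Return {lower node: log entry} on success, or None if the update is aborted."""
        if info.xmin < self.cmin:
            return None                          # snapshot older than Cmin: abort
        self.table.update(info.changes)          # update the upper master database
        entry = {"changes": dict(info.changes), "cmin": self.cmin}
        self.log.append(entry)                   # update record kept as the transaction log
        # Deliver to every lower node, including the source, with Cmin attached.
        return {node: entry for node in self.lower_nodes}


def surviving(running_xmins: List[int], cmin: int) -> List[int]:
    """Lower-node side: transactions with Xmin below the notified Cmin are discarded."""
    return [x for x in running_xmins if x >= cmin]


upper = UpperNode(cmin=6, lower_nodes=["MS301", "MS302"])
print(upper.receive(UpdateInfo("MS301", xmin=5, changes={1: "a"})))  # None
print(sorted(upper.receive(UpdateInfo("MS301", xmin=6, changes={1: "a"}))))  # ['MS301', 'MS302']
```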

The write-once database management method according to claim 1, wherein the lower database update information is a write set consisting of a shadow copy of the database of the lower master node, a heap tuple map expanded in the lower master node's own memory, and the transaction minimum value of the snapshot referred to by the transaction being executed, the method comprising:
when the upper master node receives the write set, comparing the transaction minimum value with the cluster minimum value held by the upper master node and, if the transaction minimum value is equal to or greater than the cluster minimum value, comparing the heap tuple map with the upper master node's own database to verify whether any tuple registered as a target has been updated in that database, aborting the write set if such an update has been made, and updating the database with the shadow copy if no update has been made; and
aborting the write set if the transaction minimum value is smaller than the cluster minimum value.
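The write set validation described in this claim can be sketched as a single function. The sketch rests on three assumptions. "Has been updated" is modelled as a hypothetical deletion pointer (`xmax`) already being set on the target row. Shadow copies are appended as new row numbers, following the append-only model. Transaction ids are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeapRow:
    value: str
    xmax: Optional[int] = None              # deletion pointer once the version is superseded


@dataclass
class WriteSet:
    xmin: int                                 # Xmin of the snapshot used by the transaction
    heap_tuple_map: Dict[int, Optional[int]]  # target row -> shadow copy index; None = delete
    shadow_copies: List[str]


def apply_write_set(db: Dict[int, HeapRow], ws: WriteSet, cmin: int, txid: int) -> str:
    if ws.xmin < cmin:
        return "abort: xmin < cmin"
    # Compare the HTM with the upper database: has any target tuple been updated already?
    if any(db[row].xmax is not None for row in ws.heap_tuple_map):
        return "abort: conflict"
    next_row = max(db) + 1
    for row, slot in ws.heap_tuple_map.items():
        db[row].xmax = txid                 # old version gets a deletion pointer
        if slot is not None:                # append-only: the shadow copy becomes a new row
            db[next_row] = HeapRow(ws.shadow_copies[slot])
            next_row += 1
    return "commit"


db = {4: HeapRow("d"), 5: HeapRow("e"), 6: HeapRow("f")}
print(apply_write_set(db, WriteSet(7, {4: None, 5: 0}, ["e2"]), cmin=6, txid=20))  # commit
print(apply_write_set(db, WriteSet(7, {5: 0}, ["e3"]), cmin=6, txid=21))  # abort: conflict
print(apply_write_set(db, WriteSet(5, {6: 0}, ["f2"]), cmin=6, txid=22))  # abort: xmin < cmin
```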

The write-once database management method according to claim 1 or 2, wherein one or more layers of middle master nodes exist between the upper and lower master nodes, the method further comprising:
in a session of any middle master node, sending to the upper master node middle database update information that concerns an update of the middle master node's database and includes the transaction minimum value; and
in the upper master node that has received the middle database update information, comparing the transaction minimum value with the cluster minimum value held by the upper master node and, if the transaction minimum value is equal to or greater than the cluster minimum value, updating the database of the upper master node based on the middle database update information.
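The hierarchy with middle nodes can be pictured with a small tree sketch. It makes three assumptions. Every accepted update is delivered from the top node down through each layer. The Cmin value is passed in as if read from the top node. The `Node` and `submit` helpers are hypothetical names for illustration.

```python
from typing import Dict, List, Optional


class Node:
    """A master node in the hierarchy (top, middle or lower); names follow the reference signs."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []
        self.received: List[dict] = []          # transaction log entries replicated here
        if parent is not None:
            parent.children.append(self)

    def top(self) -> "Node":
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def deliver(self, entry: dict) -> None:
        self.received.append(entry)             # replicate, then pass down the hierarchy
        for child in self.children:
            child.deliver(entry)


def submit(origin: Node, xmin: int, changes: Dict[int, str], cmin: int) -> bool:
    """A middle or lower node sends update info to the top node, which alone decides."""
    if xmin < cmin:                             # cmin: cluster minimum held by the top node
        return False
    origin.top().deliver({"from": origin.name, "changes": changes, "cmin": cmin})
    return True


ms101 = Node("MS101")
ms201 = Node("MS201", ms101)
ms301, ms302 = Node("MS301", ms201), Node("MS302", ms201)
print(submit(ms201, xmin=7, changes={1: "a"}, cmin=6))   # True: accepted from a middle node
print([len(n.received) for n in (ms101, ms201, ms301, ms302)])  # [1, 1, 1, 1]
```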