JP2010128752A - Database system, server, update method, and program

Info

Publication number: JP2010128752A
Application number: JP2008302250A
Authority: JP
Inventors: Mitsuo Koyanagi; 光生小柳; Yosuke Ozawa; 陽介小澤; Miki Enoki; 美紀榎
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2008-11-27
Filing date: 2008-11-27
Publication date: 2010-06-10
Anticipated expiration: 2028-11-27
Also published as: JP5425448B2

Abstract

PROBLEM TO BE SOLVED: To provide a database system, a server, an update method, and a program for efficiently executing a plurality of update requests to a database. SOLUTION: The server is connected to a database 90 and to one or more other servers, and includes: a data storage part 50 that stores at least part of the data obtained by dividing the database 90; a log storage part 62 that stores an update log produced by an update request, received from an application 30, concerning the data; a transmission part 88 that, in response to the update request, copies the update log and transmits the copy to the other servers; a replica storage part 82 that stores replicas 110 of update logs copied and received from the other servers; and an execution part 64 that executes a batch update on the database 90 according to the total of the update log 100 stored in the log storage part 62 and the replicas 110 stored in the replica storage part 82. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to database technology, and more particularly to a database system, a server, an update method, and a program for efficiently executing a plurality of update requests to a database.

In recent years, as database systems have grown in scale, demand has increased for processing enormous numbers of transactions efficiently and at high speed, and for managing data with high reliability, availability, and fault tolerance.

As a way to make transactions more efficient, a known technique improves throughput by temporarily storing the contents of requested updates and sending multiple updates to the database together at one time. This technique, referred to as batch update, reduces network traffic, makes the processing of similar update requests more efficient, and allows disk writes in the database to be optimized.

Batch update is normally used to send the requests within a transaction together. Sending a plurality of update requests together relieves the database processing bottleneck and improves the throughput of the transaction as a whole. However, batch update introduces a waiting time until a certain number of update requests are ready to be committed, so response time degrades.
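By way of illustration only, the following minimal sketch shows the general idea of batch update using Python's standard sqlite3 module. The class name BatchUpdater, the table items, and the batch size of 3 are hypothetical and do not appear in this publication. The sketch also makes the response-time drawback visible: the first two updates are not reflected in the database until the third arrives.

```python
import sqlite3


class BatchUpdater:
    """Buffers update requests and sends them to the database as one batch.

    Hypothetical helper for illustration; not taken from the publication.
    """

    def __init__(self, conn, batch_size):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []  # buffered (value, key) pairs awaiting the next batch

    def update(self, key, value):
        # Store the requested update instead of writing it immediately.
        self.pending.append((value, key))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        with self.conn:  # one transaction covers the whole batch
            self.conn.executemany(
                "UPDATE items SET value = ? WHERE id = ?", self.pending
            )
        self.pending.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, 0)", [(i,) for i in range(4)])
conn.commit()

updater = BatchUpdater(conn, batch_size=3)
updater.update(0, 10)
updater.update(1, 11)
# Still waiting for a full batch: the database has not been touched yet.
before = conn.execute("SELECT value FROM items WHERE id = 0").fetchone()[0]
updater.update(2, 12)  # third request fills the batch and triggers the flush
after = conn.execute("SELECT value FROM items WHERE id = 0").fetchone()[0]
```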

Similarly, as a way to make transactions more efficient, techniques that create replicas in main memory are known. For example, Non-Patent Document 1 discloses a technique that writes the transaction log to the local disk of the sender and to the main memory of a remote machine serving as the replication destination. In Non-Patent Document 1, synchronous disk writes are avoided and replaced by synchronous network data transfers to the remote machine, and copies of the log are held on the local disk and in the main memory of the remote machine. This guarantees the reliability of the data until the actual write to the magnetic disk of the database, which is performed asynchronously. Non-Patent Document 1 reports that replication in main memory improved measured transaction throughput by nearly a factor of 100. Non-Patent Document 2 reports that a similar technique yields a performance improvement of 100 times or more.

In addition, Non-Patent Document 3 discloses a system that improves performance, reliability, and availability by replicating data in a primary-backup configuration using a cluster of servers connected by a system area network (SAN) capable of write-through.

Non-Patent Document 1: S. Ioannidis, E. P. Markatos, J. Sevaslidou, "On Using Network Memory to Improve the Performance of Transaction Based System", Proceedings of International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'98), (July 1998).
Non-Patent Document 2: D. E. Lowell, P. M. Chen, "Free Transactions with Rio Vista", 16th ACM Symposium on Operating Systems Principles, Pages 92-101, (October 1997).
Non-Patent Document 3: C. Amza, A. L. Cox, W. Zwaenepoel, "Data replication strategies for fault tolerance and availability on commodity clusters", In Proceedings of the International Conference on Dependable Systems and Networks, pp. 459-467, (2000).

In a database with a three-tier architecture, when growth of the database makes scaling up difficult, and if, for example, processing is confined to an instance, a scale-out approach is adopted in which a partitioned cache (or object store) is introduced on the application servers. In such a partitioned system, it is considered efficient to accumulate the updates requested for each partition and to apply them as a batch update asynchronously with the transactions. However, when the input load differs from partition to partition, the batch size imbalance problem described below arises.

The longest batch update interval in the system, the LUI (Longest Update Interval), must be shorter than the MTTF (Mean Time to Failure) until a failure state, that is, a state in which the primary and all replicas have become unavailable. In addition, because executing a batch update carries a large overhead, throughput may actually fall unless the batch size is sufficiently large.

FIG. 12 schematically shows batch size imbalance. Conventionally, in batch update with a fixed batch size, when the input load differs from partition to partition, the LUI is determined by the lightly loaded partition, as shown in FIG. 12(A), and a short MTTF therefore cannot be accommodated.

On the other hand, in batch update with a fixed update interval rather than a fixed batch size, the batch size differs between heavily and lightly loaded partitions, as shown in FIG. 12(B), and many batch updates are no longer performed at the ideal batch size. In particular, when a batch update is executed even though the lightly loaded partition contains almost no updates, the overhead described above means that, in the worst case, the maximum throughput becomes inversely proportional to the number of partitions.
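The two forms of imbalance can be illustrated with simple arithmetic. The following sketch assumes a constant update rate per partition; the partition names, the rates (1000 and 10 updates per hour), the batch size of 100, and the half-hour interval are hypothetical values chosen only for illustration.

```python
def intervals_for_fixed_batch_size(update_rates, batch_size):
    """Hours each partition needs to accumulate `batch_size` updates."""
    return {p: batch_size / rate for p, rate in update_rates.items()}


def batch_sizes_for_fixed_interval(update_rates, interval_hours):
    """Updates each partition has accumulated when the interval elapses."""
    return {p: rate * interval_hours for p, rate in update_rates.items()}


# Hypothetical input loads in updates per hour.
update_rates = {"P1": 1000.0, "P2": 10.0}

# Fixed batch size: the lightly loaded partition dictates the longest interval.
intervals = intervals_for_fixed_batch_size(update_rates, batch_size=100)
lui = max(intervals.values())  # determined by P2

# Fixed interval: batch sizes diverge, and P2 flushes nearly empty batches.
sizes = batch_sizes_for_fixed_interval(update_rates, interval_hours=0.5)
```

With these values the busy partition fills a batch in 0.1 hours while the idle one needs 10 hours, so the LUI is 10 hours; with a fixed half-hour interval the batches contain 500 and 5 updates respectively.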

Partitioning is performed with the load balance between partitions in mind. That approach has its limits, however: if the balance of update request load between partitions breaks down for some reason, the amount of log accumulated within the set update interval may vary from partition to partition. Alternatively, with a fixed batch size, the LUI of a lightly loaded partition may exceed the MTTF. Furthermore, when replication is applied to such a system, the replication load is also proportional to the update load and therefore varies as well. In other words, to maximize the throughput benefit of batch update in a partitioned distributed system, the problem of batch size imbalance between partitions had to be resolved.

The present invention has been made in view of the above problems. Its object is to provide a database system, a server, an update method, and a program that, in a partitioned distributed system, resolve the problems caused by batch size imbalance between partitions and thereby maximize the throughput improvement obtained from batch update.

As a result of intensive study, the present inventors found that applying mutual replication of transaction logs among a plurality of servers to batch update processing avoids the conventional problems caused by batch size imbalance between partitions, and thereby maximizes the throughput improvement obtained from batch update. This finding led to the present invention.

To solve the above problem, in the present invention, in a database system including a database and a plurality of servers, each server stores at least a part of the data obtained by dividing the database, in readiness for update requests from an application. A server receives from the application an update request relating to that data, stores the corresponding update log, and also copies the update log and transmits the copy to the other servers in the system. Meanwhile, the server stores replicas of update logs copied and received from other servers, for batch update and for multiplexing. Then, according to the total of the update log and the replicas, the server executes on the database, in one operation, a batch update that includes the stored update log and the received replicas.

In the above configuration, before an update request is executed on the database, the update log is multiplexed on one or more servers that are independent with respect to failures, which ensures high durability. The update is treated as committed once multiplexing succeeds, and the actual update request is executed on the database as the batch update, asynchronously with the transaction. Furthermore, because the batch update also includes replicas of update logs received from other servers, that is, because it is aggregated across a plurality of servers, the size of the batch update is the total over those servers, and the ideal batch size can be reached easily. The longest update interval (LUI) is then determined by the update load of the most heavily loaded of the servers that replicate to one another, which makes it easier to achieve an LUI no longer than the mean time to failure (MTTF).
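The effect of pooling update logs on the update interval can be checked with a small calculation. The sketch below assumes constant update rates and a batch that is flushed as soon as the pooled count reaches the batch size; the function name and all numeric values are hypothetical.

```python
def hours_to_fill_batch(update_rates, batch_size):
    """Hours needed for the given partitions, pooled, to accumulate one batch."""
    return batch_size / sum(update_rates)


# Hypothetical update rates in updates per hour.
rates = {"P1": 1000.0, "P2": 10.0}

# Each partition batching on its own: the idle partition takes 10 hours.
alone = {p: hours_to_fill_batch([r], 100) for p, r in rates.items()}

# Both partitions replicating to each other: the pooled log fills a batch
# slightly faster than the busiest partition would on its own.
pooled = hours_to_fill_batch(rates.values(), 100)
```

Because the pooled rate is at least the rate of the busiest member, the interval of the group is bounded by that member rather than by the idle one.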

The present invention can further include in the system a coordinate server that assigns the executing entity of the batch update according to the update load amounts reported by the plurality of servers, and notifies the servers of the assignment. So that the executing entity of the batch update can be assigned among the plurality of servers, each server can measure and report the update load caused by the update requests it receives, and a server becomes the executing entity of the batch update in accordance with the assignment. In this configuration, the batch update can be executed by an entity that can perform it efficiently (for example, one with a low update load), according to the update load of each server.

In the present invention, each server can include one or more partitions, each corresponding to divided data of the database. Furthermore, the data storage unit that stores the divided data, the log storage unit that stores the update log, the transmission unit that transmits the replica, the replica storage unit that stores the replicas, and the execution unit that executes the batch update can be configured for each partition. In the present invention, the coordinate server can organize mutual replication groups (replication groups), each consisting of partitions that replicate their update logs to one another, by finding a combination that minimizes the difference between groups in the within-group total of the partition update loads.

In the above configuration, the total update load within each mutual replication group is controlled so as to be uniform across groups, so the LUI is determined by the group with the smallest total update load. Consequently, even when it is difficult to equalize the input load of each partition, the LUI can be controlled through the easier task of adjusting the total input load of each group. As for batch update throughput, maximum throughput can be expected when the total input loads of the groups are evenly balanced and the input load is sufficient. In other words, the above configuration allows throughput to be tuned in addition to allowing the LUI to be controlled easily.
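The publication does not specify how the minimizing combination is found. Purely as an illustration, the following sketch searches exhaustively over all ways of splitting a small set of partitions into groups of equal size and keeps the grouping whose group totals differ least. The function name best_grouping, the equal group size, and the load values are assumptions; the sketch also ignores any constraint on which servers the partitions run on, and an exhaustive search is practical only for small numbers of partitions.

```python
from itertools import combinations


def best_grouping(loads, group_size):
    """Return (spread, groups) minimizing max - min of the group load totals.

    `loads` maps partition name to update load. Exhaustive search, intended
    only as an illustration for a handful of partitions.
    """
    names = sorted(loads)
    if len(names) % group_size != 0:
        raise ValueError("partition count must be a multiple of group_size")
    best = None

    def search(remaining, groups):
        nonlocal best
        if not remaining:
            totals = [sum(loads[p] for p in g) for g in groups]
            spread = max(totals) - min(totals)
            if best is None or spread < best[0]:
                best = (spread, [list(g) for g in groups])
            return
        # Fix the first remaining partition to avoid revisiting permutations.
        first, rest = remaining[0], remaining[1:]
        for others in combinations(rest, group_size - 1):
            left = [p for p in rest if p not in others]
            search(left, groups + [(first,) + others])

    search(names, [])
    return best


# Hypothetical update loads (updates per hour).
loads = {"A": 100, "B": 90, "C": 20, "D": 10}
spread, groups = best_grouping(loads, group_size=2)
```

For these loads, pairing the busiest partition with the idlest one (A with D, and B with C) gives both groups a total of 110, so the spread is zero.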

In the present invention, in response to a failure of the partition serving as the executing entity, the coordinate server can also assign, as the new executing entity, the partition with the lowest recent load among the non-failed partitions in the mutual replication group, and notify the partitions of the assignment. In this configuration, even if the server running the partition assigned as the executing entity fails, the update logs have been mutually replicated within the group, so the executing entity can be switched to another partition and the batch update can be carried out immediately. Fault tolerance is therefore improved.
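The reassignment on failure amounts to a minimum search over the surviving members of the group. The following is a minimal sketch under that reading; the function name, the dictionary representation of recent loads, and the values are hypothetical.

```python
def pick_new_executor(group_loads, failed):
    """Choose the least-loaded surviving partition of a replication group.

    `group_loads` maps partition ID to its most recent update load and
    `failed` is the set of partition IDs known to have failed.
    Returns None when no partition in the group survives.
    """
    survivors = {p: load for p, load in group_loads.items() if p not in failed}
    if not survivors:
        return None
    return min(survivors, key=survivors.get)


# Hypothetical group: P1 was the executing entity (lowest load) and has failed.
group_loads = {"P1": 10, "P2": 40, "P3": 25}
new_executor = pick_new_executor(group_loads, failed={"P1"})
```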

In the present invention, the batch update can also be executed regardless of the above total, in response to a failure of another partition belonging to the mutual replication group or in response to the elapse of the update interval. This configuration ensures that the LUI does not exceed the MTTF, and allows the safety of the data to be restored quickly from a state in which the durability level has been lowered by a failure.
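The three conditions under which a batch update is executed can be combined into a single predicate. The sketch below is one possible reading; the function and parameter names are hypothetical, and the use of entry counts as the measure of the total is an assumption.

```python
def should_run_batch_update(own_log_count, replica_count, batch_threshold,
                            seconds_since_last_batch, update_interval_seconds,
                            peer_failed):
    """Decide whether the executing partition should run a batch update now."""
    if peer_failed:
        # Durability level has dropped: flush regardless of the total.
        return True
    if seconds_since_last_batch >= update_interval_seconds:
        # Update interval elapsed: flush regardless of the total.
        return True
    # Normal case: flush when own log plus received replicas reach the threshold.
    return own_log_count + replica_count >= batch_threshold
```

For example, with a threshold of 100 entries, a partition holding 30 log entries and 20 replicas would not flush in the normal case, but would flush at once if a group member failed or the interval had elapsed.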

In the present invention, the authority to execute the batch update can be obtained from a lease server that manages the lending of that authority within the mutual replication group, or by mutual agreement among all partitions of the mutual replication group. In addition, upon receipt of replica acknowledgments from all partitions in the mutual replication group, control can be returned to the application in response to the update request. Furthermore, in the present invention, the data storage unit, the log storage unit, and the replica storage unit can be provided by the main memory of the server.
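The publication does not describe the lease protocol in detail. As a rough sketch of how a lease server might lend the execution authority, the following in-memory class grants the authority for one replication group to one partition at a time. The class name, the time-limited lease, the lease duration, and the injected clock are all assumptions made for illustration.

```python
class LeaseServer:
    """Lends the batch-update execution authority of each group to one partition.

    Hypothetical sketch: leases expire after `duration` seconds so that the
    authority is not lost forever if the holder fails.
    """

    def __init__(self, duration, clock):
        self.duration = duration
        self.clock = clock      # callable returning the current time in seconds
        self.leases = {}        # group_id -> (holder partition_id, expiry time)

    def acquire(self, group_id, partition_id):
        now = self.clock()
        lease = self.leases.get(group_id)
        if lease is not None and lease[1] > now and lease[0] != partition_id:
            return False        # another partition holds a live lease
        self.leases[group_id] = (partition_id, now + self.duration)
        return True

    def release(self, group_id, partition_id):
        lease = self.leases.get(group_id)
        if lease is not None and lease[0] == partition_id:
            del self.leases[group_id]


# Usage with a fake clock so that the behavior is deterministic.
now = [0.0]
server = LeaseServer(duration=10.0, clock=lambda: now[0])
first = server.acquire("G1", "P1")    # granted
second = server.acquire("G1", "P2")   # refused while P1's lease is live
now[0] = 11.0
third = server.acquire("G1", "P2")    # granted after P1's lease has expired
```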

The present invention is described below by way of embodiments, but the present invention is not limited to the embodiments described below.

The following embodiment is described using, as an example, a database system 10 with a three-tier client-server configuration that includes a database and a cluster of application servers on which applications that access the database run.

FIG. 1 is a schematic diagram of the database system 10 according to an embodiment of the present invention. The database system 10 shown in FIG. 1 includes a database server 14 connected to a network 12. The network 12 includes, for example, Gigabit Ethernet (registered trademark). The database server 14 is configured generally as a general-purpose computer such as a personal computer, a workstation, a midrange computer, or a mainframe.

More specifically, the database server 14 includes a central processing unit (CPU) such as a single-core or multi-core processor, cache memory, RAM, a network interface card (NIC), and the like. The database server 14 is further connected to a disk storage device via a storage interface such as SAS (Serial Attached SCSI), PATA (Parallel ATA), SATA (Serial ATA), or Fibre Channel. This provides the storage area for the database.

The database server 14 of this embodiment is controlled by an operating system (hereinafter referred to as the OS) such as WINDOWS (registered trademark) 200X, UNIX (registered trademark), LINUX (registered trademark), or z/OS (registered trademark), and implements a relational database management system (RDBMS) that manages a relational database, such as DB2 (registered trademark), Oracle (registered trademark) Database, or Microsoft (registered trademark) SQL Server (registered trademark). The data model of the database is not particularly limited. In other embodiments, the database server 14 can implement a DBMS that manages a database of another data model, such as an object-relational database, an object database, a hierarchical database, a network database, or an XML (eXtensible Markup Language) database.

The database system 10 shown in FIG. 1 further includes application servers (hereinafter referred to as AP servers) 20 that access the database server 14 via the network 12. Each AP server 20 can also be configured as a general-purpose computer with the same hardware configuration as the database server 14. The AP server 20 implements applications containing business logic and the like, using Java (registered trademark) EE (Java (registered trademark) Platform, Enterprise Edition) or the like, and processes requests received via the network 12 from clients (not shown). For example, the AP server 20 can be configured with WebSphere (registered trademark) Application Server, JBoss (registered trademark), Oracle (registered trademark) Application Server, BEA WebLogic Server (registered trademark), or the like.

The plurality of AP servers 20a to 20c form an AP server cluster 22 (hereinafter referred to as the unit cluster). Each of the AP servers 20a to 20c holds a data storage unit (Data Store) in which a partition of the database managed by the database server 14 is placed, and the AP servers process requests from clients while distributing the load among themselves. The data storage unit is provided by storage space in the physical main memory of the AP server 20 and operates as a cache of the database managed by the database server 14, thereby speeding up transactions.

The AP servers 20a to 20c respond to database update requests such as insert, update, and delete, which arise from application operation, by reading the data in the data storage unit, handling the update request, generating an update log, and returning a response. The AP servers also accumulate the update logs produced by update requests and send the accumulated update logs to the database server 14 in one operation (a flush), so that the updates are reflected in the database asynchronously with the transactions. This is the so-called batch update.

Furthermore, the AP servers 20a to 20c of this embodiment synchronously replicate their update logs to one another, which ensures durability until the batch update is performed. An update is treated as committed once its update log has been successfully multiplexed by replication, which improves response time. The batch update also covers the update log replicas that the servers have exchanged with one another.

The database system 10 shown in FIG. 1 further includes a coordinate server 16 and a lease server 18 connected to the network 12. The coordinate server 16 and the lease server 18 can likewise be configured as general-purpose computers with the same hardware and software configuration as the AP server 20. The functions of the coordinate server 16 and the lease server 18 are described in detail later.

FIG. 2 is a functional block diagram of the functions implemented on each server in the database system 10 according to the embodiment of the present invention. Each functional unit included in the database system 10 shown in FIG. 2 (described in detail later) is realized on the corresponding server by reading a program from a computer-readable recording medium, loading the program into memory, and executing the program to control the operation of the hardware resources. The functional units placed on the servers communicate with one another through a distributed object framework such as EJB (Enterprise Java (registered trademark) Beans).

One or more application modules (hereinafter referred to simply as modules) 30 run on each AP server 20. One or more partitions 40, each including a data storage unit 50, also run on each AP server 20.

The data storage unit 50 caches data obtained by partitioning the database 90, which is managed by the database server 14, according to application-side rules, and uses the data it holds to answer requests from the modules 30. The data storage unit 50 is provided by storage space allocated in the physical main memory of each AP server 20 and holds all or part of the data of a child table partitioned from a table of the database 90. The data storage unit 50 can hold a so-called materialized view. The data storage unit 50 is preferably configured as an in-memory relational database.

Each partition 40 includes, in addition to the data storage unit 50, a batch processing unit 60 and a replica processing unit 80. The batch processing unit 60 temporarily stores the update logs produced by update requests to the database 90 from the modules 30. The replica processing unit 80 transmits replicas of those update logs to the replica processing units of partitions that run on other AP servers and belong to the same group, which is described later. Multiplexing of an update log is regarded as successful when acknowledgments of receipt of the replica have been received from the replica processing units of all destination partitions. The replica processing unit 80 also receives update log replicas from the replica processing units of other partitions belonging to the same group, stores them temporarily, and returns an acknowledgment of receipt.

An update is treated as committed once the batch processing unit 60 has stored the update log and the replica processing unit 80 has successfully multiplexed the update log by transmitting replicas, and a response to the update request is then returned to the module 30. Like the data storage unit 50, the update logs and replicas are held in storage space in the physical main memory of the AP server 20. Because an update log replica only needs to be held in a form that can be retrieved when the primary fails, disk I/O can be avoided, which improves response time compared with accessing the database.
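The commit rule described above can be sketched with in-memory objects standing in for partitions on different AP servers. The class and method names below are hypothetical, direct method calls stand in for network transfers, and returning False when an acknowledgment is missing is an assumption about how the failure would be surfaced.

```python
class Partition:
    """In-memory stand-in for a partition with a log store and a replica store."""

    def __init__(self, name):
        self.name = name
        self.peers = []        # other partitions in the same replication group
        self.update_log = []   # update logs for requests received locally
        self.replicas = []     # update log replicas received from peers
        self.alive = True

    def receive_replica(self, entry):
        """Store a replica in memory and acknowledge receipt."""
        if not self.alive:
            return False
        self.replicas.append(entry)
        return True

    def handle_update(self, entry):
        """Return True (committed) only if every peer acknowledged the replica."""
        self.update_log.append(entry)
        acks = [peer.receive_replica(entry) for peer in self.peers]
        return all(acks)


p1, p2, p3 = Partition("P1"), Partition("P2"), Partition("P3")
p1.peers = [p2, p3]
committed = p1.handle_update({"op": "update", "key": 1, "value": "a"})
```

No disk write occurs anywhere on this path, which is where the response-time advantage over a direct database update would come from.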

Meanwhile, the batch processing unit 60 reads out the update logs according to the total amount of the update logs that it temporarily stores itself and the received update log replicas stored by the replica processing unit 80. The batch processing unit 60 then performs a batch update so that the accumulated updates are reflected in the database 90. On receiving the batch update, the database server 14 performs the corresponding update processing efficiently, reflects it in the database 90, and makes the updates durable. The update log can be configured as a transaction log that holds the data before and after the update, the content of the operation, and so on; it is the history information used to reflect the updates accumulated on the AP server 20 side in the database 90 and make them durable.

In the database system 10 of this embodiment, a group (hereinafter referred to as a replication group) is defined in advance as a set of partitions, running on mutually independent AP servers, that replicate their update logs to one another. The batch update is not performed by every partition individually. Instead, one of the partitions belonging to the replication group is assigned as the executing entity and performs it. From the viewpoint of distributing load, the partition with the smallest recent update load in the replication group is preferably assigned as the executing entity.

A group adjustment unit 92 runs on the coordination server 16 and organizes and manages the replication groups. FIG. 3(A) shows the data structure of the partition management table 120 held by the coordination server 16 in the embodiment of the present invention. The partition management table 120 shown in FIG. 3(A) includes a field 120a for a partition ID identifying a partition, a field 120b for a server ID identifying the server on which the partition runs, and a field 120c for a group ID identifying the replication group to which the partition currently belongs.

The partition management table 120 further includes a field 120d holding a value that indicates the partition's recent update load. The group adjustment unit 92 periodically receives update load reports from each partition 40 and updates the value of field 120d. The value indicating the update load is not particularly limited; for example, the number of updates per unit time can be used. In the example shown in FIG. 3, the number of updates per hour is entered.

The group adjustment unit 92 also periodically consults field 120d, assigns the partition with the smallest recent update load in each group as the executing entity (hereinafter, the execution partition), and notifies it. The partition management table 120 further includes a field 120e for a value indicating whether the partition is the execution partition. The group adjustment unit 92 rewrites the corresponding field 120e according to the assignment. When the executing entity changes, the group adjustment unit 92 also notifies the partition that is no longer assigned as the execution partition.

The partition management table 120 further includes a field 120f for a value indicating operating status. The group adjustment unit 92 detects a failure when the heartbeat from an AP server 20 stops, and updates the value of field 120f to match the operating status. Failure detection is not limited to heartbeats; in other embodiments, the group adjustment unit 92 can poll the servers and detect a server failure from the presence or absence of a response.

When a failure is detected in the AP server 20 on which an execution partition runs, the group adjustment unit 92 treats that execution partition as having left the replication group and changes the execution partition. A partition that is not the executing entity is called a non-execution partition. When a failure is detected in an AP server 20 that hosts a non-execution partition, the group adjustment unit 92 instead instructs the execution partition of the group to which that non-execution partition belongs to execute a batch update immediately.
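
The table fields and the two failure cases described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the class `PartitionRow` and the functions `assign_executors` and `handle_server_failure` are hypothetical names, and choosing the first partition on a tie is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PartitionRow:
    partition_id: str          # field 120a
    server_id: str             # field 120b
    group_id: str              # field 120c
    update_load: int           # field 120d (e.g. updates per hour)
    is_executor: bool = False  # field 120e
    alive: bool = True         # field 120f


def assign_executors(rows):
    """Make the least-loaded live partition of each group its execution partition.

    Returns (promoted, demoted) partition IDs so both sides can be notified.
    Ties go to the partition that appears first (an assumption of this sketch).
    """
    best = {}
    for row in rows:
        if row.alive and (row.group_id not in best
                          or row.update_load < best[row.group_id].update_load):
            best[row.group_id] = row
    promoted, demoted = [], []
    for row in rows:
        chosen = best.get(row.group_id) is row
        if chosen and not row.is_executor:
            promoted.append(row.partition_id)
        elif row.is_executor and not chosen:
            demoted.append(row.partition_id)
        row.is_executor = chosen
    return promoted, demoted


def handle_server_failure(rows, server_id):
    """Mark partitions on a failed server as down and decide the follow-up.

    Returns (promoted, flush_now): newly assigned execution partitions, and
    execution partitions that should run a batch update immediately because
    a non-execution member of their group was lost.
    """
    lost_groups = set()
    for row in rows:
        if row.server_id == server_id and row.alive:
            row.alive = False
            if not row.is_executor:
                lost_groups.add(row.group_id)
    promoted, _ = assign_executors(rows)
    flush_now = [r.partition_id for r in rows
                 if r.is_executor and r.group_id in lost_groups]
    return promoted, flush_now
```

In this sketch a server failure can produce both outcomes at once: a group that lost its execution partition gets a new one, while a group that lost a non-execution partition has its execution partition flagged for an immediate batch update.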

Furthermore, at startup of the database system 10, or on an operator's instruction during maintenance or the like, the group adjustment unit 92 groups the partitions, based on the recent update load of each partition running on the cluster 22, so that the total update load within each group is equalized across groups. In this way the group adjustment unit 92 can organize the replication groups. Once the replication groups are determined, the group adjustment unit 92 rewrites the value of field 120c in the partition management table 120.

FIG. 4 is a flowchart of the replication group organization process executed by the coordination server according to the embodiment of the present invention. The process shown in FIG. 4 starts at step S100 at startup of the database system 10 or on an instruction from the operator. In step S101, the group adjustment unit 92 accesses the partition management table 120 and obtains, from fields 120b and 120d, the server ID of the server on which each partition runs and the latest update load of each partition.

In step S102, the group adjustment unit 92 generates the possible groupings of the partitions. Groupings in which the same server ID appears more than once within a group are excluded, as are groupings that contain a group consisting of only one partition.

In step S103, from the generated candidate groupings, the grouping that minimizes the differences between groups in the total update load within each group is determined, and the replication groups are organized accordingly. For example, the grouping that minimizes the sum of the absolute values of the differences in total update load between groups is selected.
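
Steps S102 and S103 can be illustrated with a brute-force search. The sketch below is one possible reading, not the patented procedure itself: the function names are hypothetical, and the number of groups is passed in as `num_groups`, which is an assumption because the description does not state how the group count is chosen.

```python
from itertools import combinations


def set_partitions(items, k):
    """Yield every way to split the list `items` into exactly k non-empty groups."""
    if not items:
        if k == 0:
            yield []
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` forms a group of its own.
    for part in set_partitions(rest, k - 1):
        yield [[first]] + part
    # Case 2: `first` joins one of the groups of a k-way split of the rest.
    for part in set_partitions(rest, k):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def organize_groups(loads, servers, num_groups):
    """Return (grouping, cost) minimizing pairwise differences in group load.

    loads:   partition ID -> recent update load (field 120d)
    servers: partition ID -> server ID (field 120b)
    """
    best, best_cost = None, None
    for grouping in set_partitions(sorted(loads), num_groups):
        # S102: exclude groupings that contain a single-partition group.
        if any(len(g) < 2 for g in grouping):
            continue
        # S102: exclude groupings with two partitions of one server in a group.
        if any(len({servers[p] for p in g}) < len(g) for g in grouping):
            continue
        # S103: sum of absolute differences between group totals.
        totals = [sum(loads[p] for p in g) for g in grouping]
        cost = sum(abs(a - b) for a, b in combinations(totals, 2))
        if best_cost is None or cost < best_cost:
            best, best_cost = grouping, cost
    return best, best_cost
```

Exhaustive enumeration grows very quickly with the number of partitions, so a sketch like this is only practical for small clusters; a larger deployment would presumably need a heuristic.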

In step S104, each partition of the organized replication groups is notified of the group to which it belongs and of the other partitions in the same group, and the process ends at step S105. In step S104, for each replication group, the partition with the smallest update load in the group can also be notified at the same time that it has been assigned as the execution partition.

The method of organizing replication groups is not limited to the above example. In another embodiment, for example, the time series of update loads reported for each partition can be recorded, and the replication groups can be organized by finding the grouping that minimizes the differences between groups in the sum, within each group, of the average update loads over a fixed period. With the replication group organization process shown in FIG. 4, the batch size of batch updates can be equalized across groups, as described later, even when the update loads of the individual partitions differ.

Referring again to FIG. 2, a lease management unit 94 runs on the lease server 18. When actually performing a batch update, the batch processing unit 60 of the execution partition queries the lease management unit 94 to confirm its authority to execute the batch update. The lease management unit 94 receives queries about batch update execution authority from the batch processing unit 60 of the execution partition in each replication group, and leases the execution authority for a limited time. The lease management unit 94 grants batch update execution authority exclusively to the batch processing unit 60 of a single partition 40 within a group.

For example, when the AP server 20 hosting the execution partition is cut off from the cluster 22 by a network failure or the like, the group adjustment unit 92, having detected the failure of the AP server 20, changes the execution partition, so that more than one execution partition may exist in the same group. In that case, the execution partition on the AP server 20 cut off from the cluster 22 could well perform a batch update on the database 90 through another network. However, because the execution authority management described above grants batch update execution authority to only one execution partition at any given time, simultaneous batch updates, at least, can be avoided.

FIG. 3(B) shows the data structure of the lease management table 130 held by the lease server 18 in the embodiment of the present invention. The lease management table 130 shown in FIG. 3(B) includes a field 130a for a group ID, a field 130b for a value indicating the current assignment state of the group's batch update execution authority, a field 130c for the lease expiry of the execution authority, and a field 130d for the ID of the partition to which the execution authority has been granted.

On receiving a query about batch update execution authority from a partition, the lease management unit 94 reads the value of field 130b for that partition's group. If the value is "lock" and the querying partition differs from the partition holding the authority, the lease management unit 94 reports that acquisition of the authority failed. If the value is "lock" and the querying partition is the partition holding the authority, the lease management unit 94 extends the lease expiry, rewrites field 130c, and notifies the requester of the extended expiry. If the value is "unlock", the lease management unit 94 rewrites it to "lock", notifies the requester that acquisition of the authority succeeded, and rewrites the values of fields 130c and 130d. In addition, when a lease expires, the lease management unit 94 rewrites the value of field 130b to "unlock".
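
The lock, extend, and expire rules above can be sketched as a small in-memory lease table. The names `LeaseRow` and `LeaseManager`, the fixed `lease_seconds` duration, and the choice to check for expiry lazily on each request are all assumptions of this sketch; the description does not say how or when expiry is detected.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseRow:
    state: str = "unlock"         # field 130b
    expires_at: float = 0.0       # field 130c
    holder: Optional[str] = None  # field 130d


class LeaseManager:
    def __init__(self, lease_seconds=10.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.table = {}  # group ID (field 130a) -> LeaseRow

    def request(self, group_id, partition_id):
        """Return the lease expiry if granted or extended, otherwise None."""
        now = self.clock()
        row = self.table.setdefault(group_id, LeaseRow())
        # An expired lease is released before the request is evaluated.
        if row.state == "lock" and now >= row.expires_at:
            row.state, row.holder = "unlock", None
        # Locked by a different partition: acquisition fails.
        if row.state == "lock" and row.holder != partition_id:
            return None
        # Unlocked (new grant) or held by the requester (extension).
        row.state = "lock"
        row.holder = partition_id
        row.expires_at = now + self.lease_seconds
        return row.expires_at
```

Injecting the clock makes the expiry behavior easy to exercise without real waiting; a caller would compare the returned expiry with its own clock before each batch update.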

In this embodiment, a separate lease server 18 is provided to manage batch update execution authority. In other embodiments, the execution partition may instead be granted time-limited execution authority by mutual agreement reached through message exchange among the batch processing units 60 of the partitions in the same group. In that case, for processing efficiency, these messages can preferably be piggybacked on the update log replicas exchanged between the replica processing units 80.

The update log replication process and the batch update process according to the embodiment of the present invention are described in detail below with reference to FIGS. 5 to 12. FIG. 5 is a data flow diagram relating to update log replication and batch update according to the embodiment of the present invention. FIG. 5 shows the functional blocks of the batch processing unit 60 and the replica processing unit 80 in more detail.

The batch processing unit 60 includes a log storage unit 62, an update execution unit 64, an update timer 66, a threshold comparison unit 68, an update counter 70, and an execution authority acquisition unit 72. The replica processing unit 80 includes a replica storage unit 82, an update counter 84, a replica reception unit 86, and a replica transmission unit 88.

The data storage unit 50 receives an update request for the database 90 from the module 30 and passes it to the batch processing unit 60. The batch processing unit 60 receives the update request from the data storage unit 50, temporarily stores the corresponding update log 100 in the log storage unit 62, increments the update counter 70, and returns control to the data storage unit 50.

The data storage unit 50 also passes the update request to the replica processing unit 80. The replica processing unit 80 receives the update request from the data storage unit 50, calls the replica transmission unit 88, and sends a replica of the update log for that request to the replica reception unit (not shown) of the replica processing unit 81 of each partition 41 that belongs to the same group and runs on another AP server. When the replica transmission unit 88 has received acknowledgments from all destination partitions, the replica processing unit 80 returns control to the data storage unit 50. The data storage unit 50 treats the request as committed once the log storage unit 62 has stored the update log and the acknowledgments for the replica transmission have been received, and then responds to the update request from the module 30.

The replica reception unit 86 of the replica processing unit 80 receives update log replicas from the replica transmission unit of the replica processing unit 81 of another partition in the same group, temporarily stores each replica of the update log (hereinafter, update log replica) in the replica storage unit 82, and increments the update counter 84. After storing the update log replica, the replica reception unit 86 sends an acknowledgment of the successful storage to the replica transmission unit of the originating replica processing unit 81.

The threshold comparison unit 68 of the execution partition 40 monitors the values of the update counter 70 of the batch processing unit 60 and the update counter 84 of the replica processing unit 80, and compares their sum with a threshold preset as the batch size. In other words, the number of updates received by the replication group as a whole is counted and compared with the threshold. When the threshold comparison unit 68 of the execution partition 40 determines that the sum exceeds the threshold, the update execution unit 64 is called. If it holds batch update execution authority when called, the update execution unit 64 reads the update log 100 from the log storage unit 62 and the update log replica 110 from the replica storage unit 82, builds SQL statements, for example, and executes a batch update on the database 90. If it does not hold execution authority, the execution authority acquisition unit 72 is called, and the batch update is executed after the authority has been acquired.

The update execution unit 64 can also perform a batch update regardless of the sum of the update counter 70 and the update counter 84. The update timer 66 of the batch processing unit 60 measures the time elapsed since the last batch update in the group to which the partition belongs. When a given time has elapsed on the update timer 66, the update execution unit 64 is called. This given time is preferably no greater than the MTTF of the database system 10. In response to expiry of the update timer 66, the update execution unit 64 acquires execution authority as needed, reads the update log 100 and the update log replica 110, and executes a batch update on the database 90.

When the group adjustment unit 92 of the coordination server 16 detects a failure of an AP server 20, it instructs the batch processing unit 60 of the execution partition, in the group of the non-execution partition running on that AP server 20, to execute a batch update immediately. On receiving this instruction, the batch processing unit 60 acquires execution authority as needed and calls the update execution unit 64 to execute a batch update on the database 90.

When a batch update succeeds, the update execution unit 64 treats the corresponding update logs and update log replicas as applied and either discards them from the log storage unit 62 and the replica storage unit 82 or flags them as applied. To report the applied update requests to the other, non-execution partitions, the update execution unit 64 then lists them in a report list. At the next replica transmission, the replica transmission unit 88 piggybacks the report list and so reports it to the other partitions in the group.

The replica reception unit 86 obtains the report list from another partition and either discards the applied update logs 100 or update log replicas 110 from the log storage unit 62 and the replica storage unit 82 or flags them as applied. Flagged update requests are discarded later as appropriate.
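
The report-list bookkeeping described above can be sketched as follows. The class `PartitionLogs` and its methods are hypothetical, and keying each entry by a request ID is an assumption, since the description does not say how update requests are identified. The sketch uses the "discard" option rather than the flagging option.

```python
class PartitionLogs:
    """In-memory stand-in for one partition's log and replica storage."""

    def __init__(self):
        self.log_store = {}      # own update logs (log storage unit 62)
        self.replica_store = {}  # replicas from peers (replica storage unit 82)
        self.report_list = []    # applied request IDs not yet reported

    def discard(self, request_ids):
        """Drop applied entries from both stores; unknown IDs are ignored."""
        for rid in request_ids:
            self.log_store.pop(rid, None)
            self.replica_store.pop(rid, None)

    def batch_update_succeeded(self, request_ids):
        """Execution partition: drop applied entries and queue them for reporting."""
        self.discard(request_ids)
        self.report_list.extend(request_ids)

    def take_report(self):
        """Return the pending report list (to piggyback on a replica) and clear it."""
        report, self.report_list = self.report_list, []
        return report

    def receive_report(self, request_ids):
        """Non-execution partition: drop entries another partition has applied."""
        self.discard(request_ids)
```

Because the report travels only with the next replica transmission, a non-execution partition may keep already-applied entries for a while; the version ID check mentioned later in the description is presumably what guards against applying them twice.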

The update counter 70 periodically reports its number of updates per unit time to the group adjustment unit 92 of the coordination server 16. When its partition is assigned as the execution partition, the batch processing unit 60 calls the execution authority acquisition unit 72. While the partition remains the execution partition, the execution authority acquisition unit 72 periodically queries the lease management unit 94 on the lease server 18 to acquire and maintain the execution authority.

FIG. 6 is a flowchart of the processing performed by the AP server 20 according to the embodiment of the present invention in response to an update request from the application side. The process shown in FIG. 6 starts at step S200 in response to an update request for the database from the application side. In step S201, the data storage unit 50 receives the update request for the database from the module 30 and passes it to the batch processing unit 60. In step S202, the batch processing unit 60 stores the update log for the update request in the log storage unit 62; in step S203, it increments the update counter 70 and returns control to the data storage unit 50.

In step S204, the update request is passed from the data storage unit 50 to the replica processing unit 80, which creates a replica of the update log for the update request. In step S205, it is determined whether the report list contains applied update requests that have not yet been reported. If it does (YES), the process proceeds to step S206. In step S206, the replica processing unit 80 attaches the report list containing the applied update requests to the replica created in step S204, and the process proceeds to step S207.

If, on the other hand, it is determined in step S205 that there are no unreported applied update requests (NO), the process proceeds directly to step S207. In step S207, the replica processing unit 80 calls the replica transmission unit 88, which sends the replica, with the report list piggybacked where applicable, to the replica reception units 86 of all other partitions 40 in the same group. In step S208, the process loops on step S208 and waits (while NO) until acknowledgments have been received from all other partitions 40 in the same group.

When it is determined in step S208 that acknowledgments have been received from all other partitions (YES), the replica processing unit 80 returns control to the data storage unit 50, and the process proceeds to step S209. In step S209, the data storage unit 50 treats the update request as committed now that the update log has been stored redundantly, responds to the update request from the module 30, and returns control to the module 30. The process ends at step S210.

If, while waiting in step S208, acknowledgments cannot be received from all partitions within a predetermined time for some reason, error handling can be performed as appropriate. The error handling is not particularly limited. For example, depending on the operating policy of the database system 10, the request can be treated as committed once the update log has been successfully replicated to one or more partitions, with the replica resent later to the partitions from which no acknowledgment was received.
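
The commit rule of FIG. 6 and the relaxed policy mentioned above can be sketched with in-memory peers. Everything here is illustrative: `Peer`, `handle_update`, and the `min_acks` parameter are hypothetical names, a peer's boolean return value stands in for its acknowledgment, and an unreachable peer stands in for an acknowledgment that did not arrive within the predetermined time.

```python
class Peer:
    """Stand-in for the replica processing unit of another partition."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.replica_store = []   # replica storage unit 82
        self.update_counter = 0   # update counter 84

    def receive_replica(self, entry):
        """Store the replica and acknowledge; False models a missing acknowledgment."""
        if not self.reachable:
            return False
        self.replica_store.append(entry)
        self.update_counter += 1
        return True


def handle_update(entry, log_store, peers, min_acks=None):
    """Store the update log locally, replicate it, and decide whether to commit.

    With min_acks=None every peer must acknowledge (the default flow of FIG. 6).
    A smaller min_acks models the relaxed operating policy; the names of peers
    that did not acknowledge are returned so the replica can be resent later.
    """
    log_store.append(entry)                                        # S202
    acked = [p for p in peers if p.receive_replica(dict(entry))]   # S204, S207
    pending = [p.name for p in peers if p not in acked]            # S208
    required = len(peers) if min_acks is None else min_acks
    return len(acked) >= required, pending                         # S209
```

For example, with one of two peers unreachable, `handle_update(entry, log, peers)` reports the request as not committed, while `handle_update(entry, log, peers, min_acks=1)` commits it and returns the unreachable peer for a later resend. What happens to the locally stored log when the commit fails is left open here, as it is in the description.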

FIG. 7 is a flowchart of the processing performed by the AP server 20 according to the embodiment of the present invention when it receives a replica sent from another server. The process shown in FIG. 7 starts at step S300 in response to a replica transmission from the replica transmission unit 88 of another server. In step S301, the replica reception unit 86 receives an update log replica from the replica transmission unit 88 of a partition in the same group on another server. The replica processing unit 80 stores the received update log replica in the replica storage unit 82 in step S302, increments the update counter 84 in step S303, and returns an acknowledgment to the originating replica transmission unit 88 in step S304.

In step S305, it is determined whether a report list is attached to the received replica, that is, whether applied update requests have been reported. If so (YES), the process proceeds to step S306. In step S306, the update logs and update log replicas corresponding to the applied update requests in the report list are discarded from the log storage unit 62 and the replica storage unit 82, respectively, or flagged as applied, and the process ends at step S307. If it is determined in step S305 that no applied update requests have been reported (NO), the process proceeds directly to step S307 and ends.

FIG. 8 is a flowchart of the batch update process executed by the AP server 20 according to the embodiment of the present invention. The process shown in FIG. 8 starts at step S400 in response to a notification that the partition has been assigned as the execution partition. In step S401, the batch processing unit 60 receives the notification message from the group adjustment unit 92 indicating that it has been assigned as the execution partition. In step S402, the batch processing unit 60 calls the execution authority acquisition unit 72, which queries the lease management unit 94 of the lease server 18 for batch update execution authority.

In step S403, the batch processing unit 60 determines whether its partition is still the execution partition. If it is determined in step S403 that the partition is no longer the execution partition (NO), for example because the execution partition was changed while the execution authority was being queried, while the query was being retried, or while waiting for an event that triggers a batch update, the process proceeds to step S417 and ends. If it is determined in step S403 that the partition is still the execution partition (YES), the process proceeds to step S404.

In step S404, the batch processing unit 60 determines whether it acquired execution authority as a result of the query in step S402 and whether that authority is still valid, that is, its lease has not expired. If it is determined in step S404 that the unit does not hold valid authority (NO), because acquisition failed or because the lease expired while waiting for an event that triggers a batch update, the process proceeds to step S405. In step S405, the process waits for a fixed time and then loops back to step S402 to retry the execution authority query.

If it is determined in step S404 that the unit holds valid authority (YES), the process proceeds to step S406. In step S406, the batch processing unit 60 determines whether the group adjustment unit 92 has instructed immediate execution of a batch update because of a non-execution partition failure. If no such instruction has been given (NO), the process proceeds to step S407. In step S407, the batch processing unit 60 determines whether the update timer 66 has expired. If the update timer 66 has not yet expired (NO), the process proceeds to step S408.

In step S408, the batch processing unit 60 calls the threshold comparison unit 68 and has it obtain the total of the update counter 70 of the batch processing unit 60 and the update counter 84 of the replica processing unit 80. In step S409, it has the threshold comparison unit 68 determine whether the total exceeds a predetermined threshold. If it is determined in step S409 that the total does not exceed the threshold (NO), the process loops back to step S403. At least while the partition remains the execution partition, steps S402 to S409 are repeated until immediate execution on failure is instructed, the update timer expires, or the total of the update counters 70 and 84 exceeds the threshold; in this way the process waits for an event that triggers a batch update.

If it is determined in step S406 that immediate execution has been instructed (S406: YES), if it is determined in step S407 that the update timer 66 has expired (S407: YES), or if it is determined in step S409 that the total of the update counters 70 and 84 exceeds the threshold (S409: YES), the process branches to step S410.
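The decision order of steps S403 to S409 can be sketched as follows. This is only an illustrative sketch: the names `PartitionState`, `Action`, and `next_action` are hypothetical and do not appear in the embodiment, and the state is assumed to be available as a simple snapshot rather than being queried from the group adjustment unit 92 and the lease management unit 94.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STOP = "stop"                # S403: NO  -> S417 (end)
    RETRY_AUTHORITY = "retry"    # S404: NO  -> S405, then back to S402
    WAIT = "wait"                # S409: NO  -> loop back to S403
    RUN_BATCH = "run_batch"      # S406/S407/S409: YES -> S410


@dataclass
class PartitionState:
    """Hypothetical snapshot of what the batch processing unit checks."""
    is_execution_partition: bool
    has_valid_lease: bool
    immediate_requested: bool    # instruction from the group adjustment unit
    timer_expired: bool          # update timer 66
    log_count: int               # update counter 70
    replica_count: int           # update counter 84


def next_action(s: PartitionState, threshold: int) -> Action:
    """Evaluate the checks in the same order as steps S403 to S409."""
    if not s.is_execution_partition:                  # S403
        return Action.STOP
    if not s.has_valid_lease:                         # S404
        return Action.RETRY_AUTHORITY
    if s.immediate_requested:                         # S406
        return Action.RUN_BATCH
    if s.timer_expired:                               # S407
        return Action.RUN_BATCH
    if s.log_count + s.replica_count > threshold:     # S408, S409
        return Action.RUN_BATCH
    return Action.WAIT
```

Note that the threshold is compared against the sum of both counters, so a partition that itself receives few update requests still triggers a batch update once the group as a whole has accumulated enough.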

In step S410, the batch processing unit 60 calls the update execution unit 64 and has it obtain the update logs and update log replicas, stored in the log storage unit 62 and the replica storage unit 82, that have not yet been applied. In step S411, the update execution unit 64 queries the database 90 and confirms the validity of a version ID that enables version management of updates, such as the timestamp attached at the time of a batch update as described later. If it is determined in step S411 that the version ID is valid (YES), the process proceeds to step S412.

In step S412, the update execution unit 64 creates SQL statements that request the database 90 to persist the updates by batch update, and executes the batch update. At this time, so that the validity of the version can be determined later, the update execution unit 64 also requests, within the same transaction, an update of the version ID, such as a timestamp, attached to the master on the database 90. In the database 90 that receives the batch update, the multiple update requests contained in the batch update are optimized and processed, and the updated contents are persisted.

In step S413, it is determined whether the batch update of step S412 completed successfully. If a completion response for the batch update is received from the database 90 and the batch update is determined to have succeeded (YES), the process proceeds to step S414. In step S414, the update execution unit 64 adds the update requests that were applied by this batch update to the report list described above. When, for example, immediate execution on failure has been instructed, the report list can also be sent immediately without waiting for the next replica transmission.

The report list can include, as a version ID, the timestamp of the time at which the batch update was executed. In the version ID validity check of step S411, the version ID indicating the execution time of the previous batch update, as included in the report list, is compared with the version ID obtained from the database 90. The version ID is treated as valid when the one obtained from the database 90 matches that of the previous batch update, so that validity can be determined from the ordering of the versions.
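The combination of the version ID check (step S411) and the batch update with a version ID update in the same transaction (step S412) can be sketched as follows. This is a minimal sketch under several assumptions: it uses SQLite through Python's standard `sqlite3` module as a stand-in for the database 90, the table names `data` and `batch_version` and the function names are hypothetical, and the validity check is folded into the transaction as a conditional update, whereas the embodiment describes it as a separate inquiry preceding step S412.

```python
import sqlite3


class StaleVersion(Exception):
    """Raised when the stored version ID is not the one we last saw."""


def make_db():
    """Create an in-memory database with a data table and a version row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data(key TEXT PRIMARY KEY, value TEXT)")
    conn.execute(
        "CREATE TABLE batch_version(id INTEGER PRIMARY KEY, version_id INTEGER)"
    )
    conn.execute("INSERT INTO batch_version VALUES (1, 0)")
    conn.commit()
    return conn


def apply_batch(conn, updates, last_version, new_version):
    """Apply all updates and advance the version ID in one transaction.

    Returns False, leaving the database unchanged, if the stored version ID
    differs from last_version (another partition already ran a batch update)
    or if the database reports an error.
    """
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute(
                "UPDATE batch_version SET version_id = ? "
                "WHERE id = 1 AND version_id = ?",
                (new_version, last_version),
            )
            if cur.rowcount != 1:          # corresponds to S411: NO
                raise StaleVersion
            conn.executemany(              # corresponds to S412
                "INSERT OR REPLACE INTO data(key, value) VALUES (?, ?)",
                updates,
            )
        return True                        # corresponds to S413: YES
    except (StaleVersion, sqlite3.Error):
        return False                       # handled in S416
```

A caller would pass the version ID recorded in the report list as `last_version`. If a different partition has already executed a batch update, the stored version ID no longer matches and nothing is written.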

In step S415, the batch processing unit 60 resets the update timer 66, the update counter 70, and the update counter 84, loops the process back to step S403, and prepares for the next batch update. If, on the other hand, it is determined in step S411 that the version is not valid (S411: NO), or it is determined in step S413 that the batch update did not complete successfully (S413: NO), the process branches to step S416.

In step S416, error processing is executed, and the process loops back to step S403 to prepare for the next batch update. In the error processing of step S416, error handling can be performed, for example, by outputting an error so that the administrator is alerted and by notifying the partitioning coordinator of the failure. In this case, the group adjustment unit 92 responds, for example, by assigning another partition as the execution partition.

A version ID invalid error is expected to occur when a batch update has already been performed by a partition other than this execution partition. More specifically, the error is expected when the heartbeat of the AP server 20 on which the execution partition runs fails, for some reason, to reach the coordinating server, the group adjustment unit 92 assigns another partition as the execution partition, and either (i) the new execution partition has already performed a batch update, or (ii) after the new execution partition was assigned, the old execution partition, for which the failure was detected, performed a batch update first. In step S416, in the case of a version ID invalid error, error handling can also be performed so as to maintain consistency with the database 90: the database 90 is queried, the update requests that have already been batch-updated are deleted or flagged as applied, and the update counters 70 and 84 are decremented by the corresponding number of updates.

FIG. 9 is a sequence diagram showing the processing operation of the database system 10 when the execution partition fails, according to the embodiment of the present invention. FIG. 9 is described for an example in which the first partition 40-1 is the execution partition and the fourth partition 40-4 is a non-execution partition of the same group.

As shown in steps S500 and S501, the first partition 40-1 and the fourth partition 40-4 each periodically send an alive notification to the group adjustment unit 92 as a heartbeat. For example, if the group adjustment unit 92 receives an alive notification from the fourth partition 40-4 in step S502 but receives none from the first partition 40-1, which is the execution partition, for a certain period, it detects a failure of the first partition 40-1 in step S503.

In step S504, the group adjustment unit 92 assigns the fourth partition 40-4, which belongs to the same group as the failed first partition 40-1, as the new execution partition and notifies it of the assignment. In step S505, the fourth partition 40-4, newly assigned as the execution partition, immediately queries the lease management unit 94 for the execution authority of the batch update. In step S506, the lease management unit 94 waits for the lease on the execution authority granted to the first partition 40-1 to run out and confirms that it has expired. In step S507, the lease management unit 94 grants the execution authority to the fourth partition 40-4.
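The lease handover of steps S505 to S507 can be sketched as follows. The class `LeaseManager`, its `request` method, and the injected `clock` function are hypothetical names used only for illustration. In this sketch a request made before the previous lease has expired is simply refused, and the requester is expected to retry as in step S405, rather than the lease manager itself blocking until expiry.

```python
class LeaseManager:
    """Grants a time-limited execution authority to one partition at a time."""

    def __init__(self, lease_seconds, clock):
        self.lease_seconds = lease_seconds
        self.clock = clock          # function returning the current time
        self.holder = None          # partition currently holding the lease
        self.expires_at = 0.0

    def request(self, partition_id):
        """Return True if partition_id now holds a valid lease."""
        now = self.clock()
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == partition_id:
            # Grant a new lease, or renew the lease of the current holder.
            self.holder = partition_id
            self.expires_at = now + self.lease_seconds
            return True
        # The previous holder's lease is still running (S506): refuse.
        return False
```

Refusing the request until the old lease has expired prevents two partitions from holding the execution authority at the same time, even if the failed partition is in fact still running and merely unreachable.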

In this case, although the update log or at least one of its replicas still exists on one or more AP servers 20, the persistence level provided by multiplexing the update log is reduced. Therefore, in step S508, the fourth partition 40-4, as the new execution partition, promptly executes a batch update on the database 90 and persists the updates on the database 90. In response to successful completion of the batch update, in step S509 the fourth partition 40-4 can immediately attempt to report the batch-updated update requests to the first partition 40-1, the other member of the same group.

After a failure, once the batch update has completed successfully, the system can operate according to the operational policy of the database system 10, and the behavior is not particularly limited. For example, service may be resumed at a reduced persistence level without waiting for the failed server to recover. Alternatively, service may be resumed at the intended persistence level after the failed server recovers, or partitions may be redistributed to a spare AP server and service resumed at the intended persistence level. In an operation that does not continue service at a reduced persistence level, all partitions can be instructed to stop service when the execution partition is reassigned in step S504. In that case, each partition 40 in the group accepts no new update requests from the module 30, and service is resumed once all updates have been reflected in the database 90 and the persistence level has been restored.

FIG. 10 is a sequence diagram showing the processing operation of the database system 10 when a non-execution partition fails, according to the embodiment of the present invention. As with FIG. 9, FIG. 10 is described for an example in which the first partition 40-1 is initially the execution partition and the fourth partition 40-4 is initially a non-execution partition of the same group.

As shown in steps S600 and S601, the first partition 40-1 and the fourth partition 40-4 each periodically send an alive notification to the group adjustment unit 92 as a heartbeat. For example, if the group adjustment unit 92 receives an alive notification from the first partition 40-1, which is the execution partition, in step S602 but receives none from the fourth partition 40-4 for a certain period, it detects a failure of the fourth partition 40-4, a non-execution partition, in step S603.

In step S604, the group adjustment unit 92 instructs the first partition 40-1, the execution partition belonging to the same group as the failed fourth partition 40-4, to execute a batch update immediately. In step S605, the first partition 40-1, as the execution partition, promptly executes a batch update on the database 90 and persists the updates on the database 90. In response to successful completion of the batch update, in step S606 the first partition 40-1 can immediately attempt to report the batch-updated update requests to the fourth partition 40-4, the other member of the same group. If the first partition 40-1 does not hold the execution authority at step S605, it queries the lease management unit 94 for the execution authority.
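The two failure cases of FIG. 9 and FIG. 10 can be sketched together from the viewpoint of the group adjustment unit 92. The function names, the string instructions, and the `recent_load` mapping are hypothetical. Choosing the non-failed partition with the lowest recent load as the new execution partition follows the assignment described in the claims; the timeout comparison is an assumed, simplified form of heartbeat monitoring.

```python
def detect_failures(last_seen, now, timeout):
    """Return the partitions whose last heartbeat is older than timeout."""
    return {p for p, seen in last_seen.items() if now - seen > timeout}


def handle_failures(group, executor, failed, recent_load):
    """Decide how to react to failed partitions in one replication group.

    Returns (execution_partition, instruction). The instruction is None
    when nothing needs to be done.
    """
    if not failed:
        return executor, None
    if executor in failed:
        # FIG. 9: the execution partition failed, so reassign (S504).
        healthy = [p for p in group if p not in failed]
        if not healthy:
            return None, None
        new_executor = min(healthy, key=lambda p: recent_load[p])
        return new_executor, "assign_execution_partition"
    # FIG. 10: a non-execution partition failed, so flush at once (S604).
    return executor, "execute_batch_immediately"
```

In both cases the aim is the same: the number of surviving copies of the update log has dropped, so the accumulated updates are pushed to the database as soon as a partition holding valid execution authority is available.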

After a failure, once the batch update has completed successfully, the system can operate according to the operational policy of the database system 10; as described above for the execution partition, the behavior is not particularly limited.

FIG. 11 shows the relationship between the batch size of a batch update on the execution partition and the LUI (longest update interval) according to the embodiment of the present invention. As described above, in the prior art, if the load balance of update requests between partitions is lost for some reason, either the amount of log accumulated over a fixed update interval varies between partitions or, with a fixed batch size, the LUI of a lightly loaded partition may exceed the MTTF.

In the batch update according to the embodiment of the present invention, by contrast, the update logs produced by the update requests that each partition receives are shared and accumulated within the replication group to which the partition belongs. Preferably, the partition with the smallest recent load in the group is assigned as the execution subject of the batch update, and the accumulated update requests are sent to the database 90 and reflected in it together.

The accumulated amount of update requests grows according to the total update load in the organized group, regardless of the update load of any individual partition. With a fixed batch size, therefore, the LUI within a group is determined, even in the worst case, by the partition with the largest update load in the group. The overall LUI is determined by the replication group with the smallest total update load.

Therefore, in the batch update according to the embodiment of the present invention, even when it is difficult to balance the input load of updates between partitions, the LUI can be held at a desired value no greater than the MTTF by controlling the total input load through replication group organization, which is easier. Furthermore, because the replication groups can be reorganized according to the update load of each partition, the total update load can be made uniform across groups.
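The effect of grouping on the LUI can be illustrated with a small calculation. The model below is an assumption made only for illustration and is not stated in the embodiment: each partition receives update requests at a constant rate, a batch update is triggered when a fixed batch size has accumulated, and the update interval is therefore the batch size divided by the relevant rate. The function names and the example rates are hypothetical.

```python
def lui_per_partition(batch_size, rates):
    """LUI when every partition batches only its own update log.

    The slowest partition takes the longest to fill a batch.
    """
    return batch_size / min(rates)


def lui_per_group(batch_size, groups):
    """LUI when each replication group batches its combined update log.

    A group fills a batch at the sum of its members' rates, so the group
    with the smallest total rate determines the overall LUI.
    """
    return max(batch_size / sum(group) for group in groups)


# Four partitions with unbalanced loads (updates per second).
rates = [100, 1, 50, 49]
alone = lui_per_partition(1000, rates)                    # 1000.0 seconds
grouped = lui_per_group(1000, [[100, 1], [50, 49]])       # about 10.1 seconds
```

Under these assumed numbers, pairing the lightly loaded partition with a heavily loaded one shortens the longest interval by roughly two orders of magnitude without changing the batch size, which is the sense in which group organization, rather than per-partition load balancing, controls the LUI.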

Also, in the prior art, when the variation in input load is large and the LUI is limited according to the MTTF, a partition with a small input load may end up executing batch updates that contain almost no update requests, as shown in FIG. 12(B). Batch updating contributes greatly to efficiency, but the overhead of each execution is itself large. In the conventional batch update, therefore, batch size imbalance means that, when a replication group has n members, the maximum throughput in the worst case is estimated at about 1/n of the best case.

In the embodiment of the present invention, by contrast, the update requests of the entire replication group are batch-updated together as described above, so the batch update can always be executed in the most aggregated state possible within the LUI, and the batch size can be increased. In particular, the highest throughput can be obtained when the input load is evenly distributed across the replication groups and is sufficiently large.

As described above, the embodiment of the present invention can provide a database system, a server, an update method, and a program that, in a partitioned distributed system, eliminate the problems caused by batch size imbalance between partitions and thereby maximize the throughput improvement obtained from batch updating.

The functional units and their processing have been described to facilitate understanding of the invention. However, the present invention is not limited to having the specific functional units described above execute the specific processing; the functions for executing the processing described above can be assigned to any functional unit in consideration of processing efficiency and of implementation efficiency such as programming.

The above functions of the present invention can be implemented by a device-executable program written in an object-oriented programming language such as C++, Java (registered trademark), Java (registered trademark) Beans, Java (registered trademark) Applet, Java (registered trademark) Script, Perl, or Ruby, or in a database language such as SQL. The program can be distributed by being stored on a device-readable recording medium or by being transmitted.

The present invention has been described above with reference to a specific embodiment, but it is not limited to that embodiment. It can be modified within the range that a person skilled in the art could conceive, including other embodiments, additions, changes, and deletions, and any such mode is included in the scope of the present invention as long as it exhibits the operation and effects of the present invention.

FIG. 1 is a schematic diagram of the database system according to the embodiment of the present invention.
FIG. 2 is a functional block diagram of the functions implemented on each server in the database system according to the embodiment of the present invention.
FIG. 3 shows the data structures of (A) the partition management table held by the coordinating server and (B) the lease management table held by the lease server in the embodiment of the present invention.
FIG. 4 is a flowchart of the replication group organization processing executed by the coordinating server according to the embodiment of the present invention.
FIG. 5 is a data flow diagram relating to update log replication and batch update according to the embodiment of the present invention.
FIG. 6 is a flowchart of the processing that the AP server executes for an update request from the application side according to the embodiment of the present invention.
FIG. 7 is a flowchart of the processing that the AP server executes on receiving a replica sent from another server according to the embodiment of the present invention.
FIG. 8 is a flowchart of the batch update processing executed by the AP server according to the embodiment of the present invention.
FIG. 9 is a sequence diagram showing the processing operation of the database system 10 when the execution partition fails, according to the embodiment of the present invention.
FIG. 10 is a sequence diagram showing the processing operation of the database system when a non-execution partition fails, according to the embodiment of the present invention.
FIG. 11 shows the relationship between the batch size of a batch update on the execution partition and the LUI according to the embodiment of the present invention.
FIG. 12 schematically shows batch size imbalance in the prior art.

Explanation of symbols

10: database system; 12: network; 14: database server; 16: coordinating server; 18: lease server; 20: AP server; 22: cluster; 30: module; 40, 41: partition; 50: data storage unit; 60: batch processing unit; 62: log storage unit; 64: update execution unit; 66: update timer; 68: threshold comparison unit; 70: update counter; 72: execution authority acquisition unit; 80, 81: replica processing unit; 82: replica storage unit; 84: update counter; 86: replica reception unit; 88: replica transmission unit; 90: database; 92: group adjustment unit; 94: lease management unit; 100: update log; 110: update log replica; 120: partition management table; 130: lease management table

Claims

A database system including a database and a plurality of servers, the server comprising:
a data storage unit that stores at least part of the data obtained by dividing the database;
a log storage unit that stores an update log of update requests, received from an application, concerning the data;
a transmission unit that, in response to an update request, replicates the update log and transmits it to another server;
a replica storage unit that stores replicas of update logs received by replication from other servers; and
an execution unit that executes a batch update on the database according to the total of the update logs and the replicas stored in the log storage unit and the replica storage unit, respectively.

The database system according to claim 1, further comprising a coordinating server that assigns the execution subject of the batch update according to the update load amounts reported by the plurality of servers and notifies the execution unit of the assignment.

The database system according to claim 2, wherein the server includes one or more partitions, each corresponding to divided data of the database, and the data storage unit, the log storage unit, the transmission unit, the replica storage unit, and the execution unit are configured for each partition.

The coordinating server organizes mutual replication groups, each made up of partitions that replicate update logs to one another, by finding a combination that minimizes the difference between groups in the total update load amount of the partitions in each group. The database system described in.

5. The database system according to claim 4, wherein the execution unit acquires the execution authority of the batch update from a lease server that manages the leasing of the execution authority within the mutual replication group, or by mutual agreement among all partitions of the mutual replication group.

In response to a failure of the partition that is the execution subject, the coordinating server assigns, as the execution subject, the partition with the lowest recent load among the non-failed partitions in the mutual replication group, and gives notification of the assignment. The database system described.

The database system according to claim 6, wherein the execution unit executes the batch update regardless of the total in response to a failure of another partition belonging to the mutual replication group or in response to the elapse of an update interval.

8. The database system according to claim 7, wherein the transmission unit returns processing to the application for the update request upon receiving acknowledgment of replica receipt from all partitions in the mutual replication group.

A server connected to a database and one or more other servers, the server comprising:
a data storage unit that stores at least part of the data obtained by dividing the database;
a log storage unit that stores an update log of update requests, received from an application, concerning the data;
a transmission unit that, in response to an update request, replicates the update log and transmits it to another server;
a replica storage unit that stores replicas of update logs received by replication from other servers; and
an execution unit that executes a batch update on the database according to the total of the update logs and the replicas stored in the log storage unit and the replica storage unit, respectively.

The server according to claim 9, further comprising a load measuring unit that measures and reports the update load amount of received update requests so that the execution subject of the batch update can be assigned among a plurality of servers, wherein the execution unit becomes the execution subject of the batch update in accordance with the assignment.

The server according to claim 10, wherein each server includes one or more partitions, each corresponding to divided data of the database and each belonging to a mutual replication group whose members replicate update logs to one another, and the data storage unit, the log storage unit, the transmission unit, the replica storage unit, and the execution unit are configured for each partition.

The server according to claim 11, wherein the execution unit executes the batch update regardless of the total in response to a failure of another partition belonging to the mutual replication group or in response to the elapse of an update interval.

A method of updating a database, in which a server connected to the database and one or more other servers executes the steps of:
receiving, from an application, an update request concerning at least part of the data in the database;
in response to the update request, storing an update log of the received update request, replicating the update log, and transmitting it to another server; and
receiving acknowledgment of replica receipt from the destination server and returning a response to the application for the update request,
and in which the server further executes the steps of:
receiving and storing replicas of update logs replicated from other servers; and
executing a batch update on the database according to the total of the stored update logs and replicas.

The update method according to claim 13, wherein the server further executes the steps of: measuring and reporting the update load amount of received update requests so that the execution subject of the batch update can be assigned among a plurality of servers; and acting as the execution subject of the batch update.

The update method according to claim 14, wherein the server further executes a step of realizing one or more partitions, each corresponding to divided data of the database and each belonging to a mutual replication group whose members replicate update logs to one another.

The update method according to claim 15, wherein the server further executes a step of executing the batch update regardless of the total in response to notification of a failure of another partition belonging to the mutual replication group or in response to the elapse of an update interval.

A computer-executable program for causing a computer to function as a server connected to a database and one or more other servers, the program causing the computer to function as:
a data storage unit that stores at least part of the data obtained by dividing the database;
a log storage unit that stores an update log of update requests, received from an application, concerning the data;
a transmission unit that, in response to an update request, replicates the update log and transmits it to another server;
a replica storage unit that stores replicas of update logs received by replication from other servers; and
an execution unit that executes a batch update on the database according to the total of the update logs and the replicas stored in the log storage unit and the replica storage unit, respectively.

The program according to claim 17, further causing the server to function as a load measuring unit that measures and reports the update load amount of received update requests so that the execution subject of the batch update can be assigned among a plurality of servers, and causing the execution unit to function as the execution subject of the batch update.

The program according to claim 18, further causing the server to generate one or more partitions, each corresponding to divided data of the database and each belonging to a mutual replication group whose members replicate update logs to one another, wherein the data storage unit, the log storage unit, the transmission unit, the replica storage unit, and the execution unit are configured for each partition.

A database system including a database, a coordinating server, and a plurality of servers, each of the servers comprising:
A data storage unit for storing at least a part of the data obtained by dividing the database;
A log storage unit for storing an update log according to an update request related to the data received from an application;
A transmission unit that, in response to the update request, copies the update log and transmits it to another server;
A replica storage unit for storing a replica of an update log received by replication from another server; and
An execution unit that executes a batch update on the database in accordance with the total of the update log and the replica stored in the log storage unit and the replica storage unit, respectively, wherein
The coordinating server allocates an execution subject of batch update corresponding to the update load amount notified from a plurality of servers, and notifies the execution unit,
Each server includes one or more partitions corresponding to the divided data of the database, and the data storage unit, the log storage unit, the transmission unit, the replica storage unit, and the execution unit are configured for each partition,
The coordinating server seeks a combination that minimizes the difference between groups in the total update load amount of their partitions, and organizes mutual replication groups, each including partitions that mutually replicate update logs,
The execution unit acquires the execution authority of the batch update from a lease server that manages lending of the execution authority in the mutual replication group, or by mutual agreement by all partitions of the mutual replication group,
In response to a failure of the partition that is the execution subject, the coordinating server assigns, as the new execution subject, the execution unit of the partition whose most recent load is lowest among the non-failed partitions in the mutual replication group, and notifies that execution unit,
The execution unit executes the batch update regardless of the sum in response to a failure of another partition belonging to the mutual replication group or in response to the elapse of an update interval,
In response to receiving acknowledgments of receipt of the replica from all partitions in the mutual replication group, the transmission unit returns processing to the application that issued the update request.
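Two of the coordinating server's duties recited above, organizing mutual replication groups with balanced load and choosing a replacement execution subject after a failure, can be sketched as follows. This is only an illustration under stated assumptions: the claim does not say how the minimizing combination is found, so a simple greedy heuristic (heaviest partition first, into the currently lightest group) is swapped in here, and it does not guarantee the true minimum or equal group sizes. The function names and the load dictionaries are hypothetical.

```python
def organize_groups(loads, group_count):
    """Greedily spread partitions over groups to balance total update load.

    loads: mapping of partition name to its notified update load amount.
    Returns (groups, totals), with one total per group.
    """
    groups = [[] for _ in range(group_count)]
    totals = [0] * group_count
    # Heaviest partitions first; ties broken by name for determinism.
    for name, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = totals.index(min(totals))
        groups[lightest].append(name)
        totals[lightest] += load
    return groups, totals


def choose_new_executor(group, latest_loads, failed):
    """Pick the non-failed partition with the lowest most recent load."""
    candidates = [p for p in group if p not in failed]
    if not candidates:
        raise ValueError("no surviving partition in the group")
    return min(candidates, key=lambda p: (latest_loads[p], p))
```

Usage example with made-up load figures:

```python
loads = {"p1": 8, "p2": 7, "p3": 3, "p4": 2}
groups, totals = organize_groups(loads, 2)
# groups == [["p1", "p4"], ["p2", "p3"]], totals == [10, 10]
new_executor = choose_new_executor(groups[1], loads, failed={"p2"})
# new_executor == "p3"
```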