JP2009169449A

JP2009169449A - High-reliability database system, synchronizing method thereof, intermediating method, intermediating device, and intermediating program

Info

Publication number: JP2009169449A
Application number: JP2007341727A
Authority: JP
Inventors: Takeshi Mishima; 健三島; Hiroshi Nakamura; 宏中村
Original assignee: Individual
Current assignee: Individual
Priority date: 2007-12-10
Filing date: 2007-12-10
Publication date: 2009-07-30

Abstract

PROBLEM TO BE SOLVED: To provide a synchronization maintaining method for a high-reliability database system that always maintains synchronization between database servers even when a plurality of transactions including updates are executed in parallel.

SOLUTION: An intermediating device 20 selects one of a plurality of database servers 10 as a leader and the others as followers in advance. On receiving a processing request from a client computer 50, it transmits the request only to the leader; on receiving the leader's response to that request, it transmits the request to the followers.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a highly reliable database system that operates a plurality of database servers in parallel, and more particularly to a technique for synchronizing the servers.

A first conventional highly reliable database system is described, for example, in Patent Document 1. As shown in FIG. 21, this system comprises a plurality of database servers (hereinafter “servers”) and an intermediary device that relays processing requests from client computers (hereinafter “clients”) to each server and returns one of the valid responses from the servers to the client as the processing result.

Executing multiple update queries can cause the servers to fall out of synchronization. This happens when updates to the same data item are not applied in the same order on every server. When the intermediary device detects a loss of synchronization, it temporarily disconnects the server that has fallen out of synchronization (hereinafter “out-of-synchronization server”) from the system and discards all of that server's data. The intermediary device creates backup data as of a certain point in time from a normally operating server (hereinafter “normal server”) and retains, as difference information, the update queries executed on the normal server after that point. The intermediary device restores the database of the out-of-synchronization server from the backup data and then executes the difference information in order. If every execution result of the difference information matches the result on the normal server, the out-of-synchronization server returns to the synchronized state. If any result differs, synchronization has been lost again, and the procedure starts over from the creation of backup data.

A second conventional highly reliable database system is described, for example, in Non-Patent Document 1. As shown in FIG. 22, this system consists of a plurality of database servers (hereinafter “servers”), client computers (hereinafter “clients”), a scheduler that passes queries from the clients to the servers and returns the responses to the clients, and a proxy that maintains synchronization by so-called two-phase locking (2PL).

To implement 2PL, the application run by a client must acquire locks with a LOCK command at the beginning of a transaction for every table it intends to access. At the end of the transaction, it must release the locks on the updated tables with an UNLOCK command. The LOCK and UNLOCK commands are addressed to the proxy and are not sent to the servers. When the scheduler receives a LOCK command, it assigns a unique number and sends that number to the proxy together with the command. The proxy allows only one client at a time to access a given table; that is, for each table it responds only to the LOCK command with the smallest number. When the client receives the response to its LOCK command, it understands that it holds the lock on the table and begins sending the queries to be executed on the servers. When the proxy receives an UNLOCK command, it releases the lock and grants it to the LOCK command with the next smallest number, that is, it returns a response to that LOCK command.
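The proxy's table-lock behavior described above can be illustrated with a minimal Python sketch. The class name `LockProxy` and its methods are hypothetical and chosen only for illustration; the cited system's actual interfaces are not given in this document. The sketch assumes one lock per table and that the scheduler has already assigned the numbers.

```python
import heapq
from collections import defaultdict


class LockProxy:
    """Hypothetical model: each table's lock goes to the smallest waiting number."""

    def __init__(self):
        self.holder = {}                   # table -> number currently holding the lock
        self.waiting = defaultdict(list)   # table -> min-heap of waiting numbers

    def lock(self, table, number):
        """Return True if the lock is granted at once; otherwise queue the request."""
        if table not in self.holder:
            self.holder[table] = number
            return True
        heapq.heappush(self.waiting[table], number)
        return False

    def unlock(self, table):
        """Release the lock and return the number granted next, or None."""
        del self.holder[table]
        if self.waiting[table]:
            granted = heapq.heappop(self.waiting[table])
            self.holder[table] = granted
            return granted
        return None
```

In this model a `False` return from `lock` corresponds to the proxy withholding its response, which is what keeps the second client waiting.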

Consequently, even if multiple clients try to run transactions that access the same table in parallel, the proxy grants access to only one transaction, so at most one update is executed at a time. Because multiple updates are never executed concurrently, their order cannot be interchanged, and all servers remain synchronized.

Patent Document 1: JP 2007-241325 A
Non-Patent Document 1: “Conflict-aware Scheduling for Dynamic Content Applications”, USENIX 2003

However, the first conventional technique is a method for detecting that synchronization among the database servers has been lost and restoring it (hereinafter “resynchronization processing”); it is not a method for maintaining synchronization, that is, for preventing its loss in the first place. This causes several problems: resynchronization processing takes time to complete; during that time the out-of-synchronization server cannot act as a backup for the normal server, so system reliability drops; while backup data is being created, the load on the normal server increases and service to clients is delayed, or service must be suspended until the backup is complete; resynchronization processing can fail, in which case it must be restarted from the beginning; it may fail on every attempt and repeat indefinitely; and because a failure cannot be distinguished from a loss of synchronization, useless resynchronization processing is triggered when a failure occurs (resynchronization does nothing to recover from a failure). Loss of synchronization between servers occurs on a write conflict, that is, when update queries issued by multiple clients attempt to update the same data item at the same time. A specific example follows.

Consider the case where transactions T1 and T2 shown in FIG. 24 are executed in parallel against the table named test_table shown in FIG. 23. T1 updates the row with id = 1, and T2 updates the rows with id = 1 and id = 2, so a write conflict occurs on id = 1. The final result depends on which transaction acquires the lock on id = 1 first.

There are two possible lock orders: T1 acquires the lock first, or T2 does. For example, when T1 and T2 run in parallel at the SERIALIZABLE transaction isolation level, if T1 acquires the lock first, the final result for id = 1 is dep = 1 (T2 fails serialization and its UPDATE has no effect); if T2 acquires the lock first, the result is dep = 2 (T1 fails serialization and its UPDATE has no effect).

Therefore, when T1 and T2 are executed in parallel on multiple servers, if T1 acquires the lock first on one server and T2 acquires it first on another, the two databases are no longer synchronized (that is, they no longer hold the same data), and the resynchronization processing with the problems described above is triggered.
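The divergence can be reproduced with a small Python sketch. The initial `dep` values and the value T2 writes to id = 2 are assumptions made for illustration (FIG. 23 and FIG. 24 are not reproduced here); only the outcomes for id = 1 are taken from the text. The function `run` is hypothetical and models only the rule that the transaction losing the lock race has no effect.

```python
def run(first, initial):
    """Apply only the transaction that locked id=1 first; the other one fails."""
    # Transaction -> {id: new dep}. T2's write to id=2 is an assumed value.
    updates = {"T1": {1: 1}, "T2": {1: 2, 2: 2}}
    table = dict(initial)
    table.update(updates[first])
    return table


initial = {1: 0, 2: 0}          # assumed starting contents of test_table
server_a = run("T1", initial)   # T1 wins the lock on this server
server_b = run("T2", initial)   # T2 wins the lock on this server
diverged = server_a != server_b
```

Both servers received the same two transactions, yet `server_a` and `server_b` end up with different contents, which is exactly the condition that triggers resynchronization in the first conventional technique.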

This conventional technique also has the problem that servers may return different responses even though synchronization has not been lost. Consider the case where transactions T3 and T4 shown in FIG. 25 are executed in parallel against test_table shown in FIG. 23. Depending on the relative timing of T3's COMMIT and T4's SELECT, the result of the SELECT changes.

Suppose that on one server T4's SELECT is executed before T3's COMMIT; the SELECT then returns dep = 3. If on another server T4's SELECT is executed after T3's COMMIT, it returns dep = 1. When the intermediary device detects failures by comparing results or by majority vote, it misjudges that an abnormality has occurred even though the system is in a normal state, with neither a loss of synchronization nor a failure.

Furthermore, the second conventional technique restricts access to a single transaction at a time in order to maintain synchronization among the database servers. It maintains synchronization by executing sequentially, one at a time, update queries that would cause a write conflict if run in parallel, thereby preventing write conflicts. However, whether a write conflict occurs cannot be known until the queries are actually executed on a server (for example, merely comparing T1 and T2 in FIG. 24 does not reveal whether they conflict), so update queries that do not conflict must also be executed sequentially, one at a time. Throughput therefore decreases. In addition, applications must contain the LOCK and UNLOCK commands, which are specific to this system, so existing applications cannot be reused. A specific example follows.

Consider the case where transactions T5 and T6 shown in FIG. 26 are executed in parallel against test_table of FIG. 23. T5 updates the row with id = 4 and T6 updates the row with id = 5, so they could in principle run in parallel, yet this conventional technique cannot do so, as shown below. First, so that the proxy can implement 2PL, T5 and T6 of FIG. 26 must be modified into T50 and T60 of FIG. 27, respectively.

Suppose the scheduler receives T50's LOCK command first of the two. The scheduler assigns it a unique number and forwards it to the proxy. Since no transaction holds the lock on test_table, the proxy grants the lock to T50 and replies to the client via the scheduler. T60's LOCK command, which arrives at the scheduler later, is assigned a different number and forwarded to the proxy. Because the lock on test_table is not free, the proxy holds this LOCK command. The client executing T60 receives no reply from the proxy and waits, unable to execute its next query.

Having received the reply, the client executing T50 sends UPDATE and SELECT in order and finally sends the UNLOCK command. The proxy receives the UNLOCK command, releases T50's lock, and passes the lock to the transaction with the next smallest number. Since T60 is the only transaction waiting, the proxy passes the lock to T60 and replies to its client.

Only now, having received the reply, can the client executing T60 send its UPDATE.

In other words, synchronization is maintained by executing update queries sequentially, one at a time, so that updates are never reordered. However, update queries such as those of T50 and T60, which do not conflict and could run in parallel, cannot be executed in parallel, so performance suffers. Moreover, because the UPDATE cannot be executed, the queries that follow it (the SELECT queries in the case of T50 and T60) cannot run in parallel either, which makes the performance loss even harder to ignore. Finally, although existing servers can be used without modification, client applications must be modified.

The present invention has been made in view of the above circumstances. Its object is to provide a synchronization maintaining method for a highly reliable database system that executes multiple update queries in parallel to prevent performance degradation while always maintaining synchronization among the database servers. A further object is to provide a synchronization method that brings an out-of-synchronization database server back into synchronization with a normally operating database server while minimizing the time during which service to clients is suspended. Because existing database servers and applications can be used without modification, the invention is low-cost and practical.

To achieve the above object, the present invention provides a highly reliable database system comprising a plurality of database servers, each having a function whereby, when multiple processing requests attempt to access the same data at the same time, only one request is executed and the others are made to wait, and an intermediary device that relays processing requests from client computers to one or more of the database servers and returns one of the valid responses from the database servers to the client computer as the processing result. The intermediary device includes a control unit that selects in advance one of the database servers as the leader and the rest as followers, transmits a processing request received from a client computer only to the leader, and, on receiving the leader's response to that request, transmits the request to the followers.

With such a system, multiple update queries can be executed in parallel to prevent performance degradation while synchronization among the database servers is always maintained.

The present invention also includes a difference information storage unit that stores processing requests from client computers as difference information. When synchronization is requested for a database server that is out of synchronization (hereinafter “out-of-synchronization database server”), the system starts creating backup data from a normally operating database server and stores the processing requests received from client computers in the difference information storage unit in the order in which the leader's responses are received. When creation of the backup data is complete, the out-of-synchronization database server is restored from it; when that restoration is complete, the processing requests stored in the difference information storage unit are sent to the out-of-synchronization database server in order.

With such a system, an out-of-synchronization database server can be synchronized with a normally operating database server while minimizing the time during which service to clients is suspended. Moreover, the processing requests stored in the difference information storage unit are never executed in an order different from the leader's, so synchronization does not fail for that reason.
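The resynchronization flow can be sketched in Python as follows. This is a simplified, single-threaded model under stated assumptions: `Server`, `resynchronize`, and the representation of a processing request as a `(key, value)` pair are all hypothetical, and the requests in `incoming` are assumed to be listed in the order in which the leader responded to them.

```python
class Server:
    """Hypothetical stand-in for a database server holding key/value rows."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})

    def execute(self, request):
        key, value = request
        self.rows[key] = value

    def backup(self):
        return dict(self.rows)

    def restore(self, data):
        self.rows = dict(data)


def resynchronize(normal, stale, incoming):
    """Restore `stale` from a backup of `normal`, then replay the difference log."""
    backup = normal.backup()        # backup data as of this point in time
    diff_log = []                   # difference information storage
    for request in incoming:        # requests arriving during backup and restore
        normal.execute(request)
        diff_log.append(request)    # stored in the leader's response order
    stale.restore(backup)
    for request in diff_log:        # replay in exactly the recorded order
        stale.execute(request)
```

Because the log is replayed in the recorded order, the restored server applies updates to each item in the same sequence as the normal server did.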

Furthermore, because existing database servers and applications can be used without modification, the present invention is low-cost and practical.

As described above, the present invention provides a synchronization method for a highly reliable database system that executes multiple update queries in parallel to prevent performance degradation while always maintaining synchronization among the database servers. It also provides a synchronization method that brings a database server that has fallen out of synchronization back into synchronization with a normally operating database server while minimizing the time during which service to clients is suspended. Because existing database servers and applications can be used without modification, the invention is low-cost and practical.

(First embodiment)
A highly reliable database system according to a first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram illustrating the overall configuration of the highly reliable database system according to this embodiment.

As shown in FIG. 1, this highly reliable database system connects a plurality of database servers (hereinafter “servers”) 10 and an intermediary device 20 via a network 30, and is accessed by one or more client computers (hereinafter “clients”) 50 via a network 40. In this embodiment, as shown in FIG. 1, the system has two servers 10a and 10b and is accessed by two clients 50m and 50n. In the following description, the suffixes “a” and “b” are added when one server 10 must be distinguished from another; likewise, “m” and “n” are added for the clients 50. Neither the servers nor the clients need to be modified for the present invention; existing ones can be used as they are.

One of the servers is selected as the leader and the rest are followers. Any server may be chosen as the leader. In this embodiment, 10a is the leader and 10b is the follower. The server 10 is an RDBMS that is operated through SQL. Various such servers exist, for example the open-source PostgreSQL (http://www.postgres.org/) and Oracle (registered trademark) by Oracle Corporation (http://www.oracle.com).

The servers used are ones that, on a write conflict (update queries issued by multiple clients attempting to update the same data item at the same time), allow only one update query to execute and block the others. FIG. 2 shows a specific example, in which clients 50m and 50n execute transactions T1 and T2 of FIG. 24, respectively, against the table test_table of FIG. 23. The UPDATEs of T1 and T2 conflict on id = 1.

Client 701n sends BEGIN (step S700), and server 700 executes it and returns a response (step S701). Client 701m also sends BEGIN (step S702), and server 700 executes it and returns a response (step S703). Clients 701n and 701m send their UPDATEs to server 700 almost simultaneously (steps S704 and S705). Because the two UPDATEs conflict, only one is executed and the other is blocked. In FIG. 2, transaction T2, run by client 701n, acquires the lock on id = 1 and the response to its UPDATE is returned to client 701n (step S706), whereas transaction T1 of client 701m cannot acquire the lock, is blocked, and receives no response to its UPDATE. The lock is released when the transaction holding it, T2, ends with COMMIT or ABORT. In FIG. 2, client 701n sends COMMIT to server 700 (step S707), and server 700 executes it, releasing the lock on id = 1. Server 700 therefore returns the response to the COMMIT to client 701n (step S708) and, in addition, executes the blocked UPDATE and returns its result to client 701m (step S709).
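The blocking behavior in FIG. 2 can be modeled with a short Python sketch. `RowLockServer` and its response strings are hypothetical, and the model deliberately ignores isolation levels: it only shows that the second UPDATE receives no response until the lock holder commits.

```python
from collections import deque


class RowLockServer:
    """Hypothetical model of a server that blocks conflicting updates on a row."""

    def __init__(self):
        self.owner = {}     # row id -> transaction holding the row lock
        self.blocked = {}   # row id -> queue of (transaction, value) waiting
        self.rows = {}

    def update(self, txn, row, value):
        """Return a response, or None when the update is blocked."""
        holder = self.owner.setdefault(row, txn)
        if holder != txn:
            self.blocked.setdefault(row, deque()).append((txn, value))
            return None
        self.rows[row] = value
        return "UPDATE 1"

    def commit(self, txn):
        """Release txn's locks; return all (transaction, response) pairs produced."""
        responses = [(txn, "COMMIT")]
        for row in [r for r, t in self.owner.items() if t == txn]:
            del self.owner[row]
            queue = self.blocked.get(row)
            if queue:
                waiter, value = queue.popleft()
                self.owner[row] = waiter      # the waiter now holds the lock
                self.rows[row] = value        # its blocked UPDATE executes
                responses.append((waiter, "UPDATE 1"))
        return responses
```

The single `commit` call yields two responses, mirroring steps S708 and S709, where the COMMIT response and the previously blocked UPDATE response are both sent.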

The servers implement several transaction isolation levels. Of these, SERIALIZABLE is the strictest with respect to data consistency. SERIALIZABLE is a kind of snapshot isolation.

A transaction running under snapshot isolation takes a snapshot, a copy of the entire database, at the moment it starts. The snapshot includes the changes of other transactions that have already committed but not those of uncommitted transactions. Every query in the transaction is executed against this snapshot. When the transaction commits, its changes to the snapshot are applied to the database itself. If a write conflict occurs, only one of the conflicting transactions succeeds and can commit. FIG. 3 shows a specific example.

In FIG. 3, time advances toward the right. There are four transactions: T10, T11, T12, and T13. B, S, C, and A denote execution of BEGIN, acquisition of the snapshot, execution of COMMIT, and execution of ABORT, respectively. “id = 1” indicates the item that is updated. The snapshot acquired by T11 does not include T10's update, because T10 had not committed when the snapshot was taken. T10 and T11 both update id = 1, so they conflict. When T10 commits, the other conflicting transactions fail, so T11 aborts. The snapshot of T12 includes neither T10's update nor T11's. T12 updates a different item, so it does not conflict and can commit. The snapshot of T13 includes the updates of both T10 and T12. T13 updates id = 1 but does not conflict, because it does not run in parallel with T10 or T11, so it can commit.
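The scenario of FIG. 3 can be traced with a minimal Python model of snapshot isolation. `SnapshotDB` is a hypothetical class; it detects conflicts at commit time with per-row version counters, which is a simplification (a real server such as PostgreSQL blocks on row locks first). The item updated by T12 (id = 2) and all written values are assumptions.

```python
class SnapshotDB:
    """Hypothetical model: the first committer among conflicting writers wins."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.version = {key: 0 for key in rows}   # commits applied per row

    def begin(self):
        # The snapshot holds only data committed so far.
        return {"snapshot": dict(self.rows),
                "seen": dict(self.version),
                "writes": {}}

    def update(self, txn, key, value):
        txn["snapshot"][key] = value
        txn["writes"][key] = value

    def commit(self, txn):
        """Return True on COMMIT, False on ABORT caused by a write conflict."""
        for key in txn["writes"]:
            if self.version.get(key, 0) != txn["seen"].get(key, 0):
                return False   # a concurrent transaction committed this row
        for key, value in txn["writes"].items():
            self.rows[key] = value
            self.version[key] = self.version.get(key, 0) + 1
        return True


db = SnapshotDB({1: 0, 2: 0})
t10, t11, t12 = db.begin(), db.begin(), db.begin()
db.update(t10, 1, 10)
db.update(t11, 1, 11)
db.update(t12, 2, 12)
results = [db.commit(t10), db.commit(t11), db.commit(t12)]
t13 = db.begin()               # starts after T10 and T12 have committed
db.update(t13, 1, 13)
results.append(db.commit(t13))
```

Here `results` records the outcome of T10, T11, T12, and T13 in that order, and T13's snapshot reflects the committed updates of T10 and T12 before its own write.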

This embodiment is described concretely using PostgreSQL running at the SERIALIZABLE isolation level.

The address that the intermediary device 20 exposes on the network 40 side serves as a virtual database server for the clients 50. In other words, a client simply treats the intermediary device 20 as the database server itself. Regardless of how many servers 10 are running, whether any server is down due to a failure, or whether a server has recovered from a failure and been reintegrated into the system, the client 50 can communicate with the intermediary device 20 without being aware of the system configuration and without changing any settings.

The intermediary device 20 sends queries received from clients to the servers 10 according to the synchronization maintaining algorithm described below and returns one of the normal responses to the client.

The client 50 sends queries to the database system.

Next, the synchronization maintaining algorithm is described. The algorithm assumes a database system with the following two properties.
(a) A client sends a new query only after receiving the response to the query it has already sent.
(b) On a write conflict, only the update query of the transaction that acquired the lock can execute; the update query of a transaction that could not acquire the lock is blocked until it acquires the lock.

Here, update queries include not only queries such as UPDATE and DELETE that require a row lock to update a data item, but also queries that affect the behavior of those queries. For example, SELECT FOR UPDATE does not itself update a data item, but it requires a row lock and contends for that lock when executed concurrently with UPDATE or DELETE, so it is treated as an update query.
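One way to picture this classification is the following Python sketch. The function `is_update_query` is hypothetical and uses a plain text heuristic; it covers only the statement types named above and would misclassify, for example, a string literal that happens to contain “FOR UPDATE”. A real intermediary would more likely rely on a proper SQL parser.

```python
import re


def is_update_query(sql):
    """Heuristically decide whether a query needs a row lock (an "update query")."""
    text = " ".join(sql.strip().upper().split())   # normalize case and whitespace
    if text.startswith(("UPDATE ", "DELETE ")):
        return True
    # SELECT ... FOR UPDATE takes row locks even though it changes no data.
    return text.startswith("SELECT ") and re.search(r"\bFOR UPDATE\b", text) is not None
```

Queries for which this returns `False`, such as a plain SELECT, fall into the category that does not take part in row-lock contention.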

To keep all servers synchronized, updates to the same data item must be guaranteed to be applied in the same order on every server. This ordering must be guaranteed not only between update queries belonging to different transactions but also between update queries within the same transaction. In a database system satisfying the two conditions above, ordering is guaranteed and synchronization is maintained by controlling the queries exchanged between servers and clients according to the following three rules.
(1) The intermediary device 20 sends every update query received from a client 50 only to the leader. The order in which queries are sent to the leader is arbitrary.
(2) When the intermediary device 20 receives the leader's response to an update query, it sends that update query to all followers.
(3) When the intermediary device 20 has received responses from all database servers, it returns the response to the client.

(A) and (3) together guarantee a unique order among update queries belonging to the same transaction, and (B), (1), and (2) together guarantee a unique order among update queries belonging to different transactions. Queries other than update queries are sent to all servers, without distinguishing between leader and followers.

As shown in FIG. 4, the intermediary device 20 includes a connection processing unit 21, which accepts a connection request from a client 50, establishes the connection with the client, and passes the connection information to an idle control unit 22, and control units 22, each of which connects to all servers 10 and then controls the sending and receiving of queries and responses according to the algorithm above. In this embodiment there are three control units 22x, 22y, and 22z. In the following description, the suffixes "x", "y", and "z" are appended when a particular control unit 22 must be distinguished from the others.

Next, the operation of the control unit 22 is described with reference to the flowcharts of FIGS. 5 to 9. When the control unit 22 receives a query from the client 50 (step S1), it checks whether a snapshot has already been created (step S2).

If no snapshot has been created (step S2), the control unit 22 checks whether the query is one that creates a snapshot (step S3). The query that creates a snapshot is the first query in the transaction that accesses a data item. For transaction T1 in FIG. 24, for example, it is the SELECT, not the BEGIN. If step S3 finds that the query creates a snapshot, the snapshot creation routine is executed (step S4) and the control unit 22 returns to receiving queries from the client 50 (step S1). Otherwise, the normal routine is executed (step S5) and the control unit 22 likewise returns to step S1.

If step S2 finds that a snapshot has already been created, the control unit 22 checks whether the query is an update query (step S6). If so, it executes the synchronization maintenance routine (step S7) and returns to receiving queries from the client 50 (step S1). If the query is not an update query, the control unit 22 checks whether it is a commit query (step S8); if so, it executes the commit processing routine (step S9) and returns to step S1. If the query is not a commit query either, it executes the normal routine (step S5) and returns to step S1.
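The branching of FIG. 5 described above can be summarized as a pure decision function. This is a minimal sketch: the function name, the boolean parameters, and the returned routine labels are all hypothetical, and how the flags are derived from a query is left outside the sketch.

```python
def choose_routine(snapshot_created, creates_snapshot, is_update, is_commit):
    """Pick the routine for one incoming query, following steps S2 to S9 of FIG. 5."""
    if not snapshot_created:                       # S2: no snapshot yet
        if creates_snapshot:                       # S3: first query touching a data item
            return "snapshot_creation"             # S4
        return "normal"                            # S5 (for example, BEGIN)
    if is_update:                                  # S6
        return "sync_maintenance"                  # S7
    if is_commit:                                  # S8
        return "commit"                            # S9
    return "normal"                                # S5
```

Note that an update query arriving before the snapshot exists is routed to the snapshot creation routine, which itself decides between the normal and synchronization maintenance routines (step S23).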

The normal routine is described with reference to the flowchart of FIG. 6. The query is sent to all servers 10 (step S10), and responses are received from all servers 10 (step S11). The valid response is determined by majority vote (step S12), and one of the valid responses is sent to the client 50 (step S13).
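A minimal sketch of the normal routine follows. It assumes that each server can be modeled as a callable that takes a query and returns a hashable response, that "majority" means a strict majority of the responses, and that `None` signals the absence of one; these modeling choices and all names are assumptions for illustration. The servers are queried one after another here for simplicity, whereas an actual device would presumably send to all servers before collecting responses.

```python
from collections import Counter


def majority_response(responses):
    """Return the response shared by a strict majority, or None if there is none."""
    if not responses:
        return None
    value, count = Counter(responses).most_common(1)[0]
    return value if count > len(responses) / 2 else None


def normal_routine(servers, query):
    """Send the query to every server (S10, S11) and vote on the answers (S12, S13)."""
    responses = [server(query) for server in servers]
    return majority_response(responses)
```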

The snapshot creation routine is described with reference to the flowchart of FIG. 7. Snapshot creation and commit execution must be mutually exclusive. A commit counter and a snapshot creation counter are used for this purpose. A commit counter of 1 or more means that some transaction is currently committing. Likewise, a snapshot creation counter of 1 or more means that some transaction is currently creating a snapshot. Controlling the counters so that they are never both 1 or more guarantees that a snapshot-creating transaction and a committing transaction never coexist. A common lock (for example, pthread_lock) is used when reading and writing these counters.

If the commit counter is not 0 (step S20), a transaction is currently committing, so the control unit 22 sleeps (step S21). When woken up, it checks again whether the commit counter is 0 (step S20). A commit counter of 0 means that no transaction is committing, so the control unit 22 increments the snapshot creation counter by 1 to indicate that a snapshot is being created (step S22) and proceeds to create the snapshot. A snapshot is created by executing an effective query, but when that query is an update query, synchronization must be maintained at the same time. The control unit 22 therefore checks whether the query is an update query (step S23); if not, it executes the normal routine (step S5), and if so, it executes the synchronization maintenance routine (step S7). Snapshot creation is complete after either routine, so the snapshot creation counter is decremented by 1 (step S24). If the snapshot creation counter is now 0 (step S25), the control units 22 waiting for the snapshot creation counter to become 0 are woken up (step S26).

The synchronization maintenance routine is described with reference to the flowchart of FIG. 8. The query is sent only to the leader among the servers 10 (step S30). When the response is received from the leader (step S31), the query is sent to all followers (step S32). When responses have been received from all followers (step S33), the correct response is determined by majority vote (step S34), and one of the correct responses is returned to the client 50 (step S35).
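The leader-first ordering of FIG. 8 can be sketched as follows. As before, servers are modeled as callables returning hashable responses, which is an assumption of this sketch, as is the choice to include the leader's response in the vote and to return `None` when no strict majority exists. The small usage part records the call order to show that the leader is always contacted first.

```python
from collections import Counter


def sync_maintenance_routine(leader, followers, query):
    """Leader first (S30, S31), then followers (S32, S33), then vote (S34, S35)."""
    responses = [leader(query)]                 # returns only after the leader has answered
    responses += [follower(query) for follower in followers]
    value, count = Counter(responses).most_common(1)[0]
    return value if count > len(responses) / 2 else None


# Usage: stand-in servers that log the order in which they are called.
calls = []


def make_server(name):
    def execute(query):
        calls.append(name)
        return "UPDATE 1"
    return execute


result = sync_maintenance_routine(
    make_server("leader"),
    [make_server("follower1"), make_server("follower2")],
    "UPDATE t SET v = 1 WHERE i = 1",
)
```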

The commit processing routine is described with reference to the flowchart of FIG. 9. As noted above, snapshot creation and commit execution are mutually exclusive.

If the snapshot creation counter is not 0 (step S40), a transaction is currently creating a snapshot, so the control unit 22 sleeps (step S41). When woken up, it checks again whether the snapshot creation counter is 0 (step S40). A snapshot creation counter of 0 means that no transaction is creating a snapshot, so the control unit 22 increments the commit counter by 1 to indicate that a commit is in progress (step S42) and executes the commit using the normal routine. When the commit has finished, the commit counter is decremented by 1 (step S43). If the commit counter is now 0 (step S44), the control units 22 waiting for the commit counter to become 0 are woken up (step S45).
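The two counters and their common lock can be sketched with a condition variable. This is an illustration only: the class and method names are hypothetical, and Python's `threading.Condition` is used here in place of whatever lock and sleep/wakeup mechanism an actual implementation would use. Several snapshot creations may overlap with each other, and several commits may overlap with each other, but the two kinds never overlap.

```python
import threading


class SnapshotCommitGate:
    """Keeps snapshot creation and commit execution mutually exclusive (FIGS. 7 and 9)."""

    def __init__(self):
        self._cond = threading.Condition()    # common lock guarding both counters
        self.commit_count = 0
        self.snapshot_count = 0

    def begin_snapshot(self):
        with self._cond:
            while self.commit_count != 0:     # S20: a commit is in progress
                self._cond.wait()             # S21: sleep until woken
            self.snapshot_count += 1          # S22

    def end_snapshot(self):
        with self._cond:
            self.snapshot_count -= 1          # S24
            if self.snapshot_count == 0:      # S25
                self._cond.notify_all()       # S26: wake waiting committers

    def begin_commit(self):
        with self._cond:
            while self.snapshot_count != 0:   # S40: a snapshot is being created
                self._cond.wait()             # S41
            self.commit_count += 1            # S42

    def end_commit(self):
        with self._cond:
            self.commit_count -= 1            # S43
            if self.commit_count == 0:        # S44
                self._cond.notify_all()       # S45: wake waiting snapshot creators
```

A control unit would call `begin_snapshot()` before running the first effective query of a transaction and `end_snapshot()` afterwards, and wrap the COMMIT in `begin_commit()` and `end_commit()` in the same way.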

Next, with reference to the sequence diagram of FIG. 10, it is shown that synchronization is maintained even when transactions containing write-conflicting update queries are executed; that is, the execution order of write-conflicting update queries is guaranteed to be the same on the leader and on the followers. The figure shows the clients 50m and 50n executing transactions T1 and T2 of FIG. 24, respectively. Assume that the connection processing unit 21 of the intermediary device 20 has assigned the control units 22x and 22y to the clients 50m and 50n, respectively.

The client 50m sends a BEGIN query, and the control unit 22x receives it (step S50). The control unit 22x sends the query to all servers 10 (steps S51, S52), receives responses from all servers 10 (steps S53, S54), and returns one of the valid responses to the client 50m (step S55). The same applies to the client 50n (steps S56 to S61).

Next, the clients 50m and 50n send write-conflicting UPDATE queries at almost the same time (steps S62, S63). The control units 22x and 22y that receive them send the queries only to the leader (steps S64, S65). The order in which the leader executes queries received almost simultaneously is not necessarily the order in which the control units 22 sent them: the order may change on the communication path between the control units 22 and the server 10, or in the scheduler of the server 10. In this example, transaction T1 of the client 50m acquires the lock first and its update query is executed, so the control unit 22x receives a response (step S66). Transaction T2 of the client 50n, which could not acquire the lock, blocks until the lock becomes available; that is, its update query is held by the leader without being executed.

Having received the response, the control unit 22x then sends the update query to the follower (step S67). When it receives the response from the follower (step S68), the control unit 22x returns one of the valid responses to the client 50m (step S69).

The client 50m sends a commit query, and the control unit 22x receives it (step S70). The control unit 22x sends the query to all servers 10 (steps S71, S72), receives responses from all servers 10 (steps S73, S74), and returns one of the valid responses to the client 50m (step S75).

When the commit of transaction T1 is executed, the write lock on i = 1 held by T1 is released. Transaction T2, which was waiting for the release, acquires the lock, its update query is executed on the leader, and the result is returned to the control unit 22y (step S76). The control unit 22y also forwards the query to the follower (step S77) and receives the response (step S78). The control unit 22y returns UPDATE failure to the client 50n (step S79).

As described above, forwarding an update query to the followers only after the leader's response to it has been received makes the execution order of write-conflicting update queries identical on every server 10, so synchronization is maintained.
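The ordering argument can be illustrated with a toy simulation. Everything here is a simplifying assumption: `MockServer` is a hypothetical stand-in with a single row lock held from UPDATE until COMMIT, each transaction issues exactly one conflicting update, and the snapshot-isolation behavior of FIG. 10 (where the later update fails) is not modeled. The sketch only shows that whichever transaction wins the lock on the leader also reaches the follower first.

```python
import threading


class MockServer:
    """Toy server: one row lock, held from UPDATE until COMMIT."""

    def __init__(self):
        self._row_lock = threading.Lock()
        self.applied = []                  # order in which updates obtained the lock

    def update(self, tx):
        self._row_lock.acquire()           # blocks while another transaction holds the row
        self.applied.append(tx)

    def commit(self, tx):
        self._row_lock.release()


def run_transaction(tx, leader, follower):
    leader.update(tx)                      # rule (1): update goes to the leader only
    follower.update(tx)                    # rule (2): forwarded once the leader has answered
    leader.commit(tx)
    follower.commit(tx)


leader, follower = MockServer(), MockServer()
threads = [
    threading.Thread(target=run_transaction, args=(tx, leader, follower))
    for tx in ("T1", "T2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After both threads finish, `leader.applied` and `follower.applied` hold the two transactions in the same order, whichever thread happened to win the race on the leader.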

Next, with reference to the sequence diagrams of FIGS. 11 and 12, it is shown that the execution order of SELECT and COMMIT does not differ from server to server, so the result of the SELECT is the same on every server. The figures show the clients 50m and 50n executing transactions T3 and T4 of FIG. 25, respectively. Assume that the connection processing unit 21 of the intermediary device 20 has assigned the control units 22x and 22y to the clients 50m and 50n, respectively.

The client 50m sends a BEGIN query; the control unit 22x receives it, forwards it to all servers, receives the responses, and returns a response to the client 50m (steps S200 to S205). The UPDATE is likewise executed on the leader and the follower, and a response is returned to the client 50m (steps S206 to S211). BEGIN is executed in the same way for the client 50n (steps S212 to S217).

The client 50m sends COMMIT and the control unit 22x receives it (step S218); at almost the same time, the client 50n sends SELECT and the control unit 22y receives it (step S219). The control units 22x and 22y begin executing the commit processing routine of FIG. 9 and the snapshot creation routine of FIG. 7, respectively. However, the check of the snapshot creation counter (step S40) and the update of the commit counter (step S42) in FIG. 9 are mutually exclusive with the check of the commit counter (step S20) and the update of the snapshot creation counter (step S22) in FIG. 7, so they never execute simultaneously.

In FIG. 11, assume that the control unit 22x enters this critical region first. That is, since the snapshot creation counter is 0 (step S40), the control unit 22x changes the commit counter to 1 (step S42) and sends COMMIT to all servers 10 (steps S220, S221). When the control unit 22y checks the commit counter (step S20), it is not 0, so the control unit 22y sleeps (step S21).

When the control unit 22x receives COMMIT success (steps S222, S223), it changes the commit counter to 0 (step S43). Since the commit counter is 0 (step S44), it wakes up the control unit 22y (step S45) and sends COMMIT success to the client 50m (step S224).

The control unit 22y checks the commit counter again (step S20); since it is now 0, it changes the snapshot creation counter to 1 and sends SELECT to all servers (steps S225, S226). It receives dep = 1, which reflects the change made by T3, from all servers (steps S227, S228) and returns dep = 1 to the client 50n (step S229).

FIG. 12 shows the case in which the control unit 22y enters the critical region first. Steps S200 to S219 are the same as in FIG. 11, so their description is omitted. Since the commit counter is 0 (step S20), the control unit 22y changes the snapshot creation counter to 1 (step S22) and sends SELECT to all servers 10 (steps S250, S251). When the control unit 22x checks the snapshot creation counter (step S40), it is not 0, so the control unit 22x sleeps (step S41).

Because the COMMIT of transaction T3 has not yet been executed on the server 10a, the value of dep is still the old one. The control unit 22y therefore receives dep = 3 from the server 10a (step S252). It likewise receives dep = 3 from the server 10b (step S253). After sending dep = 3 to the client 50n (step S254), the control unit 22y sets the snapshot creation counter to 0 (step S24). Since the snapshot creation counter is 0 (step S25), it wakes up the control unit 22x (step S26).

The control unit 22x checks the snapshot creation counter again (step S40); since it is now 0, it changes the commit counter to 1 and sends COMMIT to all servers (steps S255, S256). It receives COMMIT success from all servers (steps S257, S258) and returns COMMIT success to the client 50m (step S259).

Next, with reference to the sequence diagram of FIG. 13, it is shown that when transactions containing update queries that do not write-conflict are executed, the update queries can run in parallel while synchronization is maintained. The figure shows the clients 50m and 50n executing transactions T5 and T6 of FIG. 26, respectively. Assume that the connection processing unit 21 of the intermediary device 20 has assigned the control units 22x and 22y to the clients 50m and 50n, respectively.

The client 50m sends a BEGIN query; the control unit 22x receives it, forwards it to all servers, receives the responses, and returns a response to the client 50m (steps S100 to S105). The same applies to the client 50n (steps S106 to S111).

Next, the clients 50m and 50n send UPDATE queries that do not write-conflict at almost the same time (steps S112, S113). The control units 22x and 22y that receive them send the queries only to the leader (steps S114, S115). Because the update queries target different rows, both control units 22x and 22y receive responses (steps S116, S117). The control units 22x and 22y then send the queries to the follower as well, receive its responses (steps S118 to S121), and return responses to the clients 50 (steps S122, S123).

As described above, forwarding an update query to the followers only after the leader's response to it has been received makes the execution order of write-conflicting update queries identical on every server 10, so synchronization is maintained. In addition, update queries that do not write-conflict can be executed in parallel, so performance does not degrade. Furthermore, the SELECT results returned by the servers agree, so a mismatch is never mistaken for a failure. Finally, existing servers and clients can be used without modification, which makes the approach low-cost and practical.

(Second Embodiment)
A highly reliable database system according to the second embodiment of the present invention is described. This embodiment differs from the first embodiment in that a server that has fallen out of synchronization is brought back into a synchronized state (hereinafter referred to as "synchronization processing"). A server may be out of synchronization because, for example, it stopped operating due to a failure, or because a server 10 is newly added. The remaining configuration and operation are the same as in the first embodiment, so only the differences are described here.

Here, the case is described in which, of the three servers 10a to 10c, the server 10c is out of synchronization and is to be synchronized with the servers 10a and 10b.

The intermediary device 20 of FIG. 4 is changed to the intermediary device 23 of FIG. 14. Like the intermediary device 20, the intermediary device 23 has a connection processing unit 21 and control units 22. In addition, it includes a backup data creation unit 24, which creates backup data of a normal server (here, the server 10a or the server 10b) as of a certain point in time; a buffer 25, which holds the queries executed by the leader after that point in time; and a management unit 26, which manages them.

On instruction from the management unit 26, the backup data creation unit 24 creates a backup of the data of a normal server as of a certain point in time. On instruction from the management unit 26, the buffer 25 stores queries in the order in which their responses were received from the leader (which equals the order in which they are sent to the followers). When the management unit 26 receives an instruction to start synchronization processing, it issues the appropriate instructions to each unit. In the initial state the buffer 25 is empty.

As shown in FIG. 15, the client 50m starts a transaction (steps S300 to S305). When the management unit 26 then receives a synchronization instruction, it instructs the connection processing unit 21 to hold new connections from clients 50 so that no new transaction is started (step S306). The synchronization instruction reaches the management unit 26 when, for example, the system administrator enters it from a keyboard connected to the intermediary device or from a management terminal (not shown) over the network. The management unit 26 waits for the transactions that have already started to finish. Specifically, it waits until the client 50m has executed UPDATE (steps S307 to S312) and COMMIT (steps S313 to S318).

Now that no transaction is running on the servers 10, the management unit 26 instructs the backup data creation unit 24 to back up a normal server (step S319). On receiving the instruction, the backup data creation unit 24 selects one of the normal servers 10 and creates backup data from it. In this example, backup data of the server 10b is created. The backup data can be created with pg_dump in the case of PostgreSQL, for example. This tool has the property that updates executed while the backup data is being created are not reflected in the backup data.

Having confirmed that acquisition of the backup data has started, the management unit 26 instructs the connection processing unit 21 to resume accepting new connections from clients 50. The connection processing unit 21 processes the connection request from the client 50n, and the control unit 22 receives BEGIN (step S320). The control unit 22 sends BEGIN to the servers 10 (steps S321, S322) and receives responses from the servers 10 (steps S323, S324). Upon receiving the responses, the control unit 22 stores the BEGIN query in the buffer 25 (step S325) and returns the response to the client 50n (step S326). The query is stored in the buffer 25 at the moment the response from the leader is received.

Next, as shown in FIG. 16, the client 50n sends SELECT (step S327). The control unit 22 forwards it to the servers (steps S328, S329), receives the responses from the servers 10 (steps S330, S331), stores the SELECT in the buffer 25 (step S332), and returns the response to the client 50n (step S333).

At this point the backup data creation unit 24 notifies the management unit 26 that acquisition of the backup data is complete (step S334), and the management unit 26 instructs the server 10c to build its database from that backup data (step S335). The server 10c starts building the database using psql or a similar tool (step S336).

The client 50n sends UPDATE (step S337). The control unit 22 sends the UPDATE to the server 10a, which is the leader (step S338), and receives the response from the server 10a (step S339). At this point the control unit 22 stores the UPDATE in the buffer 25 (step S340). Queries are accumulated in the buffer in the order in which their responses are received from the leader so that the server 10c can later execute them as a follower and thereby synchronize with the leader.

The control unit 22 sends the UPDATE to the server 10b, which is a follower (step S341), receives the response from the server 10b (step S342), and returns the response to the client 50n (step S343).

At this point the database build on the server 10c completes (step S344), and the management unit 26 is informed of it (step S345). However, the server 10c has only been restored to the state of the databases on the servers 10a and 10b as of step S318; it is not synchronized with their current state. To synchronize it, the queries from step S320 onward must be executed. Those queries are stored in the buffer 25.

The control unit 22 executes the queries in order from the head of the buffer. The queries are stored in the order in which a follower should execute them, so sending them to the server 10c and executing them in this order always brings it into synchronization with the servers 10a and 10b. The management unit 26 instructs the control unit 22 to execute the queries in the buffer (step S346). The control unit 22 executes the queries in order from the head of the buffer 25 (steps S347 to S352). Once the buffer 25 is empty, the server 10c is synchronized with the servers 10a and 10b and is added to the system.
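The buffering and replay described above can be sketched as a simple first-in, first-out queue. The class name `QueryBuffer`, its methods, and the idea of passing the recovering server as a callable are assumptions made for illustration; the sketch is single-threaded and does not model queries that arrive from clients while the replay is running.

```python
from collections import deque


class QueryBuffer:
    """FIFO of queries in the order their responses arrived from the leader."""

    def __init__(self):
        self._queries = deque()

    def record(self, query):
        """Store a query at the moment the leader's response to it is received."""
        self._queries.append(query)

    def replay(self, execute):
        """Run buffered queries head-first on the recovering server until empty."""
        replayed = 0
        while self._queries:
            execute(self._queries.popleft())
            replayed += 1
        return replayed

    def __len__(self):
        return len(self._queries)


# Usage: replay the queries of FIGS. 15 and 16 onto a stand-in for server 10c.
buffer_25 = QueryBuffer()
for q in ("BEGIN", "SELECT ...", "UPDATE ..."):
    buffer_25.record(q)

executed_on_10c = []
replayed_count = buffer_25.replay(executed_on_10c.append)
```

When `replay` returns, the buffer is empty, which corresponds to the point at which the server 10c is considered synchronized and is added to the system.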

As described above, by applying the first embodiment, a server that has fallen out of synchronization can be synchronized with the leader while keeping service downtime to a minimum. That is, a server 10 restored to a certain point in time from backup data is synchronized with the leader by executing on it the queries issued after that point. Because those queries reproduce what a follower executes, the server is synchronized with the leader without the synchronization failing. The other effects are the same as in the first embodiment.

In this embodiment all queries are stored in the buffer 25, but it is also possible to store only the queries that change the state of the database. For example, a simple read of data does not change the database state, so it need not be stored in the buffer 25. However, even a simple read cannot be omitted if it is the first effective query of a transaction and therefore triggers snapshot creation. This reduces the space consumed in the buffer 25.

In this embodiment, after the synchronization instruction is received, creation of the backup data starts only once a state with no transaction in progress has been reached. However, if queries are already being saved before the synchronization instruction is received, creation of the backup data can start immediately after the instruction is received.

For example, in FIG. 15, if the BEGIN query (step S300) has already been saved in the buffer, the backup data can be created at the moment the synchronization instruction is received (step S306). This is because the backup data does not contain the uncommitted transaction that starts at step S300, so it is sufficient that the queries of that transaction are saved in the buffer.

One way of saving queries in advance is to save every query in the buffer. However, this steadily exhausts the buffer. If the queries of a transaction are deleted from the buffer once the transaction has been committed and reflected in the database, the buffer is not exhausted.
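The pruning idea can be sketched as follows, assuming each buffered query is tagged with an identifier of its transaction. The class `PreSyncBuffer` and its method names are hypothetical; the specification does not prescribe a data structure.

```python
class PreSyncBuffer:
    """Keeps queries of transactions that are not yet committed (illustrative only)."""

    def __init__(self):
        self._entries = []  # (txn_id, query) pairs in arrival order

    def record(self, txn_id, query):
        self._entries.append((txn_id, query))

    def on_committed(self, txn_id):
        # Once the transaction is reflected in the database, a later backup
        # already contains its effects, so its queries can be dropped.
        self._entries = [(t, q) for (t, q) in self._entries if t != txn_id]

    def pending(self):
        return list(self._entries)


buf = PreSyncBuffer()
buf.record("T1", "BEGIN")
buf.record("T2", "BEGIN")
buf.record("T1", "UPDATE t SET v = 1 WHERE id = 1")
buf.record("T1", "COMMIT")
buf.on_committed("T1")
remaining = buf.pending()
```

After the commit of `T1`, only the still-open transaction `T2` occupies buffer space.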

With these measures, service downtime can be eliminated entirely.

(Third embodiment)
A highly reliable database system according to the third embodiment of the present invention will now be described. This embodiment differs from the first embodiment in the timing at which queries are transmitted. The other configurations and operations are the same as in the first embodiment, so only the differences are described here.

In the first embodiment, condition (3) was necessary because a majority vote is taken. However, condition (3) means that the performance of the whole system is held back by the slowest server (no response is returned to the client until the slowest response has arrived), which is a drawback when performance matters. This drawback can be removed by dropping the majority vote, treating the response from the leader as always correct, and returning a response to the client as soon as the leader responds.
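A rough way to see the effect on a single update query is sketched below. It is a simplified model under stated assumptions: the leader is served first and the followers afterwards, network and intermediary overhead are ignored, and the latency figures are hypothetical rather than measured.

```python
def first_embodiment_response_ms(latency_ms, leader):
    """Client waits for the leader and then for the slowest follower."""
    followers = [ms for name, ms in latency_ms.items() if name != leader]
    return latency_ms[leader] + max(followers, default=0)


def third_embodiment_response_ms(latency_ms, leader):
    """Client is answered as soon as the leader has responded."""
    return latency_ms[leader]


# Hypothetical per-server execution times for one update query, in milliseconds.
latency_ms = {"10a": 5, "10b": 8, "10c": 40}
with_vote = first_embodiment_response_ms(latency_ms, leader="10a")
leader_only = third_embodiment_response_ms(latency_ms, leader="10a")
```

In this model the slow server 10c dominates the response time only when the client has to wait for every server.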

However, in the first embodiment, conditions (a) and (3) guaranteed the order of update queries that update the same data within the same transaction. With the above improvement, the uniqueness of that order must be guaranteed by another means. Therefore (a) and (3) are deleted, and (1) and (2) are replaced by the following (I) and (II), respectively.

(I) When the intermediary device 20 receives a new update query from the client 50, if there is a server 10 that has not yet returned a response to another, older update query already sent to the servers 10, and the two update queries belong to the same transaction, the intermediary device 20 holds the new update query. Once that response has been received, it sends the new update query to the leader only.
(II) When the intermediary device 20 receives the response to the update query from the leader, it sends the update query to all followers and at the same time returns the response to the client 50.

Note that the withholding of transmission to the leader in (I) applies only between update queries within the same transaction. The client 50 may execute several transactions; queries in different transactions are not withheld under (I), and the transactions are executed in parallel to achieve high performance.
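Rules (I) and (II) can be sketched as a small event-driven model. This is an illustration under assumptions, not the specified implementation: responses are delivered by calling `on_response`, one update query per transaction is in flight at a time, and the class and attribute names are hypothetical.

```python
from collections import defaultdict, deque


class Intermediary:
    """Toy model of rules (I) and (II); all names here are illustrative."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = list(followers)
        self.sent = []                    # (server, query) in send order
        self.to_client = []               # queries whose response went to the client
        self.awaiting = defaultdict(set)  # txn -> servers that still owe a response
        self.held = defaultdict(deque)    # txn -> update queries held back
        self.in_flight = {}               # txn -> query currently being processed

    def on_update_query(self, txn, query):
        # Rule (I): hold the new query while an older query of the same
        # transaction is still unanswered by some server.
        if self.awaiting[txn]:
            self.held[txn].append(query)
        else:
            self._send_to_leader(txn, query)

    def _send_to_leader(self, txn, query):
        self.in_flight[txn] = query
        self.awaiting[txn].add(self.leader)
        self.sent.append((self.leader, query))

    def on_response(self, txn, server):
        self.awaiting[txn].discard(server)
        if server == self.leader:
            # Rule (II): forward to every follower and answer the client.
            query = self.in_flight[txn]
            for follower in self.followers:
                self.awaiting[txn].add(follower)
                self.sent.append((follower, query))
            self.to_client.append(query)
        if not self.awaiting[txn] and self.held[txn]:
            self._send_to_leader(txn, self.held[txn].popleft())


m = Intermediary("10a", ["10b", "10c"])
m.on_update_query("T1", "UPDATE a")  # sent to the leader
m.on_update_query("T1", "UPDATE b")  # same transaction: held
m.on_update_query("T2", "UPDATE c")  # different transaction: sent at once
m.on_response("T1", "10a")           # leader answered: forward and reply
m.on_response("T1", "10b")
m.on_response("T1", "10c")           # all answered: held query is released
```

The query of `T2` reaches the leader while `T1` still has a query outstanding, which is the parallel execution across transactions described above.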

To satisfy rules (I) and (II), the overall routine of FIG. 5, the snapshot creation routine of FIG. 7, and the commit processing routine of FIG. 9 are unchanged, while the normal routine of FIG. 6 and the synchronization maintenance routine of FIG. 8 are replaced by those of FIG. 17 and FIG. 18, respectively.

The normal processing routine of FIG. 17 differs only in that the majority vote (step S12) is removed. Step S13, which previously sent one of the correct responses to the client 50, now sends the response from the leader.

The synchronization maintenance routine of FIG. 18 is unchanged up to the point where the update query is sent to the leader (step S30) and the response is received from the leader (step S31). It then immediately sends the response from the leader to the client 50 (step S36) and checks whether there is a previously sent update query (step S37). If there is none, it sends the current update query to all followers (step S39). If there is one, it waits until all responses to the previously sent update query have been received (step S38) and then sends the current update query to all followers (step S39).
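The step order of FIG. 18 can be written out as a short sketch. The I/O layer is replaced by a hypothetical `LoggingIO` class that only records each step, and whether an earlier update query is pending is passed in as a flag; both are assumptions made so that the example runs without a network.

```python
class LoggingIO:
    """Hypothetical I/O layer that records each step instead of using a network."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = list(followers)
        self.log = []

    def send(self, server, query):
        self.log.append(("send", server, query))

    def receive(self, server):
        self.log.append(("recv", server))
        return f"response from {server}"

    def reply_to_client(self, response):
        self.log.append(("reply", response))

    def wait_for_earlier_responses(self):
        self.log.append(("wait_earlier",))


def sync_maintenance_fig18(query, io, earlier_pending):
    """Follow the step order described for FIG. 18 (steps S30 to S39)."""
    io.send(io.leader, query)             # S30: update query to the leader
    response = io.receive(io.leader)      # S31: response from the leader
    io.reply_to_client(response)          # S36: answer the client at once
    if earlier_pending:                   # S37: earlier update query exists?
        io.wait_for_earlier_responses()   # S38: wait for all its responses
    for follower in io.followers:         # S39: forward to every follower
        io.send(follower, query)


io = LoggingIO("10a", ["10b"])
sync_maintenance_fig18("UPDATE t SET v = 2", io, earlier_pending=True)
```

The client reply (S36) is logged before any follower receives the query, which is the change relative to the first embodiment.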

Queries that do not change the state of the database at all, such as SELECT, may be executed on any single server. This distributes the load and improves the performance of the whole system. Specifically, as shown in FIG. 19, the query is sent to one server 10 (step S80), the response is received from that server 10 (step S81), and the response is sent to the client 50 (step S82). When the load distribution routine of FIG. 19 is used, the overall routine of FIG. 5 is changed as shown in FIG. 20.
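A minimal routing sketch follows. The specification does not say how the single server is chosen; round-robin selection is an assumption of this example, as is the prefix test used to recognise read-only queries (it would misclassify statements such as `SELECT ... FOR UPDATE`). All names are hypothetical.

```python
import itertools


class ReadBalancer:
    """Picks one server per read-only query; round-robin is an assumption."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


def is_read_only(query):
    # Simplified test: treats any statement starting with SELECT as read-only.
    return query.lstrip().upper().startswith("SELECT")


def route(query, balancer):
    """Return which routine handles the query and, for reads, the chosen server."""
    if is_read_only(query):
        return ("load_distribution", balancer.pick())
    return ("other_routines", None)


balancer = ReadBalancer(["10a", "10b", "10c"])
routes = [
    route("SELECT * FROM t", balancer),
    route("UPDATE t SET v = 3", balancer),
    route("SELECT v FROM t", balancer),
]
```

Non-read queries are left to the other routines (normal, snapshot creation, synchronization maintenance, commit), so the ordering guarantees for updates are not affected by the balancer.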

In this way, the response time of the system is shortened and the throughput of the whole system is increased. Introducing load distribution increases the throughput further. Other effects are the same as in the first embodiment.

Although this embodiment has been described as a modification of the first embodiment, it goes without saying that the same modification can be applied to the second embodiment.

The embodiments of the present invention have been described in detail above, but they are illustrative, and the present invention is not limited to them. The scope of the invention is defined by the claims, and all modifications that fall within the meaning of the claims are included in the invention. The following modifications may be combined as appropriate and applied to the embodiments above.

The above embodiments were described with SERIALIZABLE as the transaction isolation level, but other transaction isolation levels may be used. Under SERIALIZABLE, a snapshot is created once at the start and is never changed afterward. Under READ COMMITTED, for example, the method may be modified so that the snapshot is reacquired each time a COMMIT is executed.

Under SERIALIZABLE, reads do not conflict with other queries. However, a transaction isolation level in which a read that conflicts with a write blocks the write may also be used. In that case, read queries such as SELECT, and not only update queries, are treated as update queries and executed by the synchronization maintenance algorithm described above.

Although the intermediary device was described as an independent device in the above embodiments, the program that implements the functions of the intermediary device may run on the same computer as a database server. This removes the need for dedicated intermediary hardware and reduces cost.

A plurality of intermediary devices may also be provided. This further increases the reliability of the whole system.

In the above embodiments, the servers selected as the leader and the followers never changed, but they may be changed. For example, when the leader fails, it may be disconnected from the system and a new leader selected from among the followers. This increases fault tolerance and achieves higher reliability.
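A minimal sketch of such a role change is shown below. The selection policy (promote the first follower in the list) is an assumption of this example, since the specification leaves the choice open, and the `Cluster` class is hypothetical.

```python
class Cluster:
    """Tracks which server is the leader and which are followers (illustrative)."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = list(followers)

    def on_leader_failure(self):
        """Detach the failed leader and promote a follower; returns the failed server."""
        if not self.followers:
            raise RuntimeError("no follower left to promote")
        failed = self.leader
        # Assumption: the first follower in the list becomes the new leader.
        self.leader = self.followers.pop(0)
        return failed


cluster = Cluster("10a", ["10b", "10c"])
detached = cluster.on_leader_failure()
```

After the call, update queries would go to the new leader first, and the remaining followers receive them once it has responded.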

The present invention is applicable to a highly reliable database system in which a plurality of database servers operate in parallel.

FIG. 1: Configuration diagram of the highly reliable database system according to the first embodiment
FIG. 2: Diagram explaining a write conflict
FIG. 3: Diagram explaining snapshot isolation
FIG. 4: Configuration diagram of the intermediary device according to the first embodiment
FIG. 5: Flowchart explaining the overall operation of the intermediary device according to the first embodiment
FIG. 6: Flowchart explaining the normal routine according to the first embodiment
FIG. 7: Flowchart explaining the snapshot creation routine according to the first embodiment
FIG. 8: Flowchart explaining the synchronization maintenance routine according to the first embodiment
FIG. 9: Flowchart explaining the commit processing routine according to the first embodiment
FIG. 10: Sequence chart explaining the operation when a write conflict occurs
FIG. 11: Sequence chart explaining the operation when the result of SELECT changes depending on the execution order of COMMIT and SELECT
FIG. 12: Sequence chart explaining the operation when the result of SELECT changes depending on the execution order of COMMIT and SELECT
FIG. 13: Sequence chart explaining the operation when no write conflict occurs
FIG. 14: Configuration diagram of the intermediary device according to the second embodiment
FIG. 15: Sequence chart explaining the operation when synchronizing a server that is out of synchronization
FIG. 16: Sequence chart explaining the operation when synchronizing a server that is out of synchronization
FIG. 17: Flowchart explaining the normal routine according to the third embodiment
FIG. 18: Flowchart explaining the synchronization maintenance routine according to the third embodiment
FIG. 19: Flowchart explaining the load distribution routine according to the third embodiment
FIG. 20: Flowchart explaining the overall operation of the intermediary device according to the third embodiment
FIG. 21: Configuration diagram of a conventional highly reliable database system
FIG. 22: Configuration diagram of a conventional highly reliable database system
FIG. 23: Example of a table held by the database server
FIG. 24: Example of transactions with a write conflict
FIG. 25: Example of transactions whose execution order matters
FIG. 26: Example of transactions without a write conflict
FIG. 27: Example of the transactions of FIG. 26 modified for a conventional highly reliable database system

Explanation of symbols

10: server; 20, 23: intermediary device; 21: connection processing unit; 22: control unit; 24: backup data creation unit; 25: buffer; 26: management unit; 30, 40: network; 50: client

Claims

A method of maintaining synchronization between database servers in a highly reliable database system comprising a plurality of database servers, each of which, when multiple processing requests try to access the same data at the same time, executes only one processing request and makes the others wait, and an intermediary device that relays processing requests from a client computer to one or more of the database servers and returns one of the legitimate responses from the database servers to the client computer as a processing result, wherein
the intermediary device selects in advance one of the plurality of database servers as a leader and the rest as followers, and,
when a processing request is received from the client computer,
(a) transmits the processing request only to the leader, and
(b) when a response to the processing request is received from the leader, transmits the processing request to the followers.

The synchronization method in the highly reliable database system according to claim 1, wherein, when the intermediary device receives a processing request (hereinafter referred to as a "new processing request") from the client computer in step (a), if there is a server that has not returned a response to another processing request already transmitted to the servers (hereinafter referred to as an "old processing request") and the new processing request and the old processing request belong to the same transaction, the intermediary device holds the new processing request and, when that response is received, transmits the new processing request only to the leader.

The synchronization method in the highly reliable database system according to claim 1, wherein the intermediary device returns one of the responses to the client computer when it has received all the responses from the followers to which the processing request was transmitted.

The synchronization method in the highly reliable database system according to any one of claims 1 to 3, wherein the client computer transmits a new processing request after receiving the response to a processing request that it has already transmitted.

The synchronization method in the highly reliable database system according to any one of claims 1 to 4, wherein the intermediary device includes a difference information storage unit that stores processing requests from client computers as difference information, and, when synchronization is requested for a database server that is out of synchronization (hereinafter referred to as an "out-of-synchronization database server"), the intermediary device starts creating backup data from an operating database server, stores the processing requests received from the client computer in the difference information storage unit in the order in which the responses from the leader are received, restores the database on the out-of-synchronization database server using the backup data when creation of the backup data is complete, and, when the restoration of the database from the backup data is complete on the out-of-synchronization database server, transmits the processing requests stored in the difference information storage unit to the out-of-synchronization database server in order.

The synchronization method in the highly reliable database system according to any one of claims 1 to 5, wherein, when the intermediary device receives a processing request from a client computer, it performs a synchronization maintenance process if the processing request involves a database update or affects the operation of a processing request involving a database update, and
in the synchronization maintenance process, step (a) is performed and then step (b) is performed.

The synchronization method in the highly reliable database system according to any one of claims 1 to 6, wherein, when the intermediary device receives a processing request from the client computer, it performs the synchronization maintenance process if the processing request involves a database update or affects the operation of a processing request involving a database update; performs normal processing if the processing request neither involves a database update, nor affects the operation of a processing request involving a database update, nor confirms an update to the database; and performs a confirmation process if the processing request neither involves a database update nor affects the operation of a processing request involving a database update but confirms an update to the database,
in the normal processing, the processing request from the client computer is transmitted to the server and a response is received from the server to which the processing request was transmitted, and
in the confirmation process, if another processing request is being executed, the intermediary device waits until no other processing request is being executed, and, when no other processing request is being executed, it disables execution of other processing requests, performs the normal processing, and then enables execution of other processing requests again.

The synchronization method in the highly reliable database system according to any one of claims 1 to 7, wherein, when the intermediary device receives a processing request from the client computer, it performs a snapshot creation process if a snapshot, which is a copy of the database, has not been created and the processing request is one that creates a snapshot; performs normal processing if the snapshot has not been created and the processing request is not one that creates a snapshot; performs the synchronization maintenance process if the snapshot has been created and the processing request involves a database update or affects the operation of a processing request involving a database update; performs normal processing if the snapshot has been created and the processing request neither involves a database update, nor affects the operation of a processing request involving a database update, nor confirms an update to the database; and performs a confirmation process if the snapshot has been created and the processing request neither involves a database update nor affects the operation of a processing request involving a database update but confirms an update to the database,
in the snapshot creation process, if a processing request that confirms an update to the database is being executed, the intermediary device waits until no such processing request is being executed, and, when no such processing request is being executed, it disables execution of processing requests that confirm an update to the database, handles the received processing request according to whether or not it involves a database update or affects the operation of a processing request involving a database update, and then enables execution of processing requests that confirm an update to the database again, and
in the confirmation process, if a processing request that creates a snapshot is being executed, the intermediary device waits until no such processing request is being executed, and, when no such processing request is being executed, it disables execution of processing requests that create a snapshot, performs the normal processing, and then enables execution of processing requests that create a snapshot again.

A highly reliable database system comprising a plurality of database servers, each of which, when multiple processing requests try to access the same data at the same time, executes only one processing request and makes the others wait, and an intermediary device that relays processing requests from a client computer to one or more of the database servers and returns one of the legitimate responses from the database servers to the client computer as a processing result, wherein
the intermediary device comprises a control unit that selects in advance one of the plurality of database servers as a leader and the rest as followers, transmits a processing request only to the leader when the processing request is received from the client computer, and transmits the processing request to the followers when a response to the processing request is received from the leader.

An intermediary method in a highly reliable database system comprising a plurality of database servers, each of which, when multiple processing requests try to access the same data at the same time, executes only one processing request and makes the others wait, and an intermediary device that relays processing requests from a client computer to one or more of the database servers and returns one of the legitimate responses from the database servers to the client computer as a processing result, wherein
one of the plurality of database servers is selected in advance as a leader and the rest as followers, a processing request is transmitted only to the leader when the processing request is received from the client computer, and the processing request is transmitted to the followers when a response to the processing request is received from the leader.

An intermediary device in a highly reliable database system comprising a plurality of database servers, each of which, when multiple processing requests try to access the same data at the same time, executes only one processing request and makes the others wait, and the intermediary device, which relays processing requests from a client computer to one or more of the database servers and returns one of the legitimate responses from the database servers to the client computer as a processing result,
the intermediary device comprising a control unit that selects in advance one of the plurality of database servers as a leader and the rest as followers, transmits a processing request only to the leader when the processing request is received from the client computer, and transmits the processing request to the followers when a response to the processing request is received from the leader.

An intermediary program for realizing an intermediary device in a highly reliable database system comprising a plurality of database servers, each of which, when multiple processing requests try to access the same data at the same time, executes only one processing request and makes the others wait, and the intermediary device, which relays processing requests from a client computer to one or more of the database servers and returns one of the legitimate responses from the database servers to the client computer as a processing result,
the program causing a computer to function as a control unit that selects in advance one of the plurality of database servers as a leader and the rest as followers, transmits a processing request only to the leader when the processing request is received from the client computer, and transmits the processing request to the followers when a response to the processing request is received from the leader.