JP5002296B2

JP5002296B2 - Cluster system and program

Info

Publication number: JP5002296B2
Application number: JP2007081623A
Authority: JP
Inventors: 孝治村松; 和樹才藤; 茂夫大道; 雅田中; 雅樹阿部
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2007-03-27
Filing date: 2007-03-27
Publication date: 2012-08-15
Anticipated expiration: 2027-03-27
Also published as: JP2008242741A

Description

The present invention relates to a cluster system composed of a plurality of server machines and a shared disk device, and to a program for that system. In particular, it relates to a cluster system and program that can keep the data in the shared disk device consistent even when a failover occurs.

For example, in the cluster system disclosed in Non-Patent Document 1, an active server machine 10 (#A) and a standby server machine 10 (#B) are provided, as shown in FIG. 16. The active server machine 10 (#A) executes the application 11 (#A) and, as shown in (i), writes the resulting data to the shared disk device 13. Meanwhile, as shown in (ii), the cluster software 12 (#A) of the active server machine 10 (#A) and the cluster software 12 (#B) of the standby server machine 10 (#B) keep exchanging predetermined packets, called heartbeat a, over the communication path 14 to notify each other that they are alive.

When the standby server machine 10 (#B) detects that heartbeat a has been interrupted, it generally performs a so-called failover b: as shown in (iii), the same application 11 (#B) is started on the standby server machine 10 (#B) so that application processing continues.
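The heartbeat-triggered failover described above can be sketched as follows. This is a minimal illustration, not the cluster software of Non-Patent Document 1: the class `HeartbeatMonitor`, the function `standby_tick`, and the timeout value are hypothetical names and parameters introduced here, since the text specifies neither an interval nor a threshold.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical standby-side tracker for heartbeat a."""
    timeout_s: float          # assumed threshold; the text gives no value
    last_seen_s: float = 0.0  # time at which the last heartbeat packet arrived

    def on_heartbeat(self, now_s: float) -> None:
        # Record the arrival time of a heartbeat packet from the active server.
        self.last_seen_s = now_s

    def is_interrupted(self, now_s: float) -> bool:
        # Heartbeat a counts as interrupted once the timeout has elapsed.
        return now_s - self.last_seen_s > self.timeout_s


def standby_tick(monitor: HeartbeatMonitor, now_s: float, start_application) -> bool:
    """Perform failover b (start the application) if heartbeat a has stopped."""
    if monitor.is_interrupted(now_s):
        start_application()
        return True
    return False
```

Note that this check cannot tell a crashed active server from a failed communication path or a temporary slowdown, which is the weakness the following paragraphs discuss.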

Heartbeat a is interrupted mainly for the following reasons.
(1) The active server machine 10 (#A) is down.
(2) The communication path 14 has failed.
(3) The active server machine 10 (#A) has temporarily slowed down, for example because of high CPU load.

In case (1), it is sufficient simply to perform failover b to the standby server machine 10 (#B). In cases (2) and (3), however, the active server machine 10 (#A) continues processing, so performing failover b causes the following problem.

That is, when a cluster is configured with a plurality of server machines 10 (#A, #B) connected to the same shared disk device 13 as shown in FIG. 16, and failover b is performed because heartbeat a was interrupted by cause (2) or (3), both the active server machine 10 (#A) and the standby server machine 10 (#B) may write to the same data area 15 in the shared disk device 13, as shown in FIG. 17. In that case, the data in the shared disk device 13 loses its consistency and may be destroyed.

For this reason, conventional cluster systems also take several countermeasures against this problem. Two conventional techniques that incorporate such countermeasures are described below.

The first conventional technique addresses causes (1) to (3). It uses the reserve exclusion function defined in the SCSI specification to restrict the servers that can issue I/O (read or write) to one per LU (Logical Unit). That is, when the loss of heartbeat a is detected as shown in (i) of FIG. 18 and failover b to the standby server machine 10 (#B) is performed as shown in (ii), the standby server machine 10 (#B) reserves the target LU in the shared disk device 13 as shown in (iii). This prevents the active server machine 10 (#A) from writing to the target LU any further, as shown in (iv). Only then is the application 11 (#B) started and data written to the shared disk device 13, as shown in (v).

The second conventional technique addresses causes (1) and (2). As shown in FIG. 19, a special LU called the QUORUM area 17 is provided in the shared disk device 13. As shown in (iii), the server machines 10 (#A, #B) that make up the cluster periodically write the current time to this area, so that the state of the server machines 10 (#A, #B) is monitored by a means other than heartbeat a shown in (i).

According to the second conventional technique, even when the standby server machine 10 (#B) finds heartbeat a interrupted because of a failure of the communication path 14, the standby server machine 10 (#B) refers to the QUORUM area 17. Because the QUORUM area 17 can be referenced even when the communication path 14 has failed, the standby server machine 10 (#B) can confirm that the active server machine 10 (#A) is still writing the current time, and thus knows that the active server machine 10 (#A) is not down.
Toshiba Review Vol. 54 No. 12 (1999), pages 18-21
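The QUORUM-area check of the second conventional technique can be sketched as below. This is an illustration under stated assumptions: the dictionary `quorum_timestamps` stands in for the QUORUM area 17, and the function names and the `max_age_s` freshness threshold are hypothetical, since the text gives no concrete layout or interval.

```python
def peer_is_alive(quorum_timestamps, peer, now_s, max_age_s):
    """Return True if `peer` wrote its current time to the quorum area recently.

    `quorum_timestamps` maps a server name to the last time it wrote
    (a hypothetical stand-in for the QUORUM area 17).
    """
    stamp = quorum_timestamps.get(peer)
    return stamp is not None and now_s - stamp <= max_age_s


def should_fail_over(heartbeat_lost, quorum_timestamps, peer, now_s, max_age_s):
    """Fail over only if heartbeat a is lost AND the peer stopped writing timestamps."""
    if not heartbeat_lost:
        return False
    return not peer_is_alive(quorum_timestamps, peer, now_s, max_age_s)
```

With a failed communication path the peer's timestamp stays fresh, so no failover is triggered; a slowed-down peer whose timestamp goes stale is still misjudged as down.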

However, such conventional cluster systems have the following problems.

First, in the first conventional technique, whether reserve exclusion as shown in (iii) and (iv) of FIG. 18 is possible depends on the implementation of the shared disk device 13 and its multipath driver (not shown). Supporting a new shared disk device 13 or a new multipath driver therefore requires verification every time. In addition, the shared disk device 13 and its multipath driver tend to impose detailed restrictions. For example, with a certain shared disk device 13, if one path fails in an environment where the physical disk I/O paths are multiplexed, the reservation is no longer maintained and exclusion becomes impossible.

In the second conventional technique, as shown in FIG. 20 for example, (i) if the active server machine 10 (#A) falls into a high CPU load during a write and slows down, (ii) the standby server machine 10 (#B) detects the loss of heartbeat a, concludes that the active server machine 10 (#A) has gone down, and (iii) performs failover b. As a result, (iv) the standby server machine 10 (#B) writes the current time to the QUORUM area 17 and (v) writes data to the data area 15 of the application data area 18.

Meanwhile, (vi) the active server machine 10 (#A) has not actually stopped, so it writes the current time to the QUORUM area 17 while slowed down and, (vii) immediately after recovering, continues writing data to the data area 15 of the application data area 18.

Thus, immediately after the active server machine 10 (#A) recovers from the slowdown, both servers write to the same LU, and data may be destroyed.

The present invention has been made in view of these circumstances. Its object is to provide, for a cluster system composed of a plurality of server machines and a shared disk device, a cluster system and program in which reserve exclusion is possible without depending on the implementation of the shared disk device or its multipath driver, and in which the data in the shared disk device can be kept consistent, without data corruption caused by writes from multiple server machines, even when a failover occurs.

To achieve the above object, the present invention takes the following measures.

That is, the invention of claim 1 is a cluster system composed of a plurality of server machines and a shared disk device connected to the plurality of server machines. Each of the server machines includes an application, cluster software, and a filter driver, each operating on that server machine. The shared disk device includes a master area that stores the application's data, a per-server area for each server machine that stores that server machine's data, and a common area shared by the server machines. Each per-server area includes a cache area and a patch area for storing data. The common area includes a commit right management unit that specifies the server machine granted the commit right, which is the permission to copy data stored in a cache area to the master area. The cluster software on the server machines periodically communicates with each other to confirm that each is alive, and each has one of two states: active or standby.

When the cluster software is active, it changes the commit right management unit so that the commit right is granted only to its own server machine, makes the master area visible from its own server machine, and starts the application provided on its own server machine. The filter driver provided on that server machine waits for I/O input. If the I/O input is a read request to the master area, the filter driver reads from the cache area for its own server machine any part or all of the requested data that is held there, reads any remaining requested data from the master area, and returns the data read to the I/O input side that issued the read request. If the I/O input is a write request to the master area and the cache area for its own server machine has no free space for the write, then, if the commit right is granted to its own server machine, the filter driver copies the data in the cache area to the master area and clears the cache area, and, if the commit right is not granted, it returns an error to the I/O input side that issued the write request. It then writes the write position information in the master area and the written data to the cache area for its own server machine, and returns the result of this write to the I/O input side that issued the write request. If the I/O input is neither a read request nor a write request to the master area, the filter driver passes the data of the I/O input to the layer below it and returns the return value to the I/O input side.

On the other hand, when the cluster software is standby and determines that the active cluster software is not alive, the cluster software that made this determination changes the commit right management unit so that the commit right is granted only to its own server machine, copies the data in the cache area for the server machine that hosts the active cluster software into the patch area for its own server machine, uses the patch area in place of part of the master area, makes the master area visible from its own server machine, and starts the application provided on its own server machine. After the cluster software on its server machine has become active, the filter driver on that server machine waits for I/O input. If the I/O input is a read request to the master area, the filter driver reads from the patch area for its own server machine any part or all of the requested data held there, reads from the cache area for its own server machine any part or all of the requested data held there, reads any remaining requested data from the master area, and returns the data read to the I/O input side that issued the read request. If the I/O input is a write request to the master area and the entire write destination lies in the patch area for its own server machine, the filter driver writes to the patch area and returns to the I/O input side. Otherwise, it writes only the data that falls within the patch area to the patch area. If the cache area for its own server machine has no free space for the write, it refers to the commit right management unit to check whether the commit right is granted to its own server machine; if it is, the filter driver copies the data in the cache area for its own server machine to the master area and clears the cache area, and if it is not, the filter driver returns an error to the I/O input side. It then writes the write position information in the master area and the written data to the cache area for its own server machine and returns to the I/O input side. If the I/O input is neither a read request nor a write request to the master area, the filter driver passes the data of the I/O input to the layer below it and returns the return value to the I/O input side.

The invention of claim 2 is a program applied to the cluster system of the invention of claim 1.

The invention of claim 3 is the program of the invention of claim 2, in which each filter driver has a memory cache area holding the same data as the cache area of its own server machine. Instead of referring to the cache area for its own server machine, the filter driver refers to the memory cache area for its own server machine. Instead of writing the write position information in the master area and the written data only to the cache area for its own server machine, it writes them to both the memory cache area and the cache area. Instead of copying the data in the cache area to the master area, it copies the data in the memory cache area to the master area.

According to the present invention, for a cluster system composed of a plurality of server machines and a shared disk device, it is possible to realize a cluster system and program in which reserve exclusion is possible without depending on the implementation of the shared disk device or its multipath driver, and in which the data in the shared disk device can be kept consistent, without data corruption caused by writes from multiple server machines, even when a failover occurs.

The best mode for carrying out the present invention is described below with reference to the drawings.

In the drawings used to describe the following embodiments, parts identical to those in FIGS. 16 to 20 are given the same reference numerals, and duplicate description is omitted.

(First embodiment)
FIG. 1 is a functional block diagram showing a configuration example of the cluster system according to the first embodiment.

That is, the cluster system according to this embodiment is composed of a plurality of server machines (here, two server machines 10 (#A, #B) are shown as an example) and a shared disk device 13 connected to these server machines 10 (#A, #B). Here, it is assumed that, in the initial state, the server machine 10 (#A) is the active system and the server machine 10 (#B) is the standby system. Each server machine 10 (#A, #B) includes an application 11 (#A, #B), cluster software 12 (#A, #B), a filter driver 21 (#A, #B), and a disk driver 22 (#A, #B), each operating on that server machine. The filter driver 21 is interposed between the application 11 and the disk driver 22. The application 11 is assumed to be mainly a server application; in this embodiment it is a proxy cache server.

The shared disk device 13 is connected to each of the server machines 10 (#A, #B) by an FC (Fibre Channel) cable 23 (#A, #B) and includes a QUORUM area 17 and a master area 30. These two areas may be placed in different LUs or in different partitions.

FIG. 2 is a conceptual diagram showing a detailed configuration example of the shared disk device 13.

As shown in FIG. 2, the QUORUM area 17 consists of three areas: a common area 170, a server A area 171, and a server B area 172. The common area 170 contains a commit right management table 170a, the server A area 171 contains a cache area 171a and a patch area 171b, and the server B area 172 contains a cache area 172a and a patch area 172b.

The cache area 171a is the area to which data is written from the application 11 (#A) of the server machine 10 (#A) via the FC cable 23 (#A). Similarly, the cache area 172a is the area to which data is written from the application 11 (#B) of the server machine 10 (#B) via the FC cable 23 (#B).

The commit right management table 170a specifies the server machine 10 that has been granted the commit right, that is, permission to copy the data stored in the cache areas 171a and 172a to the master area 30. FIG. 2 shows an example in which the commit right is granted to the server machine 10 (#A) and not to the server machine 10 (#B).
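As a rough illustration of the commit right management table 170a, the sketch below models it as a mapping from server name to a granted flag. The class name `CommitRightTable` and its methods are hypothetical; the text does not specify the on-disk format of the table.

```python
class CommitRightTable:
    """Hypothetical in-memory model of the commit right management table 170a."""

    def __init__(self, servers):
        # Initially no server holds the commit right.
        self._granted = {name: False for name in servers}

    def grant_only(self, server):
        """Grant the commit right to `server` and revoke it from all others."""
        if server not in self._granted:
            raise KeyError(server)
        for name in self._granted:
            self._granted[name] = (name == server)

    def has_right(self, server):
        """Return True if `server` may copy its cache area to the master area."""
        return self._granted[server]
```

For the state drawn in FIG. 2, `grant_only("A")` would leave `has_right("A")` true and `has_right("B")` false.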

The master area 30 is the area to which the data written in the cache areas 171a and 172a is moved when those cache areas become full. The move starts when, with the cache area 171a or 172a full, the filter driver 21 of the server machine 10 that holds the commit right, that is, the server machine 10 marked as having the commit right in the commit right management table 170a, commits to the master area 30.

In the cluster system according to this embodiment, the instances of the cluster software 12 (#A, #B) periodically communicate with each other to confirm that each other's server machine 10 is alive, and each is in one of two states: active or standby. In the following description, the server machine 10 (#A) is the active system and the server machine 10 (#B) is the standby system. Accordingly, the cluster software 12 (#A) is in the active state and the cluster software 12 (#B) is in the standby state.

The active cluster software 12 (#A) sets the commit right management table 170a so that the commit right is granted only to its own server machine 10 (#A), makes the master area 30 visible from the upper layer on its own server machine 10 (#A), and starts the application 11 (#A) provided on its own server machine 10 (#A).

The filter driver 21 (#A) provided on the server machine 10 (#A) waits for I/O input from the upper layer. If the I/O input is a read from the master area 30, the filter driver reads from the cache area 171a for the server machine 10 (#A) any part or all of the requested data held there, reads any remaining requested data from the master area 30, and returns all of the data read together to the I/O input side that issued the read request.
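The read path above can be sketched as follows. This is a simplified model, not the filter driver itself: the master area is a `bytes` object, each cache entry is a hypothetical `(start, data)` pair, and instead of reading only the uncached parts from the master area the sketch reads the whole range and overlays cached data on it, which yields the same result.

```python
def read_with_cache(master: bytes, cache_entries, start: int, length: int) -> bytes:
    """Serve a read of master[start:start+length], preferring cached data.

    `cache_entries` is a list of (entry_start, data) pairs in write order,
    so later entries override earlier ones where they overlap.
    """
    buf = bytearray(master[start:start + length])
    for entry_start, data in cache_entries:
        # Intersect the cached range with the requested range.
        lo = max(start, entry_start)
        hi = min(start + length, entry_start + len(data))
        if lo < hi:
            buf[lo - start:hi - start] = data[lo - entry_start:hi - entry_start]
    return bytes(buf)
```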

If the I/O input is a write to the master area 30 and the cache area 171a for its own server machine 10 (#A) has no free space for the write, then, if the commit right is granted to its own server machine 10 (#A), the filter driver copies the data in the cache area 171a to the master area 30 and clears the cache area 171a. If the commit right is not granted to its own server machine 10 (#A), it returns an error to the I/O input side that issued the write request. It then writes the write position information in the master area 30 (start position and length) and the written data to the cache area 171a for its own server machine 10 (#A), as shown in the data table 171c of FIG. 2, and returns the write result to the I/O input side that issued the write request.
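A minimal sketch of this write path follows. The names `WriteCache` and `handle_write` are hypothetical, the cache capacity is counted in data bytes only, and the sketch assumes that returning an error ends the request without caching the data, which the text does not state explicitly.

```python
class WriteCache:
    """Hypothetical model of a cache area holding (start, data) entries."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []

    def used(self) -> int:
        return sum(len(data) for _, data in self.entries)


def handle_write(master: bytearray, cache: WriteCache,
                 has_commit_right: bool, start: int, data: bytes) -> bool:
    """Cache a write aimed at the master area; return False on error."""
    if cache.used() + len(data) > cache.capacity:
        if not has_commit_right:
            # Cache full and no commit right: report an error (assumed to end the request).
            return False
        # Commit: copy every cached entry to the master area, then clear the cache.
        for entry_start, entry_data in cache.entries:
            master[entry_start:entry_start + len(entry_data)] = entry_data
        cache.entries.clear()
    # Record the write position and the data in the cache area.
    cache.entries.append((start, bytes(data)))
    return True
```

The point of the design is that a server without the commit right can never modify the master area; at worst its writes fail once its own cache area fills up.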

On the other hand, if the I/O input is neither a read nor a write to the master area 30, the filter driver 21 (#A) passes the data of the I/O input unchanged to the disk driver 22 (#A), which lies below the filter driver 21 (#A) on its own server machine 10 (#A), and returns the return value unchanged to the I/O input side.

Next, the standby cluster software 12 (#B) and the filter driver 21 (#B) are described.

When the standby cluster software 12 (#B) determines that the active cluster software 12 (#A) is not alive, that is, not operating, it changes the commit right management table 170a so that the commit right is granted only to its own server machine 10 (#B). It then copies the data in the cache area 171a for the server machine 10 (#A) to the patch area 172b for its own server machine 10 (#B), uses the patch area 172b in place of part of the master area 30, makes the master area 30 visible from its own server machine 10 (#B), and starts the application 11 (#B) provided on its own server machine 10 (#B).
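The takeover sequence above can be sketched with a plain dictionary standing in for the shared state. The layout of `state` and the function name `take_over` are hypothetical; only the order of the steps is taken from the text.

```python
def take_over(state, me, failed):
    """Standby-side takeover: grab the commit right and adopt the failed server's cache.

    `state` is a hypothetical model with the keys:
      "commit":         {server: bool}  (commit right management table)
      "areas":          {server: {"cache": [...], "patch": [...]}}
      "master_visible": {server: bool}
      "app_running":    {server: bool}
    """
    # 1. Grant the commit right to this server only.
    state["commit"] = {name: (name == me) for name in state["commit"]}
    # 2. Copy the failed server's cache area into this server's patch area.
    state["areas"][me]["patch"] = list(state["areas"][failed]["cache"])
    # 3. Expose the master area to the upper layer on this server.
    state["master_visible"][me] = True
    # 4. Start the application on this server.
    state["app_running"][me] = True
```

Because the failed server's cache area is only copied, never committed by the survivor on the other server's behalf, a server that was merely slow keeps writing to its own cache area and loses the commit right, so it cannot touch the master area.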

The filter driver 21 (#B) provided on its own server machine 10 (#B) waits for I/O input from the upper layer after the cluster software 12 (#B) provided on the server machine 10 (#B) has become active. If the I/O input is a read from the master area 30, the filter driver reads from the patch area 172b for its own server machine 10 (#B) any part or all of the requested data held there, reads from the cache area 172a any part or all of the requested data held there, reads any remaining requested data from the master area 30, and returns all of the data read together to the I/O input side that issued the read request.

If the I/O input is a write to the master area 30 and the entire write destination lies in the patch area 172b for its own server machine 10 (#B), the filter driver writes to the patch area 172b and returns to the I/O input side. Otherwise, it writes only the data that falls within the patch area 172b to the patch area 172b and, if the cache area 172a for its own server machine 10 (#B) has no free space for writing the data, refers to the commit right management table 170a to check whether the commit right is granted to its own server machine 10 (#B).

If the check shows that the commit right is granted, the filter driver copies the data in the cache area 172a for its own server machine 10 (#B) to the master area 30 and clears the cache area 172a. If the commit right is not granted, it returns an error to the I/O input side. It then writes the write position information in the master area 30 (start position and length) and the written data to the cache area 172a for its own server machine 10 (#B), in the same form as the data table 171c of FIG. 2, and returns to the I/O input side.
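The step that distinguishes this write path from the earlier one is the split of a write between the patch area and the cache area. A minimal sketch of that split follows; `split_write` and the `(start, length)` representation of the ranges covered by the patch area are hypothetical, and the ranges are assumed not to overlap.

```python
def split_write(patch_ranges, start: int, data: bytes):
    """Split a write into parts covered by the patch area and the remaining parts.

    `patch_ranges` is a list of (start, length) ranges of the master area that
    the patch area currently stands in for. Returns (patch_parts, other_parts),
    each a list of (start, data) pairs in ascending order.
    """
    covered = [False] * len(data)
    for p_start, p_len in patch_ranges:
        lo = max(start, p_start)
        hi = min(start + len(data), p_start + p_len)
        for pos in range(lo, hi):
            covered[pos - start] = True

    patch_parts, other_parts = [], []
    i = 0
    while i < len(data):
        # Extend the run while the covered/uncovered status stays the same.
        j = i
        while j < len(data) and covered[j] == covered[i]:
            j += 1
        target = patch_parts if covered[i] else other_parts
        target.append((start + i, data[i:j]))
        i = j
    return patch_parts, other_parts
```

The `patch_parts` would be written to the patch area 172b directly, and the `other_parts` would go through the cache area 172a, with the commit right check applied when that cache area is full.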

On the other hand, if the I/O input is neither a read nor a write to the master area 30, the filter driver 21 (#B) passes the data of the I/O input unchanged to the disk driver 22 (#B), which is the layer below the filter driver 21 (#B) on its own server machine 10 (#B), and returns the return value unchanged to the I/O input side.

Each server machine 10 also manages whether it holds a patch area (171b or 172b) that should be copied to the master area 30. Furthermore, when the cluster software 12 becomes active, it may copy to the master area 30, based on this management result, all of the patch areas (171b or 172b) that should be copied.

Next, the operation of the cluster system according to this embodiment configured as described above is described. In the initial state, it is assumed that the cluster software 12 (#A, #B) on the server machine 10 (#A) and the server machine 10 (#B) exchange heartbeats and can confirm that each other is alive; that on both server machines 10 (#A, #B) the QUORUM area 17 and the master area 30 are not visible from the upper layer (fenced off by the filter driver); and that the applications 11 (#A, #B) have not been started on either server machine 10 (#A, #B).

First, the flow of processing by the cluster software 12 (#A) when the server machine 10 (#A) is set as the active system will be described with reference to the flowchart of FIG. 3 and the conceptual diagram of FIG. 4.

First, in response to a user operation, a notification instructing that the server machine 10 (#A) be made the active system reaches the cluster software 12 (#A) of the server machine 10 (#A) (S1). Next, the cluster software 12 (#A) sets the commit right management table 170a so that the commit right is granted to the server machine 10 (#A) (S2). The cluster software 12 (#A) then notifies the filter driver 21 (#A) that the server machine has become the active system (S3). Thereafter, the filter driver 21 (#A) of the server machine 10 (#A) makes the master area 30 visible from the upper layer (S4). Finally, the cluster software 12 (#A) of the server machine 10 (#A) starts the application 11 (#A) (S5).

Next, the flow of processing by the filter driver 21 (#A) after the server machine 10 (#A) has been set as the active system will be described with reference to the flowchart of FIG. 5 and the conceptual diagram of FIG. 4.

First, the filter driver 21 (#A) of the server machine 10 (#A) waits for an I/O input from the upper layer (S11). When an I/O input arrives, its type is determined (S12). If the I/O input is a read from the master area 30, the process proceeds to step S13; if it is a write, to step S19; otherwise, to step S26.

In step S13, if part or all of the data to be read is in the cache area 171a, the corresponding data is read from the cache area 171a (S14), and the process proceeds to step S15. If none of the data to be read is in the cache area 171a, step S14 is skipped and the process proceeds to step S15.

In step S15, if part or all of the data to be read is not in the cache area 171a, only the remaining data is read from the master area 30 (S16), and the process proceeds to step S17. Otherwise, step S16 is skipped and the process proceeds to step S17.

In step S17, the data that has been read is merged into one and returned to the upper layer that requested the read (S17), and the process proceeds to step S18.

In step S18, the process ends if the OS is shutting down; otherwise, the process returns to step S11 (S18).
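
The read path of steps S13 to S17 can be sketched as follows. This is a minimal illustration in Python, not the implementation described in the patent: the function name `read_merged` and the representation of the cache area as a list of (start address, data) pairs in write order are assumptions made for the example. For brevity the sketch reads the whole requested range from the master area and then overlays the cached writes, whereas the flowchart reads only the uncached remainder from the master area; the merged result is the same.

```python
def read_merged(master, cache, start, length):
    """Return `length` bytes from `start`, preferring cached writes over the master area.

    master: bytes-like object standing in for master area 30
    cache:  list of (start address, data) pairs standing in for cache area 171a,
            ordered oldest first so that later writes win
    """
    # S16 (simplified): take the requested range from the master area.
    buf = bytearray(master[start:start + length])

    # S14: overlay every cached write that intersects the requested range.
    for off, data in cache:
        lo = max(off, start)
        hi = min(off + len(data), start + length)
        if lo < hi:
            buf[lo - start:hi - start] = data[lo - off:hi - off]

    # S17: the merged data is what goes back to the requester.
    return bytes(buf)
```

With a master area of `b"aaaaaaaa"` and a single cached write `(2, b"XY")`, reading six bytes from address 0 yields `b"aaXYaa"`: the two cached bytes replace the stale master contents and the rest comes from the master area.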

In step S19, it is determined whether the cache area 171a has enough free space for the write (S19). If it does, then in step S20 the filter driver 21 (#A) of the server machine 10 (#A) writes three items, namely the write start address on the shared disk device 13, the length, and the raw data, into the cache area 171a of the server machine 10 (#A) (S20), and a response is returned to the upper layer (S21). The process then proceeds to step S18.

If, on the other hand, it is determined in step S19 that the cache area 171a does not have enough free space for the write, then in step S22 the commit right management table 170a is consulted to check whether the commit right has been granted to the server machine 10 (#A) (S22).

If the commit right has been granted to the server machine 10 (#A), the process proceeds to step S23, where the filter driver 21 (#A) of the server machine 10 (#A) performs the commit, that is, applies the contents of the cache area 171a to the master area 30 (S23).

Next, the filter driver 21 (#A) of the server machine 10 (#A) deletes the committed entries from the cache area 171a (S24), and the process proceeds to step S20.

If, on the other hand, the commit right has not been granted to the server machine 10 (#A) in step S22, the process proceeds to step S25, where the filter driver 21 (#A) returns an error to the upper layer (S25), and then to step S18.

In step S26, the request is passed unchanged to the disk driver 22 (#A); then, in step S27, the return value is returned unchanged to the upper layer, and the process proceeds to step S18.
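
The write path of steps S19 to S25 can be sketched as follows. This is an illustrative Python model under stated assumptions, not the patented driver: the class and method names, the byte-based `capacity` limit, and the use of a plain dictionary for the commit right management table 170a are all hypothetical. The sketch also assumes that a single write never exceeds the cache capacity.

```python
class CommitRightError(Exception):
    """Raised when a commit is attempted without holding the commit right (S25)."""


class FilterDriver:
    def __init__(self, name, commit_table, master, capacity):
        self.name = name                  # node name, e.g. "A" or "B"
        self.commit_table = commit_table  # shared dict standing in for table 170a
        self.master = master              # shared bytearray standing in for master area 30
        self.capacity = capacity          # cache size in bytes (hypothetical limit)
        self.cache = []                   # (start address, data) pairs, oldest first

    def _used(self):
        return sum(len(data) for _, data in self.cache)

    def commit(self):
        # S22: only the holder of the commit right may touch the master area.
        if self.commit_table.get("holder") != self.name:
            raise CommitRightError(self.name)             # S25
        for start, data in self.cache:                    # S23
            self.master[start:start + len(data)] = data
        self.cache.clear()                                # S24

    def write(self, start, data):
        # S19: commit first if the cache cannot take this write.
        if self._used() + len(data) > self.capacity:
            self.commit()
        self.cache.append((start, bytes(data)))           # S20
        return len(data)                                  # S21
```

The point of the design is that ordinary writes never touch the master area; only a commit does, and a commit is refused once the commit right has moved to the other node.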

Next, the flow of processing by the cluster software 12 (#B) of the server machine 10 (#B) when the server machine 10 (#B) detects a loss of heartbeat and fails over will be described with reference to the flowchart of FIG. 6 and the conceptual diagram of FIG. 7.

First, when the cluster software 12 (#B) of the server machine 10 (#B) detects a loss of heartbeat (S31), for example because the cluster software 12 (#A) of the server machine 10 (#A) has slowed down during a write, as indicated by X in FIG. 7, or during a commit, as indicated by Y in FIG. 7, the commit right management table 170a is set so that the commit right is granted to the server machine 10 (#B) (S32). In either case, the cluster software 12 (#A) that has slowed down is assumed to recover after a while.

Next, the cluster software 12 (#B) copies all data in the cache area 171a for the server machine 10 (#A), including data that has not yet been committed or is in the middle of being committed, to the patch area 172b of the server machine 10 (#B) (S33).

The cluster software 12 (#B) then notifies the filter driver 21 (#B) that the server machine 10 (#B) has become the active system (S34). Thereafter, the filter driver 21 (#B) of the server machine 10 (#B) makes the master area 30 visible from the upper layer (S35). Finally, the cluster software 12 (#B) of the server machine 10 (#B) starts the application 11 (#B) (S36).
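
The ordering of steps S32 and S33 can be sketched as follows. This is a simplified Python model, with hypothetical names (`SharedDisk`, `try_commit`, `fail_over`) and with each cache and patch area modeled as a list of (start address, data) pairs. It illustrates why the commit right is moved before the cache is copied: once the right has moved, a slowed-down node that later recovers can no longer apply its cache to the master area.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDisk:
    master: bytearray                      # stands in for master area 30
    commit_holder: str = ""                # stands in for commit right table 170a
    cache: dict = field(default_factory=lambda: {"A": [], "B": []})  # 171a / 172a
    patch: dict = field(default_factory=lambda: {"A": [], "B": []})  # 171b / 172b


def try_commit(disk, node):
    """Apply `node`'s cache to the master area only if it holds the commit right."""
    if disk.commit_holder != node:
        return False
    for start, data in disk.cache[node]:
        disk.master[start:start + len(data)] = data
    disk.cache[node].clear()
    return True


def fail_over(disk, failed, survivor):
    disk.commit_holder = survivor                    # S32: take the commit right first
    disk.patch[survivor] = list(disk.cache[failed])  # S33: copy ALL cached writes,
                                                     # committed or not
```

After `fail_over(disk, "A", "B")`, a call to `try_commit(disk, "A")` returns `False` and leaves the master area untouched, while node B holds every write of node A in its patch area.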

Next, the flow of processing by the filter driver 21 (#B) after failover to the server machine 10 (#B) will be described with reference to the flowchart of FIG. 8 and the conceptual diagram of FIG. 7.

First, the filter driver 21 (#B) of the server machine 10 (#B) waits for an I/O input from the upper layer (S41). When an I/O input arrives, its type is determined (S42). If the I/O input is a read from the master area 30, the process proceeds to step S43; if it is a write, to step S51; otherwise, to step S61.

In step S43, if part or all of the data to be read is in the patch area 172b, the corresponding data is read from the patch area 172b (S44), and the process proceeds to step S45. If none of the data to be read is in the patch area 172b, step S44 is skipped and the process proceeds to step S45.

In step S45, if part or all of the data to be read is in the cache area 172a, the corresponding data is read from the cache area 172a (S46), and the process proceeds to step S47. Otherwise, step S46 is skipped and the process proceeds to step S47.

In step S47, if part or all of the data to be read is in neither the patch area 172b nor the cache area 172a, only the remaining data is read from the master area 30 (S48), and the process proceeds to step S49. Otherwise, step S48 is skipped and the process proceeds to step S49.

In step S49, the data that has been read is merged into one and returned to the upper layer that requested the read (S49), and the process proceeds to step S50.

In step S50, the process ends if the OS is shutting down; otherwise, the process returns to step S41 (S50).
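
The three-level lookup of steps S43 to S49 can be sketched as follows. This is a deliberately simplified Python model: the function name `layered_read` is hypothetical, and the patch area and cache area are modeled as dictionaries mapping a byte address to a byte value, which is an assumption made for readability rather than the layout described in the patent.

```python
def layered_read(start, length, patch, cache, master):
    """Read `length` bytes from `start`, looking in patch, then cache, then master.

    patch:  dict {address: byte value} standing in for patch area 172b
    cache:  dict {address: byte value} standing in for cache area 172a
    master: bytes-like object standing in for master area 30
    """
    out = bytearray()
    for pos in range(start, start + length):
        if pos in patch:          # S43-S44: data inherited from the failed node
            out.append(patch[pos])
        elif pos in cache:        # S45-S46: data written since the failover
            out.append(cache[pos])
        else:                     # S47-S48: everything else comes from the master
            out.append(master[pos])
    return bytes(out)             # S49: merged result
```

For example, with a master area of `b"mmmmmm"`, a patch entry at address 1 and a cache entry at address 3, reading five bytes from address 0 returns the master contents with those two bytes replaced.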

In step S51, if part or all of the write destination lies in the patch area 172b, the process proceeds to step S52, where the corresponding data is written to the patch area 172b (S52), and then to step S53.

In step S53, if the entire write destination was in the patch area 172b, the process proceeds to step S56, where the result is returned to the upper layer, and then to step S50. If not all of the write destination is in the patch area 172b, the process proceeds to step S54.

In step S54, it is determined whether the cache area 172a has enough free space for the write (S54). If it does, then in step S55 the filter driver 21 (#B) of the server machine 10 (#B) writes three items, namely the write start address on the shared disk device 13, the length, and the raw data, into the cache area 172a of the server machine 10 (#B) (S55), and a response is returned to the upper layer (S56). The process then proceeds to step S50.

If, on the other hand, it is determined in step S54 that the cache area 172a does not have enough free space for the write, then in step S57 the commit right management table 170a is consulted to check whether the commit right has been granted to the server machine 10 (#B) (S57).

If the commit right has been granted to the server machine 10 (#B), the process proceeds to step S58, where the filter driver 21 (#B) of the server machine 10 (#B) performs the commit, that is, applies the contents of the cache area 172a to the master area 30 (S58).

Next, the filter driver 21 (#B) of the server machine 10 (#B) deletes the committed entries from the cache area 172a (S59), and the process proceeds to step S55.

If, on the other hand, the commit right has not been granted to the server machine 10 (#B) in step S57, the process proceeds to step S60, where the filter driver 21 (#B) returns an error to the upper layer (S60), and then to step S50.

In step S61, the request is passed unchanged to the disk driver 22 (#B); then, in step S62, the return value is returned unchanged to the upper layer, and the process proceeds to step S50.
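
The way a write is split between the patch area and the cache area in steps S51 to S55 can be sketched as follows. This is a simplified Python illustration with hypothetical names; both areas are modeled as dictionaries mapping a byte address to a byte value, and the free-space check and commit of steps S54 and S57 to S60 are omitted to keep the example short.

```python
def layered_write(start, data, patch, cache):
    """Route each written byte to the patch area if it already lives there,
    otherwise to the cache area. Returns the number of bytes sent to the cache.

    patch: dict {address: byte value} standing in for patch area 172b
    cache: dict {address: byte value} standing in for cache area 172a
    """
    to_cache = 0
    for i, value in enumerate(data):
        pos = start + i
        if pos in patch:       # S51-S52: destination is covered by the patch area
            patch[pos] = value
        else:                  # S53-S55: the remainder goes to the cache area
            cache[pos] = value
            to_cache += 1
    return to_cache
```

A return value of 0 corresponds to the case in step S53 where the whole write destination was in the patch area, so the cache area is not touched at all.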

As described above, through the operation described above, the cluster system according to the present embodiment, which is composed of a plurality of server machines 10 (#A, #B) and the shared disk device 13, achieves reserve-based exclusive control without depending on the implementation of the shared disk device 13 or its multipath driver.

Moreover, whatever the reason for a failover, whether the active server machine 10 (#A) going down or slowing down or a failure of the communication path 14, data corruption caused by overlapping writes from the plurality of server machines 10 (#A, #B) does not occur, and the data in the shared disk device 13 can be kept consistent. In addition, the server A area 171 and the server B area 172 for application data need hardly be enlarged compared with the normal case, and because the amount of data copied at failover is small, a quick failover is possible.

Each server machine 10 also manages whether it holds a patch area (171b or 172b) that should be copied to the master area 30. Furthermore, when its server machine becomes the active system, the cluster software 12 can, based on this management result, copy all data of the patch area (171b or 172b) that should be copied into the master area 30. Therefore, even if the new active server machine 10 (#B) holding the patch area 172b is stopped while the state of the former active server machine 10 (#A) cannot be determined (down, slowed down, or alive), the risk of breaking the consistency of the application data can be avoided.

(Second Embodiment)
FIG. 9 is a functional block diagram illustrating a configuration example of the cluster system according to the second embodiment. Because the cluster system according to the present embodiment is a modification of the cluster system according to the first embodiment, parts identical to those of the first embodiment are denoted by the same reference numerals, their description is not repeated, and only the differences are described.

Specifically, in the cluster system according to the present embodiment, in addition to the configuration shown in FIG. 1, each server machine 10 (#A, #B) further includes a commit thread 24 (#A, #B) and a memory cache area 25 (#A, #B). Both the commit thread 24 and the memory cache area 25 are provided in the filter driver 21.

In the shared disk device 13, the QUORUM area 17 shown in FIG. 1 is eliminated, and a QUORUM file 31 is provided in the master area 30 instead. As shown in FIG. 10, the QUORUM file 31 contains a common area 170 including the commit right management table 170a and a patch area holding state management table 170b, a server A area 171 including the cache area 171a and the patch area 171b, and a server B area 172 including the cache area 172a and the patch area 172b.

The patch area holding state management table 170b manages information on whether each server machine 10 (#A, #B) holds, in its patch area 171b or 172b, data that should be applied to the master area 30.

Data written by the application 11 (#A) of the server machine 10 (#A) via the filter driver 21 (#A) is written to the memory cache area 25 (#A). The same data is also written, via the FC cable 23 (#A), to the cache area 171a in the server A area 171.

Similarly, data written by the application 11 (#B) of the server machine 10 (#B) via the filter driver 21 (#B) is written to the memory cache area 25 (#B). The same data is also written, via the FC cable 23 (#B), to the cache area 172a in the server B area 172.

As a result, each filter driver 21 (#A, #B) has a memory cache area 25 (#A, #B) holding the same data as the cache area 171a or 172a of its server machine 10 (#A, #B).

When the commit right has been granted to its own server machine 10, the commit thread 24 periodically causes the data in the cache area (171a or 172a) for its own server machine 10 to be copied to the master area 30, either directly or via the filter driver 21.

In the cluster system according to the present embodiment, the filter driver 21 differs from that of the first embodiment as follows. Instead of referring to the cache area (171a or 172a) for its own server machine 10, it refers to the memory cache area 25 for its own server machine 10. Instead of writing the write position information for the master area 30 and the written data into the data table (171c or 172c) in the cache area (171a or 172a) for its own server machine 10, it writes them into both the memory cache area 25 and the cache area (171a or 172a) for its own server machine 10. And instead of copying the data in the cache area (171a or 172a) to the master area 30, it copies the data in the memory cache area 25 to the master area 30.

Next, the operation of the cluster system according to the present embodiment, configured as described above, will be described.

First, the flow of processing by the cluster software 12 (#A) when the server machine 10 (#A) is set as the active system will be described with reference to the flowchart of FIG. 11.

First, in response to a user operation, a notification instructing that the server machine 10 (#A) be made the active system reaches the cluster software 12 (#A) of the server machine 10 (#A) (S71). Next, the cluster software 12 (#A) sets the commit right management table 170a so that the commit right is granted to the server machine 10 (#A) (S72). The cluster software 12 (#A) then notifies the filter driver 21 (#A) that the server machine has become the active system (S73). The filter driver 21 (#A) then refers to the patch area holding state management table 170b, and the contents of the patch area (171b or 172b) of the server machine 10 designated there are applied to the master area 30 (S74).

Thereafter, the filter driver 21 (#A) of the server machine 10 (#A) makes the master area 30 visible from the upper layer (S75). The filter driver 21 (#A) of the server machine 10 (#A) then starts the commit thread 24 (#A) (S76), and the cluster software 12 (#A) of the server machine 10 (#A) starts the application 11 (#A) (S77).

Next, the flow of processing by the commit thread 24 (#A) after the server machine 10 (#A) has been set as the active system will be described with reference to the flowchart of FIG. 12.

After the server machine 10 (#A) has been set as the active system, the commit thread 24 (#A) issues a commit execution command to the filter driver 21 (#A) (S81). If commits were performed back to back without pause, the load on the server machine 10 (#A) could stay at 100%, leaving no capacity for other processing. To avoid this, after step S81 the commit thread sleeps for an interval short enough that the cache area 171a does not fill up, for example 10 seconds (S82), and then continues committing (S83).
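
The commit loop of steps S81 to S83 can be sketched as follows. This is a minimal Python illustration, not the patented commit thread: the function name `run_commit_thread`, the `issue_commit` callback, and the use of a `threading.Event` for shutdown are assumptions added for the example. The 10-second default mirrors the example interval given in the text.

```python
import threading


def run_commit_thread(issue_commit, stop_event, interval_s=10.0):
    """Issue a commit, sleep, and repeat until asked to stop.

    issue_commit: callable that sends the commit execution command (S81)
    stop_event:   threading.Event used to end the loop (hypothetical addition)
    interval_s:   sleep between commits; the text gives 10 seconds as an example
    """
    while not stop_event.is_set():
        issue_commit()                # S81: ask the filter driver to commit
        # S82: sleep so the commit loop does not monopolize the machine.
        # Event.wait returns early if a stop is requested during the sleep.
        stop_event.wait(interval_s)
        # S83: loop back and commit again
```

In practice the function would run on its own thread, for example `threading.Thread(target=run_commit_thread, args=(commit_fn, stop), daemon=True).start()`, where `commit_fn` and `stop` are supplied by the caller. The interval is a trade-off: it must be long enough to leave CPU time for other work and short enough that the cache area does not fill between commits.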

Next, the flow of processing by the filter driver 21 (#A) after the server machine 10 (#A) has been set as the active system will be described with reference to the flowchart of FIG. 13.

First, the filter driver 21 (#A) of the server machine 10 (#A) waits for an I/O input from the upper layer (S91). If the I/O input is a commit execution command, the process proceeds to step S93; otherwise, it proceeds to step S100.

In step S93, the commit right management table 170a is consulted to check whether the commit right has been granted to the server machine 10 (#A). If it has, then in step S94 the commit thread 24 (#A) acquires a lock that blocks the application 11 (#A) from issuing read/write commands during the commit.

The filter driver 21 (#A) then refers to the memory cache area 25 (#A) and determines the range to commit (S95), after which the commit, that is, the application of the contents of the memory cache area 25 (#A) to the master area 30, is performed (S96). The committed entries are then deleted from the memory cache area 25 (#A) and the cache area 171a (S97), and the lock acquired in step S94 is released (S98). The process then proceeds to step S120.

In step S120, the process ends if the OS is shutting down; otherwise, the process returns to step S91 (S120).

If, on the other hand, the commit right has not been granted to the server machine 10 (#A) in step S93, the filter driver 21 (#A) returns an error to the upper layer (S99), and the process proceeds to step S120.

In step S100, the type of the I/O input received in step S91 is determined (S100). If the I/O input is a read from the master area 30, the process proceeds to step S101; if it is a write, to step S109; otherwise, to step S118.

In step S101, the application 11 (#A) acquires a lock that blocks the commit thread 24 (#A) from issuing a commit command during the read (S101). If part or all of the data to be read is in the memory cache area 25 (#A) (S102), the corresponding data is read from the memory cache area 25 (#A) (S103), and the process proceeds to step S104. If none of the data to be read is in the memory cache area 25 (#A), step S103 is skipped and the process proceeds to step S104.

In step S104, if part or all of the data to be read is not in the cache area 171a, only the remaining data is read from the master area 30 (S105), and the process proceeds to step S106. Otherwise, step S105 is skipped and the process proceeds to step S106.

In step S106, the lock acquired in step S101 is released. The data that has been read is then merged into one and returned to the upper layer that requested the read (S107), and the process proceeds to step S120.

In step S109, the application 11 (#A) acquires a lock that blocks the commit thread 24 (#A) from issuing a commit command during the write. It is then determined whether the memory cache area 25 (#A) has enough free space for the write (S110). If it does, then in step S111 the filter driver 21 (#A) of the server machine 10 (#A) writes three items, namely the write start address on the shared disk device 13, the length, and the raw data, into the memory cache area 25 (#A) of the server machine 10 (#A) and into the data table 171c of the cache area 171a (S111). After the lock acquired in step S109 is released (S112), the data written in step S111 is returned to the upper layer (S113). The process then proceeds to step S120.

If, on the other hand, it is determined in step S110 that the cache area 171a does not have enough free space for the write, the lock acquired in step S109 is released (S114), and in step S115 the filter driver 21 (#A) issues a commit execution command (S115). If the commit execution command succeeds, the process returns to step S109; if it does not succeed, the filter driver 21 (#A) returns an error to the upper layer (S117), and the process proceeds to step S120.

In step S118, the request is passed unchanged to the disk driver 22 (#A); then, in step S119, the return value is returned unchanged to the upper layer, and the process proceeds to step S120.
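
The locking discipline of steps S93 to S117 can be sketched as follows. This is a simplified Python model under stated assumptions, not the patented driver: the class name `CachedVolume`, the block-number dictionaries, the block-count `capacity` (assumed to be at least 1), and the `has_commit_right` callback are all hypothetical. It shows a single lock shared by commits and by reads and writes, the mirroring of each write into both the memory cache and the on-disk cache, and the release of the lock before a commit is requested from the write path.

```python
import threading


class CachedVolume:
    def __init__(self, master, capacity, has_commit_right):
        self.lock = threading.Lock()
        self.master = master        # dict {block: data}, stands in for master area 30
        self.mem_cache = {}         # stands in for memory cache area 25
        self.disk_cache = {}        # stands in for cache area 171a on the shared disk
        self.capacity = capacity    # max cached blocks (assumed >= 1)
        self.has_commit_right = has_commit_right  # callable returning True/False

    def commit(self):
        if not self.has_commit_right():          # S93
            return False                         # S99: error to the caller
        with self.lock:                          # S94: block reads and writes
            self.master.update(self.mem_cache)   # S95-S96: apply memory cache
            self.mem_cache.clear()               # S97: drop committed entries
            self.disk_cache.clear()              #      from both caches
        return True                              # S98: lock released on exit

    def write(self, block, data):
        while True:
            with self.lock:                      # S109: block commits
                has_room = (block in self.mem_cache
                            or len(self.mem_cache) < self.capacity)  # S110
                if has_room:
                    self.mem_cache[block] = data   # S111: write to both caches
                    self.disk_cache[block] = data
                    return True                    # S112-S113
            # S114: the lock is released before committing, otherwise the
            # commit below would wait on a lock this caller still holds.
            if not self.commit():                # S115
                return False                     # S117

    def read(self, block):
        with self.lock:                          # S101: block commits
            # S102-S105: memory cache first, then the master area
            return self.mem_cache.get(block, self.master.get(block))
```

Releasing the lock before the commit in the write path matters because `threading.Lock` is not reentrant; the flowchart makes the same choice in step S114.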

Next, the flow of processing by the cluster software 12 (#B) of the server machine 10 (#B) when the server machine 10 (#B) detects a loss of heartbeat and fails over will be described with reference to the flowchart of FIG. 14.

First, when the cluster software 12 (#B) of the server machine 10 (#B) detects a loss of heartbeat (S131), for example because the cluster software 12 (#A) of the server machine 10 (#A) has slowed down during a write, as indicated by X in FIG. 7, or during a commit, as indicated by Y in FIG. 7, the cluster software 12 (#B) sets the commit right management table 170a so that the commit right is granted to the server machine 10 (#B) (S132). In either case, the cluster software 12 (#A) that has slowed down is assumed to recover after a while.

Next, the cluster software 12 (#B) copies all data in the cache area 171a for the server machine 10 (#A), including data that has not yet been committed or is in the middle of being committed, to the patch area 172b of the server machine 10 (#B) (S133).

The cluster software 12 (#B) then updates the patch area holding state management table 170b so that it designates the server machine 10 (#B) as holding a patch area (S134).

Next, the cluster software 12 (#B) notifies the filter driver 21 (#B) that the server machine 10 (#B) has become the active system (S135). Thereafter, the filter driver 21 (#B) of the server machine 10 (#B) makes the master area 30 visible from the upper layer (S136). The cluster software 12 (#B) of the server machine 10 (#B) then starts the commit thread 24 (#B) (S137) and, after that, the application 11 (#B) (S138).
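
The interplay between the failover steps S132 to S134 and the later activation step S74 can be sketched as follows. This is an illustrative Python model with hypothetical names; the shared QUORUM file is modeled as a plain dictionary, and the cache, patch and master areas as dictionaries keyed by block number. Clearing the patch area and resetting the holder entry after the patch has been applied is an assumption of this sketch; the text does not state what happens to them.

```python
def fail_over(state, failed, survivor):
    """Failover as seen by the surviving node's cluster software."""
    state["commit_holder"] = survivor                        # S132
    state["patch"][survivor] = dict(state["cache"][failed])  # S133
    state["patch_holder"] = survivor                         # S134: record who
                                                             # holds pending data


def activate(state, node):
    """Activation of `node`, including the patch replay of step S74."""
    state["commit_holder"] = node                            # S72
    holder = state["patch_holder"]
    if holder is not None:                                   # S74
        state["master"].update(state["patch"][holder])
        state["patch"][holder] = {}     # assumption: cleared once applied
        state["patch_holder"] = None    # assumption: entry reset once applied
```

Because the holder of pending patch data is recorded on the shared disk, whichever node is activated next can replay that data into the master area, even if the node that took over during the failover has itself been stopped in the meantime.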

Next, the flow of processing by the filter driver 21 (#B) after failover to the server machine 10 (#B) is described with reference to the flowchart of FIG. 15.

First, the filter driver 21 (#B) of the server machine 10 (#B) waits for an I/O input from the upper layer (S141). If the I/O input is a commit execution command, processing proceeds to step S143; otherwise, it proceeds to step S150 (S142).

In step S143, the commit right management table 170a is consulted to check whether the commit right has been granted to the server machine 10 (#B). If it has, then in step S144 the commit thread 24 (#B) acquires a lock, which blocks the application 11 (#B) from issuing read/write commands while the commit is in progress.

The filter driver 21 (#B) then refers to the memory cache area 25 (#B) and determines the range to be committed (S145), after which the commit, that is, the reflection of data from the memory cache area 25 (#B) into the master area 30, is performed (S146). The committed cache entries are then deleted from the memory cache area 25 (#B) and the cache area 172a (S147), and the lock acquired in step S144 is released (S148). Processing then proceeds to step S174.

In step S174, processing ends if the OS is shutting down; otherwise, it returns to step S141.

If, on the other hand, the commit right has not been granted to the server machine 10 (#B) in step S143, the filter driver 21 (#B) returns an error to the upper layer (S149), and processing proceeds to step S174.
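The commit branch (S143 to S149) can be sketched as follows. This is a minimal model, not the patented implementation: `handle_commit`, `CommitError`, and the dictionary-based areas are hypothetical, and the lock shared with the application is assumed to be a plain `threading.Lock`.

```python
import threading

class CommitError(Exception):
    """Raised when this server does not hold the commit right (S149)."""

def handle_commit(me, commit_right_owner, mem_cache, disk_cache, master, lock):
    # S143: consult the commit right management table 170a.
    if commit_right_owner != me:
        raise CommitError(f"server {me} has no commit right")  # S149
    with lock:                                 # S144 acquire, S148 release
        committed = list(mem_cache.items())    # S145: fix the range to commit
        for addr, data in committed:           # S146: reflect into the master area
            master[addr] = data
        for addr, _ in committed:              # S147: drop the committed entries
            mem_cache.pop(addr, None)
            disk_cache.pop(addr, None)
    return len(committed)

lock = threading.Lock()
mem = {0: b"a", 8: b"b"}      # memory cache area 25 (#B)
dc = dict(mem)                # cache area 172a holds the same entries
master = {}                   # master area 30
n = handle_commit("B", "B", mem, dc, master, lock)
```

Holding the lock for the whole of S145 to S147 is what keeps the application from reading or writing a half-committed cache.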

In step S150, the type of the I/O input received in step S141 is determined. If the I/O input is a read from the master area 30, processing proceeds to step S151; if it is a write, to step S160; otherwise, to step S172.

In step S151, if part or all of the data to be read is in the patch area 172b, the corresponding data is read from the patch area 172b (S152), and processing proceeds to step S153. If none of the data to be read is in the patch area 172b, step S152 is skipped and processing proceeds to step S153.

In step S153, the application 11 (#B) acquires a lock, which blocks the commit thread 24 (#B) from issuing a commit command during the read. If part or all of the data to be read is in the memory cache area 25 (#B) (S154), the corresponding data is read from the memory cache area 25 (#B) (S155), and processing proceeds to step S156. If none of the data to be read is in the memory cache area 25 (#B), step S155 is skipped and processing proceeds to step S156.

In step S156, if part or all of the data to be read is in neither the cache area 172a nor the patch area 172b, only that remaining data is read from the master area 30 (S157), and processing proceeds to step S158. Otherwise, step S157 is skipped and processing proceeds to step S158.

In step S158, the lock acquired in step S153 is released. The data read so far is then merged into a single result and returned to the upper layer that issued the read request (S159), and processing proceeds to step S174.
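The layered read of S151 to S159 can be sketched as below. The sketch assumes byte-granular areas modelled as dictionaries from address to byte value, gives the patch area precedence where an address appears in more than one area (the text does not state a precedence), and treats unwritten master addresses as zero; the function name `read` and its parameters are hypothetical.

```python
import threading

def read(start, length, patch, mem_cache, master, lock):
    wanted = range(start, start + length)
    result = {}
    for addr in wanted:                          # S151-S152: patch area 172b first
        if addr in patch:
            result[addr] = patch[addr]
    with lock:                                   # S153 acquire, S158 release
        for addr in wanted:                      # S154-S155: memory cache area 25
            if addr not in result and addr in mem_cache:
                result[addr] = mem_cache[addr]
        for addr in wanted:                      # S156-S157: remainder from master area 30
            if addr not in result:
                result[addr] = master.get(addr, 0)
    return bytes(result[addr] for addr in wanted)  # S159: merge into one buffer

master = {i: ord("m") for i in range(4)}
patch = {1: ord("p")}
mem_cache = {2: ord("c")}
data = read(0, 4, patch, mem_cache, master, threading.Lock())  # b"mpcm"
```

Only the addresses found in neither the patch area nor the cache are fetched from the master area, which matches the "remaining data only" wording of S157.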

In step S160, if part or all of the write destination is in the patch area 172b, the corresponding data is written to the patch area 172b (S161), and processing proceeds to step S162.

In step S162, if the entire write destination was in the patch area 172b, processing proceeds to step S167, where the result is returned to the upper layer, and then to step S174. If not all of the write destination was in the patch area 172b, processing proceeds to step S163.

In step S163, the application 11 (#B) acquires a lock, which blocks the commit thread 24 (#B) from issuing a commit command during the write. It is then determined whether the memory cache area 25 (#B) has enough free space for the write (S164). If it does, the filter driver 21 (#B) of the server machine 10 (#B) writes three items, namely the write start address on the shared disk device 13, the length, and the raw data, to the memory cache area 25 (#B) of the server machine 10 (#B) and to the cache area 172a (S165). After the lock acquired in step S163 is released (S166), the data written in step S165 is returned to the upper layer (S167). Processing then proceeds to step S174.

If, on the other hand, it is determined in step S164 that the cache area 172a does not have enough free space for the write, the lock acquired in step S163 is released (S168), and the filter driver 21 (#B) issues a commit execution command (S169). If the commit execution command succeeds, processing returns to step S163; if it fails, the filter driver 21 (#B) returns an error to the upper layer (S171), and processing proceeds to step S174.
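The write branch (S160 to S171) can be sketched as follows. Several simplifications are assumptions of this sketch rather than part of the description: areas are byte-granular dictionaries instead of (start address, length, raw data) records, capacity is counted in entries, the `commit` callback is assumed to empty the cache when it succeeds, and the early size guard is added only so that the retry loop always terminates. All names are hypothetical.

```python
import threading

def write(start, data, patch, mem_cache, disk_cache, capacity, commit, lock):
    rest = {}
    for offset, byte in enumerate(data):
        addr = start + offset
        if addr in patch:
            patch[addr] = byte           # S160-S161: update the patch area 172b in place
        else:
            rest[addr] = byte
    if not rest:                         # S162: everything landed in the patch area
        return "ok"                      # S167
    if len(rest) > capacity:
        return "error"                   # guard added for this sketch, not in the source
    while True:
        with lock:                       # S163: block the commit thread
            needed = sum(1 for addr in rest if addr not in mem_cache)
            if len(mem_cache) + needed <= capacity:    # S164: enough free space?
                mem_cache.update(rest)   # S165: memory cache area 25 ...
                disk_cache.update(rest)  # ... and cache area 172a
                return "ok"              # S166 on leaving `with`, then S167
        # S168: the lock has been released at this point.
        if not commit():                 # S169: issue a commit execution command
            return "error"               # S171

master, mem, dc = {}, {}, {}
patch = {0: 0}
lock = threading.Lock()

def commit():
    with lock:                           # the commit takes the same lock (S144)
        master.update(mem)
        mem.clear()
        dc.clear()
    return True

r1 = write(0, b"ab", patch, mem, dc, 2, commit, lock)   # byte 0 -> patch, byte 1 -> cache
r2 = write(2, b"cd", patch, mem, dc, 2, commit, lock)   # cache full -> commit, then retry
```

Releasing the lock before requesting the commit (S168 before S169) matters here: the commit path takes the same lock, so calling it while still holding the lock would deadlock.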

In step S172, the request is passed unchanged to the disk driver 22 (#B). In step S173, the return value is returned unchanged to the upper layer, and processing proceeds to step S174.
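Taken together, the branching of S141, S142, S150, and S172 to S174 amounts to a small dispatcher. The sketch below assumes requests are dictionaries with hypothetical `kind` and `target` keys and that the four handlers are supplied by the caller; none of these names come from the description.

```python
def dispatch(request, on_commit, on_read, on_write, disk_driver):
    kind = request["kind"]
    if kind == "commit":                         # S142: commit execution command
        return on_commit(request)                # continue at S143
    to_master = request.get("target") == "master"
    if kind == "read" and to_master:             # S150: read from the master area
        return on_read(request)                  # continue at S151
    if kind == "write" and to_master:            # S150: write to the master area
        return on_write(request)                 # continue at S160
    return disk_driver(request)                  # S172-S173: pass through unchanged

def serve(requests, **handlers):
    """Model of the S141/S174 loop: handle requests until none remain."""
    return [dispatch(request, **handlers) for request in requests]

results = serve(
    [{"kind": "read", "target": "master"}, {"kind": "ioctl"}, {"kind": "commit"}],
    on_commit=lambda r: "commit",
    on_read=lambda r: "read",
    on_write=lambda r: "write",
    disk_driver=lambda r: "passthrough",
)
```

Anything that is not a commit, a master-area read, or a master-area write falls through to the disk driver untouched, so the filter driver stays transparent to other I/O.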

The flow of processing by the commit thread 24 (#B) after failover to the server machine 10 (#B) is the same as that described for the flowchart of FIG. 12, with the server machine 10 (#A), the filter driver 21 (#A), the commit thread 24 (#A), and the cache area 171a read as the server machine 10 (#B), the filter driver 21 (#B), the commit thread 24 (#B), and the cache area 172a, respectively.

As described above, in the cluster system according to the present embodiment, the common area 170, the server A area 171, and the server B area 172 are provided within the master area 30. Because no special LU or partition has to be newly prepared, the system can be applied easily to, for example, a cluster system that is already in operation.

In addition, in a configuration such as that of the first embodiment, a commit by the filter driver 21 requires two disk accesses: a read of the data from the cache area (171a or 172a) and a write of the data to the master area 30. According to the present embodiment, the data to be committed is taken from the memory cache area 25, so the number of disk accesses is reduced. This lowers the load on the shared disk device 13 and allows I/O from the upper-layer application to be processed faster.

Furthermore, in the cluster system according to the present embodiment, when the commit right has been granted to its own server machine 10, the commit thread 24 periodically copies the data in the cache area (171a or 172a) for that server machine 10 to the master area 30, either directly or via the filter driver 21. Copying the data in the cache area (171a or 172a) to the master area 30 can therefore proceed in parallel with the processing of ordinary I/O input, which prevents delays in I/O from the upper-layer application and improves speed.
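A periodic commit thread of this kind can be sketched as below. The sketch assumes the commit right is looked up in a hypothetical dictionary `table`, models the cache and master areas as dictionaries, and uses a `threading.Event` to stop the loop; the function names and the interval value are illustrative only.

```python
import threading

def commit_once(me, table, cache, master, lock):
    """Copy this server's cache into the master area if it holds the commit right."""
    if table["commit_right"] != me:
        return False
    with lock:                       # keep ordinary I/O out while copying
        master.update(cache)
        cache.clear()
    return True

def commit_loop(me, table, cache, master, lock, stop, interval=1.0):
    while True:
        commit_once(me, table, cache, master, lock)
        if stop.wait(interval):      # wait() returns True once `stop` is set
            break

table = {"commit_right": "A"}
cache, master = {0: b"x"}, {}
lock, stop = threading.Lock(), threading.Event()
worker = threading.Thread(
    target=commit_loop, args=("A", table, cache, master, lock, stop, 0.01)
)
worker.start()
stop.set()       # the loop commits at least once before it checks `stop`
worker.join()
```

Because the loop runs in its own thread and holds the lock only for the duration of a copy, ordinary reads and writes are delayed only while a commit is actually in progress.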

Moreover, the cluster system according to the present embodiment includes the patch area holding state management table 170b, which manages the patch area holding state. By referring to this table, it is possible to determine whether each server machine 10 (#A, #B) holds a patch area (171b or 172b) that must still be reflected in the master area 30. Thus, for example, even when the server machine 10 (#B) is to be stopped, the state of the server machine 10 (#A) can be checked. This avoids both forgetting a required reflection from the patch area 171b into the master area 30 and duplicate writes by the two systems, the server machine 10 (#A) and the server machine 10 (#B), so that the patch area (171b or 172b) is reliably reflected in the corresponding part of the master area 30.

The best mode for carrying out the present invention has been described above with reference to the accompanying drawings, but the present invention is not limited to this configuration. Within the scope of the technical idea set forth in the claims, those skilled in the art will be able to conceive of various changes and modifications, and it is understood that such changes and modifications also fall within the technical scope of the present invention.

FIG. 1 is a functional block diagram showing a configuration example of the cluster system according to the first embodiment.
FIG. 2 is a conceptual diagram showing a detailed configuration example of the shared disk device in the second embodiment.
FIG. 3 is a flowchart showing the flow of processing by the cluster software when a server machine is set as the active system.
FIG. 4 is a conceptual diagram showing the flow of processing by the cluster software when a server machine is set as the active system, and by the filter driver after the setting.
FIG. 5 is a flowchart showing the flow of processing by the filter driver after a server machine has been set as the active system.
FIG. 6 is a flowchart showing the flow of processing by the cluster software of a server machine when that server machine detects a loss of heartbeat and fails over.
FIG. 7 is a conceptual diagram showing the flow of processing by the cluster software of a server machine when that server machine detects a loss of heartbeat and fails over, and by the filter driver after failover to that server machine.
FIG. 8 is a flowchart showing the flow of processing by the filter driver after failover to a server machine.
FIG. 9 is a functional block diagram showing a configuration example of the cluster system according to the second embodiment.
FIG. 10 is a conceptual diagram showing a detailed configuration example of the shared disk device in the second embodiment.
FIG. 11 is a flowchart showing the flow of processing by the cluster software when a server machine is set as the active system.
FIG. 12 is a flowchart showing the flow of processing by the commit thread after a server machine has been set as the active system.
FIG. 13 is a flowchart showing the flow of processing by the filter driver after a server machine has been set as the active system.
FIG. 14 is a flowchart showing the flow of processing by the cluster software when a server machine detects a loss of heartbeat and fails over.
FIG. 15 is a flowchart showing the flow of processing by the filter driver after failover to a server machine.
FIG. 16 is a conceptual diagram for explaining failover in a prior-art cluster system.
FIG. 17 is a conceptual diagram for explaining the data corruption that occurs at failover in a prior-art cluster system.
FIG. 18 is a conceptual diagram for explaining failover in a prior-art cluster system that uses a reserve exclusion function.
FIG. 19 is a conceptual diagram for explaining failover in a prior-art cluster system that uses a QUORUM.
FIG. 20 is a conceptual diagram for explaining the data corruption that occurs at failover in a prior-art cluster system that uses a QUORUM.

Explanation of symbols

a ... heartbeat, b ... failover, 10 ... server machine, 11 ... application, 12 ... cluster software, 13 ... shared disk device, 14 ... communication path, 15 ... data area, 17 ... QUORUM area, 18 ... application data area, 21 ... filter driver, 22 ... disk driver, 23 ... FC cable, 24 ... commit thread, 25 ... memory cache area, 30 ... master area, 31 ... QUORUM file, 107 ... QUORUM area, 170 ... common area, 170a ... commit right management table, 170b ... patch area holding state management table, 171 ... server A area, 171a, 172a ... cache area, 171b, 172b ... patch area, 171c ... data table, 172 ... server B area

Claims

A cluster system composed of a plurality of server machines and a shared disk device connected to the plurality of server machines,
The plurality of server machines include an application that runs on each of the plurality of server machines, cluster software that runs on each of the plurality of server machines, and a filter driver that runs on each of the plurality of server machines,
The shared disk device includes a master area for storing data of the application, an area for each server for storing data of each server machine, and a common area shared by each server machine,
Each server area includes a cache area and a patch area for storing data, respectively.
The common area includes a commit right management unit that defines a server machine to which a commit right that is permission to copy data stored in the cache area to the master area is provided,
Each cluster software provided in the plurality of server machines communicates with each other periodically to check each other's survival state, and each has two types of states, an active system and a standby system,
When the cluster software is the active system,
The cluster software changes the commit right management unit so that the commit right is given only to the own server machine, makes the master area visible from the own server machine side, and starts the application provided in the own server machine, and
The filter driver provided in the own server machine waits for an I/O input; when the I/O input is a read request to the master area, reads from the cache area any part or all of the read request data that is in the cache area for the own server machine, reads any other read request data from the master area, and returns the read data to the I/O input side that made the read request; when the I/O input is a write request to the master area, if there is no free space to write to in the cache area for the own server machine, copies the data in the cache area to the master area and clears the cache area if the commit right is given to the own server machine, or returns an error to the I/O input side that made the write request if the commit right is not given to the own server machine, and thereafter writes the write position information in the master area and the written data to the cache area for the own server machine and returns the write result to the I/O input side that made the write request; and when the I/O input is neither a read request nor a write request to the master area, passes the data of the I/O input to the layer below the filter driver provided in the own server machine and returns the return value to the I/O input side,
When the cluster software is a standby system and the standby cluster software determines that the active cluster software is not alive,
The cluster software that made the determination changes the commit right management unit so that the commit right is given only to the own server machine, copies the data in the cache area for the server machine provided with the active cluster software to the patch area for the own server machine, uses the patch area as a substitute for part of the master area, makes the master area visible from the own server machine side, and starts the application provided in the own server machine, and
The filter driver provided in the own server machine waits for an I/O input after the cluster software provided in the own server machine has become the active system; when the I/O input is a read request to the master area, reads from the patch area any part or all of the read request data that is in the patch area for the own server machine, reads from the cache area any part or all of the read request data that is in the cache area for the own server machine, reads any other read request data from the master area, and returns the read data to the I/O input side that made the read request; when the I/O input is a write request to the master area, if all of the write request destinations are in the patch area for the own server machine, writes to the patch area and returns to the I/O input side, and otherwise writes only the data corresponding to the patch area to the patch area and, if there is no free space to write to in the cache area for the own server machine, refers to the commit right management unit to check whether the commit right is given to the own server machine, copies the data in the cache area for the own server machine to the master area and clears the cache area if it is given, or returns an error to the I/O input side if it is not given, and thereafter writes the write position information in the master area and the written data to the cache area for the own server machine and returns to the I/O input side; and when the I/O input is neither a read request nor a write request to the master area, passes the data of the I/O input to the layer below the filter driver provided in the own server machine and returns the return value to the I/O input side.

A program applied to a cluster system composed of a plurality of server machines and a shared disk device connected to the plurality of server machines,
The plurality of server machines include an application that runs on each of the plurality of server machines, cluster software that runs on each of the plurality of server machines, and a filter driver that runs on each of the plurality of server machines,
The shared disk device includes a master area for storing data of the application, an area for each server for storing data of each server machine, and a common area shared by each server machine,
Each server area includes a cache area and a patch area for storing data, respectively.
The common area includes a commit right management unit that defines a server machine to which a commit right that is permission to copy data stored in the cache area to the master area is provided,
Each cluster software provided in the plurality of server machines communicates with each other periodically to check each other's survival state, and each has two types of states, an active system and a standby system,
When the cluster software is the active system, the program causes the own server machine to realize:
A function by which the cluster software changes the commit right management unit so that the commit right is given only to the own server machine, makes the master area visible from the own server machine side, and starts the application provided in the own server machine, and
A function by which the filter driver provided in the own server machine waits for an I/O input; when the I/O input is a read request to the master area, refers to the cache area for the own server machine, reads from the cache area any part or all of the read request data that is in the cache area for the own server machine, reads any other read request data from the master area, and returns the read data to the I/O input side that made the read request; when the I/O input is a write request to the master area, if there is no free space to write to in the cache area for the own server machine, copies the data in the cache area to the master area and clears the cache area if the commit right is given to the own server machine, or returns an error to the I/O input side that made the write request if the commit right is not given to the own server machine, and thereafter writes the write position information in the master area and the written data to the cache area for the own server machine and returns the write result to the I/O input side that made the write request; and when the I/O input is neither a read request nor a write request to the master area, passes the data of the I/O input to the layer below the filter driver provided in the own server machine and returns the return value to the I/O input side,
When the cluster software is a standby system, the program further causes the own server machine to realize:
A function by which the standby cluster software determines whether the active cluster software is alive and, when it determines that the active cluster software is not alive, changes the commit right management unit so that the commit right is given only to the own server machine, copies the data in the cache area for the server machine provided with the active cluster software to the patch area for the own server machine, uses the patch area as a substitute for part of the master area, makes the master area visible from the own server machine side, and starts the application provided in the own server machine, and
A function by which the filter driver provided in the own server machine waits for an I/O input after the cluster software provided in the own server machine has become the active system; when the I/O input is a read request to the master area, reads from the patch area any part or all of the read request data that is in the patch area for the own server machine, refers to the cache area for the own server machine and reads from the cache area any part or all of the read request data that is in the cache area for the own server machine, reads any other read request data from the master area, and returns the read data to the I/O input side that made the read request; when the I/O input is a write request to the master area, if all of the write request destinations are in the patch area for the own server machine, writes to the patch area and returns to the I/O input side, and otherwise writes only the data corresponding to the patch area to the patch area and, if there is no free space to write to in the cache area for the own server machine, refers to the commit right management unit to check whether the commit right is given to the own server machine, copies the data in the cache area for the own server machine to the master area and clears the cache area if it is given, or returns an error to the I/O input side if it is not given, and thereafter writes the write position information in the master area and the written data to the cache area for the own server machine and returns to the I/O input side; and when the I/O input is neither a read request nor a write request to the master area, passes the data of the I/O input to the layer below the filter driver provided in the own server machine and returns the return value to the I/O input side.

The program according to claim 2, wherein
Each filter driver comprises a memory cache area holding the same data as the cache area for the own server machine, and
The program refers to the memory cache area for the own server machine instead of referring to the cache area for the own server machine, writes the write position information in the master area and the written data to both the memory cache area and the cache area for the own server machine instead of writing them only to the cache area for the own server machine, and copies the data in the memory cache area to the master area instead of copying the data in the cache area to the master area.