JP4874847B2

JP4874847B2 - Cluster system

Info

Publication number: JP4874847B2
Application number: JP2007081624A
Authority: JP
Inventors: 孝治村松; 哲也飯沼; 茂夫大道; 雅田中
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2007-03-27
Filing date: 2007-03-27
Publication date: 2012-02-15
Anticipated expiration: 2027-03-27
Also published as: JP2008242742A

Description

The present invention relates to a cluster system and a program comprising a plurality of server machines, and more particularly to a cluster system that can keep data consistent even when a failover occurs.

Consider, for example, a cluster system such as the one disclosed in Non-Patent Document 1. As shown in FIG. 20, it has an active server machine 10 (#A) and a standby server machine 10 (#B). The active server machine 10 (#A) executes an application 11 (#A) and, as indicated by (101), writes the resulting data to a mirroring disk device 20 (#A). Meanwhile, as indicated by (102), the cluster software 12 (#A) of the active server machine 10 (#A) and the cluster software 12 (#B) of the standby server machine 10 (#B) continuously exchange predetermined packets, called heartbeat a, over a communication path 50 to notify each other that they are alive.

Further, as indicated by (103), mirroring runs continuously while the heartbeat continues, so the data stored in the mirroring disk device 20 (#A) is mirrored to the mirroring disk device 20 (#B).

In this state, when the cluster software 12 (#B) detects that heartbeat a has been interrupted, a so-called failover b is generally performed, as indicated by (104): the same application, 11 (#B), is started on the standby server machine 10 (#B) so that application processing continues.
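A minimal sketch of how a standby node might decide that heartbeat a has been interrupted. The class name `HeartbeatMonitor`, the timeout value, and the injected monotonic clock are assumptions for illustration; the text does not say how the cluster software 12 measures the interruption. Note that a pure timeout cannot tell a crashed peer from one that is merely paused.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical standby-side detector for heartbeat a (not the patent's code)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = clock()

    def on_heartbeat(self) -> None:
        # Called whenever a heartbeat packet from the peer arrives over path 50.
        self.last_seen = self.clock()

    def peer_presumed_dead(self) -> bool:
        # A long pause on the peer looks exactly like a crash to this check.
        return self.clock() - self.last_seen > self.timeout_s


# Usage with a fake clock so the example is deterministic.
now = [0.0]
mon = HeartbeatMonitor(timeout_s=10.0, clock=lambda: now[0])
now[0] = 5.0
mon.on_heartbeat()
now[0] = 12.0
print(mon.peer_presumed_dead())  # False: last heartbeat 7 s ago
now[0] = 16.0
print(mon.peer_presumed_dead())  # True: 11 s of silence, failover b would start
```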

The main reasons heartbeat a is interrupted are as follows:
[1] The active server machine 10 (#A) goes down.
[2] The communication path 50 fails.
[3] The active server machine 10 (#A) slows down temporarily because of high CPU load or the like.

In case [1], it is sufficient simply to fail over to the standby server machine 10 (#B). In cases [2] and [3], however, the active server machine 10 (#A) keeps processing, so performing failover b causes the following problems.

As shown in FIG. 21, consider a cluster configured for database server applications (hereinafter "DB servers") 13 (#A, #B) on server machines 10 (#A, #B) whose mirroring disk devices 20 (#A) and 20 (#B) are mirrored over a communication line 52. If failover b occurs because heartbeat a is interrupted by cause [3], mirroring between the mirroring disk devices 20 (#A) and 20 (#B) is forcibly stopped. This prevents data corruption from a mix of writes by the DB server 13 (#B) newly started on the former standby server machine 10 (#B) and writes by the DB server 13 (#A) on the former active server machine 10 (#A).

However, because the former standby server machine 10 (#B) takes over the IP address, data, and everything else of the former active server machine 10 (#A), the database client (hereinafter "DB client") 61 is unaware that the DB server 13 (#A) has failed over to the DB server 13 (#B), and transaction consistency may be lost.

For example, as shown in FIG. 21, suppose the DB server 13 (#A) of the server machine 10 (#A) receives a commit request from the DB client 61 in the client 60 (201) and then slows down (202). The cluster software 12 (#B) detects the loss of heartbeat a (203), mirroring is stopped (204), and the server machine 10 (#B) performs failover (205). The DB server 13 (#A) of the server machine 10 (#A), however, later recovers, writes to its own mirroring disk device 20 (#A), and completes normally (206). Because mirroring has already been stopped, the write is not mirrored (207), yet commit success is returned to the DB client 61 (208).

Because this commit is not reflected on the server machine 10 (#B), when the server machine 10 (#B) receives a rollback request from the DB client 61 after failover (209), its DB server 13 (#B) rolls back to the checkpoint one before the checkpoint the DB client 61 intended (210). If the database is then updated or committed on database B (212) in response to a further request from the DB client 61 to the server machine 10 (#B) (211), both the DB server 13 (#A) of the server machine 10 (#A) and the DB server 13 (#B) of the server machine 10 (#B) end up in states different from what they should be.
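The divergence in steps (201) to (212) can be reproduced with a toy model in which each DB server remembers only its committed checkpoints and mirroring copies a commit only while it is enabled. The `Replica` class, the `commit` helper, and the checkpoint labels are illustrative assumptions; a real DB server tracks far more state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy DB server that only remembers its committed checkpoints (hypothetical)."""
    name: str
    commits: list[str] = field(default_factory=list)

    def rollback_target(self) -> str:
        # A rollback returns the database to the latest commit this replica knows.
        return self.commits[-1]


def commit(primary: Replica, mirror: Replica, checkpoint: str, mirroring: bool) -> str:
    primary.commits.append(checkpoint)
    if mirroring:
        mirror.commits.append(checkpoint)
    return "commit ok"  # reported to the DB client whether or not it was mirrored


a = Replica("A", ["cp1"])
b = Replica("B", ["cp1"])
mirroring = False                          # (203)-(205): B saw heartbeat a stop, took over
reply = commit(a, b, "cp2", mirroring)     # (206)-(208): A wakes up, commits locally
print(reply, a.commits, b.commits)         # the client believes cp2 is durable
print(b.rollback_target())                 # (209)-(210): B returns to cp1, not cp2
```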

As shown in FIG. 22, the same problem can also occur when the cluster is built on a shared disk device 31 instead of the mirroring disk devices 20 (#A, #B). The shared disk device 31 has a server A data area 32 (#A), to which the server machine 10 (#A) writes its data, and a server B data area 32 (#B), to which the server machine 10 (#B) writes its data.

As shown in FIG. 22, suppose the DB server 13 (#A) of the server machine 10 (#A) receives a commit request from the DB client 61 in the client 60 (301) and then slows down (302). When the cluster software 12 (#B) detects the loss of heartbeat a (303), the data in the server A data area 32 (#A) is copied to the server B data area 32 (#B) (304), and the server machine 10 (#B) performs failover (305). The DB server 13 (#A) of the server machine 10 (#A), however, later recovers, writes to the server A data area 32 (#A), completes normally (306), and returns commit success to the DB client 61 (307).

Because this commit is not reflected on the server machine 10 (#B), when the server machine 10 (#B) receives a rollback request from the DB client 61 after failover (308), its DB server 13 (#B) rolls back to the checkpoint one before the checkpoint the DB client 61 intended (309). If the DB server 13 (#B) then updates or commits the database (311) in response to a further request from the DB client 61 to the server machine 10 (#B) (310), both the DB server 13 (#A) of the server machine 10 (#A) and the DB server 13 (#B) of the server machine 10 (#B) end up in states different from what they should be.

For these reasons, conventional cluster systems take countermeasures such as immediately stopping or restarting both the server machine 10 (#A) and the server machine 10 (#B) when the former active server machine 10 (#A) recovers and heartbeat a resumes, or multiplexing the communication path 50 between the cluster software 12 (#A) and 12 (#B) so that heartbeat a is not interrupted even if one of the paths fails.

Non-Patent Document 1: Toshiba Review, Vol. 54, No. 12 (1999), pp. 18-21

However, such conventional cluster systems have the following problem.

Even with the above countermeasures, if the active server machine 10 slows down temporarily because of high CPU load or the like (cause [3] above), the possibility of transaction inconsistency immediately after failover cannot be completely eliminated.

This inconsistency can be avoided by improving the cluster system of FIG. 22 as shown in FIG. 23. A shared area 33 is provided in the shared disk device 31, and access rights to the shared disk device 31 are managed by a disk access right management table 33a stored in the shared area 33. Each server machine 10 is given a filter driver 14, and every time a write or read (hereinafter "I/O") to the shared disk device 31 occurs, the processing result is returned to the upper-level client 60 according to whether the server holds the access right. At failover, the standby server machine 10 (#B) takes the access right away from the active server machine 10 (#A).

In this configuration, when the DB server 13 (#A) of the server machine 10 (#A) receives a commit request from the DB client 61 in the client 60 (401) and then slows down (402), the cluster software 12 (#B) detects the loss of heartbeat a (403) and then rewrites the disk access right management table 33a so that the access right to the shared disk device 31 is given to the server machine 10 (#B) instead of the server machine 10 (#A) (404).

The data in the server A data area 32 (#A) is then copied to the server B data area 32 (#B) (405), and the server machine 10 (#B) performs failover (406). When the DB server 13 (#A) of the server machine 10 (#A) later recovers, the filter driver 14 (#A) writes the processing result of the DB server 13 (#A) to the server A data area 32 (#A) (407) and then refers to the disk access right management table 33a to check whether the server machine 10 (#A) holds the access right (408).

If the access right is held, the filter driver 14 (#A) returns a normal completion to the DB server 13 (#A); otherwise it returns an error (409). Here the access right has been given to the server machine 10 (#B), so an error is returned, and the DB server 13 (#A) passes this error on to the DB client 61 (410).

The client 60 thereby learns that the commit request it made to the server machine 10 (#A) at (401) failed, and retries the commit request, this time against the server machine 10 (#B) (411). The DB server 13 (#B) notifies the DB client 61 that the commit request has completed (412). The DB server 13 (#B) then updates the database (414) in accordance with a database update request made by the DB client 61 to the server machine 10 (#B) (413).

After that, the DB client 61 issues a database rollback request to the DB server 13 (#B) (415), and the DB server 13 (#B) reports to the DB client 61 that the rollback completed successfully (416). The DB client 61 then issues a database update request to the DB server 13 (#B) (417), and the DB server 13 (#B) reports that the update completed successfully (418). Finally, the DB client 61 issues a commit request to the DB server 13 (#B) (419), and the DB server 13 (#B) reports that the commit completed successfully (420).

However, in the configuration of FIG. 23, every disk I/O (read or write) is accompanied by one additional I/O (read or write) to the shared disk device 31, which creates a new problem: disk I/O performance degrades significantly.
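The cost of the FIG. 23 scheme can be made concrete with a sketch in which the filter driver consults the disk access right management table 33a after every I/O. `SharedDisk`, `PerIoCheckFilter`, and the dictionary form of the table are hypothetical stand-ins; the sketch only counts I/Os to show that each application write costs a second disk I/O.

```python
class SharedDisk:
    """Counts I/Os so the overhead of the access check is visible (hypothetical)."""

    def __init__(self, access_table: dict[str, bool]):
        self.access_table = access_table  # stands in for table 33a in shared area 33
        self.io_count = 0

    def read_access_table(self) -> dict[str, bool]:
        self.io_count += 1  # reading table 33a is itself a disk read
        return dict(self.access_table)

    def write_data(self, area: str, payload: bytes) -> None:
        self.io_count += 1  # the application's own write


class PerIoCheckFilter:
    def __init__(self, server: str, disk: SharedDisk):
        self.server, self.disk = server, disk

    def write(self, payload: bytes) -> bool:
        self.disk.write_data(f"data-{self.server}", payload)
        # FIG. 23: consult the table after every I/O, costing one extra disk I/O.
        return self.disk.read_access_table().get(self.server, False)


disk = SharedDisk({"A": True, "B": False})
drv = PerIoCheckFilter("A", disk)
for _ in range(100):
    drv.write(b"x")
print(disk.io_count)  # 200: every application write cost two disk I/Os
```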

FIG. 24, on the other hand, shows a configuration in which the improvement of FIG. 23 is applied to the cluster system with the mirroring disk devices 20 shown in FIG. 21.

In this configuration, the DB server 13 (#A) of the server machine 10 (#A) receives a commit request from the DB client 61 in the client 60 (501), and the filter driver 14 (#A) accordingly issues a write corresponding to the commit request to the mirroring disk device 20 (#A) (502).

Suppose that, while the mirroring disk device 20 (#A) is copying this data to the mirroring disk device 20 (#B), the server machine 10 (#A) slows down (503) and later recovers. The cluster software 12 (#B) detects the loss of heartbeat a when the slowdown occurs (504). Even after the server machine 10 (#A) recovers, mirroring from the mirroring disk device 20 (#A) to the mirroring disk device 20 (#B) fails because of a communication timeout or the like (505).

The filter driver 14 (#A) then refers to the disk access right management table 33a to check whether the server machine 10 (#A) holds the access right (506). If, at this point, the cluster software 12 (#B) has not yet rewritten the table to give the access right to the server machine 10 (#B), that is, if the rewrite (507) happens after the check (506), the filter driver 14 (#A) returns a normal completion to the DB server 13 (#A) (508).
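Steps (506) to (508) are a check-then-act race: the check by the filter driver 14 (#A) and the table rewrite by the cluster software 12 (#B) are not ordered with respect to each other. A deterministic replay of the two possible orderings makes the window visible. The event names `check_A` and `grant_B` and the function `filter_result` are our own illustrative labels.

```python
def filter_result(events: list[str]) -> str:
    """Replay one interleaving and return what filter driver 14 (#A) reports.

    'check_A' is step (506); 'grant_B' is the table rewrite (507) by
    cluster software 12 (#B).
    """
    table = {"A": True, "B": False}
    result = "pending"
    for ev in events:
        if ev == "grant_B":
            table = {"A": False, "B": True}
        elif ev == "check_A":
            result = "success" if table["A"] else "error"
    return result


print(filter_result(["grant_B", "check_A"]))  # error: A's commit is rejected
print(filter_result(["check_A", "grant_B"]))  # success: A reports a commit B never sees
```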

The server machine 10 (#B) then performs failover (509), while the DB server 13 (#A) reports the commit result (success) to the DB client 61 (510).

Because this commit is not reflected on the server machine 10 (#B), when the server machine 10 (#B) receives a rollback request from the DB client 61 after failover (511), its DB server 13 (#B) rolls back to the checkpoint one before the checkpoint the DB client 61 intended (512). If the DB server 13 (#B) then updates or commits the database (514) in response to a further request from the DB client 61 to the server machine 10 (#B) (513), both the DB server 13 (#A) of the server machine 10 (#A) and the DB server 13 (#B) of the server machine 10 (#B) end up in states different from what they should be.

Thus, when the improvement of FIG. 23 is applied to a cluster system with mirroring disk devices 20, the problem remains: the possibility of transaction inconsistency immediately after failover still cannot be completely eliminated.

The present invention has been made in view of these circumstances. Its object is to provide a cluster system and a program that, in a cluster system comprising a plurality of server machines, maintain transaction consistency even when a failover occurs, while suppressing degradation of disk I/O performance.

To achieve the above object, the present invention takes the following measures.

The invention of claim 1 is a cluster system comprising a plurality of server machines and a shared disk device connected in common to them. Each server machine includes an application, cluster software, a filter driver, and an expiration notification daemon, all running on that server machine. The shared disk device includes per-server data areas, each storing the data of one server machine, and a shared area shared by all the server machines. The shared area includes an access management unit that manages access information specifying, for each server machine, whether access to the shared disk device is permitted or not.

The expiration notification daemon periodically refers to the access information managed by the access management unit and, if access by its own server machine is permitted, sets an expiration time equal to the current time plus a predetermined time and notifies the filter driver of its own server machine of that expiration time. The cluster software instances on the server machines communicate with each other periodically to confirm that each other's server machine is alive, and each is in one of two states: active or standby.

When a cluster software instance becomes active, it changes the access information managed by the access management unit so that only its own server machine is permitted access, makes its own server's data area visible from its own server machine, and starts the application on its own server machine. After the cluster software has become active, the filter driver of the active server machine waits for I/O input. If some or all of the processing results for an I/O input are successful, it checks the current time: if the current time is within the expiration time notified by the expiration notification daemon, it returns the processing result to the I/O input side; otherwise it returns an error to the I/O input side. If the cluster software of the standby server machine determines that the active server machine is not alive, it changes the access information managed by the access management unit so that only its own server machine is permitted access, waits for a predetermined time, and then makes its own server's data area visible from its own server machine and starts the application on its own server machine.

According to the present invention, a cluster system comprising a plurality of server machines can maintain transaction consistency even when a failover occurs, while suppressing degradation of disk I/O performance.

The best mode for carrying out the present invention is described below with reference to the drawings.

In the drawings used to describe the following embodiments, parts identical to those in FIGS. 20 to 24 are denoted by the same reference numerals, and duplicate descriptions are omitted.

(First embodiment)
FIG. 1 is a functional block diagram showing a configuration example of the cluster system according to the first embodiment.

The cluster system according to this embodiment consists of a plurality of server machines (here, two server machines 10 (#A, #B) are shown as an example) and a shared disk device 31 connected to them. Assume that, in the initial state, the server machine 10 (#A) is active and the server machine 10 (#B) is standby. Each server machine 10 (#A, #B) runs an application 11 (#A, #B), cluster software 12 (#A, #B), a filter driver 14 (#A, #B), a disk driver 15 (#A, #B), and an expiration notification daemon 16 (#A, #B). The filter driver 14 is interposed between the application 11 and the disk driver 15. The application 11 is assumed to be mainly a server application; in this embodiment it is a proxy cache server.

The shared disk device 31 is connected to each server machine 10 (#A, #B) by an FC (Fibre Channel) cable 51 (#A, #B) and has a server A data area 32 (#A), a server B data area 32 (#B), and a shared area 33. These may be separate LUs (Logical Units), separate partitions, or separate files.

FIG. 2 is a conceptual diagram showing a detailed configuration example of the shared disk device 31.

As shown in FIG. 2, the shared area 33 holds the disk access right management table 33a, which specifies, for each server machine 10 (#A, #B), whether access to the shared disk device 31 is permitted. When server A is set to "yes" and server B to "no", as in FIG. 2, access by the server machine 10 (#A) is permitted and access by the server machine 10 (#B) is not.
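A minimal in-memory model of the disk access right management table 33a, assuming it is a small map from server name to a permitted/not-permitted flag; the text does not give its on-disk encoding. `AccessRightTable` and `grant_exclusive` are hypothetical names, the latter mirroring how the cluster software sets the table so that only one server is permitted.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRightTable:
    """Hypothetical view of table 33a: server name -> access permitted."""
    rights: dict[str, bool] = field(default_factory=dict)

    def is_permitted(self, server: str) -> bool:
        # Servers missing from the table are treated as "no".
        return self.rights.get(server, False)

    def grant_exclusive(self, server: str) -> None:
        # Only `server` may access the shared disk; every other entry becomes "no".
        for name in self.rights:
            self.rights[name] = False
        self.rights[server] = True


table = AccessRightTable({"A": True, "B": False})  # the FIG. 2 example
print(table.is_permitted("A"), table.is_permitted("B"))  # True False
table.grant_exclusive("B")
print(table.rights)  # {'A': False, 'B': True}
```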

The expiration notification daemon 16 refers to the disk access right management table 33a periodically (for example, every 30 seconds). If access by its own server machine 10 is permitted, the daemon sets an expiration time equal to the current time plus a predetermined time (for example, 60 seconds) and notifies the filter driver 14 of its own server machine 10 of that expiration time. As shown in FIG. 3, for example, the daemon obtains the current time from a source unaffected by changes to the system clock (such as the elapsed time since boot).
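The daemon's behaviour amounts to a renewable lease. The sketch below assumes Python's `time.monotonic()` as the clock-change-immune time source (the text only says something like elapsed time since boot) and models "notifying the filter driver" as a callback. `ExpiryDaemon`, `poll_once`, and the callback signature are hypothetical; the 30 s and 60 s values are the text's examples.

```python
import time
from typing import Callable

POLL_INTERVAL_S = 30.0  # example polling period from the text
LEASE_S = 60.0          # example "predetermined time" added to the current time


class ExpiryDaemon:
    """Hypothetical expiration notification daemon 16 (one poll per call)."""

    def __init__(self, server: str, read_table: Callable[[], dict[str, bool]],
                 notify_driver: Callable[[float], None],
                 clock: Callable[[], float] = time.monotonic):
        self.server = server
        self.read_table = read_table
        self.notify_driver = notify_driver
        self.clock = clock

    def poll_once(self) -> None:
        # Extend the lease only while table 33a still grants this server access.
        if self.read_table().get(self.server, False):
            self.notify_driver(self.clock() + LEASE_S)

    def run_forever(self) -> None:
        while True:
            self.poll_once()
            time.sleep(POLL_INTERVAL_S)


# Deterministic usage with a fake clock and an in-memory table.
now = [100.0]
table = {"A": True, "B": False}
expiries: list[float] = []
d = ExpiryDaemon("A", lambda: table, expiries.append, clock=lambda: now[0])
d.poll_once()                  # lease for A now ends at 160.0
table.update(A=False, B=True)
now[0] = 130.0
d.poll_once()                  # no new lease: A has lost the access right
print(expiries)                # [160.0]
```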

The cluster software 12 (#A, #B) instances communicate with each other periodically to confirm that each other's server machine 10 is alive, and each is in one of two states: active or standby. In the following description, the server machine 10 (#A) is active and the server machine 10 (#B) is standby, so the cluster software 12 (#A) is in the active state and the cluster software 12 (#B) is in the standby state.

When the cluster software 12 (#A) becomes active, it sets the disk access right management table 33a so that only its own server machine 10 (#A) is permitted access, makes the server data area 32 (#A) visible from the server machine 10 (#A), and starts the application 11 (#A) on the server machine 10 (#A).

After the cluster software 12 (#A) has become active in this way, the filter driver 14 (#A) waits for I/O input, for example from a client 60 such as the one shown in FIG. 21. If some or all of the processing results for the I/O input are successful, it checks the current time. If the current time is within the expiration time notified by the expiration notification daemon 16 (#A), it returns the processing result to the I/O input side (for example, the client 60); otherwise it returns an error. The filter driver 14 (#A) performs this check every time it returns a processing result to the I/O input side.
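A sketch of the per-result check in the filter driver 14, assuming the daemon delivers the expiration time through a setter and that both use the same monotonic clock. Unlike the FIG. 23 scheme, the check reads only an in-memory value, so it adds no I/O to the shared disk device 31. `LeaseFilterDriver`, `set_expiry`, and `complete_io` are illustrative names, and treating the boundary instant as already expired is our conservative choice.

```python
import time
from typing import Callable, Optional


class LeaseFilterDriver:
    """Hypothetical filter driver 14 between application 11 and disk driver 15."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.expiry: Optional[float] = None  # no lease until the daemon reports one

    def set_expiry(self, expiry: float) -> None:
        # Called by the expiration notification daemon 16.
        self.expiry = expiry

    def complete_io(self, lower_result_ok: bool, result: str) -> str:
        # Failures from the disk driver pass through as errors unchanged.
        if not lower_result_ok:
            return "error"
        # A success is released only while the lease is still valid; reading the
        # clock touches no disk, so no extra shared-disk I/O is needed.
        if self.expiry is not None and self.clock() < self.expiry:
            return result
        return "error"


now = [0.0]
drv = LeaseFilterDriver(clock=lambda: now[0])
drv.set_expiry(60.0)
now[0] = 59.0
print(drv.complete_io(True, "commit ok"))  # commit ok
now[0] = 61.0
print(drv.complete_io(True, "commit ok"))  # error: lease expired, success withheld
```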

Conversely, if the cluster software 12 (#B) of the standby server machine 10 (#B) determines that the active server machine 10 (#A) is not alive, it changes the disk access right management table 33a so that only the server machine 10 (#B) is permitted access, waits for a predetermined time (for example, 30 seconds), and only then makes the server data area 32 (#B) visible from the server machine 10 (#B) and starts the application 11 (#B).
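The takeover order (claim the access right, wait, then expose the data and start the application) can be sketched as a single function. `take_over` and its parameters are hypothetical; `sleep` is injectable so the ordering can be observed without actually waiting 30 seconds.

```python
import time
from typing import Callable


def take_over(me: str, table: dict[str, bool], wait_s: float,
              start_app: Callable[[], None],
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Hypothetical takeover by cluster software 12 on the standby server."""
    # 1. Rewrite table 33a so that only this server is permitted access.
    for server in table:
        table[server] = server == me
    # 2. Wait before touching the data, so the old active server's lease can lapse.
    sleep(wait_s)
    # 3. Only now expose server data area 32 and start application 11 (stubbed here).
    start_app()


events = []
table = {"A": True, "B": False}
take_over("B", table, 30.0,
          start_app=lambda: events.append(("start app", dict(table))),
          sleep=lambda s: events.append(("wait", s, dict(table))))
print(events)  # the table already favours B before the wait begins
```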

Because of this wait, the server machine 10 (#B) does not fail over immediately after acquiring the access right, but only after a safe time window has begun. The wait time can be set freely, but the longer it is, the longer failover takes. Conversely, a shorter wait shortens failover but increases the frequency of access to the shared area 33, so a value within one minute, such as 30 seconds, is practical.
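The trade-off can be checked with a toy timing model. Assume the daemon 16 (#A) polls table 33a every `poll_s` seconds, grants its own server `lease_s` seconds on each poll that still shows the access right, and simply stops renewing once the right is gone (the text does not say an existing lease is revoked early). In the worst case a poll coincides with the rewrite by the cluster software 12 (#B), so the old active server may keep releasing successes for up to `lease_s` seconds afterwards. Shortening that overhang means shortening the lease, which in turn requires more frequent polls of the shared area 33. The parameter pairs keep a 2:1 lease-to-poll ratio as an assumption, and the function names are hypothetical.

```python
import math


def lease_end_after_rewrite(rewrite_t: float, poll_offset: float,
                            poll_s: float, lease_s: float) -> float:
    """Seconds after B's table rewrite that A's last lease still runs (toy model).

    A's daemon polls at poll_offset + k * poll_s. Worst case: a poll at the very
    instant of the rewrite still reads the old table and grants A a lease.
    """
    k = math.floor((rewrite_t - poll_offset) / poll_s)
    last_poll = poll_offset + k * poll_s
    return last_poll + lease_s - rewrite_t


def worst_case(poll_s: float, lease_s: float, rewrite_t: float = 1000.0) -> float:
    # Sweep the poll phase in 0.5 s steps and keep the longest overhang.
    phases = [i * 0.5 for i in range(int(poll_s / 0.5))]
    return max(lease_end_after_rewrite(rewrite_t, p, poll_s, lease_s) for p in phases)


for poll_s, lease_s in ((30.0, 60.0), (10.0, 20.0)):
    reads_per_hour = 3600 / poll_s  # each poll reads shared area 33 once
    print(f"poll {poll_s}s, lease {lease_s}s: old lease may outlive the rewrite by "
          f"{worst_case(poll_s, lease_s)}s; {reads_per_hour:.0f} table reads/hour")
```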

Next, the operation of the cluster system according to this embodiment is described. The initial state is assumed to be as follows:
- The cluster software 12 (#A, #B) of the server machines 10 (#A) and 10 (#B) are exchanging heartbeats and have confirmed that each other is alive.
- On both server machines 10 (#A, #B), the shared area 33 and the server data areas 32 (#A, #B) are invisible to the upper layer (fenced off by the filter driver 14).
- The application 11 (#A, #B) is not running on either server machine.

First, the processing flow of the cluster software 12 (#A) when the server machine 10 (#A) is set as the active system is described with reference to the flowchart in FIG. 4.

First, in response to a user operation, a notification that the server machine 10 (#A) is to become the active system reaches the cluster software 12 (#A) of the server machine 10 (#A) (S1). Next, the cluster software 12 (#A) sets the disk access right management table 33a so that the server machine 10 (#A) is given the access right (S2). The cluster software 12 (#A) then notifies the filter driver 14 (#A) that it has become the active system (S3). The filter driver 14 (#A) of the server machine 10 (#A) then makes the server A data area 32 (#A) visible from the upper layer (S4). Finally, the cluster software 12 (#A) starts the expiration notification daemon 16 (#A) (S5) and then the application 11 (#A) (S6).
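The order of steps S2 to S6 matters. The access right is recorded in S2, before the data area is unfenced in S4 and before the expiration notification daemon starts in S5. The following minimal Python sketch shows that ordering. `ServerState` is a hypothetical record standing in for the real components.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServerState:
    """Hypothetical snapshot of the components on one server machine."""
    name: str
    filter_active: bool = False   # filter driver 14 told it is active
    data_visible: bool = False    # server data area 32 unfenced
    daemon_running: bool = False  # expiration notification daemon 16
    app_running: bool = False     # application 11
    log: List[str] = field(default_factory=list)


def become_active(server: ServerState, access_table: Dict[str, bool]) -> None:
    """Run S2 to S6 after the user's request (S1) has reached the cluster software."""
    access_table[server.name] = True  # S2: grant the access right in table 33a
    server.log.append("S2")
    server.filter_active = True       # S3: notify the filter driver
    server.log.append("S3")
    server.data_visible = True        # S4: make the data area visible
    server.log.append("S4")
    server.daemon_running = True      # S5: start the expiration notification daemon
    server.log.append("S5")
    server.app_running = True         # S6: start the application
    server.log.append("S6")
```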

Next, the processing flow of the filter driver 14 (#A) after the server machine 10 (#A) has been set as the active system is described with reference to the flowchart in FIG. 5 and the conceptual diagram in FIG. 6.

First, the filter driver 14 (#A) of the server machine 10 (#A) waits for I/O input from a higher level, such as the client 60 shown in FIG. 21 (S11). When I/O input arrives, the process proceeds to step S13 if the input is a notification of the expiration time (S12: Yes), and to step S14 otherwise (S12: No).

In step S13, the expiration time is updated to the notified value, and the process proceeds to step S24.

In step S14, the process branches on the I/O input received in step S11. It proceeds to step S15 for a read from the server A data area 32 (#A), to step S16 for a write, and to step S22 for any other request.

The read is executed in step S15, or the write in step S16, and the process proceeds to step S17.

In step S17, the process proceeds to step S18 if the I/O succeeded partially or completely, and to step S20 otherwise.

In step S18, the current time is acquired. In step S19, it is determined whether that time is within the expiration time notified by the expiration notification daemon 16 (#A). If it is, the I/O processing result is returned to the higher level as is (S20), and the process proceeds to step S24.

If, on the other hand, step S19 determines that the expiration time has passed, an I/O failure error is returned to the higher level (S21), and the process proceeds to step S24.

In step S22, the request is passed unchanged from the filter driver 14 (#A) to the disk driver 15 (#A). In step S23, the return value from the disk driver 15 (#A) is returned unchanged to the higher level, and the process proceeds to step S24.

In step S24, the process ends if the OS is shutting down; otherwise it returns to step S11.
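Steps S11 to S23 amount to one dispatch function per request. The sketch below models that dispatch in user-space Python. `Request`, `Reply` and `FilterDriverModel` are hypothetical names. A dictionary stands in for the data area, and the disk driver's return value is a placeholder string.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Request:
    kind: str               # "expiry", "read", "write", or any other request
    block: int = 0
    payload: object = None  # new expiration time, or bytes to write


@dataclass
class Reply:
    ok: bool
    data: object = None


class FilterDriverModel:
    """Hypothetical user-space model of one pass of steps S11 to S23."""

    def __init__(self, area: Dict[int, bytes],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.area = area                   # stands in for server A data area 32 (#A)
        self.clock = clock
        self.expires_at: Optional[float] = None

    def handle(self, req: Request) -> Optional[Reply]:
        if req.kind == "expiry":           # S12: Yes
            self.expires_at = req.payload  # S13: update; nothing is returned
            return None
        if req.kind == "read":             # S14 -> S15
            result = Reply(req.block in self.area, self.area.get(req.block))
        elif req.kind == "write":          # S14 -> S16
            self.area[req.block] = req.payload
            result = Reply(True)
        else:                              # S14 -> S22/S23: pass through as is
            return Reply(True, f"passthrough:{req.kind}")  # placeholder value
        if not result.ok:                  # S17: No -> S20, returned unchanged
            return result
        now = self.clock()                 # S18
        if self.expires_at is not None and now < self.expires_at:  # S19
            return result                  # S20
        return Reply(False, "expired")     # S21
```

As in the flowchart, the write in S16 reaches the data area before the check in S19. When the lease has lapsed, only the reply returned to the higher level is replaced by an error. The S24 loop would call `handle` repeatedly until the OS shuts down.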

Next, the processing flow of the expiration notification daemon 16 (#A) after the server machine 10 (#A) has been set as the active system is described with reference to the flowchart in FIG. 7 and the conceptual diagram in FIG. 6.

The expiration notification daemon 16 (#A) refers to the disk access right management table 33a periodically, for example every 30 seconds (S31). If it confirms that the server machine 10 (#A) has the access right (S32), it does the following:
- acquires the current time (S33);
- sets the expiration time by adding a predetermined time (for example, 60 seconds) to the current time (S34);
- notifies the filter driver 14 (#A) of this expiration time (S35);
- proceeds to step S36.

If the access right has not been given in step S32, the process also proceeds to step S36. The processing in step S35 corresponds to step S19 in FIG. 5.

In step S36, the daemon sleeps, for example for 30 seconds. It then returns to step S31, unless the OS is shutting down, in which case this processing also ends.
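A minimal Python sketch of the daemon follows. It assumes the access right table can be read as a dictionary and that a `notify` callable delivers the expiration time to the filter driver. `daemon_tick`, `run_daemon` and `shutting_down` are hypothetical names. With the example values, the renewal period (30 seconds) is shorter than the lease (60 seconds). A healthy active machine therefore renews well before its lease runs out.

```python
import time
from typing import Callable, Dict, Optional

LEASE_SECONDS = 60.0  # example value from the text (S34)
POLL_SECONDS = 30.0   # example value from the text (S31/S36)


def daemon_tick(node: str, access_table: Dict[str, bool],
                notify: Callable[[float], None],
                clock: Callable[[], float] = time.monotonic) -> Optional[float]:
    """One pass of S31 to S35; returns the expiration time sent, or None."""
    if not access_table.get(node, False):  # S32: no access right, no renewal
        return None
    expires_at = clock() + LEASE_SECONDS   # S33/S34: current time + fixed period
    notify(expires_at)                     # S35: hand it to the filter driver
    return expires_at


def run_daemon(node: str, access_table: Dict[str, bool],
               notify: Callable[[float], None],
               shutting_down: Callable[[], bool],
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> None:
    """S31 to S36: poll, sleep, and stop once the OS is shutting down."""
    while True:
        daemon_tick(node, access_table, notify, clock)
        sleep(POLL_SECONDS)                # S36
        if shutting_down():
            return
```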

Next, the processing flow of the server machine 10 (#B) when its cluster software 12 (#B) detects loss of the heartbeat and performs a failover is described with reference to the flowchart in FIG. 8 and the conceptual diagram in FIG. 9.

First, the cluster software 12 (#B) of the server machine 10 (#B) detects loss of the heartbeat a (S41). It then sets the disk access right management table 33a so that only the server machine 10 (#B) is given the access right (S42).

Next, the cluster software 12 (#B) sleeps for a predetermined time (for example, 60 seconds) (S43). It then copies all data in the server A data area 32 (#A) to the server B data area 32 (#B) (S44) and starts the expiration notification daemon 16 (#B) (S45). Finally, it notifies the filter driver 14 (#B) that the server machine 10 (#B) has become the active system (S46).

The filter driver 14 (#B) of the server machine 10 (#B) then makes the server B data area 32 (#B) visible from the upper layer (S47), and the cluster software 12 (#B) starts the application 11 (#B) (S48).
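The failover sequence S42 to S48 can be sketched in Python as follows. `fail_over` is a hypothetical name, and the data areas are modelled as dictionaries. Two further points are assumptions: the S44 copy fully overwrites the B area, and S45 to S48 are only recorded here rather than performed.

```python
import time
from typing import Callable, Dict, List


def fail_over(standby: str, access_table: Dict[str, bool],
              area_a: Dict[int, bytes], area_b: Dict[int, bytes],
              wait_seconds: float = 60.0,
              sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Hypothetical model of S42 to S48, run after heartbeat loss (S41)."""
    done: List[str] = []
    for name in access_table:              # S42: only the standby keeps the right
        access_table[name] = (name == standby)
    done.append("S42")
    sleep(wait_seconds)                    # S43: wait for a safe period
    done.append("S43")
    area_b.clear()                         # S44: copy all of A's area into B's
    area_b.update(area_a)                  #      (full overwrite assumed)
    done.append("S44")
    # S45 to S48 (start daemon 16, notify filter driver 14, unfence area 32 (#B),
    # start application 11) are only recorded; their real actions are elided.
    done.extend(["S45", "S46", "S47", "S48"])
    return done
```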

As described above, in the cluster system according to this embodiment, the cluster software 12 (#B) of the standby server machine 10 (#B) may determine that the active server machine 10 (#A) is not alive. When it does, access to the shared disk device 31 can be restricted to the server machine 10 (#B) alone.

As a result, even when a failover occurs, duplicate writes by the server machines 10 (#A, #B) can be avoided and transaction consistency can be maintained.

Moreover, the failover is not executed immediately after the server machine 10 (#B) acquires the access right. The server machine first waits for a predetermined time (for example, 30 seconds) and fails over only once a safe period has been reached.

Accordingly, the cluster system according to this embodiment, which includes a plurality of server machines, can maintain transaction consistency even when a failover occurs, while suppressing degradation of disk I/O performance.

(Second Embodiment)
FIG. 10 is a functional block diagram showing a configuration example of the cluster system according to the second embodiment. This cluster system is a modification of the cluster system according to the first embodiment. Parts identical to those in the first embodiment are therefore denoted by the same reference numerals and are not described again; only the differences are described.

That is, the cluster system according to this embodiment applies the concept of the cluster system according to the first embodiment to a cluster system that includes the mirroring disk devices 20.

The cluster system according to this embodiment (FIG. 10) differs from the cluster system according to the first embodiment (FIG. 1) as follows:
- It has no shared disk device 31. Instead, the mirroring disk devices 20 (#A) and 20 (#B) are connected to the server machines 10 (#A) and 10 (#B), respectively.
- A common computer 70 that holds the shared area 33 is provided.
- As in the first embodiment, the shared area 33 holds the disk access right management table 33a. It also holds a mirroring error occurrence management table 33b. For each of the server machines 10 (#A) and 10 (#B), this table manages mirroring error information indicating whether a mirroring error to the respective mirroring disk device 20 (#A), 20 (#B) has occurred.
- Each server machine 10 (#A), 10 (#B) includes a mirroring daemon 17 (#A), 17 (#B) that reads and sets the mirroring error occurrence management table 33b.

After the cluster software 12 (#A) has become active, the filter driver 14 (#A) waits for I/O input, for example from the client 60, as in the first embodiment. When some or all of the processing of that I/O input succeeds, the filter driver checks the current time every time it returns a processing result to the I/O requester. If the current time is within the expiration time notified by the expiration notification daemon 16 (#A), it returns the processing result; otherwise it returns an error.

In this embodiment, however, the processing results may not be identical, for example when the mirroring of a write fails. In that case the mirroring daemon 17 (#A) changes the mirroring error information in the mirroring error occurrence management table 33b to indicate that a mirroring error has occurred for the server machine 10 (#A).

The cluster software 12 (#B) of the standby server machine 10 (#B) may determine that the active server machine 10 (#A) is not alive. In that case it changes the information in the disk access right management table 33a so that only the server machine 10 (#B) is permitted access, and waits for a predetermined time (for example, 30 seconds). It then checks the mirroring error occurrence management table 33b. Unless the mirroring error information there indicates that a mirroring error has occurred for the server machine 10 (#B), it makes the mirroring disk device 20 (#B) visible from the server machine 10 (#B) side and starts the application 11 (#B).

Next, the operation of the cluster system according to this embodiment is described. The initial state is assumed to be as follows:
- The cluster software 12 (#A, #B) of the server machines 10 (#A) and 10 (#B) are exchanging heartbeats and have confirmed that each other is alive.
- On both server machines, the shared area 33 and the mirroring disk devices 20 (#A, #B) are invisible to the upper layer. They are fenced off by the filter driver 14, and only the mirroring daemon 17 can access them.
- On both server machines, the application 11 (#A, #B) is not running, while the mirroring daemon 17 is running.

First, the processing flow of the cluster software 12 (#A) when the server machine 10 (#A) is set as the active system is described with reference to the flowchart in FIG. 11.

First, in response to a user operation, a notification that the server machine 10 (#A) is to become the active system reaches the cluster software 12 (#A) of the server machine 10 (#A) (S51). Next, the cluster software 12 (#A) sets the disk access right management table 33a so that the server machine 10 (#A) is given the access right (S52). It also sets the mirroring error information in the mirroring error occurrence management table 33b to "no mirroring error" for the server machine 10 (#A) (S53).

The cluster software 12 (#A) then notifies the mirroring daemon 17 (#A) that it has become the active system (S54). Next, it starts the expiration notification daemon 16 (#A) (S55) and notifies the filter driver 14 (#A) that it has become the active system (S56).

The filter driver 14 (#A) of the server machine 10 (#A) then makes the mirroring disk device 20 (#A) visible from the upper layer (S57), and the application 11 (#A) is started (S58).

Next, the processing flow of the filter driver 14 (#A) after the server machine 10 (#A) has been set as the active system is described with reference to the flowchart in FIG. 12 and the conceptual diagram in FIG. 13.

First, the filter driver 14 (#A) of the server machine 10 (#A) waits for I/O input from a higher level, such as the client 60 shown in FIG. 21 (S61). If the I/O input is a notification of the expiration time (S62: Yes), the process proceeds to step S63; otherwise (S62: No), it proceeds to step S64.

In step S63, the expiration time is updated to the notified value, and the process proceeds to step S78.

In step S64, the process branches on the I/O input received in step S61. It proceeds to step S65 for a read from the mirroring disk device 20 (#A), to step S66 for a write, and to step S76 for any other request.

A read is executed in step S65, after which the process proceeds to step S71. A write is executed in step S66, after which the process proceeds to step S67.

In step S67, the filter driver asks the mirroring daemon 17 (#A) to perform mirroring and waits. The mirroring is executed in step S68. If the mirroring succeeds, the process proceeds to step S71. If it fails, the mirroring daemon 17 (#A) updates the mirroring error occurrence management table 33b in step S70 to record a mirroring error for the server machine 10 (#A), and the process then proceeds to step S71.

In step S71, the process proceeds to step S72 if the I/O succeeded partially or completely, and to step S74 otherwise.

In step S72, the current time is acquired. In step S73, it is determined whether that time is within the expiration time notified by the expiration notification daemon 16 (#A). If it is, the I/O processing result is returned to the higher level as is (S74), and the process proceeds to step S78.

If, on the other hand, step S73 determines that the expiration time has passed, an I/O failure error is returned to the higher level (S75), and the process proceeds to step S78.

In step S76, the request is passed unchanged from the filter driver 14 (#A) to the disk driver 15 (#A). In step S77, the return value from the disk driver 15 (#A) is returned unchanged to the higher level, and the process proceeds to step S78.

In step S78, the process ends if the OS is shutting down; otherwise it returns to step S61.
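The write branch (S66 to S70) adds one step to the first embodiment: the outcome of mirroring is recorded in the shared table before the expiration check. The sketch below assumes the mirroring daemon can be called as a function that returns `True` on success or raises `OSError`. `SharedArea` and `write_with_mirroring` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SharedArea:
    """Hypothetical stand-in for shared area 33 on the common computer 70."""
    access: Dict[str, bool] = field(default_factory=dict)        # table 33a
    mirror_error: Dict[str, bool] = field(default_factory=dict)  # table 33b


def write_with_mirroring(node: str, local: Dict[int, bytes], block: int,
                         data: bytes, mirror: Callable[[int, bytes], bool],
                         shared: SharedArea) -> bool:
    """S66 to S70 for one write; returns whether the mirror copy succeeded.

    The expiration check of S71 to S75 would follow exactly as in the
    first embodiment and is omitted here.
    """
    local[block] = data                   # S66: write to mirroring disk 20 (#A)
    try:
        mirrored = mirror(block, data)    # S67/S68: ask daemon 17 (#A), then wait
    except OSError:
        mirrored = False
    if not mirrored:
        shared.mirror_error[node] = True  # S70: record a mirroring error for node
    return mirrored
```

The local write stands even when mirroring fails. The failure is remembered in table 33b, which the standby consults before failing over.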

The processing flow of the expiration notification daemon 16 (#A) after the server machine 10 (#A) has been set as the active system is as shown in the flowchart in FIG. 7, so its description is omitted.

Next, the processing flow of the server machine 10 (#B) when its cluster software 12 (#B) detects loss of the heartbeat and performs a failover is described with reference to the flowchart in FIG. 14 and the conceptual diagram in FIG. 15.

First, the cluster software 12 (#B) of the server machine 10 (#B) detects loss of the heartbeat a (S81). It then sets the disk access right management table 33a so that only the server machine 10 (#B) is given the access right (S82).

Next, the cluster software 12 (#B) sleeps for a predetermined time (for example, 60 seconds) (S83).

Then, in step S84, the process branches on the mirroring error occurrence management table 33b. If the table records a mirroring error for the server machine 10 (#A) (S84: Yes), the process proceeds to step S85; otherwise (S84: No), it proceeds to step S91.

In step S85, the cluster software 12 (#B) sets the mirroring error occurrence management table 33b to indicate that no mirroring error has occurred for the server machine 10 (#B). Next, in step S86, the cluster software 12 (#B) notifies the mirroring daemon 17 (#B) that the server machine 10 (#B) has become the active system. This stops the reception of mirroring data.

The cluster software 12 (#B) then starts the expiration notification daemon 16 (#B) (S87) and notifies the filter driver 14 (#B) that the server machine 10 (#B) has become the active system (S88).

The filter driver 14 (#B) of the server machine 10 (#B) then makes the mirroring disk device 20 (#B) visible from the upper layer (S89). The cluster software 12 (#B) starts the application 11 (#B) (S90), and the processing ends.

In step S91, the cluster software 12 (#B) changes the access right of the server machine 10 (#B) in the disk access right management table 33a from "granted" to "not granted". That is, the failover is abandoned, and the processing ends.

As described above, in the cluster system according to this embodiment, failover can be prevented when mirroring has failed. In addition, after a failover, I/O can be prevented from succeeding on the server machine 10 (#A) side, which was formerly the active system. This makes it possible to avoid transaction inconsistencies.

Furthermore, no I/O to the mirroring disk device 20 is generated for each disk I/O (read or write). Degradation of disk I/O performance can therefore be suppressed.

(Third Embodiment)
The cluster system according to this embodiment is a modification of the cluster systems according to the first and second embodiments. As shown in FIG. 16, its only difference is that it includes a jacket library (hereinafter "jacket DLL") 18 and an I/O processing library (hereinafter "I/O processing DLL") 19 in place of the filter driver 14. Only the differences from the first and second embodiments are therefore described here.

In the first and second embodiments, the filter driver 14 checks the expiration time. In this embodiment, a higher-level library makes the check instead. To this end, a jacket DLL 18 is placed between the application 11 and the I/O processing DLL 19, which performs the actual I/O processing. The jacket DLL 18 hooks I/O and performs its own processing. For details of the jacket DLL, refer to JP-A-11-110194. To the application 11, the jacket DLL 18 looks like the I/O processing DLL 19. As shown in FIG. 17, the jacket DLL 18 holds a process handle management table 18a for the applications 11. It applies the expiration-based check only to the applications 11 it manages (process 1 and process 2 in FIG. 17).
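The jacket idea is a wrapper with the same call interface as the library it fronts. The Python sketch below illustrates only that wrapper pattern; it does not reproduce the hooking mechanism of JP-A-11-110194. `JacketLibrary` and the `io_dll` callable are assumptions, as is identifying processes by integer IDs.

```python
import time
from typing import Callable, Optional, Set, Tuple

IoFn = Callable[[str, int, bytes], Tuple[bool, bytes]]


class JacketLibrary:
    """Hypothetical model of jacket DLL 18 placed in front of I/O DLL 19."""

    def __init__(self, io_dll: IoFn, managed_pids: Set[int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.io_dll = io_dll              # the real I/O routine being wrapped
        self.managed_pids = managed_pids  # process handle management table 18a
        self.clock = clock
        self.expires_at: Optional[float] = None

    def set_expiry(self, expires_at: float) -> None:
        # Notified by the expiration notification daemon instead of a driver.
        self.expires_at = expires_at

    def call(self, pid: int, op: str, block: int,
             data: bytes = b"") -> Tuple[bool, bytes]:
        ok, out = self.io_dll(op, block, data)       # perform the I/O first
        if pid not in self.managed_pids or not ok:   # unmanaged or failed: as is
            return ok, out
        if self.expires_at is not None and self.clock() < self.expires_at:
            return ok, out                           # lease valid: pass result up
        return False, b""                            # lease lapsed: report error
```

Processes that are not in the table bypass the lease check. This is what lets applications outside the cluster software's control share the same disk data without being fenced.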

As a representative case, the following description covers application to the configuration with the shared disk device 31 of the first embodiment, as shown in FIG. 16. However, the configuration of this embodiment, in which the jacket DLL 18 and the I/O processing DLL 19 replace the filter driver 14, can also be applied to the configuration with the mirroring disk devices 20 of the second embodiment. Such an application is also regarded as an example of the present invention.

Next, the processing flow of the jacket DLL 18 (#A) after the server machine 10 (#A) has been set as the active system is described with reference to the flowchart in FIG. 18 and the conceptual diagram in FIG. 19. Two flows are the same as in the first and second embodiments: the flow of the cluster software 12 (#A) when the server machine 10 (#A) is set as the active system, and the flow of the server machine 10 (#B) when its cluster software 12 (#B) detects loss of the heartbeat and fails over. The processing of the expiration notification daemon 16 (#A) after the server machine 10 (#A) has been set as the active system differs in only one respect: the expiration time is notified to the jacket DLL 18 instead of the filter driver 14.

In the cluster system according to this embodiment, the jacket DLL 18 (#A) of the server machine 10 (#A) first waits for I/O input from a higher level, such as the client 60 shown in FIG. 21 (S101). If the I/O input is a notification of the expiration time (S102: Yes), the process proceeds to step S103; otherwise (S102: No), it proceeds to step S104.

In step S103, the expiration time is updated to the notified value, and the process proceeds to step S115.

In step S104, the process branches on the I/O input received in step S101. It proceeds to step S105 for a read from the server A data area 32 (#A), to step S106 for a write, and to step S113 for any other request.

The read is executed in step S105, or the write in step S106, and the process proceeds to step S107.

In step S107, the jacket DLL 18 refers to the process handle management table 18a. If the application executing the read or write is registered in that table (that is, it is managed by the cluster software 12), the process proceeds to step S108; otherwise it proceeds to step S111.

In step S108, the process proceeds to step S109 if the I/O succeeded partially or completely, and to step S111 otherwise.

In step S109, the current time is acquired. In step S110, it is determined whether that time is within the expiration time notified by the expiration notification daemon 16 (#A). If it is, the I/O processing result is returned to the higher level as is (S111), and the process proceeds to step S115.

If, on the other hand, step S110 determines that the expiration time has passed, the jacket DLL 18 returns an I/O failure error to the higher level (S112), and the process proceeds to step S115.

In step S113, the I/O processing DLL 19 (#A) passes the request unchanged to the disk driver 15 (#A). In step S114, the return value from the disk driver 15 (#A) is returned unchanged to the upper layer, and the process proceeds to step S115.

In step S115, the process ends if the OS is shutting down; otherwise, it returns to step S101.
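The overall loop from step S101 to step S115, including the pass-through path of steps S113 and S114, might look like the following. It is a sketch under stated assumptions: requests are `(kind, target)` tuples, `None` stands in for OS shutdown, and `io_loop` with its three callables are hypothetical hooks for the S104 test, the data-area path (S105 to S112), and the disk driver 15 (#A).

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical request shape: (kind, target), e.g. ("read", "data_area_32A").
Request = Tuple[str, str]


def io_loop(
    requests: Iterable[Optional[Request]],
    is_data_area_io: Callable[[Request], bool],
    handle_data_area: Callable[[Request], str],
    disk_driver: Callable[[Request], str],
) -> List[str]:
    """Process I/O inputs until shutdown, collecting what is returned upward."""
    returned: List[str] = []
    for req in requests:  # S101: take the next I/O input
        if req is None:  # S115: None stands in for the OS shutting down
            break
        if is_data_area_io(req):  # S104: read or write of the data area
            returned.append(handle_data_area(req))  # S105 to S112
        else:
            # S113/S114: hand the request to the disk driver unchanged and
            # return the driver's value to the upper layer as-is.
            returned.append(disk_driver(req))
    return returned
```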

As described above, in the cluster system according to this embodiment, the expiration-time check performed when an I/O processing result is returned to the upper layer is applied not unconditionally but only to I/O from applications managed by the cluster software 12. This allows disk data to be shared with applications not managed by the cluster software 12 and, even when a failover occurs, maintains transaction consistency while limiting the degradation of disk I/O performance.

The best mode for carrying out the present invention has been described above with reference to the accompanying drawings, but the present invention is not limited to this configuration. Those skilled in the art can conceive of various changes and modifications within the scope of the technical idea set out in the claims, and such changes and modifications are understood to fall within the technical scope of the present invention.

FIG. 1: Functional block diagram showing an example configuration of the cluster system according to the first embodiment.
FIG. 2: Conceptual diagram showing a detailed example configuration of the shared disk device.
FIG. 3: Diagram showing an example of the means for acquiring the current time.
FIG. 4: Flowchart showing the processing performed by the cluster software when a server machine is set as the active system in the cluster system according to the first embodiment.
FIG. 5: Flowchart showing the processing performed by the filter driver after a server machine has been set as the active system in the cluster system according to the first embodiment.
FIG. 6: Conceptual diagram showing the processing performed by the filter driver in the cluster system according to the first embodiment.
FIG. 7: Flowchart showing the processing performed by the expiration notification daemon after a server machine has been set as the active system in the cluster system according to the first embodiment.
FIG. 8: Flowchart showing the processing of a server machine when it detects heartbeat loss and performs failover in the cluster system according to the first embodiment.
FIG. 9: Conceptual diagram showing the processing performed by the server machines during failover in the cluster system according to the first embodiment.
FIG. 10: Functional block diagram showing an example configuration of the cluster system according to the second embodiment.
FIG. 11: Flowchart showing the processing performed by the cluster software when a server machine is set as the active system in the cluster system according to the second embodiment.
FIG. 12: Flowchart showing the processing performed by the filter driver after a server machine has been set as the active system in the cluster system according to the second embodiment.
FIG. 13: Conceptual diagram showing the processing performed by the filter driver in the cluster system according to the second embodiment.
FIG. 14: Flowchart showing the processing of a server machine when it detects heartbeat loss and performs failover in the cluster system according to the second embodiment.
FIG. 15: Conceptual diagram showing the processing performed by the server machines during failover in the cluster system according to the second embodiment.
FIG. 16: Functional block diagram showing an example configuration of the cluster system according to the third embodiment.
FIG. 17: Diagram showing an example of the process handle management table.
FIG. 18: Flowchart showing the processing performed by the jacket DLL when a server machine is set as the active system in the cluster system according to the third embodiment.
FIG. 19: Conceptual diagram showing the processing performed by the jacket DLL in the cluster system according to the third embodiment.
FIG. 20: Conceptual diagram explaining failover in a conventional cluster system.
FIG. 21: Conceptual diagram explaining the data corruption that occurs at failover in a conventional cluster system (when a mirroring disk device is used).
FIG. 22: Conceptual diagram explaining the data corruption that occurs at failover in a conventional cluster system (when a shared disk device is used).
FIG. 23: Diagram showing an improved configuration example of a conventional cluster system and its processing flow (when a shared disk device is used).
FIG. 24: Diagram showing an improved configuration example of a conventional cluster system and its processing flow (when a mirroring disk device is used).

Explanation of symbols

a: heartbeat; b: failover; 10: server machine; 11: application; 12: cluster software; 13: database server; 14: filter driver; 15: disk driver; 16: expiration notification daemon; 17: mirroring daemon; 18: jacket DLL; 18a: process handle management table; 19: I/O processing DLL; 20: mirroring disk device; 31: shared disk device; 32 (#A): server A data area; 32 (#B): server B data area; 33: shared area; 33a: disk access right management table; 33b: mirroring error occurrence management table; 50, 52: communication path; 51: FC cable; 60: client; 61: DB client; 70: common computer

Claims

A cluster system composed of a plurality of server machines and a shared disk device connected to and shared by the plurality of server machines,
Each of the plurality of server machines includes an application, cluster software, a filter driver, and an expiration notification daemon, each running on that server machine,
The shared disk device includes server data areas that store the data of the respective server machines, and a shared area shared by the server machines,
The shared area includes an access management unit that manages access information specifying, for each server machine, whether access to the shared disk device is permitted,
The expiration notification daemon periodically refers to the access information managed by the access management unit and, if access by its own server machine is permitted, sets an expiration time equal to the current time plus a predetermined time and notifies the filter driver of its own server machine of that expiration time,
The cluster software on the plurality of server machines communicates periodically with its counterparts to check whether each server machine is alive, and each instance has two states, active and standby,
When the cluster software becomes active, it changes the access information managed by the access management unit so that only its own server machine is permitted access, makes the server data area of its own server machine visible from its own server machine, and starts the application on its own server machine,
After the cluster software becomes active, the filter driver of the active server machine waits for I/O input and, if the processing of an I/O input succeeds in whole or in part, acquires the current time; if the current time is within the expiration time notified by the expiration notification daemon, it returns the processing result to the I/O input side, and otherwise it returns an error to the I/O input side,
When the cluster software of the standby server machine determines that the active server machine is not alive, it changes the access information managed by the access management unit so that only its own server machine is permitted access, waits for a predetermined time, then makes the server data area of its own server machine visible from its own server machine and starts the application on its own server machine.
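The interplay in the claim between the expiration notification daemon, the filter driver, and the standby server's wait can be illustrated with a small deterministic simulation. This is a hedged sketch, not the claimed implementation: `Clock`, `Node`, `take_over`, and the `ttl` offset are hypothetical, a single shared clock ignores drift between machines, and it assumes the standby's predetermined wait is at least the daemon's predetermined offset, a relationship the claim itself does not state.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Clock:
    """Manually advanced clock shared by both nodes (ignores drift)."""

    t: float = 0.0


@dataclass
class Node:
    name: str
    clock: Clock
    ttl: float  # hypothetical "predetermined time" added by the daemon
    expiration: float = float("-inf")

    def daemon_tick(self, access: Dict[str, bool]) -> None:
        # Expiration notification daemon: the timestamp is taken at the same
        # instant as the access check, and renewal happens only if permitted.
        now = self.clock.t
        if access.get(self.name, False):
            self.expiration = now + self.ttl

    def filter_result(self, succeeded: bool) -> str:
        # Filter driver: a failed I/O is reported as-is; a successful one is
        # reported as success only strictly before the expiration time.
        if not succeeded:
            return "failed"
        return "ok" if self.clock.t < self.expiration else "error"


def take_over(access: Dict[str, bool], standby: Node, wait: float) -> None:
    # Standby cluster software after heartbeat loss: permit only itself.
    for server in list(access):
        access[server] = server == standby.name
    # Wait the predetermined time (simulated by advancing the clock) before
    # making its data area visible and starting the application.
    standby.clock.t += wait
```

In a timeline where server A renews its expiration to 8.0 at time 3.0 and then stops responding, server B revokes A's access and waits 5.0. When A resumes at time 8.0, its daemon can no longer renew, so its filter driver reports an error instead of a successful result. Because the timestamp is taken no later than the access check, a wait of at least `ttl` after the access change covers the latest expiration A could have set.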