JP6070064B2

JP6070064B2 - Node device, cluster system, failover method and program

Info

Publication number: JP6070064B2
Application number: JP2012237731A
Authority: JP
Inventors: 哲森内
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2012-10-29
Filing date: 2012-10-29
Publication date: 2017-02-01
Anticipated expiration: 2032-10-29
Also published as: JP2014089506A

Description

The present invention relates to a node device, a cluster system, a failover method, and a program, and more particularly to a node device and the like that can complete failover processing in a short time when an abnormality occurs.

In computer systems used by companies and similar organizations, even a brief outage can cause enormous losses because business operations stop during that time. Such systems are therefore required to provide high availability, more specifically an availability of 99.99% or more (annual downtime of 52 minutes or less). A configuration called a server cluster is widely used in such systems.
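The 52-minute figure follows directly from the 99.99% target. A minimal sketch of the arithmetic (the function name is illustrative and does not come from the patent):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    print(f"{annual_downtime_minutes(99.99):.2f} minutes per year")
```

An availability of 99.99% leaves about 52.56 minutes of downtime per year, which the text states as 52 minutes.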

A server cluster is a configuration in which multiple server computers that refer to the same NAS (Network Attached Storage) operate as a single virtual server. Server clusters are classified into two types according to how the NAS is used: the active-active configuration and the active-passive configuration.

In an active-active server cluster, every server in the cluster operates as an active server and performs processing. The configuration therefore also provides load distribution (off-load) and can improve the performance of the system as a whole.

FIG. 10 is an explanatory diagram showing the configuration of the existing cluster system 901 described in Patent Document 1. In the cluster system 901, two computer devices (servers), a first node 910 and a second node 920, and a large number of client devices 940 are connected to one another via a network 941.

The first node 910 and the second node 920 are connected to a shared storage 930, which is a shared external storage device. The two nodes distribute the processing load between them while mirroring each other's stored data, so that if one of them fails, the other can take over and continue the processing that the failed node was performing (this is called failover).

The first node 910 includes a processor 911 that executes computer programs; an NVRAM (Non-Volatile RAM) 912, a nonvolatile main storage device that temporarily stores data being processed; external storage connection means 913 for storing processed data persistently in the shared storage 930; and communication means 914 for communicating with the second node 920 and each client device 940 via the network 941.

The processor 911 functions as an application execution unit 951 that executes application software performing the processing requested by the client devices 940 and writes the resulting data to the NVRAM 912; an NVRAM driver execution unit 952 that runs the device driver mediating data exchange with the NVRAM 912; a storage device driver execution unit 953 that runs the device driver mediating data exchange with the shared storage 930 via the external storage connection means 913; and an external communication driver 954 that runs the device driver mediating data exchange with the communication means 914.

The NVRAM 912 is divided into a first node area 912a, to which data for processing performed on the first node 910 is temporarily written, and a second node area 912b, which mirrors the data written to the NVRAM 922 (described later) by processing performed on the second node 920. In response to a request from a client device 940, the first node 910 processes data in the first node area 912a and copies the result to the first node area 922a on the second node 920.

Through the operation of the respective programs, the processor 911 also functions as a mirroring processing unit 955 that copies the data written to the first node area 912a by the application execution unit 951 in response to a request from a client device 940 to the first node area 922a of the second node 920; a storage storing unit 956 that stores that data in the shared storage 930; and a failover processing unit 957 that takes over processing from the second node 920 when an abnormality occurs there.

Except for the point described next, the second node 920 has the same hardware and software configuration as the first node 910. Its components therefore use the same names, with each reference number increased by 10: the hardware is referred to as the processor 921, the NVRAM 922, and so on, and the software as the application execution unit 961, the NVRAM driver execution unit 962, and so on.

The only difference between the second node 920 and the first node 910 is as follows. The NVRAM 922 is divided into a first node area 922a, which mirrors the data written to the NVRAM 912 by processing performed on the first node 910, and a second node area 922b, to which data for processing performed on the second node 920 is temporarily written. In response to a request from a client device 940, the second node 920 processes data in the second node area 922b and copies the result to the second node area 912b on the first node 910.

When the application execution unit 951 of the first node 910 writes data to the first node area 912a of the NVRAM 912, it stores the data in a form that can already be written to the file system. The same applies when the application execution unit 961 of the second node 920 writes data to the second node area 922b of the NVRAM 922. The first node 910 and the second node 920 mirror data in this form between the NVRAMs 912 and 922 using the mirroring processing units 955 and 965, and then store it in the shared storage 930 using the storage storing unit 956 or 966.

Suppose, for example, that an abnormality occurs on the first node 910 while data not yet reflected in the shared storage 930 remains in the NVRAM 912. In that case, the failover processing unit 967 on the second node 920 performs recovery processing on the data in the first node area 922a, that is, it stores that data in the shared storage 930, and only then resumes accepting access from the client devices 940. If a similar abnormality occurs on the second node 920, the first node 910 performs the same operation.
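The prior-art recovery sequence described for FIG. 10 can be sketched as a toy model. Everything below (class name, method names, the use of Python lists for NVRAM areas and the shared storage) is a hypothetical illustration, not code from Patent Document 1; it only shows that the surviving node must flush the mirrored data before it resumes serving clients.

```python
class PriorArtNode:
    """Toy model of a node in the FIG. 10 system (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.own_area = []      # e.g. 912a on node 910, 922b on node 920
        self.mirror_area = []   # e.g. 912b on node 910, 922a on node 920
        self.accepting_clients = True

    def write(self, data: str, peer: "PriorArtNode") -> None:
        """Store data locally and mirror it into the peer's mirror area."""
        self.own_area.append(data)
        peer.mirror_area.append(data)

    def take_over(self, shared_storage: list) -> int:
        """Recover the failed peer's data, then resume; return items flushed."""
        self.accepting_clients = False           # clients wait during recovery
        flushed = 0
        while self.mirror_area:
            shared_storage.append(self.mirror_area.pop(0))
            flushed += 1
        self.accepting_clients = True            # access resumes only now
        return flushed


if __name__ == "__main__":
    node_910, node_920 = PriorArtNode("910"), PriorArtNode("920")
    storage_930 = []
    for item in ("a", "b", "c"):
        node_910.write(item, node_920)           # not yet written to storage 930
    print(node_920.take_over(storage_930), storage_930)
```

In this model the time during which clients are not served grows with the amount of mirrored data still waiting to be written to the shared storage.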

With this configuration, the first node 910 and the second node 920 share the processing load of access from the client devices 940, and when an abnormality occurs on one node, the other can resume processing promptly. As long as each node uses only its own area, it operates with the same structure as a single (non-clustered) system. A cluster system can therefore be built without special modification for clustering and without structural changes to normal operation.

Other technical documents related to this include the following. Patent Document 2 describes a disk control device that can divide an NVRAM into multiple blocks and apply read/write protection to each block. Patent Document 3 describes a system that mirrors data to a remote site. Non-Patent Documents 1 and 2 describe the basic configuration of the NAS cluster model.

Patent Document 1: JP 2007-328778 A
Patent Document 2: JP 2008-217811 A
Patent Document 3: JP-T-2006-527875

Non-Patent Document 1: Digital Advantage, "Part 20: The SMB/CIFS File Sharing Protocol (1)" (from "Learning Windows Networking from the Basics: The Road to Becoming a Windows Network Administrator"), October 29, 2004, ITmedia Inc., [retrieved May 21, 2012], Internet <URL: http://www.atmarkit.co.jp/fwin2k/network/baswinlan020/baswinlan020_01.html>
Non-Patent Document 2: Go Takahashi, "Introduction to Windows Clustering, Part 1: Preparing to Deploy MSCS - Server Cluster Basics", December 3, 2008, ITmedia Inc., [retrieved December 21, 2011], Internet <URL: http://www.atmarkit.co.jp/fwin2k/operation/mscluster01/mscluster01_01.html>

As described above, the existing cluster system 901 of Patent Document 1 explained with reference to FIG. 10 can distribute load during normal operation and resume processing promptly when an abnormality occurs.

In this configuration, however, data consistency cannot be maintained unless the data in the NVRAM 912 or 922 has been completely reflected in the shared storage 930. If an abnormality occurs partway through this process, the consistency of the data that was partially written to the shared storage 930 is lost.

In the example above in which an abnormality occurs on the first node 910, the second node 920 must perform recovery processing on the data in the first node area 922a, that is, store that data in the shared storage 930, before it can take over the operation of the first node 910. The same applies when an abnormality occurs on the second node 920.

Time is therefore needed for the recovery processing, and it also becomes necessary to report the abnormality to the client devices 940. No technique capable of solving this problem is described in the remaining Patent Documents 2 and 3 or in Non-Patent Documents 1 and 2.

An object of the present invention is to provide a cluster system, a node, a failover method, and a program that can perform failover processing in a very short time without making the client devices aware that an abnormality has occurred.

To achieve the above object, a node device according to the present invention is a node device that is interconnected with another node device to form a cluster system and is connected to a shared storage, which is an external storage device shared with the other node device. The node device comprises: a memory; IP address table storage means that stores in advance the correspondence between the IP address of each connected client device and the memory for that client device; an application software execution unit that executes pre-installed application software in response to a request from a client device and stores the resulting processing data in the memory in accordance with the IP address of the requesting client device; a mirroring processing unit that stores the stored processing data in the memory of the other node device; a storage storing unit that writes the processing data to the shared storage after it has been stored in the memory of the other node device; and a failover processing unit that, when an abnormality occurs on the other node device while processing data remains in the memory of the other node device, changes the correspondence stored in the IP address table storage means and takes over the processing of the other node device.

To achieve the above object, a cluster system according to the present invention is a cluster system built by interconnecting first and second node devices and a shared storage, which is an external storage device shared between the first and second node devices, wherein each of the first and second node devices is the node device according to claim 1 or 2.

To achieve the above object, a failover method according to the present invention is used in a node device that is interconnected with another node device to form a cluster system and is connected to a shared storage, which is an external storage device shared with the other node device, the correspondence between the IP address of each connected client device and the memory for that client device being stored in advance in IP address table storage means. In the method, an application software execution unit executes application software; the application software execution unit temporarily saves the processing data obtained by the application software in the memory corresponding to the IP address of the client device; a mirroring processing unit stores the stored processing data in the corresponding storage area of the other node device; a storage storing unit writes the processing data to the shared storage after it has been stored in the memory of the other node device; and, when an abnormality occurs on the other node device while processing data remains in the memory of the other node device, a failover processing unit changes the correspondence stored in the IP address table storage means and takes over the processing of the other node device.

To achieve the above object, a failover program according to the present invention is used in a node device that is interconnected with another node device to form a cluster system and is connected to a shared storage, which is an external storage device shared with the other node device, the correspondence between the IP address of each connected client device and the memory for that client device being stored in advance in IP address table storage means. The program causes a processor of the node device to execute: a procedure of executing application software; a procedure of temporarily saving the processing data obtained by the application software in the memory corresponding to the IP address of the client device; a procedure of storing the stored processing data in the corresponding memory of the other node device; a procedure of writing the processing data to the shared storage after it has been stored in the corresponding storage area of the other node device; and a procedure in which, when an abnormality occurs on the other node device while processing data remains in the memory of the other node device, the failover processing unit changes the correspondence stored in the IP address table storage means and takes over the processing of the other node device.

As described above, the present invention divides the storage area of the nonvolatile memory according to the IP address of each client device and writes data to the shared storage only after mirroring it, so the consistency of the data stored in the nonvolatile memory is maintained even if an abnormality occurs on one node device. This makes it possible to provide a cluster system, a node, a failover method, and a program with the excellent feature that failover processing can be performed in a very short time without making the client devices aware that an abnormality has occurred.

FIG. 1 is an explanatory diagram showing the configuration of a cluster system according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the content stored in the IP address table storage means of the first node shown in FIG. 1.
FIG. 3 is an explanatory diagram showing how processing is divided between the first node and the second node during normal operation in the cluster system shown in FIG. 1.
FIG. 4 is an explanatory diagram showing how processing is divided between the first node and the second node when an abnormality occurs in the cluster system shown in FIG. 1.
FIG. 5 is an explanatory diagram showing the normal operation of the cluster system shown in FIG. 1.
FIG. 6 is a flowchart showing the normal operation of the cluster system shown in FIG. 1.
FIG. 7 is a continuation of FIG. 6.
FIG. 8 is an explanatory diagram showing the operation of the cluster system shown in FIG. 1 when an abnormality occurs.
FIG. 9 is a flowchart showing the operation of the cluster system shown in FIG. 1 when an abnormality occurs.
FIG. 10 is an explanatory diagram showing the configuration of the existing cluster system described in Patent Document 1.

(Embodiment)
The configuration of an embodiment of the present invention is described below with reference to the accompanying FIG. 1.
The basic content of the embodiment is described first, followed by a more detailed description.
Each node device according to this embodiment (the first node 10 and the second node 20) is interconnected with another node device of the same configuration to form a cluster system, and is connected to a shared storage, the same external storage device shared with the other node device. The node device (first node 10) includes: a nonvolatile memory (NVRAM 12) divided in advance into multiple storage areas; IP address table storage means 15 that stores in advance the correspondence between the IP address of each connected client device 40 and the storage area of the nonvolatile memory that the client device should use; an application software execution unit 101 that executes pre-installed application software in response to a request from a client device and stores the resulting processing data in the storage area corresponding to the IP address of the requesting client device; a mirroring processing unit 105 that stores the stored processing data in the corresponding storage area of the other node device; and a storage storing unit 106 that writes the processing data to the shared storage after it has been stored in the corresponding storage area of the other node device.

The mirroring processing unit 105 returns a write completion notification to the requesting client device 40 once the processing data has been stored in the corresponding storage area of the other node device. The node device also includes a failover processing unit 107 that, when an abnormality occurs on the other node device while processing data remains in that node device's nonvolatile memory, changes the correspondence stored in the IP address table storage means and takes over the processing of the other node device.
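The ordering of the write path described so far can be sketched as follows. This is a minimal model under stated assumptions: the `Node` class, the `handle_write` function, and the step labels are hypothetical names, NVRAM areas and the shared storage are modeled as Python lists, and the relative order of the write completion notification and the shared-storage write is an assumption (the text only states that both happen after mirroring).

```python
class Node:
    """Toy node whose NVRAM is split into per-group areas (illustrative)."""

    def __init__(self, name: str, groups=(0, 1)) -> None:
        self.name = name
        self.nvram = {group: [] for group in groups}


def handle_write(owner, peer, group, data, shared_storage):
    """Process one client write and return the ordered list of steps taken."""
    steps = []
    owner.nvram[group].append(data)         # application software execution unit
    steps.append("stored in local area")
    peer.nvram[group].append(data)          # mirroring processing unit
    steps.append("mirrored to peer area")
    steps.append("write completion sent")   # returned only after mirroring
    shared_storage.append(data)             # storage storing unit
    steps.append("written to shared storage")
    return steps


if __name__ == "__main__":
    first, second = Node("first node 10"), Node("second node 20")
    storage = []
    for step in handle_write(first, second, 0, "record-1", storage):
        print(step)
```

Because the peer already holds a copy in the same area before the client is told the write finished, a surviving node in this model has the data it needs without first replaying it to the shared storage.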

With the above configuration, the node device (first node 10) can perform failover processing in a very short time without making the client devices aware that an abnormality has occurred.
This is described in more detail below.

FIG. 1 is an explanatory diagram showing the configuration of the cluster system 1 according to the embodiment of the present invention. In the cluster system 1, two computer devices (servers), a first node 10 and a second node 20, and a large number of client devices 40 are connected to one another via a network 41.

The first node 10 and the second node 20 may be file servers that simply store data received from the client devices 40, or they may be database systems, web servers, business systems, or the like that process received data in some way before storing it.

The communication method used between the first node 10 and second node 20 and the client devices 40 can be, for example, CIFS (Common Internet File System) if the OS (operating system) of the client device 40 is Windows (registered trademark), or NFS (Network File System) if it is UNIX (registered trademark). Any other usable communication method may also be used.

The first node 10 and the second node 20, which constitute the server cluster here, are connected to a shared storage 30, which is a shared external storage device. The two nodes distribute the processing load between them while mirroring each other's stored data, so that if one of them fails, the other can take over and continue the processing that the failed node was performing (this is called failover).

The first node 10 includes a processor 11 that executes computer programs; an NVRAM 12, a nonvolatile main storage device that temporarily stores data being processed; external storage connection means 13 for storing processed data persistently in the shared storage 30; communication means 14 for communicating with the second node 20 and each client device 40 via the network 41; and IP address table storage means 15, described later.

The NVRAM 12 is mounted in the first node 10 as an ICA (InterConnectAccess) card. Connected in this way, the NVRAM 12 has the advantage that it can transfer data to the second node 20 using the RDMA (Remote Direct Memory Access) protocol without going through the operating system running on the processor 11. Any other connection method may also be used to mount the NVRAM 12.

The processor 11 functions as an application execution unit 101 that executes application software performing the processing requested by the client devices 40 and writes the resulting data to the NVRAM 12; an NVRAM driver execution unit 102 that runs the device driver mediating data exchange with the NVRAM 12; a storage device driver execution unit 103 that runs the device driver mediating data exchange with the shared storage 30 via the external storage connection means 13; and an external communication driver execution unit 104 that runs the device driver mediating data exchange with the communication means 14.

For the processing performed on the first node 10 and the second node 20, the NVRAM 12 is divided into a group 0 area 12a and a group 1 area 12b according to the IP address group of the client device 40, as described later. In response to a request from a client device 40 belonging to "group 0", the first node 10 processes data in the group 0 area 12a and copies the result to the same area on the second node 20.

Through the operation of the respective programs, the processor 11 also functions as a mirroring processing unit 105 that copies the data written to the group 0 area 12a by the application execution unit 101 in response to a request from a client device 40 belonging to "group 0" to the group 0 area 22a of the second node 20; a storage storing unit 106 that stores that data in the shared storage 30; and a failover processing unit 107 that changes the content of the IP address table storage means 15 so as to take over processing from the second node 20 when an abnormality occurs there.

The second node 20 has the same hardware and software configuration as the first node 10. Its functional units therefore use the same names as those of the first node 10, and the same reference numbers except that the leading "1" is replaced with "2": the hardware is referred to as the processor 21, the NVRAM 22, and so on, and the software as the application execution unit 201, the NVRAM driver execution unit 202, and so on.

FIG. 2 is an explanatory diagram showing the contents of the IP address table storage means 15 provided in the first node 10 shown in FIG. 1. The client devices 40 are divided into two groups, “group 0” and “group 1”, according to their IP addresses. The IP address table storage means 15 stores the IP address of each client device 40 together with which of “group 0” and “group 1” that client device 40 belongs to. The second node 20 has an IP address table storage means 25 that stores the same contents.

The division into “group 0” and “group 1” may be made, for example, by IP address range, or “group 0” or “group 1” may be designated for each individual IP address. Client devices may also be classified into “group 0” or “group 1” by host name instead of by IP address.
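The three ways of assigning groups described above can be illustrated with a minimal sketch. The table contents, the function name `lookup_group`, and the order in which the rules are tried are assumptions made for illustration only; the description does not specify them.

```python
import ipaddress

# Hypothetical table contents; the addresses and host names are illustrative only.
RANGE_RULES = [
    (ipaddress.ip_network("192.0.2.0/25"), 0),
    (ipaddress.ip_network("192.0.2.128/25"), 1),
]
PER_ADDRESS_RULES = {ipaddress.ip_address("198.51.100.7"): 1}
HOSTNAME_RULES = {"client-a.example": 0}


def lookup_group(client_ip, hostname=None):
    """Return the group (0 or 1) for a client, or None if no rule matches."""
    addr = ipaddress.ip_address(client_ip)
    # Assumption: a per-address designation takes priority over a range rule.
    if addr in PER_ADDRESS_RULES:
        return PER_ADDRESS_RULES[addr]
    for network, group in RANGE_RULES:
        if addr in network:
            return group
    # Assumption: host names are consulted only when no address rule matches.
    if hostname is not None:
        return HOSTNAME_RULES.get(hostname)
    return None
```

For example, `lookup_group("192.0.2.10")` falls in the first range and yields group 0, while an address outside every rule yields `None` unless a matching host name is supplied.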

FIG. 3 is an explanatory diagram showing how processing is shared between the first node 10 and the second node 20 during normal operation in the cluster system 1 shown in FIG. 1. As described above, the NVRAM 12 of the first node 10 is divided into the group 0 area 12a and the group 1 area 12b, and the NVRAM 22 of the second node 20 is likewise divided into the group 0 area 22a and the group 1 area 22b.

According to the “group 0” / “group 1” classification of each client device 40 stored in the IP address table storage means 15 (25), the application execution unit 101 of the first node 10 processes requests from client devices 40 belonging to “group 0” and writes the results to the group 0 area 12a of the NVRAM 12. The application execution unit 201 of the second node 20 processes requests from client devices 40 belonging to “group 1” and writes the results to the group 1 area 22b of the NVRAM 22.

For processing requested by a client device 40 belonging to “group 0”, after the application execution unit 101 of the first node 10 has performed the data processing, the mirroring processing units 105 and 205 of the first node 10 and the second node 20 copy the contents of the NVRAMs 12 and 22 to each other, and the storage storage unit 106 of the first node 10 then stores the data in the shared storage 30 via the storage device driver execution unit 103 and the external storage connection means 13.

Processing requested by a client device 40 belonging to “group 1” is the same as for “group 0”, except that the application execution unit 201 of the second node 20 performs the processing first.

FIG. 4 is an explanatory diagram showing how processing is shared between the first node 10 and the second node 20 when an abnormality occurs in the cluster system 1 shown in FIG. 1. If the first node 10 fails and stops while data not yet written to the shared storage 30 remains in the NVRAM 12, that unreflected data has already been copied to the group 0 area 22a of the NVRAM 22 by the mirroring processing unit 105.

The second node 20 therefore takes over the processing requested by client devices 40 belonging to “group 0”: it writes the data resulting from those requests into the group 0 area 22a of the NVRAM 22, following the data already copied there, and stores it in the shared storage 30.

FIG. 5 is an explanatory diagram showing the normal operation of the cluster system 1 shown in FIG. 1. FIGS. 6 and 7 (split into two sheets to keep the drawings legible) are flowcharts showing the normal operation of the cluster system 1 shown in FIG. 1. More specifically, FIG. 5 shows the operation in response to a processing request from a client device 40 belonging to “group 0”.

When a client device 40 issues a data processing request, the application execution units 101 and 201 of the first node 10 and the second node 20 each refer to their own IP address table storage means 15 and 25 to determine whether the request comes from a client device 40 belonging to “group 0” or to “group 1” (steps S301 and S351).

If the request comes from a client device 40 belonging to “group 0”, the application execution unit 101 of the first node 10 performs the corresponding processing and writes the result to the group 0 area 12a of the NVRAM 12 (step S302). The mirroring processing unit 105 then passes the data to the mirroring processing unit 205 of the second node 20 to be stored (step S303). On receiving it, the mirroring processing unit 205 of the second node 20 writes the data to the group 0 area 22a of the NVRAM 22 (step S307).

The mirroring processing unit 105 then returns a write end notification to the requesting client device 40 (step S304), and, at the timing of that notification, the storage storage unit 106 stores the data in the shared storage 30 via the storage device driver execution unit 103 and the external storage connection means 13 (step S305). When the write to the shared storage 30 is finished, the mirroring processing unit 105 deletes the data from the group 0 area 12a of the NVRAM 12 (step S306).

If the request comes from a client device 40 belonging to “group 1”, the application execution unit 201 of the second node 20 performs the corresponding processing and writes the result to the group 1 area 22b of the NVRAM 22 (step S352). The mirroring processing unit 205 then passes the data to the mirroring processing unit 105 of the first node 10 to be stored (step S353). On receiving it, the mirroring processing unit 105 of the first node 10 writes the data to the group 1 area 12b of the NVRAM 12 (step S357).

The mirroring processing unit 205 then returns a write end notification to the requesting client device 40 (step S354), and the storage storage unit 206 stores the data in the shared storage 30 via the storage device driver execution unit 203 and the external storage connection means 23 (step S355). When the write to the shared storage 30 is finished, the mirroring processing unit 205 deletes the data from the group 1 area 22b of the NVRAM 22 (step S356).
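The normal-operation write path for both groups can be sketched as a toy model. The class `Node`, its methods, and the use of Python lists in place of NVRAM areas and the shared storage 30 are hypothetical. The sketch also assumes that the mirrored copy on the peer is released once the data reaches shared storage, which the description does not spell out, and it separates the flush from the acknowledgement only so that the intermediate state can be observed.

```python
class Node:
    """Toy model of one node; NVRAM areas are lists keyed by group number."""

    def __init__(self, name, own_group, shared_storage):
        self.name = name
        self.own_group = own_group
        self.nvram = {0: [], 1: []}    # group 0 area / group 1 area
        self.shared = shared_storage   # list standing in for shared storage 30
        self.peer = None               # set after both nodes are created

    def handle_request(self, group, data):
        if group != self.own_group:
            return None                        # S301/S351: another node's client
        self.nvram[group].append(data)         # S302/S352: write to local NVRAM
        self.peer.nvram[group].append(data)    # S303+S307 / S353+S357: mirror
        return "write-end"                     # S304/S354: notify the client

    def flush(self, group):
        self.shared.extend(self.nvram[group])  # S305/S355: store in shared storage
        self.nvram[group].clear()              # S306/S356: delete the local copy
        # Assumption: the peer's mirrored copy is released at the same time.
        self.peer.nvram[group].clear()
```

Between `handle_request` and `flush`, the same data sits in the matching area of both nodes, which is the state the failover procedure relies on.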

FIG. 8 is an explanatory diagram showing the operation of the cluster system 1 shown in FIG. 1 when an abnormality occurs. FIG. 9 is a flowchart showing the operation of the cluster system 1 shown in FIG. 1 when an abnormality occurs. More specifically, FIGS. 8 and 9 show the case where a failure occurs in the first node 10 while it is performing the write to the shared storage 30 shown in FIG. 5, step S305, in response to a processing request from a client device 40 belonging to “group 0”, so that data not yet reflected in the shared storage 30 remains in the group 0 area 12a of the NVRAM 12.

At this stage, the unreflected data has already been copied to the group 0 area 22a of the NVRAM 22 by the mirroring processing unit 105. The failover processing unit 207, on detecting the abnormality (step S401), therefore updates the IP address table storage means 25 of the second node 20 so that the second node 20 takes over the processing for client devices 40 belonging to “group 0” (step S402).

The first node 10 and the second node 20 constantly exchange periodic heartbeat communication to confirm that each other is operating normally. In this embodiment, the detection that “an abnormality has occurred in one node while unreflected data remains in the NVRAM”, which starts the operation from step S401 above, can be triggered by detecting that no reply to this heartbeat communication has been received for a certain period of time.
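The heartbeat-timeout trigger described above can be sketched as follows. The class `HeartbeatMonitor`, its method names, and the timeout value are hypothetical; the description only states that a missing reply for a certain period may serve as the trigger. Timestamps are passed in explicitly so that the logic does not depend on a particular clock.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat reply from the peer node (illustrative only)."""

    def __init__(self, timeout_seconds, now):
        self.timeout = timeout_seconds
        self.last_reply = now

    def record_reply(self, now):
        # Called whenever the peer answers a heartbeat.
        self.last_reply = now

    def peer_failed(self, now):
        # True once no reply has arrived for longer than the timeout,
        # which would start the failover operation from step S401.
        return (now - self.last_reply) > self.timeout
```

In a real node the `now` values would typically come from a monotonic clock such as `time.monotonic()`, so that wall-clock adjustments cannot produce a false timeout.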

As a result, the application execution unit 201 continues the processing from the unreflected data remaining in the group 0 area 22a of the NVRAM 22 and writes the resulting data to the group 0 area 22a (step S403); the mirroring processing unit 205 returns a write end notification to the requesting client device 40 (step S404); and the storage storage unit 206 stores the data in the shared storage 30 via the storage device driver execution unit 203 and the external storage connection means 23 (step S405).

When the write to the shared storage 30 is finished, the mirroring processing unit 205 deletes the data from the group 0 area 22a of the NVRAM 22 (step S406). With the above processing, even processing with strict ordering requirements can continue without a data recovery process. That is, service can be restored in a very short time, so the client device 40 does not need to be aware that a failure occurred.
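Steps S402 to S406 can be sketched from the surviving node's point of view. The function `fail_over`, its parameters, and the dictionary and list stand-ins for the IP address table, the NVRAM 22, and the shared storage 30 are hypothetical, and the client notification of step S404 is omitted for brevity.

```python
def fail_over(group_owner, nvram, shared_storage, failed_node, survivor, new_data):
    """Take over every group served by failed_node (toy model of S402 to S406)."""
    for group, owner in list(group_owner.items()):
        if owner != failed_node:
            continue
        # S402: change the table so the survivor serves this group.
        group_owner[group] = survivor
        # S403: append new results after the mirrored, not-yet-flushed data.
        nvram[group].extend(new_data.get(group, []))
        # S405: write the area to shared storage, preserving the original order.
        shared_storage.extend(nvram[group])
        # S406: delete the flushed data from the survivor's NVRAM.
        nvram[group].clear()
    return group_owner
```

Because the mirrored entries are flushed before the newly produced ones, the order seen by the shared storage matches the order in which the clients issued their requests, which is why no separate recovery pass is needed in this model.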

(Overall operation of the embodiment)
Next, the overall operation of the above embodiment will be described.
The failover method according to this embodiment applies to a node device (first node 10) that is interconnected with another node device having the same configuration to form a cluster system and is connected to a shared storage, which is the same external storage device shared with the other node device. The non-volatile memory provided in advance is divided into a plurality of storage areas, and the correspondence between the IP address of each connected client device and the storage area of the non-volatile memory that the client device should use is stored in the IP address table storage means provided in advance. The application software execution unit executes the application software and temporarily saves the resulting processing data in the storage area of the non-volatile memory corresponding to the IP address of the client device (FIG. 6, step S302 or S352); the mirroring processing unit stores the saved processing data in the corresponding storage area of the other node device (FIG. 6, step S303 or S353); and the storage storage unit writes the processing data, once it has been stored in the corresponding storage area of the other node device, to the shared storage (FIG. 6, step S305 or S355).

When an abnormality occurs in the other node device (second node 20) while processing data remains in its non-volatile memory, the failover processing unit changes the correspondence stored in the IP address table storage means and takes over the processing of the other node device (FIG. 9, steps S402 to S403).

Each of the above operation steps may be implemented as a computer-executable program and executed by the processor 11 of the node device (first node 10) that directly performs the steps. The program may be recorded on a non-transitory recording medium such as a DVD, a CD, or a flash memory. In that case, the program is read from the recording medium and executed by a computer.
Through this operation, the embodiment provides the following effects.

According to this embodiment, even if either the first node 10 or the second node 20 stops due to an abnormality, the NVRAM 12 or 22 of the remaining node contains the corresponding storage area, and that area holds data that is ready to be written to the shared storage 30, kept consistent by the mirroring process.

Therefore, by switching the area to which data is written according to the storage area assigned to each client device in the IP address table storage means 15 or 25, the remaining node can take over and continue processing immediately, with no data recovery process. This makes it possible to restore service in a very short time without making the client device aware of the failure.

(Extensions of the embodiment)
The above embodiment can be extended in various ways without departing from the spirit of the invention described above. These extensions are described below.

First, although the above embodiment shows a configuration with two server computers, the first node 10 and the second node 20, three or more may of course be used. In that case, an IP address table that divides the client devices in advance into as many groups as there are server computers is stored in each IP address table storage means, and the same operation as above is performed.
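One possible way to pre-divide clients into as many groups as there are server computers is sketched below. The round-robin assignment and the function name `build_group_table` are assumptions for illustration; the description does not prescribe how the groups are chosen, only that there is one group per server computer.

```python
def build_group_table(client_ips, node_count):
    """Assign each client IP string to one of node_count groups, round-robin."""
    table = {}
    # Sorting makes the assignment deterministic, so every node that builds
    # the table from the same client list stores identical contents.
    for index, ip in enumerate(sorted(client_ips)):
        table[ip] = index % node_count
    return table
```

With three server computers, four clients would be spread over groups 0, 1, 2 and then 0 again, and the same table would be stored in each node's IP address table storage means.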

Furthermore, each server computer may be physically composed of a plurality of computers. The server computers may share the same processing among themselves by client device group, or may perform different processing in parallel.

The present invention has been described with reference to the specific embodiment shown in the drawings, but it is not limited to that embodiment; any known configuration may be adopted as long as the effects of the present invention are obtained.

The main points of the novel technical content of the embodiment described above are summarized below. Part or all of the above embodiment can be summarized as a novel technique as follows, but the present invention is not necessarily limited to this.

(Appendix 1) A node device that is interconnected with another node device having the same configuration to form a cluster system and is connected to a shared storage, which is the same external storage device shared with the other node device, the node device comprising:
a non-volatile memory divided in advance into a plurality of storage areas;
IP address table storage means that stores in advance the correspondence between the IP address of each connected client device and the storage area of the non-volatile memory to be used by that client device;
an application software execution unit that executes pre-installed application software in response to a request from the client device and stores the resulting processing data in the storage area corresponding to the IP address of the requesting client device;
a mirroring processing unit that stores the stored processing data in the corresponding storage area of the other node device; and
a storage storage unit that writes the processing data to the shared storage after it has been stored in the corresponding storage area of the other node device.

(Appendix 2) The node device according to Appendix 1, wherein the mirroring processing unit has a function of returning a write end notification to the requesting client device after the processing data has been stored in the corresponding storage area of the other node device.

(Appendix 3) The node device according to Appendix 1 or 2, further comprising a failover processing unit that, when an abnormality occurs in the other node device while the processing data remains in the non-volatile memory of the other node device, changes the correspondence stored in the IP address table storage means and takes over the processing of the other node device.

(Appendix 4) A cluster system constructed by interconnecting first and second node devices having the same configuration and a shared storage, which is the same external storage device shared between the first and second node devices, wherein each of the first and second node devices is the node device according to any one of Appendices 1 to 3.

(Appendix 5) A failover method for a node device that is interconnected with another node device having the same configuration to form a cluster system and is connected to a shared storage, which is the same external storage device shared with the other node device, wherein:
a non-volatile memory provided in advance is divided into a plurality of storage areas;
the correspondence between the IP address of each connected client device and the storage area of the non-volatile memory to be used by that client device is stored in IP address table storage means provided in advance;
an application software execution unit executes application software;
the application software execution unit temporarily saves the processing data obtained by the application software in the storage area of the non-volatile memory corresponding to the IP address of the client device;
a mirroring processing unit stores the saved processing data in the corresponding storage area of the other node device; and
a storage storage unit writes the processing data to the shared storage after it has been stored in the corresponding storage area of the other node device.

(Appendix 6) The failover method according to Appendix 5, wherein, when an abnormality occurs in the other node device while the processing data remains in the non-volatile memory of the other node device, a failover processing unit changes the correspondence stored in the IP address table storage means and takes over the processing of the other node device.

(Appendix 7) A failover program for a node device that is interconnected with another node device having the same configuration to form a cluster system and is connected to a shared storage, which is the same external storage device shared with the other node device, wherein:
a non-volatile memory provided in advance is divided into a plurality of storage areas; and
the correspondence between the IP address of each connected client device and the storage area of the non-volatile memory to be used by that client device is stored in IP address table storage means provided in advance,
the program causing a processor provided in the node device to execute:
a procedure for executing application software;
a procedure for temporarily saving the processing data obtained by the application software in the storage area of the non-volatile memory corresponding to the IP address of the client device;
a procedure for storing the saved processing data in the corresponding storage area of the other node device; and
a procedure for writing the processing data to the shared storage after it has been stored in the corresponding storage area of the other node device.

The present invention is applicable to cluster systems used for purposes that require data consistency and ordering. More specifically, it is applicable to file servers, database systems, web servers, business systems, and the like.

1 Cluster system
10 First node
11, 21 Processor
12, 22 NVRAM
12a, 22a Group 0 area
12b, 22b Group 1 area
13, 23 External storage connection means
14, 24 Communication means
15, 25 IP address table storage means
20 Second node
30 Shared storage
40 Client device
41 Network
101, 201 Application execution unit
102, 202 NVRAM driver execution unit
103, 203 Storage device driver execution unit
104, 204 External communication driver execution unit
105, 205 Mirroring processing unit
106, 206 Storage storage unit
107, 207 Failover processing unit

Claims

A node device that is connected to another node device to form a cluster system and is connected to a shared storage, which is an external storage device shared with the other node device, the node device comprising:
a memory;
IP address table storage means for storing in advance the correspondence between the IP address of each connected client device and the memory for that client device;
an application software execution unit that executes pre-installed application software in response to a request from the client device and stores the resulting processing data in the memory corresponding to the IP address of the requesting client device;
a mirroring processing unit that stores the stored processing data in a memory of the other node device;
a storage storage unit that writes the processing data to the shared storage after it has been stored in the memory of the other node device; and
a failover processing unit that, when an abnormality occurs in the other node device while the processing data remains in the memory of the other node device, changes the correspondence stored in the IP address table storage means and takes over the processing of the other node device.

The node device according to claim 1, wherein the mirroring processing unit has a function of returning a write end notification to the requesting client device after the processing data has been stored in the memory of the other node device.

A cluster system constructed by interconnecting first and second node devices and a shared storage, which is an external storage device shared between the first and second node devices,
wherein each of the first and second node devices is the node device according to claim 1 or 2.

A failover method for a node device that is connected to another node device to form a cluster system and is connected to a shared storage, which is an external storage device shared with the other node device, wherein:
the correspondence between the IP address of each connected client device and the memory for that client device is stored in IP address table storage means provided in advance;
an application software execution unit executes application software;
the application software execution unit temporarily stores the processing data obtained by the application software in the memory corresponding to the IP address of the client device;
a mirroring processing unit stores the stored processing data in a corresponding storage area of the other node device;
a storage storage unit writes the processing data to the shared storage after it has been stored in the memory of the other node device; and
when an abnormality occurs in the other node device while the processing data remains in the memory of the other node device, a failover processing unit changes the correspondence stored in the IP address table storage means and takes over the processing of the other node device.

A failover program for a node device that is connected to another node device to form a cluster system and is connected to a shared storage, which is an external storage device shared with the other node device, wherein
the correspondence between the IP address of each connected client device and the memory for that client device is stored in IP address table storage means provided in advance,
the program causing a processor provided in the node device to execute:
a procedure for executing application software;
a procedure for temporarily storing processing data obtained by the application software in the memory corresponding to the IP address of the client device;
a procedure for storing the stored processing data in a corresponding memory of the other node device;
a procedure for writing the processing data to the shared storage after it has been stored in the memory of the other node device; and
a procedure for changing, when an abnormality occurs in the other node device while the processing data remains in the memory of the other node device, the correspondence stored in the IP address table storage means and taking over the processing of the other node device.