JP4038510B2

JP4038510B2 - Cluster node status detection and communication

Info

Publication number: JP4038510B2
Application number: JP2005012195A
Authority: JP
Inventors: Ken Gary Pomaranski; Andrew Harvey Barr
Original assignee: Hewlett Packard Development Co LP
Current assignee: Hewlett Packard Development Co LP
Priority date: 2004-01-23
Filing date: 2005-01-20
Publication date: 2008-01-30
Anticipated expiration: 2025-01-20
Also published as: US20050177779A1; JP2005209200A; GB2410405B; GB2410405A; US7228462B2; GB0501117D0

Description

The present disclosure relates generally to computer networks.
More particularly, the present disclosure relates to clusters of interconnected computer systems.

A cluster is a parallel or distributed system comprising a collection of interconnected computer systems or servers that is used as a single, unified computing unit.
The members of a cluster are called nodes or systems.
The cluster service is the collection of software on each node that manages cluster-related activity.
The cluster service treats all resources as identical objects.
Resources can include physical hardware devices, such as disk drives and network cards, or logical items, such as logical disk volumes, TCP/IP addresses, entire applications, and entire databases, among others.
A group is a collection of resources managed as a single unit.
In general, a group contains all of the components necessary to run a particular application and to allow users to connect to the service that the application provides.
Operations performed on a group typically affect all resources contained in that group.
By connecting two or more servers together, clustering increases system availability, performance, and the capacity of network systems and applications.

Clustering can be used for parallel processing or parallel computing, in which two or more CPUs are used simultaneously to execute an application or program.
Clustering is a popular strategy for implementing parallel processing applications because it allows system administrators to make use of existing computers and workstations.
Because the number of requests issued to a networked server is difficult to predict, clustering is also useful for load balancing, which distributes processing and communication activity evenly across the network system so that no single server is overwhelmed.
If one server is at risk of being overwhelmed, requests can be forwarded to another clustered server with greater capacity.
For example, a busy website can use two or more clustered web servers in order to employ a load balancing scheme.
Clustering also increases scalability by allowing new components to be added as the system load increases.
In addition, clustering simplifies the management of groups of systems and their applications by allowing the system administrator to manage an entire group as a single system.
Clustering can also be used to increase the fault tolerance of a network system.
If one server suffers an unexpected software or hardware failure, another clustered server can take over the operations of the failed server.
Thus, if any hardware or software component of the system fails, the user may experience degraded performance but does not lose access to the service.

Current cluster services include, among others, Microsoft Cluster Server (MSCS), designed by Microsoft for clustering its Windows NT 4.0 and Windows 2000 Advanced Server operating systems, and Novell NetWare Cluster Services (NWCS).
For example, MSCS supports the clustering of two NT servers to provide a single highly available server.

Clustering can also be implemented in computer networks that use a storage area network (SAN) or a similar networked environment.
A SAN allows storage systems to be shared among multiple clusters and/or servers.
The storage devices of a SAN can be structured, for example, in a RAID configuration.

To detect system failures, clustered nodes can monitor each other's health using a heartbeat mechanism.
A heartbeat is a signal that one clustered node sends to another clustered node.
The heartbeat signal is usually transmitted over Ethernet or a similar network.
However, this network is also used for other purposes.

When an expected heartbeat signal is not received from a node, a failure of that node is detected.
When a node failure is detected, the clustering software can, for example, transfer the entire resource group of the failed node to another node.
Client applications affected by the failure can detect the session failure and reconnect in the same way as for the original connection.

When a heartbeat signal is received from a node of the cluster, that node is typically defined as being in an "up" state.
In the up state, the node is assumed to be operating properly.
Conversely, when a heartbeat signal is no longer received from a node, that node is typically defined as being in a "down" state.
In the down state, the node is assumed to have failed.
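The conventional up/down determination described above can be illustrated with a minimal sketch. The class name `HeartbeatMonitor`, the `timeout_s` threshold, and the node name are hypothetical; the text does not specify any timeout value or data structure.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time per node (illustrative only)."""
    timeout_s: float  # hypothetical threshold, not given in the text
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, node: str, now: float) -> None:
        # Called whenever a heartbeat arrives from `node`.
        self.last_seen[node] = now

    def state(self, node: str, now: float) -> str:
        # A node is "up" only if a heartbeat arrived within the timeout.
        last = self.last_seen.get(node)
        if last is None or now - last > self.timeout_s:
            return "down"
        return "up"


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record("node-a", now=0.0)
print(monitor.state("node-a", now=2.0))  # up
print(monitor.state("node-a", now=5.0))  # down
```

Note that this scheme reports "down" whenever heartbeats stop arriving, whatever the cause, which is the limitation the later sections address.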

One embodiment disclosed herein relates to a method of communicating status from a node of a cluster of computer systems.
A first status signal is received from a compute node, and a default status signal is generated.
The first status signal and the default status signal are used to generate a second status signal.

Another embodiment disclosed herein relates to a method of communicating node status within a cluster of computer systems.
A first signal indicating the status of the current node is generated.
A second signal indicating the status of the preceding node is received.
If the current node is present in the cluster, the first signal is sent to the next node; if the current node has been removed from the cluster, the second signal is sent to the next node.

Another embodiment disclosed herein relates to an apparatus for communicating status from a node of a cluster of computer systems.
The apparatus includes at least an input, a default signal generator, and an output signal generator.
The input is configured to receive a first status signal from a compute node, and the default signal generator is configured to generate a default status signal.
The output signal generator is configured to generate a second status signal using the first status signal and the default status signal.

Another embodiment disclosed herein relates to an apparatus for communicating node status within a cluster of computer systems.
Circuitry is configured to generate a first signal indicating the status of the current node, and an input is configured to receive a second signal indicating the status of the preceding node.
A selection circuit is configured to send the first signal to the next node if the current node is present in the cluster, and to send the second signal to the next node if the current node has been removed from the cluster.

Conventional techniques for reporting the status of clustered nodes were described above.
In the conventional techniques, a heartbeat mechanism is used and a node is determined to be in either an "up" state or a "down" state.

This conventional technique is inadequate and disadvantageous in various cases.
For example, even when a critical target application is not functioning (that is, the application is down), the node running that application may still be sending its heartbeat signal.

In that case, the cluster will still consider the node to be up, even though the critical application is down.
In another example, the cluster may fail to receive an expected heartbeat signal from a node and may therefore assume that the node is down.
However, the node may actually be up (that is, operating properly), and the missing heartbeat signal may be due to an interconnect failure rather than to the node being down.

Furthermore, conventional techniques typically use existing circuitry to generate and transmit the status signal.
This existing circuitry is also used for other communications within the cluster.
In contrast, the applicants have determined that using dedicated circuitry specifically designed to generate and transmit the status signal robustly is advantageous over the conventional techniques.

The efficiency (percentage uptime) of a high availability (HA) cluster is determined largely by the amount of time the cluster takes to recognize that one of its nodes has stopped performing useful computing or storage functions (that is, that the node is effectively down).
Once the cluster determines that a node is effectively down, the clustering software can perform the tasks needed to keep the rest of the nodes running with little interruption to user tasks.

However, as described above, the conventional techniques used to determine the state of a cluster node are inaccurate in various cases.
The conventional techniques can result in false (unnecessary) failovers or in failed detects.
A failed detect occurs when the cluster-level software fails to switch over from a bad node to a good node when it should.
Furthermore, the conventional techniques often take an undesirably long time to detect the down state of a node.

FIG. 1 is a schematic diagram of a node 100 of a cluster according to one embodiment of the present invention.
The node 100 includes a conventional computing subsystem 102 and signal hardware circuitry 106.
The computing subsystem 102 comprises computing elements, which typically include one or more central processing units (CPUs), memory, and the like.
Among other things, the computing subsystem 102 generates and outputs a subsystem status signal 104.
The signal hardware circuitry 106 receives the subsystem status signal 104 and outputs a node status signal 108.
The node status signal 108 can be output to the next node in the cluster.
These signals are described further below in connection with the subsequent figures.

FIG. 2 is a schematic diagram of the signal hardware 106 according to one embodiment of the present invention.
The signal hardware 106 can include a signal generator 202 and an output signal generator 206.

The signal hardware 106 receives the subsystem status signal 104 from the computing subsystem (compute node) 102.
An exemplary timing diagram for the subsystem status signal 104 is shown in the upper part of FIG. 4.
As shown in FIG. 4, the subsystem status signal 104 can be in a GOOD (up) state or a BAD (down) state.
For example, the GOOD state can be represented by a high (logic 1) signal and the BAD state by a low (logic 0) signal.
When the computing subsystem 102 is functioning properly (operating correctly), the subsystem status signal 104 should be driven to the GOOD state.
When the computing subsystem 102 is not functioning properly, the GOOD state should not be driven onto the subsystem status signal 104.
The absence of a GOOD signal means that the system is BAD (down).

The signal generator 202 generates a default BAD (default down) signal 204.
An exemplary timing diagram for the default BAD signal 204 is shown in the lower part of FIG. 4.
As shown in FIG. 4, the default BAD signal 204 comprises a signal that is periodic and asymmetric (not merely a logic level).
For example, as illustrated, the default BAD signal 204 can comprise an asymmetric toggle pattern signal or a pulse modulated signal.
The toggle pattern shown in FIG. 4 is merely an example showing one possibility.
Such toggle patterns can be generated using various electronic circuitry known to those skilled in the art.

The output signal generator 206 is configured to receive both the default BAD signal 204 and the subsystem status signal 104.
The output signal generator 206 uses these two signals to generate and output the node status signal 108.

FIG. 3 is a schematic diagram of the output signal generator 206 according to one embodiment of the present invention.
The output signal generator 206 can include a pull-down element 302 and a logic function block 304.

As shown in FIG. 3, the pull-down element 302 is connected to the line that receives the subsystem status signal 104.
When a high level (GOOD in this embodiment) is not being driven from the computing subsystem 102, the pull-down element 302 forces the line to a low level (BAD in this embodiment).
Thus, even when the computing subsystem 102 is not generating any signal at all, the subsystem status signal 104 is advantageously pulled to the level corresponding to the BAD state.
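As a rough software analogy of the pull-down behavior described above, the sketch below models an undriven line as `None`. The function name `read_status_line` and the use of `None` for "no driver" are illustrative assumptions; the actual element is a hardware component, not software.

```python
from typing import Optional

GOOD = 1  # high level (logic 1) in this embodiment
BAD = 0   # low level (logic 0) in this embodiment


def read_status_line(driven_level: Optional[int]) -> int:
    """Return the level seen on a pulled-down status line.

    `driven_level` is None when the computing subsystem drives nothing
    (for example, because it is powered off or hung).
    """
    if driven_level is None:
        return BAD  # the pull-down wins when nothing drives the line
    return driven_level


print(read_status_line(GOOD))  # 1
print(read_status_line(None))  # 0
```

The point of the design is that silence defaults to BAD, so a dead subsystem cannot be mistaken for a healthy one.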

In an alternative embodiment, the low level of the subsystem status signal 104 can correspond to the GOOD state and the high level to the BAD state.
In that case, a pull-up element can be used to achieve the same advantageous effect.
Pull-down and pull-up circuit elements (voltage level pull elements) are known to those skilled in the art.

As shown in FIG. 3, the logic function block 304 receives the default BAD signal 204 together with the subsystem status signal 104.
According to one embodiment, the logic function block 304 can comprise an exclusive-OR (XOR) gate.
In other embodiments, a different function can be used.

Exemplary timing diagrams for the node status signal 108 generated by the logic function block 304 are shown in FIG. 5.
In these timing diagrams, the logic function block 304 is an XOR gate, and the signals input to the XOR gate are the signals (104 and 204) shown in FIG. 4.

First, consider the node status signal 108 that is generated when the subsystem status signal 104 corresponds to the BAD state.
In this case, the XOR gate receives the default BAD signal 204 and the low level of the subsystem status signal 104, and performs an exclusive-OR operation on these two signals.
The result is the node status signal 108 shown in the upper part of FIG. 5.
In this example, the node status signal 108 is a periodic signal representing the BAD state.
More specifically, the node status signal 108 here has the same periodic form as the default BAD signal 204 (in this example, a toggle pattern or pulse modulation pattern).

Next, consider the node status signal 108 that is generated when the subsystem status signal 104 corresponds to the GOOD state.
In this case, the XOR gate receives the default BAD signal 204 and the high level of the subsystem status signal 104, and performs an exclusive-OR operation on these two signals.
The result is the node status signal 108 shown in the lower part of FIG. 5.
In this example, the node status signal 108 is a periodic signal representing the GOOD state.
More specifically, the node status signal 108 here is a different periodic signal that is the complement of the default BAD signal 204.
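The two XOR cases above can be sketched by treating each signal as a list of sampled bits. The specific bit pattern in `DEFAULT_BAD`, the function names, and the receiver-side `classify` helper are assumptions made for illustration; the text says only that the pattern is periodic and asymmetric, and this sketch further assumes the receiver samples in alignment with the pattern period.

```python
from typing import List

# Hypothetical asymmetric toggle pattern for one period (not from the text).
DEFAULT_BAD: List[int] = [1, 1, 0, 1, 0, 0, 0, 0]


def node_status(subsystem_status: int, default_bad: List[int]) -> List[int]:
    """XOR each sample of the default BAD pattern with the status level."""
    return [bit ^ subsystem_status for bit in default_bad]


def classify(received: List[int], default_bad: List[int]) -> str:
    """Receiver-side interpretation of one aligned period (an assumption)."""
    if received == default_bad:
        return "BAD"
    if received == [bit ^ 1 for bit in default_bad]:
        return "GOOD"
    # Neither pattern: for example, a flat line from a broken interconnect.
    return "NO_VALID_SIGNAL"


print(classify(node_status(0, DEFAULT_BAD), DEFAULT_BAD))  # BAD
print(classify(node_status(1, DEFAULT_BAD), DEFAULT_BAD))  # GOOD
print(classify([0] * 8, DEFAULT_BAD))                      # NO_VALID_SIGNAL
```

Because both GOOD and BAD are actively toggling patterns, a line that carries neither pattern can be told apart from a node that is reporting BAD.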

FIG. 6 is a schematic diagram of a status pass-through circuit 600 according to one embodiment of the present invention.
This circuit 600 advantageously allows the node status signal 108 of the preceding node to pass through the current node when the current node is down.

The signal hardware 106 of node N generates the node status signal 108 of node N.
For example, the signal hardware 106 and the node status signal 108 can be as described above in connection with the preceding figures.

The selection circuit 602 receives the node status signal 108 of node N.
In addition, the selection circuit 602 receives the node status signal 108 from node N-1 (another node in the cluster).
The selection circuit 602 operates on these two signals to generate a status output signal 604 that is sent to node N+1 (the next node in the cluster).
In one embodiment, the selection circuit 602 can comprise a multiplexer (MUX) that selects one of the two status signals and passes it (via the status output signal 604) to the next node.
If the computing subsystem (computing element) of node N has previously been removed from the cluster (for example, because of a node failure, maintenance, or some other reason), the status from node N-1 is passed on.
If the computing subsystem of node N is currently in use by the cluster, the status of node N is passed on.
In this way, even when node N is down, the status of node N-1 is advantageously still evaluated by the system.

Note that if node N-1 is down, the status signal received from node N-1 may have originated at node N-2.
If nodes N-1 and N-2 are both down, the status signal received from node N-1 may have originated at node N-3.
And so on.
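The pass-through behavior described above can be sketched as a two-input selection applied along a chain of nodes. The function names, the tuple representation of a node, and the use of node names as stand-ins for status signals are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def select_status(own: str, upstream: Optional[str], in_cluster: bool) -> Optional[str]:
    """MUX-like selection: forward own status if in the cluster, else upstream's."""
    return own if in_cluster else upstream


def propagate(nodes: List[Tuple[str, bool]]) -> List[Optional[str]]:
    """For each node in chain order, report whose status it forwards.

    Each entry of `nodes` is (name, in_cluster). The node name stands in
    for that node's status signal.
    """
    forwarded: List[Optional[str]] = []
    upstream: Optional[str] = None  # nothing arrives at the first node
    for name, in_cluster in nodes:
        upstream = select_status(name, upstream, in_cluster)
        forwarded.append(upstream)
    return forwarded


chain = [("N-3", True), ("N-2", False), ("N-1", False), ("N", True)]
print(propagate(chain))  # ['N-3', 'N-3', 'N-3', 'N']
```

With N-2 and N-1 both removed, the signal that node N receives from N-1 originates at N-3, matching the case described in the text.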

FIG. 7 is a schematic diagram of a node 700 of a cluster according to another embodiment of the present invention.
The node 700 of FIG. 7 is similar to the node 100 of FIG. 1.
In FIG. 7, however, the node 700 also generates a subsystem degraded status signal 702 in addition to the conventional subsystem status signal 104.
In combination with the conventional subsystem status signal 104, the subsystem degraded status signal 702 extends the reported state from a simple binary signal to a multi-state signal (three or more states).

For example, the subsystem degraded status signal 702 can indicate a DEGRADED state or a NOT_DEGRADED state of the computing subsystem 102.
The DEGRADED state can be defined as the condition in which one or more aspects of the node are not operating "up to par", so that in some cases the node may be removed from the HA cluster.
For example, the following rules can be used.
Rule D1: The computing subsystem loses more than 50% of its performance.
Rule D2: A severe chassis code (at some level below critical) is received.
Variations of these rules and additional rules can also be used to define the DEGRADED state for a particular system.
For example, the percentage of performance lost before the degraded state is entered may differ from 50%.
It may be higher, such as 75%, or lower, such as 25%.
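Rules D1 and D2 can be sketched as a simple predicate. The `SubsystemHealth` type, its field names, and the choice to combine the two rules with a logical OR are assumptions for illustration; the text lists the rules without saying how they are combined.

```python
from dataclasses import dataclass


@dataclass
class SubsystemHealth:
    """Hypothetical health snapshot for a computing subsystem."""
    performance_loss_pct: float  # percentage of performance lost, 0 to 100
    severe_chassis_code: bool    # True if a severe chassis code was received


def is_degraded(health: SubsystemHealth, loss_threshold_pct: float = 50.0) -> bool:
    """Apply Rule D1 (performance loss) and Rule D2 (chassis code).

    Assumption: either rule alone is enough to report DEGRADED.
    """
    rule_d1 = health.performance_loss_pct > loss_threshold_pct
    rule_d2 = health.severe_chassis_code
    return rule_d1 or rule_d2


print(is_degraded(SubsystemHealth(60.0, False)))  # True (Rule D1)
print(is_degraded(SubsystemHealth(10.0, True)))   # True (Rule D2)
print(is_degraded(SubsystemHealth(10.0, False)))  # False
```

The `loss_threshold_pct` parameter reflects the statement that the threshold may be set to a value other than 50%, such as 75% or 25%.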

In one embodiment, the subsystem degraded status signal 702 can be a simple flag indicating whether or not the node is degraded.
In other embodiments, there can be multiple levels of degradation.
These multiple levels of degradation can be implemented using a multi-bit encoding of the degradation level.
In other words, instead of having only a single DEGRADED state, multiple levels of degradation can be defined by rules.
Using multiple levels of degradation advantageously provides the HA clustering software with additional information for its decision-making process about how to manage the nodes of the cluster.
For example, the degradation level may depend on the percentage of performance lost.

In one particular embodiment, the node degraded status signal 704 can comprise a set of lines that digitally provide the degradation state to the next node in the HA cluster.
These lines can be pulled down with resistors.
One implementation can be as follows.
All of these digital lines being at logic 0 can indicate that the node is BAD.
All of these lines being at logic 1 can indicate that the node is GOOD.
Other values in between can indicate the degradation level of the node, with higher values indicating greater functionality.
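The line encoding described above can be sketched as a small decoder. The function name, the most-significant-bit-first ordering, and the returned tuple shape are assumptions; the text specifies only the all-zeros, all-ones, and in-between cases.

```python
from typing import Sequence, Tuple


def decode_degradation_lines(lines: Sequence[int]) -> Tuple[str, int]:
    """Interpret a set of digital status lines (MSB first, by assumption).

    Returns (state, value), where a higher value means greater functionality.
    """
    if not lines:
        raise ValueError("at least one status line is required")
    value = 0
    for bit in lines:
        value = (value << 1) | bit
    all_ones = (1 << len(lines)) - 1
    if value == 0:
        return ("BAD", value)    # every line low, also the pulled-down default
    if value == all_ones:
        return ("GOOD", value)   # every line high
    return ("DEGRADED", value)   # intermediate level


print(decode_degradation_lines([0, 0, 0]))  # ('BAD', 0)
print(decode_degradation_lines([1, 0, 1]))  # ('DEGRADED', 5)
print(decode_degradation_lines([1, 1, 1]))  # ('GOOD', 7)
```

Because the lines are pulled down, a node that drives nothing reads as all zeros and is therefore reported as BAD.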

FIG. 8 is a schematic diagram of a status pass-through circuit 800 according to another embodiment of the present invention.
The circuit 800 of FIG. 8 is similar to the circuit 600 of FIG. 6.
In FIG. 8, however, the selection circuit 802 also receives the node degraded status signals 704 from nodes N and N-1.

The selection circuit 802 operates on the input signals to generate a status output signal 804 that includes the added degraded status information together with the GOOD/BAD status information from either node N or node N-1.
Advantageously, cluster-level software can use this degraded status information as a "check" against the GOOD/BAD status information, resulting in a more reliable set of status information.

The above disclosure includes various advantages over the prior art.
First, dedicated hardware is designed and used to reliably transmit node status information to the cluster.
This should improve the reliability of the cluster.
Second, the GOOD state is transmitted only when the appropriate software on the node is up and running and is able to signal the GOOD state.
As a result, the hardware does not indicate a GOOD state when the software is down.
Third, the above disclosure provides a solution to the problem of distinguishing "no heartbeat" because a node is down from "loss of heartbeat" because of an interconnect failure.
This is done by providing a default BAD signal that a working node can change to a GOOD signal.
Fourth, the above disclosure provides a separate output for degraded-type status signals, thereby ensuring that such degraded states are communicated.
Furthermore, the degraded status signal allows the cluster-level software to use a "voting scheme" to determine quickly and accurately whether a node is actually down.
For example, the voting scheme can use three signals: the GOOD/BAD signal, the DEGRADED/NOT_DEGRADED signal, and the normal Ethernet connection provided by the cluster.
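One possible form of such a voting scheme is a simple majority over the three signals. This is only a sketch: the text names the three inputs but does not define the voting rule, so the majority threshold, the treatment of DEGRADED as a vote for "down", and the function name are all assumptions.

```python
def node_is_down(good_signal: bool, degraded: bool, ethernet_heartbeat: bool) -> bool:
    """Majority vote over three status sources (hypothetical rule).

    good_signal:        True if the dedicated line reports GOOD
    degraded:           True if the node reports DEGRADED
    ethernet_heartbeat: True if heartbeats still arrive over Ethernet
    """
    votes_for_down = [not good_signal, degraded, not ethernet_heartbeat]
    return sum(votes_for_down) >= 2


# Heartbeat lost but the dedicated signals are healthy: likely an
# interconnect problem, so the node is not declared down.
print(node_is_down(good_signal=True, degraded=False, ethernet_heartbeat=False))  # False

# Dedicated line reports BAD and heartbeat is missing: declared down.
print(node_is_down(good_signal=False, degraded=False, ethernet_heartbeat=False))  # True
```

A real deployment would likely weight the sources differently or add time-based filtering; those choices are outside what the text describes.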

In the above description, numerous specific details are given to provide a thorough understanding of embodiments of the invention.
However, the above description of illustrated embodiments of the invention is not intended to be exhaustive or to limit the invention to the precise forms disclosed.
Those skilled in the art will recognize that the invention can be practiced without one or more of the specific details, or with other methods, components, and the like.
In other instances, well-known structures or operations are not shown or described in detail, to avoid obscuring aspects of the invention.
As those skilled in the art will recognize, specific embodiments and examples of the invention are described herein for illustrative purposes, and various equivalent modifications are possible within the scope of the invention.

These modifications can be made to the invention in light of the above detailed description.
The terms used in the appended claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the claims.
Rather, the scope of the invention is to be determined by the appended claims.
The claims are to be construed in accordance with established doctrines of claim interpretation.

A schematic diagram of a node of a cluster according to one embodiment of the present invention.
A schematic diagram of signal hardware according to one embodiment of the present invention.
A schematic diagram of an output signal generator according to one embodiment of the present invention.
A timing diagram of a subsystem status signal and a default BAD signal according to one embodiment of the present invention.
A timing diagram of a node status signal according to one embodiment of the present invention.
A schematic diagram of a status passing circuit according to one embodiment of the present invention.
A schematic diagram of a node of a cluster according to another embodiment of the present invention.
A schematic diagram of a status passing circuit according to another embodiment of the present invention.

Explanation of symbols

100 ... node,
102 ... computing subsystem (compute node),
104 ... subsystem status,
106 ... signal hardware,
108 ... node status,
202 ... signal generator,
204 ... default BAD signal,
206 ... output signal generator,
304 ... logic function block

Claims

A method for communicating status from a first node of a cluster of computer systems to a second node, the method comprising:
receiving a first status signal from a compute node included in the first node;
generating a default status signal of the first node;
forcing the first status signal to a down state if the compute node does not actively drive the first status signal to a state indicating an up state; and
applying a logic function to the first status signal and the default status signal to generate a second status signal for indicating the status of the first node to the second node.

The method of claim 1, wherein,
when the first status signal indicates the up state, the second status signal comprises a first periodic signal indicative of the up state, and,
when the first status signal indicates the down state, the second status signal comprises a second periodic signal indicative of the down state.

The method of claim 2, wherein, if the first status signal indicates neither the up state nor the down state, the second status signal defaults to the second periodic signal.

The method of claim 3, wherein the first periodic signal and the second periodic signal comprise different toggle-type signals.

The method of claim 4, wherein the first periodic signal and the second periodic signal are complements of each other.

The method of claim 1, further comprising:
receiving a first degradation status signal from the compute node;
generating a default degradation status signal; and
using the first degradation status signal and the default degradation status signal to generate a second degradation status signal.

The method of claim 6, wherein the degradation status signal comprises multiple levels of degradation.

An apparatus for communicating status from a node of a cluster of computer systems, the apparatus comprising:
  an input configured to receive a first status signal from a compute node;
  a default signal generator configured to generate a default status signal; and
  an output signal generator configured to generate a second status signal using the first status signal and the default status signal,
  wherein the output signal generator includes a pull element that applies a voltage level to the first status signal.

The apparatus of claim 8, wherein the output signal generator includes an exclusive-OR circuit that operates on the first status signal and the default status signal.

The apparatus of claim 8, wherein the output signal generator is further configured such that, when the first status signal indicates an up state, the second status signal comprises a first periodic signal indicating the up state, and, when the first status signal indicates a down state, the second status signal comprises a second periodic signal indicating the down state.

The apparatus of claim 10, wherein the output signal generator is further configured such that the second status signal defaults to the second periodic signal when the first status signal indicates neither the up state nor the down state.

The apparatus of claim 11, wherein the first and second periodic signals are different toggle-type signals.

The apparatus of claim 12, wherein the first and second periodic signals are complementary signals.
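
For readers who find the claim language hard to follow, the signal generation recited above (a default periodic signal, a pull element, and an exclusive-OR logic function) can be modeled in software roughly as below. This is a minimal sketch under stated assumptions: the logic polarity (1 for up, 0 for down), the use of `None` to represent an undriven line, the pull-down direction, and all function names are hypothetical and are not taken from the patent text.

```python
from typing import List, Optional


def default_bad_signal(n_ticks: int) -> List[int]:
    """Assumed default status signal: a square wave that toggles every tick."""
    return [t % 2 for t in range(n_ticks)]


def pull_down(driven_level: Optional[int]) -> int:
    """Model of the pull element: an undriven line (None) is forced to 0 (down)."""
    return 0 if driven_level is None else driven_level


def node_status_signal(first_status: List[Optional[int]]) -> List[int]:
    """XOR the (pulled) first status signal with the default signal per tick.

    With first status 0 (down or undriven) the output equals the default signal;
    with first status 1 (up) the output is the complement of the default signal.
    """
    default = default_bad_signal(len(first_status))
    return [pull_down(s) ^ d for s, d in zip(first_status, default)]


print(node_status_signal([1, 1, 1, 1]))              # up: complement of the default
print(node_status_signal([None, None, None, None]))  # undriven: falls back to the default
```

In this model, the up and down outputs are complementary toggling signals, and a compute node that stops driving its status line yields the same output as an explicit down state.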