JP2007052536A

JP2007052536A - Failure detection method

Info

Publication number: JP2007052536A
Application number: JP2005235844A
Authority: JP
Inventors: Kazuhiko Haru; Katsunori Takahashi
Original assignee: Hitachi Communication Technologies Ltd
Current assignee: Hitachi Communication Technologies Ltd
Priority date: 2005-08-16
Filing date: 2005-08-16
Publication date: 2007-03-01

Abstract

PROBLEM TO BE SOLVED: To detect a failure in a switching hub in a blade server that is equipped with a plurality of servers and a plurality of switching hubs.
SOLUTION: Signals routed via a plurality of switching hubs 4 and 5 are transmitted at a fixed cycle from a server 3, whose failure is to be detected, to a server 2 that detects the failure. The server 2 that monitors for failures holds a non-received time management table 8 and determines whether the non-received time of the server 3 and of the switching hubs 4 and 5 continues for a given period of time.
COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a failure detection method, and more particularly to a failure detection method for detecting a failure of a server, a switching hub, or the like.

Conventionally, Patent Document 1 describes a failure detection system for an exchange switch that always detects a failure promptly, regardless of whether a data transmission request has occurred in each communication control processing module.

Patent Document 2 describes a network connection system in which, when a failure occurs in a switching hub or cable constituting a network, operation is switched to a spare network so that the network continues operating without being stopped.
Patent Document 1: JP 7-123135 A
Patent Document 2: JP 2004-159205 A

However, in a blade server device in which a plurality of servers communicate via a plurality of switching hubs that connect them, it has conventionally not been possible to detect a failure in a switching hub.
An object of the present invention is to detect failures of the servers and switching hubs provided in a blade server device in which a plurality of servers communicate via a plurality of switching hubs that connect the servers.

In order to solve the above problems, the present invention comprises at least the following means:
(1) Signals are transmitted at a fixed period to the server, among a plurality of servers, that detects failures.
(2) The server that detects failures manages the time during which the signals received from all of the servers have been interrupted.
(3) The server that detects failures manages the time during which the signals passing through each switching hub have been interrupted.

According to the solution of the present invention, there is provided
a failure detection method in a blade server device comprising a plurality of servers, a first switching hub that connects the plurality of servers, and a second switching hub that connects the plurality of servers, in which communication between the servers is performed via the first or second switching hub and the servers communicate with an external network via the first or second switching hub, the method including:
a function in which the servers other than a predetermined specific server among the servers transmit first and second fixed-period signals to the specific server via the first and second switching hubs, respectively;
a function in which the specific server receives the first and second fixed-period signals from the other servers via the first and second switching hubs, respectively;
a function in which the specific server identifies, on the basis of the first and second fixed-period signals, the source server and the switching hub traversed;
a function in which the specific server detects a failure occurring in the first switching hub by detecting that the first fixed-period signals from all of the other servers have been interrupted for a predetermined time while a second fixed-period signal has been received; and
a function in which the specific server detects a failure occurring in the second switching hub by detecting that the second fixed-period signals from all of the other servers have been interrupted for a predetermined time while a first fixed-period signal has been received.
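
The two hub-failure conditions stated above can be illustrated in a few lines of Python. This is only a sketch under stated assumptions, not the patented implementation: the function name classify_hub_failure, the return labels "hub1" and "hub2", and the boolean maps (True meaning that the signal from that server via that hub has been missing for the predetermined time) are all introduced here for illustration.

```python
from typing import Dict, Optional


def classify_hub_failure(silent_via_hub1: Dict[str, bool],
                         silent_via_hub2: Dict[str, bool]) -> Optional[str]:
    """Apply the two hub-failure conditions to per-server interruption flags.

    silent_via_hubN[server] is True when the fixed-period signal from that
    server via hub N has been interrupted for the predetermined time.
    (Hypothetical data layout, chosen only for this sketch.)
    """
    all_first_silent = all(silent_via_hub1.values())
    all_second_silent = all(silent_via_hub2.values())

    # First hub: every first signal interrupted, yet a second signal arrives.
    if all_first_silent and not all_second_silent:
        return "hub1"
    # Second hub: every second signal interrupted, yet a first signal arrives.
    if all_second_silent and not all_first_silent:
        return "hub2"
    return None
```

Requiring that a signal still arrives through the other hub is what separates a hub failure from the case in which the sending servers themselves have stopped.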

According to the present invention, the failure state of the switching hubs can be managed. Therefore, when a failure occurs in one switching hub, communication with the external network that a server had been carrying out through the interface via that switching hub can be switched to go via the other switching hub, allowing communication to continue.

Likewise, according to the present invention, when a failure occurs in the other switching hub, communication with the external network that a server had been carrying out through the interface via the other switching hub can be switched to go via the one switching hub, so that communication can also continue.

According to the present invention, in a blade server device in which a plurality of servers communicate via a plurality of switching hubs connecting the servers, failures of the servers and the switching hubs provided in the blade server device can be detected.

Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a configuration diagram of the blade server device according to the present invention. The blade server device 1 includes a server 2 that detects failures, servers 3 (3a, 3b, 3c) whose failures are to be detected, two switching hubs 4 and 5 forming two systems, and external interfaces 8 and 9. Although the number of servers in this embodiment is four, any number can be chosen as necessary.

The blade server device 1 connects to the external network through the external interface 8 via the switching hub 4, and through the external interface 9 via the switching hub 5.

Each server 3 whose failure is to be detected has a function of transmitting a fixed-period signal 6 (6a, 6b, 6c) via the switching hub 4 to the server 2 that detects failures. Each server 3 also has a function of transmitting a fixed-period signal 7 (7a, 7b, 7c) via the switching hub 5 to the server 2.

The server 2 that detects failures, on the other hand, holds a table that manages the time elapsed since the signal 6 passing through the switching hub 4 or the signal 7 passing through the switching hub 5 was interrupted (the non-reception time), namely the non-reception time management table 50. The specific server 2 that detects failures can also instruct the servers 3a to 3c which switching hub, 4 or 5, to use for communication with the external network.

Next, FIG. 2 shows the non-reception time management table 50.
The non-reception time management table 50 stores a non-reception time for each of the servers 3a to 3c and the switching hubs 4 and 5. In the flowchart descriptions that follow, as one example, a predetermined measurement start value is set and a predetermined value is subtracted from it at every fixed period, so that the stored value corresponds to the non-reception time. Any other data that corresponds to the non-reception time may be used as appropriate, and a method that adds a predetermined value may also be used.
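
One possible in-memory layout for the table described above is sketched below in Python. It is a minimal illustration, assuming the countdown representation; the class name NonReceptionTable, the method elapsed_periods, and the start value of 3 periods are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical start values: the number of periods tolerated before a
# failure is declared. The specification does not give concrete numbers.
SERVER_START = 3
HUB_START = 3


@dataclass
class NonReceptionTable:
    # Entries 51a to 51c: one countdown per monitored server.
    servers: Dict[str, int] = field(
        default_factory=lambda: {"3a": SERVER_START,
                                 "3b": SERVER_START,
                                 "3c": SERVER_START})
    # Entries 52 and 53: one countdown per switching hub.
    hubs: Dict[int, int] = field(
        default_factory=lambda: {4: HUB_START, 5: HUB_START})

    def elapsed_periods(self, server: str) -> int:
        """Periods since the last signal from `server`, recovered from the countdown."""
        return SERVER_START - self.servers[server]
```

Because the stored value counts down from the start value, the elapsed non-reception time is simply the difference between the two, which is why the description treats the countdown as corresponding to the non-reception time.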

The operation of the present embodiment is outlined as follows. Each server 3 whose failure is to be detected transmits signals via the plurality of switching hubs 4 and 5 at a fixed period to the server 2 that detects failures. The server 2 that monitors for failures holds the non-reception time management table 50 and determines whether the non-reception time of each server 3 and of the switching hubs 4 and 5 has continued for a certain time.

FIG. 3 is a flowchart showing the non-reception time management table setting processing routine.
When the server 2 that detects failures receives a signal, it starts the non-reception time management table setting processing routine 60 shown in the figure (fixed-period signal reception).

When the server 2 receives the fixed-period signal 6a, it identifies the traversed switching hub as the switching hub 4 (61) and sets the switching hub measurement start value in the data 52 of the non-reception time management table 50 (62). It also identifies the signal source server as the server 3a (64) and sets the server measurement start value in the data 51a of the non-reception time management table 50 (65).

The measurement start value can be a predetermined value; here, as one example, it is a positive value corresponding to the number of periods required to detect the occurrence of a failure. The switching hub 4 or 5 can be identified, for example, from the input port. The signal source server 3a to 3c can be identified, for example, by having the switching hub 4 or 5, or the servers 3a to 3c themselves, include source data in the fixed-period signal, which the server 2 then receives.
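
The identification step described above (hub from the input port, source server from data carried in the signal) might look like the following Python sketch. Everything concrete in it is an assumption made for illustration: the port names "eth0" and "eth1", the JSON payload with a "source" field, and the function names build_signal and identify. The specification fixes none of these details.

```python
import json
from typing import Tuple

# Hypothetical wiring: which input port of server 2 faces which hub.
PORT_TO_HUB = {"eth0": 4, "eth1": 5}


def build_signal(source: str) -> bytes:
    """Build a fixed-period signal that carries its source server (assumed format)."""
    return json.dumps({"source": source}).encode("utf-8")


def identify(input_port: str, payload: bytes) -> Tuple[int, str]:
    """Return (traversed hub, source server) for a received signal."""
    hub = PORT_TO_HUB[input_port]                            # hub from input port
    source = json.loads(payload.decode("utf-8"))["source"]   # source data in signal
    return hub, source
```

For example, a signal built by server 3b and arriving on the port facing the switching hub 5 yields the pair (5, "3b"), which is exactly the information the table setting routine needs.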

When the server 2 receives the fixed-period signal 7a, it identifies the traversed switching hub as the switching hub 5 (61) and sets the switching hub measurement start value in the data 53 of the non-reception time management table 50 (63). It also identifies the signal source server as the server 3a (64) and sets the server measurement start value in the data 51a of the non-reception time management table (65).

When the server 2 receives the fixed-period signal 6b, it identifies the traversed switching hub as the switching hub 4 (61) and sets the switching hub measurement start value in the data 52 of the non-reception time management table 50 (62). It also identifies the signal source server as the server 3b (64) and sets the server measurement start value in the data 51b of the non-reception time management table (66).

When the server 2 receives the fixed-period signal 7b, it identifies the traversed switching hub as the switching hub 5 (61) and sets the switching hub measurement start value in the data 53 of the non-reception time management table 50 (63). It also identifies the signal source server as the server 3b (64) and sets the server measurement start value in the data 51b of the non-reception time management table (66).

When the server 2 receives the fixed-period signal 6c, it identifies the traversed switching hub as the switching hub 4 (61) and sets the switching hub measurement start value in the data 52 of the non-reception time management table 50 (62). It also identifies the signal source server as the server 3c (64) and sets the server measurement start value in the data 51c of the non-reception time management table (67).

When the server 2 receives the fixed-period signal 7c, it identifies the traversed switching hub as the switching hub 5 (61) and sets the switching hub measurement start value in the data 53 of the non-reception time management table 50 (63). It also identifies the signal source server as the server 3c (64) and sets the server measurement start value in the data 51c of the non-reception time management table (67).
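
The six cases above all follow one pattern, which the Python sketch below condenses. It is an illustration under assumptions rather than the routine 60 itself: the dictionary keyed by the data labels "51a" to "53", the function name on_signal, and the start value of 3 are hypothetical.

```python
from typing import Dict

# Hypothetical measurement start values (number of periods).
SERVER_START = 3
HUB_START = 3

# Which table entry belongs to which hub and which source server.
HUB_ENTRY = {4: "52", 5: "53"}
SERVER_ENTRY = {"3a": "51a", "3b": "51b", "3c": "51c"}


def on_signal(table: Dict[str, int], via_hub: int, source: str) -> None:
    """Reset the countdowns touched by one received fixed-period signal."""
    # Steps 61 to 63: the traversed hub is alive, so restart its countdown.
    table[HUB_ENTRY[via_hub]] = HUB_START
    # Steps 64 to 67: the source server is alive, so restart its countdown.
    table[SERVER_ENTRY[source]] = SERVER_START
```

Receiving signal 7b, for instance, corresponds to on_signal(table, 5, "3b"), which restarts entries 53 and 51b and leaves the others untouched.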

Next, FIG. 4 is a flowchart showing the failure detection processing routine 80.
The server 2 that detects failures starts the failure detection processing routine 80 by periodic activation.
The server 2 selects the devices to be checked for failure one after another (86) and subtracts 1 or a predetermined number from the corresponding set value (data 51a to 51c, 52, 53) of the non-reception time management table 50 (81). The server 2 determines whether the non-reception time has been reduced from the measurement start value to 0 or less (82). If so, it detects a failure in that device (one or more of the servers 3a to 3c and the switching hubs 4 and 5), records the device as failed in appropriate storage means, and/or notifies or outputs the failure to a predetermined device unit (84). If the value is greater than 0, no failure is detected (83). This failure detection processing is performed for all of the servers 3 and the switching hubs 4 and 5 (85), so that a failure in each device is detected.
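
One pass of the periodic check described above can be sketched in Python as follows. This is a minimal illustration, not the routine 80 as implemented by the applicant: the function name detect_failures, the dictionary keyed by data labels, and returning the failed entries as a list (instead of recording or notifying them) are assumptions.

```python
from typing import Dict, List


def detect_failures(table: Dict[str, int], step: int = 1) -> List[str]:
    """Run one periodic pass over every entry of the table.

    Returns the entries whose countdown has reached 0 or less.
    """
    failed = []
    for device in table:              # steps 86 and 85: visit every device
        table[device] -= step         # step 81: subtract 1 or a set number
        if table[device] <= 0:        # step 82: reduced to 0 or less?
            failed.append(device)     # step 84: failure detected
        # step 83: otherwise no failure is detected for this device
    return failed
```

With a table of {"51a": 1, "51b": 3, "52": 2}, a single call reports only "51a", because only that entry reaches 0 in this pass.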

Alternatively, the non-reception time may, for example, be set to 0 as its start value and incremented by 1 or a predetermined number every fixed signal-reception period, with a failure detected when it exceeds a predetermined threshold. A failure can also be detected by any other appropriate counting method.
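
The count-up alternative mentioned above can be sketched as follows. The sketch rests on an interpretation that is an assumption here: the counter is reset to 0 whenever a signal is received and is incremented once per period otherwise. The names tick and on_signal and the threshold of 3 are likewise hypothetical.

```python
from typing import Dict, Iterable, List

THRESHOLD = 3  # hypothetical: periods tolerated without a signal


def on_signal(table: Dict[str, int], devices: Iterable[str]) -> None:
    """A signal proves these devices alive: restart their counters at 0."""
    for device in devices:
        table[device] = 0


def tick(table: Dict[str, int], step: int = 1,
         threshold: int = THRESHOLD) -> List[str]:
    """Advance every counter by one period and report threshold crossings."""
    failed = []
    for device in table:
        table[device] += step
        if table[device] > threshold:   # failure once the threshold is exceeded
            failed.append(device)
    return failed
```

Compared with the countdown form, this variant stores the non-reception time directly, at the cost of comparing against a threshold instead of against 0.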

FIG. 5 shows the signal interruption state when a failure occurs in the server 3a. In this case, the signals 6a and 7a are interrupted, only the non-reception time data 51a is reduced to 0 or less, and a failure in the server 3a is detected.

FIG. 6 shows the signal interruption state when a failure occurs in the server 3b. In this case, the signals 6b and 7b are interrupted, only the non-reception time data 51b is reduced to 0 or less, and a failure in the server 3b is detected.

FIG. 7 shows the signal interruption state when a failure occurs in the server 3c. In this case, the signals 6c and 7c are interrupted, only the non-reception time data 51c is reduced to 0 or less, and a failure in the server 3c is detected.

FIG. 8 shows the signal interruption state when a failure occurs in the switching hub 4. In this case, the signals 6a, 6b, and 6c are interrupted, only the non-reception time data 52 is reduced to 0 or less, and a failure in the switching hub 4 is detected.

FIG. 9 shows the signal interruption state when a failure occurs in the switching hub 5. In this case, the signals 7a, 7b, and 7c are interrupted, only the non-reception time data 53 is reduced to 0 or less, and a failure in the switching hub 5 is detected.
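
The five failure cases of FIGS. 5 to 9 can be reproduced with a small simulation. The Python sketch below is illustrative only and makes simplifying assumptions: one signal per server per hub per period, a common start value of 3, and a function named simulate that returns the set of devices flagged as failed. None of these details is taken from the specification.

```python
from typing import Iterable, Set, Union

START = 3                       # hypothetical measurement start value
SERVERS = ("3a", "3b", "3c")    # monitored servers
HUBS = (4, 5)                   # switching hubs

Device = Union[str, int]


def simulate(failed_servers: Iterable[str] = (),
             failed_hubs: Iterable[int] = (),
             cycles: int = 5) -> Set[Device]:
    """Run several periods and return every device flagged as failed."""
    down_servers, down_hubs = set(failed_servers), set(failed_hubs)
    table = {device: START for device in SERVERS + HUBS}
    detected: Set[Device] = set()
    for _ in range(cycles):
        # Each healthy server sends one signal through each healthy hub;
        # every signal that arrives restarts two countdowns.
        for server in SERVERS:
            for hub in HUBS:
                if server not in down_servers and hub not in down_hubs:
                    table[server] = START
                    table[hub] = START
        # Periodic check: count down and flag anything at 0 or less.
        for device in table:
            table[device] -= 1
            if table[device] <= 0:
                detected.add(device)
    return detected
```

Calling simulate(failed_hubs=(4,)) flags only hub 4: the servers keep their entries alive through hub 5, so only the entry for hub 4 runs down, matching the case of FIG. 8.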

As described above, according to the present embodiment, the server 2 that detects failures manages the non-reception time of the signals from the servers 3 separately for each switching hub traversed, so that failures of the servers 3 and of the switching hubs 4 and 5 can each be detected individually. When a failure occurs in the switching hub 4 or the switching hub 5, the server 2 switches the switching hub through which the servers 3 communicate with the external network, allowing the servers 3 to continue communicating with the external network.
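
The switching decision described above can be sketched as a pure function. This Python fragment is a hypothetical illustration: the function name switch_instructions, the routes mapping from server to its current hub, and the policy of picking the first healthy hub are assumptions, and the way the instruction is actually delivered to the servers 3 is not modeled.

```python
from typing import Dict, Iterable, Set


def switch_instructions(routes: Dict[str, int],
                        failed_hubs: Set[int],
                        hubs: Iterable[int] = (4, 5)) -> Dict[str, int]:
    """Return, per affected server, the hub it should be told to switch to.

    routes maps each server to the hub it currently uses for the external
    network. Servers already on a healthy hub receive no instruction.
    """
    healthy = [hub for hub in hubs if hub not in failed_hubs]
    instructions = {}
    for server, hub in routes.items():
        if hub in failed_hubs and healthy:
            instructions[server] = healthy[0]
    return instructions
```

If servers 3a and 3c use hub 4 and hub 4 fails, the result is {"3a": 5, "3c": 5}, while server 3b on hub 5 is left alone.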

The present invention is applicable to blade server devices and also, for example, to various communication devices such as server devices, and to computer systems and the like.

FIG. 1 is a configuration diagram of the blade server device according to the present invention.
FIG. 2 shows the non-reception time management table 50.
FIG. 3 is a flowchart showing the non-reception time management table setting processing routine.
FIG. 4 is a flowchart showing the failure detection processing routine 80.
FIG. 5 is a diagram showing the signal interruption state when a failure occurs in the server 3a.
FIG. 6 is a diagram showing the signal interruption state when a failure occurs in the server 3b.
FIG. 7 is a diagram showing the signal interruption state when a failure occurs in the server 3c.
FIG. 8 is a diagram showing the signal interruption state when a failure occurs in the switching hub 4.
FIG. 9 is a diagram showing the signal interruption state when a failure occurs in the switching hub 5.

Explanation of symbols

1 blade server device, 2 failure detection server, 3 server, 4 switching hub, 5 switching hub, 50 non-reception time management table, 60 non-reception time management table setting processing routine, 80 failure detection processing routine

Claims

A failure detection method in a blade server device comprising a plurality of servers, a first switching hub that connects the plurality of servers, and a second switching hub that connects the plurality of servers, in which communication between the servers is performed via the first or second switching hub and the servers communicate with an external network via the first or second switching hub, the method including:
a function in which the servers other than a predetermined specific server among the servers transmit first and second fixed-period signals to the specific server via the first and second switching hubs, respectively;
a function in which the specific server receives the first and second fixed-period signals from the other servers via the first and second switching hubs, respectively;
a function in which the specific server identifies, on the basis of the first and second fixed-period signals, the source server and the switching hub traversed;
a function in which the specific server detects a failure occurring in the first switching hub by detecting that the first fixed-period signals from all of the other servers have been interrupted for a predetermined time while a second fixed-period signal has been received; and
a function in which the specific server detects a failure occurring in the second switching hub by detecting that the second fixed-period signals from all of the other servers have been interrupted for a predetermined time while a first fixed-period signal has been received.

The failure detection method according to claim 1, further including a function in which the specific server detects a failure occurring in one of the other servers by detecting that both the first and second fixed-period signals from that server have been interrupted for a predetermined time.

The failure detection method according to claim 1, wherein a table stores, for each device among the plurality of servers and the first and second switching hubs, information indicating the non-reception time of the first or second fixed-period signal; when the first or second fixed-period signal is received, the information indicating the non-reception time corresponding to each relevant device is set; and the device in which a failure has occurred is detected on the basis of the table.

The failure detection method in the blade server device according to any one of claims 1 to 3, wherein the specific server instructs the other servers which switching hub to select for communication with the external network.