JP5790420B2

JP5790420B2 - Communication device, failure detection method, and failure detection program

Info

Publication number: JP5790420B2
Application number: JP2011244025A
Authority: JP
Inventors: 友晃兼田; 隆史川嶋; 幸正吉田; 健高島
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-11-07
Filing date: 2011-11-07
Publication date: 2015-10-07
Anticipated expiration: 2031-11-07
Also published as: JP2013102308A

Description

The present invention relates to a communication device, a failure detection method, and a failure detection program.

Conventionally, when a communication failure occurs between communication devices, a communication path to the connection destination device is secured by switching to a backup communication path or by selecting a detour route. Because this approach leaves the failure unrepaired, if a communication failure occurs when no spare route remains, neither path switching nor detour route selection is possible, and communication is lost.

Meanwhile, in today's highly developed communication networks, where traffic passes through multiple devices before reaching the connection destination device, it is difficult to determine whether the failure location is the local device, the connection destination device, the communication path, or somewhere else. For this reason, a technique is known in which the device that detects a failure performs recovery control, such as a restart, on the connection destination device.

Japanese Laid-open Patent Publication No. 2007-49336
Japanese Laid-open Patent Publication No. 07-250125

However, because the conventional technique performs recovery work while the failure location is still uncertain, the recovery work itself may make the failure worse. This problem is not limited to communication devices; it applies equally to relay cards that are mounted in a chassis and implement a signal transmission unit and a signal reception unit.

For example, recovery control is performed on the connection destination device even when the local device is the cause of the failure. The failure therefore persists however long the recovery work continues, and the recovery work may hinder identification of the cause. In addition, because recovery work is repeatedly performed on a connection destination device that is operating normally, it may induce a failure in that device and make the situation worse.

The disclosed technology has been made in view of the above, and aims to provide a communication device, a failure detection method, and a failure detection program that can narrow down the failure location.

In one aspect, the communication device disclosed in the present application includes a detection unit that detects a communication failure, and a counting unit that counts, for each processing part provided inside the device to execute a process, the number of messages that have passed through that processing part. The communication device includes a comparison unit that, when the detection unit detects a failure, compares the message counts of the processing parts counted by the counting unit. The communication device further includes a specifying unit that identifies the inside of the device as the failure location when the comparison shows a difference in the number of messages that passed through the processing parts, and identifies the outside of the device as the failure location when there is no difference.

According to one aspect of the communication device, the failure detection method, and the failure detection program disclosed in the present application, the failure location can be narrowed down.

FIG. 1 is a diagram illustrating a configuration example of a control device according to the first embodiment.
FIG. 2 is a diagram illustrating a hardware configuration example of the FB card according to the first embodiment.
FIG. 3 is a diagram illustrating functional blocks of the FB card according to the first embodiment.
FIG. 4 is a diagram illustrating an example of information stored in the signal type table.
FIG. 5 is a diagram illustrating an example of information stored in the passage number table.
FIG. 6 is a diagram illustrating an example of information stored in the port setting table.
FIG. 7 is a diagram illustrating an example of messages transmitted and received in the card.
FIG. 8 is a diagram illustrating a processing sequence executed by the FB card according to the first embodiment.
FIG. 9 is a flowchart illustrating the failure location specifying process executed by the FB card according to the first embodiment.
FIG. 10 is a diagram illustrating a processing sequence for failure detection using a notification signal.
FIG. 11 is a diagram illustrating a processing sequence for failure detection using request signal transmission processing.
FIG. 12 is a diagram illustrating an example of a hardware configuration of a computer that executes a failure identification program.

Embodiments of the communication device, the failure detection method, and the failure detection program disclosed in the present application are described in detail below with reference to the drawings. The present invention is not limited by these embodiments.

[Overall configuration]
FIG. 1 is a diagram illustrating a configuration example of a control device according to the first embodiment. As shown in FIG. 1, the control device 1 includes a management device 2 and a chassis 3 and is, for example, a base station controller (RNC: Radio Network Controller). In the chassis 3, an FB (Function Block) card 10 and an FB card 30 are connected via an SW (Switch) card 5. The number of chassis, the card types, and the numbers of FB cards and SW cards shown here are merely examples and are not limited to those illustrated.

The management device 2 governs the processing executed by the control device 1 and is connected to each of the FB card 10 and the FB card 30 via the SW card 5. For example, the management device 2 accepts an instruction operation from an administrator and performs processing on the FB card 10 and the FB card 30 such as power control and control of the start and end of data processing.

The SW card 5 holds switching information, including the connection paths of the cards, and interconnects the FB card 10, the FB card 30, and the management device 2. In the example of FIG. 1, the SW card 5 connects the FB card 10 and the FB card 30 over two paths so that the path is redundant.

The FB card 10 and the FB card 30 are examples of communication devices and exchange data with other cards and the like. In the example of FIG. 1, the FB card 10 and the FB card 30 are connected through two physical ports. For example, the FB card 10 outputs data received from another card to the connected FB card 30, or to another card that performs modulation processing. The connection destination of each FB card is not limited to an FB card as illustrated; it may be, for example, a card that performs modulation, a card that performs coating, or a card that transmits data to a wireless terminal.

Such an FB card 10 or FB card 30 detects a communication failure. The card also counts, for each processing part provided inside the card to execute a process, the number of messages that have passed through that processing part. When a failure is detected, the card compares the message counts of the processing parts. If the comparison shows a difference in the number of messages that passed through the processing parts, the card identifies its own interior as the failure location; if there is no difference, it identifies the exterior of the card as the failure location.

In this way, when a failure occurs between the FB card 10 or the FB card 30 and the opposite device, the card can determine whether messages are being transmitted and received normally within itself. As a result, the card can determine whether the failure location is inside or outside the card, and the failure location can therefore be narrowed down.
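The inside-or-outside decision described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `locate_fault`, the part names, and the use of a single count per processing part are hypothetical and do not come from the publication.

```python
from typing import Dict


def locate_fault(pass_counts: Dict[str, int]) -> str:
    """Classify a detected failure from per-part message counts.

    Returns "internal" when the counts differ between processing parts
    (a message was lost somewhere inside the card), otherwise "external".
    An empty mapping has no differing counts and is treated as "external".
    """
    distinct_counts = set(pass_counts.values())
    return "internal" if len(distinct_counts) > 1 else "external"


# Hypothetical counts: every part saw the same number of messages.
counts_equal = {"cpu": 42, "memory_controller": 42, "network_processor": 42}
# Hypothetical counts: two messages never reached the network processor.
counts_differ = {"cpu": 42, "memory_controller": 42, "network_processor": 40}

result_equal = locate_fault(counts_equal)
result_differ = locate_fault(counts_differ)
```

Calling this only after a failure has been detected mirrors the text, where the comparison is triggered by the detection rather than run continuously.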

[Hardware configuration of FB card]
FIG. 2 is a diagram illustrating a hardware configuration example of the FB card according to the first embodiment. Because the FB card 10 and the FB card 30 shown in FIG. 1 have the same configuration, the hardware configuration and the functional block diagram are described using the FB card 10 as an example.

As shown in FIG. 2, the FB card 10 includes a connector 10a, a PHY (Physical Layer) 10b, a PHY 10c, a connector 10d, and a PHY 10e. The FB card 10 further includes an FPGA (Field-Programmable Gate Array) 11, a network processor 12, a memory 13, a memory controller 14, and a CPU (Central Processing Unit) 15. The hardware shown here is merely an example, and the card may include hardware other than that illustrated.

The connector 10a is, for example, an Ethernet connector provided on the back side of the FB card 10, and is attached to a cable that connects the card to other FB cards and the management device 2. The PHY 10b and the PHY 10c are hardware, such as circuits, that connect the connector 10a and the FPGA 11, and they control the communication exchanged between the connector 10a and the FPGA 11 at the physical layer level.

The connector 10d is, for example, an Ethernet connector provided on the front side of the FB card 10, and is attached to a cable or the like that connects to a computer device used to operate the FB card 10. The PHY 10e is hardware, such as a circuit, that connects the connector 10d and the memory controller 14, and it controls the communication exchanged between the connector 10d and the memory controller 14 at the physical layer level.

The FPGA 11 is an integrated circuit that includes a MAC (Media Access Control) 11a, a MAC 11b, and a switch 11c, and uses them to switch data transfers. The MAC 11a is hardware, such as a circuit, that connects the PHY 10b and the switch 11c, and it controls the communication exchanged between the PHY 10b and the switch 11c at the data link layer level. Similarly, the MAC 11b is hardware, such as a circuit, that connects the PHY 10c and the switch 11c, and it controls the communication exchanged between the PHY 10c and the switch 11c at the data link layer level.

The switch 11c is a switching circuit that connects the MAC 11a to the network processor 12 and, likewise, the MAC 11b to the network processor 12. The switch 11c outputs a message, such as a packet, input from the network processor 12 to whichever of the MAC 11a and the MAC 11b the destination is connected to. The switch 11c also outputs a message, such as a packet, input from the MAC 11a or the MAC 11b to the network processor 12.

The network processor 12 is an electronic circuit, such as a processor specialized for packet processing such as packet transfer, and it controls packet transfer between the FPGA 11 and the memory controller 14. The memory 13 is a storage device that stores data, programs, and the like used in the processes executed by the FB card 10.

The memory controller 14 is, for example, an integrated circuit that controls the writing of data to the memory 13 and the reading of data from the memory 13. The memory controller 14 controls data writes and reads between the network processor 12 and the memory 13, between the CPU 15 and the memory 13, and between a device connected to the connector 10d and the memory 13.

The CPU 15 is an electronic circuit that has an internal memory and the like and governs control of the entire FB card. The CPU 15 executes processes such as detection of a communication failure, identification of the failure location, and recovery control of the failure location.

[Functional block diagram of FB card]
FIG. 3 is a diagram illustrating functional blocks of the FB card according to the first embodiment. As shown in FIG. 3, the FB card 10 includes a signal type table 20a, a passage number table 20b, and a port setting table 20c. The FB card 10 also includes a transmission/reception processing unit 21, a counting unit 22, a signal type determination unit 23, a confirmation management unit 24, a failure detection unit 25, a comparison unit 26, a specifying unit 27, and a recovery control unit 28.

The signal type table 20a, the passage number table 20b, and the port setting table 20c are provided in, for example, the memory 13. The transmission/reception processing unit 21, the counting unit 22, the signal type determination unit 23, the confirmation management unit 24, the failure detection unit 25, the comparison unit 26, the specifying unit 27, and the recovery control unit 28 are processing units executed by the CPU 15.

The signal type table 20a stores the timer values set for each signal type. The information stored here is updated by an administrator or the like. FIG. 4 is a diagram illustrating an example of information stored in the signal type table. As shown in FIG. 4, the signal type table 20a stores "MsgID, signal type, response/confirmation wait timer value, failure detection wait timer value" in association with one another.

The "MsgID" stored here is an identifier that identifies the type of a message and is included in messages transmitted and received between FB cards and the like. The "signal type" is the type of the message identified by the MsgID. The "response/confirmation wait timer value" is a timer value used in normal message transmission: the time allowed from sending a message until its response is received. The "failure detection wait timer value" is a timer value used for the same purpose as a retransmission signal wait timer: the time from sending a response message until a retransmission of that response is requested.

In the case of FIG. 4, a failure is determined if no response to a request message whose "MsgID" is "1029" is received within "400 ms" of sending that request message. A failure is also determined if a retransmission request for a response message whose "MsgID" is "1030" is received before "600 ms" has elapsed since that response message was sent. Further, a failure is determined if the periodic report message whose "MsgID" is "1033" is not received at "8640000 ms" intervals.
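As an illustration, the three rows quoted from FIG. 4 could be held in a lookup table keyed by MsgID. The sketch below is hypothetical: the class and field names are invented for the example, and because the text does not say which timer column holds the value for the periodic report, its placement under the response/confirmation wait column is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SignalTypeEntry:
    signal_type: str
    response_wait_ms: Optional[int]        # "response/confirmation wait timer value"
    failure_detect_wait_ms: Optional[int]  # "failure detection wait timer value"


# MsgID -> entry, using only the values quoted in the text for FIG. 4.
SIGNAL_TYPE_TABLE = {
    1029: SignalTypeEntry("request", 400, None),
    1030: SignalTypeEntry("response", None, 600),
    # Assumption: the column for this value is not stated in the text.
    1033: SignalTypeEntry("periodic report", 8_640_000, None),
}


def lookup(msg_id: int) -> SignalTypeEntry:
    """Return the table entry for a MsgID; raises KeyError if it is unknown."""
    return SIGNAL_TYPE_TABLE[msg_id]


request_entry = lookup(1029)
response_entry = lookup(1030)
```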

The passage number table 20b stores the number of messages that have passed through each processing part, as counted by the counting unit 22. The information stored here is updated by the counting unit 22. FIG. 5 is a diagram illustrating an example of information stored in the passage number table. As shown in FIG. 5, the passage number table 20b stores "UDP (User Datagram Protocol) port number, passage number" in association with each other for each processing part. The "UDP port number" stored here is the port number used by the message that passed, and can be obtained from the message. The "passage number" is the number of messages that have passed. The passage number table 20b may also hold its per-part tables in the order in which the processing parts are connected, according to wiring information or a preset message transmission/reception route.

In the case of FIG. 5, the CPU 15, the memory controller 14, and the network processor 12 are wired in this order, and the number of passing messages is counted for each of these processing parts. For example, for the CPU 15, "42" messages have passed on UDP port number "1024", "35" messages on UDP port number "1025", "29" messages on UDP port number "1026", and "42" messages on UDP port number "1040".
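A table of this shape can be modeled as a nested mapping from processing part to UDP port number to count, which also makes a per-port comparison straightforward. In the sketch below only the CPU 15 row comes from the text; the rows for the memory controller and the network processor, all key names, and the helper `mismatched_ports` are hypothetical.

```python
from typing import Dict, List

# Processing part -> {UDP port number -> passage number}, in wiring order.
PASSAGE_TABLE: Dict[str, Dict[int, int]] = {
    "cpu_15": {1024: 42, 1025: 35, 1026: 29, 1040: 42},  # values from the text
    "memory_controller_14": {1024: 42, 1025: 35, 1026: 29, 1040: 42},  # hypothetical
    "network_processor_12": {1024: 42, 1025: 35, 1026: 27, 1040: 42},  # hypothetical
}


def mismatched_ports(table: Dict[str, Dict[int, int]]) -> List[int]:
    """Return the UDP ports whose counts are not identical across all parts.

    A port missing from a part's table is treated as a count of zero.
    """
    all_ports = sorted({port for counts in table.values() for port in counts})
    result = []
    for port in all_ports:
        values = {counts.get(port, 0) for counts in table.values()}
        if len(values) > 1:
            result.append(port)
    return result


suspect_ports = mismatched_ports(PASSAGE_TABLE)
```

With the hypothetical rows above, only port 1026 shows a discrepancy, which would point at the inside of the card as the failure location.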

The port setting table 20c stores the association between UDP port numbers and processing functions. The information stored here is set by an administrator or the like. FIG. 6 is a diagram illustrating an example of information stored in the port setting table. As shown in FIG. 6, the port setting table 20c stores "UDP port number, processing function unit name" in association with each other. The "UDP port number" stored here is the UDP port number used by a message. The "processing function unit name" indicates the function or process that transmits and receives the message.

In the case of FIG. 6, a message that uses UDP port number "1024" is a message required for the synchronization processing exchanged over the PFIF (Plat Form interface). Similarly, a message that uses UDP port number "1040" is a transfer message exchanged between subordinate cards or between Mates. Here, Mate refers to, for example, a redundant configuration in which one system is the N system and the other is the E system; one of them is the ACT system (the system in active operation) and the other is the STBY system (the standby system). An exchange between Mates is, for example, an exchange of data between the ACT system and the STBY system.

The transmission/reception processing unit 21 is a processing unit that receives messages from other cards and the like and transmits messages to other cards. For example, when it receives a message, the transmission/reception processing unit 21 outputs the received message to the signal type determination unit 23. The transmission/reception processing unit 21 also transmits a message specified by the signal type determination unit 23 to the specified destination.

The counting unit 22 is a processing unit that counts, for each processing part provided inside the FB card 10 to execute a process, the number of messages that have passed through that processing part. For example, the counting unit 22 monitors, via the transmission/reception processing unit 21, each piece of hardware in the FB card 10 or each processing function executed by the CPU 15, and counts the messages that pass through.

An example of a message passing through the FB card 10 is described here. FIG. 7 is a diagram illustrating an example of messages transmitted and received in the card. As shown in FIG. 7, a message includes "transmission source information, transmission destination information, transmission source address, transmission destination address, transmission source UDP port number, transmission destination UDP port number, transmission source physical port number, transmission destination physical port number, message number" and the like.

For example, the "transmission source information" identifies the card that transmitted the message, such as a machine name or a card name. The "transmission destination information" identifies the card to which the message is addressed, such as a machine name or a card name. The "transmission source address" is the address information of the card that transmitted the message, and the "transmission destination address" is the address information of the card to which the message is addressed. An IP (Internet Protocol) address or a MAC address can be used as the address information.

The "transmission source UDP port number" and the "transmission destination UDP port number" are port numbers designated by the transmission source device, and are the UDP port numbers used to transmit and receive the message. The "transmission source physical port number" and the "transmission destination physical port number" are the physical port numbers that form the transmission path designated by the transmission source device, that is, the numbers of the physical interfaces used by the message. The "message number" corresponds to "MsgID" in FIG. 4 and is an identifier that identifies the type of the message.
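For readers who prefer code, the fields listed for FIG. 7 map naturally onto a small record type. The sketch below is illustrative only: the Python field names, the types, and every sample value except the message number 1029 and the UDP port number 1024 are assumptions, and the actual on-wire layout is not described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardMessage:
    src_info: str        # transmission source information (machine or card name)
    dst_info: str        # transmission destination information
    src_addr: str        # transmission source address (IP or MAC)
    dst_addr: str        # transmission destination address
    src_udp_port: int    # transmission source UDP port number
    dst_udp_port: int    # transmission destination UDP port number
    src_phys_port: int   # transmission source physical port number
    dst_phys_port: int   # transmission destination physical port number
    message_number: int  # corresponds to MsgID in FIG. 4


# Hypothetical request message from FB card 10 to FB card 30.
sample = CardMessage(
    src_info="FB10", dst_info="FB30",
    src_addr="192.0.2.10", dst_addr="192.0.2.30",
    src_udp_port=1024, dst_udp_port=1024,
    src_phys_port=0, dst_phys_port=0,
    message_number=1029,
)
```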

As one example of counting passages, the counting unit 22 monitors the CPU 15 and, when a message in the format shown in FIG. 7 is output from the CPU 15, extracts the UDP port number from the "transmission source UDP port number" or the "transmission destination UDP port number". The counting unit 22 then increments the passage number for the extracted UDP port number in the table associated with the CPU 15 among the tables held in the passage number table 20b.
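The increment step can be sketched as a counter keyed by processing part and UDP port number. The class `PassCounter` and its method names are hypothetical, and the caller is assumed to have already extracted the UDP port number from the message.

```python
from collections import defaultdict


class PassCounter:
    """Per-part, per-UDP-port message counter (illustrative only)."""

    def __init__(self) -> None:
        # part name -> {UDP port number -> passage number}
        self._table = defaultdict(lambda: defaultdict(int))

    def record(self, part: str, udp_port: int) -> None:
        """Increment the passage number for one message seen at a part."""
        self._table[part][udp_port] += 1

    def count(self, part: str, udp_port: int) -> int:
        """Return the current passage number without creating new entries."""
        return self._table.get(part, {}).get(udp_port, 0)


counter = PassCounter()
counter.record("cpu_15", 1024)
counter.record("cpu_15", 1024)
counter.record("cpu_15", 1040)
```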

Returning to FIG. 3, the signal type determination unit 23 is a processing unit that determines the type of a message received by the transmission/reception processing unit 21 and notifies the confirmation management unit 24 of that type. It likewise determines the type of a message transmitted by the transmission/reception processing unit 21 and notifies the confirmation management unit 24 of that type.

For example, when a received message is input from the transmission/reception processing unit 21, the signal type determination unit 23 extracts the "message number" included in the received message. The signal type determination unit 23 then looks up the "signal type" corresponding to the extracted "message number" in the signal type table 20a and notifies the confirmation management unit 24 of it. As one example, when "1029" is extracted as the "message number" from a received message, the signal type determination unit 23 notifies the confirmation management unit 24 that the signal type of the received message is "request", which corresponds to "1029".

Likewise, when a message is transmitted from the transmission/reception processing unit 21, the signal type determination unit 23 extracts the "message number" included in the transmitted message. The signal type determination unit 23 then looks up the "signal type" corresponding to the extracted "message number" in the signal type table 20a and notifies the confirmation management unit 24 of it. As one example, when "1030" is extracted as the "message number" from a transmitted message, the signal type determination unit 23 notifies the confirmation management unit 24 that the signal type of the transmitted message is "response", which corresponds to "1030".

The confirmation management unit 24 is a processing unit that starts the timer corresponding to the signal type notified by the signal type determination unit 23. For example, the confirmation management unit 24 starts a timer when the signal type determination unit 23 notifies it that the "signal type" is "request". The confirmation management unit 24 then notifies the failure detection unit 25 that the timer has been started, together with the signal type of the started timer, the start time, and the like. The confirmation management unit 24 initializes the timer when a message has been transmitted and received normally or when instructed to do so by the failure detection unit 25 or the like.

The failure detection unit 25 is a processing unit that detects a communication failure. For example, the failure detection unit 25 detects that a failure has occurred when a predetermined message cannot be received between the time the confirmation management unit 24 starts the timer and the time the set timer value is reached.

As one example, assume that the confirmation management unit 24 notifies the failure detection unit 25 that the timer corresponding to “signal type = request” has been started. In this case, the failure detection unit 25 identifies the value of the timer at the time when the signal type determination unit 23 notifies the confirmation management unit 24 that the response to the request message has been received. That is, the failure detection unit 25 identifies the timer value at the time when the transmission/reception processing unit 21 receives the response to the request message. When the identified timer value is less than the timer value “400 ms” for “signal type = request”, the failure detection unit 25 determines that the messages were transmitted and received normally and resets the timer. On the other hand, when the identified timer value is equal to or greater than the timer value “400 ms” for “signal type = request”, the failure detection unit 25 detects that a failure has occurred and notifies the comparison unit 26.

As another example, assume that the confirmation management unit 24 notifies the failure detection unit 25 that the timer corresponding to “signal type = response” has been started. In this case, if the signal type determination unit 23 notifies the confirmation management unit 24, before the timer value “600 ms” for “signal type = response” elapses, that a retransmission request for the response message has been received, the failure detection unit 25 detects that a failure has occurred and notifies the comparison unit 26. That is, the failure detection unit 25 detects that a failure has occurred when a retransmission request is received before the timer value “600 ms” elapses.
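
The two timer-based checks described above can be sketched as follows. This is a minimal sketch under stated assumptions: the class `StartedTimer`, the function `detect_failure`, and the event names `response_received` and `retransmission_request_received` are hypothetical, and times are plain integers in milliseconds. Only the timer values 400 ms and 600 ms and the two decision rules come from the description.

```python
from dataclasses import dataclass

REQUEST_TIMER_MS = 400   # timer value for "signal type = request"
RESPONSE_TIMER_MS = 600  # timer value for "signal type = response"


@dataclass
class StartedTimer:
    signal_type: str  # signal type of the message that started the timer
    started_ms: int   # time at which the timer was started


def detect_failure(timer: StartedTimer, event: str, now_ms: int) -> bool:
    """Return True when the event observed at now_ms indicates a failure."""
    elapsed = now_ms - timer.started_ms
    if timer.signal_type == "request" and event == "response_received":
        # A response that arrives at or after 400 ms is treated as a failure.
        return elapsed >= REQUEST_TIMER_MS
    if timer.signal_type == "response" and event == "retransmission_request_received":
        # A retransmission request before 600 ms means the response was lost.
        return elapsed < RESPONSE_TIMER_MS
    return False
```

In this sketch, any other combination of signal type and event is treated as no failure, which is an assumption rather than something the description states.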

The comparison unit 26 is a processing unit that, when the failure detection unit 25 detects a failure, compares the numbers of messages counted by the counting unit 22 for the respective processing parts. For example, the comparison unit 26 refers to the passage number table 20b and compares the numbers of passages counted for each processing part.

In the case of FIG. 5, the comparison unit 26 compares the number of passages for each UDP port number stored in the table of the CPU 15 with the number of passages for each UDP port number stored in the table of the memory controller 14. Similarly, the comparison unit 26 compares the numbers stored in the table of the memory controller 14 with those stored in the table of the network processor 12, and the numbers stored in the table of the CPU 15 with those stored in the table of the network processor 12. The comparison unit 26 then outputs the comparison results to the specifying unit 27.

The specifying unit 27 is a processing unit that specifies the inside of the FB card 10 as the failure location when the result of the comparison by the comparison unit 26 shows a difference in the number of messages that have passed through the processing parts. When there is no difference in the number of messages, the specifying unit 27 specifies the outside of the FB card 10 as the failure location. The specifying unit 27 outputs the specified failure information to the recovery control unit 28.

For example, in the case of FIG. 5, the number of messages that passed through UDP port number “1040” is “42” in the CPU 15 and “0” in the memory controller 14, so the specifying unit 27 determines that there is a difference in the number of passages. In this case, the specifying unit 27 specifies that a failure has occurred in the path between the CPU 15 and the memory controller 14, or in the memory controller 14. Furthermore, the specifying unit 27 refers to the port setting table 20c and specifies that the processing function corresponding to the UDP port number “1040” on which the failure was detected is “transfer message between subordinate cards or between Mates”. On the other hand, when there is no difference in the number of messages that have passed through the processing parts, the specifying unit 27 specifies that a failure has occurred in the path connected to the FB card 10 or in the FB card 30, which is the connection destination of the FB card 10.
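
The comparison and identification described above can be sketched as follows. This is a hedged illustration only: the function `locate_failure`, the dictionary layout, and the part names used as keys are assumptions. The counts 42 and 0 for port 1040 and the function name associated with port 1040 are the values quoted from FIG. 5 and the port setting table 20c.

```python
from itertools import combinations

# Stand-in for the port setting table 20c: UDP port number -> processing function.
PORT_FUNCTIONS = {
    1040: "transfer message between subordinate cards or between Mates",
}


def locate_failure(pass_counts):
    """Compare passage counts pairwise and classify the failure location.

    pass_counts maps a processing part name to {udp_port: passage_count}.
    Returns ("internal", mismatches) or ("external", []).
    """
    mismatches = []
    for (part_a, table_a), (part_b, table_b) in combinations(pass_counts.items(), 2):
        for port in sorted(set(table_a) | set(table_b)):
            if table_a.get(port, 0) != table_b.get(port, 0):
                mismatches.append((part_a, part_b, port))
    if mismatches:
        return "internal", mismatches
    return "external", []


# Values quoted for FIG. 5: port 1040 passed 42 times in the CPU 15
# and 0 times in the memory controller 14.
fig5 = {"CPU 15": {1040: 42}, "memory controller 14": {1040: 0}}
location, mismatches = locate_failure(fig5)
```

Each mismatch names the two processing parts whose counts differ and the port concerned, so the port can then be looked up in `PORT_FUNCTIONS` to find the affected processing function.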

The recovery control unit 28 is a processing unit that executes recovery control on the processing part specified by the specifying unit 27 as the cause of the failure, or on the connection destination device connected to the FB card 10. For example, when the specifying unit 27 notifies the recovery control unit 28 that the failure location is the memory controller 14, the recovery control unit 28 executes processing such as a reset, a restart, or a predetermined recovery command on the memory controller 14. Similarly, when the specifying unit 27 notifies the recovery control unit 28 that the failure location is the FB card 30, the recovery control unit 28 executes processing such as a reset, a restart, or a predetermined recovery command on the FB card 30.

Also, assume that the specifying unit 27 notifies the recovery control unit 28 that the function in which the failure has occurred is “transfer message between subordinate cards or between Mates”. In this case, the recovery control unit 28 can also request the CPU 15 to restart the application that provides this function. In other words, the recovery control unit 28 can execute recovery control in units of the processes executed by the CPU 15.

[Process flow]
Next, the flow of processing executed by the FB card 10 according to the first embodiment will be described with reference to FIGS. 8 and 9. Here, the overall processing sequence and the failure location specifying process are described, taking communication between the FB card 10 and the FB card 30 as an example.

(Overall processing sequence)
FIG. 8 is a diagram illustrating a processing sequence executed by the FB card according to the first embodiment. As illustrated in FIG. 8, the FB card 30 transmits a message whose signal type is “request” to the FB card 10 (S101 and S102).

The transmission/reception processing unit 21 of the FB card 10 receives the message whose signal type is “request” (S103). Subsequently, the signal type determination unit 23 identifies that the signal type of the received message is “request”, and the transmission/reception processing unit 21 transmits a message whose signal type is “response” to the FB card 30 as the message corresponding to the “request” (S104 and S105).

Then, the confirmation management unit 24 of the FB card 10 refers to the signal type table 20a and starts the timer corresponding to “response”, which is the signal type of the transmitted message identified by the signal type determination unit 23 (S106). In this case, the timer value is “600 ms”.

Thereafter, before the timer value elapses, the FB card 30 retransmits the message whose signal type is “request” to the FB card 10 (S107 and S108). The signal type determination unit 23 of the FB card 10 identifies that the signal type of the message received by the transmission/reception processing unit 21 is “request” (S109).

Then, the failure detection unit 25 of the FB card 10 detects that a failure has occurred because, before the timer value “600 ms” elapsed, it received a retransmission of the request signal for which a response signal should already have been transmitted (S110). The failure detection unit 25 then transmits a timer reset instruction to the confirmation management unit 24, and the confirmation management unit 24 resets the timer (S111). Note that the timer need not be reset at the timing shown in the figure; it may be reset after the failure location has been specified or after recovery.

Subsequently, the comparison unit 26 and the specifying unit 27 of the FB card 10 execute the failure location specifying process to specify the failure location (S112). The recovery control unit 28 then executes recovery control on the specified failure location (S113). At this time, the recovery control unit 28 also executes recovery control on the FB card 30 as necessary (S114).

Thereafter, when the failure has been recovered by the recovery control, the transmission/reception processing unit 21 of the FB card 10 sends a connectivity report from the physical port connecting the FB card 10 and the FB card 30 to confirm that the connection state is normal (S115 and S116). The FB card 30 then transmits to the FB card 10 a connectivity report confirmation indicating that the connectivity report was received normally from the FB card 10 (S117 and S118). In this way, each FB card can confirm connectivity.

(Failure location specifying process)
FIG. 9 is a flowchart illustrating the failure location specifying process executed by the FB card according to the first embodiment. This process is executed in S112 of FIG. 8.

As illustrated in FIG. 9, the counting unit 22 of the FB card 10 monitors each piece of hardware in the FB card 10 or each processing function executed by the CPU 15, and extracts input messages for each processing part (S201). When an input message is extracted (Yes in S202), the counting unit 22 extracts output messages for each processing part (S203). For a processing part in which both an input message and an output message have been detected (Yes in S204), the counting unit 22 counts up the number of passages of that processing part (S205).

On the other hand, for a processing part in which an input message or an output message has not been detected (No in S202 or No in S204), the counting unit 22 does not count up the number of passages and proceeds to the processing that follows S205. Note that FIG. 9 illustrates an example in which the failure detection processing of S206 to S210 is executed after the counting processing of S201 to S205, but the processing is not limited to this order. For example, the counting processing and the failure detection processing may be executed asynchronously.
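
The counting rule of S201 to S205 can be sketched as follows. This is an illustrative sketch, not the counting unit 22 itself: the class `PassCounter`, its `observe` method, and the boolean flags `seen_input` and `seen_output` are hypothetical names, and how the input and output messages are actually observed in hardware is outside its scope.

```python
from collections import defaultdict


class PassCounter:
    """Keeps passage counts per processing part and per UDP port number."""

    def __init__(self):
        # pass_counts[part][port] -> number of messages that passed through
        self.pass_counts = defaultdict(lambda: defaultdict(int))

    def observe(self, part: str, port: int, seen_input: bool, seen_output: bool) -> None:
        # S202 / S204: count only when both the input message and the
        # output message were detected for this processing part.
        if seen_input and seen_output:
            self.pass_counts[part][port] += 1  # S205


counter = PassCounter()
counter.observe("CPU 15", 1040, seen_input=True, seen_output=True)
counter.observe("memory controller 14", 1040, seen_input=True, seen_output=False)
```

A message that enters a processing part but never leaves it is not counted, which is what later produces the difference between the counts of adjacent processing parts.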

Thereafter, when the failure detection unit 25 detects a failure (Yes in S206), the comparison unit 26 refers to the passage number table 20b and compares the numbers of passages of the processing parts (S207). When the specifying unit 27 determines from the comparison result that there is a difference in the numbers of passages (Yes in S208), it specifies the processing part that is the failure location based on the comparison result (S209). On the other hand, when the specifying unit 27 determines from the comparison result that there is no difference in the numbers of passages (No in S208), it specifies the failure location as external (S210).

[Effects]
As described above, the FB card 10 can set a timer based on the signal type of a message exchanged with the FB card 30 and detect a failure using the set timer. In other words, the FB card 10 can use the signal type to detect a failure when the sequence with the FB card 30 does not continue. In addition, by counting the messages that have passed through the processing parts in the card, the FB card 10 can specify whether the failure location is inside or outside the card. Furthermore, the FB card 10 can specify even the processing part and the function in which the failure has occurred. The FB card 10 can therefore narrow down the failure location and reduce the risk involved in recovery control.

In addition, the accuracy with which a faulty part is identified in the case of a failure in the own card is improved, which makes it possible to detect and recover from a silent failure, that is, a failure whose occurrence cannot be determined from outside even when it occurs. Furthermore, because erroneous control directed at the peer-side path is reduced, the number of signals transmitted and received on the line decreases and line utilization efficiency increases. Failure recovery control is also possible even when communication with the host device is temporarily unavailable. Moreover, even when a failure occurs at a remote site, for example during an earthquake, a minimum number of lines can be secured and the service can be provided.

Next, other examples of failure detection executed by the FB card 10 will be described with reference to FIGS. 10 and 11. FIG. 10 describes a failure detection method for an example in which a notification signal is received, and FIG. 11 describes one for an example in which a request signal is transmitted.

(Notification signal)
FIG. 10 is a diagram illustrating a processing sequence of failure detection using a notification signal. As illustrated in FIG. 10, the FB card 30 transmits a message whose signal type is “notification” to the FB card 10 (S301 and S302).

The transmission/reception processing unit 21 of the FB card 10 receives the message whose signal type is “notification” (S303). Subsequently, the signal type determination unit 23 identifies that the signal type of the received message is “notification”, and the transmission/reception processing unit 21 transmits a message whose signal type is “confirmation” to the FB card 30 as the message corresponding to the “notification” (S304 and S305).

Then, the confirmation management unit 24 of the FB card 10 refers to the signal type table 20a and starts the timer corresponding to “confirmation”, which is the signal type of the transmitted message identified by the signal type determination unit 23 (S306). In this case, the timer value is “200 ms”.

Thereafter, the FB card 10 suspends processing such as failure detection until the timer value elapses (S307). Assume that, during this time, the FB card 10 receives from the FB card 30 a retransmission of the message whose signal type is “notification” (S308 to S310). That is, assume that the FB card 30 has not been able to receive the confirmation signal from the FB card 10.

Then, after the timer value has elapsed, the failure detection unit 25 of the FB card 10 detects that a failure has occurred because, before the timer value “200 ms” elapsed, it received a retransmission of the notification signal for which a confirmation signal should already have been transmitted (S311). The subsequent processing from S312 to S319 executed by the FB card 10 or the FB card 30 is the same as the processing from S111 to S118 described with reference to FIG. 8, and its description is therefore omitted.

(Request signal)
FIG. 11 is a diagram illustrating a processing sequence of failure detection using the transmission of a request signal. As illustrated in FIG. 11, the transmission/reception processing unit 21 of the FB card 10 transmits a message whose signal type is “request” to the FB card 30 (S401 and S402).

Subsequently, the confirmation management unit 24 of the FB card 10 refers to the signal type table 20a and starts the timer corresponding to “request”, which is the signal type of the transmitted message identified by the signal type determination unit 23 (S403). In this case, the timer value is “400 ms”. Assume that the FB card 10 then does not receive a “response” from the FB card 30 before the timer value elapses (S404).

Then, after the timer has elapsed, the failure detection unit 25 of the FB card 10 detects that a failure has occurred because no response signal was received before the timer value “400 ms” elapsed (S405). The subsequent processing from S406 to S413 executed by the FB card 10 or the FB card 30 is the same as the processing from S111 to S118 described with reference to FIG. 8, and its description is therefore omitted.

(Other methods)
Besides the examples described above, a failure can be detected using various other signal types. For example, the FB card 10 can use a periodic report that is transmitted periodically from the FB card 30. In this case, the FB card 10 can detect that a failure has occurred when the periodic report is not received at the 8640000 ms interval.
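
The periodic report check can be sketched as follows. This is a minimal sketch under assumptions: the function `periodic_report_missed` and its parameters are hypothetical, times are integers in milliseconds, and treating a report as missed only once the interval has been strictly exceeded is a choice made for this example. Only the interval of 8640000 ms comes from the description.

```python
PERIODIC_REPORT_INTERVAL_MS = 8_640_000  # interval quoted in the description


def periodic_report_missed(last_report_ms: int, now_ms: int,
                           interval_ms: int = PERIODIC_REPORT_INTERVAL_MS) -> bool:
    """Return True when no periodic report arrived within the expected interval."""
    return now_ms - last_report_ms > interval_ms
```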

In another method, the comparison unit 26 of the FB card 10 periodically compares the numbers of passages of the respective processing parts stored in the passage number table 20b, and the occurrence of a failure can be detected at the point when a comparison result indicating a difference in the numbers of passages is obtained. In this way, the FB card 10 can detect a failure not only from the signal type but also from the passage status of messages within the own device. The FB card 10 can therefore detect an internal failure quickly, without depending on a timer.
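
The timer-independent detection described above can be sketched as follows. This is only an illustration: the function `first_mismatch` and the representation of each periodic comparison as a snapshot dictionary (processing part name to passage count for one port) are assumptions, and the sample counts other than 42 and 0 are made up for the example.

```python
from typing import Iterable, Optional


def first_mismatch(snapshots: Iterable[dict]) -> Optional[int]:
    """Return the index of the first periodic snapshot whose counts differ.

    Each snapshot maps a processing part name to its passage count
    for a single port. Returns None when every snapshot is consistent.
    """
    for index, snapshot in enumerate(snapshots):
        if len(set(snapshot.values())) > 1:
            return index  # a failure is detected at this comparison
    return None


snapshots = [
    {"CPU 15": 10, "memory controller 14": 10},  # hypothetical healthy state
    {"CPU 15": 42, "memory controller 14": 0},   # counts differ
]
detected_at = first_mismatch(snapshots)
```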

Embodiments of the present invention have been described above, but the present invention may be implemented in various forms other than the embodiments described above. Different embodiments are therefore described below.

(Applicable devices)
The above embodiment has been described using an FB card as an example, but the present application is not limited to this and can also be applied to communication devices such as servers and base stations. For example, by providing a general application server or the like with functions similar to those of FIG. 3, the failure location can be narrowed down when a failure occurs in communication with another server or a client.

(Signal type)
The signal types, MsgIDs, timer values, port numbers, and the like illustrated in the above embodiment are merely examples; they are not limited to those of the above embodiment and can be set arbitrarily by an administrator or the like. In addition, by storing the settings in the port setting table, processing similar to that of the above embodiment can be executed. The above embodiment has been described using terms such as message and signal, but the present invention is not limited to these and can be applied to various kinds of data exchanged between devices, such as packets and frames.

(Recovery control)
In the above embodiment, an example in which a restart or a reset is executed as recovery control has been described, but recovery control is not limited to this. For example, the FB card 10 may report the occurrence of a failure to the management device 2, which is the host device of the own card. In doing so, the FB card 10 can also report the specified failure location, function, and the like. Alternatively, the FB card 10 may display the details of the failure on a display or the like connected to the front side, or may notify an administrator or the like by e-mail or the like.

(System)
Among the processes described in the present embodiment, all or part of the processes described as being performed automatically can also be performed manually. Conversely, all or part of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified.

The components of each illustrated device are functional and conceptual, and do not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to that illustrated. All or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the confirmation management unit 24 may execute the processes executed by the failure detection unit 25, the comparison unit 26, and the specifying unit 27. Furthermore, all or any part of the processing functions performed in each device may be realized by a CPU and a program analyzed and executed by the CPU, or may be realized as hardware using wired logic.

(Program)
The various processes described in the above embodiments can be realized by executing a program prepared in advance on a computer system such as a personal computer or a workstation. An example of a computer system that executes a program having functions similar to those of the above embodiments is therefore described below.

FIG. 12 is a diagram illustrating an example of the hardware configuration of a computer that executes a failure identification program. As illustrated in FIG. 12, the computer 100 includes a CPU 102, an input device 103, an output device 104, a communication interface 105, a medium reading device 106, an HDD (Hard Disk Drive) 107, and a RAM (Random Access Memory) 108. The units illustrated in FIG. 12 are connected to one another by a bus 101.

The input device 103 is a mouse or a keyboard, the output device 104 is a display or the like, and the communication interface 105 is an interface such as a NIC. The HDD 107 stores the failure identification program 107a together with the tables illustrated in FIGS. 4 to 6 and the like. The HDD 107 has been given as an example of a recording medium, but the various programs may instead be stored in another computer-readable recording medium such as a ROM (Read Only Memory), a RAM, or a CD-ROM and read by the computer. The recording medium may also be located at a remote site, and the computer may acquire and use the program by accessing that recording medium. In that case, the acquired program may be stored in a recording medium of the computer itself and used from there.

The CPU 102 reads the failure identification program 107a and loads it into the RAM 108, thereby running a failure identification process 108a that executes the functions described with reference to FIG. 3 and the like. That is, the failure identification process 108a executes functions similar to those of the transmission/reception processing unit 21, the counting unit 22, the signal type determination unit 23, the confirmation management unit 24, the failure detection unit 25, the comparison unit 26, the specifying unit 27, and the recovery control unit 28 illustrated in FIG. 3. In this way, the computer 100 operates as an information processing apparatus that executes the failure identification method by reading and executing the program.

The computer 100 can also realize functions similar to those of the above embodiments by reading the failure identification program from a recording medium with the medium reading device 106 and executing the read failure identification program. Note that the program referred to in this other embodiment is not limited to being executed by the computer 100. For example, the present invention can be applied in the same way when another computer or a server executes the program, or when these cooperate to execute the program.

DESCRIPTION OF SYMBOLS
1 Control device
2 Management device
3 Chassis
5 SW card
10, 30 FB card
10a, 10d Connector
10b, 10c, 10e PHY
11 FPGA
11a, 11b MAC
11c Switch
12 Network processor
13 Memory
14 Memory controller
15 CPU
20a Signal type table
20b Passage number table
20c Port setting table
21 Transmission/reception processing unit
22 Counting unit
23 Signal type determination unit
24 Confirmation management unit
25 Failure detection unit
26 Comparison unit
27 Specifying unit
28 Recovery control unit

Claims

A communication device comprising:
a storage unit that stores a processing function and a port number used by the processing function in association with each other;
a detection unit that detects a communication failure;
a counting unit that counts, for each processing site that is provided inside the device and executes a respective process, the number of messages that have passed through the processing site for each port number;
a comparison unit that, when a failure is detected by the detection unit, compares the numbers of messages counted by the counting unit for the respective processing sites;
a specifying unit that, when the comparison by the comparison unit shows a difference in the number of messages that have passed through the processing sites, specifies the inside of the device as the failure location and specifies the processing site at which the difference occurs, and that, when there is no difference in the number of messages, specifies the outside of the device as the failure location; and
an execution unit that, when the specifying unit specifies a processing site as the failure location, identifies the processing function in which the failure has occurred at that processing site based on the port number at which the difference has occurred and the type of message that has passed through that port number, and executes a recovery operation for the processing function.
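The comparison and identification steps recited above can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `locate_failure`, the port numbers, the processing function names, and the rule that a site with fewer messages than the maximum is the suspect are all assumptions made for illustration. The message type, which the claim also uses to identify the processing function, is omitted for brevity.

```python
from typing import Dict

# Hypothetical contents of the storage unit: port number -> processing function.
PORT_TO_FUNCTION: Dict[int, str] = {5060: "call control", 161: "device management"}


def locate_failure(counts: Dict[str, Dict[int, int]]) -> dict:
    """Compare per-port message counts across processing sites.

    counts maps a processing site name to {port number: messages passed}.
    """
    ports = sorted({port for per_port in counts.values() for port in per_port})
    for port in ports:
        seen = {site: per_port.get(port, 0) for site, per_port in counts.items()}
        if len(set(seen.values())) > 1:
            # Counts disagree, so the failure is treated as internal to the device.
            # Assumption: sites that saw fewer messages than the maximum are suspects.
            most = max(seen.values())
            suspects = sorted(site for site, n in seen.items() if n < most)
            return {
                "location": "internal",
                "sites": suspects,
                "port": port,
                "function": PORT_TO_FUNCTION.get(port),
            }
    # Every site saw the same number of messages on every port.
    return {"location": "external"}
```

For example, if a PHY and a MAC each counted 10 messages on port 5060 while the network processor counted 7, the sketch reports an internal failure at the network processor and looks up the processing function registered for port 5060 as the target of the recovery operation.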

The communication device according to claim 1, wherein the detection unit detects that the communication failure has occurred when a response to a message transmitted to a connection destination device connected to the device is not received from the connection destination device within a predetermined time corresponding to the signal type of the message, or when a retransmission request for the message is received from the connection destination device.
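The two detection conditions in this claim can be sketched as a single predicate. The signal type names, the timeout values, and the function name `failure_detected` below are hypothetical; the claim only states that the predetermined time depends on the signal type.

```python
from typing import Dict, Optional

# Hypothetical response deadlines in seconds, keyed by signal type.
RESPONSE_TIMEOUT: Dict[str, float] = {"setup": 2.0, "keepalive": 5.0}


def failure_detected(signal_type: str, sent_at: float, now: float,
                     response_at: Optional[float],
                     retransmission_requested: bool) -> bool:
    """Return True when either detection condition of the claim holds."""
    # Condition 2: the connection destination asked for the message again.
    if retransmission_requested:
        return True
    # Condition 1: no response within the time allowed for this signal type.
    deadline = sent_at + RESPONSE_TIMEOUT[signal_type]
    if response_at is not None:
        return response_at > deadline
    return now > deadline
```

Keeping the timeouts in a table keyed by signal type lets a slow but legitimate exchange use a longer deadline than a short one without changing the detection logic.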

The communication device, wherein the comparison unit periodically compares the number of messages for each processing site, and
the specifying unit specifies the inside of the device as the failure location when a comparison result showing a difference in the number of messages that have passed through the processing sites is obtained.

The communication device according to claim 1, wherein the specifying unit, when specifying the inside of the device as the failure location, specifies the processing site that caused the failure by using connection information inside the device or the transmission/reception path of the message.
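One way to use the transmission/reception path is to walk the processing sites in the order a message traverses them and report the first hop at which the counts stop matching. The sketch below assumes a single port and a hypothetical site ordering; the function name `first_divergence` and the choice to return both ends of the hop, rather than blaming one side, are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


def first_divergence(path: List[str],
                     counts: Dict[str, int]) -> Optional[Tuple[str, str]]:
    """Return the first adjacent pair of sites on the path whose counts differ."""
    for upstream, downstream in zip(path, path[1:]):
        if counts[upstream] != counts[downstream]:
            # Messages were lost somewhere between these two sites.
            return upstream, downstream
    # Counts agree along the whole path.
    return None
```

With a path of PHY, MAC, switch, and network processor and counts of 10, 10, 8, and 8, the sketch returns the MAC and switch pair, which narrows the failure to that hop.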

The communication device according to claim 4, further comprising a recovery control unit that executes recovery control on the processing site specified by the specifying unit as the cause of the failure, or on a connection destination device connected to the device.

A failure detection method in which a computer executes processing comprising:
detecting a communication failure;
counting, for each processing site that is provided inside the computer and executes a respective process, the number of messages that have passed through the processing site for each port number;
comparing, when a failure is detected, the counted numbers of messages for the respective processing sites;
specifying, when the comparison shows a difference in the number of messages that have passed through the processing sites, the inside of the computer as the failure location and the processing site at which the difference occurs, and specifying, when there is no difference in the number of messages, the outside of the computer as the failure location; and
when a processing site is specified as the failure location, referring to a storage unit that stores a processing function and a port number used by the processing function in association with each other, identifying the processing function in which the failure has occurred at that processing site based on the port number at which the difference has occurred and the type of message that has passed through that port number, and executing a recovery operation for the processing function.

A failure detection program that causes a computer to execute processing comprising:
detecting a communication failure;
counting, for each processing site that is provided inside the computer and executes a respective process, the number of messages that have passed through the processing site for each port number;
comparing, when a failure is detected, the counted numbers of messages for the respective processing sites;
specifying, when the comparison shows a difference in the number of messages that have passed through the processing sites, the inside of the computer as the failure location and the processing site at which the difference occurs, and specifying, when there is no difference in the number of messages, the outside of the computer as the failure location; and
when a processing site is specified as the failure location, referring to a storage unit that stores a processing function and a port number used by the processing function in association with each other, identifying the processing function in which the failure has occurred at that processing site based on the port number at which the difference has occurred and the type of message that has passed through that port number, and executing a recovery operation for the processing function.