JP2013025539A

JP2013025539A - Monitoring system and monitoring method for multi-CPU system

Info

Publication number: JP2013025539A
Application number: JP2011159227A
Authority: JP
Inventors: Toru Uehara (植原 徹)
Original assignee: Kyocera Corp
Current assignee: Kyocera Corp
Priority date: 2011-07-20
Filing date: 2011-07-20
Publication date: 2013-02-04

Abstract

PROBLEM TO BE SOLVED: To provide a monitoring system and a monitoring method for a multi-CPU system that can determine the type of a failure.

SOLUTION: A monitoring system 1 for a multi-CPU system having a plurality of CPU terminals (CPU1, CPU2, ..., CPUn+1) connected via a serial bus 2 includes a failure determination bus 3 connected to the plurality of CPU terminals. When one CPU terminal (CPU1) receives no response signal to a first response request signal that it transmitted via the serial bus 2 to another CPU terminal (CPU2), it transmits a second response request signal to the other CPU terminal (CPU2) via the failure determination bus 3, and the type of failure of the other CPU terminal (CPU2) is determined on the basis of whether a response signal to the second response request signal is received.

Description

The present invention relates to a monitoring system and a monitoring method for a multi-CPU system.

Conventionally, for example in the online systems of banks, distributed processing systems in which information processing is shared among a plurality of CPU terminals have been used to distribute risk and to make data processing more efficient. In such a system configuration, the failure of even one CPU terminal can cause a functional failure of the entire system, so the operation of each CPU terminal must be monitored.

One such monitoring method, shown in FIG. 3, determines whether another CPU terminal is in a normal state by transmitting and receiving signals between the CPU terminals via a serial bus (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent No. 4126849

With the above technique, however, it is possible to determine whether a failure has occurred but not to identify its type: it cannot be determined whether the failure lies in the communication function of a CPU terminal or in some other part. Consequently, even when the failure lies only in the communication function, the above technique handles it by resetting the entire CPU, which has a large impact on the system as a whole.

The present invention is intended to solve the above problem, and its object is to provide a monitoring system and a monitoring method for a multi-CPU system that can determine the type of a failure in more detail.

A monitoring system for a multi-CPU system according to the present invention is a monitoring system for a multi-CPU system having a plurality of CPU terminals connected by a serial bus, and includes a failure determination bus connected to the plurality of CPU terminals. When one CPU terminal does not receive a response signal to a first response request signal that it transmitted to another CPU terminal via the serial bus, a second response request signal is transmitted from the one CPU terminal to the other CPU terminal via the failure determination bus, and the type of failure of the other CPU terminal is determined based on whether a response signal to the second response request signal is received.

According to a preferred embodiment of the present invention, the serial bus is a synchronous serial interface, and the failure determination bus is an asynchronous serial interface.

A monitoring method for a multi-CPU system according to the present invention includes: a step of determining whether a failure has occurred in another CPU terminal based on whether a response signal is received to a first response request signal transmitted from one CPU terminal, among a plurality of CPU terminals connected via a serial bus, to the other CPU terminal; and a step of, when it is determined that a failure has occurred in the other CPU terminal, transmitting a second response request signal from the one CPU terminal to the other CPU terminal via a failure determination bus to which the plurality of terminals are connected, and determining the type of failure of the other CPU terminal based on whether a response signal to the second response request signal is received.

According to the present invention, it is possible to determine not only whether a failure has occurred but also, in more detail, the type of the failure.

FIG. 1 is a block diagram showing an example of the monitoring system for a multi-CPU system of the present invention.
FIG. 2 is a diagram for explaining the operation of the monitoring system for a multi-CPU system of the present invention.
FIG. 3 is a block diagram showing a conventional multi-CPU system.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. As shown in FIG. 1, in the monitoring system 1 for a multi-CPU system of the present invention, a plurality of CPU terminals CPU1, CPU2, ..., CPUn+1 are connected via a serial bus 2 and a failure determination bus 3.

The serial bus 2 is an interface for serial communication between the CPU terminals and, in the illustrated example, is a synchronous serial bus. Each of the CPU terminals CPU1, CPU2, ..., CPUn+1 transmits and receives signals to and from the other CPU terminals via the serial bus 2 and monitors whether a failure has occurred in them.

The failure determination bus 3 is an interface used to determine the type of a failure once the monitoring performed between the CPU terminals via the serial bus 2 has determined that a failure has occurred. In the illustrated example, it is an asynchronous serial bus.

The operation of the monitoring system for a multi-CPU system according to the present invention is described below. First, one CPU terminal performs serial communication with another CPU terminal via the serial bus 2 and transmits a response request signal to it. If a response signal is received within a predetermined time, the other CPU terminal is determined to be normal. If, on the other hand, no response signal is received via the serial bus 2 within the predetermined time, the one CPU terminal determines that a failure has occurred in the other CPU terminal.

Specifically, the monitoring CPU terminal transmits a first response request signal to the monitored CPU terminal and starts a timer in synchronization with the transmission. If a response signal from the monitored CPU terminal is received before the time measured by the timer reaches a predetermined monitoring time T1, the monitoring CPU terminal determines that the monitored CPU terminal is normal and resets the timer. If no response signal has been received from the monitored CPU terminal by the time the measured time reaches T1, it determines that a failure has occurred in the monitored CPU terminal and resets the timer.
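The timer-based check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the two queues stand in for the serial bus 2, the names `monitored_terminal` and `check_over_serial_bus` are hypothetical, and the value of `T1_SECONDS` is an assumption, since the description only says that the monitoring time is predetermined.

```python
import queue
import threading

T1_SECONDS = 0.2  # assumed value; the description only calls T1 "predetermined"


def monitored_terminal(rx: queue.Queue, tx: queue.Queue, healthy: bool) -> None:
    """Hypothetical monitored terminal: answers one request only if healthy."""
    request = rx.get()
    if healthy:
        tx.put(("response", request))


def check_over_serial_bus(healthy: bool, monitoring_time: float = T1_SECONDS) -> bool:
    """Send the first response request and wait for at most monitoring_time.

    Returns True (normal) if a response arrives in time, False (failure) if not.
    """
    to_terminal: queue.Queue = queue.Queue()    # stands in for serial bus 2 (outbound)
    from_terminal: queue.Queue = queue.Queue()  # stands in for serial bus 2 (inbound)
    worker = threading.Thread(
        target=monitored_terminal,
        args=(to_terminal, from_terminal, healthy),
        daemon=True,
    )
    worker.start()

    # The timeout passed to get() plays the role of the timer that is started
    # together with the transmission and expires at T1.
    to_terminal.put("first_response_request")
    try:
        from_terminal.get(timeout=monitoring_time)
        return True
    except queue.Empty:
        return False


if __name__ == "__main__":
    print("healthy terminal judged normal:", check_over_serial_bus(True))
    print("silent terminal judged normal:", check_over_serial_bus(False, 0.05))
```

Using a blocking read with a timeout makes the timer reset implicit: the timer exists only for the duration of one check, whichever way the check ends.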

In this way, one CPU terminal monitors another CPU terminal via the serial bus 2, and when reception of a response signal is confirmed within the predetermined monitoring time, it can confirm that the other CPU terminal is operating normally.

On the other hand, as shown in FIG. 2, suppose that in communication via the serial bus 2 no response signal to a first response request signal, for example from CPU1 to CPU2, is received within the predetermined monitoring time T1, so that CPU2 is determined to have failed. In that case, CPU1 transmits a second response request signal to CPU2 via the failure determination bus 3 and checks whether a response signal from CPU2 arrives within a predetermined monitoring time T2 (which may be the same as or different from T1).

If a response signal from CPU2 is received within the predetermined monitoring time T2 in the communication via the failure determination bus 3, it is determined that the failure lies in the communication function of CPU2. In this case, only the communication function of CPU2 needs to be reset, so the effect on the multi-CPU system as a whole is smaller than when the entire CPU2 is reset.

If, on the other hand, no response signal from CPU2 is received within the predetermined monitoring time T2 even in the communication via the failure determination bus 3, it is determined that a failure has occurred in a part of CPU2 other than its communication function. In this case, the function of CPU2 can be restored to normal by resetting the entire CPU2.
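The two-stage determination and the resulting reset scope can be summarized as a small decision function. The following Python sketch is illustrative only: the enum names, the `Probe` callable type, and the `RESET_FOR` table are assumptions introduced here, and each probe is assumed to return True when a response signal arrives within the given monitoring time.

```python
from enum import Enum
from typing import Callable


class Diagnosis(Enum):
    NORMAL = "normal"
    COMMUNICATION_FAILURE = "failure in the communication function"
    OTHER_FAILURE = "failure outside the communication function"


class ResetScope(Enum):
    NONE = "no reset"
    COMMUNICATION_FUNCTION = "communication function only"
    ENTIRE_CPU = "entire CPU terminal"


# A probe takes a monitoring time in seconds and reports whether a response
# signal was received within that time (hypothetical interface).
Probe = Callable[[float], bool]

RESET_FOR = {
    Diagnosis.NORMAL: ResetScope.NONE,
    Diagnosis.COMMUNICATION_FAILURE: ResetScope.COMMUNICATION_FUNCTION,
    Diagnosis.OTHER_FAILURE: ResetScope.ENTIRE_CPU,
}


def diagnose(serial_probe: Probe, failure_bus_probe: Probe,
             t1: float, t2: float) -> Diagnosis:
    """Classify the monitored terminal from the two response checks."""
    # Stage 1: first response request over the serial bus, monitoring time T1.
    if serial_probe(t1):
        return Diagnosis.NORMAL
    # Stage 2: second response request over the failure determination bus, T2.
    if failure_bus_probe(t2):
        # The terminal still answers on the other bus, so only its
        # communication function on the serial bus is considered faulty.
        return Diagnosis.COMMUNICATION_FAILURE
    return Diagnosis.OTHER_FAILURE


if __name__ == "__main__":
    result = diagnose(lambda t: False, lambda t: True, t1=0.2, t2=0.2)
    print(result.value, "->", RESET_FOR[result].value)
```

The second probe is only evaluated when the first one fails, which mirrors the description: the failure determination bus is used only after a failure has been detected on the serial bus.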

Here, the serial bus 2 and the failure determination bus 3 preferably use different communication methods. In particular, it is preferable that the serial bus 2 be a synchronous serial interface and the failure determination bus 3 be an asynchronous serial interface.

The reason is that a failure in the communication function of a CPU terminal may depend on the communication method in use. Giving the failure determination bus a different communication method prevents a failure with the same cause from also occurring in communication via the failure determination bus, so that it can be determined more reliably that the failure lies in the communication function.

For example, in the example shown in FIG. 1, the serial bus 2 is a synchronous serial bus, so noise or the like may affect the clock system used for synchronous communication and cause jitter. If the failure determination bus 3 used the same communication method, that is, if it were also a synchronous serial bus, the same noise could prevent communication with the CPU terminal judged to have failed from succeeding. The CPU terminal could then be wrongly judged to have failed as a whole even though the failure lies in the communication function. Making the failure determination bus 3 an asynchronous bus keeps the same cause of failure from acting on communication via the failure determination bus 3.
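The misjudgment described above can be illustrated with a toy model. The Python sketch below is a deliberately simplified assumption, not a model of real bus behavior: a terminal has a flag for clock noise that is assumed to block every synchronous link and to leave asynchronous links unaffected, and the names `Terminal`, `responds`, and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terminal:
    core_ok: bool      # everything other than the communication function works
    clock_noise: bool  # noise or jitter on the clock used by synchronous links


def responds(terminal: Terminal, bus_is_synchronous: bool) -> bool:
    """Toy rule: clock noise blocks synchronous links only (assumption)."""
    if not terminal.core_ok:
        return False
    if bus_is_synchronous and terminal.clock_noise:
        return False
    return True


def classify(terminal: Terminal, failure_bus_is_synchronous: bool) -> str:
    """Two-stage determination with a synchronous serial bus 2."""
    if responds(terminal, bus_is_synchronous=True):
        return "normal"
    if responds(terminal, failure_bus_is_synchronous):
        return "communication failure"
    return "other failure"


if __name__ == "__main__":
    noisy = Terminal(core_ok=True, clock_noise=True)
    # Same method on both buses: the common cause hides the true failure type.
    print("synchronous failure bus: ", classify(noisy, failure_bus_is_synchronous=True))
    # Different method: the terminal still answers, so the failure is localized.
    print("asynchronous failure bus:", classify(noisy, failure_bus_is_synchronous=False))
```

In this model the terminal with clock noise is classified as "other failure" when both buses are synchronous, which would lead to an unnecessary reset of the entire CPU terminal, and as "communication failure" when the failure determination bus is asynchronous.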

As described above, according to the present embodiment, when a CPU terminal is determined to have failed, communicating with it via the failure determination bus makes it possible to determine whether the failure has occurred in the communication function of that CPU terminal or elsewhere. Accordingly, only the part of the CPU terminal where the failure has occurred needs to be reset, so the effect of the recovery work on the multi-CPU system as a whole can be kept to a minimum.

The present invention is not limited to the above embodiment and can be modified in various ways without departing from the gist of the invention. For example, the failure determination bus is not limited to a serial bus and may be a parallel bus.

1 Monitoring system for multi-CPU system
2 Serial bus
3 Failure determination bus

Claims

1. A monitoring system for a multi-CPU system having a plurality of CPU terminals connected by a serial bus, the monitoring system comprising:
a failure determination bus connected to the plurality of CPU terminals,
wherein, when one CPU terminal does not receive a response signal to a first response request signal transmitted to another CPU terminal via the serial bus, a second response request signal is transmitted from the one CPU terminal to the other CPU terminal via the failure determination bus, and the type of failure of the other CPU terminal is determined based on whether a response signal to the second response request signal is received.

2. The monitoring system for a multi-CPU system according to claim 1, wherein the serial bus is a synchronous serial interface, and the failure determination bus is an asynchronous serial interface.

3. A method for monitoring a multi-CPU system, the method including:
determining whether a failure has occurred in another CPU terminal based on whether a response signal is received to a first response request signal transmitted from one CPU terminal, among a plurality of CPU terminals connected via a serial bus, to the other CPU terminal; and
when it is determined that a failure has occurred in the other CPU terminal, transmitting a second response request signal from the one CPU terminal to the other CPU terminal via a failure determination bus to which the plurality of terminals are connected, and determining the type of failure of the other CPU terminal based on whether a response signal to the second response request signal is received.