JPH01276301A

JPH01276301A - Maintenance back-up device for multiplex system

Info

Publication number: JPH01276301A
Application number: JP63106247A
Authority: JP
Inventors: Tomoo Kumamaru; 熊丸　智雄
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-04-28
Filing date: 1988-04-28
Publication date: 1989-11-06

Abstract

PURPOSE: To reliably grasp the operating state of a multiplexed system and to take appropriate recovery measures before a failure occurs, by using common fault extraction means that compares the performance information of the systems of the multiplexed system and extracts common fault conditions that may stop all systems. CONSTITUTION: Performance information storage means 21 stores performance information consisting of either or both of the logging information and the diagnostic test information received from each system of the multiplexed system, and sends this performance information to common fault extraction means 23. The means 23 determines, from the performance information of each system, whether common fault conditions exist in hardware or software. When common fault conditions are found, the fault contents are displayed to notify a maintenance operator. Recovery measures suited to the fault contents can thus be taken quickly, before all systems fail.

Description

[Detailed Description of the Invention] [Object of the Invention] (Industrial Application Field) The present invention relates to a maintenance support device for a multiplexed system used in, for example, process instrumentation systems, various data processing systems, FA (factory automation) systems, and various other backup systems, and in particular to a maintenance support device for a multiplexed system that promptly detects serious failures, including a failure that stops all systems, and notifies maintenance personnel.

(Prior Art) In general, systems that would suffer large losses from a failure widely adopt multiplexed configurations to increase their reliability, and such configurations have achieved a considerable track record to date.

However, even when a multiplexed system is adopted, all systems may still fail and stop. In a duplexed computer system, for example, the causes include failures of common parts, such as a CPU or memory failure, and bugs in application software; in such cases the entire system goes down. Taking a process instrumentation system as an example of such a total outage, if all systems fail and stop while the plant is online, the resulting shutdown sharply reduces production and causes heavy losses. Depending on the nature of the failure, it may also seriously affect the local environment or trigger an explosion, creating an extremely dangerous situation.

Therefore, in conventional multiplexed systems, although periodic maintenance and inspection of all systems at once is impractical, maintenance and inspection work is carried out periodically: a standby system is stopped, with care taken not to affect the active system, and the presence or absence of a failure is judged efficiently and in a short time by referring to past logging information.

(Problem to Be Solved by the Invention) However, within the maintenance and inspection work for a multiplexed system described above, inspecting the parts that can lead to serious accidents requires the ability to predict failures from knowledge of failures and from past performance data. The work therefore depends on the experience and skill of the maintenance personnel and places a considerable burden and responsibility on them. Moreover, the judgment varies from one maintenance person to another. If a point that should be inspected is overlooked and that point contains a failure, the system enters actual operation with the failure still present, which raises the outage rate of the entire system. When the entire system does go down, finding the cause takes a great deal of time and effort, and heavy losses result in many respects.

The present invention was made to solve the above problems. Its object is to provide a maintenance support device for a multiplexed system that automatically finds common fault conditions in the hardware, software, and other parts of the multiplexed system, allows the operating state of the system to be grasped reliably without special experience or skill, and allows appropriate recovery measures to be taken before a failure occurs.

Another object of the present invention is to ensure stable and smooth operation of the multiplexed system by monitoring the active system for failure symptoms and, when a failure symptom is present, switching to the standby system after confirming that the standby system is healthy.

[Structure of the Invention] (Means for Solving the Problem) To achieve the above object, the maintenance support device for a multiplexed system according to the present invention comprises performance information storage means for storing performance information consisting of either or both of the logging information and the diagnostic test information sent from each system, and common fault extraction means for comparing the performance information of the systems and extracting common fault conditions that may stop all systems. The fault contents extracted by the common fault extraction means are output.

Another invention comprises: failure symptom detection means for detecting a failure symptom from the logging information of the active system of the multiplexed system; health determination means for taking in diagnostic test information or logging information from the standby system, when the failure symptom detection means detects a failure symptom, and determining whether the standby system is healthy; and switching condition determination means for receiving a failure symptom signal from the failure symptom detection means and switching to the standby system on condition that the diagnostic test result or logging information obtained by the health determination means shows the standby system to be healthy.

(Function) With the above means, the performance information storage means collects performance information such as logging information and diagnostic test information from each system and then sends it to the common fault extraction means. The common fault extraction means determines, from the performance information of each system, whether a common fault condition exists in hardware or software. If one exists, the fault contents are displayed to notify maintenance personnel, so that prompt recovery measures suited to the fault contents can be taken before all systems fail.

In the other invention, the failure symptom detection means normally takes in logging information from the active system and checks it for failure symptoms. When a failure symptom is present, the health determination means takes in diagnostic result information or logging information from the standby system and checks the health of the standby system. When the active system shows a failure symptom and the standby system is judged healthy, the switching condition determination means switches to the standby system and operation continues.

(Embodiment) An embodiment of the present invention is described below with reference to FIG. 1. In the figure, reference numeral 10 denotes a parallel duplexed computer system as one configuration example of a multiplexed system. In this system 10, each system is provided with a central processing unit (CPU) 11 that executes arithmetic processing based on a predetermined sequence program, a bulk memory 12, and an I/O interface 14 that, for example, sends control signals to a process system 13 and takes in necessary signals from the process system 13. In addition, a common memory 15 managed individually by the central processing units 11, 11 and a coupling device 16 connecting the central processing units 11, 11 are provided. In each system, the central processing unit 11 acquires logging information during operation and also periodically executes diagnostic tests to acquire diagnostic test information. The present device therefore includes a maintenance support device 20 that collects performance information such as this logging information and diagnostic test information, determines whether a failure common to both systems has occurred, and outputs the result of that determination.

The maintenance support device 20 comprises: performance information storage means 21, which sequentially stores, in a predetermined area of memory, performance information such as the operational logging information (error information) and diagnostic test information sent successively from each central processing unit 11, together with the number and timing of occurrences; common fault extraction means 23, which reads the performance information from the performance information storage means 21 via a bus 22 and searches for common fault conditions according to a predetermined processing sequence; and a display unit 24, which displays the common fault contents extracted by the common fault extraction means 23 as guidance. Examples of the logging information and diagnostic test information include the items shown in Tables 1 and 2.

Table 1: Examples of logging information

Table 2: Examples of diagnostic test information

Next, the operation of the device configured as described above is explained. First, the performance information storage means 21 receives from the central processing units 11, 11, for example, the operational logging information shown in Table 1 and the diagnostic test information shown in Table 2, and stores this performance information sequentially together with the number and timing of occurrences.

After the performance information has been stored in the performance information storage means 21 in this way, or while it is still being stored, the common fault extraction means 23 searches for common fault contents according to a sequence such as that shown in FIG. 2. That is, after a start command, it determines whether logging information or diagnostic test information is to be read (steps S1, S2). When logging information is to be read, it reads the logging information from the performance information storage means 21 and determines whether a failure common to both systems has occurred (step S3). When a common fault condition has occurred, it then determines whether the failure is a hardware failure or a software failure (steps S4, S5).

For example, when failures occur at random timing and with high frequency, a hardware failure is determined; when, for example, an execution time error occurs at a specific timing in the program execution process, a software bug is determined. For a hardware failure the failure contents are determined in steps S6 and S7, and for a software failure in steps S8 and S9; maintenance guidance corresponding to the failure mode, stored in memory in advance, is then read out according to the failure contents and displayed on the display unit 24. In the case of a CPU failure, for example, the maintenance guidance "CPU failure has occurred" is read from memory and displayed, notifying maintenance personnel of the failure contents (a defective CPU).
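The logging branch of this sequence (steps S3 to S5) can be sketched as follows. This is only an illustration under stated assumptions: the `LogEntry` record, its `step` field, the error names, and the `min_count` threshold are hypothetical and do not appear in the specification, which gives the classification criteria only in words.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    system: str  # which system logged the error, e.g. "A" or "B"
    error: str   # hypothetical error name
    step: int    # hypothetical program step at which the error was logged


def common_errors(entries):
    """Step S3 (sketch): error kinds that every system has logged."""
    systems = {e.system for e in entries}
    seen = defaultdict(set)
    for e in entries:
        seen[e.error].add(e.system)
    return sorted(err for err, who in seen.items() if who == systems)


def classify(entries, error, min_count=3):
    """Steps S4/S5 (sketch): guess hardware vs. software for one error kind.

    Assumption: "specific timing" is modelled as every occurrence falling on
    the same program step; "random and frequent" as at least min_count
    occurrences spread over different steps.
    """
    steps = [e.step for e in entries if e.error == error]
    if len(steps) >= 2 and len(set(steps)) == 1:
        return "software"
    if len(steps) >= min_count:
        return "hardware"
    return "undetermined"


entries = [
    LogEntry("A", "exec_time_error", 42),
    LogEntry("B", "exec_time_error", 42),
    LogEntry("A", "parity_error", 3),
    LogEntry("B", "parity_error", 17),
    LogEntry("A", "parity_error", 88),
    LogEntry("A", "retry_error", 5),
]
for err in common_errors(entries):
    print(err, classify(entries, err))
```

Keeping the "common to all systems" check separate from the hardware/software classification mirrors the order of the flowchart steps and lets either criterion be tuned independently.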

When diagnostic test information is to be read, it is likewise determined whether a common fault condition holds (step S11); if it does, it is determined whether the failure is a hardware failure or a software failure (steps S12, S13). For example, when a failure occurs in a read/write test of the shared memory or of the bulk memory, it can be determined that there is a serious failure in hardware such as the shared memory 15 or the bulk memory 12 of both systems; when the failure occurs in a test run of an application program, it can be determined to be a software bug. For a hardware failure, guidance such as "Read failure occurred during shared memory read/write" is displayed in accordance with steps S14 and S15; for a software failure, similar processing is performed in accordance with steps S16 and S17 and the failure contents are displayed as guidance.
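The diagnostic-test branch (steps S11 to S13) can be sketched in the same spirit. The test names and the two sets mapping tests to a failure kind are assumptions made for illustration; the specification names only the shared memory and bulk memory read/write tests and the application program test run.

```python
# Hypothetical test names; the grouping follows the examples in the text.
HARDWARE_TESTS = {"shared_memory_rw", "bulk_memory_rw"}
SOFTWARE_TESTS = {"application_test_run"}


def diagnose_common(results):
    """Return (test, kind) for every test that failed in all systems.

    results maps a system name to {test_name: passed}.
    """
    per_system = list(results.values())
    if not per_system:
        return []
    shared_tests = set.intersection(*(set(r) for r in per_system))
    findings = []
    for test in sorted(shared_tests):
        if all(not r[test] for r in per_system):  # step S11: common failure
            if test in HARDWARE_TESTS:            # steps S12/S13
                kind = "hardware"
            elif test in SOFTWARE_TESTS:
                kind = "software"
            else:
                kind = "unknown"
            findings.append((test, kind))
    return findings


results = {
    "A": {"shared_memory_rw": False, "bulk_memory_rw": True,
          "application_test_run": False},
    "B": {"shared_memory_rw": False, "bulk_memory_rw": False,
          "application_test_run": False},
}
print(diagnose_common(results))
```

Here the bulk memory test is not reported because it failed in only one system, so it is not a common fault condition in the sense used above.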

Thus, according to the configuration of this embodiment, the performance information sent from both systems is stored and examined for common failure contents, so failures that occur in both systems can be detected quickly. In addition, whether the failure is a hardware failure or a software failure is determined from the failure state, and in either case the failure contents are identified and displayed as guidance. Maintenance personnel can therefore learn the failure contents accurately and give top priority to examining the location concerned.

Because the failure contents are identified, the failed part of one system can be repaired in a short time while the other system is in operation, which prevents the entire system from going down. Furthermore, the causes of serious failures are not overlooked, so the reliability of system maintenance can be improved.

The common fault extraction means 23 may also be provided with a knowledge base storage unit 23a and an inference unit 23b, as shown in FIG. 3, for example. The knowledge base storage unit 23a stores, in the form of rules, maintenance guidance corresponding to the failure contents, that is, the failure modes, while the inference unit 23b has the function of matching the performance information against the knowledge base. That is, depending on the failure contents, the inference unit 23b reads maintenance guidance from the knowledge base storage unit 23a and displays it on the display unit 24: for a shared memory failure, for example, "Switch to the standby-side shared memory and run a diagnostic test"; for a software bug, for example, "Debug the error in the write routine".
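A rule-based arrangement of this kind can be sketched as a list of condition/guidance pairs matched against a fault description. The dictionary keys (`kind`, `component`) and the rule layout are assumptions for illustration; only the two guidance messages come from the text.

```python
# Knowledge base (sketch): each rule is (conditions, guidance).
RULES = [
    ({"kind": "hardware", "component": "shared_memory"},
     "Switch to the standby-side shared memory and run a diagnostic test."),
    ({"kind": "software", "component": "write_routine"},
     "Debug the error in the write routine."),
]


def infer(fault, rules=RULES):
    """Inference (sketch): guidance of every rule whose conditions all match."""
    return [
        guidance
        for conditions, guidance in rules
        if all(fault.get(key) == value for key, value in conditions.items())
    ]


fault = {"kind": "hardware", "component": "shared_memory", "count": 4}
print(infer(fault))
```

Because the rules are data rather than code, guidance for a newly observed failure mode can be added by appending one entry to `RULES` without touching `infer`.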

With this configuration, maintenance guidance can be defined in finer detail, and guidance can easily be added later in light of subsequent failure experience.

Next, FIG. 4 is a block diagram showing another embodiment of the device of the present invention. In this configuration, the operating state of the active system is monitored at all times; when a failure symptom appears, a diagnostic test of the standby system is executed, and if no abnormality is found, operation is switched to the standby side and the required operation continues. Multiplexed systems come in various configurations, but for convenience of explanation an example with the same configuration as in FIG. 1, that is, a parallel duplexed computer system, is described. In FIG. 4, the suffix a on a reference numeral denotes the active system and the suffix b denotes the standby system.

The maintenance support device 20 comprises failure symptom detection means 31, switching condition determination means 32, diagnostic test means 33 serving as the health determination means, a display unit 34, and the like.

The failure symptom detection means 31 counts the number of occurrences of items in the logging information (error log information) sent from the central processing unit 11a of the active system, for example the program execution time errors, single-bit errors, and retry errors shown in Table 3, and determines that the active system shows a failure symptom when a count exceeds its threshold.
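The counting behaviour described here can be sketched as a small class. The class name, the error names, and the threshold values are hypothetical; the specification gives no concrete thresholds.

```python
from collections import Counter


class SymptomDetector:
    """Sketch of threshold-based failure symptom detection."""

    def __init__(self, thresholds):
        # thresholds: {error_name: number of occurrences tolerated}
        self.thresholds = dict(thresholds)
        self.counts = Counter()

    def record(self, error):
        """Count one logged error.

        Returns True once the count for a monitored error exceeds its
        threshold; errors without a threshold are ignored.
        """
        if error not in self.thresholds:
            return False
        self.counts[error] += 1
        return self.counts[error] > self.thresholds[error]


detector = SymptomDetector({"single_bit_error": 2, "retry_error": 5})
signals = [detector.record("single_bit_error") for _ in range(3)]
print(signals)
```

The symptom is reported only when the count goes above the threshold, matching the wording "exceeds" in the description; a "greater than or equal" comparison would be a different design choice.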

Table 3: Examples of logging information

When the switching condition determination means 32 receives a signal S1 indicating a failure symptom from the failure symptom detection means 31, it outputs a signal indicating that standby system diagnosis is required to the display unit 34.

The diagnostic test means 33, in response to an operation by maintenance personnel or, although not shown, to the signal from the switching condition determination means 32 indicating that standby system diagnosis is required, sends a diagnostic command signal S2 to the central processing unit 11b of the standby system. It then receives diagnostic test information S3 such as that shown in Table 4, judges from this diagnostic test information S3 whether the standby system is good or bad, and sends the resulting determination signal to the switching condition determination means 32.

Accordingly, the switching condition determination means 32 sends a standby-side switching permission signal to the display unit 34 when the active system shows a failure symptom and the standby system is in good condition. Although omitted from the drawing, the central processing units 11a and 11b are each provided with a signal line for failure symptom detection, a diagnostic command signal line, and a signal line for diagnostic test information.

Table 4: Examples of diagnostic test information

Thus, according to the configuration of this embodiment, when the number of occurrences of a specific error in the active system exceeds a predetermined threshold, the failure symptom detection means 31 determines that the active system shows a failure symptom. This is displayed on the display unit 34 via the switching condition determination means, and the diagnostic test means 33 issues a diagnostic command to the standby system and receives its diagnostic test information. If the diagnostic test information shows the standby system to be in good condition, operation is promptly switched to the standby side; if the standby system is abnormal, switching is not permitted because the entire system could go down. Accordingly, if the diagnosis shows the standby system to be healthy, operation is quickly switched to the standby system and the active system is serviced; if the diagnosis of the standby system fails, the standby system is serviced and repaired before switching. This improves the reliability of system switching and makes it possible to prevent a total system outage.
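The switching condition summarized above reduces to a decision over two inputs. The following sketch uses hypothetical names for the function and its three outcomes; it illustrates the stated condition and is not the patented circuit.

```python
def switching_decision(active_symptom: bool, standby_healthy: bool) -> str:
    """Sketch of the switching condition determination.

    active_symptom:  the active system shows a failure symptom
    standby_healthy: the standby diagnostic test result is good
    """
    if not active_symptom:
        return "continue"              # nothing to do, keep running
    if standby_healthy:
        return "switch_to_standby"     # then service the active system
    return "repair_standby_first"      # switching now could stop everything


for symptom in (False, True):
    for healthy in (False, True):
        print(symptom, healthy, switching_decision(symptom, healthy))
```

Refusing to switch when the standby system is unhealthy is the point of the design: a switch to a faulty standby system would turn a symptom on one side into an outage of both.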

Next, FIG. 5 is a block diagram showing another embodiment of the device shown in FIG. 4. In this embodiment, the health determination means comprises failure symptom detection means 33a, which is similar to that of FIG. 4 and is normally inactive, and a standby system operation confirmation unit 33b. When the standby system operation confirmation unit 33b receives a signal S4 indicating a failure symptom from the failure symptom detection means 31, it sends an operation control signal S5 to the failure symptom detection means 33a and a switching determination start signal S6 to the switching condition determination means 32. The failure symptom detection means 33a then determines whether there is a failure symptom in the standby system and sends its determination signal to the switching condition determination means 32. When the switching condition determination means 32 determines from that determination signal that there is no failure symptom, that is, that the standby system is healthy, it sends a lock release signal for the I/O interface 14b of the standby system. The I/O interface 14b of the standby system is normally locked. After the I/O interface 14b has been unlocked in this way, operation is switched to the standby system and actual operation begins.
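The interlock in this embodiment, where the standby I/O interface stays locked until the standby system has been checked, can be sketched as follows. The `Standby` class and its fields are hypothetical, and the standby symptom check is reduced to a precomputed flag for brevity, whereas the text activates the check only after signal S4 arrives.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    has_symptom: bool          # result of the standby symptom check (assumed)
    io_locked: bool = True     # the standby I/O interface is normally locked
    in_operation: bool = False


def try_switch(active_symptom: bool, standby: Standby) -> bool:
    """Unlock the standby I/O interface and switch only if it is safe."""
    if not active_symptom:
        return False           # no symptom on the active side: do nothing
    if standby.has_symptom:
        return False           # standby not healthy: keep its I/O locked
    standby.io_locked = False  # lock release signal
    standby.in_operation = True
    return True


good = Standby(has_symptom=False)
bad = Standby(has_symptom=True)
print(try_switch(True, good), good.io_locked)
print(try_switch(True, bad), bad.io_locked)
```

Keeping the lock as explicit state makes the safety property easy to check: the standby interface can only become unlocked on the path where both conditions have been verified.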

With this configuration, the reliability of switching can be further improved because the standby system is actually checked for failure symptoms.

[Effects of the Invention] As described in detail above, the present invention provides the following effects.

According to claim 1, the presence or absence of a common failure is determined from the performance information of each system; when a common failure exists, it is determined whether it is a hardware failure or a software failure, and maintenance guidance corresponding to the failure contents is output. Common fault conditions in the hardware, software, and other parts of the multiplexed system can therefore be found automatically, the operating state of the system can be grasped reliably without special experience or skill, and appropriate recovery measures can be taken before a failure occurs.

According to claim 2, the active system is monitored for failure symptoms, and when a failure symptom is present, operation is switched to the standby side after the health of the standby system has been confirmed. The reliability of switching is therefore increased, and stable and smooth operation of the multiplexed system can be ensured.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing one embodiment of the device of the present invention, FIG. 2 is a flowchart explaining the operation of FIG. 1, and FIGS. 3 to 5 are block diagrams each showing another embodiment.

10: parallel duplexed computer system; 11, 11a, 11b: central processing unit; 12: bulk memory; 15: shared memory; 20: maintenance support device; 21: performance information storage means; 23: common fault extraction means; 23a: knowledge base storage unit; 23b: inference unit; 24, 34: display unit; 31: failure symptom detection means; 32: switching condition determination means; 33: diagnostic test means; 33a: failure symptom detection means; 33b: standby system operation confirmation unit.

Applicant's agent: Patent attorney Takehiko Suzue

Claims

[Claims]

(1) A maintenance support device for a multiplexed system, which monitors the operating state of the multiplexed system, comprising: performance information storage means for storing performance information consisting of either or both of logging information and diagnostic test information sent from each of the systems; and common fault extraction means for comparing the performance information of the systems and extracting a common fault condition that may lead to failure and stoppage of all the systems, wherein the fault contents extracted by the common fault extraction means are output.

(2) A maintenance support device for a multiplexed system, which monitors the operating state of the multiplexed system, comprising: failure symptom detection means for detecting a failure symptom from logging information of the active system of the multiplexed system; health determination means for taking in diagnostic test information or logging information from the standby system, when the failure symptom detection means detects a failure symptom, and determining whether the standby system is healthy; and switching condition determination means for receiving a failure symptom signal from the failure symptom detection means and switching to the standby system on condition that the diagnostic test result or logging information obtained by the health determination means shows the standby system to be healthy.