JPH1049219A - Fault occurrence evading device

Info

Publication number: JPH1049219A
Application number: JP20473096A
Authority: JP
Inventors: Yasutomo Akiyama; 康智秋山
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1996-08-02
Filing date: 1996-08-02
Publication date: 1998-02-20

Abstract

PROBLEM TO BE SOLVED: To estimate a fault that may occur and automatically take measures against it before it occurs, by providing fault avoiding means that performs operation processing based on an avoidance method stored in a fault information database in association with the fault judged by fault judging means. SOLUTION: Observation data from the respective clients, input to a fault management server 8-21, is output to an operation status database 1-5 on a database server 12-25 in accordance with a DB server specification file held by the fault management server 8-21. The fault management server 8-21 judges the risk that a fault is about to occur on the basis of the operation status database 1-5 and a fault information database 1-6 held by the database server 12-25. When it judges that there is such a risk, it instructs the client to carry out the avoidance method for that fault recorded in the fault information database 1-6.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a fault occurrence avoiding device that detects an impending fault in advance, automatically takes measures against the fault that may occur, and thereby avoids its occurrence.

[0002]

2. Description of the Related Art

One conventional fault occurrence avoiding device is disclosed in Japanese Patent Application Laid-Open No. 4-161823. It avoids faults arising from mechanical causes in hardware such as equipment and products, and is shown in FIG. 15.

[0003] In FIG. 15, reference numeral 15-1 denotes equipment, such as production equipment. Reference numeral 15-2 denotes a monitoring unit, which is connected to the equipment 15-1 and observes data obtained from the equipment 15-1, such as the temperature inside the equipment 15-1 and the dimensions of the products it produces. Reference numeral 15-4 denotes a database, which is connected to the monitoring unit 15-2 and accumulates, as one record, a pattern code based on the data observed by the monitoring unit 15-2 when a trouble occurred in the past, the cause of the trouble found at that time, and the frequency of occurrence of that trouble. Reference numeral 15-3 denotes a diagnosis unit, which is connected to the monitoring unit 15-2 and the database 15-4 and searches the database 15-4 using the pattern code as a search key.

[0004] Next, the operation of the conventional example shown in FIG. 15 will be described. The equipment 15-1 produces, for example, products. The monitoring unit 15-2 observes the temperature inside the equipment 15-1, the dimensions of the products produced by the equipment 15-1, and the like. Meanwhile, for each trouble that occurred in the past, the database 15-4 stores as one record the pattern code based on the data observed by the monitoring unit 15-2 when the trouble occurred, the cause of the trouble, and the frequency of occurrence of the trouble.

[0005] The diagnosis unit 15-3 then searches the database 15-4 on the basis of the pattern code obtained from newly observed data. If a matching pattern is found in the database 15-4, the administrator is notified, before the corresponding trouble occurs, that the trouble is likely to occur. On receiving this notification, the administrator performs processing to avoid the trouble.
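
The lookup performed by the diagnosis unit 15-3 can be sketched as follows. This is only an illustration: the way a pattern code is derived from the observations (`pattern_code`, with made-up thresholds) and the record fields are assumptions, not details given in the cited publication.

```python
from dataclasses import dataclass

@dataclass
class TroubleRecord:
    cause: str       # cause found when the trouble occurred in the past
    frequency: int   # how often the trouble has occurred

def pattern_code(temperature_c: float, dimension_mm: float) -> str:
    """Hypothetical encoding: classify each observation as H(igh), L(ow) or N(ormal)."""
    t = "H" if temperature_c > 80.0 else "N"
    d = "H" if dimension_mm > 10.2 else "L" if dimension_mm < 9.8 else "N"
    return t + d

def diagnose(database: dict[str, TroubleRecord], temperature_c: float, dimension_mm: float):
    """Return the matching past trouble record, or None when no pattern matches."""
    return database.get(pattern_code(temperature_c, dimension_mm))

# One past trouble, keyed by its pattern code (illustrative data only).
database = {"HH": TroubleRecord(cause="cooling failure", frequency=3)}
hit = diagnose(database, temperature_c=85.0, dimension_mm=10.4)
miss = diagnose(database, temperature_c=60.0, dimension_mm=10.0)
```

A hit only triggers a notification in this prior art; the avoidance itself is left to the administrator.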

[0006] Another conventional example is disclosed in Japanese Patent Application Laid-Open No. 6-17-3886. It observes only data on the CPU load status to monitor whether a fault has occurred, and is shown in FIG. 16. In FIG. 16, reference numerals 16-1 to 16-5 denote normal-business LAN adapters, and 16-6 to 16-10 denote backup LAN adapters. Reference numerals 16-11 to 16-14 denote active CPUs, which have the normal-business LAN adapters 16-1 to 16-4 and the backup LAN adapters 16-6 to 16-9.

[0007] For example, the active CPU 16-11 has the normal-business LAN adapter 16-1 and the backup LAN adapter 16-6, and the active CPU 16-12 has the normal-business LAN adapter 16-2 and the backup LAN adapter 16-7. Reference numeral 16-15 denotes a backup CPU, which has the normal-business LAN adapter 16-5 and the backup LAN adapter 16-10.

[0008] Reference numeral 16-16 denotes a normal-business LAN, which interconnects the normal-business LAN adapters 16-1 to 16-5 of the active CPUs 16-11 to 16-14 and the backup CPU 16-15. Reference numeral 16-17 denotes a backup LAN, which interconnects the backup LAN adapters 16-6 to 16-10 of the active CPUs 16-11 to 16-14 and the backup CPU 16-15.

[0009] Next, the operation of the conventional example shown in FIG. 16 will be described. The backup CPU 16-15 transmits fault monitoring data to each of the active CPUs 16-11 to 16-14 at fixed time intervals. The backup CPU 16-15 then judges whether a fault has occurred according to whether each of the active CPUs 16-11 to 16-14 returns, within a predetermined time, CPU load status information indicating its CPU load status as the response to the fault monitoring data. That is, if there is no response within the predetermined time, the backup CPU 16-15 judges that the active CPU in question has failed. The administrator is notified on the basis of this judgment and performs processing to repair the failure.
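
The timeout judgment made by the backup CPU 16-15 can be sketched as below. The function name, the representation of replies as arrival times, and the one-second timeout are assumptions made for illustration; the cited publication does not specify them.

```python
def judge_failed_cpus(sent_at: float, replies: dict[str, float],
                      cpus: list[str], timeout: float) -> list[str]:
    """Return the active CPUs that did not answer the monitoring data in time.

    replies maps a CPU name to the time its CPU load status information arrived;
    a CPU with no entry never answered.
    """
    failed = []
    for cpu in cpus:
        replied_at = replies.get(cpu)
        if replied_at is None or replied_at - sent_at > timeout:
            failed.append(cpu)
    return failed

# Monitoring data sent at t = 0.0; 16-13 answers late and 16-14 never answers.
failed = judge_failed_cpus(
    sent_at=0.0,
    replies={"16-11": 0.2, "16-12": 0.4, "16-13": 3.5},
    cpus=["16-11", "16-12", "16-13", "16-14"],
    timeout=1.0,
)
```

Note that this check only detects a failure after it has happened, which is the limitation the invention addresses.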

[0010] Yet another conventional example is disclosed in Japanese Patent Application Laid-Open No. 4-310160. It estimates how far the damage caused by a fault that has occurred will spread and attempts to limit that damage, and is shown in FIG. 17. In FIG. 17, reference numeral 17-1 denotes an input unit, which is connected to a plurality of devices (not shown) via a network and receives fault occurrence information output from a device in which a fault has occurred. Reference numeral 17-2 denotes a network configuration database, which records information on the name and model of each of the devices constituting the network and connection information indicating how the devices are connected.

[0011] Reference numeral 17-3 denotes a fault database, which records fault occurrence information, including the content and name of each fault, input from the devices constituting the network, together with information on the range over which each fault affects the devices. Reference numeral 17-4 denotes an inference unit, which is connected to the network configuration database 17-2, the fault database 17-3, and the input unit 17-1. On the basis of the fault occurrence information input to the input unit 17-1, it searches the network configuration database 17-2 and the fault database 17-3 and predicts the range affected by the fault that has occurred.

[0012] Reference numeral 17-5 denotes an output unit, which is connected to the inference unit 17-4 and outputs information on the range affected by the fault on the basis of the inference result of the inference unit 17-4. Reference numeral 17-6 denotes an editor, which is connected to the network configuration database 17-2 and the fault database 17-3 and is used to modify the information recorded in them.

[0013] Next, the operation of the conventional example shown in FIG. 17 will be described. The fault occurrence information output from the device in which the fault occurred is input to the input unit 17-1. On the basis of this fault occurrence information, the inference unit 17-4 searches the network configuration database 17-2 and the fault database 17-3 and predicts the range affected by the fault.
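
One way to picture the inference unit 17-4 is as a bounded traversal of the connection information. The sketch below is an assumption made for illustration: it supposes that the fault database stores, per fault name, how many hops the fault can reach (`fault_reach`), which the cited publication does not state.

```python
from collections import deque

def affected_range(connections: dict[str, list[str]], source: str, max_hops: int) -> set[str]:
    """Breadth-first search: devices reachable from source within max_hops links."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        device, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the fault's reach
        for neighbour in connections.get(device, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen

# Hypothetical network configuration (17-2) and fault database entry (17-3).
connections = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
fault_reach = {"link down": 2}
affected = affected_range(connections, "A", fault_reach["link down"])
```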

[0014] The information on the affected range inferred by the inference unit 17-4 is output to the output unit 17-5. On the basis of this information, the administrator performs processing to repair the fault that has occurred. To modify the information recorded in the network configuration database 17-2 and the fault database 17-3, new information is entered through the editor 17-6.

[0015]

PROBLEMS TO BE SOLVED BY THE INVENTION

The conventional fault occurrence avoiding device of FIG. 15, disclosed in Japanese Patent Application Laid-Open No. 4-161823, creates the patterns under which faults occur from mechanical data such as the internal temperature of the equipment and the dimensions of products, and detects a fault in advance by comparing the created patterns with observed data.

[0016] However, although this conventional device shows a configuration for avoiding faults that arise from mechanical causes in hardware such as equipment and products, it cannot handle faults that arise from functional causes in the software that operates the equipment.

[0017] In other words, this conventional device takes no measures against faults caused by software and cannot prevent such faults from occurring. Moreover, even when it detects a fault in advance, the measures against the fault are taken by a human and come only after the fault has occurred, so the device is difficult to use for systems that run continuously, such as industrial computers.

[0018] The other device of FIG. 16, disclosed in Japanese Patent Application Laid-Open No. 6-17-3886, observes only a single kind of data, namely information on the CPU load status, and judges faults from it. In practice, however, faults often involve several intertwined factors.

[0019] In other words, observing a single factor alone makes it difficult to identify the fault that is occurring and therefore difficult to take measures appropriate to that fault. In this conventional device, too, even when a fault is detected in advance, a human takes the measures against it, and those measures come only after the fault has occurred, so the device is difficult to use for systems that run continuously, such as industrial computers.

[0020] Further, the other device of FIG. 17, disclosed in Japanese Patent Application Laid-Open No. 4-310160, presents, after a fault has occurred, information on the range that the fault is expected to damage. This device is activated only after a fault has occurred, and it informs the administrator so that the fault does not spread.

[0021] In this device, too, the administrator takes the measures against the fault. To limit the spread of damage and deal with the fault quickly, the administrator must therefore have a very large amount of knowledge, which places a very heavy burden on the administrator. Furthermore, because the administrator's measures are taken after the fault has occurred, the device is difficult to use for systems that run continuously, such as industrial computers.

[0022] The present invention has been made in view of these problems. Its object is to obtain a fault occurrence avoiding device that observes a plurality of items so that faults caused by software can be avoided, estimates a fault that may occur, and automatically takes measures against that fault before it occurs. A further object is to obtain a fault occurrence avoiding device in which a single server observes a plurality of items so that faults in a plurality of clients connected to it via a LAN can be avoided, estimates a fault that may occur, and automatically takes measures against that fault before it occurs.

[0023]

MEANS FOR SOLVING THE PROBLEMS

A fault occurrence avoiding device according to the present invention comprises: operation status observing means for observing the operation status of a plurality of elements; an operation status database, connected to the operation status observing means, in which the observation data observed by the operation status observing means is recorded; a fault information database in which a fault, the trend in the operation status of the plurality of elements that gives rise to the fault, and an avoidance method for avoiding the occurrence of the fault are recorded in association with one another; fault judging means, connected to the operation status database and the fault information database, for comparing the observation data recorded in the operation status database with the operation status trend recorded in the fault information database and judging the fault related to the operation status; and fault avoiding means, connected to the fault judging means and the fault information database, for performing operation processing based on the avoidance method recorded in the fault information database in association with the fault judged by the fault judging means.
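
The two databases named above can be pictured with a small data model. This is a sketch under assumptions: the class and field names, and the choice to store a trend as one time series per observed element, are illustrative and are not prescribed by the specification.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    """One entry of the operation status database."""
    timestamp: float
    values: dict[str, float]          # element name -> observed value

@dataclass
class FaultPattern:
    """One entry of the fault information database."""
    fault: str                        # the fault itself
    trend: dict[str, list[float]]     # operation status trend leading to the fault
    avoidance: str                    # avoidance method associated with the fault

@dataclass
class FaultAvoidingDevice:
    operation_db: list[Observation] = field(default_factory=list)
    fault_db: list[FaultPattern] = field(default_factory=list)

    def record(self, observation: Observation) -> None:
        """Store observation data produced by the observing means."""
        self.operation_db.append(observation)

device = FaultAvoidingDevice()
device.record(Observation(timestamp=0.0, values={"cpu_load": 0.5, "memory_load": 0.25}))
device.fault_db.append(
    FaultPattern(fault="memory exhaustion",
                 trend={"memory_load": [0.5, 0.75, 1.0]},
                 avoidance="restart_process")
)
```

Keeping the fault, its trend, and its avoidance method in one record mirrors the requirement that the three be recorded in association with one another.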

[0024] A fault occurrence avoiding device according to the present invention also comprises a first computer and a second computer. The first computer has operation status observing means for observing the operation status of a plurality of elements and outputting the resulting observation data, and fault avoiding means for performing fault avoidance processing on the basis of an input operation processing instruction. The second computer has: an operation status database in which the observation data output from the operation status observing means of the first computer is input and recorded; a fault information database in which a fault, the trend in the operation status of the plurality of elements that gives rise to the fault, and an avoidance method for avoiding the occurrence of the fault are recorded in association with one another; and fault judging means, connected to the operation status database and the fault information database, for comparing the observation data recorded in the operation status database with the operation status trend recorded in the fault information database, judging the fault related to the operation status, and outputting to the fault avoiding means of the first computer an operation processing instruction that orders operation processing based on the avoidance method recorded in the fault information database in association with the judged fault.

[0025] Further, in the fault occurrence avoiding device according to the present invention, the plurality of elements observed by the operation status observing means are the plurality of elements set in a configuration file.

[0026] The fault occurrence avoiding device according to the present invention further comprises: operation abnormality search means, connected to the operation status database, for extracting from the observation data recorded in the operation status database the observation data that shows a predetermined operation status trend; and fault information creating means, connected to the operation abnormality search means and the fault information database, for recording in the fault information database, in association with one another, the observation data extracted by the operation abnormality search means, the fault that shows the predetermined operation status trend, and the avoidance method for avoiding the fault.

[0027] Further, in the fault occurrence avoiding device according to the present invention, the observation data extracted by the operation abnormality search means is the plurality of elements set in a search specification file.

[0028]

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiment 1. FIG. 1 shows an embodiment of the fault occurrence avoiding device according to the present invention. In FIG. 1, reference numeral 1-1 denotes an operation status monitoring unit, which serves as the operation status observing means and observes the operation status. Reference numeral 1-5 denotes an operation status database, which is connected to the operation status monitoring unit 1-1 and records observation data comprising a plurality of elements based on the results observed by the operation status monitoring unit 1-1.

[0029] Reference numeral 1-6 denotes a fault information database, in which fault patterns are recorded. A fault pattern is the trend in operation status associated with a fault that occurred in the past. Each fault pattern recorded in the fault information database 1-6 holds the observation data, for a plurality of observation items, over a predetermined period leading up to the occurrence of a fault.

[0030] Reference numeral 1-4 denotes a fault processing management unit, which serves as the fault judging means and the fault avoiding means. It is connected to the operation status database 1-5 and the fault information database 1-6 and judges whether the observation data recorded in the operation status database 1-5 fits a fault pattern recorded in the fault information database 1-6, that is, whether a fault is developing.

[0031] Reference numeral 1-2 denotes an operation abnormality search unit, which is connected to the operation status database 1-5. When the fault processing management unit 1-4 finds the same trend in the observation data as at the time of a fault and judges that a fault may occur, the operation abnormality search unit 1-2 retrieves predetermined observation data recorded in the operation status database 1-5.

[0032] Reference numeral 1-3 denotes a fault information creation unit, which serves as the operation abnormality detecting means and the fault information creating means. It is connected to the operation abnormality search unit 1-2, receives the observation data retrieved by the operation abnormality search unit 1-2, and creates a new fault pattern. Reference numeral 1-7 denotes the fault occurrence avoiding device, which consists of the operation status monitoring unit 1-1, the operation status database 1-5, the fault information database 1-6, the fault processing management unit 1-4, the operation abnormality search unit 1-2, and the fault information creation unit 1-3.

[0033] Next, the operation of the fault occurrence avoiding device of Embodiment 1 shown in FIG. 1 will be described with reference to FIG. 2. In FIG. 2, the fault occurrence avoiding device is started in step 1 (steps are hereinafter abbreviated as S). When S1 ends, the process proceeds to S2.

[0034] In S2, the operation status monitoring unit 1-1 checks the configuration file in which the items it is to observe are set. On the basis of the checked configuration file, the operation status monitoring unit 1-1 periodically observes the CPU load, the memory load, the network load, the running processes, and the like, and records the resulting observation data in the operation status database 1-5.
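
A minimal sketch of one observation cycle in S2 follows. The configuration is modeled as a mapping from item name to an on/off flag, and the `samplers` return fixed values; both are assumptions standing in for the real configuration file and the real measurements.

```python
from typing import Callable

def observe_once(config: dict[str, bool],
                 samplers: dict[str, Callable[[], float]],
                 now: float) -> dict:
    """Sample every item enabled in the configuration and build one observation record."""
    values = {
        item: samplers[item]()
        for item, enabled in config.items()
        if enabled and item in samplers
    }
    return {"timestamp": now, "values": values}

# Hypothetical configuration: the network load is switched off.
config = {"cpu_load": True, "memory_load": True, "network_load": False}
samplers = {
    "cpu_load": lambda: 0.5,
    "memory_load": lambda: 0.25,
    "network_load": lambda: 0.75,
}

operation_status_db = []                      # stands in for database 1-5
operation_status_db.append(observe_once(config, samplers, now=0.0))
```

In a running device this call would be repeated at the interval given in the configuration file.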

[0035] In S3, suppose that a fault occurs. As long as no fault occurs, S2 is repeated. When S3 ends, the process proceeds to S4. In S4, the operation abnormality search unit 1-2 reads, from the observation data recorded in the operation status database 1-5, the observation data for a predetermined period up to the time the fault occurred, and outputs it to the fault information creation unit 1-3.

[0036] The fault information creation unit 1-3 receives the observation data output from the operation abnormality search unit 1-2. It then extracts predetermined observation data from the input observation data on the basis of a search specification file, in which predetermined conditions for identifying the cause of the fault are set.
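
The read-out in S4 and the subsequent extraction can be sketched as a time-window filter followed by an item filter. The record layout and the idea that the search specification file reduces to a list of item names (`items`) are assumptions made for illustration.

```python
def extract_window(observations: list[dict], fault_time: float,
                   period: float, items: list[str]) -> list[dict]:
    """Keep observations from the period before the fault, restricted to the given items."""
    selected = []
    for obs in observations:
        if fault_time - period <= obs["timestamp"] <= fault_time:
            selected.append({
                "timestamp": obs["timestamp"],
                "values": {k: obs["values"][k] for k in items if k in obs["values"]},
            })
    return selected

observations = [
    {"timestamp": 0.0, "values": {"cpu_load": 0.25, "memory_load": 0.5}},
    {"timestamp": 5.0, "values": {"cpu_load": 0.5, "memory_load": 0.75}},
    {"timestamp": 10.0, "values": {"cpu_load": 0.75, "memory_load": 1.0}},
]
# Fault at t = 10; look back 6 time units and keep only the memory load.
extracted = extract_window(observations, fault_time=10.0, period=6.0, items=["memory_load"])
```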

[0037] The fault information creation unit 1-3 also associates the extracted observation data with the fault that occurred and records the resulting information in the fault information database 1-6 as one fault pattern. When S4 ends, the process proceeds to S5. In S5, the fault information creation unit 1-3 further associates information on the avoidance method for the fault with the fault pattern recorded in the fault information database 1-6. When S5 ends, the process proceeds to S6.

[0038] In S6, the operation status monitoring unit 1-1 checks the configuration file as in S2. On the basis of the checked configuration file, it periodically observes the CPU load, the memory load, the network load, the running processes, and the like, and records the resulting observation data in the operation status database 1-5. When S6 ends, the process proceeds to S7.

[0039] In S7, the fault processing management unit 1-4 checks whether the observation data newly recorded in the operation status database 1-5 shows the same trend as the observation data at the time a fault indicated by a fault pattern recorded in the fault information database 1-6 occurs. If the fault processing management unit 1-4 does not judge that the newly recorded observation data shows the trend seen when a fault occurs, the process proceeds to S8.

[0040] If, on the other hand, the fault processing management unit 1-4 judges that the observation data newly recorded in the operation status database 1-5 shows the trend seen when a fault occurs, the process proceeds to S10. In S8, the fault processing management unit 1-4 judges whether a fault is likely to occur under a new fault pattern, that is, one that differs from the existing fault patterns but shows a similar operation status trend. If no fault is occurring under a new fault pattern, the process returns to S7. If a fault is likely to occur under a new fault pattern, the process proceeds to S9.
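
The branching in S7 and S8 can be sketched as a three-way classification. The specification does not say how "the same trend" or "similar" is measured; the mean absolute difference and the two tolerances below are assumptions chosen only to make the branches concrete.

```python
import math

def trend_distance(recent: dict[str, list[float]], pattern: dict[str, list[float]]) -> float:
    """Mean absolute difference between recent observations and a recorded trend."""
    diffs = []
    for item, expected in pattern.items():
        observed = recent.get(item, [])
        n = min(len(observed), len(expected))
        if n == 0:
            return math.inf  # an item of the pattern was not observed: no match
        diffs.extend(abs(o - e) for o, e in zip(observed[-n:], expected[-n:]))
    return sum(diffs) / len(diffs) if diffs else math.inf

def classify(recent, patterns, match_tol=0.05, similar_tol=0.2):
    """Return ("known", name) -> S10, ("similar", name) -> S9, or ("none", None) -> back to S7."""
    best_name, best = None, math.inf
    for name, pattern in patterns.items():
        d = trend_distance(recent, pattern)
        if d < best:
            best_name, best = name, d
    if best <= match_tol:
        return ("known", best_name)
    if best <= similar_tol:
        return ("similar", best_name)
    return ("none", None)

patterns = {"memory exhaustion": {"memory_load": [0.5, 0.7, 0.9]}}
known = classify({"memory_load": [0.5, 0.7, 0.9]}, patterns)
similar = classify({"memory_load": [0.6, 0.8, 1.0]}, patterns)
normal = classify({"memory_load": [0.1, 0.1, 0.1]}, patterns)
```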

[0041] In S9, suppose that a fault with a new fault pattern occurs. In that case, the process returns to S4. The observation data for a predetermined period up to the time the new fault tendency was observed is read out of the observation data recorded in the operation status database 1-5, and predetermined observation data is extracted on the basis of the search specification file in which the predetermined conditions are set. The fault showing the similar operation status trend and the avoidance method for that fault are associated with the extracted data, and the result is recorded in the fault information database 1-6 as a new fault pattern.

[0042] In S10, when the fault processing management unit 1-4 has detected, in the latest observation data recorded in the operation status database 1-5, a trend similar to the trend of past observation data indicated by a fault pattern recorded in the fault information database 1-6, it executes the avoidance method recorded in the fault information database 1-6 in association with that fault pattern.
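
Executing the recorded avoidance method in S10 amounts to a lookup followed by a dispatch. In this sketch the avoidance method is stored as a name and mapped to a callable; the action `restart_process` and the table layout are hypothetical and only illustrate the association between a fault pattern and its avoidance method.

```python
from typing import Callable

def avoid_fault(fault_db: dict[str, dict], fault: str,
                actions: dict[str, Callable[[], str]]) -> str:
    """Look up the avoidance method recorded for the fault and execute it."""
    method = fault_db[fault]["avoidance"]
    return actions[method]()

log = []

def restart_process() -> str:
    """Hypothetical avoidance action; a real one would act on the observed system."""
    log.append("restart")
    return "process restarted"

fault_db = {"memory exhaustion": {"avoidance": "restart_process"}}
result = avoid_fault(fault_db, "memory exhaustion", {"restart_process": restart_process})
```

Because the action runs as soon as the trend is detected, no administrator has to intervene before the fault occurs.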

[0043] The avoidance method recorded in association with the fault pattern is determined on the basis of the combination of the states of the individual items that make up the observation data. When S10 ends, the process proceeds to S11. In S11, the operation abnormality search unit 1-2 extracts the observation data for a predetermined period recorded in the operation status database 1-5 and outputs it to the fault information creation unit 1-3. The fault information creation unit 1-3, on receiving the observation data extracted from the operation status database 1-5, performs the same processing as in S4 and creates a new fault pattern from the input observation data.

【００４４】Ｓ１１が終了すると、Ｓ１２へ進む。Ｓ１
２では、障害情報データベース１−６に記録されている
既存の障害パターンと、Ｓ１１で新たに作成された同様
の障害に対する障害パターンとが比較される。そして、
これら２つの障害パターンに共通した特徴的な部分が抽
出され、より特徴的な新しい障害パターンが作成され、
障害情報データベース１−６に記録されている既存の障
害パターンが更新される。Ｓ１２が終了すると、Ｓ５へ
戻る。When S11 ends, the process proceeds to S12. In S12, the existing failure pattern recorded in the failure information database 1-6 is compared with the failure pattern newly created in S11 for the same failure. The characteristic part common to the two failure patterns is extracted, a new and more characteristic failure pattern is created, and the existing failure pattern recorded in the failure information database 1-6 is updated. When S12 ends, the process returns to S5.
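
One possible reading of the S12 update is sketched below. The text does not define how the "common characteristic part" is computed, so this sketch assumes that a pattern is a set of threshold conditions, keeps only the conditions present in both patterns with the same relation, and relaxes each threshold so that both patterns satisfy it. The function name `merge_patterns` and the sample values are hypothetical.

```python
def merge_patterns(existing, new):
    """Keep only the conditions shared by both patterns, loosened to cover both."""
    merged = {}
    for item, (relation, threshold) in existing.items():
        if item not in new or new[item][0] != relation:
            continue  # this condition is not common to the two patterns
        other = new[item][1]
        if relation == ">=":
            merged[item] = (relation, min(threshold, other))  # lower bound covering both
        elif relation == "<=":
            merged[item] = (relation, max(threshold, other))  # upper bound covering both
        elif threshold == other:
            merged[item] = (relation, threshold)              # equality must agree exactly
    return merged

existing = {"MEMORY": (">=", 90), "CPU": (">=", 70)}
new = {"MEMORY": (">=", 85), "COLLISION": (">=", 40)}
updated = merge_patterns(existing, new)
```

With these inputs only the memory condition survives, which matches the idea that repeated occurrences narrow a pattern down to what the occurrences share.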

【００４５】なお、Ｓ１で障害発生回避装置が起動した
時に、障害情報データベース１−６が障害に関する障害
パターンを既に備えていた場合には、Ｓ２〜Ｓ５の処理
を省略してもよい。If the failure information database 1-6 already has a failure pattern relating to the failure when the failure occurrence avoiding device is activated in S1, the processing of S2 to S5 may be omitted.

【００４６】次に、図２のＳ２で用いられるコンフィグ
レーションファイルについて、図３を用いて説明する。
コンフィグレーションファイルには、障害発生回避装置
の動作状況監視部１−１が観測する観測データ、及びそ
の観測データを観測するタイミング等の条件が設定され
る。そして、このコンフィグレーションファイルに設定
される条件は、３種類の書式で表される。Next, the configuration file used in S2 of FIG. 2 will be described with reference to FIG.
In the configuration file, observation data observed by the operation status monitoring unit 1-1 of the failure occurrence avoiding apparatus, and conditions such as timing of observing the observation data are set. The conditions set in this configuration file are expressed in three types of formats.

【００４７】その３種類の書式には、文字列と英字とで
設定される「文字列：英字（ｙ／ｎ）」（３−２）、文
字列だけで設定される「文字列」（３−１）、及び文字
列と数字と英字とで設定される「文字列（ＴＩＭＥ）：
数字：英字（Ｄ／Ｈ／Ｍ／Ｓ）」（３−３）がある。コ
ンフィグレーションファイルに設定される第１の書式で
ある「文字列：英字（ｙ／ｎ）」は、指定された文字列
に関する観測データを動作状況データベース１−５に記
録するか否かが設定されるものである。The three formats are "character string: letter (y/n)" (3-2), set with a character string and a letter; "character string" (3-1), set with a character string alone; and "character string (TIME): number: letter (D/H/M/S)" (3-3), set with a character string, a number, and a letter. The first format set in the configuration file, "character string: letter (y/n)", specifies whether the observation data for the named character string is recorded in the operation status database 1-5.

【００４８】コンフィグレーションファイルに設定され
る文字列としては、ＣＰＵ，ＭＥＭＯＲＹ，ＣＯＬＬＩ
ＳＩＯＮ，ＰＲＯＣＥＳＳの４種類がある。ＣＰＵは、
ＣＰＵの負荷状況に関する観測データを得るように命令
するコマンドである。ＭＥＭＯＲＹは、メモリの負荷状
況に関する観測データを得るように命令するコマンドで
ある。Four character strings can be set in the configuration file: CPU, MEMORY, COLLISION, and PROCESS. CPU is a command that instructs the device to obtain observation data on the CPU load. MEMORY is a command that instructs the device to obtain observation data on the memory load.

【００４９】ＣＯＬＬＩＳＩＯＮは、ネットワークの負
荷状況に関する観測データを得るように命令するコマン
ドである。ＰＲＯＣＥＳＳは、実行されている動作処理
である動作プロセスに関する観測データを得るように命
令するコマンドである。コロン（：）をはさんで、各文
字列の右側に示される英字には、各文字列の示すコマン
ドに基づいて得られた観測結果を、動作状況データベー
ス１−５に記録するか否かが指定される。COLLISION is a command that instructs the device to obtain observation data on the network load. PROCESS is a command that instructs the device to obtain observation data on the running processes, that is, the operation processing currently being executed. The letter to the right of each character string, after the colon (:), specifies whether the observation result obtained by that command is recorded in the operation status database 1-5.

【００５０】コロンをはさんで、各コマンドの右側に、
英字ｙが指定された場合には、各コマンドに基づき得ら
れた観測結果が、動作状況データベース１−５に記録さ
れる。また、コロンをはさんで、各コマンドの右側に、
英字ｎが指定された場合には、各コマンドに基づき得ら
れた観測結果は、動作状況データベース１−５に記録さ
れない。コンフィグレーションファイルに設定される第
２の書式である「文字列」は、設定された文字列自体が
コマンドとして、障害発生回避装置に認識される。When the letter y is specified to the right of a command, after the colon, the observation result obtained by that command is recorded in the operation status database 1-5. When the letter n is specified there, the observation result is not recorded in the operation status database 1-5. In the second format set in the configuration file, "character string", the failure occurrence avoidance device recognizes the character string itself as a command.

【００５１】障害発生回避装置は、そのコマンドに基づ
き、観測データを得る。そして、障害発生回避装置は、
得られた観測データを動作状況データベース１−５に記
録する。この第２の書式で設定されるコマンドは、障害
発生回避装置のユーザ自身が作成することができる。障
害発生回避装置のユーザが、障害を発生させる原因とな
る可能性が高いと思う要素に対して、詳細な観測データ
を必要とする場合、この文字列によるコマンドを作成す
ることにより、所定のコマンドに基づく観測データが得
られる。The failure occurrence avoidance device obtains observation data based on that command and records the obtained observation data in the operation status database 1-5. A command set in this second format can be created by the user of the failure occurrence avoidance device. When the user needs detailed observation data on an element considered likely to cause a failure, creating a command as such a character string yields observation data based on that command.

【００５２】コンフィグレーションファイルに設定され
る第３の書式である「文字列（ＴＩＭＥ）：数字：英字
（Ｄ／Ｈ／Ｍ／Ｓ）」には、第１及び第２の書式で設定
された観測項目に関して障害発生回避装置の動作状況監
視部１−１から得られた観測データを動作状況データベ
ース１−５に記録する時間間隔が設定される。The third format set in the configuration file, "character string (TIME): number: letter (D/H/M/S)", sets the time interval at which the observation data obtained by the operation status monitoring unit 1-1 of the failure occurrence avoidance device, for the observation items set in the first and second formats, is recorded in the operation status database 1-5.

【００５３】第３の書式に設定されるコマンドである命
令は、３つのブロックから構成されている。第３の書式
に設定される第一のブロックである「文字列」は、所定
の周期で所定の観測項目を観測することを示す”ＴＩＭ
Ｅ”が設定される。第３の書式に設定される第二のブロ
ックである「数字」には、動作状況監視部１−１によっ
て得られた観測データを、所定の期間内に何回、動作状
況データベース１−５に記録するかが設定される。第３
の書式に設定される第三のブロックである「英字」に
は、観測データを観測する時間の単位が設定される。The instruction set in the third format consists of three blocks. The first block, "character string", is set to "TIME", which indicates that the predetermined observation items are observed at a predetermined period. The second block, "number", sets how many times within the predetermined period the observation data obtained by the operation status monitoring unit 1-1 is recorded in the operation status database 1-5. The third block, "letter", sets the unit of time for observing the observation data.

【００５４】なお、第３の書式に設定される観測時間の
単位には、「１日」を示すＤ、「１時間」を示すＨ、
「１分」を示すＭ、及び「１秒」を示すＳの４種類があ
る。例えば、「ＴＩＭＥ：１：Ｈ」がコンフィグレーシ
ョンファイルに設定された場合、「ＴＩＭＥ：１：Ｈ」
という命令は、障害発生回避装置が１時間に１回、第１
及び第２の書式に指定された観測項目に関する観測デー
タを、動作状況データベース１−５に記録するというこ
とを示す。The observation time unit set in the third format is one of four: D for "one day", H for "one hour", M for "one minute", and S for "one second". For example, when "TIME:1:H" is set in the configuration file, the instruction means that the failure occurrence avoidance device records the observation data for the observation items specified in the first and second formats in the operation status database 1-5 once per hour.
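
The three formats of the FIG. 3 configuration file can be illustrated with a small parser. This is a sketch under assumptions rather than the device's actual implementation: it assumes ASCII colons, one entry per line, and user-defined commands that contain no colon. The function name `parse_config` and the custom command `check_disk` are hypothetical.

```python
# Seconds per unit for the third block of the TIME format.
UNIT_SECONDS = {"D": 86400, "H": 3600, "M": 60, "S": 1}

def parse_config(text):
    """Parse configuration lines into record flags, custom commands, and an interval."""
    config = {"record": {}, "custom": [], "interval_seconds": None}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(":")
        if len(parts) == 1:
            # Format "character string": the string itself is a user-defined command.
            config["custom"].append(line)
        elif len(parts) == 2 and parts[1] in ("y", "n"):
            # Format "character string: letter (y/n)": record the result or not.
            config["record"][parts[0]] = parts[1] == "y"
        elif len(parts) == 3 and parts[0] == "TIME":
            # Format "TIME: number: letter": number of recordings per time unit.
            count, unit = int(parts[1]), parts[2]
            config["interval_seconds"] = UNIT_SECONDS[unit] / count
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return config

example = parse_config("CPU:y\nMEMORY:y\nCOLLISION:n\nPROCESS:y\ncheck_disk\nTIME:1:H")
```

Here "TIME:1:H" yields an interval of 3600 seconds, which corresponds to recording once per hour as in the example above.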

【００５５】次に、図３に示したコンフィグレーション
ファイルに、更に設定する条件を加えて、より詳細な観
測データが得られるように変更したものを、図４に示
し、以下に説明する。この図４に示すコンフィグレーシ
ョンファイルには、障害発生回避装置が観測して得られ
る観測データを、動作状況データベース１−５に保存す
る期間が設定できる。また、図４に示すコンフィグレー
ションファイルには、障害発生回避装置が観測して得ら
れる観測データが動作状況データベース１−５に記録さ
れる量について、時間的又は数量的に制御することがで
きる。Next, FIG. 4 shows a modification of the configuration file shown in FIG. 3 to which more set conditions are added so as to obtain more detailed observation data, which will be described below. In the configuration file shown in FIG. 4, a period for storing observation data obtained by observation by the failure occurrence avoiding device in the operation status database 1-5 can be set. In the configuration file shown in FIG. 4, the amount of observation data obtained by observation by the failure occurrence avoidance device recorded in the operation status database 1-5 can be controlled temporally or quantitatively.

【００５６】図４におけるコンフィグレーションファイ
ルには、設定される書式が３種類ある。この３種類の書
式で、条件であるコマンドが設定される。第１の書式
は、「数字列Ａ：数字列Ｂ：文字列：英字（ｙ／ｎ）」
（４−１）であり、数字列Ａと数字列Ｂとで示された期
間において、文字列に基づくコマンドが執行されるか否
かが設定されるものである。The configuration file in FIG. 4 has three formats, and the commands that serve as conditions are set in these formats. The first format is "numeric string A: numeric string B: character string: letter (y/n)" (4-1); it sets whether the command based on the character string is executed during the period indicated by numeric string A and numeric string B.

【００５７】第２の書式は、「数字列Ａ：数字列Ｂ：文
字列」（４−３）であり、数字列Ａと数字列Ｂとで示さ
れた期間において、文字列に基づくコマンドが執行され
るものである。第３の書式は、図３の第３の書式と同様
の「文字列：数字：文字（Ｄ／Ｈ／Ｍ／Ｓ）」（４−
２）である。The second format is "numeric string A: numeric string B: character string" (4-3); the command based on the character string is executed during the period indicated by numeric string A and numeric string B. The third format is "character string: number: letter (D/H/M/S)" (4-2), the same as the third format in FIG. 3.

【００５８】第１の書式「数字列Ａ：数字列Ｂ：文字
列：英字」（４−１）は、図３の書式（３−２）に示さ
れたコマンドに、数字列Ａである観測した観測データの
記録を開始する時間、及び数字列Ｂである観測した観測
データの記録を終了する時間についての設定を追加した
ものである。The first format, "numeric string A: numeric string B: character string: letter" (4-1), adds two settings to the command shown in format (3-2) of FIG. 3: numeric string A, the time at which recording of the observation data starts, and numeric string B, the time at which recording of the observation data ends.

【００５９】また、書式「数字列Ａ：数字列Ｂ：文字
列」（４−３）は、図３の第１の書式（３−１）に示し
たコマンドに、数字列Ａである観測する観測データの記
録を開始する時間、及び数字列Ｂである観測した観測デ
ータの記録を終了する時間についての設定を追加したも
のである。なお、数字列Ａ、及び数字列Ｂは、５つの要
素から構成される。その５つの要素とは、「分、時、日
／曜日、月、年」であり、この表記された順番に設定さ
れる。Likewise, the format "numeric string A: numeric string B: character string" (4-3) adds to the command shown in the first format (3-1) of FIG. 3 numeric string A, the time at which recording of the observation data starts, and numeric string B, the time at which recording ends. Numeric string A and numeric string B each consist of five elements, "minute, hour, day/day of the week, month, year", set in that order.

【００６０】例えば、図４の式（１）〜（６）から数字
列Ａ、及び数字列Ｂの部分を抽出して、その部分のコマ
ンドの意味を、以下に説明する。なお、「*」は「任意
である」ことを示す。式（１）の「 * * * * *: *
* * * *」の部分は、「常に」を意味する。式
（２）の「00 09 * * *:00 17 * * *」の部分
は、「毎日9時00分から17時00分まで」を意味する。式
（３）の「00 09 Mo * *:00 17 Fr * *」の部分
は、「毎週月曜から金曜までの9時00分から17時00分ま
で」を意味する。As an example, the numeric string A and numeric string B parts of expressions (1) to (6) in FIG. 4 are extracted below and the meaning of each is explained. Here "*" means "any value". The part "* * * * *: * * * * *" of expression (1) means "always". The part "00 09 * * *:00 17 * * *" of expression (2) means "every day from 9:00 to 17:00". The part "00 09 Mo * *:00 17 Fr * *" of expression (3) means "every week from Monday to Friday, from 9:00 to 17:00".

【００６１】式（４）の「 * * Mo * *: * * Fr *
*」の部分は、「毎週月曜から金曜まで」を意味す
る。式（５）の「 * * Mo 04 95: * * Fr 07 95」の
部分は、「’95年4月から’95年7月までの毎週月曜から
金曜まで」を意味する。式（６）の「00 09 03 05 95:3
0 18 25 12 96」の部分は、「’95年5月から’96年12月
までの毎月3日から25日までの9時00分から18時30分ま
で」を意味する。The part "* * Mo * *: * * Fr * *" of expression (4) means "every week from Monday to Friday". The part "* * Mo 04 95: * * Fr 07 95" of expression (5) means "every week from Monday to Friday, from April 1995 to July 1995". The part "00 09 03 05 95:30 18 25 12 96" of expression (6) means "from 9:00 to 18:30 on the 3rd to the 25th of every month, from May 1995 to December 1996".
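
The meanings given for expressions (1) to (6) can be checked with a short sketch. It rests on several assumptions not stated in the text: the time of day (hour and minute) is compared as one range, year and month are compared as one range, the third field is either a day of the month or a weekday abbreviation, and two-digit years fall in the same century. Only "Mo" and "Fr" appear in the source; the other abbreviations and the function name `in_window` are hypothetical.

```python
from datetime import datetime

# Only "Mo" and "Fr" appear in the source; the remaining abbreviations are assumed.
WEEKDAYS = {"Mo": 0, "Tu": 1, "We": 2, "Th": 3, "Fr": 4, "Sa": 5, "Su": 6}

def _in_range(start, end, actual, positions):
    """Compare the listed field positions as one tuple, skipping wildcard fields."""
    lo, hi, val = [], [], []
    for p in positions:
        if start[p] != "*" and end[p] != "*":
            lo.append(int(start[p]))
            hi.append(int(end[p]))
            val.append(actual[p])
    return lo <= val <= hi

def in_window(spec, when):
    """Return True if `when` lies inside "A:B"; each side is minute hour day month year."""
    start_text, end_text = spec.split(":")
    start, end = start_text.split(), end_text.split()
    if len(start) != 5 or len(end) != 5:
        raise ValueError("each side needs five fields")
    actual = [when.minute, when.hour, when.day, when.month, when.year % 100]
    time_ok = _in_range(start, end, actual, (1, 0))    # hour first, then minute
    period_ok = _in_range(start, end, actual, (4, 3))  # year first, then month
    if start[2] == "*" or end[2] == "*":
        day_ok = True
    elif start[2] in WEEKDAYS:
        day_ok = WEEKDAYS[start[2]] <= when.weekday() <= WEEKDAYS[end[2]]
    else:
        day_ok = int(start[2]) <= when.day <= int(end[2])
    return time_ok and period_ok and day_ok
```

For instance, `in_window("00 09 Mo * *:00 17 Fr * *", datetime(1996, 8, 3, 9, 0))` is false because 3 August 1996 was a Saturday.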

【００６２】さらに、第３の書式「文字列：数字：英
字」（４−２）は、図３の第３の書式（３−３）で説明
したものと同様である。また、この第３の書式（４−
２）では、第一のブロックである文字列が”ＳＡＶＥ”
である場合のものを加えた。第一のブロックである文字
列に”ＳＡＶＥ”が設定された時には、障害発生回避装
置は、動作状況データベース１−５に保存される観測デ
ータが、期間的又は数量的に制限される。The third format, "character string: number: letter" (4-2), is the same as the third format (3-3) described for FIG. 3. In this third format (4-2), however, the case in which the first block, the character string, is "SAVE" has been added. When "SAVE" is set in the first block, the failure occurrence avoidance device limits the observation data stored in the operation status database 1-5 by period or by quantity.

【００６３】また、第二のブロックである数字には、数
字が設定される。さらに、第三のブロックである英字に
は、英字（Ｙ，Ｍ，ｎ）が設定される。この第三のブロ
ックに設定される英字は、観測データを動作状況データ
ベース１−５に保存する期間を示しており、「Ｙ」は一
年、「Ｍ」は一月を示す。なお、動作状況データベース
１−５に記録される観測データを期間的に制限するので
はなく数量的に制限したい場合には、第三のブロックに
英字「ｎ」を設定する。すると、所定の数の観測データ
が動作状況データベース１−５に記録されることにな
る。A number is set in the second block. A letter (Y, M, or n) is set in the third block. The letter in the third block indicates the period for which observation data is kept in the operation status database 1-5: "Y" means one year and "M" means one month. To limit the observation data recorded in the operation status database 1-5 by quantity rather than by period, the letter "n" is set in the third block; a predetermined number of observation data items is then recorded in the operation status database 1-5.

【００６４】第三のブロックに英字「ｎ」が設定された
場合、観測データが保存される期間は限定されず、第二
のブロックに設定された数だけ観測データが動作状況デ
ータベース１−５に記録される。第三のブロックに英字
「ｎ」が設定され、そして第二のブロックに設定された
数の観測データが既に動作状況データベース１−５に記
録された場合、新たに観測された観測データは、最も古
い観測データと入れ代わり、動作状況データベース１−
５に記録される。When the letter "n" is set in the third block, the period for which observation data is kept is not limited, and only the number of observation data items set in the second block is recorded in the operation status database 1-5. If that number of items has already been recorded, a newly observed item replaces the oldest item in the operation status database 1-5.

【００６５】例えば、コンフィグレーションファイルに
「ＳＡＶＥ：１：Ｙ」と設定された場合、「ＳＡＶＥ：
１：Ｙ」は、「１年前までの観測データを動作状況デー
タベース１−５に記録する」ということを意味する。な
お、それ以前の観測データは、削除される。For example, "SAVE:1:Y" in the configuration file means "keep observation data going back one year in the operation status database 1-5". Observation data older than that is deleted.

【００６６】また例えば、コンフィグレーションファイ
ルに「ＳＡＶＥ：１００００：ｎ」と設定された場合、
「ＳＡＶＥ：１００００：ｎ」は、「１００００個の観
測データを動作状況データベースに記録する」というこ
とを意味する。なお、１００００個の観測データが動作
状況データベース１−５に記録され、それ以上の観測デ
ータは、古い順番に削除される。Similarly, "SAVE:10000:n" in the configuration file means "keep 10,000 observation data items in the operation status database". That is, 10,000 items are recorded in the operation status database 1-5, and any items beyond that number are deleted, oldest first.
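
The two SAVE behaviors, retention by period and retention by count, can be sketched as a pruning function. The record layout (a list of timestamp and data pairs), the function name `apply_save_rule`, and the day counts used for "Y" and "M" are assumptions made for this illustration; the text does not define how long a year or a month is.

```python
from datetime import datetime, timedelta

# Day counts for "Y" and "M" are approximations chosen for this sketch.
PERIOD_DAYS = {"Y": 365, "M": 30}

def apply_save_rule(records, rule, now):
    """Return the (timestamp, data) records that a SAVE rule keeps, oldest first."""
    keyword, number, letter = rule.split(":")
    if keyword != "SAVE":
        raise ValueError("not a SAVE rule")
    count = int(number)
    ordered = sorted(records, key=lambda r: r[0])
    if letter == "n":
        # Quantity limit: keep only the newest `count` items.
        return ordered[-count:] if count > 0 else []
    # Period limit: drop everything older than the cutoff.
    cutoff = now - timedelta(days=PERIOD_DAYS[letter] * count)
    return [r for r in ordered if r[0] >= cutoff]

now = datetime(1996, 8, 2)
records = [
    (datetime(1995, 1, 1), "a"),
    (datetime(1996, 1, 1), "b"),
    (datetime(1996, 7, 1), "c"),
]
kept_by_period = apply_save_rule(records, "SAVE:1:Y", now)
kept_by_count = apply_save_rule(records, "SAVE:1:n", now)
```

A real store would more likely prune on each insert, but applying the rule to a whole list keeps the two cases easy to compare.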

【００６７】次に、図２のＳ４で用いられる検索指定フ
ァイルについて、図５を用いて説明する。検索指定ファ
イルには、動作異常検索部１−２が動作状況データベー
ス１−５から検索した観測データから、特徴的な観測デ
ータを検索するための条件が設定される。検索指定ファ
イルに設定される条件は、「文字列」、「数字」、及び
「文字」の３つのブロックから構成される書式「文字
列：数字：文字」（５−１）で示される。Next, the search designation file used in S4 of FIG. 2 will be described with reference to FIG. In the search specification file, conditions for searching characteristic observation data from the observation data retrieved by the operation abnormality retrieval unit 1-2 from the operation status database 1-5 are set. The condition set in the search designation file is indicated by a format “character string: number: character” (5-1) composed of three blocks of “character string”, “number”, and “character”.

【００６８】第一のブロックには、検索する項目が示さ
れる。例えば、文字列に「ＤＡＴＡ」と示された時に
は、所定の期間の観測データを検索することを示す。ま
た例えば、文字列に「ＤＥＶＩＡＴＩＯＮ」と示された
時には、所定の標準偏差を満たす観測データを検索する
ことを示す。さらに例えば、文字列に「ＤＥＬＴＡ」と
示された時には、時系列に並べられた観測データに基づ
き所定の傾きを満たす観測データを検索することを示
す。The first block indicates the item to search for. For example, the character string "DATA" indicates a search for observation data from a predetermined period. "DEVIATION" indicates a search for observation data that satisfies a predetermined standard deviation. "DELTA" indicates a search for observation data that satisfies a predetermined slope, computed from the observation data arranged in time series.

【００６９】検索指定ファイルに設定される条件を構成
する第二のブロックの数字には、第一のブロックの文字
列に対する所定の条件を形成するために用いられる数が
設定される。検索指定ファイルに設定される条件を構成
する第三のブロックの文字には、第一のブロックの文字
列に基づき、観測データが検索される期間や範囲が設定
される。In the number of the second block constituting the condition set in the search designation file, a number used to form a predetermined condition for the character string of the first block is set. For the characters of the third block constituting the conditions set in the search specification file, the period and range in which the observation data is searched are set based on the character string of the first block.

【００７０】例えば第一のブロックに、所定の期間の観
測データを検索することを示す「ＤＡＴＡ」が設定され
た場合、第三のブロックには、検索される観測データの
時間的な範囲である期間を設定する文字が設定される。
この第三のブロックに設定される文字には５種類あり、
その５種類とは、「年」を示す「Ｙ」、「月」を示す
「Ｍ」、「週」を示す「Ｗ」、「日」を示す「Ｄ」、及
び「時間」を示す「Ｈ」の５種類である。For example, when "DATA", which indicates a search for observation data from a predetermined period, is set in the first block, the third block holds a letter that sets the period, that is, the time range of the observation data to be searched. Five letters can be set in this third block: "Y" for year, "M" for month, "W" for week, "D" for day, and "H" for hour.

【００７１】検索指定ファイルに「ＤＡＴＡ：１：Ｗ」
と設定された場合、この「ＤＡＴＡ：１：Ｗ」という条
件文は、「一週間以内の観測データを検索する」という
ことを示す。また例えば第一のブロックに、所定の標準
偏差を満たす観測データを検索することを示す「ＤＥＶ
ＩＡＴＩＯＮ」や、時系列に並べられた観測データに基
づき所定の傾きを満たす観測データを検索することを示
す「ＤＥＬＴＡ」が設定された場合、第三のブロックに
は、第二のブロックに設定された数字を基準にして、所
定の範囲を示す文字が設定される。When "DATA:1:W" is set in the search specification file, the condition means "search for observation data from within the past week". When the first block is "DEVIATION", which indicates a search for observation data that satisfies a predetermined standard deviation, or "DELTA", which indicates a search for observation data that satisfies a predetermined slope computed from the time series, the third block holds a letter that indicates a range relative to the number set in the second block.

【００７２】この第三のブロックに設定される文字に
は、Ｂ、Ｓ、及びＥの３種類があり、Ｂは「以上」を示
すＢＩＧ、Ｓは「以下」を示すＳＭＡＬＬ、Ｅは「等し
い」ことを示すＥＱＵＡＬである。Three letters can be set in this third block: B, S, and E. B stands for BIG and means "greater than or equal to", S stands for SMALL and means "less than or equal to", and E stands for EQUAL and means "equal to".

【００７３】検索指定ファイルに「ＤＥＶＩＡＴＩＯ
Ｎ：６０：Ｂ」と設定された場合、この「ＤＥＶＩＡＴ
ＩＯＮ：６０：Ｂ」という条件文は、「標準偏差が６０
以上の観測データを検索する」ということを示す。ま
た、検索指定ファイルに「ＤＥＬＴＡ：｜２｜：Ｂ」と
設定された場合、この「ＤＥＬＴＡ：｜２｜：Ｂ」とい
う条件文は、「時系列で示した観測データの傾きの絶対
値が２以上である観測データを検索する」ということを
示す。When "DEVIATION:60:B" is set in the search specification file, the condition means "search for observation data whose standard deviation is 60 or more". When "DELTA:|2|:B" is set, the condition means "search for observation data for which the absolute value of the time-series slope is 2 or more".
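
The DEVIATION and DELTA conditions can be evaluated with a sketch like the one below. The text does not say which standard deviation or which slope estimate the device uses, so this sketch assumes the population standard deviation (`statistics.pstdev`) and the average change per sample between the first and last values. The names `parse_condition` and `satisfies` are hypothetical, and the DATA condition, which selects a time range rather than testing a value, is left out.

```python
import statistics

# B = BIG (>=), S = SMALL (<=), E = EQUAL (==), as described for the third block.
RELATIONS = {
    "B": lambda value, threshold: value >= threshold,
    "S": lambda value, threshold: value <= threshold,
    "E": lambda value, threshold: value == threshold,
}

def parse_condition(line):
    """Split "ITEM:number:letter"; a number written as |n| requests an absolute value."""
    item, number, letter = line.split(":")
    use_abs = number.startswith("|") and number.endswith("|")
    return item, float(number.strip("|")), letter, use_abs

def satisfies(line, values):
    """Return True when the time-ordered values meet the condition on the line."""
    item, threshold, letter, use_abs = parse_condition(line)
    if len(values) < 2:
        raise ValueError("need at least two observations")
    if item == "DEVIATION":
        measured = statistics.pstdev(values)                       # assumed: population std dev
    elif item == "DELTA":
        measured = (values[-1] - values[0]) / (len(values) - 1)    # assumed: mean slope per sample
    else:
        raise ValueError(f"unsupported item: {item}")
    if use_abs:
        measured = abs(measured)
    return RELATIONS[letter](measured, threshold)
```

For example, `satisfies("DELTA:|2|:B", [10, 7, 4, 1])` holds because the values fall by 3 per sample, and the absolute value 3 is at least 2.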

【００７４】次に、図１に示す動作状況データベース１
−５の概念図について、図６を用いて説明する。図６に
示すように、動作状況データベース１−５には、時間毎
のＣＰＵ負荷、メモリ負荷、ネットワーク負荷、動作プ
ロセス、及びディスク容量等の観測データが記録され
る。Next, the conceptual diagram of the operation status database 1-5 shown in FIG. 1 will be described with reference to FIG. 6. As shown in FIG. 6, the operation status database 1-5 records observation data for each time, such as CPU load, memory load, network load, running processes, and disk capacity.

【００７５】６−１は、観測項目一覧表であり、動作状
況データベース１−５に記録される複数の観測項目が示
されている。なお、観測項目一覧表６−１に示される観
測項目は、障害発生回避装置に設定されるコンフィグレ
ーションファイルの内容に基づき変化する。6-1 is an observation item list, which shows a plurality of observation items recorded in the operation status database 1-5. Note that the observation items shown in the observation item list 6-1 change based on the contents of the configuration file set in the failure occurrence avoidance device.

【００７６】また、動作プロセスやディスク容量等の複
数の観測データから構成される観測項目は、観測項目一
覧表６−１とは別の表であるテーブルに記録される。な
お、図６において、動作プロセスに関する観測データは
テーブル６−２に記録される。また、ディスク容量に関
する観測データはテーブル６−３に記録される。そし
て、観測項目一覧表６−１の中の、動作プロセスやディ
スク容量等の欄には、別に設けられたテーブル６−２、
６−３のアドレスが記録される。Observation items that consist of multiple observation data values, such as running processes and disk capacity, are recorded in tables separate from the observation item list 6-1. In FIG. 6, observation data on running processes is recorded in table 6-2, and observation data on disk capacity is recorded in table 6-3. The running-process and disk-capacity columns of the observation item list 6-1 hold the addresses of the separately provided tables 6-2 and 6-3.

【００７７】次に、図１に示す障害情報データベース１
−６の概念図について、図７を用いて説明する。図７に
示すように、障害情報データベース１−６には、条件、
障害内容、及び処理内容が記録される。Next, the conceptual diagram of the failure information database 1-6 shown in FIG. 1 will be described with reference to FIG. 7. As shown in FIG. 7, the failure information database 1-6 records conditions, failure contents, and processing contents.

【００７８】なお、障害内容に対する処理内容は、複数
の処理内容が設定されることもある。７−１は障害パタ
ーン一覧表であり、障害情報データベース１−６に記録
される障害内容、その障害が発生する時の各観測項目の
特性を示す条件、及びその障害の発生を回避するための
処理手段が示される処理内容が示されている。More than one processing content may be set for a given failure content. Reference numeral 7-1 denotes the failure pattern list, which shows the failure contents recorded in the failure information database 1-6, the conditions describing the characteristics of each observation item when the failure occurs, and the processing contents, that is, the measures for avoiding the failure.

【００７９】また、障害内容、及びその障害の処理内容
は、障害パターン一覧表７−１とは別の表であるテーブ
ルに記録される。障害内容はテーブル７−２に記録され
る。また、障害の処理内容はテーブル７−３に記録され
る。The details of the failure and the processing contents of the failure are recorded in a table different from the failure pattern list 7-1. The failure content is recorded in the table 7-2. The processing contents of the failure are recorded in the table 7-3.

【００８０】そして、障害パターン一覧表７−１の障害
内容や処理内容の欄には、別に設けられたテーブル７−
２、７−３のアドレスが記録される。なお、処理テーブ
ル７−３に設定される処理内容については、ユーザが自
由に追加、削除、及び変更することができる。The failure-content and processing-content columns of the failure pattern list 7-1 hold the addresses of the separately provided tables 7-2 and 7-3. The user can freely add, delete, and change the processing contents set in the processing table 7-3.
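
The layout of FIG. 7, a pattern list whose rows point into separate failure and processing tables, can be modeled roughly as follows. Dictionary keys stand in for the recorded addresses; the keys, the sample entries, and the function name `resolve` are all hypothetical and are not taken from the figure.

```python
# Table 7-2: failure contents, looked up by address (modeled as a key).
failure_table = {"F1": "memory exhaustion"}

# Table 7-3: processing contents, which the user may add, delete, or change.
action_table = {"A1": "stop process X", "A2": "notify operator"}

# List 7-1: each row holds a condition plus addresses into tables 7-2 and 7-3.
# A row may reference several processing contents for one failure.
pattern_list = [
    {"condition": {"MEMORY": 90}, "failure": "F1", "actions": ["A1", "A2"]},
]

def resolve(row):
    """Follow the addresses in a pattern row to the failure text and its actions."""
    return failure_table[row["failure"]], [action_table[a] for a in row["actions"]]

# Editing table 7-3 changes what every referencing row resolves to.
action_table["A2"] = "send mail to operator"
failure, actions = resolve(pattern_list[0])
```

Keeping the processing contents in their own table means a user edit touches one entry and leaves the pattern list itself unchanged.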

【００８１】このように図１の発明の実施形態１に示し
た障害発生回避装置は、動作状況監視部１−１、動作状
況データベース１−５、障害情報データベース１−６、
障害処理管理部１−４、動作異常検索部１−２、及び障
害情報作成部１−３を備えている。As described above, the failure occurrence avoidance device of Embodiment 1 shown in FIG. 1 includes the operation status monitoring unit 1-1, the operation status database 1-5, the failure information database 1-6, the failure processing management unit 1-4, the operation abnormality search unit 1-2, and the failure information creation unit 1-3.

【００８２】そして、この障害発生回避装置は、動作状
況監視部１−１で、設定されるコンフィグレーションフ
ァイルに基づき、定期的にＣＰＵの負荷、メモリの負
荷、ネットワークの負荷、及び動作プロセス等について
観測し、得られた観測データを、動作状況データベース
１−５に記録する。In this failure occurrence avoidance device, the operation status monitoring unit 1-1 periodically observes the CPU load, memory load, network load, running processes, and so on, based on the configuration file that has been set, and records the obtained observation data in the operation status database 1-5.

【００８３】そして、障害処理管理部１−４は、動作状
況データベース１−５に記録された観測データと、障害
情報データベース１−６に記録された障害の発生に関す
る障害パターンに示された観測データの傾向とに、同様
の傾向が見られるか否かを判断する。The failure processing management unit 1-4 then determines whether the observation data recorded in the operation status database 1-5 shows the same tendency as the observation data indicated in a failure pattern, recorded in the failure information database 1-6, that relates to the occurrence of a failure.

【００８４】この時、障害処理管理部１−４が動作状況
データベース１−５に記録された観測データから障害の
発生に関する傾向を見出した場合、障害処理管理部１−
４は障害情報データベース１−６に記録された障害パタ
ーンに示された発生しうる障害に対する回避方法を実行
する。If the failure processing management unit 1-4 finds a tendency toward failure in the observation data recorded in the operation status database 1-5, it executes the avoidance method for the possible failure indicated in the failure pattern recorded in the failure information database 1-6.

【００８５】また、障害処理管理部１−４で障害が発生
しうると判断した場合、動作異常検索部１−２は、動作
状況データベース１−５に記録されている観測データか
ら、所定の期間の観測データを読み出す。When the failure processing management unit 1-4 determines that a failure may occur, the operation abnormality search unit 1-2 reads the observation data for a predetermined period from the observation data recorded in the operation status database 1-5.

【００８６】そして、障害情報作成部１−３が、検索指
定ファイルに基づき、動作異常検索部１−２が読み出し
た観測データから、障害の原因を確定するための所定の
条件に合致する観測データを抽出する。Then, based on the search specification file, the failure information creation unit 1-3 extracts, from the observation data read by the operation abnormality search unit 1-2, the observation data that matches predetermined conditions for determining the cause of the failure.

【００８７】さらに、障害情報作成部１−３は、抽出さ
れた観測データ、障害、及びその障害に対する回避方法
を関連づけ、新たな障害パターンを作成する。この新た
に作成された障害パターンは、障害情報データベース１
−６に記録されている既存の障害パターンと比較され、
これら２つの障害パターンに共通した特徴的な部分に基
づき作成される特徴的な障害パターンが作成され、障害
情報データベース１−６に記録される。The failure information creation unit 1-3 further associates the extracted observation data, the failure, and the avoidance method for that failure, and creates a new failure pattern. This new failure pattern is compared with the existing failure pattern recorded in the failure information database 1-6; a characteristic failure pattern is created from the characteristic parts common to the two and is recorded in the failure information database 1-6.

【００８８】このように、動作状況監視部１−１で、設
定されるコンフィグレーションファイルに基づき、ＣＰ
Ｕの負荷、メモリの負荷、ネットワークの負荷、及び動
作プロセス等について観測するため、障害発生回避装置
が監視する設備等で起動するソフトウェアに障害が発生
しそうになっても、その障害の発生を未然に回避させる
ことができる。Because the operation status monitoring unit 1-1 observes the CPU load, memory load, network load, running processes, and so on, based on the configuration file that has been set, a failure that is about to occur in software running on the equipment monitored by the failure occurrence avoidance device can be avoided before it occurs.

【００８９】また、障害処理管理部１−４で発生しうる
障害が検知された場合には、障害情報データベース１−
６に記録された該障害パターンに関連づけられた回避方
法が自動的に実行され、工業用コンピュータ等の連続運
転の必要があるものに対して利用することができる。When the failure processing management unit 1-4 detects a possible failure, the avoidance method associated with that failure pattern in the failure information database 1-6 is executed automatically, so the device can be used for systems that must run continuously, such as industrial computers.

【００９０】さらに、障害処理管理部１−４で発生しう
る障害が検知された場合には、発生しうる障害を未然に
回避するための処置が障害情報データベース１−６に記
録され、その処置の実行も障害発生回避装置自体が行う
ため、障害発生回避装置の管理者は新規に発生した障害
に対する処置の方法を考えるほうに重点を置くことが可
能となり、障害発生回避装置の管理者の負担を軽くでき
る。Furthermore, when the failure processing management unit 1-4 detects a possible failure, the measure for avoiding it is recorded in the failure information database 1-6 and is carried out by the failure occurrence avoidance device itself. The administrator of the device can therefore concentrate on working out measures for newly occurring failures, which lightens the administrator's burden.

【００９１】また、この障害発生回避装置が観測する観
測項目は、コンフィグレーションファイルの設定により
変化させることができるため、障害の発生を未然に発見
するために適切だと思われる項目を逐次変更させること
ができ、障害の発生に対する事細かな対応が可能にな
る。The observation items observed by the fault occurrence avoiding device can be changed by setting the configuration file, so that the items considered appropriate for detecting the occurrence of a fault are changed sequentially. And a detailed response to the occurrence of a failure becomes possible.

【００９２】さらに、この障害発生回避装置が観測する
観測項目は、独自にコマンドを設定してコンフィグレー
ションファイルに設定することができるため、障害の発
生を未然に発見するために適切だと思われる項目を逐次
設定することができ、障害の発生に対する事細かな対応
が可能になる。Further, the observation items observed by the fault occurrence avoiding device can be set in the configuration file by setting a command independently, so that it is considered appropriate to detect the occurrence of a fault beforehand. Items can be sequentially set, and detailed responses to the occurrence of failures can be made.

【００９３】また、この障害発生回避装置が障害の発生
の危険性を判断する時、障害発生回避装置は検索指定フ
ァイルを参照し、この検索指定ファイルには、観測デー
タの変化量が設定される。このため、障害発生回避装置
は、観測する設備等の変化に対応でき、変化に応じた対
処を施すことができる。When the failure avoidance device determines the risk of occurrence of a failure, the failure occurrence avoidance device refers to a search designation file, and the change amount of observation data is set in the search designation file. . For this reason, the fault occurrence avoiding device can cope with a change in the equipment to be observed or the like, and can take a measure according to the change.

【００９４】発明の実施の形態２．次に、本発明の他の
実施の形態について図８を用いて説明する。図８に示し
た実施形態２は、ＬＡＮケーブルを介して複数の計算機
が接続されて構成される障害発生回避装置であり、障害
発生回避装置を構成する複数の計算機の中のひとつが、
障害の発生を判断するために必要な障害情報データベー
ス１−６を有し、ＬＡＮケーブルに接続された全ての計
算機に起こりうる障害の発生を監視している。Embodiment 2. Next, another embodiment of the present invention will be described with reference to FIG. 8. Embodiment 2 shown in FIG. 8 is a failure occurrence avoidance device configured by connecting a plurality of computers via a LAN cable. One of these computers holds the failure information database 1-6 needed to judge the occurrence of a failure and monitors all computers connected to the LAN cable for failures that may occur.

【００９５】図８において、８−２０は障害管理部であ
り、動作状況監視部１−１、動作異常検索部１−２、障
害情報作成部１−３、及び障害処理管理部１−４から構
成される。８−７は、障害管理通信部であり、障害管理
部８−２０に接続される。８−２１は、hostAである障
害管理サーバであり、障害管理部８−２０、動作状況デ
ータベース１−５、障害情報データベース１−６、及び
障害管理通信部８−７から構成される。In FIG. 8, reference numeral 8-20 denotes a failure management unit, which includes an operation status monitoring unit 1-1, an operation abnormality search unit 1-2, a failure information creation unit 1-3, and a failure processing management unit 1-4. Be composed. Reference numeral 8-7 denotes a failure management communication unit, which is connected to the failure management unit 8-20. Reference numeral 8-21 denotes a failure management server, which is hostA, and includes a failure management unit 8-20, an operation status database 1-5, a failure information database 1-6, and a failure management communication unit 8-7.

【００９６】８−１０は、第二の動作状況データベース
である。８−９は、第二の障害管理部であり、第二の動
作状況データベース８−１０に接続される。８−８は、
第二の障害管理通信部であり、第二の障害管理部８−９
に接続される。８−２２は、hostBである第一のクライ
アントであり、第二の動作状況データベース８−１０、
第二の障害管理部８−９、及び第二の障害通信管理部８
−８から構成される。Reference numeral 8-10 denotes a second operation status database. Reference numeral 8-9 denotes a second failure management unit, which is connected to the second operation status database 8-10. Reference numeral 8-8 denotes a second failure management communication unit, which is connected to the second failure management unit 8-9. Reference numeral 8-22 denotes a first client, hostB, which consists of the second operation status database 8-10, the second failure management unit 8-9, and the second failure management communication unit 8-8.

[0097] Reference numeral 8-13 denotes a third operation status database. Reference numeral 8-12 denotes a third failure management unit, which is connected to the third operation status database 8-13. Reference numeral 8-11 denotes a third failure management communication unit, which is connected to the third failure management unit 8-12. Reference numeral 8-23 denotes a second client, hostC, which comprises the third operation status database 8-13, the third failure management unit 8-12, and the third failure management communication unit 8-11.

[0098] The failure management communication unit 8-7 of the failure management server 8-21, the second failure management communication unit 8-8 of the first client 8-22, and the third failure management communication unit 8-11 of the second client 8-23 are each connected by a LAN cable 8-24 and form a single network. In FIG. 8, parts that are the same as or equivalent to those of the embodiment shown in FIG. 1 are given the same reference numerals and their description is omitted; only the parts that differ from FIG. 1 have been described.

[0099] When a plurality of computers are connected by a LAN cable to constitute a failure occurrence avoidance device, as in this second embodiment, each computer has either a management server designation file 9-1 or a failure target designation file 9-2.

[0100] These two files will be described with reference to FIG. 9. The management server designation file 9-1 designates the failure management server 8-21; a computer that has this file acts as a client. The failure target designation file 9-2 designates the clients 8-22 and 8-23; a computer that has this file acts as the failure management server.

[0101] The server designation file 9-1 is held by each of the clients 8-22 to 8-23, and the name of the failure management server 8-21 is set in it. As a rule, only one failure management server name is set in the server designation file 9-1.

[0102] The failure target designation file 9-2 is held by the failure management server 8-21. The clients 8-22 to 8-23 managed by the failure management server 8-21 are set in this file. Either a single client or several clients may be set in the failure target designation file 9-2.

[0103] Each of the clients 8-22 to 8-23 identifies the failure management server 8-21 by consulting its own server designation file 9-1. Likewise, the failure management server 8-21 identifies the clients 8-22 to 8-23 by consulting its own failure target designation file 9-2.

[0104] When the failure management server 8-21 designated in the server designation file 9-1 of each of the clients 8-22 to 8-23 corresponds to the clients 8-22 to 8-23 designated in the failure target designation file 9-2 of the failure management server 8-21, the failure occurrence avoidance device is constituted, and the failure management server 8-21 manages the occurrence of failures for the clients 8-22 to 8-23.

[0105] When the failure management server 8-21 designated in the server designation file 9-1 of each of the clients 8-22 to 8-23 does not correspond to the clients 8-22 to 8-23 designated in the failure target designation file 9-2 of the failure management server 8-21, the failure occurrence avoidance device is not constituted, and the occurrence of failures is not managed.
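The correspondence check between the two designation files can be sketched as follows. This is only an illustration in Python: the in-memory form of the files, the function name `device_is_configured`, and the choice to require an exact two-way match are assumptions, since the text does not say how the files are stored or how a partial match is treated.

```python
def device_is_configured(server_name, target_file, server_files):
    """Return True when the two kinds of designation file correspond.

    server_name:  name of the failure management server (e.g. "hostA").
    target_file:  client names listed in the server's failure target
                  designation file 9-2.
    server_files: dict mapping each client name to the server name written
                  in that client's server designation file 9-1.
    Assumption: "correspond" is read here as an exact two-way match.
    """
    # Clients whose file 9-1 names this server.
    naming_server = {c for c, s in server_files.items() if s == server_name}
    # An empty file 9-2 designates nobody, so no device is constituted.
    return bool(target_file) and set(target_file) == naming_server


# hostB and hostC both name hostA, and hostA lists both of them.
files_9_1 = {"hostB": "hostA", "hostC": "hostA"}
configured = device_is_configured("hostA", ["hostB", "hostC"], files_9_1)
# hostD is listed by the server but has no matching file 9-1.
mismatched = device_is_configured("hostA", ["hostB", "hostD"], files_9_1)
```

With these inputs `configured` is true and `mismatched` is false, mirroring the two cases described in the text.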

[0106] Next, the operation of the second embodiment shown in FIG. 8 will be described with reference to FIG. 10. First, on the failure management server 8-21 side of FIG. 10, the failure management server 8-21 is started in step A1 (steps on this side are hereinafter abbreviated as A). When A1 ends, the process proceeds to A2.

[0107] At A2, the failure management server 8-21 checks whether the observation data output from the clients 8-22 to 8-23 and newly recorded in the operation status database 1-5 shows a tendency similar to the tendency of observation data at the time a failure occurs, as indicated by a failure pattern recorded in the failure information database 1-6.

[0108] If the failure management server 8-21 does not judge that the observation data newly recorded in the operation status database 1-5 shows a tendency that appears when a failure occurs, the processing of A2 is repeated.

[0109] If the failure management server 8-21 judges that the observation data newly recorded in the operation status database 1-5 shows a tendency that appears when a failure occurs, the process proceeds to A3.

[0110] At A3, when the failure management server 8-21 has detected, in the latest observation data recorded in the operation status database 1-5, a tendency similar to the tendency of past observation data indicated by a failure pattern recorded in the failure information database 1-6, it orders whichever of the clients 8-22 to 8-23 output that observation data to carry out the avoidance method recorded in the failure information database 1-6 in association with that failure pattern. When A3 ends, the process proceeds to A4.

[0111] At A4, the failure management server 8-21 reads observation data for a predetermined period from the operation status database 1-5 of the client in which a tendency similar to that of the observation data indicated by a failure pattern recorded in the failure information database 1-6 was detected. The failure management server 8-21 then extracts, from the observation data it has read, the observation data specified by the search designation file, and creates a new failure pattern.

[0112] Further, the failure management server 8-21 compares the newly created failure pattern with the failure pattern already recorded in the failure information database 1-6 for the same failure, extracts the characteristic part common to the two failure patterns, creates a new, more characteristic failure pattern, and records it in the failure information database 1-6.

[0113] The failure management server 8-21 also records the associated avoidance method together with the failure pattern newly recorded in the failure information database 1-6. When A4 ends, the process returns to A2.
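Steps A2 and A3 can be illustrated with a small Python sketch. The text does not specify how a "similar tendency" is measured, so the comparison used here (successive changes in one observation item, matched within a tolerance) is an assumption made purely for illustration, as are the names `deltas`, `matches_pattern`, `judge`, and the record layout.

```python
def deltas(samples):
    """Successive differences of a series of observations."""
    return [b - a for a, b in zip(samples, samples[1:])]


def matches_pattern(observed, pattern_deltas, tolerance):
    """True if the most recent changes in `observed` track `pattern_deltas`."""
    if not pattern_deltas:
        return False  # an empty pattern describes no tendency
    recent = deltas(observed)[-len(pattern_deltas):]
    if len(recent) < len(pattern_deltas):
        return False  # not enough observations yet
    return all(abs(r - p) <= tolerance for r, p in zip(recent, pattern_deltas))


def judge(observations, failure_db, tolerance=5):
    """A2/A3: return (client, avoidance method) for every matching pattern."""
    orders = []
    for client, series in observations.items():
        for pattern in failure_db:
            samples = series.get(pattern["item"], [])
            if matches_pattern(samples, pattern["deltas"], tolerance):
                orders.append((client, pattern["avoidance"]))
    return orders


# Hypothetical failure pattern: memory load climbing by about 10 per interval.
failure_db = [{"failure": "memory exhaustion", "item": "memory",
               "deltas": [10, 10, 10], "avoidance": "restart_process"}]
observations = {"hostB": {"memory": [40, 50, 61, 70]},
                "hostC": {"memory": [40, 41, 40, 42]}}
orders = judge(observations, failure_db)
```

In this example only hostB shows the rising tendency, so it is the only client that would be ordered to run the avoidance method.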

[0114] Next, the operation of the client in FIG. 10 will be described. In step B1 (client-side steps are hereinafter abbreviated as B), the client is started. When B1 ends, the process proceeds to B2. At B2, the client checks the configuration file that has been set.

[0115] Based on the configuration file it has checked, the client periodically records observation data such as CPU load, memory load, network load, and running processes in its own operation status database 1-5. When B2 ends, the process proceeds to B3.

[0116] At B3, the client periodically outputs the observation data recorded in its operation status database 1-5 to the failure management server 8-21, from the client's failure management unit via its failure management communication unit. The client may send only the observation data that has been updated in its operation status database, excluding the observation data already sent to the failure management server 8-21. When B3 ends, the process proceeds to B4.

[0117] At B4, the client is notified whether the observation data it sent to the failure management server 8-21 matched a failure pattern recorded in the failure information database 1-6 of the failure management server 8-21. If it is judged that a failure is developing in the client, the process proceeds to B5. If it is not judged that a failure is developing in the client, the process returns to B2.

[0118] At B5, the client judged by the failure management server 8-21 to have a failure developing executes the failure avoidance method ordered by the failure management server 8-21. When B5 ends, the process returns to B2.
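The client-side steps B2 to B5 can be sketched in Python as below. The class `Client`, its method names, and the way the server's reply is represented (the name of an avoidance method, or `None`) are hypothetical; the sketch mainly shows the optional behaviour of sending only observation data that has not yet been transmitted.

```python
class Client:
    """Minimal stand-in for a client such as hostB (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.status_db = []  # the client's own operation status database
        self.sent = 0        # number of records already sent to the server

    def observe(self, sample):
        """B2: record one observation in the operation status database."""
        self.status_db.append(sample)

    def unsent(self):
        """B3: return only the records not yet sent, then mark them as sent."""
        batch = self.status_db[self.sent:]
        self.sent = len(self.status_db)
        return batch

    def handle_reply(self, reply, actions):
        """B4/B5: run the ordered avoidance method, if the server ordered one."""
        if reply is None:
            return None          # no failure is developing; back to B2
        return actions[reply]()  # B5: execute the ordered avoidance method


client = Client("hostB")
client.observe({"cpu": 10})
client.observe({"cpu": 20})
first_batch = client.unsent()    # both records
client.observe({"cpu": 30})
second_batch = client.unsent()   # only the record added since the last send
```

Tracking a simple send position is one possible way to avoid retransmitting data; the source does not prescribe a mechanism.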

[0119] Next, the operation when a failure with a new failure pattern not registered in the failure information database 1-6 occurs in a client of the second embodiment shown in FIG. 8 will be described with reference to FIG. 11. First, suppose that a new failure not recorded in the failure information database 1-6 occurs on the client side. This is step C1 (these steps are hereinafter abbreviated as C). When C1 ends, the process proceeds to C2.

[0120] At C2, the client outputs information on the failure that occurred and on its time of occurrence to the failure management server 8-21. When C2 ends, the process proceeds to C3. At C3, a choice is made as to whether the administrator of the failure occurrence avoidance device will set an avoidance method for the failure that occurred. If the administrator sets an avoidance method for the failure, the process proceeds to C4.

[0121] If the administrator of the failure occurrence avoidance device does not set an avoidance method for the failure that occurred, the process proceeds to C5. At C4, the client notifies the failure management server 8-21 of the avoidance method for the failure, based on the setting made at C3. When C4 ends, the process proceeds to C5. At C5, the client performs the processing from B2 onward shown in FIG. 10.
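As a rough illustration of steps C2 to C4, the information a client passes to the failure management server might be assembled as in the Python sketch below. The function name `build_failure_report` and the dictionary keys are hypothetical; the source only states that the failure, its occurrence time, and optionally an administrator-set avoidance method are sent.

```python
def build_failure_report(failure, occurred_at, admin_avoidance=None):
    """C2-C4: assemble what the client sends to the failure management server.

    admin_avoidance is the avoidance method chosen by the administrator at C3,
    or None when the administrator sets nothing (C4 is then skipped).
    """
    report = {"failure": failure, "occurred_at": occurred_at}  # C2
    if admin_avoidance is not None:                            # C3 -> C4
        report["avoidance"] = admin_avoidance
    return report


with_method = build_failure_report("disk full", 1000, "purge_tmp")
without_method = build_failure_report("disk full", 1000)
```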

[0122] Next, the operation on the failure management server 8-21 side in FIG. 11 will be described. In step D1 (these steps are hereinafter abbreviated as D), the failure management server 8-21 receives the failure details output from the client and information on the time the failure occurred. When D1 ends, the process proceeds to D2.

[0123] At D2, the failure management server 8-21 records the information received from the client in its failure information database 1-6. When D2 ends, the process proceeds to D3.

[0124] At D3, based on the failure occurrence time included in the information received from the client, the failure management server 8-21 obtains, from the operation status database of the client in which the failure occurred, the operation status for a predetermined period leading up to the failure. When D3 ends, the process proceeds to D4.

[0125] At D4, it is judged whether the failure that occurred in the client is one that is already recorded in the failure information database 1-6 but whose operation status leading up to the failure shows a different tendency.

[0126] If the failure that occurred in the client is one that is already recorded in the failure information database 1-6 but whose operation status leading up to the failure shows a different tendency, the process proceeds to D4-1. Otherwise, the process proceeds to D4-2.

[0127] At D4-1, the failure management server 8-21 associates the failure already recorded in the failure information database 1-6 and its avoidance method with the tendency of the operation status of the client in which the failure occurred, and records them in the failure information database 1-6. When D4-1 ends, the process proceeds to D5.

[0128] At D4-2, the failure management server 8-21 associates the avoidance method for the failure received from the client with the tendency of the operation status of the client in which the failure occurred, and records them in the failure information database 1-6. When D4-2 ends, the process proceeds to D5. At D5, the failure management server 8-21 performs the processing from A2 onward shown in FIG. 10.
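The server-side steps D3 and D4 can be sketched as follows. This Python fragment is illustrative only: the record layout, the representation of a "tendency" as a tuple, and the function names `window_before` and `register_failure` are assumptions, and the handling of a repeated failure with an already known tendency is one possible reading of the D4-2 branch.

```python
def window_before(status_db, occurred_at, period):
    """D3: observations from the `period` time units leading up to the failure."""
    return [r for r in status_db if occurred_at - period <= r["t"] <= occurred_at]


def register_failure(failure_db, report, trend):
    """D4: store a reported failure with the tendency that preceded it.

    failure_db maps a failure name to {"avoidance": ..., "trends": [...]}.
    Returns the name of the branch taken, "D4-1" or "D4-2".
    """
    name = report["failure"]
    entry = failure_db.get(name)
    if entry is not None and trend not in entry["trends"]:
        # D4-1: known failure, new tendency; keep the recorded avoidance method.
        entry["trends"].append(trend)
        return "D4-1"
    if entry is None:
        entry = failure_db[name] = {"avoidance": None, "trends": [trend]}
    # D4-2: associate the avoidance method sent by the client, if any.
    if report.get("avoidance") is not None:
        entry["avoidance"] = report["avoidance"]
    return "D4-2"


db = {}
first = register_failure(db, {"failure": "disk full", "avoidance": "purge_tmp"}, (1, 2, 3))
second = register_failure(db, {"failure": "disk full"}, (5, 5, 5))
```

Here the first report takes the D4-2 branch and the second, which repeats the failure with a different tendency, takes D4-1.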

[0129] As described above, the failure occurrence avoidance device of the second embodiment shown in FIG. 8 is configured by connecting the failure management server 8-21, the first client 8-22, and the second client 8-23 to one another via the LAN 8-24. The failure management server 8-21 comprises the failure management unit 8-20, the operation status database 1-5, the failure information database 1-6, and the failure management communication unit 8-7.

[0130] The first client 8-22 comprises the second operation status database 8-10, the second failure management unit 8-9, and the second failure management communication unit 8-8. The second client 8-23 comprises the third operation status database 8-13, the third failure management unit 8-12, and the third failure management communication unit 8-11.

[0131] The failure management unit 8-20, the second failure management unit 8-9, and the third failure management unit 8-12 each comprise the operation status monitoring unit 1-1, the operation abnormality search unit 1-2, the failure information creation unit 1-3, and the failure processing management unit 1-4. The failure management communication unit 8-7, the second failure management communication unit 8-8, and the third failure management communication unit 8-11 are each connected by the LAN cable 8-24.

[0132] In this failure occurrence avoidance device, each of the clients 8-22 to 8-23 periodically observes CPU load, memory load, network load, running processes, and so on, based on the configuration file that has been set, and the resulting observation data is recorded in the operation status database 1-5 of the failure management server 8-21.

[0133] The failure processing management unit 1-4 of the failure management server 8-21 then judges whether the observation data recorded in the operation status database 1-5 of the failure management server 8-21 shows a tendency similar to the tendency of observation data indicated by a failure pattern, recorded in the failure information database 1-6, relating to the occurrence of a failure.

[0134] If the failure processing management unit 1-4 of the failure management server 8-21 finds a tendency associated with the occurrence of a failure in the observation data recorded in the operation status database 1-5, it orders the client concerned to execute the avoidance method for the possible failure indicated by the failure pattern recorded in the failure information database 1-6.

[0135] A client ordered by the failure management server 8-21 to execute an avoidance method for a possible failure executes that avoidance method. In addition, when the failure processing management unit 1-4 of the failure management server 8-21 judges that a failure may occur, the operation abnormality search unit 1-2 reads observation data for a predetermined period from the observation data recorded in the operation status database 1-5.

[0136] Then, based on the search designation file, the failure information creation unit 1-3 extracts, from the observation data read by the operation abnormality search unit 1-2, the observation data that meets predetermined conditions for determining the cause of the failure. The failure information creation unit 1-3 further associates the extracted observation data, the failure, and the avoidance method for that failure with one another, and creates a new failure pattern.

[0137] This newly created failure pattern is compared with the existing failure pattern recorded in the failure information database 1-6. A characteristic failure pattern is then created from the characteristic part common to the two failure patterns and is recorded in the failure information database 1-6.
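One simple way to picture the extraction of the part common to two failure patterns is the Python sketch below. The representation of a pattern as a mapping from observation item to a tendency label, and the function name `common_pattern`, are assumptions for illustration; the source does not describe the data format of a failure pattern.

```python
def common_pattern(new_pattern, existing_pattern):
    """Keep only the observation items on which both patterns agree."""
    return {item: trend for item, trend in new_pattern.items()
            if existing_pattern.get(item) == trend}


# Hypothetical patterns recorded for the same failure on two occasions.
existing = {"cpu": "rising", "memory": "rising", "network": "flat"}
new = {"cpu": "rising", "memory": "rising", "network": "rising", "disk": "flat"}
refined = common_pattern(new, existing)
```

In this example the refined pattern keeps only the CPU and memory tendencies, since the network tendency differs and the disk item appears in only one pattern.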

[0138] In this way, the operation status monitoring unit 1-1 of each of the clients 8-22 to 8-23 observes CPU load, memory load, network load, running processes, and so on, based on the configuration file that has been set. Consequently, even when a failure is about to occur in software running on the equipment monitored by the failure occurrence avoidance device, the failure can be avoided before it occurs.

[0139] Further, when the failure processing management unit 1-4 of the failure management server 8-21 detects a possible failure, the avoidance method associated with the corresponding failure pattern recorded in the failure information database 1-6 is executed automatically, so the device can be used for systems that require continuous operation, such as industrial computers.

[0140] Moreover, when the failure processing management unit 1-4 of the failure management server 8-21 detects a possible failure, the measures for avoiding that failure are recorded in the failure information database 1-6 and are carried out by the failure occurrence avoidance device itself. The administrator of the device can therefore concentrate on devising measures for newly occurring failures, which lightens the administrator's burden.

[0141] The observation items observed by this failure occurrence avoidance device can be changed through the settings in the configuration file. Items thought suitable for discovering failures before they occur can therefore be changed as needed, enabling a fine-grained response to failures.

[0142] Further, observation items can be defined by writing user-defined commands into the configuration file. Items thought suitable for discovering failures before they occur can therefore be added as needed, enabling a fine-grained response to failures.
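The source does not give the syntax of the configuration file, so the following Python sketch uses an invented one-line-per-item format (`name : interval : command`) purely to show how changeable observation items and user-defined commands could be expressed. The format, the item names, and the function `parse_config` are all hypothetical.

```python
def parse_config(text):
    """Parse lines of the assumed form 'name : interval_seconds : command'.

    Blank lines and lines starting with '#' are ignored. A line with fewer
    than three fields raises ValueError.
    """
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first two colons only, so the command may contain colons.
        name, interval, command = (part.strip() for part in line.split(":", 2))
        items.append({"name": name, "interval": int(interval), "command": command})
    return items


EXAMPLE_CONFIG = """
# name : interval (s) : command
cpu_load : 60 : uptime
disk_free : 300 : df -k /
"""
observation_items = parse_config(EXAMPLE_CONFIG)
```

Adding or removing a line in such a file would change what each client observes without changing the client program itself, which is the property the text describes.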

[0143] When this failure occurrence avoidance device judges the risk of a failure occurring, it refers to the search designation file, in which an amount of change in the observation data is set. The device can therefore adapt to changes in the equipment being observed and take measures appropriate to those changes.
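The role of the change amount set in the search designation file can be illustrated with the Python sketch below, which picks out the observations whose change from the previous sample reaches a configured amount. The function name `extract_by_change` and the use of an absolute difference between consecutive samples are assumptions; the source only states that a change amount is set in the file.

```python
def extract_by_change(series, min_change):
    """Return (index, value) pairs whose change from the previous sample
    is at least `min_change` in absolute value."""
    return [(i, cur)
            for i, (prev, cur) in enumerate(zip(series, series[1:]), start=1)
            if abs(cur - prev) >= min_change]


# Hypothetical memory-load samples and a change amount of 15 taken from
# an assumed search designation file.
memory_load = [10, 12, 30, 31, 60]
notable = extract_by_change(memory_load, 15)
```

Raising or lowering the configured amount changes which observations are treated as significant, which is how the device can be tuned to the equipment being observed.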

[0144] Further, the failure occurrence avoidance device of this second embodiment comprises the failure management server 8-21 and a plurality of clients 8-22 to 8-23. A single failure management server 8-21 monitors the operation status of the plurality of clients 8-22 to 8-23 and can prevent failures in those clients before they occur.

[0145] Embodiment 3. Next, another embodiment of the present invention will be described with reference to FIG. 12. The failure occurrence avoidance device of the third embodiment shown in FIG. 12 is configured by connecting the failure management server 8-21, the clients 8-22 to 8-23, and a database server 12-25 via a LAN cable 12-24. In this device, one of the plurality of computers connected via the LAN cable centrally manages the operation status and failure information of all the other computers connected to the LAN cable.

[0146] Reference numeral 12-25 denotes a database server, which brings together on a single computer the failure information database 1-6, and the failure management server 8-21 and the operation status database 1-5. The operation status database 1-5 records the operation status of all the computers constituting the failure occurrence avoidance device of the third embodiment, that is, the failure management server 8-21 and the clients 8-22 to 8-23.

[0147] Failure patterns relating to failures are recorded in the failure information database 1-6. Reference numeral 12-14 denotes a database management unit, which is connected to the operation status database 1-5 and the failure information database 1-6. Reference numeral 12-15 denotes a fourth failure management communication unit, which is connected to the database management unit 12-14 and exchanges information, such as observation data and commands for avoiding failures, with the failure management server 8-21 and the clients 8-22 to 8-23.

[0148] In the failure occurrence avoidance device of the third embodiment, the database server 12-25 has the operation status database 1-5 and the failure information database 1-6. The computers constituting the device, namely the failure management server 8-21 and the clients 8-22 to 8-23, share the operation status database 1-5 and the failure information database 1-6 of the database server 12-25.

[0149] Control of the operation status database 1-5 and the failure information database 1-6 is performed by the database management unit 12-14 of the database server 12-25. The database management unit 12-14 executes processing based on requests, such as reads from and writes to the operation status database 1-5 and the failure information database 1-6, received via the fourth failure management communication unit 12-15.
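The request handling performed by the database management unit 12-14 can be pictured with the Python sketch below. The class name `DatabaseManager`, the request dictionary layout, and the in-memory lists standing in for the two databases are all hypothetical; the source only says that read and write requests arriving through the communication unit are processed.

```python
class DatabaseManager:
    """Stand-in for database management unit 12-14 (illustrative only)."""

    def __init__(self):
        # In-memory stand-ins for databases 1-5 and 1-6.
        self.databases = {"operation_status": [], "failure_info": []}

    def handle(self, request):
        """Process one read or write request received over the network."""
        db = self.databases.get(request.get("db"))
        if db is None:
            return {"ok": False, "error": "unknown database"}
        if request.get("op") == "write":
            db.append(request["record"])
            return {"ok": True}
        if request.get("op") == "read":
            return {"ok": True, "records": list(db)}  # copy, not the live list
        return {"ok": False, "error": "unknown operation"}


manager = DatabaseManager()
write_reply = manager.handle({"db": "operation_status", "op": "write",
                              "record": {"host": "hostB", "cpu": 35}})
read_reply = manager.handle({"db": "operation_status", "op": "read"})
```

Routing every access through one handler is what lets the failure management server and the clients share the same two databases.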

[0150] The failure management server 8-21 comprises the failure management unit 8-20, which has the operation status monitoring unit 1-1, the operation abnormality search unit 1-2, the failure information creation unit 1-3, and the failure processing management unit 1-4, and the failure management communication unit 8-7. The failure management unit 8-20 of the failure management server 8-21 also has a DB server designation file 12-13, which designates the database server 12-25 having the operation status database 1-5 and the failure information database 1-6.

[0151] The failure management server 8-21 or one of the clients 8-22 to 8-23 may also be designated in the DB server designation file 12-13. When the failure management server 8-21 or one of the clients 8-22 to 8-23 is designated as the database server 12-25, that computer has the configuration of the failure management server 8-21 (or of the client 8-22 to 8-23) plus the configuration of the database server 12-25.

[0152] The first client 8-22 has the second failure management unit 8-9 and the second failure management communication unit 8-8. The second client 8-23 has the third failure management unit 8-12 and the third failure management communication unit 8-11. In FIG. 12, parts that are the same as or equivalent to those of the embodiment shown in FIG. 8 are given the same reference numerals and their description is omitted; only the parts that differ from FIG. 8 have been described.

[0153] In this third embodiment, the failure management server 8-21 makes the judgment on the occurrence of failures. The series of processing performed by the computers constituting the device (the failure management server 8-21, the clients 8-22 to 8-23, and the database server 12-25) up to the point where a developing failure is judged is as follows. Each of the clients 8-22 to 8-23 sends the observation data it has observed to the failure management server 8-21.

[0154] The observation data of each client received by the failure management server 8-21 is output to the operation status database 1-5 of the database server 12-25 in accordance with the DB server designation file held by the failure management server 8-21. The failure management server 8-21 judges the risk that a failure is developing based on the operation status database 1-5 and the failure information database 1-6 of the database server 12-25.

【０１５５】When the fault management server 8-21 judges that there is a risk of a fault occurring, it instructs the client concerned to execute the avoidance method recorded in the fault information database 1-6 of the database server 12-25. The client that receives the instruction from the fault management server 8-21 executes the avoidance method accordingly and thereby prevents the fault from occurring.

【０１５６】In Embodiment 3 the fault management server 8-21 judges the risk of fault occurrence. Alternatively, no fault management server need be provided: the database server 12-25 may judge the risk of fault occurrence, and each of the clients 8-22 to 8-23 may hold the DB server designation file 12-23.

【０１５７】In that case, each of the clients 8-22 to 8-23, acting as the first computer, transfers its observation data directly to the database server 12-25, acting as the second computer, in accordance with the DB server designation file 12-23 that the client holds. The database server 12-25 judges whether a fault is developing and sends the instruction for handling the developing fault directly to the client concerned.

【０１５８】Next, the conceptual diagram of the operation status database 1-5 shown in FIG. 12 is described with reference to FIG. 13. As shown in FIG. 13, the operation status database 1-5 records observation data such as CPU load, memory load, network load, running processes, and disk capacity for each point in time. Reference numeral 13-1 denotes an observation item list showing the observation items recorded in the operation status database 1-5.

【０１５９】The observation items shown in the observation item list 13-1 vary according to the contents of the configuration file set in the fault occurrence avoiding device. The observation data in the operation status database 1-5 is recorded so that the computer it came from, such as the fault management server 8-21 or one of the clients 8-22 to 8-23, can be identified.

【０１６０】Observation items that consist of multiple pieces of observation data, such as running processes and disk capacity, are recorded in tables separate from the observation item list 13-1. In FIG. 13, observation data on running processes is recorded in table 13-2.

【０１６１】Observation data on disk capacity is recorded in table 13-3. The running-process and disk-capacity columns of the observation item list 13-1 hold the addresses of the separately provided tables 13-2 and 13-3.
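
The layout described in paragraphs 0158 to 0161 (a main list whose multi-valued columns hold addresses of separate tables) might be modeled as below. The field names, the string keys used as "addresses", the sample values, and the percent-used unit for disk capacity are assumptions made for illustration; the patent does not specify them.

```python
# Hypothetical sub-tables, keyed by an "address" (here a plain string key).
process_table_13_2 = {
    "proc@t0": ["init", "httpd", "cron"],
}
disk_table_13_3 = {
    "disk@t0": {"/": 71, "/var": 88},  # percent used per file system (assumed unit)
}

# Observation item list 13-1: one row per host and time.
observation_item_list_13_1 = [
    {
        "host": "client-8-22",
        "time": "t0",
        "cpu_load": 0.35,
        "memory_load": 0.60,
        "network_load": 0.05,
        "processes": "proc@t0",  # address into table 13-2
        "disk": "disk@t0",       # address into table 13-3
    },
]


def resolve(row: dict) -> dict:
    """Return a copy of the row with table addresses replaced by table contents."""
    full = dict(row)
    full["processes"] = process_table_13_2[row["processes"]]
    full["disk"] = disk_table_13_3[row["disk"]]
    return full


resolved = resolve(observation_item_list_13_1[0])
```

Storing only an address in the main list keeps each row a fixed shape even though the number of processes or file systems varies from sample to sample.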

【０１６２】Next, the conceptual diagram of the fault information database 1-6 shown in FIG. 12 is described with reference to FIG. 14. As shown in FIG. 14, the fault information database 1-6 records the conditions, fault contents, and processing contents for each computer. More than one processing content may be set for a given fault content. Reference numeral 14-1 denotes a fault pattern list showing the name of the computer (such as the fault management server 8-21 or the clients 8-22 to 8-23), the condition, the fault content, and the processing content.

【０１６３】When several host names appear in the host name column of the fault pattern list 14-1, which holds the computer names, the fault is developing on all of those hosts. Multiple host names in the fault pattern list 14-1 are separated by commas. When “*:” appears in the host name column, the fault is developing on the fault management server 8-21 and on all of the clients 8-22 to 8-23 that make up the fault occurrence avoiding device of Embodiment 3.
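
The host name column rules in paragraph 0163 (comma-separated names, with “*:” meaning every computer in the device) lend themselves to a small parser. The sketch below is hypothetical: the function name and the example host names are assumptions, and only the two rules stated in the text are implemented.

```python
def hosts_for_pattern(host_field: str, all_hosts: list) -> list:
    """Expand the host name column of a fault pattern into a list of hosts."""
    text = host_field.strip()
    if text == "*:":
        # Wildcard: the management server and every client.
        return list(all_hosts)
    # Otherwise the column holds one or more comma-separated host names.
    return [name.strip() for name in text.split(",") if name.strip()]


ALL_HOSTS = ["server-8-21", "client-8-22", "client-8-23"]

single = hosts_for_pattern("client-8-22", ALL_HOSTS)
several = hosts_for_pattern("client-8-22, client-8-23", ALL_HOSTS)
everyone = hosts_for_pattern("*:", ALL_HOSTS)
```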

【０１６４】Similar faults can occur on any of the computers, whether the fault management server 8-21 or one of the clients 8-22 to 8-23, but the method of avoiding a given fault often differs from computer to computer and with the environment in which the computer is used, for example when a computer is being used for a special test. For this reason, the fault information database 1-6 of the fault occurrence avoiding device of Embodiment 3 can be set so that an avoidance method specific to the computer on which the fault is developing is executed.

【０１６５】As described above, the fault occurrence avoiding device of Embodiment 3 shown in FIG. 12 consists of the fault management server 8-21, the first client 8-22, the second client 8-23, and the database server 12-25, connected to one another by the LAN 8-24. The fault management server 8-21 consists of the fault management unit 8-20 and the fault management communication unit 8-7.

【０１６６】The first client 8-22 consists of the second fault management unit 8-9 and the second fault management communication unit 8-8. The second client 8-23 consists of the third fault management unit 8-12 and the third fault management communication unit 8-11.

【０１６７】The fault management unit 8-20, the second fault management unit 8-9, and the third fault management unit 8-12 each consist of an operation status monitoring unit 1-1, an operation abnormality search unit 1-2, a fault information creation unit 1-3, and a fault processing management unit 1-4. The database server 12-25 consists of the operation status database 1-5, the fault information database 1-6, a database management unit 12-14, and a fourth fault management communication unit. The fault management communication unit 8-7, the second fault management communication unit 8-8, the third fault management communication unit 8-11, and the fourth fault management communication unit are connected to one another by the LAN cable 8-24.

【０１６８】In this fault occurrence avoiding device, each of the clients 8-22 to 8-23 periodically observes CPU load, memory load, network load, running processes, and so on in accordance with its configuration file, and outputs the resulting observation data to the fault management server 8-21.
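
A client-side observation pass driven by a configuration file, as in paragraph 0168, could look roughly like this. Everything concrete here is an assumption: the configuration is modeled as a dictionary, the probes are plain Python callables returning fixed values instead of real measurements, and the `interval_seconds` key is hypothetical.

```python
import time


def observe_once(config: dict, clock=time.time) -> dict:
    """Collect one sample containing every item listed in the configuration."""
    sample = {"time": clock()}
    for item, probe in config["items"].items():
        sample[item] = probe()
    return sample


# Hypothetical configuration: which items to observe and how often.
config = {
    "interval_seconds": 60,
    "items": {
        "cpu_load": lambda: 0.25,     # placeholder probe, not a real measurement
        "memory_load": lambda: 0.5,   # placeholder probe
    },
}

# A fixed clock makes the result reproducible for this example.
sample = observe_once(config, clock=lambda: 0.0)
```

A real client would call `observe_once` every `interval_seconds` and send each sample to the fault management server; changing the observed items then only requires editing the configuration.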

【０１６９】The fault management server 8-21 transfers the received observation data to the operation status database 1-5 of the database server 12-25 in accordance with the DB server designation file 12-13 that it holds. The database server 12-25, on receiving the observation data, records it in its operation status database 1-5.

【０１７０】The fault processing management unit 1-4 of the fault management server 8-21 compares the observation data recorded in the operation status database 1-5 of the database server 12-25 with the observation data given in the fault patterns recorded in the fault information database 1-6 of the same database server, and judges whether the two sets of observation data show a similar tendency.

【０１７１】If the fault processing management unit 1-4 of the fault management server 8-21 finds, in the observation data recorded in the operation status database 1-5, a tendency associated with the occurrence of a fault, it instructs the client concerned to execute the avoidance method given for that possible fault in the fault pattern recorded in the fault information database 1-6.
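
The comparison and instruction steps of paragraphs 0170 and 0171 can be sketched as below. The patent does not define how a "similar tendency" is measured, so the threshold test used here is purely an assumption, as are the pattern fields, host names, and action strings.

```python
def matches(pattern: dict, recent: list) -> bool:
    """Assumed similarity test: every recent sample meets every threshold in the pattern."""
    return bool(recent) and all(
        sample[item] >= limit
        for sample in recent
        for item, limit in pattern["condition"].items()
    )


def commands_for(patterns: list, recent_by_host: dict) -> list:
    """Return (host, avoidance action) pairs the management server would send."""
    commands = []
    for pattern in patterns:
        for host in pattern["hosts"]:
            if matches(pattern, recent_by_host.get(host, [])):
                # A fault content may have several processing contents.
                commands.extend((host, action) for action in pattern["actions"])
    return commands


# Hypothetical fault pattern and recent observation data.
patterns = [{
    "hosts": ["client-8-22"],
    "condition": {"memory_load": 0.9},
    "fault": "memory exhaustion",
    "actions": ["restart leaking process"],
}]
recent_by_host = {
    "client-8-22": [{"memory_load": 0.93}, {"memory_load": 0.97}],
    "client-8-23": [{"memory_load": 0.40}],
}
commands = commands_for(patterns, recent_by_host)
```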

【０１７２】The client instructed by the fault management server 8-21 to execute the avoidance method for the possible fault executes that method. In addition, when the fault processing management unit 1-4 of the fault management server 8-21 judges that a fault may occur, the operation abnormality search unit 1-2 reads the observation data for a predetermined period from the observation data recorded in the operation status database 1-5.

【０１７３】The fault information creation unit 1-3 then extracts, in accordance with the search specification file, the observation data that meets predetermined conditions for identifying the cause of the fault from the observation data read by the operation abnormality search unit 1-2. The fault information creation unit 1-3 further creates a new fault pattern by associating the extracted observation data, the fault, and the avoidance method for that fault.

【０１７４】The newly created fault pattern is compared with the existing fault patterns recorded in the fault information database 1-6. A characteristic fault pattern is created from the characteristic portion common to the two patterns and is recorded in the fault information database 1-6.
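
One way to read paragraph 0174 is that the characteristic pattern keeps only what the new and existing patterns have in common. The sketch below takes that reading. The rule it uses (keep only the observation items present in both conditions, with the lower of the two thresholds) is an assumption chosen for illustration; the patent does not state how the common portion is computed.

```python
def characteristic_pattern(existing: dict, new: dict) -> dict:
    """Build a pattern from the condition items shared by both patterns (assumed rule)."""
    shared = existing["condition"].keys() & new["condition"].keys()
    return {
        "fault": existing["fault"],
        "condition": {
            # Assumption: keep the less strict threshold so both cases still match.
            item: min(existing["condition"][item], new["condition"][item])
            for item in sorted(shared)
        },
        "actions": list(existing["actions"]),
    }


# Hypothetical patterns for the same fault.
existing = {
    "fault": "memory exhaustion",
    "condition": {"memory_load": 0.9, "cpu_load": 0.8},
    "actions": ["restart leaking process"],
}
new = {
    "fault": "memory exhaustion",
    "condition": {"memory_load": 0.85, "network_load": 0.7},
    "actions": ["restart leaking process"],
}
merged = characteristic_pattern(existing, new)
```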

【０１７５】As described above, the operation status monitoring unit 1-1 of each of the clients 8-22 to 8-23 observes CPU load, memory load, network load, running processes, and so on in accordance with its configuration file. Consequently, even when a fault is about to occur in software running on equipment monitored by the fault occurrence avoiding device, the fault can be avoided before it occurs.

【０１７６】When the fault processing management unit 1-4 of the fault management server 8-21 detects a possible fault, the avoidance method associated with that fault pattern in the fault information database 1-6 is executed automatically, so the device can be applied to systems that must run continuously, such as industrial computers.

【０１７７】Furthermore, when the fault processing management unit 1-4 of the fault management server 8-21 detects a possible fault, the measures for avoiding it are already recorded in the fault information database 1-6 and are executed by the fault occurrence avoiding device itself. The administrator of the device can therefore concentrate on working out how to deal with newly arising faults, which lightens the administrator's workload.

【０１７８】The observation items monitored by the fault occurrence avoiding device can be changed through the configuration file settings. Items thought suitable for detecting a fault in advance can therefore be changed as needed, allowing a fine-grained response to fault occurrence.

【０１７９】In addition, custom commands can be defined and registered in the configuration file as observation items. Items thought suitable for detecting a fault in advance can therefore be added as needed, allowing a fine-grained response to fault occurrence.

【０１８０】When the fault occurrence avoiding device judges the risk of a fault occurring, it refers to the search specification file, in which the amount of change in the observation data is set. The device can therefore follow changes in the equipment being observed and take measures suited to those changes.
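
Paragraph 0180 says the search specification file holds an amount of change for the observation data. A minimal sketch of a check based on that idea follows. The file is modeled as a dictionary, and the interpretation (difference between the first and last sample of a window) is an assumption, since the patent does not say how the change is measured.

```python
def items_exceeding_change(samples: list, search_spec: dict) -> list:
    """Return the items whose rise from first to last sample reaches the configured amount."""
    if len(samples) < 2:
        return []
    first, last = samples[0], samples[-1]
    return [
        item
        for item, amount in search_spec.items()
        if last[item] - first[item] >= amount
    ]


# Hypothetical search specification: minimum change that counts as a sign of trouble.
search_spec = {"memory_load": 0.2, "cpu_load": 0.25}

samples = [
    {"memory_load": 0.5, "cpu_load": 0.3},
    {"memory_load": 0.6, "cpu_load": 0.32},
    {"memory_load": 0.75, "cpu_load": 0.35},
]
flagged = items_exceeding_change(samples, search_spec)
```

Because the amounts live in the specification rather than in code, they can be adjusted when the observed equipment changes, which is the benefit the paragraph describes.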

【０１８１】The fault occurrence avoiding device of Embodiment 3 consists of the fault management server 8-21, the clients 8-22 to 8-23, and the database server 12-25. A single fault management server 8-21 monitors the operation status of the clients 8-22 to 8-23 and can thus avoid faults on multiple clients before they occur.

【０１８２】The fault occurrence avoiding device of Embodiment 3 consists of the fault management server 8-21, the clients 8-22 to 8-23, and the database server 12-25. The operation status database 1-5 and the fault information database 1-6 for the clients 8-22 to 8-23 can be consolidated on a single database server 12-25, which helps reduce the size of the device.

【０１８３】[0183]

[Effects of the Invention] As described above, the fault occurrence avoiding device according to the present invention comprises: operation status observation means for observing the operation status of a plurality of elements; an operation status database connected to the operation status observation means, in which the observation data obtained by the operation status observation means is recorded; a fault information database in which a fault, the tendency of the operation status of the plurality of elements that causes the fault, and an avoidance method for avoiding the fault are recorded in association with one another; fault judgment means connected to the operation status database and the fault information database, which compares the observation data recorded in the operation status database with the tendency of the operation status recorded in the fault information database and judges the fault related to the operation status; and fault avoidance means connected to the fault judgment means and the fault information database, which performs operation processing based on the avoidance method recorded in the fault information database in association with the fault judged by the fault judgment means. The device compares the observation data obtained by observing the plurality of elements with the operation status observation means against the tendency of the operation status of at least one of the elements that cause a fault, as recorded in the fault information database, and judges whether a fault related to the operation status is about to occur. When it is judged that a fault may occur, the avoidance method set in the fault information database for that fault is executed. The fault can therefore be avoided before it occurs, and the device whose operation status is observed can run continuously.

【０１８４】Also, the fault occurrence avoiding device according to the present invention comprises: a first computer having operation status observation means for observing the operation status of a plurality of elements and outputting the resulting observation data, and fault avoidance means for performing fault avoidance processing in accordance with an input operation processing command; and a second computer having an operation status database in which the observation data output from the operation status observation means of the first computer is recorded, a fault information database in which a fault, the tendency of the operation status of the plurality of elements that causes the fault, and an avoidance method for avoiding the fault are recorded in association with one another, and fault judgment means connected to the operation status database and the fault information database, which compares the observation data recorded in the operation status database with the tendency of the operation status recorded in the fault information database, judges the fault related to the operation status, and outputs to the fault avoidance means of the first computer an operation processing command ordering operation processing based on the avoidance method recorded in the fault information database in association with the judged fault. The observation data obtained by the operation status observation means of the first computer and recorded in the operation status database of the second computer is compared with the tendency of the operation status that causes a fault, as recorded in the fault information database of the second computer, to judge whether a fault related to the operation status is about to occur. When the second computer judges that a fault may occur, the first computer executes the avoidance method set for that fault in the fault information database of the second computer. The fault on the first computer can therefore be avoided before it occurs, and the first computer, whose operation status is observed, can run continuously.

【０１８５】Furthermore, in the fault occurrence avoiding device according to the present invention, the plurality of elements observed by the operation status observation means are the elements set in a configuration file, and the elements set in this configuration file can be changed. The cause of a fault that may occur in the device whose operation status is observed can therefore be captured accurately, and detailed countermeasures can be taken against it.

【０１８６】The fault occurrence avoiding device according to the present invention also comprises: operation abnormality search means connected to the operation status database, which extracts observation data showing a predetermined tendency of the operation status from the observation data recorded in the operation status database; and fault information creation means connected to the operation abnormality search means and the fault information database, which records in the fault information database, in association with one another, the observation data extracted by the operation abnormality search means, the fault that shows the predetermined tendency of the operation status, and an avoidance method for avoiding that fault. When the operation status observation means observes a tendency of the operation status similar to that of a fault recorded in the fault information database, the new tendency is identified from the observation data recorded in the operation status database and is newly recorded in the fault information database in association with the fault showing the similar tendency and the avoidance method for that fault. Thus, whenever a tendency similar to that of a previously recorded fault appears, the newly observed tendency, the fault, and its avoidance method are associated and recorded in the fault information database. The number of avoidable faults grows steadily, and reliability increases as operation continues.

【０１８７】Furthermore, in the fault occurrence avoiding device according to the present invention, the observation data extracted by the operation abnormality search means are the elements set in a search specification file, and the elements set in this search specification file can be changed. The cause of a fault that may occur in the device whose operation status is observed can therefore be captured accurately, and detailed countermeasures can be taken against it.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing the fault occurrence avoiding device of Embodiment 1.

FIG. 2 is a flowchart showing the operation of the fault occurrence avoiding device of Embodiment 1.

FIG. 3 is a conceptual diagram of a configuration file that sets the recording contents and recording period for the operation status of the fault occurrence avoiding device of Embodiment 1.

FIG. 4 is a conceptual diagram of another configuration file that sets the recording contents and recording period for the operation status of the fault occurrence avoiding device of Embodiment 1.

FIG. 5 is a conceptual diagram of a search specification file that sets what is searched for in the information recorded in the operation status database of the fault occurrence avoiding device of Embodiment 1.

FIG. 6 is a conceptual diagram of the operation status database of the fault occurrence avoiding device of Embodiment 1.

FIG. 7 is a conceptual diagram of the fault information database of the fault occurrence avoiding device of Embodiment 1.

FIG. 8 is a configuration diagram showing the fault occurrence avoiding device of Embodiment 2.

FIG. 9 is a conceptual diagram of the management server designation file held by the fault management server of the fault occurrence avoiding device of Embodiment 2 and of the fault target designation file held by a client of that device.

FIG. 10 is a flowchart showing the operation of the fault occurrence avoiding device of Embodiment 2 when an existing fault recorded in the fault information database is developing.

FIG. 11 is a flowchart showing the operation of the fault occurrence avoiding device of Embodiment 2 when a new fault not recorded in the fault information database is developing.

FIG. 12 is a configuration diagram showing the fault occurrence avoiding device of Embodiment 3.

FIG. 13 is a conceptual diagram of the operation status database of the fault occurrence avoiding device of Embodiment 3.

FIG. 14 is a conceptual diagram of the fault information database of the fault occurrence avoiding device of Embodiment 3.

FIG. 15 shows the conventional example disclosed in Japanese Patent Application Laid-Open No. 4-161823.

FIG. 16 shows the conventional example disclosed in Japanese Patent Application Laid-Open No. 6-17-3886.

FIG. 17 shows the conventional example disclosed in Japanese Patent Application Laid-Open No. 4-310160.

[Explanation of symbols]

1-1 Operation status monitoring unit
1-2 Operation abnormality search unit
1-3 Fault information creation unit
1-4 Fault processing management unit
1-5 Operation status database
1-6 Fault information database
1-7 Fault occurrence avoiding device
3-1 Configuration file format (part 1)
3-2 Configuration file format (part 2)
3-3 Configuration file format (part 3)
4-1 Alternative configuration file format (part 1)
4-2 Alternative configuration file format (part 2)
4-3 Alternative configuration file format (part 3)
6-1 Observation item list
6-2 Running process table
6-3 Disk capacity table
7-1 Fault pattern list
7-2 Fault table
7-3 Processing table
8-7 Fault management communication unit
8-8 Second fault management communication unit
8-9 Second fault management unit
8-10 Second operation status database
8-11 Third fault management communication unit
8-12 Third fault management unit
8-13 Third operation status database
8-20 Fault management unit
8-21 Fault management server
8-22 First client
8-23 Second client
8-24 LAN cable
9-1 Server designation file
9-2 Fault target designation file
12-13 DB server designation file
12-14 Database management unit
12-15 Fourth fault management communication unit
12-25 Database server
13-1 Information items
13-2 Running process table
13-3 Disk capacity table
14-1 Information items
15-1 Equipment
15-2 Monitoring unit
15-3 Diagnosis unit
15-4 Database
16-1 to 16-5 LAN adapters for normal operation
16-6 to 16-10 Backup LAN adapters
16-11 to 16-14 Active CPUs
16-15 Backup CPU
16-16 LAN for normal operation
16-17 Backup LAN
17-1 Input unit
17-2 Output unit
17-3 Inference unit
17-4 Network configuration database
17-5 Fault database
17-6 Editor

Claims

[Claims]

1. A fault occurrence avoiding device comprising: operation status observation means for observing the operation status of a plurality of elements; an operation status database connected to the operation status observation means, in which observation data obtained by the operation status observation means is recorded; a fault information database in which a fault, the tendency of the operation status of the plurality of elements that causes the fault, and an avoidance method for avoiding the fault are recorded in association with one another; fault judgment means connected to the operation status database and the fault information database, which compares the observation data recorded in the operation status database with the tendency of the operation status recorded in the fault information database and judges the fault related to the operation status; and fault avoidance means connected to the fault judgment means and the fault information database, which performs operation processing based on the avoidance method recorded in the fault information database in association with the fault judged by the fault judgment means.

2. A failure occurrence avoidance device comprising: a first computer having operation status observing means for observing the operation status of a plurality of elements and outputting observation data obtained by the observation, and failure avoidance means for performing failure avoidance processing based on an input operation processing command; and a second computer having an operation status database in which the observation data output from the operation status observing means of the first computer is input and recorded, a failure information database that records, in association with one another, a failure, the tendency of the operation status of the plurality of elements that causes the failure, and an avoidance method for avoiding the occurrence of the failure, and failure determination means connected to the operation status database and the failure information database, which compares the observation data recorded in the operation status database with the tendency of the operation status recorded in the failure information database, determines a failure related to the operation status, and outputs, to the failure avoidance means of the first computer, the operation processing command for performing operation processing based on the avoidance method recorded in the failure information database in association with the determined failure.

3. The failure occurrence avoidance device according to claim 1, wherein the plurality of elements observed by the operation status observing means are a plurality of elements set in a configuration file.

4. The failure occurrence avoidance device according to any one of claims 1 to 3, further comprising: operation abnormality search means connected to the operation status database, which extracts, from the observation data recorded in the operation status database, the observation data indicating a predetermined tendency of the operation status; and failure information creation means connected to the failure information database, which records in the failure information database, in association with one another, the observation data extracted by the operation abnormality search means, the failure indicated by the predetermined tendency of the operation status, and an avoidance method for avoiding the failure.

5. The failure occurrence avoidance device according to claim 4, wherein the observation data extracted by the operation abnormality search means is observation data for a plurality of elements set in a search specification file.
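
Illustrative note (not part of the claims): the flow recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `FailureRecord`, `determine_failures`, `avoid_failures`, and `disk_filling_up`, the dictionary layout of the operation status database, the sample values, and the 90 % threshold are all hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names and data layouts here are assumptions made for illustration.


@dataclass
class FailureRecord:
    """One entry of the failure information database (assumed layout)."""
    failure: str                              # name of the failure
    element: str                              # observed element the tendency refers to
    tendency: Callable[[List[float]], bool]   # True when observations show the tendency
    avoidance: Callable[[], str]              # avoidance method; returns the action taken


def determine_failures(operation_db: Dict[str, List[float]],
                       failure_db: List[FailureRecord]) -> List[FailureRecord]:
    """Failure determination: compare recorded observation data with each
    recorded tendency and return the failures whose tendency is present."""
    return [rec for rec in failure_db
            if rec.element in operation_db
            and rec.tendency(operation_db[rec.element])]


def avoid_failures(determined: List[FailureRecord]) -> List[str]:
    """Failure avoidance: run the avoidance method associated with each
    determined failure."""
    return [rec.avoidance() for rec in determined]


def disk_filling_up(samples: List[float]) -> bool:
    """Assumed tendency: usage rises over the last three samples and the
    latest sample exceeds 90 percent."""
    last = samples[-3:]
    return len(last) == 3 and last[0] < last[1] < last[2] and last[2] > 90.0


# Operation status database: observation data per observed element (assumed values).
operation_db = {
    "disk_usage_percent": [70.0, 85.0, 91.0, 95.0],
    "process_count": [40, 41, 40],
}

# Failure information database: failure, tendency, and avoidance method in association.
failure_db = [
    FailureRecord("disk full", "disk_usage_percent", disk_filling_up,
                  lambda: "delete temporary files"),
    FailureRecord("process table overflow", "process_count",
                  lambda s: s[-1] > 500, lambda: "stop idle processes"),
]

actions = avoid_failures(determine_failures(operation_db, failure_db))
print(actions)
```

In this sketch only the disk-usage tendency is present in the sample data, so only its associated avoidance method is executed. Storing the tendency and the avoidance method in the same record mirrors the claim's requirement that they be recorded in association with the failure.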