JP2008191839A - Abnormality sign detection system

Info

Publication number: JP2008191839A
Application number: JP2007024207A
Authority: JP
Inventors: Akira Sasaki (佐々木 朗); Yoshiyuki Hirakawa (平川 喜之); Hiroyuki Koga (小賀 弘之)
Original assignee: Hitachi Electronics Services Co Ltd
Current assignee: Hitachi Electronics Services Co Ltd
Priority date: 2007-02-02
Filing date: 2007-02-02
Publication date: 2008-08-21
Anticipated expiration: 2027-02-02
Also published as: JP4892367B2

Abstract

PROBLEM TO BE SOLVED: To provide an abnormality sign detection system for computers that reliably detects abnormal behavior that threshold-based failure monitoring cannot fully capture, that is, signs of an abnormality that has not yet become a failure; that avoids falsely detecting monthly or term-end processing by grasping the cycle of the change pattern of operation data; and that reduces false detections by automatically learning from prediction mistakes. SOLUTION: The abnormality sign detection system includes a monitoring system 10 that monitors a connected customer computer 3 and collects operation data. The system normalizes the past operation data of the customer computer 3, obtains a change pattern of the operation data and the cycle of that change pattern, treats the obtained change pattern and cycle as the usual state of the customer computer 3, and compares that change pattern with the current change pattern to detect, as an abnormality sign, an abnormal state that has not yet become a failure. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a system for detecting signs of abnormality in a computer system, and more particularly to a computer abnormality sign detection system that can reliably detect signs of an abnormality that has not yet led to a failure in the computer.

In operations that use computers, abnormalities in the computer are detected by the following methods, so detection takes place only after the abnormality has occurred.
(1) When a computer job issues an error message
(2) When a computer job terminates abnormally
(3) When an abnormality is detected because a threshold for a resource or a computer job (for example, utilization rate or job delay) exceeds a predetermined value

In contrast, Patent Document 1 discloses a technique for detecting signs of a computer system abnormality. In this technique, failure cases that occurred in the past are graphed and accumulated in a database (DB) as failure case graph data, and the DB is accessed using failure occurrence status graph data obtained by graphing, in the same way, a failure situation that is currently developing. Countermeasures that were effective against similar past failure cases can then be examined and applied to the current failure situation.
JP 2005-222377 A

However, the technique of Patent Document 1 has the following problems.
(1) Because only data from past failure occurrences is graphed, abnormal operation of a computer system that has not yet led to a failure cannot be detected.
(2) Because graph data from different customer systems is used for comparison, normal operation may be falsely detected as a failure sign.
(3) Because the processing cycle of the system is not taken into account, changes in the graph data caused by processing executed at relatively long intervals, such as monthly or term-end processing, may be falsely detected as a failure sign.

An object of the present invention is to provide an abnormality sign detection system that can reliably detect abnormal computer behavior that threshold-based failure monitoring cannot capture, that is, signs of an abnormality that has not yet led to a computer failure; that no longer falsely detects monthly processing, term-end processing, and the like, because it grasps the cycle of the change pattern of the operation data; and that can reduce false detections by automatically learning from prediction mistakes.

The present invention is an abnormality sign detection system including a monitoring system that monitors a connected customer computer and collects operation data, wherein the monitoring system normalizes past operation data of the customer computer, obtains a change pattern of the operation data and a cycle of the change pattern, treats the obtained change pattern and cycle as the usual state of the customer computer, and compares the change pattern with the current change pattern to detect, as an abnormality sign, an abnormal state that has not yet led to a failure.

The present invention is also an abnormality sign detection system that cuts out a part of the operation data while extending the extraction size, automatically detects the repeating change pattern, and thereby automatically detects the cycle.

The present invention is also an abnormality sign detection system that, when a plurality of cycles exist, obtains the change pattern for each cycle and holds it in a change pattern database.

Furthermore, the present invention is an abnormality sign detection system that, when creating the change pattern, uses the average of the operation data over the preceding 3 to 10 cycles.

The present invention is also an abnormality sign detection system that searches the past operation data for a pattern similar to the current change pattern and presents the failure information recorded immediately after it, thereby predicting a failure that may occur soon and reporting it as a failure prediction result.

The present invention is also an abnormality sign detection system that, when no failure actually occurs after the failure prediction, registers the change pattern in an exception pattern database as a falsely detected change pattern and thereafter uses the falsely detected change pattern to prevent false detection of the abnormal state.

According to the present invention, abnormal behavior that threshold-based failure monitoring cannot capture, that is, signs of an abnormality that has not yet led to a failure, can be reliably detected; monthly processing, term-end processing, and the like are no longer falsely detected; and false detections can be reduced.

The best mode for carrying out the present invention will now be described.
An embodiment of the abnormality sign detection system of the present invention will be described with reference to the drawings. FIG. 1 shows an embodiment of the abnormality sign detection system of the present invention.

FIG. 1 shows a customer system (computer system) 1, which is the target of abnormality sign detection, and a monitoring system 10 on the side of a maintenance service company 2. The monitoring system 10 monitors the customer system and detects abnormality signs. The customer system 1 has, for example, a plurality of customer computers 3. The customer computers 3 supply information such as operation data to an information collection unit 11 of the monitoring system 10. The customer computers 3 may be connected to the information collection unit 11 directly or through a communication network such as the Internet.

The monitoring system 10 installed at the maintenance service company 2 has the information collection unit 11, a DB (database) 12, and a management unit 13. Information such as the operation data of the customer computers 3 obtained by the information collection unit 11 is sent to the DB 12. The DB 12 stores failure data 20, operation data 21, cycle data 22, and exception patterns 23. The management unit 13 has a failure monitoring unit 30, an abnormality sign monitoring unit 31, and a past failure matching unit 32.

The monitoring system 10 detects abnormal behavior that ordinary threshold-based failure monitoring cannot capture, that is, signs of an abnormality that has not yet led to a failure, and notifies an administrator 5, for example by e-mail 39, in the form of an abnormality sign notification 40 and a predicted event notification 41 for the customer computer 3. The administrator 5 can thus work out a proactive countermeasure 42 before an abnormality occurs in the customer computer 3. In this way, the administrator 5 can catch the signs of an abnormality and respond in advance, before the operations of the customer computer 3 stop or a processing abnormality occurs. FIG. 4 shows an example of how the operation data 21DT in the operation data DB 21 changes, plotted by month and by week.

Next, an example of the procedure for detecting signs of abnormality in the customer computer 3 in this embodiment will be described. FIG. 2 is a flowchart of this procedure, which consists of steps S1 to S13. In step S1, the management unit 13 waits until the information collection unit 11 of the monitoring system 10 receives the operation data 21DT of the customer computer 3. In step S2, the failure monitoring unit 30 obtains, from the operation data 21DT of the customer computer 3, the "type" and "cycle" of the patterns that appear repeatedly. These are illustrated in FIG. 5 as Specific Example 1.

FIG. 5(A) shows an example in which the failure monitoring unit 30 normalizes the operation data 21DT. Normalization rounds off subtle fluctuations in the numbers and converts the data into a form that a computer can compare easily. The granularity (rounding) of the normalization is made fine for patterns with a short cycle and coarse for patterns with a long cycle, which adjusts the matching accuracy. FIG. 5(B) shows an example of the automatic detection of change patterns performed by the failure monitoring unit 30: a part of the operation data is cut out while the extraction size is extended, and repeating patterns are detected automatically. Extraction sizes (1) to (3) are each only a part of extraction size (4), so they are not regarded as repeating patterns. Extraction size (5) is two repetitions of extraction size (4), so extraction size (4) is adopted as the repeating pattern, and detection continues over a wider range.
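The normalization described for FIG. 5(A) can be sketched as rounding each sample to the nearest multiple of a step. This is only an illustration under assumptions: the function name `normalize`, the `step` parameter, and the sample values are hypothetical and do not come from the patent, which does not specify the rounding rule.

```python
def normalize(values, step):
    """Round each sample to the nearest multiple of `step`.

    A small step keeps fine differences (short-cycle patterns);
    a large step absorbs them (long-cycle patterns).
    The rounding rule itself is an assumption, not taken from the patent.
    """
    half = step / 2
    return [int((v + half) // step) * step for v in values]


# Hypothetical raw samples, e.g. CPU utilization in percent.
raw = [11, 48, 57, 9, 12, 51, 44, 10]

fine = normalize(raw, 5)     # keeps the difference between 57 and 44
coarse = normalize(raw, 25)  # rounds both 57 and 44 to 50

print(fine)
print(coarse)
```

With the coarse step the two halves of the series become identical, so an exact comparison would treat them as the same pattern; with the fine step they would not.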

FIG. 5(C) shows an example in which the failure monitoring unit 30 obtains the patterns that repeat at regular intervals and their cycles. Using only relatively recent information here, for example the preceding 3 to 10 cycles, lets the patterns follow gradual changes in the operating conditions. FIG. 5(D) shows an example of a 7-day cycle pattern and an example of a 30-day cycle pattern; these repeating patterns are registered as cycle data 22.
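Building a change pattern from only the most recent cycles might look like the following minimal sketch. It assumes that the series ends exactly on a cycle boundary and that the pattern is a position-by-position mean; the name `average_pattern` and its parameters are hypothetical.

```python
def average_pattern(series, cycle_len, n_cycles=5):
    """Average the last `n_cycles` cycles position by position.

    Assumes `series` ends on a cycle boundary (an assumption of this sketch).
    The 3 to 10 range follows the text; everything older is ignored.
    """
    if not 3 <= n_cycles <= 10:
        raise ValueError("n_cycles should be between 3 and 10")
    needed = cycle_len * n_cycles
    if len(series) < needed:
        raise ValueError("not enough history for the requested cycles")
    recent = series[-needed:]
    cycles = [recent[i * cycle_len:(i + 1) * cycle_len] for i in range(n_cycles)]
    return [sum(column) / n_cycles for column in zip(*cycles)]


# Hypothetical data: one old cycle followed by three recent, slowly rising cycles.
series = [100, 200, 10, 20, 20, 30, 30, 40]
print(average_pattern(series, cycle_len=2, n_cycles=3))
```

Because the old cycle `[100, 200]` falls outside the window, it does not influence the result, which is how the averaged pattern can follow gradual drift.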

Returning to FIG. 2, in step S3 the current operation data is converted into a pattern and checked against every repeating pattern, as in Specific Example 2 shown in FIG. 6. That is, the current operating status shown in FIG. 6 is normalized to create normalized data, and the normalized data is compared with the cycle data, for example the 7-day cycle pattern 47. If the comparison finds a match, the abnormality sign monitoring unit 31 judges in step S4 that the state is the same as usual and returns to step S1. If nothing matches, that is, if the normalized data contains an abnormal operation pattern, the abnormality sign monitoring unit 31 of FIG. 1 judges the state to be abnormal and proceeds to step S5.

In step S5, the pattern of the normalized current operation data is checked against the exception patterns 23 of FIG. 1, in the manner of Specific Example 3 shown in FIG. 7. If any exception pattern 23 matches, step S6 treats the pattern as a normal pattern that was falsely detected in the past, judges that the current operation data is not abnormal, and returns to step S1. If no exception pattern 23 matches in step S5, the current operation data is regarded as an abnormal operation pattern and the procedure proceeds to step S7.
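The decision made across steps S3 to S6 can be sketched as a two-stage check: first against the registered cycle patterns, then against the exception patterns. The sketch below assumes exact matching of already normalized data and lets the current window start at any phase of a cycle; the names `Verdict`, `occurs_in_cycle`, and `classify` are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    USUAL = "usual"                          # step S4: same as usual
    KNOWN_FALSE_ALARM = "known_false_alarm"  # step S6: matches an exception pattern
    ABNORMAL = "abnormal"                    # continue to step S7


def occurs_in_cycle(window, pattern):
    """True if `window` appears somewhere in the endlessly repeated `pattern`."""
    if not window or not pattern:
        return False
    repeats = len(window) // len(pattern) + 2
    tiled = pattern * repeats
    return any(tiled[s:s + len(window)] == window for s in range(len(pattern)))


def classify(window, cycle_patterns, exception_patterns):
    """Classify a normalized window of current operation data."""
    if any(occurs_in_cycle(window, p) for p in cycle_patterns):
        return Verdict.USUAL
    if window in exception_patterns:
        return Verdict.KNOWN_FALSE_ALARM
    return Verdict.ABNORMAL


# Hypothetical 7-day pattern: quiet weekend days, busy weekdays.
weekly = [10, 50, 50, 50, 50, 50, 10]
exceptions = [[50, 90, 50]]

print(classify([50, 10, 10, 50], [weekly], exceptions))
print(classify([50, 90, 50], [weekly], exceptions))
print(classify([90, 90, 90], [weekly], exceptions))
```

A real system would probably need a tolerance rather than exact equality; the patent leaves the matching criterion open.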

In step S7, as shown in Specific Example 4 of FIG. 8, the past failure matching unit 32 of FIG. 1 searches the past operation data 21 and the failure data in the failure data DB 20 for a pattern similar to the current pattern. If step S8 finds a similar past pattern together with a record that a failure occurred, the procedure moves to step S10, where the details of the failure that may soon occur in the customer computer 3, the way it was handled in the past, and the time at which the failure is expected to occur are sent to the administrator 5 as a failure prediction mail 39, and then proceeds to step S11. If there is no similar pattern in the past at all, or if there is a similar pattern but no failure followed it, step S9 notifies the administrator 5 by a warning mail 39 that the customer computer 3 is operating abnormally, and the procedure proceeds to step S11.
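One way to picture the search in steps S7 to S10 is a sliding-window comparison over the past operation data, followed by a lookup of failures recorded shortly after each similar window. This is a sketch under assumptions: the sum-of-absolute-differences distance, the `tolerance` and `horizon` parameters, the `FailureRecord` type, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureRecord:
    index: int           # sample position in the history at which the failure occurred
    description: str
    countermeasure: str  # how the failure was handled in the past


def find_similar_ends(history, window, tolerance=0):
    """Return the end positions of every past window within `tolerance` of `window`."""
    w = len(window)
    ends = []
    for start in range(len(history) - w + 1):
        distance = sum(abs(a - b) for a, b in zip(history[start:start + w], window))
        if distance <= tolerance:
            ends.append(start + w)
    return ends


def predict_failure(history, failures, window, tolerance=0, horizon=3):
    """Return (record, samples_until_failure) for the first similar window
    that was followed by a failure within `horizon` samples, else None."""
    for end in find_similar_ends(history, window, tolerance):
        for record in failures:
            if end <= record.index < end + horizon:
                return record, record.index - end + 1
    return None


# Hypothetical normalized history with one recorded failure at position 7.
history = [10, 50, 10, 50, 90, 90, 10, 50, 10, 50]
failures = [FailureRecord(7, "disk full", "purge old logs")]

print(predict_failure(history, failures, [50, 90, 90]))
print(predict_failure(history, failures, [10, 50, 10]))
```

A hit corresponds to the failure prediction mail of step S10 (failure details, past countermeasure, expected time); `None` corresponds to the plain warning of step S9.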

In step S11, if a failure occurred by the expected failure time, or if the pattern returned to normal as a result of taking countermeasures, step S12 judges that the failure prediction succeeded and the procedure returns to step S1. If no failure occurred even after the expected failure time had passed, although no countermeasure was taken, and the pattern then returned to normal, step S13 automatically registers the mispredicted pattern as an exception pattern 23, as shown in Specific Example 5 of FIG. 9. This exception pattern 23 is used thereafter to prevent false detections.
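The outcome handling of steps S11 to S13 can be sketched as a small function that either confirms the prediction or records the pattern as an exception. The function name, the boolean flags, and the string results are hypothetical, and the `"undetermined"` branch covers cases the text does not describe.

```python
def settle_prediction(pattern, failure_occurred, countermeasure_applied,
                      back_to_normal, exception_patterns):
    """Decide what to do once the expected failure time has passed."""
    # Step S12: the failure happened, or a countermeasure restored the normal pattern.
    if failure_occurred or (countermeasure_applied and back_to_normal):
        return "prediction_succeeded"
    # Step S13: nothing was done, no failure came, and the pattern normalized by itself.
    if not countermeasure_applied and back_to_normal:
        if pattern not in exception_patterns:
            exception_patterns.append(list(pattern))
        return "registered_as_exception"
    # Not covered by the text; treated as undecided in this sketch.
    return "undetermined"


exception_patterns = []
print(settle_prediction([50, 90, 50], False, False, True, exception_patterns))
print(exception_patterns)
```

Once registered, the pattern would be consulted in step S5 so that the same behavior is no longer reported as abnormal.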

Next, the procedure for automatically detecting signs of abnormality in the customer computer 3 will be described with reference to the flow of FIG. 3, which consists of steps S20 to S29. In step S20, the "extraction size" is set to zero. In step S21, the "extraction size" is incremented by 1, and in step S22, "extraction size" samples of the operation data 21 are taken, going back from the position of the latest data, and normalized. This normalized data is called the "tentative change pattern".

In step S23, the operation data 21 is normalized going back into the past and compared with the "tentative change pattern" to obtain the repetition cycle of the pattern. If repetition is found in step S24, the "tentative change pattern" is extended and the procedure returns to step S21 to try again. If no repetition is found, step S26 registers the last "tentative change pattern" for which repetition was found as a "change pattern" in the cycle data 22. If, in step S27, the "extraction size" does not exceed the size of the operation data 21, step S28 returns to step S21 in order to detect a pattern with a different cycle. If the "extraction size" exceeds the size of the operation data 21, step S29 ends the detection of cycle patterns.
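The idea of growing the extraction size from the latest data backward can be illustrated with the simplified sketch below. It does not reproduce the branch structure of steps S24 to S28: it accepts the first extraction size whose latest block repeats exactly through the whole normalized history, and it finds only one cycle. The names `detect_cycle`, `step`, and `min_repeats`, and the rounding rule, are assumptions.

```python
def normalize(values, step=10):
    """Round each sample to the nearest multiple of `step` (assumed rounding rule)."""
    half = step / 2
    return [int((v + half) // step) * step for v in values]


def detect_cycle(operation_data, step=10, min_repeats=2):
    """Return (cycle_length, change_pattern) or None if no repetition is found."""
    data = normalize(operation_data, step)
    n = len(data)
    size = 0                                    # like step S20
    while (size + 1) * min_repeats <= n:
        size += 1                               # like step S21
        tentative = data[n - size:]             # like step S22: latest `size` samples
        # Like step S23: the block repeats if every sample equals the one `size` later.
        if all(data[i] == data[i + size] for i in range(n - size)):
            return size, tentative
    return None


# Hypothetical noisy data whose normalized form repeats every 4 samples.
raw = [11, 48, 52, 9, 12, 51, 49, 10, 9, 52, 48, 11]
print(detect_cycle(raw))
```

Sizes 1 to 3 fail the repetition check here, which mirrors the FIG. 5(B) remark that extraction sizes (1) to (3) are only parts of the real pattern.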

Computer systems include batch processing systems, which execute processing according to an execution schedule. The embodiment of the monitoring system of the present invention focuses on such batch processing systems and, in line with their characteristics, captures the so-called "change pattern" of the operation data in the following way.

(1) The operation data of the customer computer (CPU (central processing unit) utilization rate, number of accesses, number of transactions, and so on) is recorded daily.

(2) The operation data is normalized, and the "change pattern" of the operation data and the "cycle" of that change pattern are captured. If several change-pattern cycles exist, for example because of monthly processing, a change pattern is obtained for each cycle and held in the change pattern DB. When a change pattern is created, the average of the operation data over the preceding 3 to 10 cycles is preferably used, so that gradual changes in the operation data can be accommodated.

(3) If the change pattern of the current operation data differs from the contents of the change pattern DB, it is regarded as "operation different from usual", that is, abnormal operation. However, as already described, if a similar pattern is recorded in the exception pattern DB 23, it is not treated as abnormal operation.

(4) When abnormal operation of the customer computer 3 is detected, similar past change patterns and the failure data recorded immediately after them are retrieved, and the administrator 5 is notified of them as a failure that may occur soon.

(5) If a state was judged abnormal but no failure subsequently occurred in the customer computer 3, the pattern is registered as an exception pattern 23 of FIG. 1, as a falsely detected change pattern, and is thereafter used to prevent false detections.

As described above, the abnormality sign detection system of the embodiment includes the failure monitoring unit 30, which normalizes the past operation data of the customer computer 3, obtains the change pattern of the operation data and the cycle of the change pattern, and treats the obtained change pattern and cycle as the usual state of the customer computer 3, and the abnormality sign monitoring unit 31, which compares that change pattern with the current change pattern and detects, as an abnormality sign, an abnormal state that has not yet led to a failure. This makes it possible to reliably detect abnormal behavior that threshold-based failure monitoring cannot capture, that is, abnormal behavior that is not detected as a failure (signs of an abnormality that has not yet led to a failure). In addition, grasping the cycle of the change pattern of the operation data prevents false detection of monthly processing, term-end processing, and the like, and automatically learning from prediction mistakes reduces false detections.

The abnormality sign detection system of the embodiment also cuts out a part of the operation data while extending the extraction size, automatically detects the repeating change pattern, and thereby automatically detects the cycle. Because the cycle of the change pattern of the operation data is grasped in this way, monthly processing, term-end processing, and the like are no longer falsely detected.

The abnormality sign detection system of the embodiment also automatically detects the change pattern and cycle of the operation data and, when a plurality of cycles exist, obtains a change pattern for each cycle and holds it in the change pattern database. As a result, even when a plurality of cycles exist, for example because of monthly processing, a change pattern can be held for each cycle, and signs of abnormality can be detected even more reliably.

Furthermore, when creating a change pattern, the abnormality sign detection system of the embodiment uses the average of the operation data over the preceding 3 to 10 cycles. This allows it to accommodate gradual changes in the operating state of the customer computer 3.

The abnormality sign detection system of the embodiment also searches the past operation data for a pattern similar to the current change pattern and presents the failure information recorded immediately after it, thereby predicting a failure that may occur soon and reporting it as a failure prediction result. This allows the administrator to be reliably informed of a failure that may occur soon.

In addition, when no failure actually occurs after a failure prediction, the abnormality sign detection system of the embodiment registers the change pattern in the exception pattern database as a falsely detected change pattern and thereafter uses it to prevent false detection of abnormal states. This avoids false detection of abnormal states in the customer computer 3.

With conventional monitoring systems and the like, an abnormality can be detected after it has occurred in the business operations of a computer, or when a threshold is exceeded, but signs leading to an abnormality cannot be detected while operations are still running normally. In contrast, the present invention grasps the operating state of the computer's business operations, stores it as the normal operating state, and compares the past operating state with the current one, so that the administrator can catch the signs of an abnormality and respond in advance, before operations stop or a processing abnormality occurs.

The present invention is not limited to the embodiment described above, and various modifications can be adopted without departing from the scope of the claims.

FIG. 1 is an explanatory diagram of the abnormality sign detection system of the embodiment.
FIG. 2 is an explanatory diagram of the sign detection procedure in the embodiment.
FIG. 3 is an explanatory diagram of the automatic detection procedure in the embodiment.
FIG. 4 is an explanatory diagram of patterns in the embodiment.
FIG. 5 is an explanatory diagram of Specific Example 1 in the abnormality sign detection system of the embodiment.
FIG. 6 is an explanatory diagram of Specific Example 2 in the abnormality sign detection system of the embodiment.
FIG. 7 is an explanatory diagram of Specific Example 3 in the abnormality sign detection system of the embodiment.
FIG. 8 is an explanatory diagram of Specific Example 4 in the abnormality sign detection system of the embodiment.
FIG. 9 is an explanatory diagram of Specific Example 5 in the abnormality sign detection system of the embodiment.

Explanation of symbols

1 Customer system
2 Maintenance service company
3 Customer computer
10 Monitoring system
11 Information collection unit
12 Database (DB)
20 Failure data
21 Operation data
22 Cycle data
23 Exception pattern
30 Failure monitoring unit
31 Abnormality sign monitoring unit
32 Past failure matching unit
40 Abnormality sign notification
41 Predicted event notification
5 Administrator

Claims

An abnormality sign detection system comprising a monitoring system that monitors a connected customer computer and collects operation data, wherein the monitoring system normalizes past operation data of the customer computer, obtains a change pattern of the operation data and a cycle of the change pattern, treats the obtained change pattern and cycle as the usual state of the customer computer, and compares the change pattern with a current change pattern to detect, as an abnormality sign, an abnormal state that has not yet led to a failure.

The abnormality sign detection system according to claim 1, wherein a part of the operation data is cut out while an extraction size is extended, the repeating change pattern is automatically detected, and the cycle is thereby automatically detected.

The abnormality sign detection system according to claim 2, wherein, when a plurality of the cycles exist, the change pattern is obtained for each cycle and held in a change pattern database.

The abnormality sign detection system according to claim 2 or 3, wherein the average of the operation data over the preceding 3 to 10 cycles is used when the change pattern is created.

5. A similar pattern that is similar to the current change pattern is searched from the past operation data, and fault information that occurs immediately after that is indicated, so that a fault that may occur soon is predicted and reported as a fault prediction result. Abnormal sign detection system described in.

The abnormality sign detection system according to claim 5, wherein, when no failure actually occurs after the failure prediction, the change pattern is registered in an exception pattern database as a falsely detected change pattern, and the falsely detected change pattern is thereafter used to prevent false detection of the abnormal state.