JP2008197698A

JP2008197698A - Integrated maintenance support system

Info

Publication number: JP2008197698A
Application number: JP2007029039A
Authority: JP
Inventors: Kohei Mori; 浩平森
Original assignee: NEC Fielding Ltd
Current assignee: NEC Fielding Ltd
Priority date: 2007-02-08
Filing date: 2007-02-08
Publication date: 2008-08-28

Abstract

PROBLEM TO BE SOLVED: To automatically achieve recovery after a failure occurs.
SOLUTION: A failure monitoring and reporting device 130 detects a failure occurring in a computer system, reports it to a maintenance status monitoring device 140, collects all logs, and sends them to an analysis device 100. The analysis device 100 identifies the cause of the failure and the suspect parts from the logs and outputs them. A logistics device 120 searches for stock of the parts and sends a replacement procedure manual to the terminal of a maintenance person. The maintenance person then orders the parts so that they arrive on the agreed treatment date. Once these operations are complete, the maintenance status monitoring device 140 registers completion of the work and the analysis device 100 updates a case database 104 if necessary.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an integrated maintenance support system, and in particular to one that, when a failure occurs in a system, monitors progress until the failure is recovered and automatically analyzes the cause of the failure and arranges parts and maintenance personnel according to the content of each failure.

An example of a conventional maintenance system is described in Patent Document 1. As shown in FIG. 16, this conventional maintenance system comprises a maintenance device 20, a customer system 10, and a maintenance staff terminal 30. The maintenance device 20 includes a configuration database (DB) 21, a maintenance DB 22, a maintenance DB 23, a DB registration unit 24, and a diagnosis processing unit 25. A conventional maintenance system with this configuration operates as follows: the diagnosis program of the diagnosis processing unit 25 in the maintenance device 20 diagnoses the monitored device and, when it detects an abnormality, notifies the maintenance staff terminal 30 of the diagnosis result and of the fact that an abnormality was detected.
Patent Document 1: JP 2003-345622 A

However, in the conventional technique, when a failure occurred in the system, failure information was sent by having a maintenance person on site operate the maintenance staff terminal 30 to enter commands, save the output, and send it to the failure support department. The failure support department then checked the received information visually, analyzed it manually, and contacted the on-site maintenance person; parts were arranged only after that, so recovery took a great deal of time. Moreover, even when the analysis revealed a newly discovered problem, requests for support to the system development or manufacturing departments were made verbally. The process was therefore inefficient, and recovery from a failure took a long time.
The present invention has been made in view of this situation. Its object is to perform the processing from failure occurrence to recovery automatically and in one batch without human intervention, to monitor progress until recovery, and to shorten the time to recovery.

The integrated maintenance support system according to claim 1 detects a failure of a computer system connected to a network and performs recovery processing, and comprises: failure detection means for detecting the occurrence of a failure in the computer system; notification means for notifying a terminal carried by a maintenance person of the failure; log collection means for collecting logs when the failure occurs; analysis means for analyzing the cause of the failure based on the logs; search means for searching the inventory status of the part that caused the failure; part arrangement means for arranging the part; and transmission means for transmitting a replacement procedure for the part to the maintenance person.
The system may further comprise elapsed time monitoring means for measuring and monitoring the time elapsed since the failure occurred, reporting means for issuing a report each time a predetermined time elapses, and registration means for registering the completion time of the failure cause analysis.
The system may further comprise calculation means for calculating the arrival time of the part.
The system may further comprise a database that stores a replacement procedure for each part, and the transmission means may transmit the replacement procedure acquired from the database.
The integrated maintenance support method according to claim 5 detects a failure of a computer system connected to a network and performs recovery processing, and comprises: a failure detection step of detecting the occurrence of a failure in the computer system; a notification step of notifying a terminal carried by a maintenance person of the failure; a log collection step of collecting logs when the failure occurs; an analysis step of analyzing the cause of the failure based on the logs; a search step of searching the inventory status of the part that caused the failure; a part arrangement step of arranging the part; and a transmission step of transmitting a replacement procedure for the part to the maintenance person.
The method may further comprise an elapsed time monitoring step of measuring and monitoring the time elapsed since the failure occurred, a reporting step of issuing a report each time a predetermined time elapses, and a registration step of registering the completion time of the failure cause analysis.
The method may further comprise a calculation step of calculating the arrival time of the part.
A database that stores a replacement procedure for each part may further be provided, and the replacement procedure acquired from the database may be transmitted in the transmission step.
The integrated maintenance support program according to claim 9 is a program for an integrated maintenance support system that detects a failure of a computer system connected to a network and performs recovery processing. The program causes the integrated maintenance support system to execute: a failure detection step of detecting the occurrence of a failure in the computer system; a notification step of notifying a terminal carried by a maintenance person of the failure; a log collection step of collecting logs when the failure occurs; an analysis step of analyzing the cause of the failure based on the logs; a search step of searching the inventory status of the part that caused the failure; a part arrangement step of arranging the part; and a transmission step of transmitting a replacement procedure for the part to the maintenance person.
The program may further comprise an elapsed time monitoring step of measuring and monitoring the time elapsed since the failure occurred, a reporting step of issuing a report each time a predetermined time elapses, and a registration step of registering the completion time of the failure cause analysis.
The program may further comprise a calculation step of calculating the arrival time of the part.
A database that stores a replacement procedure for each part may further be provided, and the replacement procedure acquired from the database may be transmitted in the transmission step.

According to the present invention, the cause of a failure occurring in a computer system can be analyzed and parts and maintenance personnel can be arranged automatically, which shortens the time to recovery.

Next, the configuration of an embodiment of the present invention is described in detail with reference to the drawings. Referring to FIG. 1, a first embodiment of the maintenance support system to which the present invention is applied comprises an analysis device 100, which is a computer (central processing unit; processor; data processing device) operating under program control, an analysis support device 110, a logistics device 120, a failure monitoring and reporting device 130, a maintenance status monitoring device 140, and an analysis support department 160. These devices (computer systems) are connected to one another via the Internet and can exchange data and commands. A maintenance person 150 carries a mobile terminal 151 (described later) that can connect to the Internet and uses it to perform maintenance of this maintenance system. The failure monitoring and reporting device 130 can be installed at the site where the user's computer system is located.

As shown in FIG. 2, the analysis device 100 includes a comprehensive analysis program 101, a hardware failure analysis program 102, a software failure analysis program 103, and a case database 104.

These programs operate roughly as follows. Based on the outputs of the hardware failure analysis program 102 and the software failure analysis program 103, the comprehensive analysis program 101 makes an overall judgment as to whether the event that caused the failure lies in hardware or in software, or whether analysis is not possible.
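The overall judgment made by the comprehensive analysis program 101 can be illustrated with a minimal Python sketch. The rule that a hardware finding takes precedence over a software finding is an assumption made for illustration (the text does not state how the two results are weighed), and the names `Verdict` and `comprehensive_verdict` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    UNDETERMINED = "undetermined"  # corresponds to "analysis not possible"


def comprehensive_verdict(hw_finding: Optional[str],
                          sw_finding: Optional[str]) -> Verdict:
    """Combine the findings of the two analyzers (None means nothing found).

    Assumed rule: a hardware finding wins, because a software symptom
    (for example a reboot after a failed write) may be a consequence of it.
    """
    if hw_finding is not None:
        return Verdict.HARDWARE
    if sw_finding is not None:
        return Verdict.SOFTWARE
    return Verdict.UNDETERMINED
```

Under this assumed rule, a hardware finding combined with a software symptom is classified as a hardware failure, and the undetermined case is the one that would be handed to the analysis support device 110.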

The hardware failure analysis program 102 analyzes the data (log data) received from the failure monitoring and reporting device 130 from a hardware standpoint and determines whether an abnormality has been detected in hardware. The software failure analysis program 103 analyzes the same data from a software standpoint and determines whether an abnormality has been detected in software. The case database 104 stores past failure cases and the data needed for analysis.

As shown in FIG. 3, the analysis support device 110 includes a support request destination determination program 111 and an analysis/support destination database 112. These operate roughly as follows. The support request destination determination program 111 refers to the analysis/support destination database 112 to determine where to request support for recovering from the current failure. The analysis/support destination database 112 stores the contact information of the analysis support department 160 for each failure location.

As shown in FIG. 4, the logistics device 120 includes an arrival time calculation program 121, an ordering program 122, a replacement procedure creation program 123, an inventory database 124, and a replacement procedure database 125. These operate roughly as follows.

The arrival time calculation program 121 calculates, from the parts inventory status of each distribution center stored in the inventory database 124, how many hours it will take for the part that needs replacing to arrive on site. When a part take-out request is received, the ordering program 122 updates the inventory database 124 and carries out the associated logistics processing. The replacement procedure creation program 123 refers to the replacement procedure database 125, creates a replacement procedure manual describing how to replace the part that was taken out, and distributes it. The inventory database 124 stores the parts inventory status of each distribution center. The replacement procedure database 125 stores the replacement procedure for each part.
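One way to picture the arrival time calculation is the following Python sketch. It assumes that each inventory record carries a delivery time from its distribution center to the site (`transit_hours`); the text does not say how that time is obtained, so this field, the `CenterStock` record, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CenterStock:
    center: str           # distribution center name
    part: str             # part identifier
    quantity: int         # units currently in stock
    transit_hours: float  # assumed delivery time from this center to the site


def earliest_arrival_hours(stock: Iterable[CenterStock],
                           part: str) -> Optional[float]:
    """Return the shortest delivery time among centers that hold the part.

    Returns None when no center has the part in stock.
    """
    candidates = [s.transit_hours for s in stock
                  if s.part == part and s.quantity > 0]
    return min(candidates) if candidates else None
```

Returning `None` rather than a sentinel number keeps the "no stock anywhere" case explicit for the caller.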

As shown in FIG. 5, the failure monitoring and reporting device 130 includes a failure monitoring program 131, an alive monitoring program 132, a log collection program 133, and a failure criteria database 134. These operate roughly as follows.

The failure monitoring program 131 monitors information from each computer system (logs and the like) and the state of the system as a whole. The alive monitoring program 132 monitors whether each computer system is running or stopped. The log collection program 133 collects all logs related to the system. The failure criteria database 134 stores the messages that are output when a failure occurs in a monitored system.
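As a rough illustration of how stored failure messages could be used for detection, the sketch below flags log lines that contain any known failure message. Plain substring matching is an assumption; the text only says that the database holds the messages emitted on failure. The function name `detect_failures` is hypothetical.

```python
from typing import Iterable, List


def detect_failures(log_lines: Iterable[str],
                    failure_messages: Iterable[str]) -> List[str]:
    """Return the log lines that contain at least one known failure message."""
    patterns = [m for m in failure_messages if m]  # ignore empty entries
    return [line for line in log_lines
            if any(p in line for p in patterns)]
```

A non-empty result would correspond to step A1 of the flow described later, where the failure monitoring program detects an abnormality.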

As shown in FIG. 6, the maintenance status monitoring device 140 includes an elapsed time monitoring program 141, an unrecovered-case notification destination determination program 142, and a related departments database 143. These operate roughly as follows.

The elapsed time monitoring program 141 records the time elapsed since the failure occurred. The unrecovered-case notification destination determination program 142 determines which department to notify when, for example, no failure completion report has been received after two hours. The related departments database 143 stores the contact information of the analysis support department 160 and of the maintenance sites.
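The elapsed time check can be sketched in Python as follows. The two-hour limit is the example given in the text; the `Incident` record, its field names, and the choice to treat exactly two hours as overdue are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    occurred_at: datetime                          # failure occurrence time
    analysis_completed_at: Optional[datetime] = None
    closed_at: Optional[datetime] = None           # set on completion report

    def elapsed(self, now: datetime) -> timedelta:
        """Time from failure occurrence to closure, or to `now` if still open."""
        end = self.closed_at if self.closed_at is not None else now
        return end - self.occurred_at

    def needs_escalation(self, now: datetime,
                         limit: timedelta = timedelta(hours=2)) -> bool:
        """True when the incident is still open and the limit has passed."""
        return self.closed_at is None and self.elapsed(now) >= limit
```

When `needs_escalation` returns true, the notification destination determination program would then choose which department to contact.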

As shown in FIG. 7, the maintenance person 150 carries a mobile terminal 151, which operates roughly as follows. After a failure occurs, the mobile terminal 151 receives data corresponding to the analysis result, the suspect part, the inventory status of the part, and the replacement procedure manual. It also sends part orders and failure completion reports. The maintenance person 150 operates the mobile terminal 151 as needed.

As shown in FIG. 8, the analysis support department 160 includes maintenance personnel and a mobile terminal 161 that they carry. The mobile terminal 161 is used to receive log data. The maintenance personnel of the analysis support department 160 operate the mobile terminal 161 as needed.

Next, the operation of the first embodiment of the present invention is described in detail with reference to FIGS. 1 to 8 and the flowchart of FIG. 9.

First, the failure monitoring program 131 of the failure monitoring and reporting device 130, which monitors all the computer systems, detects an abnormality (failure) in a computer system (step A1 in FIG. 9). It then sends failure information indicating that a failure has occurred to the maintenance status monitoring device 140 (step A2).
Next, the elapsed time monitoring program 141 of the maintenance status monitoring device 140, having received the failure information, starts up and begins measuring and monitoring the time elapsed since the failure occurred (step A3). At this point the failure occurrence time can be registered in a predetermined database. The failure information is also sent to the maintenance person 150 (step A4).

Next, the log collection program 133 in the failure monitoring and reporting device 130, which detected the abnormality, collects logs from all the computer systems connected to the Internet shown in FIG. 1 (step A5).

The collected logs are sent to the analysis device 100. On receiving them, the analysis device 100 runs the hardware failure analysis program 102 and the software failure analysis program 103, which analyze the logs using the past failure cases and the analysis data stored in the case database 104 (step A6). In this example, assume that the analysis completes normally and that the failure is a hardware failure (a fault in the power supply section of a hard disk).

In this case, assume that the hardware failure analysis program 102 finds that the hard disk power supply section became unable to supply power, while the software failure analysis program 103 outputs a result indicating that the device was rebooted because writing was not possible. From the results of both programs, the comprehensive analysis program 101 derives the overall analysis result and judges the failure to be a hard disk failure (step A7).

Next, the analysis device 100 sends to the maintenance status monitoring device 140 data consisting of information indicating a hard disk failure (the analysis result) and information on the type of hard disk concerned (the suspect part) (step A8). On receiving the analysis result, the maintenance status monitoring device 140 uses the elapsed time monitoring program 141 to register the analysis completion time in a predetermined database and forwards the received data to the logistics device 120 (step A9).

On receiving the data indicating the analysis result and the part information, the logistics device 120 searches the inventory database 124 for the inventory status of the suspect part, the hard disk (step A10). Next, the replacement procedure creation program 123 refers to the replacement procedure database 125 and creates a replacement procedure manual describing the work needed to replace the part (step A11). The replacement procedure manual, together with data indicating the replacement part and its inventory status, is sent to the maintenance person 150 (step A12).

The maintenance person 150 receives on the mobile terminal 151 the failed-part information, which consists of data indicating the replacement procedure manual, the replacement part, and the inventory status of the part (step A13). Having received this data, the maintenance person 150 contacts the person in charge at the user site and arranges the date and time of the replacement work (step A14). The maintenance person then instructs the logistics device 120 from the mobile terminal 151 to deliver the part to the user site on the agreed date (that is, orders the part).

On receiving the part shipping instruction, the logistics device 120 accepts the order through the ordering program 122 and places the part order with the logistics system. The take-out of the hard disk is recorded in the inventory database 124, which is thereby updated (step A15).
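The inventory update in step A15 can be pictured with a small Python sketch. Modeling the inventory database as a dictionary keyed by distribution center and part, and decrementing the count by one per take-out, are assumptions; `take_out_part` is a hypothetical name.

```python
from typing import Dict, Tuple

# Assumed shape of the inventory data: (center, part) -> units in stock.
Inventory = Dict[Tuple[str, str], int]


def take_out_part(inventory: Inventory, center: str, part: str) -> bool:
    """Record that one unit of `part` left `center`.

    Returns False and leaves the inventory unchanged if nothing is in stock.
    """
    key = (center, part)
    if inventory.get(key, 0) <= 0:
        return False
    inventory[key] -= 1
    return True
```

Refusing the take-out when the count is zero keeps the recorded stock from going negative.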

Having arranged the replacement date, the maintenance person 150 goes to the site at the scheduled time and replaces the hard disk, which has already arrived, following the replacement procedure manual received earlier (step A16). When the work finishes without problems within two hours and the system is restored (step A17), the maintenance person sends a work completion report from the mobile terminal 151 to the maintenance status monitoring device 140 (step A18).

On receiving the work completion report, the maintenance status monitoring device 140 uses the elapsed time monitoring program 141 to register data indicating that the work is complete and closes the case (step A19). It then forwards the received data (the work completion report) to the analysis device 100.

Finally, the analysis device 100 receives the work completion report (step A20), judges that the result of the primary analysis contained no error, and therefore does not update the case database 104 (step A21).

Next, the effects of the first embodiment of the present invention are described. In the first embodiment, the only human intervention is that of the maintenance person 150, so all other work can be automated. In addition, because the cause of the failure is identified from the logs collected by the failure monitoring and reporting device 130, the effort the maintenance person 150 would otherwise spend collecting and analyzing logs on site is reduced. Because the suspect part is selected before parts are arranged, and the replacement procedure manual is sent to the maintenance person 150, the on-site effort required for arranging parts and performing the replacement is also reduced. Furthermore, when the analysis device 100 judges that the content of the failure makes analysis difficult, the failure status is assessed, the analysis support device 110 searches the analysis/support destination database 112 and automatically selects a support destination, and the log data is sent to the selected analysis support department 160 together with a support request. This suppresses unnecessary contact with departments that are not involved.

Next, the configuration of a second embodiment of the present invention is described in detail with reference to the drawings. Referring to FIG. 10, the second embodiment comprises a failure monitoring, reporting, and analysis device 200, which is a computer (central processing unit; processor; data processing device) operating under program control, a logistics device 220, a maintenance person 230, and an analysis support department 240. The failure monitoring, reporting, and analysis device 200 can be installed at the site where the user's computer system is located.

As shown in FIG. 11, the failure monitoring, reporting, and analysis device 200 includes a comprehensive analysis program 201, a hardware failure analysis program 202, a software failure analysis program 203, a support request destination determination program 204, a failure monitoring program 205, an alive monitoring program 206, a log collection program 207, an elapsed time monitoring program 208, an unrecovered-case notification destination determination program 209, a case database 210, an analysis/support destination database 211, a failure criteria database 212, and a related departments database 213. These operate roughly as follows.

The comprehensive analysis program 201 makes an overall judgment from the outputs of the hardware failure analysis program 202 and the software failure analysis program 203 as to whether the failure event lies in hardware or in software, or whether analysis is not possible.

The hardware failure analysis program 202 analyzes the data (log data) received from the log collection program 207 from a hardware standpoint and determines whether an abnormality has been detected in hardware. The software failure analysis program 203 analyzes the same data from a software standpoint and determines whether an abnormality has been detected in software. The case database 210 stores past failure cases and the data needed for analysis.

The support request destination determination program 204 refers to the analysis/support destination database 211 to determine where to request support for recovering from the current failure. The analysis/support destination database 211 stores contact information, such as that of the analysis support department 240, for each failure location.

The failure monitoring program 205 monitors information from each computer system (logs) and the state of the system as a whole. The alive monitoring program 206 monitors whether each computer system is running or stopped. The log collection program 207 collects all logs related to the system. The failure criteria database 212 stores the messages that are output when a failure occurs in a monitored system.

The elapsed time monitoring program 208 measures and records the time elapsed since the failure occurred. It can also record the time at which the failure occurred. The unrecovered-case notification destination determination program 209 determines which department to notify when, for example, no failure completion report has been received after two hours. The related departments database 213 stores the contact information of the analysis support department 240 and of the maintenance sites.

As shown in FIG. 12, the logistics apparatus 220 includes an arrival time calculation program 221, an ordering program 222, a replacement procedure creation program 223, an inventory database 224, and a replacement procedure database 225. Each of these operates roughly as follows.

The arrival time calculation program 221 calculates how many hours it will take for a part that needs replacing to arrive on site, based on the parts inventory status of each distribution center stored in the inventory database 224. When a part take-out request is received, the ordering program 222 updates the inventory database 224 and carries out the necessary logistics processing.

The replacement procedure creation program 223 refers to the replacement procedure database 225, creates the replacement procedure for the part that was taken out, and distributes it. The inventory database 224 stores the parts inventory status of each distribution center. The replacement procedure database 225 stores, for each part, data describing its replacement procedure.

As shown in FIG. 13, the maintenance staff 230 includes a portable terminal 231, which operates roughly as follows. After a failure occurs, the portable terminal 231 receives data indicating the analysis result, the suspected part, the inventory status of the part, and the replacement procedure. It also transmits part orders and failure completion reports. The maintenance staff 230 operates the portable terminal 231 as needed so that these processes are carried out.

As shown in FIG. 14, the analysis support department 240 includes a mobile terminal 241, which operates roughly as follows. The mobile terminal 241 is used to receive log data.

Next, the operation of the second embodiment of the present invention is described in detail with reference to FIGS. 10 to 14 and the flowchart of FIG. 15.

First, the failure monitoring program 205 in the failure monitoring notification analysis apparatus 200, which monitors all of the computer systems connected to the Internet shown in FIG. 10, detects an abnormality in a computer system (step B1 in FIG. 15). It then passes data indicating that a failure has occurred to the elapsed time monitoring program 208 (step B2).

On receiving this data, the elapsed time monitoring program 208 starts up and begins measuring and monitoring the time elapsed since the failure occurred (step B3). At this point, the failure occurrence time can be registered in a predetermined database. Data indicating the occurrence of the failure is also transmitted to the portable terminal 231 of the maintenance staff 230 (step B4). The log collection program 207 collects the logs of all connected systems (the computer systems shown in FIG. 10) (step B5). The collected logs are analyzed by the hardware failure analysis program 202 and the software failure analysis program 203 on the basis of the case database 210 (step B6).

In this example, assume that the analysis completes normally and that the failure is a hardware failure (a fault in the power supply section of a hard disk). In this case, the hardware failure analysis program 202 determines that the hard disk power supply section has become unable to supply power, while the software failure analysis program 203 outputs a result indicating that the device was rebooted because writing was not possible.

Based on the results of both the hardware failure analysis program 202 and the software failure analysis program 203, the comprehensive analysis program 201 derives an overall analysis result and determines that a hard disk has failed (step B7). It then passes data indicating the hard disk failure (the analysis result) and the type of hard disk concerned (the suspected part) to the elapsed time monitoring program 208 (step B8).

The elapsed time monitoring program 208 registers the analysis completion time in a predetermined database and forwards the received data to the logistics apparatus 220 (step B9). On receiving the analysis result and the part information (the suspected part; here, information such as the type of hard disk), the logistics apparatus 220 searches the inventory database 224 for the stock status of the hard disk (step B10).

Next, the replacement procedure creation program 223 refers to the replacement procedure database 225 and creates a replacement procedure manual describing the work steps for replacing the part (step B11). Data indicating the replacement procedure manual, the replacement part, the stock status of the part, and so on is then transmitted to the portable terminal 231 of the maintenance staff 230 (step B12).

The maintenance staff 230 receives the data indicating the replacement part, the stock status, the replacement procedure manual, and so on at the portable terminal 231 (step B13). Having received the data, the maintenance staff 230 uses the portable terminal 231 to contact the person in charge at the user site and arrange a date for the replacement work (step B14). The maintenance staff 230 then instructs the logistics apparatus 220, from the portable terminal 231, to have the part arrive on the agreed date.

On receiving the part dispatch instruction, the logistics apparatus 220 accepts the order through the ordering program 222 and places the part order with the logistics system. The fact that a hard disk has been taken out is recorded in the inventory database 224, which is thereby updated (step B15).

With the replacement date and time arranged, the maintenance staff 230 goes to the site at the scheduled time and replaces the hard disk, which has already arrived, following the replacement procedure manual received earlier (step B16). The work finishes without problems, for example within two hours (step B17), and a work completion report is transmitted from the portable terminal 231 to the failure monitoring notification analysis apparatus 200 (step B18).

On receiving the work completion report, the failure monitoring notification analysis apparatus 200 registers the completion of the work through the elapsed time monitoring program 208 and closes the case (step B19). It then forwards the received data (the work completion report) to the comprehensive analysis program 201. The comprehensive analysis program 201 receives the work completion report (step B20) and, finally, determines that the data produced by the primary analysis contained no errors, so the case database 210 is not updated (step B21).

Failure analysis can determine comprehensively whether a failure event lies in hardware (H/W) or software (S/W). On a UNIX (registered trademark) machine, for example, the H/W-side logs are checked first; if the suspect cannot be identified from them, dump analysis, that is, analysis from the S/W side, becomes necessary. Accordingly, among all the collected logs, if the failure can be analyzed from the H/W-side logs, H/W is output as the suspect; otherwise the failure is output as an S/W failure.
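The H/W-first rule described above can be sketched as follows. This is only an illustration: the `LogEntry` type, its `source` tags, and the `classify_failure` function are hypothetical names introduced here, not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LogEntry:
    source: str              # "hw" or "sw" (hypothetical tags)
    suspect: Optional[str]   # suspected part named by this entry, if any


def classify_failure(logs: List[LogEntry]) -> Tuple[str, Optional[str]]:
    """Check H/W-side logs first; fall back to an S/W failure otherwise."""
    for entry in logs:
        # A hardware log that names a suspect settles the question.
        if entry.source == "hw" and entry.suspect:
            return ("H/W", entry.suspect)
    # No hardware log identified a suspect, so S/W-side analysis is needed.
    return ("S/W", None)
```

For instance, a log set containing one hardware entry that names a hard disk power supply is classified as H/W, while a log set with only software entries is classified as S/W.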

When determining the failure analysis support department, the support destination differs from system to system: it may be the host system, the disk group, the tape group, or the host S/W. The logs collected from all of these are judged individually. For example, if the host outputs a log indicating that either a device connected to a given path or the host itself has failed, and the disk array device under it outputs a result indicating that its own controller has failed, the support destination can be determined to be the developer of the disk array device. If both machines detect some kind of failure, both development departments can be chosen as support destinations.
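One possible reading of this rule is sketched below. The finding labels (`"self"`, `"ambiguous"`), the `choose_support` function, and the contact names are assumptions made for illustration; the text does not specify how a definite self-diagnosis is weighed against an ambiguous one, so the tie-breaking here is an interpretation.

```python
from typing import Dict, List, Optional


def choose_support(findings: Dict[str, Optional[str]],
                   contacts: Dict[str, str]) -> List[str]:
    """Pick support destinations from per-device log findings.

    findings maps a device name to:
      "self"      - the device reports that it has failed itself
      "ambiguous" - the device reports that it or something on its path failed
      None        - the device reports no failure
    """
    definite = [dev for dev, f in findings.items() if f == "self"]
    reporting = [dev for dev, f in findings.items() if f is not None]
    # Prefer devices with a definite self-diagnosis; otherwise involve
    # every device that reported any failure.
    chosen = definite if definite else reporting
    return [contacts[dev] for dev in chosen]
```

With a host reporting an ambiguous path failure and a disk array reporting its own controller failure, only the disk array developer is selected; when both report definite failures, both developers are selected.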

The arrival time of a replacement part can be calculated as follows and sent to the maintenance staff 150 (230) as a reference for deciding the date and time of the repair. First, determine which district holds the part. If the part is in the same prefecture as the failure site, calculate the arrival time for a delivery method that is relatively unaffected by road conditions, for example a motorcycle courier, using a database in which map information and the like have been registered in advance. This time can be determined from, for example, the straight-line distance between the user's address and the parts center. If the part is in a distant location, the time can be calculated for, say, a combination of public transportation and motorcycle courier. The time from each parts center to public transportation can be registered in the database beforehand.
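A minimal sketch of this calculation follows. The courier speed, the use of the haversine formula for straight-line distance, and all function and parameter names are assumptions chosen for illustration; the text only states that straight-line distance and pre-registered transit times can be used.

```python
import math
from typing import Optional, Tuple

Coord = Tuple[float, float]   # (latitude, longitude) in degrees
COURIER_KMH = 30.0            # assumed average motorcycle-courier speed


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle (straight-line) distance between two points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def arrival_hours(site: Coord, center: Coord, same_prefecture: bool,
                  center_to_transit_h: float = 0.0, transit_h: float = 0.0,
                  arrival_station: Optional[Coord] = None) -> float:
    """Estimate hours until a part reaches the site."""
    if same_prefecture:
        # Local case: courier directly from the parts center to the site.
        return haversine_km(center, site) / COURIER_KMH
    if arrival_station is None:
        raise ValueError("arrival_station is required for remote parts")
    # Remote case: registered time to reach public transportation, the ride
    # itself, then a courier leg from the arrival station to the site.
    last_leg = haversine_km(arrival_station, site) / COURIER_KMH
    return center_to_transit_h + transit_h + last_leg
```

The remote branch treats the registered center-to-transit time as a database lookup result passed in by the caller, which keeps the function free of any storage assumptions.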

To create a replacement procedure for a part, the technical support department registers a replacement procedure for each part in the database in advance; when a part is supplied as input, the replacement procedure corresponding to that part is output.
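As a rough illustration, the lookup can be modeled as a mapping from part identifier to pre-registered steps. The part identifier `HDD-A`, the listed steps, and the function name are hypothetical placeholders, not contents of the actual replacement procedure database.

```python
from typing import Dict, List

# Hypothetical stand-in for the pre-registered replacement procedure database.
PROCEDURES: Dict[str, List[str]] = {
    "HDD-A": [
        "Power down the unit",
        "Remove the failed drive",
        "Insert the replacement drive",
        "Power up and verify",
    ],
}


def replacement_procedure(part: str) -> str:
    """Return the numbered replacement procedure registered for a part."""
    steps = PROCEDURES.get(part)
    if steps is None:
        raise KeyError(f"no replacement procedure registered for part {part!r}")
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
```

Raising an error for an unregistered part, rather than returning an empty manual, makes a gap in the database visible before the maintenance staff is dispatched.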

Failure monitoring and notification can be performed as follows. A dedicated failure monitoring terminal is installed, and each system is configured to send its error output to this terminal. The failure monitoring terminal constantly monitors the input signals corresponding to the errors output by each system. When a failure occurs, the failure monitoring terminal detects it from the input signal, accesses each system, and collects all collectable information (log data and so on). The collected log data is gathered into a single folder, compressed with encryption, and reported by e-mail. The e-mail contains data indicating that a failure has occurred, the log data, the failure occurrence time, and so on.
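The bundling and reporting step can be sketched with Python's standard library as shown below. The function name, subject line, and addresses are assumptions. The encryption step is deliberately left out of this sketch, because the standard `zipfile` module cannot write encrypted archives; a real deployment would need a separate encryption tool or library.

```python
import io
import zipfile
from datetime import datetime
from email.message import EmailMessage
from typing import Dict


def build_report(logs: Dict[str, str], occurred_at: datetime,
                 sender: str, recipient: str) -> EmailMessage:
    """Bundle collected logs into one compressed archive and attach it to an e-mail."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in logs.items():
            # Gather every log under a single folder inside the archive.
            zf.writestr(f"logs/{name}", text)

    msg = EmailMessage()
    msg["Subject"] = "Failure detected"
    msg["From"] = sender
    msg["To"] = recipient
    # The body carries the failure indication and the occurrence time.
    msg.set_content(f"A failure occurred at {occurred_at.isoformat()}.")
    msg.add_attachment(buf.getvalue(), maintype="application",
                       subtype="zip", filename="logs.zip")
    return msg
```

The returned message could then be handed to `smtplib.SMTP.send_message`; sending is omitted here so the sketch stays independent of any mail server.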

Maintenance elapsed time can be monitored as follows.
- Extract the failure occurrence time from the reported content and store it.
- Store the time at which the suspected part was identified and the maintenance staff 150 (230) was notified.
- Store the work start time.
When, for example, two hours have passed since each of these times, the relevant departments are notified. Notifications are then issued every hour; once recovery is achieved, a completion notice is sent and monitoring ends.
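The notification schedule above (a first notice after two hours, then one every hour) can be sketched as a function that lists the notices due so far for one stored timestamp. The function name and parameters are illustrative assumptions; the two-hour and one-hour intervals are the example values from the text.

```python
from datetime import datetime, timedelta
from typing import List


def notices_due(start: datetime, now: datetime,
                first: timedelta = timedelta(hours=2),
                repeat: timedelta = timedelta(hours=1)) -> List[datetime]:
    """List the times at which notices were due between start and now."""
    due = []
    t = start + first
    while t <= now:
        due.append(t)
        t += repeat   # after the first notice, repeat at a fixed interval
    return due
```

The same function would be applied separately to each of the three stored times (failure occurrence, notification to the maintenance staff, and work start), and a monitor would stop calling it once the completion report arrives.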

As described above, the first embodiment of the present invention provides the following effects. The first effect is a substantial reduction in the man-hours required for the overall flow. This is because everything other than the work performed by the maintenance staff 150 is automated, including log collection and analysis, which shortens the time from failure occurrence to part arrangement. The second effect is that reporting to related departments is simplified. This is because status monitoring is automatic, so a completion report alone suffices to signal that the case has ended normally. The third effect is that the occurrence of a failure is recognized immediately. This is because the failure monitoring notification device 130 reports the failure before the user notices it. The fourth effect is that analysis is possible regardless of whether the cause is software or hardware. This is because the analysis device 100 incorporates three programs: one that analyzes software, one that analyzes hardware, and one that comprehensively analyzes the results produced by the other two.

The second embodiment of the present invention provides the following effects. Because all programs involved in analysis and failure monitoring reside in a single apparatus (the failure monitoring notification analysis apparatus 200), the time lost to communication over the Internet is reduced. In addition, because most of the programs run on an apparatus at the user site, when an exception arises during the flow, the maintenance staff 230 working on site can relatively easily make a manual analysis request to the analysis support department 240 or escalate to related departments.

The configurations and operations of the above embodiments are examples, and it goes without saying that they can be modified as appropriate without departing from the spirit of the present invention.

The present invention can be applied, as part of new services in other areas of computer maintenance, to uses such as rapid recovery from a computer failure with almost no manual intervention. It is applicable not only to hardware maintenance but also to software maintenance.

FIG. 1 is a block diagram showing an example of the overall configuration of the first embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the analysis device.
FIG. 3 is a block diagram showing a configuration example of the analysis support device.
FIG. 4 is a block diagram showing a configuration example of the logistics apparatus.
FIG. 5 is a block diagram showing a configuration example of the failure monitoring notification device.
FIG. 6 is a block diagram showing a configuration example of the maintenance status monitoring device.
FIG. 7 is a block diagram showing a configuration example of the maintenance staff.
FIG. 8 is a block diagram showing a configuration example of the analysis support department.
FIG. 9 is a flowchart for explaining the operation of the first embodiment.
FIG. 10 is a block diagram showing an example of the overall configuration of the second embodiment of the present invention.
FIG. 11 is a block diagram showing a configuration example of the failure monitoring notification analysis apparatus.
FIG. 12 is a block diagram showing a configuration example of the logistics apparatus.
FIG. 13 is a block diagram showing a configuration example of the maintenance staff.
FIG. 14 is a block diagram showing a configuration example of the analysis support department.
FIG. 15 is a flowchart for explaining the operation of the second embodiment.
FIG. 16 is a block diagram showing a configuration example of a conventional maintenance system.

Explanation of symbols

10 Customer system
20 Maintenance device
30 Maintenance staff terminal
21 Configuration DB
22 Maintenance DB
23 Maintenance DB
24 DB registration unit
25 Diagnosis processing unit
100 Analysis device
101 Comprehensive analysis program
102 Hardware failure analysis program
103 Software failure analysis program
104 Case database
110 Analysis support device
111 Support request destination determination program
112 Analysis/support destination database
120, 220 Logistics apparatus
121 Arrival time calculation program
122 Ordering program
123 Replacement procedure creation program
124 Inventory database
125 Replacement procedure database
130 Failure monitoring notification device
131 Failure monitoring program
132 Alive monitoring program
133 Log collection program
134 Failure judgment criteria database
140 Maintenance status monitoring device
141 Elapsed time monitoring program
142 Non-recovery notification destination determination program
143 Related department database
150, 230 Maintenance staff
151 Portable terminal
160, 240 Analysis support department
161 Mobile terminal (terminal)
200 Failure monitoring notification analysis apparatus
201 Comprehensive analysis program
202 Hardware failure analysis program
203 Software failure analysis program
204 Support request destination determination program
205 Failure monitoring program
206 Alive monitoring program
207 Log collection program
208 Elapsed time monitoring program
209 Non-recovery notification destination determination program
210 Case database
211 Analysis/support destination database
212 Failure judgment criteria database
213 Related department database
220 Logistics apparatus
221 Arrival time calculation program
222 Ordering program
223 Replacement procedure creation program
224 Inventory database
225 Replacement procedure database
231 Portable terminal
240 Analysis support department
241 Mobile terminal

Claims

1. A comprehensive maintenance support system that detects a failure of a computer system connected to a network and performs recovery processing, comprising:
fault detection means for detecting the occurrence of a failure in the computer system;
notification means for notifying a terminal held by maintenance staff of the occurrence of the failure;
log collection means for collecting logs when the failure occurs;
analysis means for analyzing the cause of the failure based on the logs;
search means for searching the inventory status of the part that caused the failure;
parts arranging means for arranging the part; and
transmission means for transmitting a replacement procedure for the part to the maintenance staff.

2. The comprehensive maintenance support system according to claim 1, further comprising:
elapsed time monitoring means for measuring and monitoring the time elapsed since the occurrence of the failure;
informing means for issuing a notice each time a predetermined time elapses; and
registration means for registering the completion time of the cause analysis of the failure.

3. The comprehensive maintenance support system according to claim 1, further comprising calculation means for calculating an arrival time of the part.

4. The comprehensive maintenance support system according to any one of claims 1 to 3, further comprising a database that stores a replacement procedure for each part, wherein the transmission means transmits the replacement procedure acquired from the database.

5. A comprehensive maintenance support method that detects a failure of a computer system connected to a network and performs recovery processing, comprising:
a fault detection step of detecting the occurrence of a failure in the computer system;
a notification step of notifying a terminal held by maintenance staff of the occurrence of the failure;
a log collection step of collecting logs when the failure occurs;
an analysis step of analyzing the cause of the failure based on the logs;
a search step of searching the inventory status of the part that caused the failure;
a parts arranging step of arranging the part; and
a transmission step of transmitting a replacement procedure for the part to the maintenance staff.

6. The comprehensive maintenance support method according to claim 5, further comprising:
an elapsed time monitoring step of measuring and monitoring the time elapsed since the occurrence of the failure;
an informing step of issuing a notice each time a predetermined time elapses; and
a registration step of registering the completion time of the cause analysis of the failure.

7. The comprehensive maintenance support method according to claim 5, further comprising a calculation step of calculating an arrival time of the part.

8. The comprehensive maintenance support method according to claim 5, wherein a database storing a replacement procedure for each part is provided, and the replacement procedure acquired from the database is transmitted in the transmission step.

9. A comprehensive maintenance support program for a comprehensive maintenance support system that detects a failure of a computer system connected to a network and performs recovery processing, the program causing the comprehensive maintenance support system to execute:
a fault detection step of detecting the occurrence of a failure in the computer system;
a notification step of notifying a terminal held by maintenance staff of the occurrence of the failure;
a log collection step of collecting logs when the failure occurs;
an analysis step of analyzing the cause of the failure based on the logs;
a search step of searching the inventory status of the part that caused the failure;
a parts arranging step of arranging the part; and
a transmission step of transmitting a replacement procedure for the part to the maintenance staff.

10. The comprehensive maintenance support program according to claim 9, further comprising:
an elapsed time monitoring step of measuring and monitoring the time elapsed since the occurrence of the failure;
an informing step of issuing a notice each time a predetermined time elapses; and
a registration step of registering the completion time of the cause analysis of the failure.

11. The comprehensive maintenance support program according to claim 9, further comprising a calculation step of calculating an arrival time of the part.

12. The comprehensive maintenance support program according to any one of claims 9 to 11, wherein a database storing a replacement procedure for each part is provided, and the replacement procedure acquired from the database is transmitted in the transmission step.