JP2008171057A

JP2008171057A - System integration management system

Info

Publication number: JP2008171057A
Application number: JP2007001293A
Authority: JP
Inventors: Kazuo Furuya; 一雄古谷
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2007-01-09
Filing date: 2007-01-09
Publication date: 2008-07-24

Abstract

PROBLEM TO BE SOLVED: To provide a system that improves the accuracy of determining maintenance target parts requiring inspection or replacement, thereby reducing parts cost and work cost. SOLUTION: A system integration management server 4 is provided with a data collection function 41 that collects the data of a monitoring control database 24 in wide area monitoring control servers 2A and 2B, the data of a network management database 34 in a network management server 3, and data from a non-IP network monitoring device 5. The system integration management server 4 is further provided with a main cause determination function 46 that determines, from among a plurality of pieces of failure information, the failure information that is the main cause, and a statistical processing function 44 that performs statistical processing on the failure information determined to be the main cause. Secondarily generated failures are thereby excluded, and only main-cause failures are statistically processed and evaluated. This improves the accuracy of determining maintenance target parts and greatly reduces the parts cost and work cost required for maintenance. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a system integration management system that integrates and manages failure information, such as abnormalities and faults, of a plurality of systems that collect and store information on IP networks and non-IP networks, and that performs, as system operations, a series of preventive maintenance tasks including extraction of maintenance target parts requiring inspection or replacement, arrangement of replacement parts, arrangement of maintenance work, and work instructions to maintenance personnel.

In recent years, in wide area monitoring control systems that monitor and control rivers, roads, dams, buildings, and the like, two kinds of systems have been put into practical use: systems that collect and store IP network information using, for example, SNMP (Simple Network Management Protocol) or PING (Packet Internet Groper), and systems that collect and store information on devices managed by, for example, SDH (Synchronous Digital Hierarchy) (non-IP network management devices). A system integration management system is therefore needed that integrates the failure information, such as abnormalities and faults, of each of these systems and manages it centrally.

For example, in the preventive maintenance support system disclosed in Patent Document 1, statistical failure information generated while the maintenance target system is operating is accumulated by automatic notification, the accumulated statistical failure information is selected according to preset monitoring reference values and reported to a maintenance support center, consumable parts and deteriorated parts are identified, and a series of procedures such as arranging replacement parts, planning maintenance, and dispatching maintenance personnel is carried out by system operation without human intervention. The aim is to perform the series of preventive maintenance tasks quickly and to prevent a system outage caused by delayed maintenance work.

In this example of a management system, the statistical failure information monitoring processing unit reports statistical failure information to the maintenance support center as a failure list each time a monitoring reference value is reached, for example 100 occurrences of correctable errors for a memory or 80 occurrences of read errors for a magnetic tape device. At the maintenance support center, the operating state analysis and evaluation processing unit then flags a part as a maintenance target part when a limit value serving as the determination criterion is reached, for example a cumulative 1000 occurrences of correctable errors for a memory or a cumulative 100 occurrences of read errors for a magnetic tape device.
Patent Document 1: JP-A-4-127247

In the conventional management system described above, statistical processing is performed on all failure information, and no consideration is given to failures that occur secondarily under the influence of another failure. Consequently, all failure information reported to the maintenance support center is evaluated against the preset limit values (thresholds) of the devices that make up the system, and maintenance target parts to be inspected or replaced are detected on that basis.

That is, in the conventional system, among multiple abnormalities and faults, failures that occurred secondarily under the influence of a main-cause failure are handled in the same way as the main-cause failure itself and are included in the evaluation used to detect maintenance target parts. As a result, normal parts and devices that do not actually need maintenance are detected as maintenance target parts. Maintenance work such as inspection and parts replacement is then performed more often than necessary, which increases parts cost and maintenance work cost.

The present invention has been made to solve the above problems, and its object is to improve the accuracy of determining maintenance target parts that require inspection or replacement and to reduce parts cost and work cost.

A system integration management system according to the present invention includes a maintenance target system and a system integration management unit that compiles statistics on, analyzes, and manages failure information for each device constituting the maintenance target system. The maintenance target system includes a first data collection function that collects failure information generated during system operation, either periodically or when a failure occurs; a first aggregation/calculation function that aggregates the failure information collected by the first data collection function and applies predetermined processing; and a first database that receives the failure information collected by the first data collection function, directly or via the first aggregation/calculation function, and stores it. The system integration management unit includes a second data collection function that collects the failure information held in the first database; a second aggregation/calculation function that aggregates the failure information collected by the second data collection function and applies predetermined processing; a second database that receives the failure information collected by the second data collection function, directly or via the second aggregation/calculation function, and stores it; a main cause determination function that determines, from among a plurality of pieces of failure information held in the second database, the failure information that is the main cause; and a statistical processing function that performs statistical processing on the failure information in the second database determined to be the main cause by the main cause determination function and creates statistical failure information for each device.

According to the present invention, the system integration management unit includes a main cause determination function that determines, from among a plurality of pieces of failure information held in the second database, the failure information that is the main cause, and a statistical processing function that performs statistical processing on the failure information determined to be the main cause and creates statistical failure information for each device. Failures that occurred secondarily under the influence of the main-cause failure are thereby excluded, and only main-cause failures are statistically processed and evaluated, which improves the accuracy of determining maintenance target parts.

Embodiment 1.
Embodiment 1, which is the best mode for carrying out the present invention, is described below with reference to the drawings. FIG. 1 is a schematic diagram showing the system integration management system according to Embodiment 1 of the present invention. The system integration management system of this embodiment includes an intranet (IP network) 10 made up of wide area monitoring control servers 2A and 2B, a network management server 3, a system integration management server 4, and a maintenance support center 6, and a non-IP network 20 made up of a non-IP network monitoring device 5. In the figure, the communication line 8a drawn as a solid line is an IP-connected communication line, and the communication line 8b drawn as a dotted line is a non-IP-connected communication line.

The system integration management system of this embodiment comprises maintenance target systems, such as the wide area monitoring control servers 2A and 2B, the network management server 3, and the non-IP network monitoring device 5, and the system integration management server 4, which is the system integration management unit that compiles statistics on, analyzes, and manages failure information such as abnormalities and faults for each device constituting these maintenance target systems.

The wide area monitoring control server 2A contains a data collection function 21 (the first data collection function), a data distribution function 22, an aggregation/calculation function 23 (the first aggregation/calculation function), and a monitoring control database 24 (the first database), which stores monitoring control data. Data collected by the data collection function 21 is stored in the monitoring control database 24 directly or via the aggregation/calculation function 23. The wide area monitoring control server 2B is configured in the same way, so its description is omitted.

The network management server 3 contains a data collection function 31, a data distribution function 32, an aggregation/calculation function 33, and a network management database 34 that stores network management data. Data collected by the data collection function 31 is stored in the network management database 34 directly or via the aggregation/calculation function 33.

The system integration management server 4 contains a data collection function 41 (the second data collection function), a data distribution function 42, an aggregation/calculation function 43 (the second aggregation/calculation function), a statistical processing function 44, a system management database 45 (the second database), which stores system management data, and a main cause determination function 46. Data collected by the data collection function 41 is stored in the system management database 45 directly or via the aggregation/calculation function 43. The main cause determination function 46 determines the main cause when a plurality of pieces of failure information appear in succession in the system management data. The main cause determination function 46 is described in detail later.

The wide area monitoring control servers 2A and 2B, the network management server 3, and the system integration management server 4 are connected via L2 switches 7a and 7b and the communication line (IP connection) 8a and, together with the maintenance support center 6, form the intranet (IP network) 10. The L2 switches 7a and 7b are communication devices with a line or packet switching function and perform forwarding control based on IP addresses. Although this embodiment uses L2 switches, the switches are not limited to this type; for example, L3 switches may be used instead. The maintenance support center 6 is described in detail later.

Further, the non-IP network 20, made up of the non-IP network monitoring device 5 managed by SDH, is connected to the IP network 10 via an L2 switch 7c so that data can be exchanged between the two networks. SDH is an international standard for high-speed digital communication over optical fiber and is used, for example, for Internet backbone lines.

Next, the operation is described. In the IP network 10, information is collected from and stored for communication devices connected to the network, such as routers (devices that relay data flowing on one network to another network), computers, and terminals, using SNMP, PING, and the like, and these devices are monitored and controlled over the network. In the wide area monitoring control server 2A, the data collection function 21 collects data periodically or when a state change occurs (when a failure occurs), and the collected data is stored in the monitoring control database 24 directly or via the aggregation/calculation function 23. Data distributed by the data distribution function 22 of the wide area monitoring control server 2A is displayed on the screen of the monitoring terminal 1.

In the network management server 3, the data collection function 31 collects information on SNMP-capable devices, such as network equipment and PCs on the LAN, as MIB (Management Information Base) data, and information on PING-capable devices as PING response results. The collected data is stored in the network management database 34 and compared with preset thresholds; if a threshold is exceeded, an abnormality is notified. The normal and failure states of the network management server 3 are displayed on the screen of the monitoring terminal 1.
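The store-then-compare step performed on collected data can be sketched as follows. This is a minimal illustration only: the metric names, the threshold values, and the `Sample` and `check_samples` names are assumptions made for the example and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    device: str
    metric: str   # hypothetical metric name, e.g. a PING round-trip time
    value: float


# Hypothetical preset thresholds; the specification gives no concrete values.
THRESHOLDS = {"ping_rtt_ms": 200.0, "if_in_errors": 50.0}


def check_samples(samples, thresholds=THRESHOLDS):
    """Store every sample and return those whose value exceeds its threshold."""
    stored = []      # stands in for the network management database
    abnormal = []    # samples for which an abnormality would be notified
    for s in samples:
        stored.append(s)
        limit = thresholds.get(s.metric)
        if limit is not None and s.value > limit:
            abnormal.append(s)
    return stored, abnormal
```

Every sample is stored whether or not it is abnormal, which matches the description that collected data is accumulated and, separately, compared with thresholds.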

Further, in the system integration management server 4, the data collection function 41 collects the data of the monitoring control database 24 held by the wide area monitoring control servers 2A and 2B, the data of the network management database 34 held by the network management server 3, and the data from the non-IP network monitoring device 5, and the collected data is stored in the system management database 45 directly or via the aggregation/calculation function 43.

Next, the main cause determination function 46 of the system integration management system in this embodiment is described with reference to FIG. 2. FIG. 2(a) is a state change list showing failures detected by the wide area monitoring control server 2A, and FIG. 2(b) is a state change list showing failures detected by the network management server 3. FIG. 2(a) shows that the wide area monitoring control system detected three failures in succession: missing measurements for data 11 and data 24, and a communication abnormality between the Web server and the DB server. FIG. 2(b) shows that about one minute before these failures occurred in the wide area monitoring control system, the network management server 3 detected line disconnections (Down) on the collection path of the corresponding data, that is, data 11 and data 24.

Further, FIG. 2(b) shows that the line disconnection of port 01 of router_01 and the line disconnections of port 02 and port 03 of router_02 occurred in succession, with the disconnection of port 01 of router_01 occurring first. The IP addresses and connection destination IP addresses show that port 01 of router_01 and port 02 of router_02 are connected to each other. Port 02 and port 03 of router_02 are connected inside router_02.

Estimating the cause of the series of failures from this information, the main cause is the line disconnection of port 01 of router_01, and the line disconnections of port 02 and port 03 of router_02 and the missing measurements of data 11 and data 24 are presumed to be secondary failures caused by it. By presuming that the disconnection of port 01 of router_01 caused port 02 and port 03 of router_02 to disconnect, which in turn caused the missing measurements of data 11 and data 24, router_01 is determined to be the only device that may be faulty.

In this embodiment, when the main cause determination function 46 determines the main cause from among a plurality of pieces of failure information, it selects the main cause from the network state change list held in the network management database 34 of the network management server 3. For example, missing data in the wide area monitoring control servers and fault data from the non-IP network monitoring device 5 are stored in the databases of their respective systems but are configured not to be treated as main causes. The main cause determination function 46 thus needs determination criteria that take into account the function of each device and its position within and between systems, so that, among many kinds of failures, it can identify those that can be a main cause and those that are likely to be secondary failures.

The easiest and most accurate factor for associating multiple pieces of failure information in the main cause determination function 46 is the time at which each failure occurred. Multiple failures that occur within a predetermined set time, for example within 30 seconds, of the time the first failure occurred are regarded as having been caused by a single main failure, and the main cause is determined from among them. The set time is not limited to 30 seconds and can be set to any value. Failures with different causes may happen to occur in succession by coincidence, but checking the connection destination address of each device makes it possible to confirm whether the failures are related and to distinguish them from unrelated ones.
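One way to read the time-window grouping and the connection-address check is sketched below. It is an illustration under stated assumptions, not the patented implementation: the `Failure` record, the `"network"` / `"monitoring"` source labels, the IP addresses used in any example data, and the rule that the earliest network-side failure in a window is taken as the main cause are all choices made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Failure:
    time: datetime
    source: str                     # "network" or "monitoring" (hypothetical labels)
    device: str
    port: str
    address: Optional[str] = None   # IP address of the failed port, if any
    peer: Optional[str] = None      # connection destination IP address, if any


def group_by_window(failures, window=timedelta(seconds=30)):
    """Group failures occurring within `window` of the first failure of a group."""
    groups = []
    for f in sorted(failures, key=lambda x: x.time):
        if groups and f.time - groups[-1][0].time <= window:
            groups[-1].append(f)
        else:
            groups.append([f])
    return groups


def is_linked(a, b):
    """Treat two port failures as related if they share a device or peer each other."""
    if a.device == b.device:
        return True
    return ((a.peer is not None and a.peer == b.address)
            or (b.peer is not None and b.peer == a.address))


def split_group(group):
    """Return (main_cause, secondary, unrelated) for one time-window group.

    Only network-side failures may be the main cause. If the group has none,
    the main cause is None and every failure is returned as unrelated.
    """
    network = sorted((f for f in group if f.source == "network"),
                     key=lambda x: x.time)
    if not network:
        return None, [], list(group)
    cause, remaining = network[0], network[1:]
    linked = [cause]
    changed = True
    while changed:                   # follow connections transitively
        changed = False
        for f in list(remaining):
            if any(is_linked(f, g) for g in linked):
                linked.append(f)
                remaining.remove(f)
                changed = True
    # Non-network failures in the same window are treated as secondary here.
    secondary = linked[1:] + [f for f in group if f.source != "network"]
    return cause, secondary, remaining
```

Applied to records shaped like the FIG. 2 example, `split_group` would pick the port 01 failure of router_01 as the main cause and classify the router_02 port failures and the missing-data events as secondary, while a simultaneous failure on a router with no matching addresses would be returned as unrelated.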

The statistical processing function 44 runs periodically and performs statistical processing on the failure information of the wide area monitoring control servers 2A and 2B, the network management server 3, and the non-IP network monitoring device 5 held in the system management database 45, classified by, for example, occurrence frequency per device, cause of occurrence, and supplying manufacturer. In doing so, it performs the statistical processing on the failure information determined to be the main cause by the main cause determination function 46 and creates the statistical failure information.
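The classification described here can be sketched as a set of counters over main-cause records only. The record layout (dictionaries with `device`, `cause`, `manufacturer`, and `is_main_cause` keys) is an assumption for this example; the specification does not define a schema.

```python
from collections import Counter


def build_statistics(records):
    """Count main-cause failures per device, per cause, and per manufacturer.

    Records flagged as secondary (is_main_cause == False) are ignored.
    """
    main = [r for r in records if r["is_main_cause"]]
    return {
        "by_device": Counter(r["device"] for r in main),
        "by_cause": Counter(r["cause"] for r in main),
        "by_manufacturer": Counter(r["manufacturer"] for r in main),
    }
```

Filtering before counting is the point of the design: a secondary failure never contributes to any device's total, so it cannot push a healthy device toward a threshold.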

Next, the notification function 47 of the system integration management server 4 in this embodiment is described with reference to FIG. 3. The system integration management server 4 includes a notification function 47 that, when the number of occurrences of failure information determined to be the main cause by the main cause determination function 46 exceeds a notification threshold preset for each device, reports the device and the failure information to the maintenance support center 6. Based on the notification status management table shown in FIG. 3, the notification function 47 reports to the maintenance support center 6 when the number of times a failure has been determined to be the main cause exceeds the preset notification threshold; for the line disconnection of port 01 of router_01, for example, it reports when that count exceeds 20.
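The notification rule can be sketched as a comparison of per-item main-cause counts against per-item thresholds. The key format (device, failure) and the function name are assumptions for this example; only the threshold of 20 for the port 01 disconnection of router_01 comes from the text. Whether a report is sent once or on every later occurrence is not stated, so the sketch simply lists the items currently over their threshold.

```python
def items_to_report(main_cause_counts, notification_thresholds):
    """Return the (device, failure) keys whose main-cause count exceeds its threshold."""
    return sorted(
        key
        for key, count in main_cause_counts.items()
        if key in notification_thresholds and count > notification_thresholds[key]
    )


# Example table in the spirit of FIG. 3; the second row is hypothetical.
thresholds = {
    ("router_01", "port 01 disconnection"): 20,
    ("router_02", "port 02 disconnection"): 20,
}
```

The comparison is strict because the text says the count must exceed the threshold, so a count of exactly 20 does not trigger a report.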

The maintenance support center 6 includes a maintenance part detection function 61 that, when the number of pieces of failure information reported by the notification function 47 exceeds an inspection threshold or replacement threshold preset for each device, extracts the part as a maintenance target part to be inspected or replaced. The maintenance part detection function 61 holds the database of inspection and replacement thresholds shown in FIG. 4. It compares the failure count, combining the information newly reported by the notification function 47 with the failure information accumulated so far, against the inspection threshold or replacement threshold shown in FIG. 4, and when the count exceeds the threshold it determines that the part is a maintenance target part to be inspected or replaced.
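The two-level check against inspection and replacement thresholds can be sketched as below. The function name, the returned labels, and the assumption that the replacement threshold is at least as large as the inspection threshold are choices made for this example; FIG. 4's actual values are not reproduced in the text.

```python
def classify_part(accumulated, newly_reported, inspection_threshold, replacement_threshold):
    """Return "replace", "inspect", or None for one device.

    Assumes replacement_threshold >= inspection_threshold, so replacement is
    checked first and takes precedence.
    """
    total = accumulated + newly_reported
    if total > replacement_threshold:
        return "replace"
    if total > inspection_threshold:
        return "inspect"
    return None
```

For instance, with a hypothetical inspection threshold of 40 and replacement threshold of 100, a device with 35 accumulated failures and 10 newly reported ones would be flagged for inspection.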

The maintenance support center 6 also includes an arrangement slip generation function 62 that arranges maintenance work for the maintenance target parts extracted by the maintenance part detection function 61. Creating an arrangement slip requires maintenance information for each device. FIG. 5 shows a database that stores, for each device, the model number, delivery date (year and month), latest inspection date (year and month), and the contact e-mail address and telephone number of the party responsible for maintenance (department and person in charge). The arrangement slip generation function 62 refers to this database to generate arrangement slips automatically.
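Slip generation amounts to a lookup in the per-device maintenance database followed by assembling a record. The sketch below assumes a dictionary keyed by device name and invents the field names and the slip layout; only the kinds of information stored (model number, delivery and inspection dates, department, person in charge, e-mail, telephone) come from the description of FIG. 5.

```python
from dataclasses import dataclass


@dataclass
class DeviceInfo:
    model: str
    delivered: str        # year and month of delivery
    last_inspected: str   # year and month of the latest inspection
    department: str
    person: str
    email: str
    phone: str


def make_slip(device, action, database):
    """Build an arrangement slip for `device`; raise KeyError if it is unknown."""
    info = database[device]
    return {
        "device": device,
        "action": action,          # e.g. "inspect" or "replace"
        "model": info.model,
        "delivered": info.delivered,
        "last_inspected": info.last_inspected,
        "send_to": f"{info.department} / {info.person} <{info.email}>",
        "phone": info.phone,
    }
```

Raising on an unknown device, rather than emitting a partial slip, keeps incomplete work orders from being dispatched; that behaviour is a design choice of this sketch, not something the specification prescribes.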

Further, the maintenance support center 6 includes a work plan generation function 63 that, for each maintenance target part for which maintenance work has been arranged by the arrangement slip generation function 62, selects the maintenance personnel to dispatch based on the content of the work, the scope of duties of the maintenance personnel, and their work status, and issues work instructions. FIG. 6 shows a database that stores the scope of duties and work schedules of the maintenance personnel. The work plan generation function 63 refers to this database, compares the content of the work with each person's scope of duties, confirms that the work status of the matching person is 1 (in-office duty), that is, that the person is present, and then generates the work plan automatically.
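Personnel selection can be sketched as a filter on scope of duties and work status. The record layout and the names below are assumptions for this example; the only value taken from the text is that status 1 means in-office duty. Any other status code is treated as unavailable.

```python
IN_OFFICE = 1  # work status code given in the description; other codes are not specified


def select_staff(task, staff):
    """Return the name of the first person whose scope covers `task` and who is in office."""
    for person in staff:
        if task in person["scope"] and person["status"] == IN_OFFICE:
            return person["name"]
    return None


# Hypothetical rows in the spirit of FIG. 6.
staff = [
    {"name": "Sato", "scope": {"router replacement"}, "status": 2},
    {"name": "Suzuki", "scope": {"router replacement", "router inspection"}, "status": 1},
]
```

A `None` result signals that nobody suitable is currently available; how the real system handles that case is not described in the specification.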

As described above, according to this embodiment, the system integration management server 4, which has the data collection function 41 for collecting the data of the monitoring control database 24 held by the wide area monitoring control servers 2A and 2B, the data of the network management database 34 held by the network management server 3, and the data from the non-IP network monitoring device 5, is provided with the main cause determination function 46, which determines the main-cause failure information from among a plurality of pieces of failure information held in the system management database 45, and the statistical processing function 44, which performs statistical processing on the failure information determined to be the main cause by the main cause determination function 46 and creates statistical failure information for each device. Failures that occurred secondarily under the influence of the main-cause failure are thereby excluded, and only main-cause failures are statistically processed and evaluated, which improves the accuracy of determining maintenance target parts. As a result, maintenance work such as inspection and parts replacement is needed less often, and parts cost and work cost can be greatly reduced.

Further, because the main cause determination function 46 regards multiple failures that occur within a predetermined time as having been caused by a single main failure and determines the main-cause failure from among them, the main cause of a failure can be determined easily and accurately.

Further, the notification function 47 reports to the maintenance support center 6 based on the number of occurrences of failure information determined to be the main cause by the main cause determination function 46. In the maintenance support center 6, the maintenance part detection function 61 extracts maintenance target parts based on the number of pieces of failure information reported by the notification function 47, the arrangement slip generation function 62 arranges maintenance work for those parts, and the work plan generation function 63 selects the maintenance personnel to dispatch and issues work instructions. Unnecessary notifications, and the arrangement slips and work plans that would be created in response to them, are thereby eliminated, making maintenance operations markedly more efficient. Moreover, because this series of preventive maintenance tasks is carried out quickly by system operation without human intervention, a system outage caused by delayed maintenance work can be prevented.

The present invention is applicable to system integrated management systems that include a wide area monitoring control system for monitoring and controlling rivers, roads, dams, buildings, and the like.

A schematic diagram showing the system integrated management system in Embodiment 1 of the present invention.
A diagram explaining the main cause determination function of the system integrated management system in Embodiment 1 of the present invention.
A diagram explaining the notification function of the system integrated management system in Embodiment 1 of the present invention.
A diagram explaining the maintenance part detection function of the system integrated management system in Embodiment 1 of the present invention.
A diagram explaining the arrangement slip generation function of the system integrated management system in Embodiment 1 of the present invention.
A diagram explaining the work plan generation function of the system integrated management system in Embodiment 1 of the present invention.

Explanation of symbols

1 monitoring terminal, 2A, 2B wide area monitoring control servers, 21 data collection function, 22 data distribution function, 23 aggregation/calculation function, 24 monitoring control database,
3 network management server, 31 data collection function, 32 data distribution function, 33 aggregation/calculation function, 34 network management database,
4 system integrated management server, 41 data collection function, 42 data distribution function, 43 aggregation/calculation function, 44 statistical processing function, 45 system management database, 46 main cause determination function, 47 notification function, 5 non-IP network monitoring device,
6 maintenance support center, 61 maintenance part detection function, 62 arrangement slip generation function, 63 work plan generation function, 7a, 7b, 7c L2 switches,
8a communication line (IP connection), 8b communication line (non-IP connection),
10 intranet (IP network), 20 non-IP network.

Claims

1. A system integrated management system comprising a maintenance target system and a system integrated management unit that statistically analyzes and manages failure information for each device constituting the maintenance target system, wherein
the maintenance target system comprises: a first data collection function that collects failure information arising during system operation, either periodically or when a failure occurs; a first aggregation/calculation function that aggregates the failure information collected by the first data collection function and performs predetermined processing on it; and a first database that receives the failure information collected by the first data collection function, directly or via the first aggregation/calculation function, and accumulates and holds it, and
the system integrated management unit comprises: a second data collection function that collects the failure information held in the first database; a second aggregation/calculation function that aggregates the failure information collected by the second data collection function and performs predetermined processing on it; a second database that receives the failure information collected by the second data collection function, directly or via the second aggregation/calculation function, and accumulates and holds it; a main cause determination function that determines, from among the plural pieces of failure information held in the second database, the failure information that is the main cause; and a statistical processing function that performs statistical processing on the failure information held in the second database that the main cause determination function has determined to be the main cause, and creates statistical failure information for each device.

2. The system integrated management system according to claim 1, wherein the main cause determination function regards plural failures occurring within a predetermined time as having been caused by one main failure, and determines the main-cause failure from among them.

3. The system integrated management system according to claim 1, further comprising a notification function that, when the number of occurrences of failure information determined to be the main cause by the main cause determination function exceeds a notification threshold preset for each device, reports the device and the failure information to a maintenance support center.

4. The system integrated management system according to claim 3, wherein the maintenance support center has a maintenance part detection function that, when the number of occurrences of the failure information notified by the notification function exceeds an inspection threshold or a replacement threshold preset for each device, extracts the relevant part as a maintenance target part to be inspected or replaced.

5. The system integrated management system according to claim 4, wherein the maintenance support center has an arrangement slip generation function that arranges maintenance work for the maintenance target part extracted by the maintenance part detection function.

6. The system integrated management system according to claim 5, wherein the maintenance support center has a work plan generation function that selects the maintenance personnel to dispatch, based on the content of the work to be performed on the maintenance target part for which the arrangement slip generation function has arranged maintenance work and on the maintenance personnel's scope of work and work status, and gives them work instructions.