JP2010087834A

JP2010087834A - Network monitoring system

Info

Publication number: JP2010087834A
Application number: JP2008254567A
Authority: JP
Inventors: Daisuke Yamashita; 乃丞山下; Misaki Kakuno; みさき角野; Shinichi Watabe; 伸一渡部; Hirokazu Nagai; 浩和永井; Keisuke Tagami; 啓介田上
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2008-09-30
Filing date: 2008-09-30
Publication date: 2010-04-15

Abstract

PROBLEM TO BE SOLVED: To provide a system in which network elements monitored by monitoring servers continue to be monitored without interruption even if any one of the monitoring servers stops.

SOLUTION: Each of a plurality of monitoring servers monitors predetermined monitoring-target network elements among a plurality of network elements constituting a communication network. Each monitoring server receives allocation information for extracting its own monitoring-target network elements from among the plurality of network elements, and monitors the network elements corresponding to the allocation information. When a management server connected to the plurality of monitoring servers detects that any one of the monitoring servers has stopped monitor processing, the management server transmits, to the monitoring servers other than the stopped one, allocation information for extracting network elements that include those previously monitored by the stopped monitoring server.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a network monitoring system that monitors the states of a plurality of network elements constituting an IP network.

Conventionally, network monitoring systems have been used that communicate remotely with network elements (hereinafter also referred to as NEs), such as the switches, routers, hosts, and other network devices and computer devices that constitute an IP (Internet Protocol) network, and monitor the state of those NEs.

For example, FIG. 9 shows an example of a network monitoring system in which a monitoring server 700 monitors the states of a plurality of NEs 600 (NE 600-1, NE 600-2, NE 600-3, ...) constituting a network. The monitoring server 700 periodically, for example every five minutes, transmits an information acquisition request to each NE 600 based on ICMP (Internet Control Message Protocol) or SNMP (Simple Network Management Protocol). If some abnormality has occurred in an NE 600 when it receives such a request, the NE 600 responds with error information indicating that the abnormality has occurred. The monitoring server 700 accumulates the received error information as an error list and shows the list on its own display. By looking at the error list displayed by the monitoring server 700, the network administrator can learn of abnormalities that have occurred in the NEs 600 and take appropriate measures, such as detecting failures early and restoring service.

Because the line connecting the NEs 600 to the monitoring server 700 and the hardware performance of the monitoring server 700 itself are limited, the number of NEs 600 that one monitoring server 700 can monitor is also limited. When there are more NEs 600 than one monitoring server 700 can handle, a plurality of monitoring servers 700 are therefore used. For example, FIG. 10 shows an example in which a large number of NEs 600 (NE 600-1 to NE 600-9, ...) are monitored by a plurality of monitoring servers 700 (monitoring server 700-1, monitoring server 700-2, monitoring server 700-3, ...). In the example of FIG. 10, the monitoring servers 700 share the monitoring of the NEs 600 constituting the network. For example, the monitoring server 700-1 monitors NE 600-1, NE 600-2, NE 600-3, ...; the monitoring server 700-2 monitors NE 600-4, NE 600-5, NE 600-6, ...; and the monitoring server 700-3 monitors NE 600-7, NE 600-8, NE 600-9, .... By sharing the work in this way, a plurality of monitoring servers 700 can monitor more NEs 600 than a single monitoring server 700 could.

Patent Document 1 proposes a technique for continuing monitoring through another communication path when the monitoring path from the monitoring server to a network element is disconnected.
Patent Document 2 proposes a technique for load distribution in a network system that communicates based on VRRP (Virtual Router Redundancy Protocol), in which the operating router is switched according to the processing load of the routers.
JP 2000-13373 A
JP 2003-46539 A

However, when a plurality of monitoring servers 700 each monitor different NEs 600 and one monitoring server 700 fails, the state of the NEs 600 assigned to the failed monitoring server 700 can no longer be monitored, and abnormalities occurring in those NEs 600 go undetected. One way to improve the availability of a network monitoring system based on monitoring servers 700 is to prepare several monitoring servers 700 for the same set of NEs 600 and configure a dual system or a duplex system. However, the computer devices that make up a monitoring server 700 are often expensive. In particular, where several monitoring servers 700 already monitor a large number of NEs 600, preparing redundant computer devices for each of them incurs substantial introduction and operating costs.

The present invention has been made in view of this situation. As a countermeasure against the failure of any of a plurality of monitoring servers that monitor a network, it provides a network monitoring system that improves availability without interrupting the monitoring of the monitored NEs 600 and without installing additional monitoring servers 700 in advance.

To solve the above problem, the present invention is a network monitoring system comprising: a plurality of monitoring servers, each of which monitors predetermined monitoring-target network elements among a plurality of network elements constituting a communication network and receives error information transmitted from those network elements; and a management server that stores the error information transmitted from each of the monitoring servers. Each monitoring server includes a receiving unit that receives, from the management server, allocation information for extracting its own monitoring-target network elements from the plurality of network elements, and a network element monitoring unit that monitors the monitoring-target network elements corresponding to the allocation information received by the receiving unit. The management server includes a detection unit that detects that the monitoring processing of network elements by any one of the monitoring servers has stopped, and an allocation information transmission unit that, when the detection unit detects such a stop, transmits to the monitoring servers other than the stopped monitoring server allocation information for extracting, as monitoring targets, network elements including those that the stopped monitoring server had been monitoring.

Further, in the present invention, the allocation information transmitted by the allocation information transmission unit of the management server is allocation information with which each of the monitoring servers other than the stopped monitoring server extracts its monitoring-target network elements from all of the plurality of network elements constituting the communication network.

Further, in the present invention, the allocation information transmitted by the allocation information transmission unit of the management server is allocation information with which the monitoring servers other than the stopped monitoring server extract, as additional monitoring targets, only the network elements that the stopped monitoring server had been monitoring.
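The second variant above, in which surviving servers keep their existing targets and only take over the stopped server's network elements, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `reassign_orphans`, the dictionary layout, and the round-robin hand-out are all assumptions, since the text does not say how the orphaned elements are divided among the survivors.

```python
def reassign_orphans(assignments, stopped_server):
    """Give the stopped server's NEs to the surviving servers.

    Existing targets of the survivors are left unchanged. The round-robin
    hand-out is an assumption made for illustration only.
    """
    survivors = sorted(s for s in assignments if s != stopped_server)
    if not survivors:
        raise ValueError("no surviving monitoring server")
    result = {s: list(assignments[s]) for s in survivors}
    for i, ne_id in enumerate(assignments[stopped_server]):
        result[survivors[i % len(survivors)]].append(ne_id)
    return result


# Initial split taken from the embodiment (three monitoring servers, NEs 1 to 9).
initial = {"200-1": [3, 6, 9], "200-2": [1, 4, 7], "200-3": [2, 5, 8]}
after = reassign_orphans(initial, "200-3")
```

In this sketch the survivors never drop an element they were already monitoring, which is the property that distinguishes this variant from a full reallocation over all network elements.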

Further, in the present invention, each monitoring server performs monitoring processing of its monitoring-target network elements at a predetermined monitoring cycle. The management server includes a monitoring server information storage unit that stores monitoring server information indicating the processing capacity of each monitoring server. When the processing load of a monitoring server's monitoring processing, determined from the network elements allocated to that monitoring server by the allocation information transmitted by the allocation information transmission unit and from the monitoring server information corresponding to that monitoring server, exceeds a predetermined threshold, the management server transmits to that monitoring server information indicating that the monitoring cycle is to be extended.
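A rough sketch of this threshold check is given below. The load model is an assumption: the text only says the load is derived from the allocated network elements and the monitoring server information, so the formula (number of NEs times an assumed per-NE polling time, divided by the cycle length), the name `plan_cycle`, and the default threshold of 0.5 are all hypothetical.

```python
import math


def plan_cycle(ne_count, seconds_per_ne, cycle_seconds, threshold=0.5):
    """Return the monitoring cycle to use for one monitoring server.

    The load is estimated as the fraction of the cycle spent polling
    (an assumed model). If it exceeds the threshold, the cycle is
    extended just enough to bring the estimate back to the threshold.
    """
    load = ne_count * seconds_per_ne / cycle_seconds
    if load <= threshold:
        return cycle_seconds
    return math.ceil(ne_count * seconds_per_ne / threshold)


unchanged = plan_cycle(100, 1.0, 300)  # light load: the 5-minute cycle is kept
extended = plan_cycle(400, 1.0, 300)   # heavy load: the cycle is lengthened
```

Extending the cycle trades detection latency for continued coverage, which matches the stated goal of not interrupting monitoring after a server stops.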

Further, in the present invention, the monitoring server information stored in the monitoring server information storage unit of the management server includes information that indicates the processing capacity of the monitoring server in terms of its CPU usage rate and memory amount.

Further, in the present invention, the monitoring server information stored in the monitoring server information storage unit of the management server includes information that indicates the processing capacity of the monitoring server in terms of the time required for its monitoring processing of network elements.

As described above, according to the present invention, each of a plurality of monitoring servers that monitor predetermined monitoring-target network elements among a plurality of network elements constituting a communication network receives allocation information for extracting its own monitoring-target network elements from the plurality of network elements, and monitors the network elements corresponding to that allocation information. When the management server, which stores the error information transmitted from each of the monitoring servers, detects that the monitoring processing of any one of the monitoring servers has stopped, it transmits to the monitoring servers other than the stopped one allocation information for extracting, as monitoring targets, network elements including those that the stopped monitoring server had been monitoring. Consequently, even when one of the monitoring servers stops, the other monitoring servers can monitor the network elements that the stopped monitoring server had been monitoring, and monitoring of the monitoring-target network elements can continue without interruption.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 shows the configuration of a network system 1 according to this embodiment. The network system 1 includes a plurality of NEs (network elements) 100 (NE 100-1 to NE 100-9, ...), a plurality of monitoring servers 200 (monitoring server 200-1, monitoring server 200-2, monitoring server 200-3, ...), a management server 300, an AP server 400, and a plurality of monitoring client terminals 500 (monitoring client terminal 500-1, monitoring client terminal 500-2, monitoring client terminal 500-3, ...). The numbers of NEs 100, monitoring servers 200, management servers 300, AP servers 400, and monitoring client terminals 500 may be chosen according to the scale of the IP network (a) formed by the NEs 100, the performance of each device, and so on.

The NEs 100 are the network devices, such as switches, routers, and host servers, and the computer devices that constitute the IP network (a); they are collectively called network elements. Each NE 100 receives and responds to information acquisition requests based on SNMP or ICMP transmitted from the monitoring server 200 that monitors it. If an error has occurred in the NE 100, it transmits error information in response to the received information acquisition request. The error information includes the identification information of the NE 100 itself and the content of the error.

Each monitoring server 200 is a computer device that has a CPU (central processing unit), memory, an HDD (hard disk drive), and so on, and that communicates based on ICMP or SNMP with the NEs 100 defined as its own monitoring targets in order to monitor their state. In the initial state of this embodiment, the monitoring target of the monitoring server 200-1 is the IP network (a-1), which is the part of the IP network (a) made up of NE 100-3, NE 100-6, NE 100-9, .... Similarly, the monitoring target of the monitoring server 200-2 is the IP network (a-2), made up of NE 100-1, NE 100-4, NE 100-7, ..., and the monitoring target of the monitoring server 200-3 is the IP network (a-3), made up of NE 100-2, NE 100-5, NE 100-8, ....

FIG. 2 shows the configurations of the monitoring servers 200 and the management server 300 in detail. The monitoring server 200-1 includes an NE information storage unit 210-1, a monitoring target NE extraction unit 220-1, a communication unit 230-1, an NE monitoring unit 240-1, and an OS (operating system) 250-1. Because all of the monitoring servers 200 have the same configuration, the monitoring server 200-1 is described here as a representative, and suffixes such as "-1" that identify a specific monitoring server 200 are omitted.

The NE information storage unit 210 stores NE information on all of the NEs 100 in the IP network (a), regardless of whether its own monitoring server 200 monitors them. FIG. 3 shows an example of the NE information stored in the NE information storage unit 210. The NE information associates NE identification information, which identifies an NE 100, with information such as the corresponding IP address. In this embodiment, the number (N) in the NE identification information corresponds to the number appended to NE 100 in FIG. 1 (NE 100-N): NE identification information "1" denotes NE 100-1 in FIG. 1, NE identification information "2" denotes NE 100-2, and so on. The NE information is transmitted from the management server 300 when the monitoring server 200 starts, and is then stored in the NE information storage unit 210.

Returning to FIG. 2, the monitoring target NE extraction unit 220 extracts, from the NE information stored in the NE information storage unit 210, the identification information of the NEs 100 that its own monitoring server 200 is to monitor, based on the allocation information transmitted from the management server 300 and a predetermined allocation condition. For example, the allocation information received from the management server 300 consists of the number of monitoring servers 200 in the activated state, the allocation number of each monitoring server 200, and the NE information. As the predetermined allocation condition, a monitoring server can take as its own monitoring targets those NEs whose NE identification information, when divided by the number of monitoring servers 200 in the activated state, leaves a remainder equal to the server's own allocation number.

For example, suppose that three servers, the monitoring server 200-1, the monitoring server 200-2, and the monitoring server 200-3, are in the activated state, and that the allocation information assigns allocation number "0" to the monitoring server 200-1, "1" to the monitoring server 200-2, and "2" to the monitoring server 200-3, with NE identification information 1 to 9 included in the NE information. When each monitoring server 200 extracts its monitoring targets from this allocation information under the allocation condition described above, the monitoring targets of the monitoring server 200-1 are NE 100-3, NE 100-6, and NE 100-9, namely those NEs among "1 to 9" whose identification number leaves a remainder of "0" when divided by "3", the number of activated monitoring servers 200. Similarly, the monitoring targets of the monitoring server 200-2 are NE 100-1, NE 100-4, and NE 100-7, and those of the monitoring server 200-3 are NE 100-2, NE 100-5, and NE 100-8.
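The remainder-based allocation condition can be illustrated with a short sketch. The function name `extract_targets` is hypothetical, and the two-server case at the end is an assumed illustration of what happens after one server stops, with the survivors renumbered 0 and 1. The three-server case reproduces the example given in the text.

```python
def extract_targets(ne_ids, active_count, allocation_number):
    """Return the NE IDs whose remainder modulo active_count equals allocation_number."""
    if active_count <= 0:
        raise ValueError("active_count must be positive")
    return [ne_id for ne_id in ne_ids if ne_id % active_count == allocation_number]


ne_ids = range(1, 10)  # NE identification information 1 to 9

# Three active servers: 200-1 has number 0, 200-2 has 1, 200-3 has 2.
three_servers = {n: extract_targets(ne_ids, 3, n) for n in range(3)}

# Assumed follow-up: 200-3 stops, leaving 200-1 (number 0) and 200-2 (number 1).
two_servers = {n: extract_targets(ne_ids, 2, n) for n in range(2)}
```

Because every server holds the full NE information and applies the same rule, the management server only needs to send the active count and each server's allocation number to change the split.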

The communication unit 230 transmits and receives information to and from the management server 300 via the network.
The NE monitoring unit 240 communicates, based on ICMP or SNMP, with the NEs 100 corresponding to the NE identification information extracted as monitoring targets by the monitoring target NE extraction unit 220, and monitors those NEs 100. At each predetermined cycle, the NE monitoring unit 240 polls the monitored NEs 100 with information acquisition requests. When it receives error information from an NE 100 indicating that an error has occurred, it transmits that error information to the management server 300 via the communication unit 230. The error information that the NE monitoring unit 240 sends to the management server 300 includes not only error information returned in response to an information acquisition request from the monitoring server 200, but also error information generated when an NE 100 does not respond and the request times out, and error information called an SNMP trap, which an NE 100 sends on its own initiative to report a change in its state when an abnormality occurs.
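The two polling-related kinds of error information (an error response and a timeout) can be sketched as below. No real ICMP or SNMP library is used: `send_request` is a hypothetical stand-in for the actual request, and the `ErrorInfo` record and its `kind` labels are assumptions. SNMP traps arrive outside the polling path and are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class ErrorInfo:
    ne_id: int
    kind: str    # "response" or "timeout" (labels are assumptions)
    detail: str


def poll_once(ne_id, send_request):
    """Poll one NE and return ErrorInfo, or None if the NE is healthy.

    send_request(ne_id) is a hypothetical callable that returns an error
    string, returns None when there is no error, or raises TimeoutError.
    """
    try:
        error = send_request(ne_id)
    except TimeoutError:
        return ErrorInfo(ne_id, "timeout", "no response to information acquisition request")
    if error is None:
        return None
    return ErrorInfo(ne_id, "response", error)


# Stub transports standing in for real ICMP/SNMP exchanges.
def healthy(ne_id):
    return None


def faulty(ne_id):
    return "interface down"


def silent(ne_id):
    raise TimeoutError
```

Anything other than `None` returned by `poll_once` would then be forwarded to the management server through the communication unit.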

The OS 250 is basic software that manages the hardware resources of the monitoring server 200, such as the CPU, memory, and HDD, and allocates those resources in response to requests from the functional units of the monitoring server 200. The OS 250 holds, in real time, the performance, capacity, usage rate, free capacity, and so on of each hardware resource. Because the OS 250 allocates hardware resources to each functional unit of the monitoring server 200, such as the NE monitoring unit 240, whether the process of a given functional unit is running can be determined from whether resources are currently allocated to it.

Returning to FIG. 1, the management server 300 is a computer device connected to the plurality of monitoring servers 200. It receives and stores the error information transmitted from the monitoring servers 200 and manages their operation. The detailed configuration of the management server 300 is described with reference to FIG. 2. The management server 300 includes an NE information storage unit 310, a communication unit 320, a monitoring server state acquisition unit 330, a monitoring server information storage unit 340, an allocation information transmission unit 350, an error information storage unit 360, and an OS 370.

The NE information storage unit 310 stores the NE information shown in FIG. 3. The NE information is stored in response to operation input from the administrator. When the monitoring server state acquisition unit 330 detects that a monitoring server 200 has started, the communication unit 320 transmits the NE information stored in the NE information storage unit 310 to that monitoring server 200.
The communication unit 320 transmits and receives information to and from each of the monitoring servers 200 via the network. For example, the communication unit 320 transmits NE information to the monitoring servers 200 and receives the error information that each monitoring server 200 has received from the NEs 100. The communication unit 320 also transmits the error information of the NEs 100 stored in the error information storage unit 360 in response to requests from the AP server 400 shown in FIG. 1.
The monitoring server state acquisition unit 330 communicates with the monitoring servers 200 via the communication unit 320, acquires information indicating the state of each monitoring server 200, and stores it in the monitoring server information storage unit 340. FIG. 4 shows an example of the monitoring server information stored in the monitoring server information storage unit 340. The monitoring server information associates monitoring server identification information with items such as a start flag, the polling required time, the number of CPUs, the CPU usage rate, the memory capacity, and the HDD capacity.

The monitoring server identification information in the monitoring server information identifies each of the monitoring servers 200. The start flag indicates whether the NE monitoring unit 240 of the monitoring server 200 identified by the monitoring server identification information is running or stopped. For example, the start flag is set to "1" if the NE monitoring unit 240 is running and to "0" if it is stopped. To determine this, the monitoring server state acquisition unit 330 can, for example, send a communication request to the process of the NE monitoring unit 240 via the OS 250 of the monitoring server 200; if there is a response, it judges the NE monitoring unit 240 to be running, and if there is no response, it judges the NE monitoring unit 240 to be stopped.
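The start-flag update can be sketched as a simple probe loop. The `probe` callable is a hypothetical stand-in for the communication request to the NE monitoring process, and the function name and return shape are assumptions. Treating `OSError` (which in Python covers connection failures and timeouts) as "no response" is also an illustrative choice.

```python
def refresh_start_flags(flags, probe):
    """Probe each monitoring server and return (new_flags, changed_server_ids).

    flags maps a server ID to its current start flag (1 running, 0 stopped).
    probe(server_id) is a hypothetical callable that returns a truthy value
    when the NE monitoring process responds.
    """
    new_flags = {}
    changed = []
    for server_id, old_flag in flags.items():
        try:
            alive = bool(probe(server_id))
        except OSError:  # connection refused, timeout, and similar failures
            alive = False
        new_flags[server_id] = 1 if alive else 0
        if new_flags[server_id] != old_flag:
            changed.append(server_id)
    return new_flags, changed


def example_probe(server_id):
    # Stub: monitoring server 200-3 no longer answers.
    if server_id == "200-3":
        raise ConnectionRefusedError
    return True


flags = {"200-1": 1, "200-2": 1, "200-3": 1}
new_flags, changed = refresh_start_flags(flags, example_probe)
```

A non-empty `changed` list corresponds to the state change that prompts the management server to generate new allocation information.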

The polling required time in the monitoring server information is the measured time from when the monitoring server 200 starts polling its monitored NEs 100 until it has received responses from all of them. The monitoring server state acquisition unit 330 acquires the polling required time from the NE monitoring unit 240 of the monitoring server 200 via the communication unit 320 and stores it in the monitoring server information storage unit 340.

The CPU item in the monitoring server information indicates performance such as the number of CPUs in the monitoring server 200 and their clock frequency. The CPU usage rate is the proportion of each unit of time during which software running on the monitoring server 200 occupies the CPU. For example, a value near 0% indicates that nothing is being executed, a value of 100% indicates that processing is being executed continuously, and a value that stays near 100% indicates that processing requests exceed the capacity of the CPU. The memory capacity indicates the capacity of the memory in the monitoring server 200. The HDD free capacity indicates the free space on the HDD of the monitoring server 200. The start flag and the machine performance information, such as the CPU, CPU usage rate, memory capacity, and HDD free capacity, can be acquired by communicating with the OS 250 of the monitoring server 200.

When the monitoring server state acquisition unit 330 detects that the NE monitoring unit 240 of a monitoring server 200 has started or stopped and its activation state has therefore changed, the allocation information transmission unit 350 generates allocation information based on the NE information stored in the NE information storage unit 310 and the monitoring server information stored in the monitoring server information storage unit 340, and transmits it to the monitoring servers 200 that are in the activated state. For example, as described above, the allocation information includes the number of monitoring servers 200 in the activated state, the NE information stored in the NE information storage unit 310, and the allocation number assigned to the monitoring server 200. The allocation number is a number that is greater than or equal to 0 and less than or equal to the number of monitoring servers 200 in the activated state, and is unique to each monitoring server 200.
The error information storage unit 360 stores the error information of the NEs 100 that is transmitted from the monitoring servers 200 and received by the communication unit 320.
The OS 370 is basic software that manages hardware resources such as the CPU, memory, and HDD of the management server 300 and allocates those resources in response to requests from the functional units of the management server 300.
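
As a rough illustration of the allocation information described above, the following Python sketch builds one record per activated monitoring server. The names `AllocationInfo` and `build_allocation_info`, the dictionary of activation flags, and the choice to number the activated servers consecutively from 0 in name order are assumptions made for this example; the embodiment only requires that each allocation number be unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationInfo:
    """Hypothetical container for the three items the allocation information carries."""
    active_count: int        # number of monitoring servers in the activated state
    ne_ids: tuple            # NE identification information from the NE information
    allocation_number: int   # number unique to the receiving monitoring server


def build_allocation_info(activation_flags, ne_ids):
    """Build allocation information for every activated monitoring server.

    activation_flags maps a server name to True (activated) or False (stopped).
    Numbering from 0 in sorted name order is an assumption of this sketch.
    """
    active = sorted(name for name, up in activation_flags.items() if up)
    return {
        name: AllocationInfo(len(active), tuple(ne_ids), number)
        for number, name in enumerate(active)
    }


# Monitoring server 200-1 has stopped; 200-2 and 200-3 remain activated.
info = build_allocation_info(
    {"200-1": False, "200-2": True, "200-3": True}, range(1, 10)
)
```

With these inputs the stopped server receives nothing, and the two remaining servers receive the same NE list with different allocation numbers.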

Returning to FIG. 1, the AP server 400 reads the error information of the NEs 100 stored in the error information storage unit 360 of the management server 300 in response to a request from the monitoring client terminal 500, and transmits it to the monitoring client terminal 500. In this embodiment, the AP server 400 includes a web service function unit and transmits the abnormality notification read from the management server 300 to the monitoring client terminal 500 using a protocol such as HTTP (HyperText Transfer Protocol).

The monitoring client terminal 500 is a computer terminal that outputs the states of the plurality of NEs 100 transmitted from the AP server 400. The monitoring client terminal 500 is used by the administrator of the IP network (a) and, in response to a request from the administrator, receives the error information of the monitoring-target NEs 100 from the AP server 400 and displays it. In this embodiment, the monitoring client terminal 500 includes a web browser function unit, communicates with the AP server 400, and outputs the error information transmitted from the AP server 400 to its own display. The monitoring client terminal 500 displays, for example, the date and time at which an error occurred in an NE 100, the error message corresponding to the error information, and the IP address of the NE 100 that transmitted the error information. From the abnormality notification displayed on the monitoring client terminal 500, the administrator of the IP network (a) can learn the state of the monitoring-target NEs 100 and any abnormality that has occurred, and can take appropriate measures such as early detection of and recovery from a failure.

Next, with reference to FIG. 5, an operation example is described in which the monitoring targets are reallocated when the NE monitoring unit 240-1 of the monitoring server 200-1 stops.
In the initial state, the monitoring servers 200-1, 200-2, and 200-3 are all in the activated state and are monitoring by polling, each transmitting information acquisition requests to its own monitoring-target NEs 100 shown in FIG. 1. When an error occurs in the NE monitoring unit 240-1 of the monitoring server 200-1 and its operation stops (step S1), the monitoring server state acquisition unit 330 of the management server 300 detects that the NE monitoring unit 240-1 of the monitoring server 200-1 has stopped (step S2).

Based on the NE information read from the NE information storage unit 310, the allocation information transmission unit 350 of the management server 300 generates allocation information for the monitoring server 200-2 and the monitoring server 200-3 and transmits it to them (step S3). Here, it is assumed that the number of activated monitoring servers 200 indicated in the allocation information is two, that the NE identification information included in the NE information is 1 to 9, that the allocation number of the monitoring server 200-2 is 0, and that the allocation number of the monitoring server 200-3 is 1.

On receiving the transmitted allocation information, the monitoring server 200-2 stores the NE information included in the allocation information in the NE information storage unit 210-2. The monitoring target NE extraction unit 220-2 of the monitoring server 200-2 extracts, from the NE information stored in the NE information storage unit 210-2, the NE identification information that it is to monitor, based on the allocation information received from the management server 300 and a predetermined allocation condition stored in its own storage area (step S4). Here, the NE identification information extracted by the monitoring target NE extraction unit 220-2 is that for which the remainder is 0 when the NE identification information "1 to 9" included in the NE information is divided by the number of activated monitoring servers 200, "2", namely NE 100-2, NE 100-4, NE 100-6, and NE 100-8. The NE monitoring unit 240-2 monitors the monitoring-target NEs 100 corresponding to the NE identification information extracted by the monitoring target NE extraction unit 220-2 (step S5).

Similarly, the monitoring target NE extraction unit 220-3 of the monitoring server 200-3 extracts the NE identification information that it is to monitor (step S6). Here, the NE identification information extracted by the monitoring target NE extraction unit 220-3 is that for which the remainder is 1 when the NE identification information "1 to 9" included in the NE information is divided by the number of activated monitoring servers 200, "2", namely NE 100-1, NE 100-3, NE 100-5, NE 100-7, and NE 100-9. The NE monitoring unit 240-3 monitors the monitoring-target NEs 100 corresponding to the NE identification information extracted by the monitoring target NE extraction unit 220-3 (step S7). FIG. 6 shows the monitoring state of the network system 1 after the monitoring targets have been reallocated in this way.
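
The remainder-based extraction in steps S4 and S6 can be sketched in a few lines of Python. The function name `extract_targets` and its parameter names are hypothetical; the allocation condition itself (NE identification number modulo the number of activated servers equals the allocation number) is the one used in the example above.

```python
def extract_targets(ne_ids, active_servers, allocation_number):
    """Return the NE IDs whose remainder, when divided by the number of
    activated monitoring servers, equals this server's allocation number."""
    return [ne_id for ne_id in ne_ids if ne_id % active_servers == allocation_number]


ne_ids = list(range(1, 10))  # NE identification information 1 to 9

# Monitoring server 200-2 has allocation number 0, 200-3 has allocation number 1.
server_2_targets = extract_targets(ne_ids, active_servers=2, allocation_number=0)
server_3_targets = extract_targets(ne_ids, active_servers=2, allocation_number=1)

print(server_2_targets)  # [2, 4, 6, 8]
print(server_3_targets)  # [1, 3, 5, 7, 9]
```

Because every NE number has exactly one remainder, each NE is extracted by exactly one server, so no NE is left unmonitored or monitored twice.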

In this way, in the network system 1 in which a plurality of monitoring servers 200 share the monitoring of a large number of NEs 100, when an error occurs in one of the monitoring servers 200 and it stops, the NEs 100 that the stopped monitoring server 200 was monitoring are reallocated to the other monitoring servers 200. As a result, even if a monitoring server 200 fails, the availability of monitoring of the NEs 100 can be increased without interrupting the monitoring of any of the monitoring-target NEs 100 and without adding monitoring servers 200 in advance. The allocation of monitoring targets that is performed by the monitoring servers 200 may instead be performed by the management server 300. In that case, allocation result information indicating the allocation of monitoring targets generated by the management server 300 is transmitted to each monitoring server 200, and each monitoring server 200 monitors its monitoring-target NEs 100 based on the allocation result information received from the management server 300.

<Second Embodiment>
Next, a second embodiment of the present invention will be described. The first embodiment described an example in which, when any one of the plurality of monitoring servers 200 stops, the management server 300 transmits allocation information to the operating monitoring servers 200 and the monitoring-target NEs 100 of the monitoring servers 200 are reallocated. If many monitoring servers 200 stop because of errors, maintenance work, or the like, and the number of operating monitoring servers 200 falls sharply, the operating monitoring servers 200 may be allocated more monitoring-target NEs 100 than they can process. Therefore, when the processing load of a monitoring server 200 is estimated to exceed a certain threshold, the monitoring cycle at which the monitoring server 200 monitors the NEs 100 may be extended so that the system runs in a degraded mode that does not overload the monitoring server 200.

In this case, for example, as shown in FIG. 7, the management server 300 is provided with a monitoring target NE extraction unit 380 and a monitoring cycle calculation unit 390.
The monitoring target NE extraction unit 380 performs the same processing as the monitoring target NE extraction unit 220 of the monitoring server 200. That is, the monitoring target NE extraction unit 380 extracts the identification information of the NEs 100 to be monitored by each monitoring server 200, based on the allocation information generated by the allocation information transmission unit 350 and a predetermined allocation condition.

The monitoring cycle calculation unit 390 calculates the cycle at which a monitoring server 200 polls its monitoring-target NEs 100. The monitoring cycle calculation unit 390 calculates the processing load of each monitoring server 200 based on the number of monitoring-target NEs 100 allocated to that monitoring server 200, as extracted by the monitoring target NE extraction unit 380, and on the machine performance of the monitoring server 200 (CPU usage rate, memory capacity, and so on) stored in the monitoring server information storage unit 340. When the calculated processing load of a monitoring server 200 exceeds a predetermined threshold, the monitoring cycle calculation unit 390 calculates a monitoring cycle for that monitoring server 200 at which the processing load falls below the threshold. The monitoring cycle calculated by the monitoring cycle calculation unit 390 is transmitted to the monitoring server 200 via the communication unit 320. The NE monitoring unit 240 of the monitoring server 200 polls its monitoring-target NEs 100 at a cycle corresponding to the monitoring cycle transmitted from the management server 300.
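
The description does not give a formula for the processing load, so the following Python sketch adopts a simple assumed model purely for illustration: the load is the number of allocated NEs divided by the number of NEs the server could poll within one cycle. The names `estimated_load` and `monitoring_cycle`, the `polls_per_second` capacity figure, and the rounding up to whole seconds are all assumptions of this example rather than part of the embodiment.

```python
import math


def estimated_load(ne_count, polls_per_second, cycle_seconds):
    """Assumed load model: fraction of one cycle spent polling all allocated NEs."""
    return ne_count / (polls_per_second * cycle_seconds)


def monitoring_cycle(ne_count, polls_per_second, base_cycle_seconds, threshold):
    """Keep the base cycle if the load is within the threshold; otherwise
    extend the cycle (rounded up to whole seconds) until it is."""
    if estimated_load(ne_count, polls_per_second, base_cycle_seconds) <= threshold:
        return base_cycle_seconds
    return math.ceil(ne_count / (polls_per_second * threshold))


# 300 NEs fit within a 60-second cycle; 900 NEs do not, so the cycle is extended.
normal_cycle = monitoring_cycle(300, polls_per_second=10, base_cycle_seconds=60, threshold=0.8)
extended_cycle = monitoring_cycle(900, polls_per_second=10, base_cycle_seconds=60, threshold=0.8)
print(normal_cycle, extended_cycle)  # 60 113
```

Under this model, extending the cycle from 60 to 113 seconds brings the estimated load for 900 NEs from 1.5 down to just under the 0.8 threshold.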

<Third Embodiment>
Next, a third embodiment of the present invention will be described. The second embodiment described an example in which, when reallocating the monitoring targets would cause the processing load of a monitoring server 200 to exceed a predetermined threshold, the monitoring cycle of the NEs 100 is extended according to the machine performance of the monitoring server 200 so that the system runs in a degraded mode. Alternatively, the monitoring cycle at which a monitoring server 200 monitors the NEs 100 may be calculated from the polling required time included in the monitoring server information stored in the monitoring server information storage unit 340, taking a safety factor into account. For example, the monitoring cycle calculation unit 390 may use the ratio between the number of NEs 100 monitored by a particular monitoring server 200 and its polling required time to estimate the polling required time for the larger number of monitoring targets after reallocation, and may then calculate the monitoring cycle for the NEs 100 as the estimated polling time multiplied by a safety factor (for example, 1.5).
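
The proportional estimate described above can be written out as a short Python sketch. The function name `estimate_cycle` and the sample figures (3 NEs polled in 12 seconds, rising to 5 NEs) are hypothetical; the 1.5 safety factor is the example value given in the text.

```python
def estimate_cycle(current_ne_count, measured_polling_seconds, new_ne_count,
                   safety_factor=1.5):
    """Scale the measured polling time linearly with the NE count, then
    multiply by the safety factor to obtain the monitoring cycle."""
    if current_ne_count <= 0:
        raise ValueError("current_ne_count must be positive")
    seconds_per_ne = measured_polling_seconds / current_ne_count
    estimated_polling_seconds = seconds_per_ne * new_ne_count
    return estimated_polling_seconds * safety_factor


# 12 s for 3 NEs gives 4 s per NE; 5 NEs are estimated at 20 s, so the cycle is 30 s.
cycle = estimate_cycle(current_ne_count=3, measured_polling_seconds=12.0, new_ne_count=5)
print(cycle)  # 30.0
```

The linear scaling is the simplest reading of "according to the ratio"; a real deployment might see polling time grow non-linearly, which is one reason a safety factor is applied.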

<Fourth Embodiment>
Next, a fourth embodiment of the present invention will be described. The first embodiment described an example in which, when any one of the plurality of monitoring servers 200 stops, the management server 300 transmits allocation information to all of the operating monitoring servers 200 and the monitoring-target NEs 100 of the monitoring servers 200 are reallocated. Alternatively, only the NEs 100 that the stopped monitoring server 200 was monitoring may be allocated to the operating monitoring servers 200. For example, as shown in FIG. 8, the NE 100-3, which was monitored by the monitoring server 200-1 in FIG. 1, is allocated to the monitoring server 200-2, and the NE 100-6 and NE 100-9, which were also monitored by the monitoring server 200-1, are allocated to the monitoring server 200-3. Compared with reallocating the monitoring targets of all the monitoring servers 200, this reduces the number of NEs 100 whose allocation changes, so the monitoring targets can be reallocated efficiently.
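
A minimal Python sketch of this partial reallocation follows. The description does not state how the orphaned NEs are divided among the surviving servers, so the rule used here (hand each orphaned NE to whichever surviving server currently has the fewest targets) is an assumption, as are the function name `reassign_orphans` and the starting assignments for 200-2 and 200-3. The resulting split is not claimed to match FIG. 8.

```python
def reassign_orphans(assignments, stopped):
    """Move only the stopped server's NEs to the surviving servers,
    leaving every other existing assignment untouched."""
    survivors = {name: list(nes) for name, nes in assignments.items() if name != stopped}
    if not survivors:
        raise ValueError("no operating monitoring server remains")
    for ne_id in assignments[stopped]:
        # Assumed rule: pick the surviving server with the fewest targets so far.
        target = min(survivors, key=lambda name: len(survivors[name]))
        survivors[target].append(ne_id)
    return survivors


before = {
    "200-1": [3, 6, 9],   # from the text: monitored by 200-1 in FIG. 1
    "200-2": [1, 4, 7],   # hypothetical
    "200-3": [2, 5, 8],   # hypothetical
}
after = reassign_orphans(before, stopped="200-1")
```

Only NEs 3, 6, and 9 change owner; the six NEs already handled by the surviving servers keep their original assignment, which is the efficiency gain this embodiment describes.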

In the embodiments described above, the monitoring-target NEs 100 are allocated according to the number of operating monitoring servers 200, but they may instead be allocated according to the machine performance of the operating monitoring servers 200. The machine performance may be, for example, the CPU usage rate or memory capacity included in the monitoring server information stored in the monitoring server information storage unit 340. For example, if the memory capacity of the monitoring server 200-2 is twice that of another monitoring server 200, the monitoring server 200-2 may be allocated twice as many NEs 100 as monitoring targets.
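
One way to picture performance-weighted allocation is the Python sketch below, which splits the NE list into contiguous blocks proportional to integer weights. The function name `allocate_by_weight`, the use of integer weights, and the contiguous-block split are assumptions made for this example; the description only says that a server with twice the memory may receive twice as many NEs.

```python
def allocate_by_weight(ne_ids, weights):
    """Split ne_ids into contiguous blocks whose sizes are proportional
    to the positive integer weight of each monitoring server."""
    total = sum(weights.values())
    shares = {}
    start = 0
    cumulative = 0
    for name, weight in weights.items():
        cumulative += weight
        end = len(ne_ids) * cumulative // total  # integer arithmetic, no rounding drift
        shares[name] = ne_ids[start:end]
        start = end
    return shares


# 200-2 has twice the memory of 200-3, so it gets a weight of 2 against 1.
shares = allocate_by_weight(list(range(1, 10)), {"200-2": 2, "200-3": 1})
print(shares)  # {'200-2': [1, 2, 3, 4, 5, 6], '200-3': [7, 8, 9]}
```

The same function could take weights derived from HDD free capacity or CPU headroom instead of memory.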

The monitoring-target NEs 100 may also be allocated according to the HDD free capacity of each monitoring server 200 stored in the monitoring server information storage unit 340. In particular, when the error information transmitted from the NEs 100 is temporarily accumulated in the monitoring servers 200, allocating more NEs 100 to the monitoring servers 200 with more HDD free capacity makes effective use of the plurality of monitoring servers 200.
Alternatively, one or more standby monitoring servers 200 may be connected to the network system 1 in advance in a stopped state, and the monitoring of the NEs 100 that a stopped monitoring server 200 was monitoring may be allocated to a standby monitoring server 200.

In the embodiments described above, degraded operation is performed by extending the monitoring cycle at which a monitoring server 200 monitors the NEs 100 according to the machine performance of the monitoring server 200 or the polling required time. Alternatively, when more than a certain number of monitoring servers 200 have stopped, degraded operation may be performed by multiplying the monitoring cycle of the operating monitoring servers 200 to which the monitoring-target NEs 100 are allocated by a predetermined fixed factor (for example, 2). Similarly, when the number of NEs 100 monitored by a single monitoring server 200 exceeds a certain number, degraded operation may be performed by multiplying the monitoring cycle of the operating monitoring servers 200 to which the monitoring-target NEs 100 are allocated by a predetermined fixed factor (for example, 2).
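
The two fixed-factor conditions above can be combined in a small Python sketch. The function name `degraded_cycle`, the parameter names, and the sample limits are hypothetical; the factor of 2 is the example value from the text.

```python
def degraded_cycle(base_cycle_seconds, stopped_servers, nes_per_server,
                   max_stopped_servers, max_nes_per_server, factor=2):
    """Multiply the monitoring cycle by a fixed factor when either limit is exceeded."""
    if stopped_servers > max_stopped_servers or nes_per_server > max_nes_per_server:
        return base_cycle_seconds * factor
    return base_cycle_seconds


# Within both limits: the cycle is unchanged.
unchanged = degraded_cycle(60, stopped_servers=1, nes_per_server=300,
                           max_stopped_servers=2, max_nes_per_server=500)
# Too many servers stopped: the cycle doubles.
doubled = degraded_cycle(60, stopped_servers=3, nes_per_server=300,
                         max_stopped_servers=2, max_nes_per_server=500)
print(unchanged, doubled)  # 60 120
```

Compared with calculating a load estimate, this rule needs no performance data at all, at the cost of a coarser adjustment.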

The network may also be monitored by recording a program for realizing the functions of the processing units of the present invention on a computer-readable recording medium and having a computer system read and execute the program recorded on that recording medium. The term "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" further includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that serves as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. The "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line. The program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

A diagram showing the configuration of a network system according to an embodiment of the present invention.
A block diagram showing the configuration of a management server and a monitoring server according to an embodiment of the present invention.
A diagram showing a data example of NE information according to an embodiment of the present invention.
A diagram showing a data example of monitoring server status information according to an embodiment of the present invention.
A diagram showing an operation example of a network system according to an embodiment of the present invention.
A diagram showing the configuration of a network system in which monitoring targets have been reallocated according to an embodiment of the present invention.
A block diagram showing the configuration of a management server and a monitoring server according to an embodiment of the present invention.
A diagram showing the configuration of a network system in which monitoring targets have been reallocated according to an embodiment of the present invention.
A diagram showing the configuration of a network system according to the prior art.
A diagram showing the configuration of a network system according to the prior art.

Explanation of symbols

1 Network system
100 NE
200 Monitoring server
210 NE information storage unit
220 Monitoring target NE extraction unit
230 Communication unit
240 NE monitoring unit
250 OS
300 Management server
310 NE information storage unit
320 Communication unit
330 Monitoring server state acquisition unit
340 Monitoring server information storage unit
350 Allocation information transmission unit
360 Error information storage unit
370 OS
380 Monitoring target NE extraction unit
390 Monitoring cycle calculation unit
400 AP server
500 Monitoring client terminal
600 NE
700 Monitoring server

Claims

A network monitoring system comprising: a plurality of monitoring servers, each of which monitors predetermined monitoring-target network elements among a plurality of network elements constituting a communication network and receives error information transmitted from those network elements; and a management server that stores the error information transmitted from each of the plurality of monitoring servers, wherein
each monitoring server comprises:
a receiving unit that receives, from the management server, allocation information for extracting the network elements to be monitored by that monitoring server from among the plurality of network elements; and
a network element monitoring unit that monitors the monitoring-target network elements in accordance with the allocation information received by the receiving unit, and
the management server comprises:
a detection unit that detects that the monitoring process of the network elements by any one of the plurality of monitoring servers has stopped; and
an allocation information transmission unit that, when the detection unit detects that the monitoring process has stopped, transmits, to the monitoring servers other than the monitoring server whose monitoring process has stopped, allocation information for extracting monitoring-target network elements that include the network elements that were monitored by the monitoring server whose monitoring process has stopped.

The network monitoring system according to claim 1, wherein the allocation information transmitted by the allocation information transmission unit of the management server is allocation information with which each of the monitoring servers other than the monitoring server whose monitoring process has stopped extracts its monitoring-target network elements from among all of the plurality of network elements constituting the communication network.

The network monitoring system according to claim 1, wherein the allocation information transmitted by the allocation information transmission unit of the management server is allocation information with which the monitoring servers other than the monitoring server whose monitoring process has stopped extract, as additional monitoring targets, only the network elements that were monitored by the monitoring server whose monitoring process has stopped.

The network monitoring system according to any one of claims 1 to 3, wherein
the monitoring server performs the monitoring process of the monitoring-target network elements at every predetermined monitoring cycle, and
the management server comprises a monitoring server information storage unit that stores monitoring server information indicating the processing capability of the monitoring server, and,
when the processing load of the monitoring process of the monitoring server, determined from the network elements allocated to that monitoring server as monitoring targets by the allocation information transmitted by the allocation information transmission unit and from the monitoring server information corresponding to that monitoring server, exceeds a predetermined threshold, transmits to the monitoring server information indicating that the monitoring cycle is to be extended.

The network monitoring system according to claim 4, wherein the monitoring server information stored in the monitoring server information storage unit of the management server includes information indicating the processing capability of the monitoring server in terms of the CPU usage rate and memory capacity of the monitoring server.

The network monitoring system according to claim 4 or 5, wherein the monitoring server information stored in the monitoring server information storage unit of the management server includes information indicating the processing capability of the monitoring server in terms of the time required for the monitoring process of the network elements by the monitoring server.