JP2009171476A

JP2009171476A - Server management system and its construction method

Info

Publication number: JP2009171476A
Application number: JP2008009977A
Authority: JP
Inventors: Yoshihide Shirai; 良英白井; Yu Yoshimura; 裕吉村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2008-01-21
Filing date: 2008-01-21
Publication date: 2009-07-30

Abstract

PROBLEM TO BE SOLVED: When managing a large number of servers, a management server must communicate with every managed server and monitor its state to confirm that all operating servers are running normally, which generates a large amount of communication.

SOLUTION: In a system in which a management server communicates with a large number of managed servers, a highest-level representative server is determined from among the managed servers, and the servers below it are formed into a group. A representative server is then determined from within that group, and a further lower-level group is formed beneath it. Repeating this procedure constructs a hierarchical structure automatically. Using this structure, only the highest-level representative server communicates with the management server, aggregating the management information of all managed servers, instead of every server communicating with the management server.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a communication method used when a management server manages a plurality of servers in an environment in which the plurality of servers are operating.

Maintenance systems in which a managed server reports to a receiving center are widely used, both for detecting and reporting system abnormalities and for periodically reporting that a system is operating normally. In recent years, as networks such as the Internet and communication technologies have advanced, maintenance systems that report over the Internet using IP (Internet Protocol) have become common. Such a maintenance system is characterized by the number of servers the receiving center can manage and the number of servers it can process simultaneously. The number of servers that one report receiving center can manage is limited, but it is far larger than the number it can process simultaneously, because the center is designed by considering probabilistically the case in which multiple reporting devices report at the same time.

Prior art relating to monitoring systems is disclosed or proposed in many technical documents. One disclosed maintenance system improves the wiring efficiency between a plurality of monitoring devices and a plurality of monitored devices and makes communication between a single monitoring center and the monitoring devices more efficient (see Patent Document 1). Also disclosed is a method in which the center that receives reports monitors each monitoring target and organizes the targets into a hierarchy under the center's instructions, making reporting more efficient when multiple targets report simultaneously (see, for example, Patent Document 2). In addition, a device that receives multiple reports and forwards them is sometimes installed so that individual servers need not report to the report receiving center, making monitoring more efficient (see, for example, Non-Patent Document 1).

Patent Document 1: JP 2001-23064 A
Patent Document 2: JP 2006-310947 A
Non-Patent Document 1: LogStare Enterprise [terilogy] (http://www.secuavail.com/service/logstareoutline.html)

When managing a large number of servers, the management server must communicate with every managed server and monitor its status to confirm that all operating servers are running normally. Conventionally, in a server management system where a dedicated line cannot be used between the managed servers at a customer site and the monitoring center, so that an ordinary Internet line must be used, each server at the customer site communicates with the monitoring center, and a large amount of communication occurs. Existing approaches address this by aggregating the many communications. However, when a device that receives multiple reports and forwards them is installed to make monitoring more efficient (Patent Document 1 or Non-Patent Document 1), the device must be placed in the customer's network. This incurs the cost of purchasing a device unrelated to the customer's core business, the space to install it, and the effort to configure it, and the aggregating device itself must also be monitored for failures. In a system that forms a hierarchy in which a representative monitoring target reports (Patent Document 2), the reporting path is lost when the representative monitoring target fails. Moreover, when many servers are managed, building the hierarchy takes considerable effort.

The present invention solves the above problems.

In the present invention, in a system in which a management server and managed servers communicate, the management server does not communicate with every server. Instead, the states of all managed servers are aggregated and communicated from a representative server, and a standby server is provided to cope with a failure of the representative server. Every server holds the server information table of this system, and the system is constructed by carrying out the following procedure.
(1) From the server group, determine the highest-level representative server, which communicates directly with the management server.
(2) Define the highest-level representative server as the representative server.
(3) Assign as many servers as the representative server can manage to a group below it.
(4) Assign a standby server from the lower group assigned in (3).
(5) Make one of the servers in the lower group assigned in (3) a representative server.
(6) Assign to the representative server chosen in (5) a lower group containing as many servers as it can manage.
(7) Define as the representative server a server that is not a standby server and has no lower group assigned.
(8) Repeat steps (3) to (7) until no server remains outside a group, assigning all servers to groups and recording the assignments in the table.
(9) Distribute the table created in (8) to all managed servers.
(10) Each server confirms its connection with its own representative server and, if a lower group is assigned to it, with the servers of that lower group.
(11) The representative server and the standby server mutually monitor the state of the reporting system.

By implementing these mechanisms, a hierarchical structure is constructed automatically.

The mechanism that automatically constructs the hierarchical structure of the communication system reduces the effort of construction. Furthermore, with the constructed hierarchy, the highest-level representative server collects the states of the monitored servers through the hierarchy, so the management server can obtain the states of all monitored servers by communicating only with the highest-level representative server, without communicating with every monitored server. This reduces the load on the management server and on the route leading to it, and the reduction grows as the number of monitoring targets increases. In addition, because standby servers are provided, the communication system is restored automatically even when a representative server fails. Finally, because only the highest-level representative server communicates with the management server, the arrangement is also beneficial for security.
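As a rough illustration of why the reduction grows with the number of monitoring targets, the count below compares the exchanges that reach the management server per monitoring cycle. It assumes N managed servers and one status exchange per server per cycle; neither the symbol N nor this per-cycle model comes from the source.

```latex
% Assumption: N managed servers, one status exchange per server per cycle.
\begin{align*}
C_{\text{direct}} &= N && \text{(every server contacts the management server)} \\
C_{\text{hier}}   &= 1 && \text{(only the highest-level representative server does)} \\
C_{\text{direct}} - C_{\text{hier}} &= N - 1
\end{align*}
```

Under this model the other N - 1 exchanges still take place, but between servers and their representatives inside the site rather than across the Internet.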

The figures needed to describe the embodiment are as follows. FIG. 1 is a diagram showing a configuration example of the communication system targeted by the present invention. FIG. 2 is a diagram showing the internal mechanism a server needs in order to realize this communication system. FIG. 3 is a diagram showing the communication system basic data of FIG. 2. FIG. 4 is a sequence diagram showing the configuration procedure of the communication system. FIG. 5 is a diagram showing an example of the information that a server of a group other than the highest group in FIG. 1 sends to the representative server of the upper group. FIG. 6 is a diagram showing an example of the information that the representative server of group (G02) in FIG. 1 reports to its upper representative server.

The embodiment is now described. In this embodiment, the user company uses the communication system shown in FIG. 1. In this system, the management server (20) performs alive monitoring of all servers in site A (10) via the Internet (30).

First, the configuration of each server in this system is described with reference to FIG. 2. Each server constituting the system includes communication system basic data (S12), a heartbeat register (S13), and a heartbeat monitoring mechanism (S24).

Next, the procedure for configuring the communication system is described with reference to FIG. 4, a sequence diagram of that procedure. In this embodiment, "operating time" is used as an example of the evaluation value in FIG. 4. At the user's request, a system construction request is issued to a temporary representative server. In that request, the user gives the temporary representative server the manageable number of servers as a parameter (m servers; three in this example) (T01). The temporary representative server that received the request broadcasts a reply request to the servers in the site to detect them. For the detected servers, the temporary representative server collects information and updates the device ID column (C10), IP address column (C12), and operating time column (C13) for all servers in its communication system basic data (S12) (T02). It then determines the server with the longest operating time (41) to be the highest-level representative server and updates the role-in-group column for server (41) in the communication system basic data (S12) to "representative". The temporary representative server hands over all of the communication system basic data (S12) it currently holds, together with the manageable number (m), to the highest-level representative server (41). At this point the representative server is defined to be the highest-level representative server (T03). The highest-level representative server (41) assigns unassigned servers, in order of operating time, to the group below the representative server, up to the number the representative server can manage (m), and updates the group ID column (C10) of its communication system basic data (S12) (T04). From the group assigned in T04, the server with the longest operating time is chosen as the standby server, and the role-in-group column (C16) of the communication system basic data (S12) of the highest-level representative server (41) is updated (T05). The highest-level representative server (41) determines from the group ID column (C10) of its own communication system basic data (S12) whether any unassigned servers remain (T06). If any remain, the representative server is redefined as the server with the longest operating time among those that have no lower group assigned and are not standby servers (T07). Repeating steps T04 to T07 assigns all servers to groups. When the assignment is finished, the highest-level representative server (41) notifies all servers of the assignment and the procedure ends (T09). The representative server of each group establishes a heartbeat with the standby server of its lower group. This completes the construction of the communication system of this embodiment.
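The assignment loop T03 to T07 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ServerRow` field names, the `G00` label for the top server, and the `Gnn` group naming are hypothetical. One further assumption is flagged in the code: the next representative is taken from the group that was just formed, which matches the example in which server (53) manages G02 and server (63) manages G03.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerRow:
    """One row of the communication system basic data (S12); field names are illustrative."""
    device_id: int
    ip: str
    uptime: int                      # evaluation value ("operating time")
    group_id: Optional[str] = None   # None means "not yet assigned"
    depth: int = 0                   # hierarchy depth
    role: str = "normal"             # "representative", "standby" or "normal"
    upper_rep: Optional[int] = None  # device ID of the representative managing this row's group


def build_hierarchy(rows: List[ServerRow], m: int) -> List[ServerRow]:
    """Assign every row to a group of at most m servers, updating the rows in place."""
    if not rows:
        raise ValueError("at least one server is required")
    if m < 2:
        # A group needs a standby plus one server that can represent the next group.
        raise ValueError("m must be at least 2")
    by_uptime = sorted(rows, key=lambda r: r.uptime, reverse=True)
    top, unassigned = by_uptime[0], by_uptime[1:]
    top.role, top.group_id = "representative", "G00"   # T03 ("G00" is a placeholder label)
    rep, group_no = top, 0
    while unassigned:                                   # T06: unassigned servers remain
        group_no += 1
        members, unassigned = unassigned[:m], unassigned[m:]   # T04: next m servers by uptime
        for server in members:
            server.group_id = f"G{group_no:02d}"
            server.depth = rep.depth + 1
            server.upper_rep = rep.device_id
        members[0].role = "standby"                     # T05: longest uptime in the new group
        if unassigned:
            # T07 (assumption): the longest-uptime non-standby server of the new group
            # becomes the representative that manages the next group.
            rep = members[1]
            rep.role = "representative"
    return rows
```

With ten servers and m = 3 this yields a chain of three groups below the top server. A literal reading of T07 over all groups would instead allow a remaining normal server of an earlier group to receive the next group, producing a wider tree.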

Next, the case where a server is added to the communication system of this embodiment is described. Specifically, assume that one server is added to the current configuration (FIG. 1). The highest-level representative server (41) periodically sends a broadcast to detect added servers. Suppose the highest-level representative server (41) has now detected the added server. A row is appended to the bottom of the communication means table (S11) of the highest-level representative server, and the collected information is written to it: the new IP address "11.203.22.11" in the IP address column (C12) and the new device ID "74" in the device ID column (C10). If, among the lowest-level groups, there is a group with fewer servers than the representative server can manage, the highest-level representative server (41) writes that group's group ID and hierarchy depth to the group ID column (C10) and hierarchy depth column (C15) of the added server (74) in its own communication system basic data (S12). If there is no such group, the added server is assigned as a lower-group server to the server in the highest layer among the servers that have no lower group assigned. In that case the group ID column (C10), hierarchy depth column (C15), and upper representative server column (C17) of the added server (74) in the communication system basic data (S12) are set, respectively, to a new group ID, a depth one greater than that server's hierarchy depth, and that server's device ID. The highest-level representative server then distributes its communication system basic data (S12) to all monitored servers. By the above procedure, a server is added to the communication system of this embodiment.
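The placement rule for an added server can be sketched as below. The dictionary keys, the `row` helper, and the way a new group ID is generated are hypothetical; the source states only which columns are written. Copying the upper representative when the server joins an existing group is also an assumption, since the passage mentions only the group ID and depth in that case.

```python
from typing import Dict, List, Optional


def row(device_id: int, ip: str, group_id: Optional[str],
        depth: Optional[int], upper_rep: Optional[int], role: str = "normal") -> dict:
    """Build one table row (illustrative layout of the basic data S12)."""
    return {"device_id": device_id, "ip": ip, "group_id": group_id,
            "depth": depth, "upper_rep": upper_rep, "role": role}


def add_server(table: List[dict], device_id: int, ip: str, m: int) -> dict:
    """Append a newly detected server to the table and return its row."""
    new = row(device_id, ip, None, None, None)
    # Only rows that sit below some representative belong to a managed group.
    members = [r for r in table if r["upper_rep"] is not None]
    deepest = max((r["depth"] for r in members), default=0)
    groups: Dict[str, List[dict]] = {}
    for r in members:
        if r["depth"] == deepest:
            groups.setdefault(r["group_id"], []).append(r)
    open_group = next((g for g in groups.values() if len(g) < m), None)
    if open_group is not None:
        # A lowest-level group still has room: reuse its group ID and depth.
        new.update(group_id=open_group[0]["group_id"], depth=deepest,
                   upper_rep=open_group[0]["upper_rep"])  # upper_rep copy is an assumption
    else:
        # Otherwise start a new group under the shallowest server with no lower group.
        managing = {r["upper_rep"] for r in table}
        parent = min((r for r in table if r["device_id"] not in managing),
                     key=lambda r: r["depth"])
        group_count = len({r["group_id"] for r in table})
        new.update(group_id=f"G{group_count:02d}",        # hypothetical naming scheme
                   depth=parent["depth"] + 1, upper_rep=parent["device_id"])
    table.append(new)
    return new
```

The sketch leaves role columns untouched because the passage does not say how roles change when a new group is created.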

Next, the alive monitoring method is described with reference to FIG. 3, which shows the details of the communication system basic data (S12). Suppose the highest-level representative server (41) receives an alive monitoring request from the management server (20). The highest-level representative server (41) sends an alive monitoring request to the lower representative server (53), and each representative server in turn recursively sends the request to its lower representative server. Alive monitoring is then carried out starting from the representative server (63) of the lowest level. Specifically, the representative server (63), which belongs to the upper group (G02), sends a request to the servers of the lowest group (G03), and the G03 servers return responses to it. For each G03 server, the G02 representative server (63) updates the status column (C14) in its own communication system basic data (S12) to "normal" if a response was received and to "failure" otherwise. When the G02 representative server (63) has finished updating the status column (C14) for the servers in the group below it, it updates the status column (C14) for those servers in the communication system basic data (S12) of the G01 representative server (53). As each representative server recursively updates the status column (C14) in the communication system basic data (S12) of its upper representative server in this way, the status column (C14) in the communication system basic data (S12) of the highest-level representative server (41) comes to hold the status of every server. Once the highest-level representative server (41) confirms that the status information of all servers has been updated, it notifies the management server (20). Alive monitoring is realized in this way.
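The recursive collection of status values can be sketched as follows. The `groups` mapping (representative device ID to the device IDs of the group it manages) and the `responds` callback standing in for the request and response exchange are hypothetical; a real system would perform network calls there.

```python
from typing import Callable, Dict, List


def collect_status(rep: int,
                   groups: Dict[int, List[int]],
                   responds: Callable[[int], bool]) -> Dict[int, str]:
    """Return {device_id: "normal" | "failure"} for the servers reachable below `rep`."""
    status: Dict[int, str] = {}
    for member in groups.get(rep, []):
        alive = responds(member)
        status[member] = "normal" if alive else "failure"
        if alive and member in groups:
            # The member is itself a representative: merge what it collected
            # from its lower group into the table of the level above.
            status.update(collect_status(member, groups, responds))
        # Assumption: if a representative does not respond, its lower group is
        # not polled in this pass; the standby takeover would restore that path.
    return status


# Layout taken from the example: (41) manages G01, (53) manages G02, (63) manages G03.
example_groups = {41: [51, 52, 53], 53: [61, 62, 63], 63: [71, 73]}
all_status = collect_status(41, example_groups, lambda device_id: device_id != 71)
```

Here `all_status` plays the part of the status column (C14) held by the highest-level representative server after one monitoring pass.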

Next, the role of the standby server is described. Whenever a server communicates with its representative server, it sends the same communication to the standby server of its own group. The standby server maintains a heartbeat with the upper representative server. When heartbeat monitoring detects a failure of the upper representative server, the standby server takes over from it. This is explained concretely with reference to FIG. 3. Suppose a failure occurs in the G01 representative server (53) and the standby server (62) detects it. The standby server (62) notifies the highest-level representative server (41) of the failure. On receiving the notification, the highest-level representative server (41) changes the status column (C14) of the failed server (53) in its own communication system basic data (S12) from "normal" to "failure". In place of server (53), which was the G01 representative server, the highest-level representative server (41) assigns server (62), the G02 standby server, as the new G01 representative server and rewrites the role-in-group column (C16) of server (62) in its communication system basic data (S12) from "standby" to "representative". It further assigns server (61), which was a normal server in G02, as the G02 standby server and rewrites the role-in-group column (C16) of server (61) from "normal" to "standby". The highest-level representative server (41) then distributes its communication system basic data (S12) to all monitored servers. On receiving it, the new G02 standby server (61) and the new G01 representative server (62) establish a heartbeat. With this mechanism, the communication system can be maintained even when a representative server fails.
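The table updates performed during a takeover can be sketched as below. The status and role changes follow the passage; moving the promoted server into the failed server's group and re-pointing the remaining members at it are assumptions added so the table stays consistent, and the dictionary layout is hypothetical.

```python
from typing import Dict


def promote_standby(table: Dict[int, dict], failed_id: int) -> int:
    """Replace a failed representative by the standby of the group it managed.

    `table` maps device ID to a row with the keys status, role, group_id,
    depth and upper_rep. Returns the device ID of the new representative.
    """
    table[failed_id]["status"] = "failure"
    lower = [sid for sid, r in table.items() if r["upper_rep"] == failed_id]
    new_rep = next(sid for sid in lower if table[sid]["role"] == "standby")
    table[new_rep]["role"] = "representative"           # "standby" -> "representative"
    # Assumption: the promoted server takes the failed server's place in the upper group.
    for key in ("group_id", "depth", "upper_rep"):
        table[new_rep][key] = table[failed_id][key]
    remaining = [sid for sid in lower if sid != new_rep]
    for sid in remaining:
        table[sid]["upper_rep"] = new_rep               # assumption: re-point the lower group
    # A normal server of the lower group becomes its new standby
    # (first one in table order; the selection rule is an assumption).
    new_standby = next((sid for sid in remaining if table[sid]["role"] == "normal"), None)
    if new_standby is not None:
        table[new_standby]["role"] = "standby"          # "normal" -> "standby"
    return new_rep
```

After this update the highest-level representative server would distribute the table to all monitored servers, and the new representative and new standby would establish their heartbeat.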

If, instead of this alive monitoring system, the management server directly monitored every server, the number of communications between the management server and the monitored servers would become enormous, placing a heavy load on the management server and the route leading to it. With this alive monitoring system, the hierarchy is constructed without involving the management server, and the highest-level representative server consolidates the notifications sent to the management server, so the load on the management server and the route leading to it stays light.

The communication system configuration of this embodiment is applicable not only to alive monitoring but also to reporting, as described next. In this embodiment, the user company uses the communication system shown in FIG. 1. In this system, all reports arising in site A (10) are sent by e-mail to the management server (20) via the Internet (30).

The procedure by which a report from a server reaches the management server is described below, using a concrete example based on FIG. 1. Suppose a reporting factor occurs on server (63). Server (63) refers to its own communication system basic data (S12) and creates and sends transmission data (FIG. 5) consisting of: destination, "IP address of the upper representative server" (I01); source device ID, "own device ID" (I02); hierarchy depth, "own hierarchy depth" (I03); number of reports, "1" (I04); device ID, "own device ID" (I05); and reporting factor (I06). The representative server (53) of group G01 receives this data. Suppose that the representative server (53) has also received a report from server (61). The representative server (53) then merges the reports and reports to the next higher level. In this case it creates transmission data (FIG. 6) consisting of: destination (I11), "IP address of the upper representative server (41)"; source device ID (I12), "own device ID (53)"; hierarchy depth (I13), "own hierarchy depth"; number of reports (I14), "number of reports from server (61) (I24) + number of reports from server (63) (I04)"; the device ID (I22) and reporting factor (I26) of server (61); and the device ID (I07) and reporting factor (I08) of server (63). It sends this data to the upper representative server (41). The highest-level representative server (41), having received this information (FIG. 6), reports to the management server by any of various known reporting means (e-mail, FTP, SNMP, and so on); in this embodiment it reports by e-mail. In a system that reports directly to the management server (20), two reports would be sent to the management server (20), whereas in this system a single report carries the information of both. For clarity the example used two reports, but more than two reports can evidently be sent in a single e-mail by the same sequence.
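The merge performed by a representative server can be sketched as follows. The `Report` class and its field names are hypothetical stand-ins for the transmission data of FIG. 5 and FIG. 6, and the IP addresses and reporting factors in the usage lines are made-up sample values.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Report:
    """Transmission data: a header plus one (device ID, reporting factor) pair per report."""
    destination: str        # IP address of the upper representative server
    source_id: int          # device ID of the sender
    depth: int              # hierarchy depth of the sender
    count: int = 0          # number of reports carried
    entries: List[Tuple[int, str]] = field(default_factory=list)


def merge_reports(received: List[Report], own_id: int, own_depth: int,
                  upper_ip: str) -> Report:
    """Combine the reports received from a lower group into one upward report."""
    merged = Report(destination=upper_ip, source_id=own_id, depth=own_depth)
    for report in received:
        merged.count += report.count            # report counts are summed
        merged.entries.extend(report.entries)   # device IDs and factors are carried over
    return merged


# Sample values only: servers (61) and (63) report to representative (53),
# which forwards a single merged report toward server (41).
from_61 = Report("10.0.0.53", 61, 2, 1, [(61, "fan failure")])
from_63 = Report("10.0.0.53", 63, 2, 1, [(63, "disk error")])
upward = merge_reports([from_61, from_63], own_id=53, own_depth=1, upper_ip="10.0.0.41")
```

Because the merged report has the same shape as its inputs, the same function can be applied again at each level on the way up.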

If, instead of this reporting system, every server reported directly to the management server by e-mail, the number of e-mail reports issued to the management server would equal the number of reports from all servers, so unless reports are somehow reduced, a large number of reports would burden the management server. With this reporting system, many reports are consolidated and sent to the management server by e-mail, so the management server can receive them under a light load even for a site with a large number of servers. Furthermore, whichever server fails, the redundant configuration of representative and standby servers allows the reporting system to be maintained.

FIG. 1: A configuration example of the communication system targeted by the present invention.
FIG. 2: A diagram showing the internal mechanism of each server that constitutes the system when this communication system is realized.
FIG. 3: A diagram showing the communication system basic data of FIG. 2.
FIG. 4: A sequence diagram showing the configuration procedure of the communication system.
FIG. 5: A diagram showing an example of the information that a server of a group (G02) other than the highest group in FIG. 1 sends to its upper representative server.
FIG. 6: A diagram showing an example of the information that the representative server of group (G02) in FIG. 1 reports to its upper representative server.

Explanation of symbols

(10) ... Site A, (20) ... Management server, (30) ... Internet, (41) ... Highest-level representative server, (51) ... Server, (52) ... Standby server, (53) ... Representative server, (61) ... Server, (62) ... Standby server, (63) ... Representative server, (71) ... Server, (73) ... Server.

Claims

A server management system in which a plurality of servers are connected to a management server via a network such as the Internet and communicate with the management server, the system comprising: a step of determining a representative server; a step of forming a group of servers managed under the representative server; and a step of recursively performing the two preceding steps with a server in the group acting as its representative, thereby automatically forming hierarchical groups, wherein, rather than every server communicating directly with the management server, the representative server aggregates the management information of all servers and communicates it to the management server.