JP6315467B2

JP6315467B2 - Network recovery system and program

Info

Publication number: JP6315467B2
Application number: JP2014173172A
Authority: JP
Inventors: 圭介黒木; 林　通秋; 通秋林
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2014-08-27
Filing date: 2014-08-27
Publication date: 2018-04-25
Anticipated expiration: 2034-08-27
Also published as: JP2016048468A; WO2016031805A1

Description

The present invention relates to a network recovery system and a program that are connected to a monitoring system, which monitors node failures occurring in at least one network including a plurality of nodes, and to a node control system, which controls each node, and that, when a failure occurs in any node, instruct the node control system to reconfigure the network on a per-service basis.

Research and technical development related to SDN (Software Defined Networking) and NFV (Network Function Virtualization) have been actively pursued. SDN/NFV is expected to shorten the time needed to launch new services and to reduce CAPEX (capital expenditure) and OPEX (operating expenditure). Reducing OPEX requires, among other things, automating failure recovery on the virtualized network that implements SDN/NFV. In particular, automatic recovery must be performed per service, rather than from a microscopic viewpoint such as per failed device or per virtual machine.

For example, the physical resource control management system described in Patent Document 1 aims to achieve fast failover by having a hardware recovery scheme and a software recovery scheme cooperate without sacrificing recovery speed. This system consists of a hardware space 6100 and a software space 6500; failure management means 6552 coordinates hardware-based and software-based failure management and controls resource allocation means 6551.

Patent Document 1: JP 2008-107896 A

However, because the SLA (Service Level Agreement) differs from service to service, a single uniform recovery method cannot always cope. Moreover, the technique described in Patent Document 1 merely describes operation inside a server and is not sufficient for service recovery.

FIG. 14 shows an example of a conventional network configuration. Networks a to c each include a plurality of nodes, and each node is configured so that it can connect to the Internet. Each node is assumed to have a routing function, a firewall function, a virtual machine construction function, or the like. In FIG. 14, monitoring system a monitors network a for failures, and node control system a controls each node included in network a. Likewise, monitoring system b monitors network b and node control system b controls its nodes, and monitoring system c monitors network c and node control system c controls its nodes.

FIG. 15 shows the logical configuration of service a. Firewall FWa1 connects virtual machines VMa1 to VMa3 to the Internet. FIGS. 16 to 18 show configuration examples of service a. As shown in FIGS. 16 to 18, when FWa1, which provides the firewall function, fails, monitoring system b may detect the failure and, as a recovery measure, instruct node control system b, which belongs to the same network system, to build another virtual machine that provides the firewall function.

However, if the node hosting firewall FWa1 itself fails, then even if the function of FWa1 is implemented on another node, a virtual machine such as VMa1 may be left unable to continue service a, as shown in FIG. 18. In addition, as FIG. 18 also shows, when recovery would have to span networks, it may not be possible to build the function of firewall FWa1 at all.

The present invention has been made in view of these circumstances, and its object is to provide a network recovery system and a program capable of performing the optimal recovery processing for each service.

(1) To achieve the above object, the present invention takes the following measures. The network recovery system of the present invention is connected to a monitoring system that monitors node failures occurring in at least one network including a plurality of nodes and to a node control system that controls each of the nodes, and, when a failure occurs in any node, instructs the node control system to reconfigure the network on a per-service basis. The system comprises: an information integration unit that acquires, from the monitoring system, failure information indicating that a failure has occurred in a node and outputs the acquired failure information; an optimal design unit that uses one of a plurality of algorithms for creating recovery plans that restore the network from a failure and creates, for each service, a recovery plan that uses free network resources, based on the failure information and network configuration information; and an instruction creation unit that uses the recovery plan to create network setting information for recovery and outputs the created network setting information to the node control system.

In this way, failure information indicating that a failure has occurred in a node is acquired from the monitoring system and output; a recovery plan using free network resources is created for each service, based on the failure information and network configuration information, using one of a plurality of algorithms for creating recovery plans; network setting information for recovery is created from the recovery plan; and the created network setting information is output to the node control system. Service recovery can therefore be carried out automatically, triggered by an alarm indicating that a failure has occurred. Because a recovery plan using free network resources is created for each service, recovery can be carried out in the form best suited to each service. Furthermore, because the system is dedicated to issuing instructions rather than controlling the recovery itself, per-service automatic recovery can be realized without major changes to the existing system configuration.

(2) In the network recovery system of the present invention, the information integration unit records network configuration information, resource information, and failure information, and the optimal design unit identifies the nodes that make up the recovery plan based on the network configuration information, resource information, and failure information.

Because the information integration unit records network configuration information, resource information, and failure information, and the optimal design unit identifies the nodes that make up the recovery plan from that information, recovery can be carried out in the form best suited to each service.

(3) In the network recovery system of the present invention, the information integration unit commonizes the acquired failure information and outputs common failure information, and also individualizes the recovery plan created by the optimal design unit and outputs the individualized recovery plan to the instruction creation unit; the instruction creation unit creates the network setting information for recovery based on the individualized recovery plan.

Because the information integration unit commonizes the acquired failure information and outputs common failure information, growth in the number of recovery algorithms can be suppressed. Because the recovery plan created by the optimal design unit is individualized and the individualized recovery plan is output to the instruction creation unit, each node control system can be given information in a form it can handle.

(4) In the network recovery system of the present invention, the information integration unit includes an information conversion table and uses it to perform the commonization of the failure information and the individualization of the recovery plan.

Because the commonization of the failure information and the individualization of the recovery plan are performed with the information conversion table, the processing can be made simpler and faster.

(5) In the network recovery system of the present invention, the instruction creation unit assigns a priority to each service and, when a plurality of services need to be recovered at the same time, outputs the priorities together with the network setting information to the node control system.

Because a priority is assigned to each service and, when a plurality of services need to be recovered at the same time, the priorities and the network setting information are output to the node control system, recovery processing can be executed starting with the highest-priority service.

(6) The program of the present invention is a program for a server apparatus that is connected to a monitoring system that monitors node failures occurring in at least one network including a plurality of nodes and to a node control system that controls each of the nodes, and that, when a failure occurs in any node, instructs the node control system to reconfigure the network on a per-service basis. The program causes a computer to execute a series of processes comprising: a process of acquiring, from the monitoring system, failure information indicating that a failure has occurred in a node and outputting the acquired failure information; a process of using one of a plurality of algorithms for creating recovery plans that restore the network from a failure to create, for each service, a recovery plan that uses free network resources, based on the failure information and network configuration information; and a process of using the recovery plan to create network setting information for recovery and outputting the created network setting information to the node control system.

According to the present invention, service recovery can be carried out automatically, triggered by an alarm indicating that a failure has occurred. Recovery can also be carried out in the form best suited to each service. Furthermore, per-service automatic recovery can be realized without major changes to the existing system configuration.

FIG. 1 is a block diagram showing the schematic configuration of the network recovery system according to the present embodiment.
FIG. 2 is a diagram showing an example of commonization and individualization.
FIG. 3 is a flowchart showing the operation of the optimal design unit 101.
FIG. 4 is a diagram showing an example of the configuration information database.
FIG. 5 is a diagram showing an example of resource information.
FIG. 6 is a diagram showing an example of resource information.
FIG. 7 is a diagram showing an operation example of the network recovery system according to the present embodiment.
FIG. 8 is a diagram showing an operation example of the network recovery system according to the present embodiment.
FIG. 9 is a diagram showing an operation example of the network recovery system according to the present embodiment.
FIG. 10 is a diagram showing the network configuration to be recovered.
FIG. 11 is a diagram showing a modification of the present embodiment.
FIG. 12 is a diagram showing the logical configuration of service b.
FIG. 13 is a diagram showing traffic volume over time.
FIG. 14 is a diagram showing an example of a conventional network configuration.
FIG. 15 is a diagram showing the logical configuration of service a.
FIG. 16 is a diagram showing a configuration example of service a.
FIG. 17 is a diagram showing a configuration example of service a.
FIG. 18 is a diagram showing a configuration example of service a.

The present inventors noted that reducing OPEX calls for automated failure recovery on the virtualized network that implements SDN/NFV, and that such recovery must be performed per service rather than from a microscopic viewpoint. They found that recovery can be carried out in the form best suited to each service by providing a system equipped with recovery algorithms suited to each service and having that system instruct the node control system to perform the recovery, and thereby arrived at the present invention.

That is, the network recovery system of the present invention is connected to a monitoring system that monitors node failures occurring in at least one network including a plurality of nodes and to a node control system that controls each of the nodes, and, when a failure occurs in any node, instructs the node control system to reconfigure the network on a per-service basis. The system comprises: an information integration unit that acquires, from the monitoring system, failure information indicating that a failure has occurred in a node and outputs the acquired failure information; an optimal design unit that uses one of a plurality of algorithms for creating recovery plans that restore the network from a failure and creates, for each service, a recovery plan that uses free network resources, based on the failure information and network configuration information; and an instruction creation unit that uses the recovery plan to create network setting information for recovery and outputs the created network setting information to the node control system.

The present inventors have thereby made it possible to carry out service recovery automatically, triggered by an alarm indicating that a failure has occurred. They have also made it possible to carry out recovery in the form best suited to each service and to realize per-service automatic recovery without major changes to the existing system configuration. Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of the network recovery system according to the present embodiment. The network recovery system includes an optimal network recovery system 100 and an integrated monitoring system 200, both connected to monitoring systems a to c and node control systems a to c. The optimal network recovery system 100 includes an optimal design unit 101, an instruction creation unit 103, and a communication unit 104. The integrated monitoring system 200 includes an information integration unit 201 and a communication unit 209, and holds configuration information 203, resource information 205, and failure information 207 as databases. The optimal network recovery system 100 and the integrated monitoring system 200 may be built on the same server.

Networks a to c each include a plurality of nodes (not shown), and each node is configured so that it can connect to the Internet. In FIG. 1, monitoring system a monitors network a for failures, and node control system a controls each node included in network a. Likewise, monitoring system b monitors network b and node control system b controls its nodes, and monitoring system c monitors network c and node control system c controls its nodes.

The information integration unit 201 of the integrated monitoring system 200 reshapes failure information, that is, it commonizes and individualizes it. In network switches, for example, the names of interfaces and the like commonly differ by manufacturer. If failure information from network devices were used as-is to create a recovery process, a recovery algorithm would be needed for each manufacturer, and the number of algorithms would balloon. Yet the process of recovering from the failure should be the same whether the failed interface is reported as "GigabitEthernet (registered trademark) 1/1" or as "GigabitEthernet (registered trademark) 1/0/1". The information integration unit 201 therefore commonizes the failure information from network devices. For example, it unifies differences in host naming rules and in device interface names, as follows.

For example, if one network device reports "Shinjuku Switch 01 GigabitEthernet (registered trademark) 1/1", it is converted to "Shinjuku Switch 01 Interface1"; if another network device reports "Shinjuku_Switch 02 GigabitEthernet (registered trademark) 1/0/1", it is converted to "Shinjuku Switch 02 Interface1". These conversions can also be performed with a conversion table.
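The table-based conversion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the table layout, the function names `commonize` and `individualize`, and the space-free host spellings are all assumptions made for the example.

```python
# Hypothetical conversion table: vendor-specific (host, interface) -> common form.
# The entries mirror the Shinjuku switch example in the text.
CONVERSION_TABLE = {
    ("ShinjukuSwitch01", "GigabitEthernet1/1"): ("ShinjukuSwitch01", "Interface1"),
    ("Shinjuku_Switch02", "GigabitEthernet1/0/1"): ("ShinjukuSwitch02", "Interface1"),
}

# Reverse lookup used when a recovery plan is turned back into device-specific names.
REVERSE_TABLE = {common: vendor for vendor, common in CONVERSION_TABLE.items()}


def commonize(host, interface):
    """Map a vendor-specific name pair to the common representation."""
    return CONVERSION_TABLE[(host, interface)]


def individualize(host, interface):
    """Map a common name pair back to the vendor-specific representation."""
    return REVERSE_TABLE[(host, interface)]
```

Keeping both directions in plain dictionaries makes each conversion a single lookup, which is in line with the simplification and speed-up the text attributes to the conversion table.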

The optimal design unit 101 in the optimal network recovery system 100 has a plurality of recovery algorithms, shown in FIG. 1 as algorithms a to c. The optimal design unit 101 designs the recovery based on the failure information commonized by the information integration unit 201, and instructions are issued to node control systems a to c based on that design. The instructions must be individualized so that each of node control systems a to c can understand them. The optimal network recovery system 100 therefore first returns the setting information for the instructions to the information integration unit 201, which individualizes it. The optimal design unit 101 passes the individualized setting information to the instruction creation unit 103, which creates the instructions for node control systems a to c. FIG. 2 shows an example of commonization and individualization.

The optimal design unit 101 refers to the failure information 207 and the configuration information 203 and creates, for each service, a recovery plan that uses free resources. As shown in FIG. 1, the optimal design unit 101 holds a plurality of algorithms for creating recovery plans, basically one per service SLA. For example, a service such as service A may want a recovery that keeps the delay between the functions that make up the service as small as possible, or, for fault tolerance, one that places those functions as far apart physically as possible. The optimal design unit 101 selects an algorithm for each service according to the flowchart shown in FIG. 3.
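A minimal sketch of per-service algorithm selection, contrasting the two policies mentioned above (smallest delay versus largest physical separation). The node pairs, delay and distance figures, policy names, and function names are hypothetical values chosen only for illustration.

```python
# Hypothetical inter-node delay (ms) and distance (km) for candidate node pairs.
DELAY_MS = {("Tokyo", "Osaka"): 8, ("Tokyo", "Sapporo"): 15, ("Osaka", "Sapporo"): 20}
DISTANCE_KM = {("Tokyo", "Osaka"): 400, ("Tokyo", "Sapporo"): 830, ("Osaka", "Sapporo"): 1060}


def lowest_delay(pairs):
    """Prefer the placement with the smallest delay between functions."""
    return min(pairs, key=lambda p: DELAY_MS[p])


def widest_separation(pairs):
    """Prefer the placement with the largest physical distance between functions."""
    return max(pairs, key=lambda p: DISTANCE_KM[p])


# One algorithm per SLA policy, and one policy per service (both assumed).
ALGORITHMS = {"low_delay": lowest_delay, "fault_tolerant": widest_separation}
SERVICE_POLICY = {"service_a": "low_delay", "service_b": "fault_tolerant"}


def plan_for(service, pairs):
    """Run the algorithm registered for the service's SLA policy."""
    return ALGORITHMS[SERVICE_POLICY[service]](pairs)
```

With these figures, `plan_for("service_a", list(DELAY_MS))` picks the Tokyo and Osaka pair, while the same candidates for `service_b` yield Osaka and Sapporo.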

FIG. 3 is a flowchart showing the operation of the optimal design unit 101. On receiving a failure notification (step S1), the unit refers to the configuration information database and identifies the affected services (step S2). Next, it identifies the highest-priority service among those not yet recovered (step S3). It then identifies the part that must be recovered (step S4), refers to the resource information, and selects usable nodes based on CPU load and the like (step S5). Next, it enumerates the possible configurations of the usable nodes as patterns (step S6). In step S6, node positions are worked out as follows, for example: each of the three nodes "Fukuoka Node (VM)", "Sapporo Node (VM)", and "Tokyo Node (VM)" is paired with each of the two nodes "Tokyo Node (VM)" and "Osaka Node (VM)".

Next, the unit refers to the resource information and selects usable links based on the expected free bandwidth (step S7), then refers to the configuration information database and identifies the algorithm to execute (step S8). Once the algorithm is identified, it is executed to narrow down the candidate combinations (step S9), and one of the remaining candidates is picked arbitrarily (step S10). The unit then checks whether any affected services remain (step S11); if so, it returns to step S3. If no affected services remain in step S11, it creates the recovery control instructions (step S12) and finishes.
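The loop of steps S1 to S12 can be sketched roughly as below. This is a simplified illustration under several assumptions that are not in the source: a lower `priority` number means higher priority, a node is usable when its CPU load is under 0.8, each node has a single uplink whose expected free bandwidth is given in `links`, and all dictionary keys are hypothetical.

```python
from itertools import product


def design_recovery(failed_node, services, nodes, links, algorithms):
    """Return (service name, plan) pairs, highest-priority service first."""
    # S2: services that had a function placed on the failed node
    affected = [s for s in services if failed_node in s["placement"].values()]
    plans = []
    # S3 and S11: handle unrecovered services in priority order until none remain
    for svc in sorted(affected, key=lambda s: s["priority"]):
        # S4: functions that must be rebuilt elsewhere
        broken = [f for f, n in svc["placement"].items() if n == failed_node]
        # S5: nodes with CPU headroom (assumed threshold), excluding the failed node
        usable = [n for n, load in nodes.items() if n != failed_node and load < 0.8]
        # S6: enumerate every assignment of broken functions to usable nodes
        patterns = [dict(zip(broken, combo))
                    for combo in product(usable, repeat=len(broken))]
        # S7: keep patterns whose nodes have enough expected free bandwidth
        patterns = [p for p in patterns
                    if all(links.get(n, 0) >= svc["bandwidth"] for n in p.values())]
        # S8 and S9: the algorithm registered for the service narrows the candidates
        candidates = algorithms[svc["algorithm"]](patterns)
        # S10: pick any one of the remaining candidates
        if candidates:
            plans.append((svc["name"], candidates[0]))
    # S12: the caller turns these plans into recovery control instructions
    return plans
```

The sketch does not subtract resources consumed by an earlier plan before designing the next one; a fuller version would update `nodes` and `links` inside the loop.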

In FIG. 1, the instruction creation unit 103 creates the settings for recovery from the recovery plan. The instruction creation unit 103 has a driver for each of node control systems a to c. A node control system may exist per network or per type of controlled device. The instruction creation unit 103 knows the role of each of node control systems a to c and the nodes each one controls, and can determine which node control system should be instructed in order to execute a given recovery plan. It creates the instruction settings from the recovery plan information individualized by the information integration unit 201, using the driver for each node control system.

An instruction setting may be a setting that calls an API (Application Programming Interface) provided by node control systems a to c, or it may be a script or the like that contains CLI (Command Line Interface) commands.
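A minimal sketch of per-controller drivers, one producing an API-style request and one producing a CLI line. The class names, the request path, the `create-function` command, and the node-to-controller mapping are all hypothetical and do not correspond to any real controller product.

```python
class ApiDriver:
    """Builds an API-style request description (hypothetical endpoint)."""

    def build(self, node, function):
        return {"method": "POST", "path": f"/nodes/{node}/functions",
                "body": {"name": function}}


class CliDriver:
    """Builds a CLI script line (hypothetical command)."""

    def build(self, node, function):
        return f"create-function --node {node} --name {function}"


# Assumed mapping: which controller owns which node, and which driver it uses.
DRIVERS = {"controller_a": ApiDriver(), "controller_b": CliDriver()}
NODE_OWNER = {"n3": "controller_a", "n4": "controller_b"}


def create_instructions(plan):
    """Turn a plan (function -> target node) into (controller, instruction) pairs."""
    return [(NODE_OWNER[node], DRIVERS[NODE_OWNER[node]].build(node, func))
            for func, node in plan.items()]
```

Routing by `NODE_OWNER` reflects the statement that the instruction creation unit knows which node control system is responsible for each node.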

FIG. 4 shows an example of the configuration information database. As shown in FIG. 4, a priority is defined for each service. When recovery of a plurality of services is required at the same time, recovery processing is executed in order of priority. FIGS. 5 and 6 show examples of resource information. This resource information is used to search for resources with which to carry out the recovery.

Next, the operation of the network recovery system configured as described above is explained. FIGS. 7 to 10 show an operation example of the network recovery system according to the present embodiment. In FIG. 7, the network recovery system includes the optimal network recovery system 100 and the integrated monitoring system 200, both connected to monitoring systems a to c and node control systems a to c. Networks a to c each include a plurality of nodes, and each node is configured so that it can connect to the Internet. In FIG. 7, monitoring system a monitors network a for failures, and node control system a controls each node included in network a. Likewise, monitoring system b monitors network b and node control system b controls its nodes, and monitoring system c monitors network c and node control system c controls its nodes.

The network recovery system according to the present embodiment provides recovery means suited to each service and has the following features. The optimal network recovery system 100 has a plurality of recovery algorithms. The reshaping function, which commonizes and individualizes failure information, keeps the number of algorithms from growing. In addition, each service is given a priority, so that a plurality of services can be recovered one after another.

The network recovery system can also allocate resources during recovery. By keeping track of network topology information and network resource information, it can identify the location at which to perform the recovery. It can also allocate resources optimally based on past resource usage, which enables stable operation after recovery.

The network recovery system can also instruct existing control systems to carry out recovery control. By keeping track of the role of each control system, it can route each control operation to the appropriate control system.

As shown in FIG. 8, when the physical node accommodating firewall FWa1 fails, monitoring system b detects the failure and reports the detected information to the monitoring integrated system 200. In the monitoring integrated system 200, the information integration unit reforms the failure information into a format that the optimal network recovery system 100 can interpret (commonization). The reformed (commonized) failure information is then passed to the optimal network recovery system 100. Here, service a has the logical configuration shown in FIG. 15.
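The commonization step can be illustrated with a minimal sketch. The patent does not specify any alert format, so the monitoring system names, the field names, and the `CONVERSION_TABLE` layout below are all hypothetical; the sketch only assumes that each monitoring system reports the failed node and the failure type under its own keys, and that a conversion table maps them to one common record.

```python
from dataclasses import dataclass

# Hypothetical conversion table: for each monitoring system, the key under
# which it reports the failed node and the failure type. These names are
# assumptions for illustration, not formats taken from the patent.
CONVERSION_TABLE = {
    "monitor_a": {"node": "hostname", "kind": "alarm_type"},
    "monitor_b": {"node": "device_id", "kind": "event"},
}


@dataclass(frozen=True)
class CommonFailure:
    """Failure information in the common form passed to the recovery system."""
    source: str  # monitoring system that raised the alert
    node: str    # identifier of the failed node
    kind: str    # failure type


def commonize(source: str, raw: dict) -> CommonFailure:
    """Convert a monitoring-system-specific alert into the common form."""
    mapping = CONVERSION_TABLE[source]
    return CommonFailure(
        source=source,
        node=raw[mapping["node"]],
        kind=raw[mapping["kind"]],
    )


# Example: an alert from the (hypothetical) monitor_b format.
alert = {"device_id": "FWa1-host", "event": "node_down"}
common = commonize("monitor_b", alert)
```

Because every recovery algorithm consumes only `CommonFailure`, adding a new monitoring system would require one new table entry rather than a new algorithm, which is the effect the description attributes to commonization.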

Next, as shown in FIG. 9, the optimal network recovery system 100 refers to the configuration information and other data in the monitoring integrated system 200 and reconstructs the network configuration to be recovered. The network configuration to be recovered is, for example, the one shown in FIG. 10. To recover the network, the optimal network recovery system 100 then selects recovery algorithm a, which is intended to satisfy the SLA of service a (for example, "delay to the Internet within 100 ms"), and attempts recovery. In doing so, it reads from the monitoring integrated system 200 the CPU usage, traffic usage, and similar figures for virtual machine VMa1, and searches for resources on which VM construction and related operations can be carried out without problems. After deciding which resources to use, it issues recovery instructions to node control systems a to c. On receiving the instructions from the optimal network recovery system 100, each of node control systems a to c actually controls the nodes and carries out the network recovery. After node control is complete, node control systems a to c notify the optimal network recovery system 100, and the optimal network recovery system 100 passes the post-recovery configuration to the monitoring integrated system 200 and updates the information.
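The resource search in this step might look like the following sketch. It is not the patent's algorithm: the `Host` fields, the `find_host` function, and the tie-break rule (prefer the lowest delay) are assumptions made for illustration. It only shows the idea of filtering candidate hosts by the failed VM's recorded CPU and traffic usage and by an SLA bound such as "delay to the Internet within 100 ms".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical view of a candidate host, built from resource information."""
    name: str
    free_cpu: float              # free CPU capacity (arbitrary units)
    free_bandwidth: float        # free traffic capacity (arbitrary units)
    delay_to_internet_ms: float  # measured delay from this host to the Internet


def find_host(
    hosts: List[Host],
    cpu_needed: float,
    bandwidth_needed: float,
    max_delay_ms: float = 100.0,
) -> Optional[Host]:
    """Return a host that can take over the failed VM without violating the SLA.

    A host qualifies if it has at least the CPU and bandwidth the failed VM
    was using and its delay is within the SLA bound. Among qualifying hosts
    the one with the lowest delay is chosen (an assumed tie-break rule).
    """
    candidates = [
        h for h in hosts
        if h.free_cpu >= cpu_needed
        and h.free_bandwidth >= bandwidth_needed
        and h.delay_to_internet_ms <= max_delay_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h.delay_to_internet_ms)


# Example with made-up figures for the usage recorded for VMa1.
hosts = [
    Host("host1", free_cpu=1.0, free_bandwidth=500.0, delay_to_internet_ms=20.0),
    Host("host2", free_cpu=4.0, free_bandwidth=800.0, delay_to_internet_ms=120.0),
    Host("host3", free_cpu=4.0, free_bandwidth=800.0, delay_to_internet_ms=60.0),
]
chosen = find_host(hosts, cpu_needed=2.0, bandwidth_needed=300.0)
```

Here `host1` lacks CPU and `host2` exceeds the delay bound, so `host3` is selected. Returning `None` when nothing qualifies leaves the caller free to try a different recovery algorithm.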

As described above, according to this embodiment, service recovery can be carried out automatically, triggered by an alarm indicating that a failure has occurred. Recovery can also be performed in the form best suited to each service. Furthermore, a system that performs automatic recovery for each service can be realized without major changes to the existing system configuration.

(Modification)
FIG. 11 illustrates a modification of this embodiment. In this modification, a plurality of services exist; for example, service a and service b are provided at the same time. Service a has the logical configuration shown in FIG. 15, and service b has the logical configuration shown in FIG. 12. As shown in FIG. 11, when the node accommodating load balancer LBb1 fails, not only service b but also service a is affected. The network recovery system according to this embodiment therefore refers to the configuration information and performs recovery processing starting with the service that has the highest priority.

In this way, a priority is defined for each service, and when a plurality of services need to be recovered at the same time, the priority and the network setting information are output to the node control system, so that recovery processing can be executed starting with the highest-priority service.
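As a rough sketch of priority-ordered recovery, the following assumes that the configuration information can be read as a mapping from each service to the set of nodes it uses, and that a smaller number means a higher priority. Both the data layout and the `recovery_order` function are hypothetical.

```python
from typing import Dict, List, Set


def recovery_order(
    service_nodes: Dict[str, Set[str]],
    priorities: Dict[str, int],
    failed_node: str,
) -> List[str]:
    """List the services affected by a node failure, highest priority first.

    service_nodes maps each service to the nodes it depends on (assumed shape
    of the configuration information). priorities maps each service to an
    integer, where a smaller value means a higher priority (an assumption).
    """
    affected = [s for s, nodes in service_nodes.items() if failed_node in nodes]
    return sorted(affected, key=lambda s: priorities[s])


# Example: the node accommodating LBb1 is used by both services a and b.
service_nodes = {
    "a": {"FWa1-host", "LBb1-host"},
    "b": {"LBb1-host"},
    "c": {"other-host"},
}
priorities = {"a": 2, "b": 1, "c": 0}
order = recovery_order(service_nodes, priorities, "LBb1-host")
```

With these made-up priorities, service b is recovered before service a, and service c is left alone because it does not use the failed node.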

FIG. 13 shows traffic volume over time. The network recovery system according to this embodiment acquires traffic information periodically and retains it for an arbitrary period. It then predicts the traffic trend over an arbitrary period using, for example, a linear approximation as shown in FIG. 13. Using the predicted traffic volume when creating the recovery plan helps achieve stable operation after recovery.
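The linear approximation can be illustrated with an ordinary least-squares line fit. The description only says "linear approximation", so the choice of least squares, the function names, and the sample values below are assumptions.

```python
from typing import Sequence, Tuple


def fit_line(times: Sequence[float], volumes: Sequence[float]) -> Tuple[float, float]:
    """Fit volume = slope * time + intercept by ordinary least squares.

    Requires at least two samples with distinct time values.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(volumes) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, volumes))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def predict_traffic(
    times: Sequence[float], volumes: Sequence[float], future_time: float
) -> float:
    """Extrapolate the fitted line to a future point in time."""
    slope, intercept = fit_line(times, volumes)
    return slope * future_time + intercept


# Made-up samples: traffic grows by 2 units per time step from 10.
times = [0.0, 1.0, 2.0, 3.0]
volumes = [10.0, 12.0, 14.0, 16.0]
predicted = predict_traffic(times, volumes, 5.0)
```

For these samples the fitted line is volume = 2 * time + 10, so the prediction at time 5 is 20. A recovery plan could then size the replacement resources for the predicted volume rather than for the last observed one.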

100 Optimal network recovery system
101 Optimal design unit
103 Instruction creation unit
104 Communication unit
200 Monitoring integrated system
201 Information integration unit
203 Configuration information
205 Resource information
207 Failure information
209 Communication unit

Claims

1. A network recovery system that is connected to a monitoring system, which monitors node failures occurring in a plurality of interconnected networks each including a plurality of nodes, and to a node control system, which controls each node, and that, when a failure occurs in any of the nodes, instructs the node control system to reconfigure the network in units of services whose communication paths traverse the networks, the network recovery system comprising:
an information integration unit that acquires, from the monitoring system, failure information indicating that a failure has occurred in any node, and outputs the acquired failure information;
an optimal design unit that, based on the failure information, reconstructs the pre-failure logical configuration of the network to be recovered, checks the free resources of the network for each service, and creates a recovery plan using one of a plurality of algorithms for creating a recovery plan that recovers the network from the failure based on information on the reconstructed pre-failure logical configuration; and
an instruction creation unit that creates network setting information for recovery using the recovery plan and outputs the created network setting information to the node control system.

2. The network recovery system according to claim 1, wherein the information integration unit records network configuration information, resource information, and failure information, and
the optimal design unit identifies the nodes constituting the recovery plan based on the network configuration information, the resource information, and the failure information.

3. The network recovery system according to claim 1 or 2, wherein the information integration unit converts the acquired failure information, which differs for each network, into a common form and outputs the common failure information, and also individualizes the recovery plan created by the optimal design unit to suit each network and outputs the individualized recovery plan to the instruction creation unit, and
the instruction creation unit creates the network setting information for recovery based on the individualized recovery plan.

4. The network recovery system according to claim 3, wherein the information integration unit includes an information conversion table and uses the information conversion table to perform the commonization of the failure information and the individualization of the recovery plan.

5. The network recovery system according to any one of claims 1 to 4, wherein the instruction creation unit defines a priority for each service and, when a plurality of services need to be recovered at the same time, outputs the priority and the network setting information to the node control system.

6. A program for a server device that is connected to a monitoring system, which monitors node failures occurring in a plurality of interconnected networks each including a plurality of nodes, and to a node control system, which controls each node, and that, when a failure occurs in any of the nodes, instructs the node control system to reconfigure the network in units of services whose communication paths traverse the networks, the program causing a computer to execute a series of processes comprising:
a process of acquiring, from the monitoring system, failure information indicating that a failure has occurred in any node, and outputting the acquired failure information;
a process of reconstructing, based on the failure information, the pre-failure logical configuration of the network to be recovered, checking the free resources of the network for each service, and creating a recovery plan using one of a plurality of algorithms for creating a recovery plan that recovers the network from the failure based on information on the reconstructed pre-failure logical configuration; and
a process of creating network setting information for recovery using the recovery plan and outputting the created network setting information to the node control system.