JP2018170675A

JP2018170675A - Fault recovery procedure optimization system and fault recovery procedure optimization method

Info

Publication number: JP2018170675A
Application number: JP2017067334A
Authority: JP
Inventors: Keisuke Kuroki; Michiaki Hayashi
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-03-30
Filing date: 2017-03-30
Publication date: 2018-11-01
Anticipated expiration: 2037-03-30
Also published as: JP6684243B2

Abstract

[Problem] To use machine learning to track changes in network state and immediately provide a recovery procedure suited to the situation. [Solution] The system includes: a feature amount calculation unit that acquires network configuration information for each network, quantifies and standardizes the acquired information, and calculates a feature amount; a procedure learning/creating unit that acquires combination information consisting of the calculated feature amount and the type of failure alarm raised in the network having that feature amount, and creates or updates a recovery procedure corresponding to the combination information; a procedure information storage unit that stores the created or updated recovery procedures in order of likelihood; and a filtering storage unit that stores those recovery procedures, among the stored ones, that are to be excluded from use. [Selected drawing] Figure 1

Description

The present invention relates to a technique that uses machine learning to create an operation procedure for recovering from a network failure, or to correct an operation procedure that has already been created.

Conventionally, machine learning has trained on old and new information with equal weight. Patent Document 1 instead takes the time point of each piece of training information into account, so that learning follows changes over time.

Patent Document 1: JP 2014-167744 A

However, although Patent Document 1 can cope with situations that change over time, it performs no processing that uses a change of state as a trigger for correction. It therefore cannot handle the case where the learned content should be corrected immediately when the state changes.

The present invention has been made in view of these circumstances. Its object is to provide a failure recovery procedure optimization system that, when learned content needs to be corrected, measures whether the network state has changed and re-creates the optimal solution for that network state, and that, even when the network state has not changed, filters out the procedure as unnecessary information, so that the correction is reflected in the recovery procedures immediately.

(1) To achieve the above object, the present invention takes the following measures. The failure recovery procedure optimization system of the present invention uses machine learning to create an operation procedure for recovering from a network failure or to correct a created operation procedure, and optimizes the created or corrected operation procedure. The system comprises: a feature amount calculation unit that acquires network configuration information for each network, quantifies and standardizes the acquired information, and calculates a feature amount; a procedure learning/creating unit that acquires combination information consisting of the calculated feature amount and the type of failure alarm raised in the network having that feature amount, and creates or updates a recovery procedure corresponding to the combination information; a procedure information storage unit that stores the created or updated recovery procedures in order of likelihood; and a filtering storage unit that stores those recovery procedures, among the stored ones, that are to be excluded from use. When it is determined that a recovery procedure needs correction, the network configuration information is acquired again and the feature amount is recalculated. If the recalculated feature amount is a new feature amount that differs from every existing feature amount in the network, the combination information of the new feature amount and the failure alarm type, together with the recovery procedure for that combination information, is stored in the filtering storage unit.

With this configuration, the system acquires the network configuration information, quantifies and standardizes it, and calculates a feature amount; acquires the combination information of the feature amount and the failure alarm type raised in the network having that feature amount; creates or updates the corresponding recovery procedure; stores the created or updated recovery procedures in order of likelihood; and stores the recovery procedures to be excluded from use. When a recovery procedure is judged to need correction, the network configuration information is acquired again and the feature amount is recalculated, and if the result is a new feature amount that differs from every existing one, the combination information of the new feature amount and the failure alarm type and the recovery procedure for that combination are stored in the filtering storage unit. The recovery procedures can therefore reflect the current network situation immediately.

(2) Further, in the failure recovery procedure optimization system of the present invention, when it is determined that the recovery procedure needs correction, the network configuration information is acquired again and the feature amount is recalculated; if the recalculated feature amount has not changed in the network, the recovery procedure is stored in the filtering storage unit.

In this case, because the recovery procedure is stored in the filtering storage unit when the recalculated feature amount has not changed, the recovery procedure that was judged to need correction this time can be filtered out the next time a similar failure occurs.

(3) Further, in the failure recovery procedure optimization system of the present invention, when it is determined that the recovery procedure needs correction, the network configuration information is acquired again and the feature amount is recalculated; if the recalculated feature amount has changed and is identical to one of the existing feature amounts in the network, the recovery procedure is stored in the filtering storage unit.

In this case, because the recovery procedure is stored in the filtering storage unit when the recalculated feature amount has changed to one that already exists, the recovery procedure that was judged to need correction this time can be filtered out the next time a similar failure occurs.

(4) Further, in the failure recovery procedure optimization system of the present invention, after the combination information of the calculated new feature amount and the failure alarm type and the recovery procedure for that combination information are stored in the filtering storage unit, the previous recovery procedure that had been stored in the filtering storage unit is deleted.

Because the previous recovery procedure is deleted from the filtering storage unit after the new combination information and its recovery procedure are stored there, a recovery procedure that was once excluded can become a candidate for execution again in response to later changes in the network situation.

(5) The failure recovery procedure optimization method of the present invention uses machine learning to create an operation procedure for recovering from a network failure or to correct a created operation procedure, and optimizes the created or corrected operation procedure. The method includes at least: a step in which a feature amount calculation unit acquires network configuration information for each network, quantifies and standardizes the acquired information, and calculates a feature amount; a step in which a procedure learning/creating unit acquires combination information consisting of the calculated feature amount and the type of failure alarm raised in the network having that feature amount, and creates or updates a recovery procedure corresponding to the combination information; a step in which a procedure information storage unit stores the created or updated recovery procedures in order of likelihood; and a step in which a filtering storage unit stores those recovery procedures, among the stored ones, that are to be excluded from use. When it is determined that a recovery procedure needs correction, the network configuration information is acquired again and the feature amount is recalculated. If the recalculated feature amount is a new feature amount that differs from every existing feature amount in the network, the combination information of the new feature amount and the failure alarm type, together with the recovery procedure for that combination information, is stored in the filtering storage unit.

With this method, the feature amount calculation unit acquires the network configuration information, quantifies and standardizes it, and calculates a feature amount; the procedure learning/creating unit acquires the combination information of the feature amount and the failure alarm type raised in the network having that feature amount, and creates or updates the corresponding recovery procedure; the procedure information storage unit stores the created or updated recovery procedures in order of likelihood; and the filtering storage unit stores the recovery procedures to be excluded from use. When a recovery procedure is judged to need correction, the network configuration information is acquired again and the feature amount is recalculated, and if the result is a new feature amount that differs from every existing one, the combination information of the new feature amount and the failure alarm type and the recovery procedure for that combination are stored in the filtering storage unit. The recovery procedures can therefore reflect the current network situation immediately.

According to the present invention, when learned content needs to be corrected, the system measures whether the network state has changed and re-creates the optimal solution for that network state. Even when the network state has not changed, the recovery procedure that needed correction is filtered out as unnecessary information. The correction is thus reflected in the recovery procedures immediately, and as a result an appropriate recovery procedure can be provided according to the nature of the network failure.

FIG. 1 is a diagram showing the schematic configuration of the failure recovery procedure optimization system according to the present embodiment. FIG. 2 is a diagram showing an example of the information stored in the network information storage unit. FIG. 3 is a diagram showing an example of the information stored in the feature amount storage unit. FIG. 4 is a diagram showing an example of the information stored in the procedure information storage unit. FIG. 5 is a diagram showing an example of the information stored in the filtering storage unit. FIG. 6 is a flowchart showing the operation of the failure recovery phase. FIG. 7 is a flowchart showing the operation of the procedure learning/correction phase.

The present inventors focused on the fact that, in recovering from a network failure, corrections to a recovery procedure caused by changes in the state of network resources and the like are not reflected immediately. They found that, when learned content needs to be corrected, measuring whether the network state has changed and re-creating the optimal solution for that network state, and filtering the procedure out as unnecessary information even when the network state has not changed, allows the correction to be reflected in the recovery procedures immediately, so that an appropriate recovery procedure can be provided according to the nature of the network failure. This finding led to the present invention.

That is, the failure recovery procedure optimization system of the present invention uses machine learning to create an operation procedure for recovering from a network failure or to correct a created operation procedure, and optimizes the created or corrected operation procedure. The system comprises: a feature amount calculation unit that acquires network configuration information for each network, quantifies and standardizes the acquired information, and calculates a feature amount; a procedure learning/creating unit that acquires combination information consisting of the calculated feature amount and the type of failure alarm raised in the network having that feature amount, and creates or updates a recovery procedure corresponding to the combination information; a procedure information storage unit that stores the created or updated recovery procedures in order of likelihood; and a filtering storage unit that stores those recovery procedures, among the stored ones, that are to be excluded from use. When it is determined that a recovery procedure needs correction, the network configuration information is acquired again and the feature amount is recalculated. If the recalculated feature amount is a new feature amount that differs from every existing feature amount in the network, the combination information of the new feature amount and the failure alarm type, together with the recovery procedure for that combination information, is stored in the filtering storage unit.

In this way, the present inventors have made it possible for the failure recovery procedures to reflect the network situation immediately. Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a diagram showing the schematic configuration of the failure recovery procedure optimization system according to the present embodiment. The failure recovery procedure optimization system 1 includes a network information storage unit 11, a monitoring unit 13, a feature amount calculation unit 15, a feature amount storage unit 17, an input value creation unit 21, a procedure information storage unit 23, a result output unit 25, a filtering storage unit 27, a procedure learning/creating unit 29, and a procedure result confirmation unit 31. The system 1 is used in four phases: prior learning, failure recovery, procedure learning/correction, and filter correction. These four phases are described below in order, together with the function of each unit.

[1. Prior learning]
First, the user inputs a network name and the IP address information associated with that network. The network name and IP address information are stored in the network information storage unit 11. FIG. 2 is a diagram showing an example of the information stored in the network information storage unit 11.

Next, the monitoring unit 13 refers to the IP address information stored in the network information storage unit 11, acquires resource information (CPU usage, memory usage, traffic information, and so on) from each network device, and transmits it to the feature amount calculation unit 15.

Next, the feature amount calculation unit 15 calculates the state of each network as a feature amount. The feature amount of each network is assumed to be calculated with machine learning, for example a SOM (Self-Organizing Map) or an autoencoder. After calculating the feature amounts, the unit stores each network and its feature amount in the feature amount storage unit 17. FIG. 3 is a diagram showing an example of the information stored in the feature amount storage unit 17.
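The quantify, standardize, and calculate steps can be illustrated with a small sketch. It is not the method of the embodiment: the z-score standardization, the nearest-prototype assignment (a stand-in for a trained SOM unit or autoencoder code), the resource readings, and the prototype vectors are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def standardize(samples):
    """Z-score each metric column across all networks."""
    columns = list(zip(*samples))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against zero spread
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in samples]

def nearest_prototype(vector, prototypes):
    """Return the index of the closest prototype (squared Euclidean distance)."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(vector, p))
    return min(range(len(prototypes)), key=lambda i: dist(prototypes[i]))

# Hypothetical readings per network: (CPU %, memory %, traffic in Mbps).
readings = {
    "net-a": (20.0, 30.0, 100.0),
    "net-b": (25.0, 35.0, 120.0),
    "net-c": (90.0, 85.0, 900.0),
}
names = list(readings)
scaled = standardize([readings[n] for n in names])

# Assumed prototypes, standing in for units learned by a SOM.
prototypes = [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
features = {n: nearest_prototype(v, prototypes) for n, v in zip(names, scaled)}
```

Here the two lightly loaded networks share one feature amount and the heavily loaded one receives another, which is the kind of discrete label that the later phases compare.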

Next, the input value creation unit 21 combines the feature amount of each network with the alarm type of the failure that occurred in that network, and takes the combination as the input value. It then sends the recovery procedure that was used for that failure to the procedure learning/creating unit 29 as the output value, that is, as the solution for the feature amount and alarm type given as input. The procedure learning/creating unit 29 performs supervised learning on the input values (feature amount and alarm type) and the output values (recovery procedure), and stores the result in the procedure information storage unit 23. FIG. 4 is a diagram showing an example of the information stored in the procedure information storage unit 23. For each combination of network feature amount and alarm type, the procedure information storage unit 23 stores the recovery procedures in order of likelihood.
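A minimal sketch of a store that keeps recovery procedures in order of likelihood for each combination of feature amount and alarm type follows. The text does not specify the learning model, so this sketch assumes that likelihood is approximated by how often a procedure was observed as the solution; the class name `ProcedureStore` and its methods are hypothetical.

```python
from collections import Counter, defaultdict

class ProcedureStore:
    """Hypothetical store keyed by (feature amount, alarm type)."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def learn(self, feature, alarm_type, procedure):
        # One supervised example: input (feature, alarm), output procedure.
        self._counts[(feature, alarm_type)][tuple(procedure)] += 1

    def ranked(self, feature, alarm_type):
        # Most frequently observed procedure first (assumed likelihood proxy).
        counter = self._counts.get((feature, alarm_type), Counter())
        return [list(p) for p, _ in counter.most_common()]

store = ProcedureStore()
store.learn(1, "error", [1, 2, 3])
store.learn(1, "error", [1, 2, 3])
store.learn(1, "error", [4, 5, 6])
ranking = store.ranked(1, "error")
```

A frequency count keeps the example self-contained; a real system could replace it with any classifier that returns ranked candidates.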

[2. Failure recovery]
FIG. 6 is a flowchart showing the operation of the failure recovery phase. When a failure occurs in a network, that network issues a failure alarm to the input value creation unit 21, and the failure is detected (step S101). The input value creation unit 21 refers to the information in the procedure information storage unit 23 and checks the network in which the failure occurred and the feature amount of that network (steps S102 and S103). Based on this check, it determines whether the failure has occurred before (step S104).

If the input value creation unit 21 determines in step S104 that the failure has occurred before and that recovery procedure information exists, it notifies the result output unit 25 accordingly. The result output unit 25 refers to the information in the filtering storage unit 27 and the procedure information storage unit 23 (step S107) and outputs to the user the applicable recovery procedures stored in the procedure information storage unit 23, excluding those stored in the filtering storage unit 27 (step S108).

If, on the other hand, the input value creation unit 21 determines in step S104 that the failure has occurred for the first time, it sends the combination of the network's feature amount and the alarm to the procedure learning/creating unit 29 as the input value (step S105).

Next, the procedure learning/creating unit 29 calculates a recovery procedure from the input value received from the input value creation unit 21, and stores the calculated recovery procedure in the procedure information storage unit 23 as the output value for that input, together with the feature amount, network name, alarm type, and other information (step S106). The calculated recovery procedure is then output to the user via the result output unit 25 (step S108).
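The branch at step S104 and the filtering at steps S107 and S108 can be sketched as follows. The dictionaries standing in for the procedure information storage unit and the filtering storage unit, and the `create_procedure` callback standing in for the procedure learning/creating unit, are hypothetical simplifications.

```python
def recover(feature, alarm_type, procedure_store, filter_store, create_procedure):
    """Return the recovery procedures to present to the user."""
    key = (feature, alarm_type)
    known = procedure_store.get(key)
    if known:  # S104: failure seen before and procedures exist
        excluded = filter_store.get(key, [])             # S107
        return [p for p in known if p not in excluded]   # S108
    # S105-S106: first occurrence, so create and store a new procedure.
    new_procedure = create_procedure(feature, alarm_type)
    procedure_store[key] = [new_procedure]
    return [new_procedure]                               # S108

def make_procedure(feature, alarm_type):
    # Hypothetical placeholder for the learned model's output.
    return [9, 9, 9]

procedures = {(1, "error"): [[1, 2, 3], [4, 5, 6]]}
filters = {(1, "error"): [[4, 5, 6]]}
seen_before = recover(1, "error", procedures, filters, make_procedure)
first_time = recover(2, "warning", procedures, filters, make_procedure)
```

Keeping the filter separate from the ranked list means an excluded procedure is hidden without being unlearned, so it can be restored later simply by removing the filter entry.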

[3. Procedure learning/correction]
FIG. 7 is a flowchart showing the operation of the procedure learning/correction phase. First, after executing the recovery procedure output from the result output unit 25, the user checks whether that recovery procedure needs to be corrected (step S201) and makes the determination (step S202). The user inputs the determination result to the procedure result confirmation unit 31.

If the recovery procedure does not need correction, the network feature amount, alarm information, and procedure information are sent to the procedure learning/creating unit 29, with the feature amount and alarm information as the input value and the executed procedure as the output value, and the unit is made to learn from them (step S209). The learning result is stored in (used to update) the procedure information storage unit 23, and the process ends (step S210).

If it is determined in step S202 that the procedure needs correction, the procedure result confirmation unit 31 requests the monitoring unit 13 to monitor the network again (step S203). The purpose is to check whether the network situation has changed in a way that makes the existing procedures unusable.

The monitoring unit 13 refers to the network information storage unit 11, acquires the resource information of the network devices on the network in question, and transmits it to the feature amount calculation unit 15.

Next, the feature amount calculation unit 15 calculates the feature amount from the acquired resource information (step S204). If the feature amount has not changed (step S205), the recovery procedure executed this time is stored in (used to update) the filtering storage unit 27 as a filter entry for the feature amount of the network in which the failure occurred, so that it is not used next time (step S208).

If the calculation in step S205 shows that the feature amount has changed but now equals a feature amount that already exists (step S206), the recovery procedure used this time is newly entered into (used to update) the filtering storage unit 27 as a filter entry for that feature amount, so that it is not used next time (step S208).

If it is found in step S206 that the feature amount has changed and has never been seen before, the new feature amount is stored in the feature amount storage unit 17 and, together with the recovery procedure used this time, is newly entered into (used to update) the filtering storage unit 27 as a filter entry for that feature amount, so that the procedure is not used next time (step S207).

After steps S205 to S208 are complete, the procedure learning/creating unit 29 is made to learn with the feature amount and alarm information as the input value and, as the output value, either a recovery procedure among those used that does not need correction or a newly created recovery procedure (step S209). The result is stored in (used to update) the procedure information storage unit 23 (step S210).
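The three outcomes of steps S205 to S208 can be summarized in a short sketch. The function name `update_filter`, the set of known feature amounts (standing in for the feature amount storage unit), and the dictionary used as the filtering storage unit are assumptions for illustration.

```python
def update_filter(old_feature, new_feature, alarm_type, failed_procedure,
                  known_features, filter_store):
    """Record failed_procedure as excluded and report which branch applied."""
    if new_feature == old_feature:
        branch = "unchanged"             # S205 -> S208
    elif new_feature in known_features:
        branch = "changed_to_known"      # S206 -> S208
    else:
        branch = "changed_to_new"        # S206 -> S207
        known_features.add(new_feature)  # register the new feature amount
    # In every branch the procedure is filtered under the recalculated feature.
    excluded = filter_store.setdefault((new_feature, alarm_type), [])
    if failed_procedure not in excluded:
        excluded.append(failed_procedure)
    return branch

known = {1, 2}
filters = {}
b1 = update_filter(1, 1, "error", [1, 2, 3], known, filters)
b2 = update_filter(1, 2, "error", [1, 2, 3], known, filters)
b3 = update_filter(1, 7, "error", [1, 2, 3], known, filters)
```

In each case the procedure that failed is excluded for the feature amount the network has now, which is why the same procedure can remain usable under a different feature amount.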

[4. Filter correction]
As learning progresses in the procedure learning/correction phase, a procedure stored in the filtering storage unit may be judged to be no longer the most unnecessary one. In that case, that procedure information is deleted from the filtering storage unit. For example, in FIG. 5, two procedures are stored in the filtering storage unit as filtering (exclusion) targets for feature amount "1" and alarm type "error". If recovery procedure [4->5->6] is no longer ranked first among the recovery procedures to be excluded, recovery procedure [4->5->6] is deleted from the filtering storage unit. Once a recovery procedure is stored in the filtering storage unit, it is no longer used. Depending on the situation, however, a recovery procedure that is not ranked first for exclusion may turn out to be effective. For this reason, a recovery procedure is deleted from the filtering storage unit when it is no longer ranked first among the procedures to be excluded.
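The pruning rule can be sketched as below. The text does not say how the exclusion ranking is computed, so this sketch assumes each filtered procedure carries a numeric exclusion score and keeps only the top-scoring one per combination; the function name, the scores, and the procedure (1, 2, 3) are hypothetical.

```python
def prune_filter(filter_scores):
    """Keep only the top-ranked excluded procedure for each (feature, alarm)."""
    pruned = {}
    for key, scores in filter_scores.items():
        if scores:
            top = max(scores, key=scores.get)  # highest assumed exclusion score
            pruned[key] = {top: scores[top]}
    return pruned

# Feature amount 1, alarm type "error": two procedures currently excluded.
filters = {(1, "error"): {(1, 2, 3): 5, (4, 5, 6): 2}}
pruned = prune_filter(filters)
```

With these assumed scores, procedure (4, 5, 6) is no longer ranked first for exclusion, so it leaves the filter and becomes available again.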

As described above, according to the present embodiment, when learned content needs to be corrected, the system checks whether the network state has changed and re-creates the optimal solution for that network state. Even when the network state has not changed, the recovery procedure that needed correction is filtered out as unnecessary information, so the correction is reflected in the recovery procedures immediately. As a result, an appropriate recovery procedure can be provided according to the content of the network failure.
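The branching summarized above can be sketched as a small decision function. The names `Action` and `decide_correction` are hypothetical, and feature amounts are treated as plain comparable values, which is an assumption. The sketch covers only the choice between filtering the existing procedure and re-creating a procedure for a new network state.

```python
from enum import Enum


class Action(Enum):
    FILTER_PROCEDURE = "filter"           # exclude the procedure from use
    RELEARN_FOR_NEW_FEATURE = "relearn"   # re-create a procedure for the new state


def decide_correction(previous_feature, recalculated_feature, known_features):
    """Choose what to do after a recovery procedure is judged to need correction.

    The feature amount is recalculated from freshly acquired network
    configuration information before this function is called.
    """
    if recalculated_feature == previous_feature:
        # The network state did not change: filter the procedure.
        return Action.FILTER_PROCEDURE
    if recalculated_feature in known_features:
        # The state changed but matches an existing feature amount: filter.
        return Action.FILTER_PROCEDURE
    # The state changed to a feature amount not seen before.
    return Action.RELEARN_FOR_NEW_FEATURE


known = {1, 2, 3}
unchanged = decide_correction(1, 1, known)
changed_known = decide_correction(1, 2, known)
changed_new = decide_correction(1, 7, known)
```

Treating the two filtering cases separately keeps the sketch aligned with the text, even though they lead to the same action here.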

Description of Symbols
1 Failure recovery procedure optimization system
11 Network information storage unit
13 Monitoring unit
15 Feature amount calculation unit
17 Feature amount storage unit
21 Input value creation unit
23 Procedure information storage unit
25 Result output unit
27 Filtering storage unit
29 Procedure learning/creating unit
31 Procedure result confirmation unit

Claims

A failure recovery procedure optimization system that uses machine learning to create an operation procedure for recovering from a network failure or to modify a created operation procedure, and that optimizes the created or modified operation procedure, the system comprising:
a feature amount calculation unit that acquires network configuration information, converts the acquired network configuration information into numerical values and standardizes it, and calculates a feature amount;
a procedure learning/creating unit that acquires combination information of the calculated feature amount and the type of failure alarm that occurred in the network having that feature amount, and creates or updates a recovery procedure corresponding to the acquired combination information;
a procedure information storage unit that stores the created or updated recovery procedures in order of likelihood; and
a filtering storage unit that stores, among the stored recovery procedures, recovery procedures to be excluded from use,
wherein, when it is determined that a recovery procedure needs to be corrected, the network configuration information is acquired again and the feature amount is recalculated, and, when the calculated feature amount is a new feature amount that differs from every existing feature amount in the network, the combination information of the calculated new feature amount and the failure alarm type and the recovery procedure for that combination information are stored in the filtering storage unit.

The failure recovery procedure optimization system according to claim 1, wherein, when it is determined that a recovery procedure needs to be corrected, the network configuration information is acquired again and the feature amount is recalculated, and, when the calculated feature amount has not changed in the network, the recovery procedure is stored in the filtering storage unit.

The failure recovery procedure optimization system according to claim 1 or 2, wherein, when it is determined that a recovery procedure needs to be corrected, the network configuration information is acquired again and the feature amount is recalculated, and, when the calculated feature amount has changed in the network and is the same as an existing feature amount, the recovery procedure is stored in the filtering storage unit.

The failure recovery procedure optimization system according to claim 1, wherein, after the combination information of the calculated new feature amount and the failure alarm type and the recovery procedure for that combination information are stored in the filtering storage unit, the recovery procedure previously stored in the filtering storage unit is deleted.

A failure recovery procedure optimization method that uses machine learning to create an operation procedure for recovering from a network failure or to modify a created operation procedure, and that optimizes the created or modified operation procedure, the method comprising at least the steps of:
in a feature amount calculation unit, acquiring network configuration information, converting the acquired network configuration information into numerical values and standardizing it, and calculating a feature amount;
in a procedure learning/creating unit, acquiring combination information of the calculated feature amount and the type of failure alarm that occurred in the network having that feature amount, and creating or updating a recovery procedure corresponding to the acquired combination information;
in a procedure information storage unit, storing the created or updated recovery procedures in order of likelihood; and
in a filtering storage unit, storing, among the stored recovery procedures, recovery procedures to be excluded from use,
wherein, when it is determined that a recovery procedure needs to be corrected, the network configuration information is acquired again and the feature amount is recalculated, and, when the calculated feature amount is a new feature amount that differs from every existing feature amount in the network, the combination information of the calculated new feature amount and the failure alarm type and the recovery procedure for that combination information are stored in the filtering storage unit.