KR20190099710A - System and method for handling network failure

Info

Publication number: KR20190099710A
Application number: KR1020180019500A
Authority: KR
Inventors: 권성용; 김보섭; 이종필; 채윤주
Original assignee: 주식회사 케이티
Priority date: 2018-02-19
Filing date: 2018-02-19
Publication date: 2019-08-28
Also published as: KR102149930B1

Abstract

The present invention relates to a network failure handling system. According to the present invention, the network failure handling system comprises: a data generation unit which generates a context vector by applying a vectorization algorithm on context information of system log (Syslog) information of network devices, generates a word vector by applying the vectorization algorithm on failure report information for a network including the network devices, and generates training data including the context vector, the word vector, and failure occurrence information of the network devices; and a failure prediction unit for training a failure prediction model with the training data, and determining system log information and failure report information related to a network failure using the failure prediction model.

Description

SYSTEM AND METHOD FOR HANDLING NETWORK FAILURE

The present invention relates to a technique for handling failures that occur in a network using system log (Syslog) information and failure report information.

When a failure occurs in a network device that makes up a network, a failure alarm is raised so that the failure can be resolved. However, not every failure alarm requires operations personnel to be dispatched to the site. Even when dispatch is judged necessary, personnel arrive with little advance information about the failure and must inspect the network device directly to determine the cause, which makes efficient failure handling difficult.

To address this, existing patents use system log information received from network devices, but they have the following limitations. Specifically, Korean Registered Patent Nos. 10-02466060 and 10-0908131 collect, classify, and analyze system logs from network devices to detect or predict failures. However, because these methods rely on statistical numerical analysis based on expert-defined rules, preemptive measures can be taken only for failures whose warning signs can be identified in advance. It is therefore difficult to identify in advance the many possible causes of network failure, such as hardware faults and software malfunctions, and difficult to take preemptive or preventive measures against them.

In contrast, Korean Laid-Open Patent Publication No. 10-2015-0097351 analyzes collected failure information to generate failure events and propose remedial measures. However, because this method relies solely on analysis of previously collected failure information, it can propose action methods to operations personnel only for failures that have already occurred.

The problem to be solved by the present invention is to provide a system and method that learn from system log information received from network devices, failure occurrence information, and failure report information and action information collected from customers when failures occur; detect signs of failure from system log information received from network devices during network operation; and support decision-making so that operations personnel can act preemptively when a failure occurs.

A network failure handling system according to an embodiment of the present invention includes: a data generation unit that generates a context vector by applying a vectorization algorithm to context information of system log (Syslog) information of network devices, generates a word vector by applying a vectorization algorithm to failure report information for a network including the network devices, and generates training data including the context vector, the word vector, and failure occurrence information of the network devices; and a failure prediction unit that trains a failure prediction model with the training data and uses the failure prediction model to determine system log information and failure report information related to a network failure.

The data generation unit determines the failure occurrence information of the network devices based on response messages received from each network device.

The failure prediction unit retrains the failure prediction model using the context vector of system log information and the word vector of failure report information received after the failure prediction model was trained, and determines the system log information and failure report information related to the network failure using the element values of the context vector and word vector produced by the retraining.

The network failure handling system according to an embodiment of the present invention further includes an action method recommendation unit that trains an action method recommendation model using the determined system log information, the determined failure report information, and action information as training data.

The action method recommendation unit inputs system log information and failure report information received after the action method recommendation model was trained into the action method recommendation model, and determines one or more action methods for that system log information and failure report information.

According to the present invention, it is possible to overcome the limitation of conventional approaches based on statistical numerical analysis, which can predict only failures that have already occurred.

In addition, according to the present invention, preemptive action methods for the network devices most closely associated with a failure can be proposed to operations personnel, enabling efficient failure handling.

In addition, according to the present invention, system log information, failure report information, failure information, and action information are received continuously so that the learning models can be updated, which helps ensure the safety of the system.

In addition, according to the present invention, failure classification and prediction use failure report information collected from customers, so service quality can be maintained.

FIG. 1 is a diagram illustrating an environment in which a network failure handling system according to an embodiment of the present invention is implemented.
FIG. 2 is a structural diagram of a network failure handling system 200 according to an embodiment of the present invention.
FIG. 3 is a diagram showing an exemplary context vector and word vector generated by a data generation unit according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating how a failure prediction unit according to an embodiment of the present invention uses a failure prediction model to determine the system log and failure report information most closely associated with a network failure.
FIG. 5 is a diagram showing the context vector and the word vector produced by retraining in the failure prediction unit according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating how an action method recommendation unit according to an embodiment of the present invention generates an action method recommendation model and recommends action methods.
FIG. 7 is a diagram illustrating how a network failure handling system according to an embodiment of the present invention recommends action methods for a network failure.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and like reference numerals denote like parts throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "… unit", "… device", and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

FIG. 1 is a diagram illustrating an environment in which a network failure handling system according to an embodiment of the present invention is implemented.

Referring to FIG. 1, the network 100 is composed of a plurality of network devices 110 to 113, each of which transmits system log information to the network failure handling system 200.

The network devices 110 to 113 need only support Syslog; their kind and type are not otherwise limited. For example, they may be routers, gateways, switches, or hubs that make up the network 100.

The network failure handling system 200 trains a failure prediction model for the network 100 with training data that includes system log information received from the network devices 110 to 113, failure occurrence information, and failure report information and action information collected from customers when failures occur.

The network failure handling system 200 uses the failure prediction model to determine the system log information and failure report information most relevant to a network failure.

The network failure handling system 200 also trains an action method recommendation model using the determined system log information, failure report information, and action information as training data, and uses the trained model to provide operations personnel with information that supports their decision-making.

FIG. 2 is a structural diagram of the network failure handling system 200 according to an embodiment of the present invention. FIG. 3 shows an exemplary context vector and word vector generated by the data generation unit. FIG. 4 illustrates how the failure prediction unit uses the failure prediction model to determine the system log and failure report information most closely associated with a network failure. FIG. 5 shows the context vector and the word vector produced by retraining in the failure prediction unit. FIG. 6 illustrates how the action method recommendation unit generates an action method recommendation model and recommends action methods.

Referring to FIG. 2, the network failure handling system 200 includes a data generation unit 210, a failure prediction unit 220, and an action method recommendation unit 230.

The data generation unit 210 generates training data that includes context vectors of the system log information received from the network devices 110 to 130 constituting the network 100, word vectors of failure report information for the network 100, and failure occurrence information of the network devices 110 to 130. For supervised learning, each context vector and word vector is mapped to its corresponding failure occurrence information.

Specifically, so that a deep learning model can learn from the system log information received from the network devices 110 to 130, the data generation unit 210 extracts context information from the system log information and represents the extracted context information as a context vector using a vectorization algorithm.

Here, a context vector is a vector that represents the context information of system log information.

The data generation unit 210 may vectorize the context information of the system log information received from a network device using a Word2Vec- or GloVe-based vectorization algorithm. For example, referring to FIG. 3, the data generation unit 210 may generate a context vector by assigning a three-dimensional vector to each of the context information items "Timestamp", "IP", "Hostname", "Summary", and "Description" extracted from the system log information.
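The lookup step described above can be sketched as follows. This is only an illustration under stated assumptions: a seeded random table stands in for the vectors that a trained Word2Vec- or GloVe-style model would supply, and the function names and sample record values are hypothetical. Only the five field names and the three-dimensional size come from the FIG. 3 example.

```python
import random

EMBED_DIM = 3  # the FIG. 3 example uses three-dimensional vectors


def build_embedding_table(tokens, dim=EMBED_DIM, seed=0):
    """Assign every distinct token a dim-dimensional vector.

    Assumption: a seeded random table stands in for the vectors that a
    trained Word2Vec- or GloVe-style model would provide.
    """
    rng = random.Random(seed)
    table = {}
    for token in tokens:
        if token not in table:
            table[token] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return table


def context_vector(log_record, fields, table):
    """Return one vector per context field, keeping the field order."""
    return [table[log_record[field]] for field in fields]


# Field names follow the FIG. 3 example; the values are hypothetical.
FIELDS = ["Timestamp", "IP", "Hostname", "Summary", "Description"]
record = {
    "Timestamp": "2018-02-19T10:00:00",
    "IP": "192.0.2.10",
    "Hostname": "router-01",
    "Summary": "LINK_DOWN",
    "Description": "interface down",
}

table = build_embedding_table(record[field] for field in FIELDS)
vectors = context_vector(record, FIELDS, table)  # five 3-D vectors
```

The same lookup would apply to the morphemes of a failure report when building a word vector.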

In this case, the generated context vector includes the sequence information of the system log information.

The data generation unit 210 then determines failure occurrence information and maps it to the context vector of the system log information.

Specifically, the data generation unit 210 sends a response request message to the network devices 110 to 130 at preset intervals, determines the failure occurrence information of the network devices 110 to 130 according to whether a response message is received, and maps that information to the context vector of the corresponding network device.

When the data generation unit 210 receives response messages from the network devices 110 to 130 for a particular time, it maps "normal" failure occurrence information to the context vectors of the system log information received from each of the network devices 110 to 130 during that time period.

Even when no response message is received from a particular network device, the data generation unit 210 maps "normal" failure occurrence information to the context vector of the system log information received from that device if operations personnel did not actually take any action.

Conversely, when operations personnel did take action, the data generation unit 210 maps "abnormal" failure occurrence information and additionally maps action information indicating the action method that the personnel carried out. In this case, the data generation unit 210 may receive the action information from a separate management server (not shown).
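The labeling rules above can be summarized in a short sketch. The `Label` type, the function name, and the sample action string are hypothetical. Treating a received response as "normal" regardless of any recorded action is an assumption of this sketch, since the text does not address that combination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Label:
    status: str                   # "normal" or "abnormal"
    action: Optional[str] = None  # action carried out by personnel, if any


def label_interval(response_received: bool,
                   operator_action: Optional[str]) -> Label:
    """Derive failure occurrence information for one polling interval."""
    if response_received:
        # The device answered the response request: label as normal.
        # (Assumption: this holds even if an action was recorded.)
        return Label("normal")
    if operator_action is None:
        # No response, but nobody had to intervene: still normal.
        return Label("normal")
    # No response and personnel took action: abnormal, keep the action.
    return Label("abnormal", operator_action)


labels = [
    label_interval(True, None),
    label_interval(False, None),
    label_interval(False, "replace line card"),  # hypothetical action
]
```

Keeping the action alongside the "abnormal" label is what later allows the action method recommendation model to be trained on the same records.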

The data generation unit 210 likewise represents failure report information as a word vector so that a deep learning model can learn from it, and maps failure occurrence information to the word vector.

Here, failure report information refers to customer feedback about the network 100; it may be collected from customers in various ways and stored in a separate database (not shown). The data generation unit 210 may access that database to obtain the failure report information. A word vector is a vector that represents failure report information, produced by applying a vectorization algorithm to it.

For example, referring to FIG. 3, the data generation unit 210 may generate a word vector by assigning a three-dimensional vector to each morpheme of the failure report "자주 인터넷이 끊겼어요" ("The Internet keeps disconnecting").

In the same way as for the context vector of system log information, the data generation unit 210 maps "normal" or "abnormal" failure occurrence information to the word vector of failure report information.

The failure prediction unit 220 generates a failure prediction model for the network 100 through a deep learning algorithm, using the data generated by the data generation unit 210 as training data.

For example, the failure prediction unit 220 may train an attention-based long short-term memory network (Attention-Based LSTM Network) model using the data generated by the data generation unit 210 as initial input values. Methods of building a learning model with an attention-based LSTM network are well known, so a detailed description is omitted here.
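For orientation only, one common formulation of attention over LSTM hidden states is given below. The specification does not state which variant is used, so the notation is assumed here rather than taken from the patent: h_t is the hidden state at step t of T, W, b, and v are learned parameters, alpha_t is the attention weight, and c is the attended summary passed to the classifier.

```latex
e_t = v^{\top} \tanh\left(W h_t + b\right), \qquad
\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)}, \qquad
c = \sum_{t=1}^{T} \alpha_t h_t
```

Under this formulation the weights alpha_t are non-negative and sum to one, so they can be read as the relative importance of each input step.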

The failure prediction unit 220 uses the failure prediction model to determine the context vector of system log information and the word vector of failure report information that are related to a network failure.

For example, referring to FIG. 4, the failure prediction unit 220 may configure the context vectors of system log information and the word vectors of failure report information received at a particular time as the hidden layer of the failure prediction model built on the attention-based LSTM network, and perform retraining together with the weights of the already-trained attention layers.

FIG. 5 shows the context vector and the word vector produced by retraining.

Referring to FIG. 5, the element values of the context vector and word vector produced by retraining indicate the degree of relevance to a network failure. That is, the larger the element values of a retrained context vector or word vector, the more relevant the system log information associated with that context vector, or the failure report information associated with that word vector, is to the network failure.

The failure prediction unit 220 determines the system log information and failure report information related to the network failure using the element values of the context vector and word vector produced by retraining.

For example, the failure prediction unit 220 may identify the context vector and the word vector having the highest element values among those produced by retraining, and may determine the system log information of that context vector and the failure report information of that word vector to be the system log information and failure report information related to the network failure.
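The selection step can be sketched as a simple ranking. The text does not say how several element values of one vector are combined into a single score, so scoring each vector by its largest element is an assumption of this sketch; the function names and sample values are hypothetical.

```python
def relevance_score(vector):
    """Score a retrained vector by its largest element (an assumption)."""
    return max(vector)


def most_relevant(vectors):
    """Return the index of the vector with the highest relevance score."""
    if not vectors:
        raise ValueError("no vectors to rank")
    return max(range(len(vectors)), key=lambda i: relevance_score(vectors[i]))


def top_ranked(vectors, k):
    """Return the indices of the k highest-scoring vectors, best first."""
    order = sorted(range(len(vectors)),
                   key=lambda i: relevance_score(vectors[i]),
                   reverse=True)
    return order[:k]


# Hypothetical retrained context vectors, one per syslog entry.
retrained = [
    [0.10, 0.20, 0.05],
    [0.70, 0.10, 0.30],
    [0.40, 0.60, 0.20],
]
best = most_relevant(retrained)      # index of the most relevant entry
top_two = top_ranked(retrained, 2)   # indices of the two best entries
```

The index returned identifies which original syslog entry or failure report the vector was built from.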

In another embodiment, after the failure prediction model has been generated, the failure prediction unit 220 inputs the context vector of system log information received from the network devices 110 to 130 and the word vector of failure report information received from the separate database into the failure prediction model to determine whether a failure has occurred in the network 100.

In this case, the failure prediction unit 220 may judge the similarity between vectors with a focus on the "Summary" and/or "Description" portions of the context vector, which are closely related to failure information. For example, when comparing the context vector of system log information from a time when a failure occurred with the context vector of system log information received after the failure prediction model was generated, the failure prediction unit 220 may determine the similarity between the vectors using a cosine similarity algorithm, focusing on the "Summary" and/or "Description" portions.
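A minimal sketch of that comparison follows. Cosine similarity itself is standard; restricting the comparison by concatenating only the "Summary" and "Description" sub-vectors is one possible reading of "focusing on" those portions and is an assumption here, as are the function names and sample vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def summary_description_similarity(known_fault, incoming,
                                   fields=("Summary", "Description")):
    """Compare two context vectors on the selected fields only."""
    a = [x for field in fields for x in known_fault[field]]
    b = [x for field in fields for x in incoming[field]]
    return cosine_similarity(a, b)


# Hypothetical per-field vectors; the "IP" field is ignored on purpose.
known_fault = {"IP": [0.9, 0.1, 0.0],
               "Summary": [1.0, 0.0, 0.0],
               "Description": [0.0, 1.0, 0.0]}
incoming = {"IP": [0.0, 0.2, 0.8],
            "Summary": [1.0, 0.0, 0.0],
            "Description": [0.0, 1.0, 0.0]}
similarity = summary_description_similarity(known_fault, incoming)
```

A value near 1 indicates that the incoming log closely resembles a log seen during a past failure on the compared fields.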

The failure prediction unit 220 may also provide the determined system log information and failure report information to operations personnel.

The action method recommendation unit 230 trains an action method recommendation model using the system log information and failure report information determined by the failure prediction unit 220, together with action information, as training data, and uses the trained model to determine one or more action methods for a network failure.

Specifically, referring to FIG. 6, the action method recommendation unit 230 trains the action method recommendation model of the network through a deep learning algorithm, using as training data the context vector or word vector produced by retraining and the action information received from the management server. In this case, the action method recommendation unit 230 may learn the action methods that operations personnel actually carried out, as contained in the action information, with a softmax classifier-based action method recommendation model.

The action method recommendation unit 230 may also use the action method recommendation model to determine the relevance between each action method contained in the action information and the context vector and word vector produced by retraining, and may provide each action method and its relevance to operations personnel. In this case, the action method recommendation unit 230 may compute the relevance of each action method through a softmax layer and may determine a specified number of the most relevant action methods to be the one or more action methods for the network failure.
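The final ranking step can be sketched as a softmax over per-action scores followed by a top-k cut. The scores would come from the trained recommendation model; here they are hypothetical inputs, as are the action names and the function names.

```python
import math


def softmax(scores):
    """Convert raw scores into relevance values that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recommend(actions, scores, k):
    """Return the k most relevant (action, relevance) pairs, best first."""
    relevance = softmax(scores)
    ranked = sorted(zip(actions, relevance),
                    key=lambda pair: pair[1],
                    reverse=True)
    return ranked[:k]


# Hypothetical action methods and model scores.
actions = ["reboot device", "replace cable", "reset port"]
scores = [2.0, 0.5, 1.0]
recommended = recommend(actions, scores, k=2)
```

Returning the relevance next to each action lets operations personnel see how strongly the model prefers one action over another instead of receiving a bare list.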

도 7은 본 발명의 한 실시예에 따른 네트워크 장애 처리 시스템이 네트워크 장애에 대한 조치 방법을 추천하는 방법을 설명하는 도면이다.7 is a diagram illustrating a method for recommending a method for dealing with a network failure by a network failure processing system according to an embodiment of the present invention.

도 7을 참고하면, 네트워크 장애 처리 시스템(200)은 벡터화 알고리즘을 이용하여 시스템 로그 정보의 맥락 벡터 및 고장 신고 정보의 워드 벡터를 생성하고(S100), 생성된 맥락 벡터 및 워드 벡터에 장애 발생 정보를 매핑한다(S110).Referring to FIG. 7, the network failure processing system 200 generates a context vector of system log information and a word vector of failure report information by using a vectorization algorithm (S100), and failure occurrence information in the generated context vector and word vector. To map (S110).

구체적으로, 네트워크 장애 처리 시스템(200)은 네트워크 장비들(110 내지 130)로부터 수신한 시스템 로그 정보를 딥러닝(Deep learning) 모형이 학습할 수 있도록 시스템 로그 정보의 맥락 정보를 추출하고, 추출한 맥락 정보에 대해 벡터화 알고리즘을 이용하여 맥락 벡터로 표현한다. 또한, 네트워크 장애 처리 시스템(200)은 고객으로부터 수신한 고장 신고 정보에 대해 벡터화 알고리즘을 이용하여 고장 신고 정보의 워드 벡터를 생성한다.Specifically, the network failure processing system 200 extracts and extracts context information of the system log information so that a deep learning model can learn the system log information received from the network devices 110 to 130. The information is expressed as a context vector using a vectorization algorithm. In addition, the network failure processing system 200 generates a word vector of the failure report information using the vectorization algorithm for the failure report information received from the customer.

또한, 네트워크 장애 처리 시스템(200)은 특정 시간에 네트워크 장비들(110 내지 130)로부터 수신한 응답 메시지를 통해 장애 발생 정보를 결정하고, 지도 학습을 위한 트레이닝 데이터를 위해 해당 시간에 수신한 시스템 로그 정보의 맥락 벡터 및 고장 신고 정보의 워드 벡터에 장애 발생 정보를 맵핑한다.In addition, the network failure processing system 200 determines failure occurrence information through a response message received from the network devices 110 to 130 at a specific time, and the system log received at that time for training data for supervised learning. The failure occurrence information is mapped to the context vector of the information and the word vector of the failure notification information.

네트워크 장애 처리 시스템(200)은 장애 발생 정보가 맵핑된 시스템 로그 정보의 맥락 벡터 및 고장 신고 정보를 이용하여 장애 예측 모델을 생성한다(S120). 이 경우, 네트워크 장애 처리 시스템(200)은 관심-재귀신경망 모델을 학습할 수 있다.The network failure processing system 200 generates a failure prediction model using the context vector and the failure report information of the system log information to which the failure occurrence information is mapped (S120). In this case, the network failure processing system 200 may learn an interest-recursive neural network model.

네트워크 장애 처리 시스템(200)은 장애 예측 모델을 이용하여 네트워크 장애와 관련된 시스템 로그 정보의 맥락 벡터 및 고장 신고 정보의 워드 벡터를 결정한다(S130).The network failure processing system 200 determines a context vector of system log information related to a network failure and a word vector of failure report information by using a failure prediction model (S130).

구체적으로, 네트워크 장애 처리 시스템(200)은 장애 예측 모델 생성 이후 수신된 시스템 로그 정보의 맥락 벡터 또는 고장 신고 정보의 워드 벡터를 이용하여, 장애 예측 모델을 재학습 시키며, 재학습 결과 생성된 맥락 벡터 및 워드 벡터 의 원소값을 이용하여 네트워크 장애와의 관련도를 결정한다.Specifically, the network failure processing system 200 re-learns the failure prediction model by using the context vector of the system log information received after the failure prediction model generation or the word vector of the failure report information, and the context vector generated as a result of the re-learning. And the element value of the word vector to determine the degree of association with the network failure.

The larger the element values of a context vector or word vector produced by the retraining, the more relevant that vector is to the network failure. The network failure handling system 200 may therefore determine the context vectors or word vectors within a specific rank of highest relevance as the context vectors or word vectors related to the network failure.
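
The ranking described above can be sketched as a top-k selection. The text says only that larger element values mean higher relevance; it does not say how a vector's elements are reduced to a single score, so this sketch assumes the largest element is used. The names `relevance` and `top_ranked`, the labels, and the numeric values are hypothetical.

```python
def relevance(vector):
    """Assumed score: the largest element value of the vector."""
    return max(vector)


def top_ranked(vectors, k):
    """Return the names of the k vectors with the highest relevance score.

    vectors maps a label to a list of element values.
    """
    ranked = sorted(vectors, key=lambda name: relevance(vectors[name]), reverse=True)
    return ranked[:k]


# Made-up vectors produced by retraining (assumptions, for illustration only).
learned = {
    "syslog:link down": [0.1, 0.9, 0.2],
    "syslog:login ok": [0.05, 0.1, 0.0],
    "report:no internet": [0.7, 0.3, 0.4],
}

related = top_ranked(learned, 2)
```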

The network failure handling system 200 learns the association among the determined context vector, the determined word vector, and the action information (S140).

Specifically, the network failure handling system 200 generates an action method recommendation model for the network 100 through a deep learning algorithm, using as training data the determined context vector or word vector together with the action method included in the action information mapped to that vector.

Thereafter, the network failure handling system 200 determines each action method included in the action information and its corresponding relevance (S150). In this case, the network failure handling system 200 may determine the relevance of each action method according to the action method recommendation model, and may provide the determined action methods and their relevance to the operating personnel.
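
The interface of steps S140 and S150 (train on vector and action pairs, then return action methods with a relevance value) can be sketched as follows. The specification trains the recommendation model with a deep learning algorithm; this sketch deliberately swaps in a simple frequency table so that it stays short and runnable, and it defines relevance as the share of past cases in which an action was taken. The class `ActionRecommender`, its methods, and the sample actions are hypothetical.

```python
from collections import Counter, defaultdict


class ActionRecommender:
    """Frequency-based stand-in for the action method recommendation model."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def fit(self, examples):
        """examples: iterable of (vector, action_method) pairs."""
        for vector, action in examples:
            self._counts[tuple(vector)][action] += 1

    def recommend(self, vector):
        """Return [(action_method, relevance)] sorted by descending relevance.

        Relevance values for one vector sum to 1; an unseen vector yields [].
        """
        counts = self._counts.get(tuple(vector))
        if not counts:
            return []
        total = sum(counts.values())
        pairs = [(action, n / total) for action, n in counts.items()]
        return sorted(pairs, key=lambda p: (-p[1], p[0]))


# Made-up history of vectors and the actions taken (assumptions only).
history = [
    ([1, 0], "reboot device"),
    ([1, 0], "reboot device"),
    ([1, 0], "replace cable"),
    ([0, 1], "dispatch technician"),
]

recommender = ActionRecommender()
recommender.fit(history)
suggestions = recommender.recommend([1, 0])
```

The ranked list in `suggestions` corresponds to what would be shown to the operating personnel: each candidate action method with its relevance.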

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention, as defined in the following claims, also fall within the scope of the present invention.

Claims

A network failure handling system comprising:
a data generation unit that generates a context vector by applying a vectorization algorithm to context information of system log (Syslog) information of network devices, generates a word vector by applying the vectorization algorithm to failure report information for a network including the network devices, and generates training data including the context vector, the word vector, and failure occurrence information of the network devices; and
a failure prediction unit that trains a failure prediction model using the training data and determines system log information and failure report information related to a network failure using the failure prediction model.

The network failure handling system of claim 1,
wherein the data generation unit
determines the failure occurrence information of the network devices based on a response message received from each of the network devices.

The network failure handling system of claim 1,
wherein the failure prediction unit
retrains the failure prediction model using the context vector of system log information and the word vector of failure report information received after the failure prediction model has been trained, and determines the system log information and the failure report information related to the network failure using the element values of the context vector and the word vector generated as a result of the retraining.

The network failure handling system of claim 3,
further comprising an action method recommendation unit that trains an action method recommendation model using the determined system log information, the determined failure report information, and action information as training data.

The network failure handling system of claim 4,
wherein the action method recommendation unit
determines one or more action methods for system log information and failure report information received after the action method recommendation model has been trained, by inputting that system log information and failure report information into the action method recommendation model.