KR102418594B1

KR102418594B1 - ICT equipment management system and method thereof

Info

Publication number: KR102418594B1
Application number: KR1020170142075A
Authority: KR
Inventors: 성종규; 오준석; 이동준; 이종필
Original assignee: 주식회사 케이티
Priority date: 2017-10-30
Filing date: 2017-10-30
Publication date: 2022-07-08
Also published as: KR20190047809A

Abstract

The present invention comprises a distributed data storage unit that stores device information; an operation log collection unit that stores operation data related to device failures and performance; an operation quality learning unit that learns the operation data using machine learning logic; an operation quality prediction unit that uses the learned result to calculate a severity and a quality degradation cause value; and a degraded equipment extraction unit that includes a quality degradation profile management block storing the inquiry conditions for degraded equipment that is a candidate for replacement.
Based on ICT operation information that arrives as a large volume of events, the invention automatically detects early signs of performance degradation and extracts the equipment likely to cause problems, so that the causes of degradation can be removed in advance and a stable service can be provided continuously.

Description

ICT EQUIPMENT MANAGEMENT SYSTEM AND METHOD THEREOF

The present invention relates to a system and method for managing ICT equipment. More specifically, it relates to a system and method that, for the stable operation of ICT equipment such as networks, servers, and power facilities used by a network operator or enterprise running a large-scale network to provide internal and external services, extracts in advance the equipment that may degrade quality and provides a service for replacing that equipment at the optimal time.

Today most companies, including the telecommunications carriers that sell network services, use a wide range of ICT (Information & Communication Technology) equipment such as leased lines, servers, switches, routers, UPS power supplies, and various controllers.

Some companies lease this equipment, but most still purchase and deploy their own. Because purchasing and operating ICT equipment accounts for a large share of operating costs, companies are studying how to purchase the right capacity at the right time and operate it stably and efficiently.

A conventional operation management system monitors network equipment and lines by providing the management functions of the FCAPS (Fault, Configuration, Accounting, Performance, Security) domains for the operation and management of networks and servers. It consists of a fault management unit that monitors alarms and failures of network equipment and lines; a configuration management unit that monitors the physical and logical configuration and the operating state of equipment and lines; an accounting management unit that collects, stores, and controls billing information; a performance management unit that monitors and compiles statistics on the performance, traffic, and quality of equipment and lines; and a security management unit that monitors security, safety, and confidentiality. For protocols, such a system uses standard protocols such as SNMP (Simple Network Management Protocol), ICMP, NETCONF, NetFlow, and RMON, as well as proprietary protocols.

All of these conventional systems are designed for after-the-fact response and cannot adequately support proactive measures based on anomaly signs or early detection. In addition, anomaly detection based on a TCA (Threshold Crossing Alert) for traffic, CPU, or memory utilization raises an event when the load measured against a specific value leaves the normal range. Because equipment shows varied symptoms from varied causes, no single threshold can indicate accurately whether the equipment is abnormal, so false positives are inevitably numerous and accuracy is markedly low.
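
As a rough illustration of the threshold-based detection described above, the following Python sketch raises a TCA event whenever a utilization sample exceeds a fixed limit. The function name, the 80% threshold, and the sample values are assumptions chosen for illustration; none of them comes from the patent.

```python
from typing import List


def tca_events(samples: List[float], threshold: float = 80.0) -> List[int]:
    """Return the indices of samples that cross the fixed threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


# Hypothetical CPU utilization samples (%) from a healthy device whose
# scheduled jobs cause short, harmless spikes.
cpu_utilization = [35.0, 42.0, 91.0, 40.0, 88.0, 38.0]
alerts = tca_events(cpu_utilization)
print(alerts)
```

Both spikes are reported even though the device is healthy, which is the false-positive behavior that a single static threshold tends to produce.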

Under a conventional operation management system it is difficult to predict failures and quality degradation in advance, so their causes cannot be removed beforehand and, ultimately, the stability of the overall service cannot be guaranteed.

In other words, a conventional resource management system can manage information about equipment specifications, but it cannot detect performance degradation in advance, pick out from a large equipment population the devices that persistently degrade quality, and prepare their upgrade or replacement.

The present invention is intended to solve the above problems. Its object is to provide an ICT equipment management system and method that extract, in advance, the equipment to be upgraded or replaced, so that a network operator or enterprise running a large-scale network can manage its ICT equipment stably.

To solve the above technical problem, the present invention discloses a degraded equipment management apparatus comprising: a distributed data storage unit that stores device information; an operation log collection unit that stores operation data related to device failures and performance; an operation quality learning unit that learns the operation data using machine learning logic; an operation quality prediction unit that uses the learned result to calculate a severity and a quality degradation cause value; and a degraded equipment extraction unit that includes a quality degradation profile management block storing the inquiry conditions for degraded equipment that is a candidate for replacement.

The operation data stored in the operation log collection unit includes device failure events, Threshold Crossing Alert (TCA) events, and system log information.

The operation quality learning unit includes a machine learning processing block that performs machine learning and a preprocessing function block that converts data types so that the machine learning processing block can learn from the data.

The inquiry conditions for degraded equipment include one or more conditions selected from the equipment introduction date, a quality event threshold, a failure event threshold, a performance event threshold, an operational VoC (Voice of Customer) event threshold, a system log (SYSLOG) event threshold, severity, and service grade.
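
The inquiry conditions listed above could be held in a small profile object and applied as a filter. The sketch below is only one possible reading: the class names, field names, and sample devices are hypothetical, and only three of the listed conditions (introduction date, quality event threshold, severity) are modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DegradationProfile:
    """Inquiry conditions; a condition left as None is not applied."""
    introduced_before: Optional[date] = None
    min_quality_events: Optional[int] = None
    min_severity: Optional[int] = None


@dataclass
class DeviceSummary:
    device_id: str
    introduced_on: date
    quality_events: int
    severity: int


def matches(profile: DegradationProfile, device: DeviceSummary) -> bool:
    """Return True when the device satisfies every condition that is set."""
    if profile.introduced_before is not None and device.introduced_on >= profile.introduced_before:
        return False
    if profile.min_quality_events is not None and device.quality_events < profile.min_quality_events:
        return False
    if profile.min_severity is not None and device.severity < profile.min_severity:
        return False
    return True


def extract(profile: DegradationProfile, devices: List[DeviceSummary]) -> List[str]:
    """Return the IDs of devices that match the profile."""
    return [d.device_id for d in devices if matches(profile, d)]


devices = [
    DeviceSummary("sw-01", date(2010, 5, 1), 120, 4),
    DeviceSummary("rt-02", date(2016, 3, 15), 15, 2),
    DeviceSummary("sw-03", date(2011, 1, 1), 30, 4),
]
profile = DegradationProfile(date(2013, 1, 1), min_quality_events=50, min_severity=3)
candidates = extract(profile, devices)
```

Treating an unset condition as "not applied" mirrors the claim wording that the profile contains one or more of the listed conditions.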

The apparatus further includes an ICT design module comprising a quality impact analysis block that evaluates the failure and quality correlation between the degraded equipment and its related equipment, a performance prediction block that predicts the resource usage of the degraded equipment, and a replacement simulator block that calculates the quality impact expected when the recommended replacement equipment is installed.

The quality impact analysis block predicts the resource usage of the equipment, extracts a recommended model for the equipment to be replaced, and predicts the replacement time.

The extracted recommended model for the equipment to be replaced is displayed through a GUI (Graphical User Interface).

According to an embodiment of the present invention, early signs of performance degradation are detected automatically from ICT operation information that arrives as a large volume of events, and the equipment likely to cause problems is extracted, so that the causes of degradation can be removed in advance and a stable service can be provided continuously.

In addition, because the equipment responsible for failures and quality degradation is identified and reported automatically, the timing and scale of replacement can be determined, which in turn supports continuous, stable service.

Failures related to ICT services can be prevented in advance, the associated network operating costs can be reduced, and work efficiency can be improved. This benefits not only companies that operate large-scale network-based ICT equipment but also the network operators that provide services to those companies and operate equipment on their behalf.

However, the effects achievable by the ICT equipment management system and method according to embodiments of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by a person of ordinary skill in the art to which the present invention pertains.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the invention and, together with the detailed description, explain its technical idea.
FIG. 1 is an overall configuration diagram of the ICT equipment management system according to the present invention.
FIG. 2 is a block diagram showing the detailed configuration of the intelligent operating environment prediction module 1000 according to the present invention.
FIG. 3 is a block diagram showing the detailed configuration of the operation log collection unit 200 according to the present invention.
FIG. 4 is a block diagram showing the detailed configuration of the operation quality learning unit 300 according to the present invention.
FIG. 5 is a block diagram showing the detailed configuration of the operation quality prediction unit 400 according to the present invention.
FIG. 6 is a block diagram showing the detailed configuration of the degraded equipment extraction unit 500 according to the present invention.
FIG. 7 is a block diagram showing the detailed configuration of the ICT design module 2000 according to the present invention.
FIG. 8 is a flowchart showing a method for automatically extracting equipment to be replaced according to the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are described in detail below with reference to the accompanying drawings.

The following embodiments are provided to aid a comprehensive understanding of the methods, apparatuses, and/or systems described in this specification. They are examples only, and the present invention is not limited to them.

In describing the embodiments of the present invention, a detailed description of known technology related to the invention is omitted where it would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users and operators; their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description serves only to describe embodiments of the invention and should not be read as limiting. Unless clearly used otherwise, singular expressions include the plural. In this description, expressions such as "comprising" or "having" indicate certain features, numbers, steps, operations, elements, or parts or combinations of these, and should not be interpreted as excluding the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations of these beyond those described.

Terms such as first and second may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another.

Exemplary embodiments of the ICT equipment management system and method according to the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is an overall configuration diagram of the ICT equipment management system according to the present invention.

The present invention relates to a system and method that, in an enterprise ICT environment, interwork with an operation management system (also called a network management system, NMS) to generate statistical information, use machine learning algorithms to detect the state of the operating environment, predict the severity and cause of each device's effect on service quality, and automatically extract and report degraded equipment so that it can be upgraded or replaced in advance and service quality can be maintained.

The ICT equipment management system according to the present invention includes an intelligent operating environment prediction module 1000, an ICT design module 2000, and an operation management system (NMS) 3000.

The operation management system (NMS) 3000 monitors the configuration, failures, performance, and traffic of ICT equipment. The intelligent operating environment prediction module 1000 predicts and extracts degraded equipment based on the operation information monitored by the operation management system 3000. The ICT design module 2000 recommends an upgrade or replacement based on the prediction information produced by the intelligent operating environment prediction module 1000. The upgrade or replacement equipment may be presented (displayed) through a GUI (Graphical User Interface).

Through the intelligent operating environment prediction module 1000 interworking with the operation management system (NMS) 3000, the present invention can provide an intelligent operating environment support system. The support system automatically extracts the equipment that causes quality degradation so that an upgrade or replacement can then proceed, which makes it possible to prevent failures and quality degradation in advance.

FIG. 2 is a block diagram showing the detailed configuration of the intelligent operating environment prediction module 1000 according to the present invention.

The intelligent operating environment prediction module 1000 includes a distributed data store 100, an operation log collection unit 200, an operation quality learning unit 300, an operation quality prediction unit 400, and a degraded equipment extraction unit 500.

The distributed data store 100 stores data in a database or on the Hadoop Distributed File System (HDFS). Because it must be able to process large volumes of data either in batches (collecting a certain amount of data and processing it at once) or in real time, its design needs to take this performance requirement into account.

The distributed data store 100 stores per-device inventory information, per-device failure information, per-device performance information, per-device system log (syslog) information, learning information, operation quality prediction information, and the like.

The per-device inventory information is facility information on all equipment operated in conjunction with each NMS. It may include the device's unique ID, its introduction date, its installation location, the administrator, contact details, the manufacturer, the IOS version, and so on.

The per-device failure information may be failure information collected periodically in conjunction with each NMS.

The per-device performance information may be performance information collected periodically in conjunction with each NMS.

The system log information may be the system log (syslog) information of equipment on which a failure occurred, obtained in conjunction with each NMS.

The learning information may be the learning result information produced by machine learning analysis in the operation quality learning unit 300.

The operation quality prediction result information may be the results determined by the system through the periodic analysis performed by the operation quality prediction unit 400.

In addition, the conditions required for learning that are used by the operation quality learning module may be stored. The operation log collection unit 200, the operation quality learning unit 300, the operation quality prediction unit 400, and the degraded equipment extraction unit 500 are described in detail below with reference to the drawings.

FIG. 3 is a block diagram showing the detailed configuration of the operation log collection unit 200 according to the present invention.

The operation log collection unit 200 collects and buffers, from devices and various management systems, data on the many failures and performance degradations that an administrator cannot keep track of, processes this data, and stores it in a database.

The operation log collection unit 200 includes an interworking block 210, a failure event management block 220, a performance TCA management block 230, a SYS log management block 240, a batch data processing block 250, and a statistics management block 260.

The interworking block 210 handles interworking with operation management systems such as an IP NMS, a transmission NMS, and a server management system.

The interworked IP equipment may be routers, L2/L3/L4 switches, and security equipment. The transmission equipment may be Multi-Service Provisioning Platform (MSPP) equipment, PTN (Packet Transport Network) equipment, and Optical Transport Network (OTN) equipment.

Servers and power equipment may include authentication servers, billing servers, web servers, cache servers, DHCP (Dynamic Host Configuration Protocol), DNS (Domain Name System), and UTM (Unified Threat Management) systems.

The failure event management block 220 interworks with each NMS and collects failure events for each analysis period. A failure event may be, for example, a port down, in which a port is physically disconnected, or a link down, in which the connection between links is lost.

The performance TCA (Threshold Crossing Alert) management block 230 interworks with each NMS and collects performance TCA events for each analysis period. TCA events may be counted for conditions such as CPU overload, excessive memory usage, disk exhaustion, and an excessive number of exceptions in key processes.
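
The per-period counting described above can be sketched with a plain counter. The event shape (a device ID paired with a cause label) and the cause names are assumptions made for this example; the patent does not define an event format here.

```python
from collections import Counter
from typing import Iterable, Tuple

# Each event is a (device_id, cause) pair; both fields are hypothetical.
Event = Tuple[str, str]


def count_tca_events(events: Iterable[Event]) -> Counter:
    """Count TCA events per (device, cause) pair for one analysis period."""
    return Counter(events)


period_events = [
    ("server-01", "cpu_overload"),
    ("server-01", "cpu_overload"),
    ("server-01", "disk_exhaustion"),
    ("switch-07", "memory_overuse"),
]
tca_counts = count_tca_events(period_events)
```

The same structure would serve for failure events such as port down or link down by changing the cause labels.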

The SYS log management block 240 interworks with each NMS and, for each analysis period, collects the system log (syslog) information of the equipment on which failure events occurred.

The batch data processing block 250 takes specific conditions and queries the corresponding machine learning results. It includes a function processing unit that learns the data and stores the result.

The statistics management block 260 generates statistical data for the system function blocks and for operator queries. It generates the statistical data by day, week, or month.
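
A minimal sketch of the daily, weekly, and monthly roll-up follows. It assumes that each input record is a (date, event count) pair and that weeks follow the ISO 8601 calendar; both choices are illustrative and are not specified in the patent.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def period_key(day: date, granularity: str) -> str:
    """Map a date to the label of the day, ISO week, or month it falls in."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown granularity: {granularity}")


def aggregate(records: Iterable[Tuple[date, int]], granularity: str) -> Dict[str, int]:
    """Sum event counts per period."""
    totals: Dict[str, int] = defaultdict(int)
    for day, count in records:
        totals[period_key(day, granularity)] += count
    return dict(totals)


records = [(date(2017, 10, 30), 3), (date(2017, 10, 31), 5), (date(2017, 11, 1), 2)]
daily = aggregate(records, "day")
weekly = aggregate(records, "week")
monthly = aggregate(records, "month")
```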

The values stored by the operation log collection unit 200 may be the time taken to collect the log, the ID of each ICT device, the equipment type, the failure count, the performance TCA count, the failure cause code, the quality TCA cause, the number of SYSLOG events, and the SYSLOG value.

FIG. 4 is a block diagram showing the detailed configuration of the operation quality learning unit 300 according to the present invention.

The operation quality learning unit 300 includes a preprocessing function block 310, a machine learning processing block 320, and a machine learning environment management block 330.

The preprocessing function block 310 converts data types based on the collected operation data. Specifically, it converts the data into types that the machine learning processing block 320 can use and passes the data to that block.
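
One way to picture the type conversion is to turn a raw log record of strings into a numeric feature vector. The record keys, the list of equipment types, and the one-hot encoding below are all assumptions for illustration; the patent does not state which conversion is used.

```python
from typing import Dict, List

# Hypothetical set of equipment types used for one-hot encoding.
EQUIPMENT_TYPES = ["router", "switch", "server"]


def to_feature_vector(record: Dict[str, str]) -> List[float]:
    """Convert a raw string record into a list of floats for a learner."""
    one_hot = [1.0 if record["equipment_type"] == name else 0.0 for name in EQUIPMENT_TYPES]
    counts = [float(record[key]) for key in ("failure_count", "tca_count", "syslog_count")]
    return one_hot + counts


raw = {"equipment_type": "switch", "failure_count": "2", "tca_count": "5", "syslog_count": "10"}
features = to_feature_vector(raw)
```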

In one embodiment, the data stored and processed for machine learning may have the format shown in Table 1 below.

[Table 1]

Here, the quality event value is the sum of the number of failure events, the number of performance TCA events, and the number of SYSLOG events.
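
The sum stated above can be written directly. The function and parameter names are hypothetical; only the arithmetic comes from the text.

```python
def quality_event_value(failure_events: int, tca_events: int, syslog_events: int) -> int:
    """Quality event value = failure events + performance TCA events + SYSLOG events."""
    return failure_events + tca_events + syslog_events


# Example: 2 failure events, 5 TCA events, and 10 SYSLOG events in one period.
example_value = quality_event_value(2, 5, 10)
```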

The value of the quality degradation cause code is the result obtained by analyzing the corresponding data; it may take into account values relating to the equipment, the ports, the control unit, the equipment operating system, configuration errors, and the like.

The service severity value defines, on a scale of 1 to 5, the impact that one set of quality events has on the service. In one embodiment the severity may range from a minimum of 1 to a maximum of 5, where 1 means Monitoring, 2 Warning, 3 Minor, 4 Major, and 5 Critical.
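
The five levels of this embodiment map naturally onto an ordered enumeration. The class name is an assumption; the level names and numbers are taken from the text.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Service severity levels from the embodiment, ordered by impact."""
    MONITORING = 1
    WARNING = 2
    MINOR = 3
    MAJOR = 4
    CRITICAL = 5


# IntEnum members compare as integers, so threshold checks read naturally.
needs_attention = [level.name for level in Severity if level >= Severity.MAJOR]
```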

The machine learning processing block 320 is a module that learns the data using machine learning logic such as a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network). Any machine learning algorithm in current use may serve as the machine learning logic.

Using the values refined by the preprocessing function block 310, the machine learning processing block 320 is the functional module that actually trains on the data so that the severity can be predicted and processed automatically.

Learning is performed by reading the conditions required for learning from the data store. These conditions may include the machine learning algorithm (CNN, RNN, etc.) and settings such as the number of epochs. By default the data may be split into 70% training data and 30% test data, and this ratio may be adjusted.
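
The default 70/30 split with an adjustable ratio could look like the sketch below. The shuffling step, the fixed seed, and the function name are assumptions; the patent states only the default ratio and that it can be changed.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(rows: Sequence[T], train_ratio: float = 0.7, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle rows reproducibly and split them into training and test sets."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train_rows, test_rows = split_dataset(list(range(10)))
```

Passing a different `train_ratio`, for example 0.8, adjusts the proportion without touching the rest of the pipeline.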

The machine learning environment management block 330 is a function processing unit that, based on real-time operation data, communicates with the machine learning processing block 320 using the previously learned results and manages the inputs and results.

The machine learning environment management block 330 periodically receives data from the data collection processing unit, which operates on the basis of the learned data, automatically determines the severity, and stores the value.

It also performs a learning profile management function: it stores the operating environment of the running learning algorithm, the definition of the training data, and so on, together with the operating time, so that this information can be loaded and reused when a new machine learning run starts, or consulted when the result data is analyzed.
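
The learning profile management function can be pictured as saving and reloading a small configuration record. The field names and the JSON serialization in this sketch are assumptions; the patent says only that the algorithm's operating environment, the training data definition, and the operating time are stored for later reuse.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class LearningProfile:
    """Hypothetical record of one training run's settings."""
    algorithm: str
    epochs: int
    train_ratio: float
    feature_names: List[str]
    trained_at: str  # ISO 8601 timestamp of the training run


def dump_profile(profile: LearningProfile) -> str:
    """Serialize the profile so it can be stored alongside the results."""
    return json.dumps(asdict(profile), sort_keys=True)


def load_profile(text: str) -> LearningProfile:
    """Restore a stored profile for a new run or for result analysis."""
    return LearningProfile(**json.loads(text))


saved = dump_profile(
    LearningProfile("RNN", 20, 0.7, ["failure_count", "tca_count", "syslog_count"], "2017-10-30T00:00:00")
)
restored = load_profile(saved)
```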

FIG. 5 is a block diagram showing the detailed configuration of the operation quality prediction unit 400 according to the present invention.

The operation quality prediction unit 400 is a module that generates data for regular operation from the data stored by the operation log collection unit 200. It includes a preprocessing function block 410 and a machine learning environment management block 420.

The stored operation log data is read periodically (for example, once a day at 00:00) and processed by the preprocessing function block 410 into the format shown in Table 1 above. The severity and the quality degradation cause values are then obtained according to the rules of the algorithm produced by machine learning in the operation quality learning unit 300, and stored. The severity is generated automatically by the system: an artificial intelligence algorithm selects the most relevant value on the basis of similarity.

At this time, the machine learning operation applies, without modification, the profile generated as a result of training in the machine learning processing block 320 of the operation quality learning unit 300.

The operation quality prediction unit 400 performs the same functions as the operation quality learning unit 300, except for the machine learning environment management block 330. The difference is that the operation quality learning unit 300 uses data to build rules through learning, whereas the operation quality prediction unit 400 processes the data actually in operation.

FIG. 6 is a block diagram showing the detailed configuration of the quality degradation equipment extraction unit 500 according to the present invention.

The quality degradation equipment extraction unit 500 performs the main operation of the system: automatically extracting equipment that is a candidate for replacement. It includes a quality degradation profile management block 510 and a quality degradation equipment extraction function block 520.

The quality degradation profile management block 510 is a functional block that stores and manages the query conditions for quality-degraded equipment that is a candidate for replacement.

For example, consider a company with a medium-sized network and a server that is among the equipment at least three years old, provides a major service at service class 3, and was introduced five years ago for 1,000 users and 50 or more concurrent users. If, in one month, this server generates 200 failures, 500 performance TCAs, 2,000 related Syslog entries, 20 operational VoCs, and 100 quality degradation alarms of Severity 3 or higher, it is evaluated as having a serious operational problem, and the system can extract it as equipment requiring an upgrade or replacement.

The relevant profile information used as the criterion may be as shown in Table 2 below.

[Table 2]

The above profile items can be used in various combinations, for example by applying all of them at once or by applying individual columns. Applying all items can be regarded as the default; the function can also be run from the operation GUI screen with an arbitrary query period and separately configured threshold items.

The equipment introduction date is the reference introduction date used to recognize equipment as aged. For a query, it is entered as a date calculated by counting back three or five years from the current date.
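Counting back three or five years from the current date can be sketched as below. The function names `introduction_cutoff` and `is_aged`, and the choice to map 29 February onto 28 February in non-leap years, are assumptions made for illustration.

```python
from datetime import date


def introduction_cutoff(today, years):
    """Return the date `years` before `today`.

    29 February falls back to 28 February when the target year is not a
    leap year (an assumption; the text does not cover this case).
    """
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def is_aged(introduced_on, today, years=3):
    """True if the equipment was introduced on or before the cutoff date."""
    return introduced_on <= introduction_cutoff(today, years)


print(introduction_cutoff(date(2017, 10, 30), 3))
```

Equipment whose introduction date is on or before the printed cutoff would then be treated as aged for the purposes of the query.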

The daily quality event threshold is the default threshold for the number of quality events per day.

The failure event threshold is the threshold for the number of failure events that occurred during each collection period (for example, one hour by default) in the per-equipment operation data.

The performance event threshold is the threshold, per collection period, for equipment performance events (CPU, memory (MM), disk utilization, and so on) above which the equipment is judged to be quality-degraded.

The operational VoC (Voice of Customer) count threshold is the threshold for the number of customer complaints received concerning the equipment.

The SYSLOG event threshold is the threshold for the count of SYSLOG entries determined in advance to be quality-related, within the per-equipment SYSLOG data collected each period.
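Counting the quality-related SYSLOG entries in one collection period might look like the sketch below. The patent does not say how entries are "determined in advance" to be quality-related; the regular expressions in `QUALITY_PATTERNS` and both function names are purely hypothetical placeholders.

```python
import re

# Hypothetical patterns standing in for the pre-determined list of
# quality-related SYSLOG items; the real list is not given in the text.
QUALITY_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"link (down|flap)", r"out of memory", r"disk .*full")
]


def count_quality_syslogs(lines, patterns=QUALITY_PATTERNS):
    """Count the lines that match at least one quality-related pattern."""
    return sum(1 for line in lines if any(p.search(line) for p in patterns))


def exceeds_syslog_threshold(lines, threshold):
    """True if the quality-related count reaches the configured threshold."""
    return count_quality_syslogs(lines) >= threshold


sample = ["eth0 link down", "user login ok", "Out of memory: kill process"]
print(count_quality_syslogs(sample))
```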

The applied quality severity is the reference severity threshold applied to the raw operational statistics data for each piece of equipment, such as failure, performance, and SYSLOG data.

The service class is the threshold for the service class associated with the equipment. Setting it makes it possible to exclude test or standby equipment and equipment with a low service class that serves few users.

The quality degradation persistence threshold is the threshold for the daily count of quality degradation occurrences (for example, 24 per day) at which equipment is judged to be quality-degraded.
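The profile items described above could be held in a single record, with unset items meaning "do not apply this column". The sketch below is an assumption about representation only: the class name, the field names, and the use of `None` for an unapplied item are hypothetical, and no values from Table 2 are reproduced.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class QualityDegradationProfile:
    """One query profile; a field left as None is not applied."""
    introduced_before: Optional[date] = None      # equipment introduction date
    daily_quality_events: Optional[int] = None    # daily quality event threshold
    failure_events: Optional[int] = None          # failure event threshold
    performance_events: Optional[int] = None      # performance event threshold
    voc_count: Optional[int] = None               # operational VoC count threshold
    syslog_events: Optional[int] = None           # SYSLOG event threshold
    severity: Optional[int] = None                # applied quality severity
    service_class: Optional[int] = None           # service class
    degradation_persistence: Optional[int] = None # persistence threshold

    def active_items(self):
        """Return only the items that have been set."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


profile = QualityDegradationProfile(severity=3, failure_events=10)
print(profile.active_items())
```

Keeping every item optional reflects the statement that the items may be applied all together or column by column.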

The quality degradation equipment extraction function block 520 is a module that predicts quality-degraded equipment by machine learning and expresses the result as a probability based on the data stored during the learning process.

For the equipment presented in Table 1, a result such as (2015-05-DaeJeon-SVR100, 90%) may be derived. For example, suppose the user sets the extraction criteria as i) Severity 3 or higher, ii) service class 3 or higher, iii) quality degradation persistence threshold of 20 or higher, and iv) failure event threshold of 10 or higher. Equipment that satisfies 90% of these criteria is scored at 90%, and equipment that satisfies all of them is scored at 100%.
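The text does not define how partial satisfaction of the criteria is measured. One simple reading, shown here only as an assumption, is the share of criteria that are met, each weighted equally. The function name `match_score` and the dictionary keys are hypothetical; note that with four equally weighted criteria this reading yields only 0, 25, 50, 75, or 100, so the 90% in the example implies some other weighting that the text leaves open.

```python
def match_score(metrics, criteria):
    """Return the percentage of criteria satisfied by the equipment.

    A criterion is satisfied when the measured value is greater than or
    equal to its threshold ("or higher" in the text). Equal weighting is
    an assumption of this sketch.
    """
    if not criteria:
        raise ValueError("at least one criterion is required")
    met = sum(
        1 for name, threshold in criteria.items()
        if metrics.get(name, 0) >= threshold
    )
    return 100.0 * met / len(criteria)


criteria = {
    "severity": 3,
    "service_class": 3,
    "degradation_persistence": 20,
    "failure_events": 10,
}
measured = {
    "severity": 3,
    "service_class": 3,
    "degradation_persistence": 24,
    "failure_events": 8,
}
print(match_score(measured, criteria))
```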

The quality degradation equipment extraction function block 520 queries the database (DB) for the list of equipment that is a candidate for upgrade or replacement and extracts the relevant equipment on the basis of the quality degradation profile.

When querying by month, year, or a specific period, the results may be as follows.

1. Number of equipment with degraded service quality / total number of equipment

2. The query period, equipment name, equipment type, severity, equipment introduction date, number of quality degradation events during the period, cumulative number of events, number of times registered as a replacement candidate in the year, cause of quality degradation, introduction date (year, month, day), and so on may be queried.

In addition, the number of ports of the equipment, the services provided, the service importance, the number of clients, and the quality degradation persistence count may be queried.

Equipment returned by the query is judged to be degrading service quality through recurring failures and performance problems, and is automatically extracted as aged equipment. This decision takes into account the utilization rate of the equipment, the importance of the service, and similar factors.

FIG. 7 is a block diagram showing the detailed configuration of the ICT design module 2000 according to the present invention.

The ICT design module 2000 determines when, and with which equipment, the extracted quality-degraded equipment should be replaced. That is, it recommends the timing and scale of equipment replacement, and includes a quality impact analysis block 600, a performance prediction block 700, a replacement simulator block 800, and an upgrade simulator block 900.

The quality impact analysis block 600 calculates, for the quality-degraded equipment (replacement candidates) automatically extracted by the intelligent operating environment prediction unit, the correlation between the failures of related equipment and quality. Equipment with a high quality correlation value affects many other pieces of equipment, so its quality degradation is judged to be capable of seriously affecting network operation and service provision.
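The text does not specify how the quality correlation is computed. As one possible illustration, the sketch below uses the Pearson correlation between the per-period failure counts of a candidate and those of each related piece of equipment. The choice of Pearson correlation, the function names, and the equipment names are all assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if one is flat)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def quality_correlations(candidate_failures, related_failures):
    """Correlate the candidate's failure counts with each related device."""
    return {
        name: pearson(candidate_failures, series)
        for name, series in related_failures.items()
    }


candidate = [1, 2, 3, 4]
related = {"switch-A": [2, 4, 6, 8], "server-B": [5, 5, 5, 5]}
print(quality_correlations(candidate, related))
```

Under this reading, a candidate whose failures move together with those of many related devices would receive high values and be treated as having a wide impact.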

The quality impact analysis block 600 also predicts the resource usage of the equipment, uses a machine learning algorithm to predict when the resource will be completely exhausted, and extracts the equipment that serves as the recommended replacement model.

Equipment with a high measured number of performance TCAs has no problem with its current service, but if the trend continues unchanged, a service failure may occur in the short or long term. Overload and resource problems in these performance indicators lead to quality degradation, and subsequent exhaustion of the resource can lead to a failure.

By predicting the resource usage of these items and using a machine learning algorithm to predict when the resource will be completely exhausted, the quality impact analysis block 600 can specify the equipment replacement time.
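The machine learning algorithm used for this prediction is not named in the text. As a deliberately simple stand-in, the sketch below fits a least-squares straight line to recent utilization samples and extrapolates to full capacity. The function name, the linear model, and the sample values are assumptions and are not the method of the invention.

```python
def periods_until_exhaustion(usage, capacity=100.0):
    """Estimate how many more periods until usage reaches capacity.

    usage holds one utilization sample per period (for example, disk %).
    Returns None when the fitted trend is flat or decreasing.
    """
    n = len(usage)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_t = (n - 1) / 2
    mean_u = sum(usage) / n
    denom = sum((t - mean_t) ** 2 for t in range(n))
    slope = sum((t - mean_t) * (u - mean_u) for t, u in enumerate(usage)) / denom
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    # Count from the most recent sample, never returning a negative value.
    return max(0.0, t_full - (n - 1))


print(periods_until_exhaustion([50, 60, 70, 80]))
```

With one sample per month, the returned value would be read as the number of months remaining, which could then inform the replacement time.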

When recommended replacement equipment is extracted for quality-degraded equipment, the following are considered: the model of the degraded equipment, the service it provides (web, DB, daemon server, development server, etc.), the size of the organization (large enterprise, mid-sized enterprise, small business, venture), the industry (telecommunications services, manufacturing, ICT, food, textiles, wood, forestry, automobiles, steel, etc.), and the number of customers using the service (total and concurrent).

The performance prediction block 700 predicts the usage trends of resources such as CPU, memory (MM), and disk for the quality-degraded equipment that is a candidate for replacement.

The replacement simulator block 800 calculates the expected quality impact when resource performance is supplemented with the recommended replacement equipment.

The upgrade simulator block 900 calculates the expected quality impact when resource performance is supplemented by replacing the equipment with the recommended upgrade equipment.

FIG. 8 is a flowchart showing a method of automatically extracting replacement target equipment according to the present invention.

In step S10, the administrator, when necessary, sets and updates the learning profile for all ICT equipment in the operation quality learning unit 300. The training-related conditions used by the operation quality learning module, that is, its various parameters, are set or updated here: for example, the machine learning training cycle, the volume of relevant data, and the ratio of training to validation data.

In step S20, the operation profile is set and updated: when necessary, the administrator sets or updates the various profiles used by the operation log collection unit 200. For example, the cycle for interworking data with each NMS and the time and cycle of the batch jobs that generate statistical data can be set or updated.

In step S30, the administrator sets or updates the various profiles used by the quality degradation equipment extraction unit 500.

In step S40, failure, performance, and syslog data are collected: the operation log collection unit 200 interworks with each NMS to collect the various operation logs.

In step S50, the learning profile is loaded: the system reads the training-related settings used by the operation quality learning unit 300 from the database and loads them into memory.

In step S51, the quality degradation equipment extraction profile is loaded and refreshed: the system reads the profile used by the quality degradation equipment extraction unit 500 from the database and loads it into memory.

In step S60, the operation log collection unit 200 temporarily stores the data obtained by interworking with each NMS 3000 in the distributed data storage unit 100, such as Hadoop.

In step S70, the operation quality learning unit 300 reads the training data from the distributed data storage unit 100 according to the configured cycle.

In step S80, the training data is loaded: the training data read from the distributed data storage unit 100 is loaded into the process that runs the machine learning algorithm.

In step S90, the collected data is preprocessed.

In step S100, the machine learning training result is delivered to the operation quality prediction unit 400. The operation quality prediction unit 400 can be regarded as the machine learning operating model, which applies the parameters and machine learning algorithm delivered as the training result without modification.

In step S110, the operation quality prediction module is configured: the training result is reflected in the operation quality prediction unit 400.

In step S120, the operation quality prediction unit 400 reads the data required for operation quality prediction from the distributed data storage unit 100. That is, the required data is loaded periodically according to the operation profile.

In step S130, the data read in step S120 is preprocessed, using the same preprocessing procedure as in the operation quality learning unit 300.

In step S140, the operation quality prediction process is executed and its result is stored back in the distributed data storage unit 100.

In step S150, statistical data is generated according to the configured conditions. Statistics can be generated by hour, day, week, or month.
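Generating hourly, daily, weekly, and monthly statistics from timestamped events can be sketched as follows. The function names, the key formats, and the use of ISO week numbers for the weekly bucket are assumptions of this example.

```python
from collections import Counter
from datetime import datetime


def bucket_key(ts, granularity):
    """Map a timestamp to the label of its hour, day, week, or month."""
    if granularity == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if granularity == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown granularity: {granularity}")


def event_counts(timestamps, granularity):
    """Count events per bucket at the requested granularity."""
    return Counter(bucket_key(ts, granularity) for ts in timestamps)


events = [
    datetime(2017, 10, 30, 0, 5),
    datetime(2017, 10, 30, 0, 50),
    datetime(2017, 10, 31, 9, 0),
]
print(event_counts(events, "day"))
```

The same event list can be passed with `"hour"`, `"week"`, or `"month"` to obtain the other statistics without re-reading the source data.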

In step S160, the data used by the quality degradation equipment extraction unit 500 is loaded periodically according to the operation profile.

In step S170, the quality degradation equipment extraction algorithm is run to extract quality-degraded equipment.

In step S180, the extraction result is stored.

In step S190, the list of quality-degraded equipment and its details are queried according to the query conditions.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments described herein are intended to explain, not to limit, the technical idea of the present invention, and the invention is not limited to these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

1000: intelligent operating environment prediction module
100: distributed data storage unit
200: operation log collection unit
210: interworking block
220: failure event management block
230: performance TCA management block
240: SYSLOG management block
250: batch data processing block
260: statistics management block
300: operation quality learning unit
310: preprocessing function block
320: machine learning processing block
330: machine learning environment management block
400: operation quality prediction unit
410: preprocessing function block
420: machine learning processing block
500: quality degradation equipment extraction unit
2000: ICT design module
600: quality impact analysis module
700: performance prediction module
800: replacement simulator module
900: upgrade simulator module
3000: operation management system (NMS)

Claims

1. A quality-degraded equipment management apparatus comprising:
a distributed data storage unit that stores device information;
an operation log collection unit that collects operation data related to device failures and performance;
an operation quality learning unit that learns the operation data based on machine learning logic;
an operation quality prediction unit that calculates, through the learning, operation quality prediction information including a severity; and
a quality degradation equipment extraction unit including a quality degradation profile management block that manages a quality degradation profile including query conditions for quality-degraded equipment to be replaced, and a quality degradation equipment extraction function block that detects the quality-degraded equipment based on the operation quality prediction information and the quality degradation profile.

2. The apparatus of claim 1,
wherein the operation data stored in the operation log collection unit includes device failure events, Threshold Crossing Alert (TCA) events, and system log information.

3. The apparatus of claim 1,
wherein the operation quality learning unit includes a machine learning processing block that performs machine learning, and a preprocessing function block that converts data types so that the machine learning processing block can learn from the data.

4. The apparatus of claim 1,
wherein the query conditions for the quality-degraded equipment include one or more conditions selected from an equipment introduction date, a quality event threshold, a failure event threshold, a performance event threshold, an operational Voice of Customer (VoC) event threshold, a system log (SYSLOG) event threshold, a severity, and a service class.

5. The apparatus of claim 1,
further comprising an ICT design module that includes:
a quality impact analysis block that evaluates the correlation between failures and quality for the quality-degraded equipment and related equipment;
a performance prediction block that predicts resource usage of the quality-degraded equipment; and
a replacement simulator block that calculates the expected quality impact when the equipment is supplemented with recommended replacement equipment.

6. The apparatus of claim 5,
wherein the quality impact analysis block predicts the resource usage of the equipment, extracts a recommended model for the replacement target equipment, and predicts the replacement time.

7. The apparatus of claim 6,
wherein information on the recommended model for the replacement target equipment is displayed through a graphical user interface (GUI).