KR20210039040A

KR20210039040A - Obstacle prediction and obstacle prediction modeling management system

Info

Publication number: KR20210039040A
Application number: KR1020190121336A
Authority: KR
Inventors: 윤성군
Original assignee: 주식회사 아이옵스테크놀러지
Priority date: 2019-10-01
Filing date: 2019-10-01
Publication date: 2021-04-09
Also published as: KR102281431B1

Abstract

The present application relates to a management system that builds models through deep learning and predicts failures. More specifically, the present application relates to a technology that allows different modeling results extracted from several models to be used on one platform. The management system performs the steps of predicting a failure that occurs during the transmission and reception of data, calculating the probability of the failure and modeling it, and selecting an optimal model. The management system comprises: a data collection unit which collects data; a failure probability prediction unit which predicts the probability of a failure based on the collected data; a data collection cycle adjusting unit which adjusts the data collection cycle; and a modeling storage unit which stores the models and selects an optimal model. With the management system, physically separate models can be operated as a single resource, and the resources allocated to them can be used efficiently.

Description

Failure prediction and failure prediction modeling management system {OBSTACLE PREDICTION AND OBSTACLE PREDICTION MODELING MANAGEMENT SYSTEM}

The present application relates to a management system that builds models through deep learning and predicts failures. More specifically, the present application relates to a technology that enables different modeling results extracted from several models to be used on one platform.

In modern society, the convenience brought by the development of information systems can no longer be ignored. Artificial intelligence, which is now used in many fields, is one example. Artificial intelligence, which underlies automation functions, has spread across various markets.

Failure prediction technology using artificial intelligence can extract only the necessary core data by directly defining failure requirements for each IT resource component, and provides a function for adding necessary data at any time during the learning process. In addition, failure prediction information can be obtained from real-time operational status information for each IT resource component rather than from system logs.

Unlike conventional machine learning, deep learning, one example of artificial intelligence, minimizes human intervention and learns from the data as it is, so that the machine also learns the features of the data by itself. Deep learning is one of the artificial neural network techniques and is a model with multiple hidden layers between the input layer and the output layer. Deep learning produces highly accurate results by passing large amounts of data through a neural network structure, and has the advantage of requiring no prior understanding of a specific environment.

In addition, the present application applied several methodologies from the field of artificial intelligence and compared their predictive power with that of existing methodologies. The field of artificial intelligence, represented by terms such as machine learning and deep learning, refers to systems in which a computer, by means of computer engineering, goes through a learning process like the human brain and makes decisions such as predictions.

In the past, this field received little attention because of the physical limits on processing a large number of varied, simultaneous cases. Recently, however, it has attracted great attention worldwide as deep learning systems, represented by Google's AlphaGo, have demonstrated that they can match or even surpass the level and speed of human judgment.

Since the predictive power of artificial intelligence improves as the amount of training data increases, an even better effect can be expected in a prediction process such as that of the present application, which uses big data such as text data as its source.

Big data and artificial intelligence are core technologies of the Fourth Industrial Revolution and are receiving much attention in many fields, but there are still few cases in which they have been applied to research in the areas of management and finance. The present work therefore attempts to apply this new methodology to a management system and to demonstrate its usefulness.

An object of the present application is to provide a management system that allows existing, physically separate models to be operated as a single resource and makes efficient use of the resources allocated for that purpose.

A management system according to an embodiment of the present application performs the steps of predicting a failure that occurs while data is exchanged, calculating the probability of the failure and modeling it, and selecting an optimal model. The management system consists of a data collection unit that collects data, a failure probability prediction unit that predicts the failure probability based on the collected data, a data collection cycle adjusting unit that adjusts the data collection cycle, and a modeling storage unit that stores the models and selects the optimal model.

The data collection unit communicates with at least one external client or server and collects data from it according to a preset cycle. The failure probability prediction unit calculates the failure occurrence probability based on the data collected by the data collection unit and models the failure probability prediction result through deep learning. The data collection cycle adjusting unit adjusts the data collection cycle of the data collection unit based on the failure occurrence probability calculated by the failure probability prediction unit. The modeling storage unit stores the models of the failure probability prediction unit and selects an optimal model from among them.

With the management system according to the present application, physically separate models can be operated as a single resource, and the resources allocated for that purpose can be used efficiently.

FIG. 1 is a diagram showing the management system 10 of the present application.
FIG. 2 is a diagram showing an example of the management server 100 of FIG. 1 according to an embodiment of the present application.
FIG. 3 is a diagram showing the configuration of the data collection unit 110, which is one of the components of the management server 100 of the present application.
FIG. 4 is a diagram showing an example of the data collection item setting module 111 of the present application.
FIG. 5 is a diagram showing the types of failure that can be predicted by the failure probability prediction unit 120 of the present application.
FIG. 6 is a diagram showing a data collection unit 110 according to another embodiment of the present application.
FIG. 7 is a diagram illustrating the operation of the standard matrix configuration module 113 and the normalization layer module 114 of the data collection unit 110.
FIG. 8 is a diagram showing the data collection module 112 to which the automation function is applied.
FIG. 9 is a diagram illustrating the operation of the modeling storage unit 140.
FIG. 10 is a diagram showing the process of the modeling storage unit 140.
FIG. 11 is a diagram of a data collection unit 110 that includes a module 115 for searching and classifying the server environment.
FIG. 12 is a flowchart showing a method of applying automatic collection items according to the server environment.

Hereinafter, the details and features of the present application are described with reference to the accompanying drawings. However, the present specification is not limited to the embodiments disclosed below and can be applied in various forms; the description serves as the basis for such applications and is intended to cover all of them.

FIG. 1 is a diagram showing the management system 10 of the present application.

Referring to FIG. 1, the management system 10 includes a management server 100, a plurality of sub-servers 210 to 240, and a modeling storage unit 140.

The management server 100 may receive the data needed for maintenance and management from the sub-servers 210 to 240. The requested data may be information related to the functions of the sub-servers 210 to 240 or information needed to manage the sub-servers 210 to 240.

The management server 100 may communicate with the plurality of sub-servers 210 to 240 and exchange data with them. The management server 100 may include functions for collecting data according to a preset cycle, calculating the failure occurrence probability, and adjusting the data collection cycle.

The management server 100 used in the management system 10 is applicable to management institutions such as banks, insurance companies, and securities companies. The management server 100 may include systems that store and analyze data, such as a management work system and an information management system that stores acquired information, compiles statistics on it, and analyzes it.

For example, suppose a bank applies the management server 100 and the management server 100 is collecting data on a deposit and withdrawal program. If system linkage fails during data collection, the management server 100 determines that the probability of a linkage item failure has increased. As a result, the management server 100 intensively extracts data on the causes of the linkage item failure, and the data collection cycle is shortened.

There is at least one sub-server 210 to 240, and each communicates with the management server 100 and exchanges data so that the management server 100 can collect data. A sub-server may be an external client, and external clients may be monitored hosts. For example, the status of management tasks, customer information, Internet banking, security, and services may be held on the sub-servers 210 to 240.

The sub-servers 210 to 240 may be, for example, a Unix server, a Windows server, Oracle, Web, WAS, M/D, or Application SAP. However, the technical idea of the present application is not limited thereto, and the number and types of the sub-servers may vary.

The data needed for maintenance and management of the sub-servers 210 to 240 may be data on system resources, such as CPU usage, memory occupancy, and automatic disk collection. As another example, the data may be information on the system hardware, such as motherboard status, CPU temperature, and device driver information. However, the technical idea of the present application is not limited thereto, and the information needed for maintenance and management of the sub-servers 210 to 240 may vary beyond the above examples.

In the management system 10, the management server 100 and at least one sub-server 210 to 240 may communicate with each other and exchange data.

The management server 100 according to an embodiment of the present application may collect data from the sub-servers 210 to 240 according to a preset cycle and calculate the failure occurrence probability based on the collected data.

Data modeled by the management server 100 may be transmitted to the modeling storage unit 140. The modeling storage unit 140 according to an embodiment of the present application may enable different model results extracted from several models to be used on one platform.

For example, if the management server 100 has predicted and modeled various types of failure results, the modeling storage unit may provide a unified interface and make the various types of results usable on one platform, so that the optimal model can be selected.

In particular, the management server 100 according to an embodiment of the present application can intensively extract data associated with a high failure probability by adjusting the data collection cycle based on the failure occurrence probability. When the management server 100 is applied to the management system 10 in this way, system cost and training time are minimized compared with training on large volumes of data, and efficiency is improved because only the necessary data is extracted periodically and collection items can easily be added or removed.

FIG. 2 is a diagram showing an example of the management system 10 of FIG. 1 according to an embodiment of the present application.

Referring to FIG. 2, the management server 100 may collect data according to a preset cycle and calculate the failure occurrence probability using the collected data. The management server 100 may reset the cycle based on the calculated failure occurrence probability and intensively extract data associated with a high failure probability. The management server 100 may include a data collection unit 110, a failure probability prediction unit 120, and a data collection cycle adjusting unit 130.

The data collection unit 110 may automatically discover installed software and hardware. For example, it may automatically discover hardware such as a mainframe with security and encryption functions, or an open Unix system whose structure readily accommodates new communication and Internet technologies. It may also automatically discover software such as a packaged framework, that is, a package that provides a standard application development and operating environment.

The data collection unit 110 may also communicate with at least one sub-server 210 to 240. For example, the data collection unit 110 may communicate with the sub-servers 210 to 240 over an interface such as the Internet, Bluetooth, an intranet, or Wi-Fi.

The data collection unit 110 may also collect data according to a preset cycle. For example, if the sub-servers 210 to 240 handle management tasks, customer information, Internet banking, and security, the management server 100 collects data from the sub-servers 210 to 240 at every preset cycle. If the cycle is one minute, the status of management tasks, customer information, Internet banking, and security is transmitted to the management server 100 every minute.
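
The periodic collection described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the application: the names `collect_once` and `collect_periodically` are hypothetical, each sub-server is assumed to be representable as a callable that returns its current status, and the wait function is injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable, Dict, List

# Hypothetical stand-in: a sub-server is modeled as a callable returning its status.
SubServer = Callable[[], dict]


def collect_once(sub_servers: Dict[str, SubServer]) -> Dict[str, dict]:
    """Poll every sub-server once and return the statuses keyed by name."""
    return {name: poll() for name, poll in sub_servers.items()}


def collect_periodically(
    sub_servers: Dict[str, SubServer],
    period_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, dict]]:
    """Collect `rounds` snapshots, waiting `period_s` seconds between them."""
    snapshots = []
    for i in range(rounds):
        snapshots.append(collect_once(sub_servers))
        if i < rounds - 1:  # no wait is needed after the final snapshot
            sleep(period_s)
    return snapshots
```

With `period_s=60.0`, this corresponds to the one-minute cycle in the example above; a real collector would run until stopped rather than for a fixed number of rounds.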

The data collection unit 110 may collect failure probability data and failure release data according to the data collection items. Failure probability data is data on a failure that has a high probability of occurring.

For example, if the probability of a linkage item failure is high, data on the causes of the linkage item failure is collected. Failure release data is the data needed once a failure has been resolved and the system is restored to its original state. For example, when a linkage item failure occurs and is then recovered, collection returns to the data items that were originally being collected.

The failure probability prediction unit 120 may predict the failure probability based on the collected data. For example, if there is a large amount of data on a linkage item, the probability of a failure in that linkage item may be high. Likewise, if the data on a linkage item looks different from before, the probability of a failure may be high.
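
The idea that data which "looks different from before" signals a higher failure probability can be illustrated with a simple deviation score. Note that this is a plain statistical stand-in (a z-score against recent history), not the deep learning model the application describes; the function name `deviation_score` and the choice of statistic are assumptions made purely for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def deviation_score(history: Sequence[float], current: float) -> float:
    """Return how many standard deviations `current` lies from the history mean.

    `history` must be non-empty. A larger score means the new value looks
    less like what was observed before.
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is maximally unusual.
        return 0.0 if current == mu else float("inf")
    return abs(current - mu) / sigma
```

A score like this could serve as one input feature to a learned predictor; how the score maps to a probability is left open here.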

The failure probability prediction result may be modeled through deep learning. For example, if a failure has several possible causes, they may be modeled using deep learning in descending order of the probability that each is the cause. The modeling criterion is not limited to this ordering, and various criteria may be applied.

The weights applied when modeling the failure probability prediction result may differ according to the failure classification or the collection item. For example, if a linkage item failure and an internal failure occur at the same time, the weights may be set so that the failure representing the more severe error is resolved first. A larger weight may be given to the more severe error, but other criteria may also be applied.
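
As a rough sketch of differential weighting, the example below multiplies each predicted failure probability by a per-class weight and orders the failures by the result. The weight values and the names `SEVERITY_WEIGHTS` and `prioritize` are hypothetical; the application gives no numeric weights and allows other criteria.

```python
from typing import Dict, List, Tuple

# Hypothetical weights per failure class; the source specifies no values.
SEVERITY_WEIGHTS: Dict[str, float] = {"linkage": 1.0, "internal": 2.0}


def prioritize(
    probabilities: Dict[str, float],
    weights: Dict[str, float] = SEVERITY_WEIGHTS,
) -> List[Tuple[str, float]]:
    """Rank failures by probability times class weight, highest first.

    Failure classes without a configured weight default to a weight of 1.0.
    """
    scored = [(name, p * weights.get(name, 1.0)) for name, p in probabilities.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under these assumed weights, an internal failure with probability 0.4 is ranked ahead of a linkage item failure with probability 0.6, which mirrors resolving the more severe error first.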

The data collection cycle adjusting unit 130 may adjust the data collection cycle according to the failure probability. For example, when the failure occurrence probability is high, the data collection cycle is shortened so that more failure-related data is collected. Conversely, when the failure occurrence probability is low, the data collection cycle is lengthened so that only the necessary data is collected.
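
One minimal way to express this rule is a threshold function that maps a failure probability to a collection interval. The thresholds (0.7 and 0.2), the interval lengths, and the name `adjust_period` are assumptions for illustration; the application does not state specific values or a specific mapping.

```python
def adjust_period(
    probability: float,
    base_period_s: float = 60.0,
    min_period_s: float = 5.0,
    max_period_s: float = 300.0,
    high: float = 0.7,
    low: float = 0.2,
) -> float:
    """Return the collection interval in seconds for a failure probability.

    High probability -> shortest interval (collect more failure-related data).
    Low probability  -> longest interval (collect only what is necessary).
    Anything in between keeps the base interval.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    if probability >= high:
        return min_period_s
    if probability <= low:
        return max_period_s
    return base_period_s
```

A continuous mapping (for example, interpolating between the minimum and maximum interval) would satisfy the same description; the step function is used here only because it is the simplest to read.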

The modeling storage unit 140 loads and stores the models produced by the failure probability prediction unit and selects the optimal model. Because each model is stored together with its storage format and the Standard Object Description XML that composes it, the models can be operated on a single platform. Models that previously existed as physically separate entities can thus be operated as a single resource.

For example, the modeling storage unit 140 stores information on a storage medium, organized by the model generated through deep learning, its data, and its meta information. The stored information can be loaded and used over TCP or IP communication.
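
The following sketch shows one way a store could keep several models behind a single interface, emit an XML description of each, and pick the best one. It is an in-memory illustration under several assumptions: the classes `StoredModel` and `ModelStore`, the XML layout, and the use of mean absolute error as the selection criterion are all hypothetical, and the actual Standard Object Description XML schema is not given in the source.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence


@dataclass
class StoredModel:
    name: str
    predict: Callable[[Sequence[float]], float]  # returns a failure probability
    meta: Dict[str, str] = field(default_factory=dict)


class ModelStore:
    """Keeps models from different sources behind one common interface."""

    def __init__(self) -> None:
        self._models: Dict[str, StoredModel] = {}

    def save(self, model: StoredModel) -> None:
        self._models[model.name] = model

    def describe(self, name: str) -> str:
        """Return an XML description of a stored model (assumed layout)."""
        model = self._models[name]
        root = ET.Element("model", {"name": model.name})
        for key, value in sorted(model.meta.items()):
            ET.SubElement(root, "meta", {"key": key}).text = value
        return ET.tostring(root, encoding="unicode")

    def select_best(
        self, samples: Sequence[Sequence[float]], labels: Sequence[float]
    ) -> StoredModel:
        """Pick the model with the lowest mean absolute error on the samples.

        `samples` must be non-empty and the store must hold at least one model.
        """
        def mean_abs_error(model: StoredModel) -> float:
            errors = [abs(model.predict(x) - y) for x, y in zip(samples, labels)]
            return sum(errors) / len(errors)

        return min(self._models.values(), key=mean_abs_error)
```

Because every model is reached through the same `predict` callable, models trained with different tools can be compared and swapped without changing the caller.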

FIG. 3 is a diagram showing the configuration of the data collection unit 110, which is one of the components of the management server 100 of the present application.

Referring to FIG. 3, depending on whether a failure has occurred, the data collection unit 110 collects data on the failure when one has occurred, and collects the data that was being collected before the failure once the failure has been recovered.

The data collection unit 110 may include a data collection item setting module 111 and a data collection module 112. The data collection item setting module 111 may in turn include a failure probability information collection module 111_1 and a failure release information collection module 111_2.

The data collection item setting module 111 may set the data collection items according to the failure probability information. When the failure probability prediction unit 120 determines that the failure probability has increased, only the data related to that failure may be collected intensively. When the failure has been resolved, collection may return to the original collection items.

The failure probability information collection module 111_1 may be adjusted according to the failure probability from the failure probability prediction unit 120. When the probability of a specific failure has increased, the failure probability information collection module 111_1 may intensively extract only the data related to that failure. For example, if the probability of a linkage item failure has increased, the module may give the data collection module 112 data instructing it to intensively extract the data that may be causing the linkage item failure.

The failure release information collection module 111_2 handles the return to the original data collection items after a failure has been resolved. For example, if the management status and customer information status were being collected, then when a failure occurs and is recovered, the module may give the data collection module 112 data instructing it to return to the originally collected items, such as the management status and customer information status.

Based on the data collection item setting module 111, when the data collection module 112 receives a signal from the failure probability information collection module 111_1 to collect information on a specific failure, it collects data on that failure.

For example, if the probability of a linkage item failure has increased, the data that may be causing the linkage item failure can be extracted intensively.

Conversely, when the data collection module 112 receives data from the failure release information collection module 111_2, the failure has been resolved, so the module returns to the data collection items used before the failure occurred.

For example, if the management status and customer information status were being collected, then when a failure occurs and is recovered, data collection can return to the originally collected items, such as the management status and customer information status.

FIG. 4 is a diagram showing an example of the data collection item setting module 111 of the present application.

Referring to FIG. 4, the data collection item setting module 111 adjusts the data collection items. When the failure probability prediction unit 120 determines that the probability of a specific failure has increased, the data collection item setting module 111 enables data related to that failure to be collected intensively.

When a failure occurs, the data collection item setting module 111 may identify what kind of failure it is and cause the data on its causes to be collected so that the failure can be resolved.

For example, if the failure probability prediction unit determines that the probability of a WAS service shutdown, one type of failure, has increased, it may be judged that the failure arose because a WAS service delay occurred or the heap memory size grew excessively.

To resolve the failure, the data collection item setting module 111 causes data to be collected on why the WAS service delay and the excessive increase in heap memory size occur. Analyzing this data makes it possible to identify why the failure occurred.

In this way, the data collection item setting module 111 can determine which data should be collected based on the result from the failure probability prediction unit 120. The data to be collected is then collected by the data collection module 112.
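
The switching between focused collection and the original collection items can be sketched as a small state holder. The class name `CollectionItemSetting`, its method names, and the item identifiers are hypothetical; only the WAS service shutdown example and the management and customer information status items come from the description above.

```python
from typing import Dict, Iterable, List


class CollectionItemSetting:
    """Tracks which items the data collection module should currently gather."""

    def __init__(
        self, default_items: Iterable[str], focus_items: Dict[str, List[str]]
    ) -> None:
        self._default = list(default_items)
        self._focus = {name: list(items) for name, items in focus_items.items()}
        self.active: List[str] = list(self._default)

    def on_failure_probability_high(self, failure_type: str) -> None:
        """Switch to the cause-related items for a failure that became likely."""
        # Unknown failure types leave the current items unchanged.
        if failure_type in self._focus:
            self.active = list(self._focus[failure_type])

    def on_failure_released(self) -> None:
        """Return to the items collected before the failure."""
        self.active = list(self._default)


setting = CollectionItemSetting(
    default_items=["management_status", "customer_info_status"],
    focus_items={"was_service_shutdown": ["was_service_delay", "heap_memory_size"]},
)
setting.on_failure_probability_high("was_service_shutdown")
focused = list(setting.active)   # cause-related items for the WAS failure
setting.on_failure_released()
restored = list(setting.active)  # back to the original items
```

Here the two callbacks play the roles of the failure probability information collection module 111_1 and the failure release information collection module 111_2, respectively.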

FIG. 5 is a diagram showing the types of failure that can be predicted by the failure probability prediction unit 120 of the present application.

Referring to FIG. 5, the failure types are data required by the failure probability prediction unit 120. There are many types of failure, and data related to a failure is needed to predict and resolve it. Excluding uncontrollable disasters, there are failures that, from the standpoint of their cause, directly affect the system.

Examples include degradation, errors, and breakdowns of an information system caused by controllable factors such as human failures, system failures, and infrastructure failures.

A failure is an event that occurs during the operation of an information system and affects the information system, however slightly. Such failures can be predicted from the elements that underlie an information system, such as environment, service, configuration, resources, linkage, and performance. By applying predictable indicators, failure prediction and resolution can be automated.

한편, 도 5에 도시된 장애의 종류는 예시적인 것이며, 본 출원의 기술적 사상은 이에 한정되지 않음이 이해될 것이다. 예를 들어, 본 발명의 다른 일 실시예에 따르면, 상기 장애는 장애가 발생한 부품의 종류에 따라 어댑턴 관련 장애, CPU 관련 장애, 디스크 관련 장애, 전원 관련 장애, FAN 관련 장애, 플랫폼 펌웨어 관련 장애로 구분 될 수 있다.Meanwhile, it will be understood that the types of disorders illustrated in FIG. 5 are exemplary, and the technical idea of the present application is not limited thereto. For example, according to another embodiment of the present invention, the failure may be an adapter-related failure, a CPU-related failure, a disk-related failure, a power-related failure, a FAN-related failure, or a platform firmware-related failure depending on the type of the failed component. Can be distinguished.

FIG. 6 is a diagram illustrating a data collection unit 110 according to another embodiment of the present application.

Referring to FIG. 6, the data collection unit 110 may communicate with at least one sub-server 210 to 240 and may collect data at a preset period. The data collection unit 110 may collect failure probability data and failure clearance data according to the data collection items. In addition, the failure probability can be calculated by applying weights according to the failure classification and by deep learning.

The data collection unit 110 may include a data collection item setting module 111, a data collection module 112, a standard matrix configuration module 113, and a normalization layer module 114.

The data collection item setting module 111 is the same as the data collection item setting module 111 shown in FIG. 3.

The data collection module 112 is the same as the data collection module 112 shown in FIG. 3. Accordingly, identical or similar components are described below using identical or similar reference numerals, and redundant descriptions are omitted for clarity and brevity.

The standard matrix configuration module 113 may store previously collected data as a matrix. When a failure is resolved and collection returns to the initially configured data items, data for items other than the intensively extracted ones must also be available. The standard matrix configuration module 113 can thereby prevent data dropout.

For example, if a failure has been resolved but no data from before the failure exists, data may continue to be collected on the basis of the items that were intensively extracted to resolve that failure. In that case, a new failure might go unrecognized when it occurs. For this reason, the data collected before the failure must be stored as a matrix, and collection must return to those previously collected data items once the failure is resolved.
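The save-and-restore behavior described above can be illustrated with a minimal Python sketch. The class name `CollectionState`, its methods, and the item names used in the usage note are hypothetical and are not taken from the specification; the sketch only assumes that the baseline collection items and their samples are kept as rows of a matrix while a narrower item set is temporarily active.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CollectionState:
    """Hypothetical stand-in for the state kept by the standard matrix configuration module."""

    baseline_items: list[str]
    active_items: list[str] = field(default_factory=list)
    # One row per baseline sample, one column per baseline item.
    standard_matrix: list[list[float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Collection starts with the initially configured items.
        self.active_items = list(self.baseline_items)

    def record(self, sample: dict[str, float]) -> None:
        # Only full baseline samples are appended to the standard matrix.
        if self.active_items == self.baseline_items:
            self.standard_matrix.append([sample[name] for name in self.baseline_items])

    def focus_on(self, items: set[str]) -> None:
        # Intensive collection: keep only the items related to the suspected failure.
        self.active_items = [name for name in self.baseline_items if name in items]

    def restore(self) -> None:
        # Failure cleared: return to the initially configured items.
        self.active_items = list(self.baseline_items)
```

For instance, a state created with the items `cpu`, `heap`, and `latency` can be narrowed to `heap` with `focus_on({"heap"})` and later returned to all three items with `restore()`, while the rows recorded before the failure remain in `standard_matrix`.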

The normalization layer module 114 may apply different weights according to the failure classification or the collection item so that the failure with the higher weight is resolved first.

For example, when weights are applied according to failure classification and a linkage-item failure has a greater weight than an internal failure, the normalization layer module 114 may cause the linkage-item failure to be resolved first.

FIG. 7 is a diagram illustrating the operation of the standard matrix configuration module 113 and the normalization layer module 114 of the data collection unit 110.

Referring to FIG. 7, the standard matrix configuration module 113 may manage data in matrix form to prevent data dropout. The standard matrix is stored separately in memory.

For example, when a failure is cleared and collection returns to the initially configured data items, the previously collected data must also include data items other than the intensively extracted ones so that the modeling can be updated more accurately. More accurate modeling improves the accuracy of failure probability prediction.

The normalization layer module 114 may apply different weights according to the failure classification or the collection item.

For example, when failure A and failure B have the same predicted probability of occurrence, the failure with the larger weight may be resolved first.
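The tie-breaking rule in this example can be sketched in a few lines of Python. The function name `resolution_order` and the dictionary layout are assumptions made for illustration; the specification does not state how probabilities and weights are combined beyond the equal-probability case, so the sketch simply orders by probability first and uses the weight only to break ties.

```python
def resolution_order(probabilities: dict, weights: dict) -> list:
    """Return failure names ordered by probability, breaking ties by weight.

    Both arguments map a failure name to a number; a missing weight counts as 0.
    """
    return sorted(
        probabilities,
        key=lambda name: (probabilities[name], weights.get(name, 0.0)),
        reverse=True,
    )
```

With equal probabilities for a linkage failure and an internal failure, the one with the larger weight appears first in the returned list.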

FIG. 8 is a diagram showing the data collection module 112 to which the automation function is applied.

Referring to FIG. 8, when the probability of a failure increases, the collection period of the corresponding facts is adjusted so that failure-related information is collected in real time. The collection period is adjusted automatically in proportion to the expected value derived from the sample. Automatic data merging can be used to predict the failure probability efficiently.
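One simple way to picture the period adjustment is the two-period scheme recited in the claims, in which a high failure probability selects a shorter first period and a low probability selects a longer second period. The following Python sketch assumes that scheme; the threshold of 0.5 and the period values of 1 and 60 seconds are hypothetical and do not appear in the specification.

```python
def collection_period(
    failure_probability: float,
    threshold: float = 0.5,        # assumed cut-off between "high" and "low"
    first_period_s: float = 1.0,   # assumed short period for high probability
    second_period_s: float = 60.0, # assumed long period for low probability
) -> float:
    """Pick the data collection period, in seconds, from the failure probability."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    if failure_probability >= threshold:
        return first_period_s
    return second_period_s
```

A continuous mapping from probability to period would also fit the description; the step function is chosen here only because it is the simplest form that keeps the first period shorter than the second.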

In addition, with the data collected to prevent data dropout and the data changed by resetting the period, the main data is collected intensively so that the system can be operated efficiently.

On the other hand, the probability produced by the hidden model is not used, because the hidden layer is processed into perceptrons of the facts. Here, a perceptron is a kind of learning machine that models the learning function of the brain.

FIG. 9 is a diagram illustrating the operation of the modeling storage unit 140.

Referring to FIG. 9, the modeling storage unit 140 includes a modeling storage medium 141 and a dynamic model loader 142.

The modeling produced through deep learning by the failure probability prediction unit 120 may take a different form for each environment. The modeling storage unit 140 allows the modeling to evolve or to be changed in various ways, so that its accuracy and effectiveness can be ensured.

The modeling storage unit 140 provides a unified interface even when it is not known in advance what kind of object will be produced or through which class and methods it will be accessed. Because models can be loaded dynamically at runtime, different model results extracted from several modelings can be used on one platform, and an optimal modeling can be selected. Accordingly, the modeling storage unit 140 may load and store the several modelings generated by the management server 100 and select the optimal one among them.

Specifically, the modeling storage unit 140 may store each modeling together with its storage scheme and the Standard Object Description XML that composes the modeling, so that the modelings can be operated on a single platform. With the modeling storage unit 140, several physically separate modelings can be operated as a single resource, and the resources allocated for them can be used efficiently.

Here, the modeling storage medium 141 consists of POJOs (Plain Old Java Objects) based on Java Reflection. The dynamic model loader 142 of the modeling storage unit 140 may load a modeling from the modeling storage medium 141. If loading fails, the dynamic model loader 142 creates the object for the first time, and the created object may be stored in the storage medium on the basis of the model generated through deep learning, its data, and its meta information. The dynamic model loader 142 may load a modeling from the modeling storage medium 141 over TCP or IP communication on the basis of the meta information.
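The load-or-create behavior of the dynamic model loader can be sketched as follows. The specification describes Java Reflection and POJOs; this sketch swaps in Python's `importlib` and `getattr` as a rough analog, uses an in-memory dictionary in place of the modeling storage medium 141, and omits the network transport. The class name `DynamicModelLoader` and the meta-information keys `module`, `class`, and `name` are hypothetical.

```python
import importlib


class DynamicModelLoader:
    """Load a model object by its meta information, creating it on the first miss."""

    def __init__(self, storage: dict):
        # The dictionary stands in for the modeling storage medium.
        self.storage = storage

    def load(self, meta: dict):
        key = (meta["module"], meta["class"], meta["name"])
        try:
            return self.storage[key]
        except KeyError:
            # Loading failed: resolve the class by name at runtime and create the object.
            model_class = getattr(importlib.import_module(meta["module"]), meta["class"])
            model = model_class()
            # Store the new object under its meta information for later loads.
            self.storage[key] = model
            return model
```

Because the class is resolved from strings at runtime, the caller does not need to know the concrete model type in advance, which is the property the unified interface relies on.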

The process of the modeling storage unit 140 differs from methods that obtain information by loading unstructured data. Because the content is modeled as objects, information such as an index or a unique key is not required, and the modeling object and the weighted factors can be used immediately upon loading.

FIG. 10 is a diagram illustrating the process of the modeling storage unit 140.

Referring to FIG. 10, the modeling storage unit 140 follows its process according to whether the modeling has changed.

The modeling storage unit 140 checks its own validity, loads the data that can vary (the environment variables), and then determines whether the environment is valid. If the environment is valid for operation of the modeling storage unit 140, the predictive modeling medium is loaded and the data is bound.

If the predictive modeling differs from the modeling indicated by the data, the modeling storage unit 140 changes the modeling and stores the changed modeling. If the two are the same, data is received in real time and bound to the modeling, which then goes through the deep learning process.
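The branching just described can be summarized in a short Python sketch. The function name `run_storage_process`, the `valid` flag in the environment dictionary, the returned status strings, and the `train` callback are all assumptions introduced for illustration; the specification does not define these interfaces.

```python
def run_storage_process(env: dict, stored_model, data_model, samples, train):
    """Follow the storage-unit process and return (model, status).

    env          -- loaded environment variables; env["valid"] is an assumed flag
    stored_model -- the predictive modeling loaded from the storage medium
    data_model   -- the modeling indicated by the current data
    samples      -- data received in real time
    train        -- callback that binds one sample to the model and trains on it
    """
    # Validity check after loading the environment variables.
    if not env.get("valid", False):
        return stored_model, "invalid-environment"

    # The modeling differs from what the data indicates: change it and store it.
    if stored_model != data_model:
        return data_model, "model-replaced"

    # Same modeling: bind each real-time sample and run the learning step.
    for sample in samples:
        train(stored_model, sample)
    return stored_model, "trained"
```

Returning the model together with a status keeps the sketch free of side effects other than the training callback, so each branch can be checked in isolation.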

FIG. 11 is a diagram of the data collection unit 110 including a module 115 that searches for and classifies the server environment.

Referring to FIG. 11, the server environment search/classification module 115 may detect installed systems and classify the detected systems by category. The server environment search/classification module 115 consists of a hardware search/classification module and a software search/classification module and can detect software and hardware automatically.

Examples of hardware include a mainframe with security and encryption functions and an open Unix system whose structure readily accommodates new communication and Internet technologies. An example of software is a packaged framework, that is, a package that provides a standard application development and operating environment.

The server environment search/classification module 115 classifies the systems by category, and the data collection item setting module 111 automatically sets the data collection items suited to the server environment.

FIG. 12 is a flowchart illustrating a method of applying automatic collection items according to the server environment.

Referring to FIG. 12, the data collection module 112 with the automation function automatically collects the factors directly relevant to handling a failure in a manner suited to the server environment. Each system category has its own characteristic information; by standardizing the categories and automatically linking the indicators that match each standardized category, the data needed for failure prediction can be collected appropriately.

The data collection module 112 with the automation function first automatically detects the installed hardware, gathers its information, and classifies it by category. Examples of hardware include a mainframe with security and encryption functions and an open Unix system whose structure readily accommodates new communication and Internet technologies.

The data collection module 112 with the automation function then detects and gathers the status of the installed software. An example of software is a packaged framework, that is, a package that provides a standard application development and operating environment.

Based on the hardware and software information, the data collection module 112 with the automation function may identify the server environment and automatically set the data to be collected. It then configures the standard matrix information in which the data to be collected will be stored, and collects the data.
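The sequence above (classify hardware and software, derive the collection items, prepare the standard matrix) can be sketched as a lookup from category to indicators. The category names, the indicator names in `CATEGORY_ITEMS`, and both function names are hypothetical examples; the specification does not list the actual indicators linked to each category.

```python
# Assumed mapping from a standardized category to the indicators collected for it.
CATEGORY_ITEMS = {
    "mainframe": ["cpu_usage", "channel_io"],
    "unix": ["cpu_usage", "disk_io"],
    "framework": ["heap_memory_size", "service_delay"],
}


def collection_items(hardware: list, software: list) -> list:
    """Merge the indicators of every detected category, keeping first-seen order."""
    items = []
    for category in (*hardware, *software):
        for item in CATEGORY_ITEMS.get(category, []):
            if item not in items:
                items.append(item)
    return items


def empty_standard_matrix(items: list) -> dict:
    """Prepare an empty standard matrix whose columns are the collection items."""
    return {"columns": list(items), "rows": []}
```

Categories that are not present in the mapping contribute no items, so an unrecognized system does not interrupt the setup; whether that is the desired behavior is a design choice left open by the description.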

The embodiments of the present specification have been described above with reference to the accompanying drawings, but those of ordinary skill in the art to which this specification pertains will understand that the present application may be implemented in other specific forms without changing its technical idea or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

10: management system
100: management server
110: data collection unit
111: data collection item setting module
111_1: failure probability information collection module
111_2: failure clearance information collection module
113: standard matrix configuration module
114: normalization layer module
115: server environment search/classification module
115_1: hardware search/classification module
115_2: software search/classification module
120: failure probability prediction unit
130: data collection period adjusting unit
140: modeling storage unit
141: modeling storage medium
142: dynamic model loader
210: sub-server
220: sub-server
230: sub-server
240: sub-server
S11: validity check of the modeling storage unit and loading of environment variables
S12: loading of the predictive modeling medium
S13: modeling loading and data binding
S14: creation of the modeling output object
S15: storage of the modeling output object
S16: real-time data reception
S17: data binding
S18: deep learning
S21: automatic hardware search and classification
S22: automatic software search and classification
S23: automatic data collection
S24: storing data as a standard matrix
S25: data collection

Claims

A management system comprising:
a data collection unit that communicates with at least one external client and collects data from the at least one external client at a preset period;
a failure probability prediction unit that calculates a failure occurrence probability based on the data collected by the data collection unit;
a data collection period adjusting unit that adjusts the data collection period of the data collection unit based on the failure occurrence probability calculated by the failure probability prediction unit; and
a modeling storage unit that stores the modeling of the failure probability prediction unit and selects at least one artificial intelligence model from among a plurality of modelings.

The management system of claim 1,
wherein the modeling storage unit provides a unified interface
and selects at least one modeling from among the plurality of modelings.

The management system of claim 2,
wherein the plurality of modelings is generated through deep learning,
the plurality of modelings may take different forms to suit each environment,
and the plurality of modelings can be changed through the modeling storage unit.

The management system of claim 1,
wherein the data collection period adjusting unit adjusts the data collection period of the data collection unit based on the failure occurrence probability calculated by the failure probability prediction unit,
when the failure occurrence probability is high, the data collection period adjusting unit sets the data collection period to a first data collection period,
when the failure occurrence probability is low, the data collection period adjusting unit sets the data collection period to a second data collection period,
and the first data collection period is shorter than the second data collection period.

The management system of claim 1,
wherein the failure probability prediction unit, when predetermined data among the collected data exceeds a reference value, determines as the failure prediction result that the probability of a failure is high with respect to the predetermined data.

The management system of claim 5,
wherein the failure prediction result is modeled through artificial-intelligence deep learning, and the weights applied to the modeling are variable.

The management system of claim 5,
wherein the failure is a phenomenon that deviates from the normal operating state,
and the failure includes controllable failures and uncontrollable failures.

The management system of claim 1,
wherein the data collection unit comprises a data collection item setting module that sets or clears data collection items;
a data collection module that collects the data to be collected as determined by the data collection item setting module;
a standard matrix configuration module that stores previous data as a matrix to prevent data dropout; and
a normalization layer module that assigns weights to the failures that occur.

The management system of claim 8,
wherein the data collection item setting module can adjust the data collection items according to failure probability information and comprises:
a failure probability information collection module that, when the probability of a failure increases, intensively collects only the data related to the high-probability failure; and
a failure clearance information collection module that reverts to the original collection items when the failure has been resolved and recovered.

The management system of claim 8,
wherein the data collection module recognizes, based on the failure probability information collection module, that a failure has occurred and collects data on the failure,
and recognizes, based on the failure clearance information collection module, that the failure has been resolved and returns to the data collection items used before the failure occurred,
the standard matrix configuration module manages data in matrix form to prevent data dropout, and the standard matrix can be stored separately in memory,
and the normalization layer module applies different weights based on at least one of a failure classification and a collection item.