KR20240000934A

KR20240000934A - IT infrastructure failure pre-processing system by machine learning algorithm

Info

Publication number: KR20240000934A
Application number: KR1020220077746A
Authority: KR
Inventors: 임승환
Original assignee: (주)제스아이앤씨
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2024-01-03

Abstract

An IT infrastructure failure pre-processing system using a machine learning algorithm includes: a microagent that extracts raw data from each IT infrastructure and source code development system, recognizes failure-causing factors using a recognition model, extracts the source code corresponding to the raw data and the failure-causing factors, and labels it per extracted source code unit; a data collection unit that receives the labeled source code from the microagent and outputs, through an analysis model, first failure data, which is source code likely to cause a failure; a failure prediction unit that inputs the first failure data into a failure prediction model and outputs second failure data, which classifies the first failure data into non-failure elements and failure elements, to predict failures in the IT infrastructure; a failure prevention unit that, upon receiving the second failure data classified as failure elements, processes the failure-causing factors according to a failure response model before an IT infrastructure failure occurs; a visualization unit that visualizes and provides the process by which the first and second failure data are output, the basis for the results of that process, and the results of processing the failure-causing factors in the failure prevention unit; and a machine learning unit that generates or updates the recognition model, analysis model, failure prediction model, and failure response model according to a reinforcement learning algorithm.

Description

IT infrastructure failure pre-processing system by machine learning algorithm

The present invention relates to an IT infrastructure failure pre-processing system using a machine learning algorithm, which processes factors that cause IT infrastructure failures, detected in advance by the machine learning algorithm, before an IT infrastructure failure occurs.

IT infrastructure refers to the systems and structures that form the basis of IT services, including networks, servers, databases, information security, system software, and facilities.

With the Fourth Industrial Revolution and the rapid increase in demand for IT services, significant risk factors are latent in IT service infrastructure throughout society. In industry, too, the frequency of failures is rising with the surge in demand for IT infrastructure, and the resulting losses are growing sharply. Since 2017, the government has strongly recommended identifying and acting on the improvements needed to prevent and respond to information system failures effectively. Despite this market environment and demand, the development of failure prediction and prevention technology in the IT field has made little progress, and the application of AI technology to this field is even more remote.

In addition, the shortage of specialized and experienced technical personnel in IT failure management is worsening, while failures are increasing explosively with the growth of web/mobile services. Demand will grow rapidly, now and in the future, for technology that analyzes the basis and process of AI predictions to guard against wrong judgments caused by AI errors, and for automated failure prevention technology in which AI detects source code errors and hard-to-identify failure causes in advance at the development stage and prevents failures in a short time.

By contrast, introducing the server-based IT infrastructure failure prediction systems available so far at home and abroad still involves difficult problems, such as costs that are high relative to the expected benefit and solution configurations that are hard to optimize for the expected demand. Small and medium-sized web/mobile service companies, including almost all IT startups, have great difficulty operating and managing server systems because of their size. A single server failure frequently leads to a drop in brand value or a loss of trust. For lack of specialized personnel, a server failure cannot be recovered from immediately, which leads to member attrition, transaction and payment errors, and critical damage to sales, in some cases to the point of closing the business. The problem is very serious, yet realistic countermeasures are lacking.

With server system failures, there are many cases in which, from the company's standpoint, the actual cause of the failure is not known, or must not become known, to the outside. Financial institutions and other companies running various revenue businesses through web/mobile services often postpone system patches because they cannot interrupt service. As a result, even while aware of the risk, they remain exposed to external attacks and failures until they stop the system and apply the patch. In such cases, reporting or disclosing the cause of the failure would expose a critical weakness in the company's technical capability, so the cause is not disclosed, and the company inevitably remains exposed to repeated failures. In addition, the reality of the IT industry is that most small and medium-sized companies cannot apply patches or perform system checks for lack of specialized personnel.

A company's complex IT infrastructure environment is another typical reason that patching and updating are difficult. Databases are in most cases connected to middleware or applications, and patch and upgrade policies can range from simple recommendations to version upgrades. In particular, when a problem is found and a version upgrade is needed, the patch very often has to be postponed because of dependencies among the DB, OS, applications, solutions, and other infrastructure. This is because it is not easy to change middleware that is technically tightly coupled to other applications in order to upgrade the DB or OS.

Accordingly, the present applicant has proposed, in Korean Registered Patent No. 10-2295868, a network failure prediction system that uses a learning model based on machine learning to predict failures caused by source code errors at the system development stage.

Korean Registered Patent No. 10-2295868
Korean Registered Patent No. 10-2391510

The purpose of the present invention is to provide an IT infrastructure failure pre-processing system using a machine learning algorithm that automatically processes IT infrastructure failure-causing factors, recognized using a learning model based on the machine learning algorithm, before an IT infrastructure failure occurs.

The IT infrastructure failure pre-processing system using a machine learning algorithm according to the present invention includes: a microagent that extracts raw data from each IT infrastructure and source code development system, recognizes failure-causing factors using a recognition model, extracts the source code corresponding to the raw data and the failure-causing factors, and labels it per extracted source code unit; a data collection unit that receives the labeled source code from the microagent and outputs, through an analysis model, first failure data, which is source code likely to cause a failure; a failure prediction unit that inputs the first failure data into a failure prediction model and outputs second failure data, which classifies the first failure data into non-failure elements and failure elements, to predict failures in the IT infrastructure; a failure prevention unit that, upon receiving the second failure data classified as failure elements, processes the failure-causing factors according to a failure response model before an IT infrastructure failure occurs; a visualization unit that visualizes and provides the process by which the first and second failure data are output, the basis for the results of that process, and the results of processing the failure-causing factors in the failure prevention unit; and a machine learning unit that generates or updates the recognition model, analysis model, failure prediction model, and failure response model according to a reinforcement learning algorithm. The failure prevention unit includes: a failure information management module that, based on past failure information (information on failures that actually occurred in IT infrastructure in the past), builds and manages a database of IT infrastructure failure types that can be prevented before an IT infrastructure failure occurs (hereinafter, the "first failure type"); a first judgment module that receives the second failure data and determines whether it belongs to the first failure type; a failure processing module that inputs the second failure data determined by the first judgment module to be of the first failure type into the failure response model, processes the failure-causing factors according to the failure response model, and outputs the processing result; and a result transmission module that transmits the processing result output by the failure processing module to the machine learning unit.

In addition, the microagent includes: a data collection module that extracts and collects the raw data from the IT infrastructure and the source code development system; a data filtering unit that identifies and processes sensitive information included in the raw data; a recognition module that receives the raw data from the data filtering unit, inputs it into the recognition model, and recognizes the failure-causing factors using the recognition model; a source code collection module that collects source code related to the failure-causing factors from the source code development system; a labeling module that automatically labels the source code collected by the source code collection module according to a labeling algorithm; and a data transmission module that transmits the source code labeled by the labeling module to the data collection unit.

In addition, the data filtering unit includes: a sensitive information identification module that identifies sensitive information included in the raw data; a sensitive information determination module that determines whether the sensitive information identified by the sensitive information identification module is essential sensitive information, that is, information the recognition module needs in order to recognize the failure-causing factors; and a sensitive information processing module that automatically replaces sensitive information determined to be essential sensitive information with general information and deletes sensitive information that is not essential from the raw data.

In addition, the failure prevention unit further includes: a failure prevention type management module that divides the first failure type into a 1-1 failure type, which requires administrator approval for failure processing, and a 1-2 failure type, which does not, and manages them in a database; a second judgment module that determines whether second failure data found by the first judgment module to belong to the first failure type is of the 1-1 or the 1-2 failure type; and an approval module that, when the second failure data is determined to be of the 1-1 failure type, reports the 1-1 failure type to an administrator terminal, requests approval for failure processing, and receives the approval decision from the administrator terminal. The failure processing module inputs the second failure data of the 1-1 failure type approved by the administrator terminal, and of the 1-2 failure type, into the failure response model, processes the failure-causing factors using the failure response model, and outputs the processing result.
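
The approval routing between the 1-1 and 1-2 failure types can be illustrated with a minimal Python sketch. The names `FailureType`, `SecondFailureData`, `handle`, `request_approval`, and `respond` are hypothetical and do not appear in the specification; the failure response model is stood in for by a plain callable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class FailureType(Enum):
    NEEDS_APPROVAL = "1-1"  # administrator approval required before processing
    AUTO = "1-2"            # may be processed without approval


@dataclass
class SecondFailureData:
    source_unit: str            # label of the flagged source code unit (assumed field)
    failure_type: FailureType   # result of the second judgment (assumed field)


def handle(
    data: SecondFailureData,
    request_approval: Callable[[SecondFailureData], bool],
    respond: Callable[[SecondFailureData], str],
) -> Optional[str]:
    """Route one item: ask the administrator for type 1-1, process directly for type 1-2.

    Returns the processing result, or None when approval was refused.
    """
    if data.failure_type is FailureType.NEEDS_APPROVAL:
        if not request_approval(data):
            return None
    # Stand-in for feeding the data to the failure response model.
    return respond(data)
```

In this sketch a refused approval simply yields `None`; how the actual system treats a refusal is not stated in the text above.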

In addition, the machine learning unit includes: a recognition model learning unit that generates or updates the recognition model according to a reinforcement learning algorithm; an analysis model learning unit that generates or updates the analysis model according to a reinforcement learning algorithm; a failure prediction model learning unit that generates or updates the failure prediction model according to a reinforcement learning algorithm; and a failure response model learning unit that generates or updates the failure response model according to a reinforcement learning algorithm.

In addition, the system further includes a record management unit that generates record data, organized per source code unit labeled by the microagent, and stores it in a database, and that records in the record data how the labeled source code is processed by the data collection unit, the failure prediction unit, and the failure prevention unit. The visualization unit includes: a failure prediction visualization module that traces back the contents recorded in the record data of the source code for the second failure data and visualizes the cause of the failure and the failure prediction process; and a failure prevention visualization module that traces back the contents recorded in the record data of the source code for the second failure data and visualizes the process and result of processing the failure-causing factors.
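
One way to picture the per-unit record data and its backtracking is an append-only event log keyed by the source code label. This is a sketch under assumptions: `Record`, `RecordStore`, `log`, and `backtrack` are hypothetical names, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    label: str                                                    # labeled source code unit
    events: List[Tuple[str, str]] = field(default_factory=list)  # (stage, detail) pairs


class RecordStore:
    """In-memory stand-in for the record database."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def log(self, label: str, stage: str, detail: str) -> None:
        # Create the record on first use, then append the processing step.
        self._records.setdefault(label, Record(label)).events.append((stage, detail))

    def backtrack(self, label: str) -> List[Tuple[str, str]]:
        # Most recent step first, so a viewer can walk from outcome back to cause.
        return list(reversed(self._records[label].events))
```

A visualization module could then render the list returned by `backtrack` as a timeline from the prevention result back to the first detection.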

In addition, the failure prediction unit includes: a preliminary failure element selection unit that receives the first failure data from the data collection unit and inputs it into each of at least two failure prediction models, each of which outputs second preliminary failure data classifying the first failure data into preliminary non-failure elements and preliminary failure elements; and a failure element selection unit that compares the second preliminary failure data output by each of the failure prediction models and outputs, as second failure data (source code likely to cause a failure in the IT infrastructure), the second preliminary failure data that at least one of the failure prediction models output as a preliminary failure element.
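
The selection rule described here amounts to a logical OR across the models: a unit becomes second failure data if any model marks it as a preliminary failure element. A minimal Python sketch follows; the `Predictor` type and the function name `select_failure_elements` are hypothetical, and each model is reduced to a callable that returns `True` for a preliminary failure element.

```python
from typing import Callable, Iterable, List

# A predictor returns True when it classifies a unit as a preliminary failure element.
Predictor = Callable[[str], bool]


def select_failure_elements(units: Iterable[str], models: Iterable[Predictor]) -> List[str]:
    """Keep every unit that at least one model flags (OR across models)."""
    models = list(models)
    if len(models) < 2:
        raise ValueError("at least two failure prediction models are required")
    return [u for u in units if any(m(u) for m in models)]
```

Taking the union rather than the intersection favors recall: a unit missed by one model is still caught if another model flags it, at the cost of more false positives.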

According to the IT infrastructure failure pre-processing system using a machine learning algorithm of the present invention, IT infrastructure failure-causing factors recognized on the basis of source code can be processed before an IT infrastructure failure occurs, and the basis for the machine-learning-based process of handling those factors can be provided.

In addition, when recognizing failure-causing factors in IT infrastructure, the present invention can prevent leakage of sensitive information by deleting sensitive information included in the raw data or automatically replacing it with general information.

In addition, extracting failure-causing factors through a distributed artificial intelligence model makes it possible to recognize failure-causing factors quickly from raw data, shortens the time needed to predict IT infrastructure failures, and improves the accuracy of IT infrastructure failure prediction.

FIG. 1 is a conceptual diagram of an IT infrastructure failure pre-processing system using a machine learning algorithm according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram of the microagent of FIG. 1.
FIG. 3 is a table showing raw data collected by the microagent of FIG. 1.
FIG. 4 is a conceptual diagram of the data filtering unit of FIG. 2.
FIG. 5 is a conceptual diagram of the failure prediction unit of FIG. 1.
FIG. 6 is a conceptual diagram of the failure prevention unit of FIG. 1.
FIG. 7 is a conceptual diagram of the visualization unit of FIG. 1.
FIG. 8 is a conceptual diagram of the machine learning unit of FIG. 1.
FIG. 9 is a flowchart of a method for predicting and processing IT infrastructure failures in advance using a machine learning algorithm according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating the first step of FIG. 9 in detail.
FIG. 11 is a flowchart illustrating step 1-2 of FIG. 10 in detail.
FIG. 12 is a flowchart illustrating the third step of FIG. 9 in detail.
FIG. 13 is a flowchart illustrating the fourth step of FIG. 9 in detail.

Hereinafter, embodiments are described in detail with reference to the attached drawings. However, various changes can be made to the embodiments, so the scope of the patent application is not restricted or limited by these embodiments. All changes, equivalents, and substitutes for the embodiments should be understood to fall within the scope of rights.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, the embodiments are not limited to the specific disclosed form, and the scope of this specification includes changes, equivalents, and substitutes that fall within the technical idea.

Terms such as first and second may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is said to be "connected" to another component, it should be understood that it may be directly connected or coupled to that other component, but that other components may also exist in between.

The terms used in the embodiments are for descriptive purposes only and should not be construed as limiting. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the embodiments belong. Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related technology and, unless explicitly defined in the present application, should not be interpreted in an idealized or excessively formal sense.

In addition, in the description with reference to the accompanying drawings, identical components are given the same reference numerals regardless of the drawing, and overlapping descriptions of them are omitted. In describing the embodiments, detailed descriptions of related known technologies are omitted where they could unnecessarily obscure the gist of the embodiments.

The present invention relates to an IT infrastructure failure pre-processing system using a machine learning algorithm that can prevent IT infrastructure failures by detecting and processing, in advance, the factors that cause them.

FIG. 1 is a conceptual diagram of an IT infrastructure failure pre-processing system 100 using a machine learning algorithm according to an embodiment of the present invention.

Referring to FIG. 1, the IT infrastructure failure pre-processing system 100 using a machine learning algorithm according to an embodiment of the present invention includes a microagent 110, a data collection unit 120, a failure prediction unit 130, a visualization unit 140, a machine learning unit 150, a record management unit 160, and a failure prevention unit 170.

The IT infrastructure 10 includes: an application platform for running applications, including a company's or organization's web server, WAS, and DBMS server; application solutions for the integrated management of processes or information inside and outside the organization, including a CRM server, ERP server, and SCM server; network equipment including routers, backbones, and switches; a mail server; and storage. The IT infrastructure failure pre-processing system 100 using a machine learning algorithm of the present invention collects, refines, extracts, classifies, processes, and analyzes failure data for the IT infrastructure 10, predicts IT infrastructure failures with a machine learning algorithm trained through repeated learning, and then processes them to prevent IT infrastructure failures. It also traces and visualizes the results of detecting, predicting, and processing IT infrastructure failures together with the basis for those decisions.

The microagent 110 extracts raw data from each IT infrastructure 10 and source code development system 20, recognizes failure-causing factors using a recognition model, extracts the source code corresponding to the raw data and the failure-causing factors, and labels it per extracted source code unit.

FIG. 2 is a conceptual diagram of the microagent 110 of FIG. 1. Referring to FIG. 2, the microagent 110 according to an embodiment of the present invention includes a data collection module 111, a data filtering unit 112, a recognition module 113, a source code collection module 114, a labeling module 115, and a data transmission module 116.

The data collection module 111 extracts and collects the raw data from the IT infrastructure 10 and the source code development system 20. The raw data collected by the data collection module 111 may be defined in advance; FIG. 3 is a table showing the raw data collected by the microagent 110 of FIG. 1.

Referring to FIG. 3, the data collection module 111 collects raw data about servers, networks, DBMS, applications, application solutions, and application platforms.

For example, for a DBMS server, the raw data includes information about the server, covering CPU, memory, disk, processes, file system, and network IO, and information about the DBMS installed on that server.

Here, the information about the DBMS may include the IO performance of the database and information about its tables and indexes.
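
As an illustration of how such a raw data record for a DBMS server might be shaped, the sketch below uses Python dataclasses. All class names, field names, and units are assumptions made for illustration; the specification lists only the categories (CPU, memory, disk, processes, file system, network IO, and DBMS IO performance, tables, and indexes).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServerMetrics:
    cpu_percent: float         # CPU utilization (unit assumed)
    memory_percent: float      # memory utilization (unit assumed)
    disk_percent: float        # disk utilization (unit assumed)
    process_count: int         # number of running processes
    filesystem_percent: float  # file system usage (unit assumed)
    network_io_bytes: int      # network IO volume (unit assumed)


@dataclass
class DbmsMetrics:
    io_latency_ms: float                               # one possible IO performance measure
    tables: List[str] = field(default_factory=list)    # table names
    indexes: List[str] = field(default_factory=list)   # index names


@dataclass
class DbmsServerRawData:
    host: str
    server: ServerMetrics
    dbms: DbmsMetrics
```

Note that fields such as `tables` can carry sensitive names, which is why a record like this would pass through the data filtering unit before reaching the recognition module.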

The data filtering unit 112 identifies and processes sensitive information included in the raw data. FIG. 4 is a conceptual diagram of the data filtering unit 112 of FIG. 2.

도 4를 참조하면, 본 발명의 일 실시예에 따른 데이터필터링부(112)는 민감정보식별모듈(112a), 민감정보판별모듈(112b) 및 민감정보처리모듈(112c)을 포함한다.Referring to FIG. 4, the data filtering unit 112 according to an embodiment of the present invention includes a sensitive information identification module 112a, a sensitive information determination module 112b, and a sensitive information processing module 112c.

The sensitive information identification module 112a identifies sensitive information included in the raw data. The sensitive information determination module 112b determines whether the sensitive information identified by the sensitive information identification module 112a is essential sensitive information, that is, information the recognition module 113 needs in order to recognize a failure-causing factor. The sensitive information processing module 112c automatically replaces sensitive information determined to be essential sensitive information with general information, and deletes sensitive information that is not essential from the raw data.

For example, in the case of a DBMS server, if a table name in the DBMS information contains the name of a product that a company or institution is developing in secret, that DBMS table name may be sensitive information. Likewise, if the raw data includes personal information of users of a service provided through the IT infrastructure 10, that personal information is sensitive information.

The user's personal information may not be needed by the recognition module 113 to recognize a failure-causing factor. In that case, the sensitive information identification module 112a first classifies the user's personal information as sensitive information, the sensitive information determination module 112b then excludes it from the essential sensitive information, and the sensitive information processing module 112c deletes it from the raw data.

In contrast, a DBMS table name that contains the name of a product under development is first classified as sensitive information by the sensitive information identification module 112a and then classified as essential sensitive information by the sensitive information determination module 112b. The sensitive information processing module 112c automatically replaces that DBMS table name with a DBMS table name of a generic form.

Whether collected raw data is sensitive information may be defined in advance and stored in the sensitive information identification module 112a. Whether the raw data is essential sensitive information may be defined in advance and input to the sensitive information processing module 112c, or may be learned through a reinforcement learning algorithm.

Because the microagent 110 according to an embodiment of the present invention includes the data filtering unit 112, which deletes sensitive information included in the raw data or automatically replaces it with general information, sensitive information such as the name of a product under development or users' personal information can be prevented from leaking to the outside while failure-causing factors for the IT infrastructure 10 are being recognized.
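
The three-stage filtering described above (identify, determine, then replace or delete) can be illustrated with a minimal Python sketch. This is not the implementation of the present invention: the class name `DataFilter`, the key-based rules, and the `GENERIC_n` alias format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataFilter:
    """Hypothetical stand-in for the data filtering unit 112."""
    sensitive_keys: set   # assumed predefined rules (role of module 112a)
    essential_keys: set   # assumed subset still needed for recognition (role of module 112b)
    aliases: dict = field(default_factory=dict)

    def _generic(self, value):
        # Map each distinct sensitive value to a stable generic name.
        if value not in self.aliases:
            self.aliases[value] = f"GENERIC_{len(self.aliases) + 1}"
        return self.aliases[value]

    def apply(self, record: dict) -> dict:
        """Role of module 112c: replace essential sensitive values, drop the rest."""
        filtered = {}
        for key, value in record.items():
            if key not in self.sensitive_keys:
                filtered[key] = value                  # not sensitive: keep as-is
            elif key in self.essential_keys:
                filtered[key] = self._generic(value)   # essential: substitute
            # non-essential sensitive information is deleted (not copied)
        return filtered


data_filter = DataFilter(
    sensitive_keys={"table_name", "user_email"},
    essential_keys={"table_name"},
)
raw = {"cpu": 0.93, "table_name": "secret_product_orders", "user_email": "a@example.com"}
clean = data_filter.apply(raw)
```

In this sketch the same sensitive value always maps to the same generic name, so records that refer to the same table stay correlated after substitution.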

Referring again to FIG. 2, the recognition module 113 receives the raw data from the data filtering unit 112, inputs it into the cognitive model, and recognizes the failure-causing factor using the cognitive model. The cognitive model may be generated or updated in the machine learning unit 150 by a reinforcement learning algorithm, and is described in detail with reference to FIG. 8.

The source code collection module 114 collects source code related to the failure-causing factor from the source code development system 20. The source code collection module 114 targets the source code classified as server system source in FIG. 3, and collects source code according to values preset for each type of server in the IT infrastructure 10.

For example, in the case of a DBMS server, the source code related to the failure-causing factor recognized by the cognitive model may be the DBMS-area execution information for that DBMS server.

The labeling module 115 automatically labels the source code collected by the source code collection module 114 according to a labeling algorithm.

The labeling algorithm may be a pseudo-labeling algorithm. If the pseudo-labeling algorithm has not yet been trained enough to label source code reliably, source code labeled by an administrator or by another system may be input so that the algorithm is trained repeatedly by reinforcement learning.
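
A minimal Python sketch of confidence-thresholded pseudo-labeling follows. It assumes a hypothetical `classify` callable that returns a label and a confidence, and a hypothetical `threshold`; the text above does not specify how reliability is judged, so the threshold rule is an assumption.

```python
def pseudo_label(units, classify, threshold=0.9):
    """Split source-code units into auto-labeled ones and ones needing review.

    `classify(unit)` is assumed to return a (label, confidence) pair.
    Units below the confidence threshold are set aside so that an
    administrator or another system can label them for further training.
    """
    accepted, needs_review = [], []
    for unit in units:
        label, confidence = classify(unit)
        if confidence >= threshold:
            accepted.append((unit, label))
        else:
            needs_review.append(unit)
    return accepted, needs_review
```

The units returned in `needs_review` correspond to the case where labeled source code is supplied externally and fed back into training.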

The labeled source code may be managed by the record management unit 160 and used by the visualization unit 140.

The data transmission module 116 transmits the source code labeled by the labeling module 115 to the data collection unit 120. The data transmission module 116 may use a security protocol such as HTTPS or SSL, may use RSA-2048 for server/client session authentication, and may be equipped with an algorithm for avoiding DDoS attacks, a protocol compression technique for reducing the amount of transmitted data, and a timeout handling function for preventing thread resource exhaustion. It also includes the communication protocols required for each type of server in the IT infrastructure 10.

When the data transmission module 116 uses the TCP/IP protocol to transmit data to the data collection unit 120, responses may be delayed. To minimize this delay, the data transmission module 116 places the data to be transmitted into a message queue, which has a queue data structure, and immediately goes on to process the next data to be transmitted. The data in the message queue is then transmitted to the data collection unit 120 over TCP/IP in the order in which it was enqueued.
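
The queue-based transmission described above can be sketched in Python with the standard `queue` and `threading` modules. The class name `QueuedSender` and the injected `send` callable (which would wrap the actual TCP/IP transmission) are assumptions for illustration.

```python
import queue
import threading


class QueuedSender:
    """Hypothetical sketch: enqueue payloads and send them in FIFO order."""

    def __init__(self, send):
        self._send = send                 # assumed callable doing the real network send
        self._queue = queue.Queue()       # FIFO message queue
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, payload):
        # Returns immediately, so the caller can prepare the next payload
        # without waiting for the network round trip.
        self._queue.put(payload)

    def _drain(self):
        while True:
            payload = self._queue.get()
            if payload is None:           # sentinel: stop the worker
                break
            self._send(payload)

    def close(self):
        # Flush everything already enqueued, then stop.
        self._queue.put(None)
        self._worker.join()
```

Because a single worker drains the queue, payloads reach the receiver in the order they were submitted, which matches the first-in, first-out behavior described in the text.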

Referring again to FIG. 1, the data collection unit 120 receives the labeled source code from the microagent 110 and, through an analysis model, outputs first failure data, which is source code that may cause a failure.

The analysis model may be generated or updated in the machine learning unit 150 by a reinforcement learning algorithm, and is described in detail with reference to FIG. 8.

The failure prediction unit 130 inputs the first failure data into at least two failure prediction models and outputs second failure data, in which the first failure data is classified into non-failure elements and failure elements, thereby predicting failures of the IT infrastructure 10.

Here, a non-failure element is source code that is unlikely to cause a failure, and a failure element is source code that is highly likely to cause a failure.

Like the analysis model, each of the at least two failure prediction models may be generated or updated in the machine learning unit 150 by a reinforcement learning algorithm. The failure prediction models are described in detail with reference to FIG. 8.

FIG. 5 is a conceptual diagram of the failure prediction unit 130 of FIG. 1. Referring to FIG. 5, the failure prediction unit 130 includes a preliminary failure element selection unit 131 and a failure element selection unit 132.

The preliminary failure element selection unit 131 receives the first failure data from the data collection unit 120 and inputs it into each of the at least two failure prediction models. Each failure prediction model outputs second preliminary failure data, in which the first failure data is classified into preliminary non-failure elements and preliminary failure elements.

The failure element selection unit 132 compares the second preliminary failure data output by each of the failure prediction models. Second preliminary failure data that at least one of the at least two failure prediction models output as a preliminary failure element is finally output as second failure data, which is source code that may cause a failure in the IT infrastructure 10.

Alternatively, depending on the installation and usage environment of each server constituting the IT infrastructure 10, the failure element selection unit 132 may be configured to output as second failure data only the second preliminary failure data that every one of the failure prediction models output as a preliminary failure element.
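
The two selection policies described above amount to a union or an intersection of the models' outputs. A minimal Python sketch, assuming each model's output is represented as a set of hypothetical source-code unit identifiers:

```python
def select_failure_elements(flagged_by_model, require_all=False):
    """Combine the preliminary failure elements flagged by each model.

    flagged_by_model: list of sets, one per failure prediction model.
    require_all=False: flagged by at least one model (union).
    require_all=True:  flagged by every model (intersection).
    """
    if len(flagged_by_model) < 2:
        raise ValueError("at least two failure prediction models are expected")
    combine = set.intersection if require_all else set.union
    return combine(*(set(flags) for flags in flagged_by_model))


model_a = {"unit-1", "unit-2"}
model_b = {"unit-2", "unit-3"}
any_model = select_failure_elements([model_a, model_b])
all_models = select_failure_elements([model_a, model_b], require_all=True)
```

The union favors detection coverage, while the intersection reduces false positives, which is consistent with choosing the policy per server environment.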

The record management unit 160 generates record data, organized per unit of source code labeled by the microagent 110, and stores it in a database. It also records in the record data how the labeled source code is processed by the data collection unit 120 and the failure prediction unit 130, and manages that record.

FIG. 6 is a conceptual diagram of the failure prevention unit 170 of FIG. 1. Referring to FIG. 6, the failure prevention unit 170 includes a failure information management module 171, a first judgment module 172, a failure processing module 176, and a result transmission module 177. Upon receiving the second failure data classified as failure elements, it processes the failure-causing factor according to a failure response model before an IT infrastructure failure occurs.

The failure information management module 171 maintains a database of the types of IT infrastructure failure that can be prevented before they occur (hereinafter, the "first failure type"), based on past failure information, that is, information about failures that actually occurred in the IT infrastructure 10 in the past.

The first judgment module 172 receives the second failure data that the failure prediction unit 130 classified and output as failure elements, and determines whether it belongs to the first failure type.

The failure processing module 176 inputs the second failure data that the first judgment module 172 determined to be of the first failure type into the failure response model, processes the failure-causing factor according to the failure response model before an IT infrastructure failure occurs, and outputs the processing result.

The failure response model may be generated or updated in the machine learning unit 150 by a reinforcement learning algorithm, and is described in detail with reference to FIG. 8.

The result transmission module 177 transmits the processing result output by the failure processing module 176 to the machine learning unit 150. The processing result transmitted to the machine learning unit 150 is used to update the cognitive model, the analysis model, the failure prediction models, and the failure response model according to the reinforcement learning algorithm.

As shown in FIG. 6, the failure prevention unit 170 may further include a failure prevention type management module 173, a second judgment module 174, and an approval module 175.

The failure prevention type management module 173 divides the first failure type into a 1-1 failure type, for which failure processing requires an administrator's approval, and a 1-2 failure type, for which it does not, and manages them in a database.

The second judgment module 174 determines whether second failure data that the first judgment module 172 found to belong to the first failure type is of the 1-1 failure type or the 1-2 failure type.

When the second failure data is determined to be of the 1-1 failure type, the approval module 175 reports the 1-1 failure type to an administrator terminal, requests approval for failure processing, and receives the approval decision from the administrator terminal.

When the failure prevention unit 170 further includes the failure prevention type management module 173, the second judgment module 174, and the approval module 175, the failure processing module 176 inputs into the failure response model the second failure data of the 1-1 failure type for which approval was received from the administrator terminal, together with the second failure data of the 1-2 failure type. It then processes the failure-causing factor according to the failure response model before an IT infrastructure failure occurs, and outputs the processing result.
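
The decision flow through the first judgment module 172, the second judgment module 174, and the approval module 175 can be sketched as a small routing function in Python. The function and parameter names are hypothetical, and the handling of a rejected approval request is an assumption, since the text does not state what happens in that case.

```python
def route_failure(failure_type, preventable_types, approval_required_types,
                  request_approval):
    """Return what to do with second failure data of a given type.

    preventable_types:        the "first failure type" set (module 171).
    approval_required_types:  the "1-1 failure type" subset (module 173).
    request_approval:         assumed callable asking the administrator terminal.
    """
    if failure_type not in preventable_types:
        return "not_preventable"          # outside the first failure type
    if failure_type in approval_required_types:
        if request_approval(failure_type):
            return "process"              # approved 1-1 failure type
        return "awaiting_or_rejected"     # assumption: no automatic processing
    return "process"                      # 1-2 failure type: no approval needed
```

Only results of `"process"` would be passed on to the failure response model in this sketch.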

FIG. 7 is a conceptual diagram of the visualization unit 140 of FIG. 1. Referring to FIG. 7, the visualization unit 140 uses a failure prediction visualization module and a failure prevention visualization module 142. The failure prediction visualization module traces back the contents recorded in the record data of the source code corresponding to the second failure data and visualizes the cause of the failure and the failure prediction process. The failure prevention visualization module 142 traces back the same record data and visualizes the process and result of handling the failure-causing factor. Using these modules, the visualization unit 140 extracts and visualizes the basis for the process by which the first failure data and the second failure data are output and for the results of that process.

The data structure of the record data may be implemented as a decision tree and managed by the record management unit 160. For first or second failure data that was judged to be a failure-causing factor but was not defined in advance, the failure prediction visualization module may use a model-agnostic method to extract the basis for the process by which the first and second failure data are output and for the results of that process, and may visualize the extracted basis using Layer-wise Relevance Propagation (LRP).

Similarly, when the failure prevention visualization module 142 traces back the record data of the source code corresponding to the second failure data in order to visualize the process and result of handling the failure-causing factor, it may, like the failure prediction visualization module, use a model-agnostic method to trace back second failure data that was judged to be a failure-causing factor but was not defined in advance, and to extract the basis for the process by which the second failure data is output and for the results of that process. It may use the Layer-wise Relevance Propagation (LRP) method to visualize that basis.

FIG. 8 is a conceptual diagram of the machine learning unit 150 of FIG. 1. Referring to FIG. 8, the machine learning unit 150 includes a cognitive model learning unit 151, an analysis model learning unit 152, a failure prediction model learning unit 153, and a failure response model learning unit 154, which generate or update the cognitive model, the analysis model, the failure prediction models, and the failure response model according to a reinforcement learning algorithm.

The cognitive model learning unit 151 generates or updates the cognitive model according to a reinforcement learning algorithm, and may do so using a reinforcement learning algorithm that employs an anomaly detection algorithm. Based on past raw data, the cognitive model defines as outliers those values that do not appear in ordinary clusters or groups, analyzes the raw data received from the data filtering unit 112, and extracts the outliers as failure-causing factors.
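
As a simple illustration of flagging values that fall outside what past raw data looks like, the Python sketch below uses a z-score rule. This is a deliberately basic stand-in, not the anomaly detection or reinforcement learning algorithm of the present invention; the function name and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev


def find_outliers(history, new_values, threshold=3.0):
    """Return the new values that deviate strongly from past observations.

    history:    past raw measurements (for example, CPU usage in percent).
    new_values: newly collected measurements to check.
    threshold:  assumed cutoff, in standard deviations from the historical mean.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values were identical: anything different is an outlier.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > threshold]


cpu_history = [50, 52, 48, 51, 49, 50]
suspects = find_outliers(cpu_history, [51, 95])
```

Here the reading of 95 lies far outside the historical spread and would be passed on as a candidate failure-causing factor, while 51 would not.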

The analysis model learning unit 152 generates or updates the analysis model according to a reinforcement learning algorithm, and may do so using a reinforcement learning algorithm that employs an association rule learning algorithm. Based on past failure history data, the analysis model determines whether the labeled source code received from the microagent 110 may cause a failure when specific services or functions are called, and outputs the first failure data. The Apriori algorithm may be used as the association rule learning algorithm in order to extract frequent itemsets while reducing the amount of computation.
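
A compact Python sketch of Apriori frequent-itemset mining is shown below. It illustrates only the general technique: the transactions (sets of service calls observed together in past failure records) and the call names are hypothetical, and the way the present invention combines Apriori with reinforcement learning is not reproduced here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / total

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        # Prune step: every (size-1)-subset must already be frequent, which
        # avoids counting support for candidates that cannot qualify.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent
                          for s in combinations(c, size - 1))
                   and support(c) >= min_support}
    return frequent


failure_records = [
    {"login", "query"},
    {"login", "query", "export"},
    {"login", "export"},
    {"query"},
]
frequent_sets = apriori(failure_records, min_support=0.5)
```

The prune step is what reduces the amount of computation: a candidate is discarded without a pass over the data whenever one of its subsets is already known to be infrequent.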

The failure prediction model learning unit 153 generates or updates the failure prediction models according to a reinforcement learning algorithm. It selects at least two algorithms from among reinforcement learning algorithms including an anomaly detection algorithm, an association rule learning algorithm, and a CRNN (Classified Recurrent Neural Network) algorithm, and generates or updates at least two failure prediction models, one according to each selected algorithm.

The failure response model learning unit 154 generates or updates the failure response model according to a reinforcement learning algorithm. Based on past failure processing data and on whether the administrator granted approval, the failure response model removes the failure-causing factor, according to the type of the specific service or function, before an IT infrastructure failure occurs.

To verify the performance of the IT infrastructure failure pre-processing system 100 using the machine learning algorithm of the present invention, tests were performed as in Examples 1 to 8. The hardware specifications of the client used as the IT infrastructure 10 and of the server on which the system of the present invention is installed are shown in Table 1.

Table 1
Machine | Operating system | CPU | RAM | HDD
Server | CentOS 7 | Intel Xeon E5, 10 cores / 20 threads, 2.2 GHz | 32 GB | 1 TB
Client | Windows 10 | Intel Core i7, 3.2 GHz | 16 GB | 500 GB

1. Using an HTTP load generator, 1,000 users start viewing posts on a web page, and the number of concurrent users is increased by 1,000 every 5 minutes until it reaches 30,000.

2. Next, once the number of concurrent users reaches 30,000, the number of users viewing posts that produce errors is decreased by 1,000 every 5 minutes for 1 hour.

3. Next, the number of users viewing posts that produce errors is increased by 1,000 every 5 minutes until the number of concurrent users reaches 40,000.

4. Next, once the number of concurrent users reaches 40,000, the number of users viewing posts that produce errors is decreased by 1,000 every 5 minutes for 1 hour.

5. Next, the number of users viewing posts that produce errors is increased by 1,000 every 5 minutes until the number of concurrent users reaches 50,000.

6. Next, once the number of concurrent users reaches 50,000, load generation is stopped. The logs produced by the IT infrastructure failure pre-processing system of the present invention during steps 2 to 5 are analyzed to calculate the response time up to the point at which the second failure data is output.

7. Steps 1 to 6 are repeated 10 times, and the average of the resulting response times is calculated.

1. To induce IT infrastructure failure conditions, 6,000 HTTP request/response information datasets and 600 system resource (CPU, memory, disk) test datasets are prepared.

2. The 6,000 HTTP request/response information datasets are transmitted over 1 hour, and the 600 system resource (CPU, memory, disk) test datasets over 10 minutes, to the IT infrastructure failure pre-processing system of the present invention.

3. The time the IT infrastructure failure pre-processing system of the present invention takes to predict an IT infrastructure failure, and its detection rate, are measured.

4. Steps 1 to 3 are repeated 10 times, and the averages of the resulting response times and detection rates are calculated.

1. To induce IT infrastructure failure conditions, 80 DB Lock case test datasets are prepared.

2. The 80 DB Lock case test datasets are transmitted over 1 hour to the IT infrastructure failure pre-processing system of the present invention.

1. To induce IT infrastructure failure conditions, 100 Connection Leak case test datasets are prepared.

2. The 100 Connection Leak case test datasets are transmitted over 1 hour to the IT infrastructure failure pre-processing system of the present invention.

1. To induce IT infrastructure failure conditions, 100 CPU usage increase test datasets are prepared.

2. The 100 CPU usage increase test datasets are transmitted over 1 hour to the IT infrastructure failure pre-processing system of the present invention.

1. To induce IT infrastructure failure conditions, 100 memory usage increase test datasets are prepared.

2. The 100 memory usage increase test datasets are transmitted over 1 hour to the IT infrastructure failure pre-processing system of the present invention.

1. To induce IT infrastructure failure conditions, 100 disk usage increase test datasets are prepared.

2. The 100 disk usage increase test datasets are transmitted over 1 hour to the IT infrastructure failure pre-processing system of the present invention.

In addition, to test whether IT infrastructure failures caused by irregular raw data can be predicted in advance, a test was conducted in the following order.

1. To induce an IT infrastructure failure, prepare 100 irregular user case test datasets.

2. For one hour, transmit the 100 irregular user case test datasets to the IT infrastructure failure pre-processing system using a machine learning algorithm of the present invention.

3. Measure the detection rate at which the IT infrastructure failure pre-processing system using a machine learning algorithm of the present invention predicts IT infrastructure failures.

4. Repeat steps 1 to 3 ten times, then calculate the average of the resulting detection rates.
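The averaging in steps 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's test harness: the function names `detection_rate` and `mean_detection_rate`, and the representation of each run as one boolean per test dataset, are introduced here for illustration only.

```python
from statistics import mean
from typing import Sequence


def detection_rate(predicted: Sequence[bool]) -> float:
    """Percentage of test datasets for which a failure was predicted (step 3)."""
    if not predicted:
        raise ValueError("at least one test dataset is required")
    return 100.0 * sum(predicted) / len(predicted)


def mean_detection_rate(runs: Sequence[Sequence[bool]]) -> float:
    """Average of the per-run detection rates over all repetitions (step 4)."""
    return mean(detection_rate(run) for run in runs)


# Hypothetical outcome of two runs of 100 datasets each (not measured data).
runs = [[True] * 99 + [False], [True] * 100]
print(mean_detection_rate(runs))
```

In the procedure described above, `runs` would hold ten entries of 100 results each, one entry per repetition.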

Next, to verify the performance of the IT infrastructure failure pre-processing system (100) using a machine learning algorithm of the present invention, the time required to recognize an IT infrastructure failure and the detection rate were measured, in the same manner as in Examples 1 to 8, on "maxigent", an IT infrastructure failure pre-detection solution from Samsung SDS (hereinafter, the "conventional solution").

The response times and detection rates measured in Examples 1 to 8 for the IT infrastructure failure pre-processing system (100) using a machine learning algorithm of the present invention and for the conventional solution are shown in Table 2 below.

Example | System of the present invention: response time (s) | System of the present invention: detection rate (%) | Conventional solution: response time (s) | Conventional solution: detection rate (%)
Example 1 | 2.07 | 99.15 | 5.96 | 95.65
Example 2 | 2.11 | 99.55 | 4.63 | 93.63
Example 3 | 2.49 | 99.09 | 5.93 | 91.23
Example 4 | 2.77 | 99.93 | 5.66 | 95.21
Example 5 | 2.44 | 99.27 | 3.19 | 91.89
Example 6 | 2.22 | 99.72 | 3.36 | 95.36
Example 7 | 2.01 | 99.54 | 4.41 | 94.86
Example 8 | 2.86 | 99.20 | 3.45 | 93.74

As shown in Table 2, the IT infrastructure failure pre-processing system (100) using a machine learning algorithm of the present invention recognized, based on source code, the various types of failure-causing factors that lead to IT infrastructure failures within 3 seconds, and recorded a detection rate of 99% or higher for those factors. The conventional solution, by contrast, took between 3 and 6 seconds to recognize IT infrastructure failures in advance and recorded detection rates between 91% and 96%.
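The ranges stated above can be checked directly against the values in Table 2. The sketch below only transcribes the table and computes minima and maxima; the helper name `summarize` is an assumption introduced here, and the units (seconds and percent) are taken from the surrounding text.

```python
# Values transcribed from Table 2: (response time in s, detection rate in %).
invention = [(2.07, 99.15), (2.11, 99.55), (2.49, 99.09), (2.77, 99.93),
             (2.44, 99.27), (2.22, 99.72), (2.01, 99.54), (2.86, 99.20)]
conventional = [(5.96, 95.65), (4.63, 93.63), (5.93, 91.23), (5.66, 95.21),
                (3.19, 91.89), (3.36, 95.36), (4.41, 94.86), (3.45, 93.74)]


def summarize(rows):
    """Return the extreme response times and detection rates of one system."""
    times = [t for t, _ in rows]
    rates = [r for _, r in rows]
    return {
        "min_time": min(times), "max_time": max(times),
        "min_rate": min(rates), "max_rate": max(rates),
    }


print("present invention:", summarize(invention))
print("conventional solution:", summarize(conventional))
```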

Hereinafter, a method of predicting and handling IT infrastructure failures in advance using a machine learning algorithm according to an embodiment of the present invention will be described in detail with reference to FIGS. 9 to 13.

According to this method, failure-causing factors for the IT infrastructure (10) can be automatically recognized and handled in advance based on source code, and the basis for predicting an IT infrastructure failure and the processing results can be provided based on the machine learning algorithm. FIG. 9 is a flowchart of the method of predicting and handling IT infrastructure failures in advance using a machine learning algorithm according to an embodiment of the invention.

FIG. 10 is a flowchart showing the first step (S100) of FIG. 9 in detail, FIG. 11 is a flowchart showing step 1-2 (S120) of FIG. 10 in detail, FIG. 12 is a flowchart showing the third step (S300) of FIG. 9 in detail, and FIG. 13 is a flowchart showing the fourth step (S400) of FIG. 9 in detail.

Referring to FIGS. 9 to 13, the method of predicting and handling IT infrastructure failures in advance using a machine learning algorithm broadly comprises four steps, from the first step (S100) to the fourth step (S400).

In the first step (S100), the microagent (110) extracts raw data from each IT infrastructure (10) and source code development system (20), recognizes failure-causing factors using a cognitive model, extracts the source code corresponding to the raw data and the failure-causing factors, and labels it in units of extracted source code.

The first step (S100) comprises six steps, from step 1-1 (S110) to step 1-6 (S160). In step 1-1 (S110), the data collection module (111) extracts and collects the raw data from the IT infrastructure (10) and the source code development system (20). In step 1-2 (S120), the data filtering unit (112) identifies and processes sensitive information included in the raw data.

Step 1-2 (S120) in turn comprises three steps, from step 1-21 to step 1-23. In step 1-21, the sensitive information identification module (112a) identifies sensitive information included in the raw data. In step 1-22, the sensitive information determination module (112b) determines whether the sensitive information identified in step 1-21 is essential sensitive information that the recognition module (113) needs in order to recognize the failure-causing factors. In step 1-23, the sensitive information processing module (112c) automatically replaces sensitive information determined to be essential with general information, and deletes sensitive information that is not essential from the raw data.
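Steps 1-21 to 1-23 can be sketched as follows. The specification does not state how sensitive information is identified or which kinds are essential, so everything concrete here is an assumption made for illustration: the regular expressions in `SENSITIVE_PATTERNS`, the choice of IP addresses as "essential" in `ESSENTIAL_KINDS`, the placeholder tokens used as "general information", and the function name `filter_sensitive`.

```python
import re

# Hypothetical patterns; the specification does not say how sensitive
# information is identified.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

# Hypothetical choice: IP addresses are assumed to be needed by the
# recognition module, e-mail addresses are not.
ESSENTIAL_KINDS = {"ipv4"}


def filter_sensitive(raw: str) -> str:
    """Replace essential sensitive values with a generic token, delete the rest."""
    for kind, pattern in SENSITIVE_PATTERNS.items():     # step 1-21: identify
        if kind in ESSENTIAL_KINDS:                      # step 1-22: determine
            raw = pattern.sub(f"<{kind.upper()}>", raw)  # step 1-23: replace
        else:
            raw = pattern.sub("", raw)                   # step 1-23: delete
    return raw


print(filter_sensitive("login failed for kim@example.com from 10.0.0.7"))
```

Replacing rather than deleting the essential values keeps the position and kind of the value visible to the later recognition step without exposing the value itself.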

Next, in step 1-3 (S130), the recognition module (113) receives the raw data from the data filtering unit (112), inputs it into the cognitive model, and recognizes the failure-causing factors using the cognitive model.

Next, in step 1-4 (S140), the source code collection module (114) collects the source code related to the failure-causing factors from the source code development system (20). In step 1-5 (S150), the labeling module (115) automatically labels the source code according to a labeling algorithm. Then, in step 1-6 (S160), the data transmission module (116) transmits the source code labeled in step 1-5 (S150) to the data collection unit (120).

Next, in the second step (S200), the data collection unit (120) receives the labeled source code from the microagent (110) and, through an analysis model, outputs first failure data, which is source code that is likely to cause a failure.

Next, in the third step (S300), the failure prediction unit (130) inputs the first failure data into a failure prediction model and outputs second failure data, in which the first failure data is divided into non-failure elements and failure elements, thereby predicting failures of the IT infrastructure (10).

The third step (S300) in turn comprises step 3-1 (S310) and step 3-2 (S320). In step 3-1 (S310), the preliminary failure element selection unit (131) receives the first failure data from the data collection unit (120) and inputs it into each of at least two failure prediction models, and each failure prediction model outputs second preliminary failure data in which the first failure data is divided into preliminary non-failure elements and preliminary failure elements. In step 3-2 (S320), the failure element selection unit (132) compares the second preliminary failure data output by each of the failure prediction models and outputs, as second failure data (source code that is likely to cause a failure of the IT infrastructure (10)), the second preliminary failure data that at least one of the at least two failure prediction models output as a preliminary failure element.
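The combination rule of steps 3-1 and 3-2 amounts to a union over the models: a source code unit is kept if any one model flags it. The following is a minimal sketch under stated assumptions. Modeling a failure prediction model as a function from a source code string to a boolean, the name `select_failure_elements`, and the two keyword-based stand-in models are all hypothetical and are not taken from the specification.

```python
from typing import Callable, Iterable, List

# Assumption: a failure prediction model is represented as a function that
# returns True when it classifies a unit as a preliminary failure element.
Model = Callable[[str], bool]


def select_failure_elements(first_failure_data: Iterable[str],
                            models: List[Model]) -> List[str]:
    """Keep every unit that at least one model marks as a failure element."""
    if len(models) < 2:
        raise ValueError("at least two failure prediction models are required")
    second_failure_data = []
    for unit in first_failure_data:
        # Step 3-1: each model produces its own preliminary classification.
        votes = [model(unit) for model in models]
        # Step 3-2: the unit is selected if any model flagged it.
        if any(votes):
            second_failure_data.append(unit)
    return second_failure_data


# Hypothetical stand-in models (keyword checks, not trained models).
def leaks_connection(unit: str) -> bool:
    return "open(" in unit and "close(" not in unit


def busy_loops(unit: str) -> bool:
    return "while True" in unit


units = ["conn = open(db)", "while True: pass", "x = 1"]
print(select_failure_elements(units, [leaks_connection, busy_loops]))
```

Taking the union rather than requiring agreement favors detection over precision: a unit missed by one model is still caught if another model flags it.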

Next, in the fourth step (S400), when the failure prevention unit (170) receives the second failure data classified as failure elements, it processes the failure-causing factors according to a failure response model before an IT infrastructure failure occurs.
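Taken together, the four top-level steps form a linear pipeline in which each unit consumes the output of the previous one. The sketch below shows only that data flow. The name `run_method`, the `Stage` type, and the simplification that every stage maps a list of strings to a list of strings are assumptions made for illustration; the stand-in stages are placeholders, not the models of the specification.

```python
from typing import Callable, List

# Simplifying assumption: every stage maps a list of strings to a list of strings.
Stage = Callable[[List[str]], List[str]]


def run_method(raw_data: List[str], label: Stage, analyze: Stage,
               predict: Stage, prevent: Stage) -> List[str]:
    """Chain the four steps S100 to S400 in order."""
    labeled = label(raw_data)                # first step (S100): microagent
    first_failure = analyze(labeled)         # second step (S200): data collection unit
    second_failure = predict(first_failure)  # third step (S300): failure prediction unit
    return prevent(second_failure)           # fourth step (S400): failure prevention unit


# Hypothetical stand-in stages used only to exercise the data flow.
result = run_method(
    ["ok_code", "leaky_code"],
    label=lambda data: data,
    analyze=lambda data: [d for d in data if "leaky" in d],
    predict=lambda data: data,
    prevent=lambda data: [f"handled:{d}" for d in data],
)
print(result)
```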

The fourth step (S400) in turn comprises three steps, from step 4-1 (S410) to step 4-3 (S430). In step 4-1 (S410), the first judgment module (172) receives the second failure data and determines whether it belongs to the first failure type. In step 4-2 (S420), the failure processing module (176) inputs the second failure data that the first judgment module (172) determined to be of the first failure type into the failure response model, processes the failure-causing factors according to the failure response model, and outputs the processing result. Finally, in step 4-3 (S430), the result transmission module (177) transmits the processing result output by the failure processing module (176) in step 4-2 (S420) to the machine learning unit (150).

In addition, as shown in FIG. 13, the fourth step (S400) may further include step 4-1a (S410a) and step 4-1b (S410b) between step 4-1 (S410) and step 4-2 (S420). In step 4-1a (S410a), performed after step 4-1 (S410), the second judgment module (174) determines whether the second failure data that the first judgment module (172) determined to belong to the first failure type is of failure type 1-1 or failure type 1-2. In step 4-1b (S410b), performed after step 4-1a (S410a), if the second failure data was determined in step 4-1 (S410) to be of failure type 1-1, the approval module (175) reports failure type 1-1 to the administrator terminal, requests approval for failure handling, and receives the approval decision from the administrator terminal.

When step 4-1a (S410a) and step 4-1b (S410b) are performed between step 4-1 (S410) and step 4-2 (S420), the failure processing module (176) in step 4-2 (S420) inputs into the failure response model the second failure data of failure type 1-1 for which approval for failure handling has been received from the administrator terminal, together with the second failure data of failure type 1-2, processes the failure-causing factors using the failure response model, and outputs the processing result.
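The gating logic of steps 4-1, 4-1a, and 4-1b can be sketched as a single decision function. Per the claims, failure type 1-1 requires administrator approval and failure type 1-2 does not. The enum `FailureType`, the function `should_process`, and the representation of the administrator terminal as a callback that returns the approval decision are assumptions introduced here.

```python
from enum import Enum
from typing import Callable, Optional


class FailureType(Enum):
    TYPE_1_1 = "1-1"  # requires administrator approval before handling
    TYPE_1_2 = "1-2"  # handled without administrator approval


def should_process(failure_type: Optional[FailureType],
                   request_approval: Callable[[], bool]) -> bool:
    """Decide whether second failure data is passed on to step 4-2."""
    if failure_type is None:
        # Step 4-1: the data does not belong to the first failure type.
        return False
    if failure_type is FailureType.TYPE_1_1:
        # Steps 4-1a and 4-1b: report to the administrator and wait for approval.
        return request_approval()
    # Failure type 1-2 proceeds without approval.
    return True


# Hypothetical administrator who rejects every request.
print(should_process(FailureType.TYPE_1_1, lambda: False))
print(should_process(FailureType.TYPE_1_2, lambda: False))
```

Passing the approval step as a callback keeps the decision logic independent of how the administrator terminal is actually contacted.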

According to the IT infrastructure failure pre-processing system (100) using a machine learning algorithm of the present invention and the method of predicting and handling IT infrastructure failures in advance using a machine learning algorithm, raw data is analyzed through a distributed artificial intelligence model that uses a hierarchically designed cognitive model, analysis model, and failure prediction model, so that failure-causing factors are recognized and processed. As a result, failure-causing factors can be recognized and processed quickly from raw data, the time required to predict and prevent IT infrastructure failures is shortened, and the accuracy of IT infrastructure failure prediction is improved.

In addition, failure-causing factors for the IT infrastructure (10) can be automatically prevented in advance based on source code, and a basis for predicting IT infrastructure failure-causing factors can be provided based on the machine learning algorithm. Leakage of sensitive information can also be prevented, because sensitive information included in the raw data is deleted or automatically replaced with general information when failure-causing factors for the IT infrastructure (10) are recognized.

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but those skilled in the art will appreciate that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct a processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

Although the embodiments have been described above with reference to a limited set of drawings, those skilled in the art can apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

100: IT infrastructure failure pre-processing system using a machine learning algorithm
110: Microagent
120: Data collection unit
130: Failure prediction unit
140: Visualization unit
150: Machine learning unit
160: Record management unit
170: Failure prevention unit

Claims

An IT infrastructure failure pre-processing system using a machine learning algorithm, comprising:
a microagent that extracts raw data from each IT infrastructure and source code development system, recognizes failure-causing factors using a cognitive model, extracts source code corresponding to the raw data and the failure-causing factors, and labels it in units of extracted source code;
a data collection unit that receives the labeled source code from the microagent and, through an analysis model, outputs first failure data, which is source code likely to cause a failure;
a failure prediction unit that inputs the first failure data into a failure prediction model and outputs second failure data, in which the first failure data is divided into non-failure elements and failure elements, to predict failures of the IT infrastructure;
a failure prevention unit that, upon receiving the second failure data classified as failure elements, processes the failure-causing factors according to a failure response model before an IT infrastructure failure occurs;
a visualization unit that visualizes and provides the process by which the first failure data and the second failure data are output, the basis for the results of that process, and the results of processing the failure-causing factors in the failure prevention unit; and
a machine learning unit that generates or updates the cognitive model, the analysis model, the failure prediction model, and the failure response model according to a reinforcement learning algorithm,
wherein the failure prevention unit comprises:
a failure information management module that, based on past failure information, which is information on failures that actually occurred in IT infrastructure in the past, builds and manages a database of IT infrastructure failure types that can be prevented before an IT infrastructure failure occurs (hereinafter, the "first failure type");
a first judgment module that receives the second failure data and determines whether it belongs to the first failure type;
a failure processing module that inputs the second failure data determined by the first judgment module to be of the first failure type into the failure response model, processes the failure-causing factors according to the failure response model, and outputs the processing result; and
a result transmission module that transmits the processing result output by the failure processing module to the machine learning unit.

The IT infrastructure failure pre-processing system using a machine learning algorithm according to claim 1,
wherein the microagent comprises:
a data collection module that extracts and collects the raw data from the IT infrastructure and the source code development system;
a data filtering unit that identifies and processes sensitive information included in the raw data;
a recognition module that receives the raw data from the data filtering unit, inputs it into the cognitive model, and recognizes the failure-causing factors using the cognitive model;
a source code collection module that collects source code related to the failure-causing factors from the source code development system;
a labeling module that automatically labels the source code collected by the source code collection module according to a labeling algorithm; and
a data transmission module that transmits the source code labeled by the labeling module to the data collection unit.

The IT infrastructure failure pre-processing system using a machine learning algorithm according to claim 2,
wherein the data filtering unit comprises:
a sensitive information identification module that identifies sensitive information included in the raw data;
a sensitive information determination module that determines whether the sensitive information identified by the sensitive information identification module is essential sensitive information that the recognition module needs in order to recognize the failure-causing factors; and
a sensitive information processing module that automatically replaces sensitive information determined to be essential sensitive information with general information, and deletes sensitive information that is not essential sensitive information from the raw data.

The IT infrastructure failure pre-processing system using a machine learning algorithm according to claim 1,
wherein the failure prevention unit further comprises:
a failure type management module that divides the first failure type into failure type 1-1, which requires administrator approval for failure handling, and failure type 1-2, which does not require administrator approval for failure handling, and manages them in a database;
a second judgment module that determines whether the second failure data determined by the first judgment module to belong to the first failure type is of failure type 1-1 or failure type 1-2; and
an approval module that, when the second failure data is determined to be of failure type 1-1, reports failure type 1-1 to an administrator terminal, requests approval for failure handling, and receives the approval decision from the administrator terminal,
and wherein the failure processing module inputs into the failure response model the second failure data of failure type 1-1 for which approval for failure handling has been received from the administrator terminal and the second failure data of failure type 1-2, processes the failure-causing factors using the failure response model, and outputs the processing result.

The IT infrastructure failure pre-processing system using a machine learning algorithm according to claim 1,
wherein the machine learning unit comprises:
a cognitive model learning unit that generates or updates the cognitive model according to a reinforcement learning algorithm;
an analysis model learning unit that generates or updates the analysis model according to a reinforcement learning algorithm;
a failure prediction model learning unit that generates or updates the failure prediction model according to a reinforcement learning algorithm; and
a failure response model learning unit that generates or updates the failure response model according to a reinforcement learning algorithm.

The IT infrastructure failure pre-processing system using a machine learning algorithm according to claim 1,
further comprising a record management unit that generates record data for each unit of source code labeled by the microagent, stores the record data in a database, and records and manages, in the record data, the process by which the source code labeled by the microagent is processed by the data collection unit, the failure prediction unit, and the failure prevention unit,
wherein the visualization unit comprises:
a failure prediction visualization module that visualizes the cause of the failure and the failure prediction process by backtracking the contents recorded in the record data of the source code for the second failure data; and
a failure prevention visualization module that visualizes the process and results of processing the failure-causing factors by backtracking the contents recorded in the record data of the source code for the second failure data.

The IT infrastructure failure pre-processing system using a machine learning algorithm according to claim 1,
wherein the failure prediction unit comprises:
a preliminary failure element selection unit that receives the first failure data from the data collection unit and inputs it into each of at least two failure prediction models, each of which outputs second preliminary failure data in which the first failure data is divided into preliminary non-failure elements and preliminary failure elements; and
a failure element selection unit that compares the second preliminary failure data output by each of the at least two failure prediction models and outputs, as second failure data that is source code likely to cause a failure of the IT infrastructure, the second preliminary failure data output as a preliminary failure element by at least one of the at least two failure prediction models.