KR101971013B1

KR101971013B1 - Cloud infra real time analysis system based on big data and the providing method thereof

Info

Publication number: KR101971013B1
Application number: KR1020160169515A
Authority: KR
Inventors: 문성규
Original assignee: 나무기술 주식회사
Priority date: 2016-12-13
Filing date: 2016-12-13
Publication date: 2019-04-22
Also published as: KR20180068002A

Abstract

The big-data-based real-time analysis system for cloud infrastructure according to the present invention is characterized in that it comprises: a data collection server that collects the cloud operating data of each customer server through an AaaS (Analytics as a Service) operating in a cloud infrastructure formed with a plurality of customer servers; a collected data DB that classifies and stores the cloud operating data collected by the data collection server; a data analysis server that analyzes the unstructured data in the cloud operating data collected by the data collection server to detect error processes and continuously monitors whether a detected error process has failed; and an analysis data DB that stores the error processes that the data analysis server detects from the unstructured data.
According to the present invention, big data is analyzed in real time in a cloud infrastructure operating environment, and new processes found in unstructured data are continuously monitored and learned, so that the cause of a failure can be found and handled quickly.

Description

The present invention relates to a big-data-based real-time analysis system for cloud infrastructure and a method for providing the same, and more particularly to a system and method that analyze big data in real time in a cloud infrastructure operating environment and continuously monitor new processes in unstructured data so as to predict whether a failure will occur.

"Cloud" computing often refers to the provision of computing resources as a service by a number of computer servers networked together, typically at a location remote from where the service is requested. A cloud data center typically refers to the physical arrangement of the servers that make up a cloud or a particular part of a cloud. For example, servers may be arranged within a data center in rooms, groups, rows, and racks. A data center may have one or more "zones," each of which may include one or more server rooms. Each room may contain one or more rows of servers, and each row may contain one or more racks. Each rack may contain one or more individual server nodes. Servers within a zone, room, rack, and/or row may be arranged into virtual groups based on the physical infrastructure requirements of the data center facility, which may include power, energy, temperature, heat, and/or other requirements.

Servers and portions of their resources may be allocated (for example, for use by different customers of the data center) according to actual or anticipated usage requirements such as security, quality of service, throughput, processing capacity, and/or other criteria, regardless of their physical location within the data center. For example, one customer's computing workload may be distributed using virtualization across multiple physical servers (which may be located in different rows, racks, groups, or rooms of the data center) or across multiple nodes or resources on the same server. Thus, in the context of virtualization, servers can be logically grouped to meet workload requirements.

As complex configurations are implemented within today's cloud data centers, managing a cloud data center efficiently has become increasingly difficult. A major factor contributing to this difficulty is the large volume of operational data generated by each device and/or service that makes up the data center. Because of the sheer amount of such data, it is often difficult for data center administrators to obtain, in real time, an overall picture of the health, performance, or even layout of their data center and to check for failures.

Korean Patent Laid-Open Publication No. 2015-0049541

An object of the present invention is to provide a big-data-based real-time analysis system for cloud infrastructure, and a method for providing the same, that analyze big data in real time in a cloud infrastructure operating environment and continuously monitor new processes in unstructured data so as to predict whether a failure will occur.

To achieve the above object, the big-data-based real-time analysis system for cloud infrastructure according to the present invention is characterized in that it comprises: a data collection server that collects the cloud operating data of each customer server through an AaaS (Analytics as a Service) operating in a cloud infrastructure formed with a plurality of customer servers; a collected data DB that classifies and stores the cloud operating data collected by the data collection server; a data analysis server that analyzes the unstructured data in the cloud operating data collected by the data collection server to detect error processes and continuously monitors whether a detected error process has failed; and an analysis data DB that stores the error processes that the data analysis server detects from the unstructured data.

In particular, the system is characterized in that it further comprises: a response management server that manages the results of handling the error-process failures analyzed by the data analysis server; and a failure pattern data DB that stores the response management server's processing results for detected error-process failures.

In particular, the cloud operating data is characterized in that it includes structured data, unstructured data, and status data.

In particular, the structured data is characterized in that it is operating information covering the hardware components, namely CPU, memory, disk, and network.

In particular, the unstructured data is characterized in that it includes cloud OS information, hypervisor information, virtual OS information, IT operational performance information, and performance measurement response times.

In particular, the status data is characterized in that it is IT operational status information that includes the running on/off status of servers, processes, and DBMSs.

In particular, the data analysis server is characterized in that it comprises: a real-time data analysis unit that analyzes, in real time, the structured, unstructured, and status data types of the collected cloud operating data; a failure pattern DB-based analysis unit that determines failures by comparing the data, for each data type analyzed by the real-time data analysis unit, against the data stored in the failure pattern DB; a failure pattern derivation analysis unit that analyzes keywords in the unstructured data from the real-time data analysis unit to detect error processes; a failure event generation/notification unit that sends events for failures determined by the failure pattern DB-based analysis unit and for error processes detected by the failure pattern derivation analysis unit; and a cloud analysis agent synchronization unit that registers the error process information detected by the failure pattern derivation analysis unit.

In particular, the failure pattern derivation analysis unit is characterized in that it detects an error process by comparing the unstructured data against a pre-registered keyword text (TEXT) file and detecting a new keyword text file.

Also, to achieve the above object, the method for providing a big-data-based real-time cloud infrastructure analysis service according to the present invention is characterized in that it comprises: receiving operating data of a cloud infrastructure formed with a plurality of customer servers; classifying the received operating data into unstructured data, structured data, and status data and analyzing it; comparing the unstructured data analyzed in the analyzing step against previously stored failure patterns to determine whether the unstructured data indicates a failure; when no failure pattern is detected in the unstructured data, comparing it against a big data DB to determine whether the unstructured data corresponds to an error process and, if it does not, detecting it as a new process and registering it in a monitoring list; and synchronizing the new process to the AaaS (Analytics as a Service) operating in the cloud infrastructure.

In particular, the method is characterized in that it further comprises, before the step of receiving the cloud infrastructure operating data: collecting the cloud operating data of each customer server through the AaaS (Analytics as a Service) operating in the cloud infrastructure formed with the plurality of customer servers; and classifying and storing the cloud operating data collected by the data collection server.

In particular, the big data DB is characterized in that it is an analysis data DB containing log data, performance data, inspection data, and event data.

In particular, the step of detecting the new process is characterized in that it includes transmitting an event for the detected new process.

In particular, the structured data is characterized in that it is operating information covering the hardware components of a customer server, namely CPU, memory, disk, and network.

In particular, the step of determining whether a failure has occurred is characterized in that the analyzed structured data is compared against the hardware operating thresholds in the big data DB to determine whether a failure has occurred.

In particular, the failure determination is characterized in that, when the analyzed structured data exceeds an operating threshold, a failure event is transmitted and the failed structured data is stored in the big data DB.

In particular, the step of determining whether a failure has occurred is characterized in that it includes determining the running on/off status of the servers, processes, and DBMSs in the analyzed status data.

In particular, the failure determination is characterized in that, when the status data is determined to be off, a failure event is transmitted and the failed status data is stored in the big data DB.
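
The threshold and on/off checks described above can be illustrated with a minimal sketch. It is not part of the disclosure: the threshold values, the `FailureEvent` type, and the function names are assumptions chosen for illustration, since the specification does not state concrete values or interfaces.

```python
from dataclasses import dataclass

# Hypothetical operating thresholds (percent); the specification gives no values.
THRESHOLDS = {"cpu": 90.0, "memory": 85.0, "disk": 95.0, "network": 80.0}


@dataclass
class FailureEvent:
    host: str
    kind: str    # "threshold" or "status"
    detail: str


def check_structured(host, metrics, thresholds=THRESHOLDS):
    """Return one event per metric that exceeds its operating threshold."""
    return [
        FailureEvent(host, "threshold", f"{name}={value} > {thresholds[name]}")
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    ]


def check_status(host, status):
    """Return one event per server, process, or DBMS reported as off."""
    return [
        FailureEvent(host, "status", f"{target} is off")
        for target, is_on in status.items()
        if not is_on
    ]


events = check_structured("web-01", {"cpu": 97.5, "memory": 40.0})
events += check_status("web-01", {"server": True, "dbms": False})
```

In a fuller implementation, the returned list would presumably feed both the event transmission and the storage of the failed data in the big data DB.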

In particular, the method is characterized in that, when no failure pattern is detected in the unstructured data and the comparison against the big data DB determines that the unstructured data corresponds to an error process, it includes transmitting an error-process failure detection event.

In particular, the method is characterized in that, after the synchronizing step, a step of storing the detected new process as unstructured data in the big data DB is performed.

In particular, the method is characterized in that, in the step of comparing the unstructured data against previously stored failure patterns to determine whether it indicates a failure, a failure event is transmitted when the unstructured data is determined to be failure data.

According to the present invention, big data is analyzed in real time in a cloud infrastructure operating environment, and new processes found in unstructured data are continuously monitored and learned, so that the cause of a failure can be found and handled quickly.

FIG. 1 is a diagram schematically showing the overall configuration of a big-data-based real-time analysis system for cloud infrastructure according to an embodiment of the present invention.
FIG. 2 is a diagram showing the data layers collected by the data collection server of FIG. 1.
FIG. 3 is a diagram schematically showing the configuration of the data analysis server of FIG. 1.
FIG. 4 is a flowchart of a method for providing a big-data-based cloud infrastructure analysis service according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the invention to particular embodiments, however, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the invention.

In describing the present invention, detailed descriptions of related known art are omitted where they are judged to unnecessarily obscure the gist of the invention. In addition, ordinals used in this specification (for example, first, second, and so on) are merely identifiers for distinguishing one component from another.

Also, in this specification, when one component is referred to as being "connected" or "coupled" to another component, the component may be directly connected or coupled to the other component, but unless specifically stated otherwise, it should be understood that it may also be connected or coupled via yet another intervening component.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram schematically showing the overall configuration of a big-data-based real-time analysis system for cloud infrastructure according to an embodiment of the present invention, FIG. 2 is a diagram showing the data layers collected by the data collection server of FIG. 1, and FIG. 3 is a diagram schematically showing the configuration of the data analysis server of FIG. 1.

As shown in FIG. 1, the big-data-based real-time analysis system for cloud infrastructure according to the present invention comprises a plurality of customer servers (100), a data collection server (200), a collected data DB (210), a data analysis server (300), an analysis data DB (310), a response management server (400), and a failure pattern data DB (410).

The data collection server (200) collects the cloud operating data of each customer server through an analysis service agent (AaaS: Analytics as a Service) (not shown) operating in the cloud infrastructure formed with the plurality of customer servers.

Here, the data collection server (200) verifies whether a connection from the plurality of customer servers (100) comes from a pre-registered agent. For example, it preferably verifies the host name, IP address, and the like. In addition, so that it can operate even in large-scale cloud environments, it supports a parallel configuration, namely L4-based hardware load balancing and a store-and-forward scheme using a task queue.
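
The agent verification and the store-and-forward buffering can be sketched roughly as follows. This is only an illustration under stated assumptions: the registry contents, the `StoreAndForward` class, and its method names are hypothetical, and L4 load balancing is a hardware concern that the sketch does not model.

```python
import queue

# Hypothetical registry of pre-registered agents: host name -> IP address.
REGISTERED_AGENTS = {"web-01": "10.0.0.11", "db-01": "10.0.0.21"}


def is_registered(host, ip, registry=REGISTERED_AGENTS):
    """Accept a connection only if host name and IP match a registered agent."""
    return registry.get(host) == ip


class StoreAndForward:
    """Buffer accepted payloads in a task queue, then forward them in order."""

    def __init__(self):
        self._tasks = queue.Queue()

    def store(self, host, ip, payload):
        if not is_registered(host, ip):
            return False  # reject connections from unknown agents
        self._tasks.put((host, payload))
        return True

    def forward(self, sink):
        """Drain the queue into sink (a callable); return the number forwarded."""
        count = 0
        while not self._tasks.empty():
            sink(self._tasks.get())
            count += 1
        return count


collector = StoreAndForward()
accepted = collector.store("web-01", "10.0.0.11", {"cpu": 42.0})
rejected = collector.store("web-01", "10.9.9.9", {"cpu": 42.0})
delivered = []
forwarded = collector.forward(delivered.append)
```

Decoupling `store` from `forward` lets collection continue while the downstream consumer is slow or briefly unavailable, which is the usual motivation for a store-and-forward design.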

In addition, the VDI environments that the data collection server (200) supports for collecting cloud infrastructure configuration, performance, and failure data include Citrix XenDesktop and VMware View; the hypervisor environments include XEN, KVM, RHEV, VMware vSphere, and Microsoft Hyper-V, as well as IBM PowerVM and HP VM; and the cloud infrastructure storage includes RDBMS (relational database) DBs and NoSQL DBs.

As shown in FIG. 2, the cloud operating data collected by the data collection server (200) includes structured data, unstructured data, and status data. The structured data is operating information covering the hardware components, namely CPU, memory, disk, and network, and the unstructured data includes cloud OS information, hypervisor information, virtual OS information, IT operational performance information, and performance measurement response times. The status data is IT operational status information that includes the running on/off status of servers, processes, and DBMSs.
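
One way to picture the three data types is as a mapping from record fields to categories. The sketch below is an assumption-laden illustration: the field names and the `classify` and `split_record` helpers are hypothetical, and only the grouping itself follows the description.

```python
from enum import Enum


class DataType(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"
    STATUS = "status"


# Field names are hypothetical; the groupings follow the description.
STRUCTURED_FIELDS = {"cpu", "memory", "disk", "network"}
STATUS_FIELDS = {"server", "process", "dbms"}
UNSTRUCTURED_FIELDS = {
    "cloud_os", "hypervisor", "virtual_os", "it_performance", "response_time",
}


def classify(field):
    """Map a field of a cloud operating record to one of the three data types."""
    if field in STRUCTURED_FIELDS:
        return DataType.STRUCTURED
    if field in STATUS_FIELDS:
        return DataType.STATUS
    if field in UNSTRUCTURED_FIELDS:
        return DataType.UNSTRUCTURED
    raise ValueError(f"unknown field: {field}")


def split_record(record):
    """Group a flat record's fields by data type."""
    groups = {data_type: {} for data_type in DataType}
    for field, value in record.items():
        groups[classify(field)][field] = value
    return groups


record = {"cpu": 71.2, "dbms": True, "hypervisor": "kvm: vm-17 started"}
groups = split_record(record)
```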

The collected data DB (210) classifies and stores the environment, event, performance, and failure data in the cloud operating data collected by the data collection server (200).

The data analysis server (300) analyzes the unstructured data in the cloud operating data collected by the data collection server (200) to detect error processes, and continuously monitors whether a detected error process has failed.

As shown in FIG. 3, the data analysis server (300) comprises a real-time data analysis unit (320), a failure pattern DB-based analysis unit (330), a failure event generation/notification unit (340), a failure pattern derivation analysis unit (350), and a cloud analysis agent synchronization unit (360).

The real-time data analysis unit (320) analyzes, in real time, the structured, unstructured, and status data types of the collected cloud operating data.

More specifically, within the cloud operating data collected by the data collection server (200), the real-time data analysis unit (320) identifies structured data as the operating information for the hardware components (CPU, memory, disk, and network), and unstructured data as cloud OS information, hypervisor information, virtual OS information, IT operational performance information, and performance measurement response times. It identifies status data as IT operational status information, including the running on/off status of servers, processes, and DBMSs.

The real-time data analysis unit (320) analyzes the cloud operating data that is transmitted by the analysis service agents (not shown) on the customer servers (100) and collected by the data collection server (200). It analyzes this data by type, including: VDI layer configuration, performance, and failure data; VM (virtual machine) layer configuration, performance, and failure data; hypervisor layer configuration, performance, and failure data; cloud infrastructure network configuration, performance, and failure data; and cloud infrastructure storage configuration, performance, and failure data.

The failure pattern DB-based analysis unit (330) determines failures by comparing the data, for each data type analyzed by the real-time data analysis unit (320), against the data stored in the failure pattern DB.

More specifically, it searches for matches between the received unstructured data and the failure pattern DB. The kinds of entries in the failure pattern DB are: search keywords entered by administrators/operators; important keywords provided by vendors; important keywords drawn from maintenance analysis results or from cases that occurred at other sites; and other generally known, commonly managed important keywords. The frequency of each pattern item within the unstructured data is tallied.
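
A minimal sketch of keyword matching with a per-pattern frequency tally follows. The pattern entries, the sample log lines, and the `match_patterns` function are assumptions for illustration; the specification does not describe the matching algorithm beyond keyword lookup and counting, and simple case-insensitive substring matching is used here as a stand-in.

```python
from collections import Counter

# Hypothetical failure pattern DB: keyword -> where the keyword came from.
FAILURE_PATTERNS = {
    "out of memory": "operator",
    "connection refused": "vendor",
    "i/o error": "maintenance",
}


def match_patterns(log_lines, patterns=FAILURE_PATTERNS):
    """Count how many lines contain each registered keyword (case-insensitive)."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for keyword in patterns:
            if keyword in lowered:
                counts[keyword] += 1
    return counts


logs = [
    "kernel: Out of memory: killed process 4312",
    "app: Connection refused by 10.0.0.21:5432",
    "app: connection refused by 10.0.0.21:5432",
    "cron: job finished",
]
hits = match_patterns(logs)
```

Any non-empty result would count as a match against the failure pattern DB, and the counts give the per-pattern frequency.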

The failure pattern derivation analysis unit (350) analyzes keywords in the unstructured data from the real-time data analysis unit (320) to detect error processes. Here, the failure pattern derivation analysis unit (350) detects an error process by comparing the unstructured data against a pre-registered keyword text (TEXT) file and detecting a new keyword text file.
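
The comparison against a pre-registered keyword text file can be sketched as a set difference. This is a simplified illustration, assuming one keyword per line in the registered file; the file layout and the helper names `load_registered` and `find_new_keywords` are hypothetical, and an in-memory `io.StringIO` stands in for the file.

```python
import io


def load_registered(fileobj):
    """Read one keyword per line, skipping blank lines; lowercase for comparison."""
    return {line.strip().lower() for line in fileobj if line.strip()}


def find_new_keywords(observed, registered):
    """Return observed keywords missing from the registered set, first-seen order."""
    seen = set()
    new = []
    for keyword in observed:
        key = keyword.lower()
        if key not in registered and key not in seen:
            seen.add(key)
            new.append(key)
    return new


# An in-memory stand-in for the pre-registered keyword text file.
registered_file = io.StringIO("nginx\npostgres\n\njava\n")
registered = load_registered(registered_file)
new_keywords = find_new_keywords(["nginx", "Miner-X", "java", "miner-x"], registered)
```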

More specifically, it detects problem processes whose system utilization is excessively high, that is, the top processes by utilization as analyzed by the cloud agents currently in operation; system processes such as the OS, WAS, and DBMS are excluded.
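
Selecting the top processes by utilization while excluding system processes might look like the sketch below. The excluded process names, the 80% cutoff, and the limit of three are assumptions; the specification states only that high-utilization processes are detected and that OS/WAS/DBMS processes are excluded.

```python
# Hypothetical names for excluded system processes (OS, WAS, DBMS).
SYSTEM_PROCESSES = {"systemd", "kernel", "tomcat", "oracle"}


def top_problem_processes(samples, limit=3, min_usage=80.0, excluded=SYSTEM_PROCESSES):
    """Return up to `limit` non-system processes at or above min_usage, highest first."""
    candidates = [
        (name, usage)
        for name, usage in samples.items()
        if name not in excluded and usage >= min_usage
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:limit]


samples = {"oracle": 99.0, "batch-export": 93.5, "report-gen": 88.0, "sshd": 1.2}
problems = top_problem_processes(samples)
```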

In addition, when text of the same type as an unregistered pattern detected on a system where a failure/problem occurred appears, or when the same infrastructure problem (for example, a network, disk, or DB connection error) occurs, it correlates the registered cloud infrastructure configuration information with the problem text of the unstructured data pattern to analyze the relationship between them.

The failure event generation/notification unit (340) sends events for failures determined by the failure pattern DB-based analysis unit (330) and for error processes detected by the failure pattern derivation analysis unit (350). That is, the problem type, analysis result, server information (host name, IP address), and so on derived from the unstructured data analysis are converted into a JSON-based standard event and sent to the integrated dashboard, so that an administrator can confirm the analyzed problem pattern.
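
A JSON-based event carrying the problem type, analysis result, and server information could be built as in the sketch below. The field names and the `build_event` function are hypothetical; the specification does not define the event schema.

```python
import json


def build_event(problem_type, analysis_result, host, ip):
    """Serialize one failure event as a JSON string with a fixed set of fields."""
    event = {
        "type": problem_type,
        "result": analysis_result,
        "server": {"host": host, "ip": ip},
    }
    # sort_keys gives a stable field order, which simplifies comparison and logging.
    return json.dumps(event, sort_keys=True)


payload = build_event("error_process", "new keyword: miner-x", "web-01", "10.0.0.11")
decoded = json.loads(payload)
```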

The cloud analysis agent synchronization unit (360) registers the error process information detected by the failure pattern derivation analysis unit (350). Here, to automate additional monitoring of new pattern processes (error processes) not registered in the failure pattern DB-based analysis unit (330), it sends the pattern information to, and synchronizes it with, the AaaS agents that are currently registered/connected. That is, agents that newly connect or reconnect are sent the newly registered pattern information in addition to the failure pattern DB information and are synchronized.
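
The synchronization behavior (push new patterns to connected agents, and send the full set to agents that connect or reconnect later) can be sketched as follows. The `Agent` and `PatternSynchronizer` classes are hypothetical in-process stand-ins; a real system would send this information over the network.

```python
class Agent:
    """Stand-in for an AaaS agent that holds the patterns it monitors."""

    def __init__(self, name):
        self.name = name
        self.patterns = set()


class PatternSynchronizer:
    def __init__(self, base_patterns):
        self.base_patterns = set(base_patterns)  # failure pattern DB contents
        self.new_patterns = set()                # newly registered patterns
        self.agents = []

    def connect(self, agent):
        """A newly connected or reconnected agent receives base and new patterns."""
        agent.patterns = self.base_patterns | self.new_patterns
        if agent not in self.agents:
            self.agents.append(agent)

    def register(self, pattern):
        """Register a new pattern and push it to every connected agent."""
        self.new_patterns.add(pattern)
        for agent in self.agents:
            agent.patterns.add(pattern)


sync = PatternSynchronizer({"out of memory"})
first = Agent("web-01")
sync.connect(first)
sync.register("miner-x")
late = Agent("db-01")
sync.connect(late)
```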

The analysis data DB (310) stores the error processes that the data analysis server (300) detects from unstructured data. Here, the analysis data DB (310) corresponds to the big data DB, and the big data encompasses structured, unstructured, and status data alike. That is, big data covering structured, unstructured, and status data can be used to predict failure pattern information and respond accordingly.

The response management server (400) manages the results of handling the error-process failures analyzed by the data analysis server (300).

The failure pattern data DB (410) stores the response management server's (400) processing results for detected error-process failures.

FIG. 4 is a flowchart of a method for providing a big-data-based cloud infrastructure analysis service according to an embodiment of the present invention.

As shown in FIG. 4, in the method for providing a big-data-based real-time cloud infrastructure analysis service according to the present invention, the step of receiving and collecting cloud infrastructure operating data through the AaaS (Analytics as a Service) operating in the cloud infrastructure formed with the plurality of customer servers (100) is performed first (S401).

Then, the cloud operating data collected by the data collection server (200) is analyzed in real time and stored in the collected data DB (210) (S402). Here, the received operating data is classified into unstructured data, structured data, and status data and analyzed; that is, the structured, unstructured, and status data types of the collected cloud operating data are analyzed in real time.

Next, the unstructured data analyzed in the analyzing step is compared against the previously stored failure patterns (S403), and the step of determining whether the unstructured data indicates a failure is performed (S404). More specifically, matches are searched for between the received unstructured data and the failure pattern DB. The kinds of entries in the failure pattern DB are: search keywords entered by administrators/operators; important keywords provided by vendors; important keywords drawn from maintenance analysis results or from cases that occurred at other sites; and other generally known, commonly managed important keywords. The frequency of each pattern item within the unstructured data is tallied.

Next, if no failure pattern is detected in the unstructured data, the unstructured data is compared with the big data DB to determine whether it is an error process (S406). That is, the data real-time analysis unit compares the keywords of the unstructured data against the big data DB to determine whether they correspond to an error process.

Then, if the data is not an error process, it is detected as a new process, and an event for the detected new process is transmitted (S408). Here, the detection is performed by comparing against the pre-registered keyword text (TEXT) file of the unstructured data to detect the keyword text file of the new process.

The detected new process is then registered in a monitoring list (S409). This is because the detected new process may turn out to be failure data, so it is monitored continuously.
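One possible reading of S408 and S409 is sketched below: keywords found in the unstructured text are compared with the registered keyword set, and any unknown keywords are placed on a monitoring list. The whitespace tokenization, the in-memory dictionary used as the monitoring list, and all function names are assumptions for this example only.

```python
from datetime import datetime, timezone


def extract_keywords(text: str) -> set:
    """Split text into lowercase tokens, dropping surrounding punctuation."""
    tokens = {token.strip(".,;:()[]").lower() for token in text.split()}
    tokens.discard("")
    return tokens


def detect_new_keywords(text: str, registered: set) -> set:
    """Return keywords that are absent from the registered keyword set (S408)."""
    return extract_keywords(text) - registered


def register_for_monitoring(monitoring: dict, keywords: set) -> None:
    """Add keywords to the monitoring list with a first-seen timestamp (S409).

    A keyword that is already monitored keeps its original timestamp.
    """
    now = datetime.now(timezone.utc)
    for keyword in keywords:
        monitoring.setdefault(keyword, now)
```

Keeping the first-seen timestamp is a design choice of this sketch; it would let an operator see how long an unclassified keyword has been under observation.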

Then, a step of synchronizing the new process with the AaaS (Analytics as a Service) operating the cloud infrastructure is performed (S410).

More specifically, to automate additional monitoring of a new pattern process (error process) that is not registered in the failure pattern DB-based analysis unit, the pattern information is transmitted to and synchronized with the AaaS agents that are currently registered/connected. That is, agents that newly connect or reconnect receive the newly registered pattern information in addition to the failure pattern DB information, and are thereby synchronized.
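The synchronization behavior described for S410 could be modeled roughly as follows. The `Agent` and `AgentSynchronizer` classes are hypothetical stand-ins for the AaaS agents and the cloud analysis agent synchronization unit 360; the actual transport between server and agents is not described and is replaced here by direct attribute updates.

```python
class Agent:
    """Hypothetical stand-in for an AaaS agent on a customer server."""

    def __init__(self, name: str):
        self.name = name
        self.patterns = set()
        self.connected = False


class AgentSynchronizer:
    """Pushes pattern information to agents (assumed in-memory model)."""

    def __init__(self, base_patterns):
        self.base_patterns = set(base_patterns)  # failure pattern DB contents
        self.new_patterns = set()                # newly registered patterns
        self.agents = []

    def connect(self, agent: Agent) -> None:
        """A new or reconnecting agent receives base plus new patterns."""
        agent.connected = True
        agent.patterns = self.base_patterns | self.new_patterns
        if agent not in self.agents:
            self.agents.append(agent)

    def register_new_pattern(self, pattern: str) -> None:
        """Record a new pattern and send it to every connected agent."""
        self.new_patterns.add(pattern)
        for agent in self.agents:
            if agent.connected:
                agent.patterns.add(pattern)
```

An agent that was disconnected when a pattern was registered catches up on its next `connect` call, which mirrors the reconnection case described above.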

After the synchronizing step, the detected new process is stored as unstructured data in the big data DB (S411). Here, when a network, disk, or DB connection error or the like occurs, the text of the detected new process is stored in the big data DB with the registered cloud infrastructure configuration information linked to the problem text of the unstructured data pattern. Through a learning process that repeats this failure data detection on unstructured data, newly occurring failure data can be handled promptly.

Meanwhile, if the unstructured data is determined to be failure data in the step of comparing the unstructured data with the previously stored failure patterns to determine whether it indicates a failure (S404), a step of transmitting a failure occurrence event is performed (S405).

In addition, if no failure pattern is detected in the step of detecting a failure pattern in the unstructured data (S404), and the unstructured data is determined to be an error process by comparison with the big data DB (S406), a step of transmitting an error process failure detection event is performed (S407).
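Taken together, the branches for unstructured data (S404 through S408) form a three-way decision. The sketch below shows one way to express it, assuming both the failure pattern DB and the big data DB can be reduced to sets of lowercase keywords; the event names and the function name are invented for this example.

```python
def analyze_unstructured(text: str, failure_keywords: set, error_keywords: set) -> str:
    """Return the event to transmit for one unstructured record.

    Both keyword sets are assumed to contain lowercase strings.
    """
    lowered = text.lower()
    # S404: does the text match a stored failure pattern?
    if any(keyword in lowered for keyword in failure_keywords):
        return "FAILURE_OCCURRED"            # S405
    # S406: does it match an error process known to the big data DB?
    if any(keyword in lowered for keyword in error_keywords):
        return "ERROR_PROCESS_DETECTED"      # S407
    # Neither matched: treat it as a new process.
    return "NEW_PROCESS_DETECTED"            # S408
```

The order of the checks matters: a record that matches both sets is reported as a failure, because the failure pattern comparison is performed first.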

Meanwhile, after the analyzing step (S402), if the analyzed data is not unstructured data, it is determined whether it is structured data (S420). The structured data is compared with the hardware operating thresholds in the big data DB (S421); if it exceeds an operating threshold, it is determined to be failure data and a failure occurrence event is sent (S422). The failed structured data is then stored in the big data DB (S423).
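The threshold comparison in S421 and S422 can be illustrated with a short sketch. The threshold values, metric names, and record layout below are placeholders chosen for the example and do not come from the disclosure.

```python
# Hypothetical operating thresholds, expressed as utilization percentages.
THRESHOLDS = {"cpu": 90.0, "memory": 85.0, "disk": 95.0}


def check_structured(record: dict, thresholds: dict = THRESHOLDS):
    """Return a failure event if the reading exceeds its threshold, else None.

    Metrics without a configured threshold are never flagged.
    """
    limit = thresholds.get(record["metric"])
    if limit is not None and record["value"] > limit:
        return {"event": "FAILURE_OCCURRED", "record": record, "threshold": limit}
    return None
```

A caller would transmit the returned event (S422) and then persist the offending record (S423); a reading exactly equal to the threshold is not treated as exceeding it.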

In addition, if the analyzed data is found not to be structured data in the step of determining whether it is structured data (S420), it is determined whether it is status data (S430). That is, the on/off operating state of the server, process, and DBMS, which constitutes the status data, is checked to determine whether a failure has occurred. If the status data indicates an off state, it is determined to be a failure (S431), a failure occurrence event is transmitted (S432), and the failed status data is stored in the big data DB (S433).
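The status check in S431 and S432 amounts to flagging every monitored target that reports an off state. A minimal sketch follows, assuming each status record carries hypothetical `target` and `state` fields.

```python
def find_down_targets(status_records: list) -> list:
    """Return one failure event per target whose state is off (S431, S432)."""
    events = []
    for record in status_records:
        if str(record["state"]).lower() == "off":
            events.append({"event": "FAILURE_OCCURRED", "target": record["target"]})
    return events
```

In this sketch only the literal state "off" (in any letter case) is treated as a failure; how a real agent encodes the state is not specified in the text.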

Finally, the data processing results obtained through the failure determination for the unstructured data, structured data, and status data are stored in the failure pattern DB (S412).

Therefore, according to the present invention, in a cloud infrastructure operating environment, big data is analyzed in real time and new processes in unstructured data are continuously monitored and learned through the big data DB, so that the cause of a failure can be found and addressed promptly.

The scope of the present invention is not limited to the embodiments described above and may be implemented in various forms within the scope of the appended claims. Variations that any person of ordinary skill in the art to which the invention pertains could make without departing from the gist of the invention as claimed are considered to fall within the scope of the claims.

100 --- Customer server 200 --- Data collection server
210 --- Collection data DB 300 --- Data analysis server
310 --- Analysis data DB 320 --- Data real-time analysis unit
330 --- Failure pattern DB-based analysis unit 340 --- Failure event generation/notification unit
350 --- Failure pattern derivation analysis unit 360 --- Cloud analysis agent synchronization unit
400 --- Response management server 410 --- Failure pattern data DB

Claims

A big-data-based cloud infrastructure real-time analysis system, comprising:
A data collection server for collecting cloud operation data of each customer server through an Analytics as a Service (AaaS) operating in a cloud infrastructure formed with a plurality of customer servers;
A collection data DB for classifying and storing the cloud operation data collected by the data collection server;
A data analysis server for analyzing unstructured data of the cloud operation data collected by the data collection server to detect an error process, and continuously monitoring whether the detected error process is a failure; and
An analysis data DB for storing the error process detected from the unstructured data by the data analysis server,
Wherein the data analysis server comprises:
A data real-time analysis unit for analyzing, in real time, the structured data, unstructured data, and status data types of the collected cloud operation data;
A failure pattern DB-based analysis unit for comparing the data type analyzed by the data real-time analysis unit with data stored in a failure pattern DB to determine whether a failure has occurred;
A failure pattern derivation analysis unit for analyzing keywords of the unstructured data from the data real-time analysis unit to detect an error process;
A failure event generation/notification unit for transmitting events for a failure determined by the failure pattern DB-based analysis unit and for an error process detected by the failure pattern derivation analysis unit; and
A cloud analysis agent synchronization unit for registering the error process information detected by the failure pattern derivation analysis unit.

The system according to claim 1, further comprising:
A response management server for managing processing results according to whether the error process analyzed by the data analysis server is a failure; and
A failure pattern data DB for storing the processing results of failure detection for the error process from the response management server.

The system according to claim 1,
Wherein the cloud operation data
Comprises at least one of structured data, unstructured data, and status data.

The system of claim 3,
Wherein the structured data is operating information of a hardware configuration including a CPU, a memory, a disk, and a network.

The system of claim 3,
Wherein the unstructured data includes cloud OS information, hypervisor information, virtual OS information, IT operational performance information, and performance measurement response time.

The system of claim 3,
Wherein the status data includes, as IT operational status information, information on the on/off state of the server, the process, and the DBMS.

delete

The system according to claim 1,
Wherein, in the failure pattern derivation analysis unit,
The error process is detected by comparing a new keyword text file with a pre-registered keyword text (TEXT) file of the unstructured data.

A method for providing a big-data-based real-time cloud infrastructure analysis service, comprising:
Receiving operation data of a cloud infrastructure formed with a plurality of customer servers;
Classifying the received operation data into unstructured data, structured data, and status data and analyzing the classified data;
Comparing the unstructured data analyzed in the analyzing step with previously stored failure patterns to determine whether the unstructured data indicates a failure;
Determining, if no failure pattern is detected in the unstructured data, whether the unstructured data is an error process by comparison with a big data DB; and
Synchronizing the new process with an AaaS (Analytics as a Service) operating the cloud infrastructure,
Wherein whether the data is an error process is determined by comparing against a pre-registered keyword text file of the unstructured data to detect the keyword text file of the new process.

The method of claim 9,
Further comprising, prior to receiving the cloud infrastructure operation data:
Collecting cloud operation data of each customer server through an Analytics as a Service (AaaS) operating in a cloud infrastructure formed with a plurality of customer servers; and
Classifying and storing the cloud operation data collected by the data collection server.

The method of claim 9,
Wherein the big data DB
Includes log data, performance data, check data, and event data.

The method of claim 9,
Wherein the step of detecting the new process
Comprises transmitting an event for the detected new process.

The method of claim 10,
Wherein the cloud operation data
Includes structured data, unstructured data, and status data.

The method of claim 13,
Wherein the structured data is operating information of the hardware configuration of a customer server, including a CPU, a memory, a disk, and a network.

The method of claim 13,
Wherein the unstructured data includes cloud OS information, hypervisor information, virtual OS information, IT operational performance information, and performance measurement response time.

The method of claim 13,
Wherein the status data includes, as IT operational status information, information on the on/off state of the server, the process, and the DBMS.

The method of claim 9,
Further comprising comparing the analyzed structured data with a hardware operating threshold in the big data DB to determine whether a failure has occurred.

The method of claim 17,
Wherein, when the analyzed structured data exceeds the operating threshold in the failure determination, a failure occurrence event is transmitted and the failed structured data is stored in the big data DB.

The method of claim 9,
Further comprising determining whether a failure has occurred by checking the on/off operating state of the server, the process, and the DBMS in the analyzed status data.

The method of claim 19,
Wherein, when the status data is determined to be in the off state, a failure occurrence event is transmitted and the failed status data is stored in the big data DB.

The method of claim 9,
Further comprising transmitting an error process failure detection event when no failure pattern is detected in the unstructured data and the unstructured data is determined to be an error process by comparison with the big data DB.

The method of claim 9,
Further comprising storing the detected new process as unstructured data in the big data DB after the synchronizing step.

The method of claim 9,
Further comprising transmitting a failure occurrence event when the unstructured data is determined to be failure data in the step of comparing the unstructured data with the previously stored failure patterns to determine whether the unstructured data indicates a failure.