KR20170050506A

KR20170050506A - Method for analyzing categorical data

Info

Publication number: KR20170050506A
Application number: KR1020150152117A
Authority: KR
Inventors: 강지훈; 권순목; 유동호; 박성미; 박용로
Original assignee: Samsung SDS Co., Ltd. (삼성에스디에스 주식회사)
Priority date: 2015-10-30
Filing date: 2015-10-30
Publication date: 2017-05-11
Also published as: KR102280884B1

Abstract

Disclosed is a categorical data analysis method that converts discrete categorical data, from which correlations are difficult to extract, into continuous data and then analyzes the continuous data. According to one embodiment of the present invention, the method comprises the following steps:
- generating a first cumulative function from first categorical data and a second cumulative function from second categorical data;
- removing the rising pattern from the first cumulative function to generate a first residual function, and removing the rising pattern from the second cumulative function to generate a second residual function;
- generating a correlation function expressing the correlation between the first and second residual functions;
- generating a final residual function from the difference between the first residual function and the correlation function; and
- determining that an abnormality has occurred in the monitored object when the final residual function falls outside a predetermined range.

Description

{METHOD FOR ANALYZING CATEGORICAL DATA}

The present invention relates to a categorical data analysis method. More particularly, it relates to a method that converts discrete categorical data having intermittent values into continuous data and then analyzes that data.

In the modern big data era, there are numerous types of categorical data as well as continuous data. When the values a variable can take are given as categories, the variable is called a categorical variable. Data consisting of such categorical variables is called categorical data.

The detailed types of categorical data are as follows.

Categorical data
- Binary scale (e.g., for, against)
- Nominal scale (e.g., various religions)
- Ordinal scale (e.g., young adults, middle-aged, older adults)

Such categorical data are actively used to create new value in various industries.

However, conventional statistical and machine-learning-based analysis logic focuses mainly on processing continuous data. This limits its ability to analyze categorical data with discrete values.

For example, practical algorithms for prediction, classification, and monitoring are mainly applied to data expressed as continuous functions. Analysis logic for continuous data has also been developed and refined in many forms over a long period.

By contrast, research on analyzing categorical data with discrete values has been quite limited. This is especially true of research that analyzes the properties of data as a specific event changes over time.

FIG. 1 is a diagram for explaining a process of analyzing discrete categorical data.

FIG. 1 shows how the frequency of log events A and B, which can occur in a specific device, changes over time. Referring to FIGS. 1 and 2, event A mostly has a value of 0. It occurs in an intermittent pattern from April 30, 2014 to mid-July 2014 and from January 2015 to April 2015.

Similarly, event B occurs more frequently than event A, but its frequency is zero from the end of July 2014 until December 2014.

For non-continuous data such as that shown in FIG. 1, existing statistical techniques cannot adequately analyze the correlation between events A and B. There is therefore a need for a new type of monitoring method that can effectively analyze discrete categorical data.

The present invention has been devised to solve the above problems. Its object is to provide a categorical data analysis method that converts discrete categorical data, from which interrelationships are difficult to derive, into continuous data and then analyzes it.

The technical objects of the present invention are not limited to those mentioned above. Other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above object, a categorical data analysis method according to an embodiment of the present invention comprises:
- generating a first cumulative function from first categorical data and a second cumulative function from second categorical data;
- generating a first residual function by removing the rising pattern from the first cumulative function, and a second residual function by removing the rising pattern from the second cumulative function;
- generating a correlation function representing the interrelationship between the first and second residual functions;
- generating a final residual function from the difference between the first residual function and the correlation function; and
- determining that an abnormality has occurred in the monitored object when the final residual function falls outside a preset range.

According to an embodiment of the present invention, generating the first and second residual functions may include two steps. First, a first trend line representing the rising pattern of the first cumulative function and a second trend line representing the rising pattern of the second cumulative function are generated. Second, the first residual function is generated from the residual (the difference between the first trend line and the first cumulative function), and the second residual function is generated from the residual (the difference between the second trend line and the second cumulative function).

According to an embodiment of the present invention, generating the first and second trend lines may include calculating the slope of each trend line by time-series linear regression.

According to an embodiment of the present invention, calculating the slopes of the first and second trend lines may include:
- determining, as the slope of the first trend line, the slope that minimizes the sum of squared residuals between the first trend line and the first cumulative function (least squares); and
- determining, as the slope of the second trend line, the slope that minimizes the sum of squared residuals between the second trend line and the second cumulative function.
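The least-squares criterion above has a closed form under two assumptions that the text does not fix. The first is that the trend line passes through the origin, so the cumulative function is modeled as Ψ(t) ≈ αt. The second is that time indices run t = 1, ..., T. The following derivation is a sketch under those assumptions:

```latex
\hat{\alpha} = \arg\min_{\alpha} \sum_{t=1}^{T} \bigl(\Psi(t) - \alpha t\bigr)^2,
\qquad
\frac{d}{d\alpha} \sum_{t=1}^{T} \bigl(\Psi(t) - \alpha t\bigr)^2
  = -2 \sum_{t=1}^{T} t \bigl(\Psi(t) - \alpha t\bigr) = 0
\;\Longrightarrow\;
\hat{\alpha} = \frac{\sum_{t=1}^{T} t\,\Psi(t)}{\sum_{t=1}^{T} t^2},
\qquad
e(t) = \Psi(t) - \hat{\alpha}\, t
```

If an intercept term is included in the trend line, the standard ordinary-least-squares formulas for slope and intercept apply instead.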

According to an embodiment of the present invention, the correlation function may be generated by using the residual between the second trend line and the second cumulative function as an independent variable in a regression model.

According to an embodiment of the present invention, the correlation function may be generated using at least one of the following models: multiple linear regression, a neural network, a decision tree (regression tree), or regularized regression.
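The following is a minimal sketch of the correlation function and the final residual. It assumes the simplest of the listed models: an ordinary-least-squares linear regression with the second residual function as the single independent variable. The function names `correlation_function` and `final_residual_function` are hypothetical, not taken from the source.

```python
from statistics import fmean


def correlation_function(e1, e2):
    """Fit e1(t) ~ b0 + b1 * e2(t) by ordinary least squares and return the
    fitted values, i.e. the part of the first residual explained by the second.

    With more monitored events, e2 would become a matrix of predictors
    (multiple linear regression); one predictor is used here for brevity.
    """
    m1, m2 = fmean(e1), fmean(e2)
    sxy = sum((x - m2) * (y - m1) for x, y in zip(e2, e1))
    sxx = sum((x - m2) ** 2 for x in e2)
    b1 = sxy / sxx if sxx else 0.0  # a constant predictor explains nothing
    b0 = m1 - b1 * m2
    return [b0 + b1 * x for x in e2]


def final_residual_function(e1, e2):
    """Final residual: first residual minus the correlation function."""
    fitted = correlation_function(e1, e2)
    return [y - f for y, f in zip(e1, fitted)]
```

Any of the other listed model families could replace the linear fit, provided it returns fitted values of e1 given e2.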

According to an embodiment of the present invention, determining that an abnormality has occurred when the final residual function falls outside the preset range may include:
- calculating a reference value as the mean or median of the first categorical data;
- calculating an upper reference value by adding an acceptable variation to the reference value, and a lower reference value by subtracting that variation from it; and
- determining that an abnormality has occurred in the monitored object when the final residual function exceeds the upper reference value or falls below the lower reference value.
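The threshold check can be sketched as follows. The reference value is taken literally as the mean or median of the first categorical data, as the text states. The acceptable variation `tolerance` is a parameter whose value the source leaves open. `detect_anomalies` is a hypothetical name, and the returned positions are 0-based indices into the final residual series.

```python
from statistics import mean, median


def detect_anomalies(first_data, final_residual, tolerance, use_median=False):
    """Return indices where the final residual leaves [lower, upper].

    reference = mean (or median) of the first categorical data
    upper     = reference + tolerance
    lower     = reference - tolerance
    """
    reference = median(first_data) if use_median else mean(first_data)
    upper = reference + tolerance
    lower = reference - tolerance
    return [t for t, r in enumerate(final_residual) if r > upper or r < lower]
```

Using the median instead of the mean makes the reference less sensitive to a few bursts of events in otherwise sparse data.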

According to an embodiment of the present invention, the first and second categorical data may be non-continuous, discrete data.

A categorical data analysis apparatus according to an embodiment of the present invention includes:
- a cumulative function generator that generates a first cumulative function from first categorical data and a second cumulative function from second categorical data;
- a residual function generator that generates a first residual function by removing the rising pattern from the first cumulative function, and a second residual function by removing the rising pattern from the second cumulative function;
- a correlation function generator that generates a correlation function representing the interrelationship between the first and second residual functions;
- a final residual function generator that generates a final residual function from the difference between the first residual function and the correlation function; and
- an anomaly detector that determines that an abnormality has occurred in the monitored object when the final residual function falls outside a preset range.

According to an embodiment of the present invention, the residual function generator may include a trend line generator. The trend line generator generates a first trend line representing the rising pattern of the first cumulative function and a second trend line representing the rising pattern of the second cumulative function. The residual function generator may then generate the first residual function from the residual between the first trend line and the first cumulative function, and the second residual function from the residual between the second trend line and the second cumulative function.

According to an embodiment of the present invention, the correlation function generator may generate the correlation function by using the residual between the second trend line and the second cumulative function as an independent variable in a regression model.

According to an embodiment of the present invention, the anomaly detector may include:
- a reference value calculator that calculates a reference value as the mean or median of the first categorical data;
- an upper reference value calculator that calculates an upper reference value by adding an acceptable variation to the reference value; and
- a lower reference value calculator that calculates a lower reference value by subtracting the variation.

The anomaly detector may determine that an abnormality has occurred in the monitored object when the final residual function exceeds the upper reference value or falls below the lower reference value.

A categorical data analysis apparatus according to another embodiment of the present invention includes one or more processors, a memory that loads a computer program executed by the processors, and storage that stores a computer program capable of analyzing categorical data. The computer program includes the following operations:
- generating a first cumulative function from first categorical data and a second cumulative function from second categorical data;
- generating a first residual function by removing the rising pattern from the first cumulative function, and a second residual function by removing the rising pattern from the second cumulative function;
- generating a correlation function representing the interrelationship between the first and second residual functions;
- generating a final residual function from the difference between the first residual function and the correlation function; and
- determining that an abnormality has occurred in the monitored object when the final residual function falls outside a preset range.

A computer program according to yet another embodiment of the present invention is stored in a computer-readable recording medium. In combination with a computing device, it executes the following steps:
- generating a first cumulative function from first categorical data and a second cumulative function from second categorical data;
- generating a first residual function by removing the rising pattern from the first cumulative function, and a second residual function by removing the rising pattern from the second cumulative function;
- generating a correlation function representing the interrelationship between the first and second residual functions;
- generating a final residual function from the difference between the first residual function and the correlation function; and
- determining that an abnormality has occurred in the monitored object when the final residual function falls outside a preset range.

The categorical data analysis method according to an embodiment of the present invention makes it possible to convert discrete categorical data, from which interrelationships are difficult to derive, into continuous data and to analyze it.

FIG. 1 is a diagram for explaining a process of analyzing discrete categorical data.
FIG. 2 is a flowchart illustrating a categorical data analysis method according to an embodiment of the present invention.
FIGS. 3 and 4 are diagrams for explaining a process of converting discrete categorical data into continuous data according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the result of removing a rising pattern from a cumulative function according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining residuals excluding the correlation with other events according to an embodiment of the present invention.
FIG. 7 is a diagram for explaining a process in which discrete categorical data is converted into a final residual excluding the correlation with other events according to an embodiment of the present invention.
FIG. 8 is a functional block diagram for explaining a categorical data analysis apparatus according to an embodiment of the present invention.
FIG. 9 is a functional block diagram for explaining a categorical data analysis apparatus according to another embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the drawings. The present invention, however, is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that this disclosure will be complete and will fully convey the scope of the invention to those of ordinary skill in the art. The invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless expressly so defined.

In this specification, singular forms also include plural forms unless the context clearly indicates otherwise. As used in the specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

FIG. 2 is a flowchart illustrating a categorical data analysis method according to an embodiment of the present invention.

A categorical data analysis apparatus according to an embodiment of the present invention generates a first cumulative function from discrete first categorical data and a second cumulative function from second categorical data (S210).

Here, the first categorical data may be the occurrence frequency of event A in the monitored object, and the second categorical data may be the occurrence frequency of event B. The first cumulative function may therefore be generated by sequentially summing the occurrence frequency of event A over time. Likewise, the second cumulative function may be generated by sequentially summing the occurrence frequency of event B over time.
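Sequential summation is a running total. The following sketch assumes the input holds one event count per equally spaced time period, which the text does not specify. `cumulative_function` is a hypothetical name.

```python
from itertools import accumulate


def cumulative_function(frequencies):
    """Psi(t): running total of per-period event counts.

    frequencies[t] is the number of times the event occurred in period t.
    """
    return list(accumulate(frequencies))


# An intermittent event that fires in only two periods.
event_a = [0, 0, 3, 0, 0, 2, 0, 0]
psi_a = cumulative_function(event_a)  # [0, 0, 3, 3, 3, 5, 5, 5]
```

The sparse, mostly-zero count series becomes a non-decreasing step curve. Flat stretches correspond to quiet periods, and jumps correspond to bursts of events.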

Next, a first residual function is generated by removing the rising pattern from the first cumulative function, and a second residual function is generated by removing the rising pattern from the second cumulative function (S220). The method of generating the first and second residual functions is described in detail with reference to FIGS. 4 and 5.

The first and second residual functions generated through this process take continuous values. A correlation function representing the interrelationship between them can therefore be generated (S230).

Because of their discrete nature, the first and second categorical data themselves are of limited use for deriving an interrelationship. Once they are converted into the continuous first and second residual functions, however, the interrelationship between the two can be derived easily.

Then, a final residual function is generated from the difference between the first residual function and the correlation function (S240). The final residual function is monitored, and when it falls outside a preset range, an abnormality is determined to have occurred in the monitored object (S250).
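Steps S210 to S250 can be chained end to end. This is a minimal sketch under stated assumptions, not the patented implementation:
- the trend line has no intercept (Ψ(t) ≈ αt with t = 1, ..., T);
- the correlation function is a single-predictor linear regression; and
- the reference value is taken literally as the mean or median of the first categorical data.

`detrend` and `analyze` are hypothetical names. Both input series must be non-empty and of equal length.

```python
from itertools import accumulate
from statistics import fmean, mean, median


def detrend(psi):
    """S220: e(t) = Psi(t) - alpha * t, alpha fitted by least squares (no intercept)."""
    ts = range(1, len(psi) + 1)
    alpha = sum(t * p for t, p in zip(ts, psi)) / sum(t * t for t in ts)
    return [p - alpha * t for t, p in zip(ts, psi)]


def analyze(freq_a, freq_b, tolerance, use_median=False):
    """Return (final_residual, anomaly_indices) for event frequencies A and B."""
    # S210: cumulative functions from per-period event counts.
    psi_a = list(accumulate(freq_a))
    psi_b = list(accumulate(freq_b))
    # S220: remove the rising pattern -> first and second residual functions.
    e1, e2 = detrend(psi_a), detrend(psi_b)
    # S230: correlation function = OLS fit of e1 on e2.
    m1, m2 = fmean(e1), fmean(e2)
    sxx = sum((x - m2) ** 2 for x in e2)
    b1 = sum((x - m2) * (y - m1) for x, y in zip(e2, e1)) / sxx if sxx else 0.0
    b0 = m1 - b1 * m2
    corr = [b0 + b1 * x for x in e2]
    # S240: final residual = first residual minus correlation function.
    final = [y - f for y, f in zip(e1, corr)]
    # S250: flag points outside reference +/- tolerance.
    ref = median(freq_a) if use_median else mean(freq_a)
    anomalies = [
        t for t, r in enumerate(final) if not (ref - tolerance <= r <= ref + tolerance)
    ]
    return final, anomalies
```

When event A simply tracks event B (for example, A always fires twice as often as B), the regression in S230 absorbs that relationship and the final residual stays near zero. Deviations of A that B does not explain remain in the final residual.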

Each step is described in detail below.

FIGS. 3 and 4 are diagrams for explaining a process of converting discrete categorical data into continuous data according to an embodiment of the present invention.

In this embodiment, the monitored object is mechanical equipment, used here as an example; the objects that can be monitored are not limited thereto.

In FIG. 3, A and B denote types of failure that can occur in the mechanical equipment. As FIG. 3 shows, A and B occur in intermittent patterns. When events such as those in FIG. 3 occur, the Pearson correlation coefficient between events A and B is close to zero. No useful analysis of the correlation between the frequencies of A and B can then be obtained.
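The Pearson coefficient mentioned above can be computed on one's own raw event counts to check this behavior. The following is a plain-Python sketch. `pearson` is a hypothetical helper, and returning 0.0 for a constant series (where the coefficient is undefined) is a convention chosen here, not something the text specifies.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant series; 0.0 is this sketch's convention
    return sxy / sqrt(sxx * syy)
```

Applied to raw per-period counts, this measures only same-period co-occurrence. Intermittent events whose bursts fall in different periods therefore give little signal, which motivates the cumulative transformation that follows.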

The categorical data analysis apparatus according to an embodiment of the present invention therefore generates cumulative functions. These convert the occurrence frequencies of events A and B, which have intermittent values, into continuous data.

FIG. 4 is a diagram for explaining cumulative functions generated by accumulating intermittent data according to an embodiment of the present invention.

The categorical data analysis method according to an embodiment of the present invention generates a first cumulative function from the first categorical data and a second cumulative function from the second categorical data. Here, the first categorical data may be the occurrence frequency of event A over time, and the second categorical data may be the occurrence frequency of event B over time.

Computing the cumulative sum of discontinuous, relatively sparse events such as those in FIG. 3 yields cumulative functions such as those in FIG. 4. In intervals where events occur intensively, the slope of the cumulative function rises steeply. In intervals where events rarely occur, the slope is gentle.

Time-series linear regression can then be used to calculate the slope of the first trend line 420, which represents the rising pattern of the first cumulative function 410. The first cumulative function 410 is a graph of the accumulated occurrence frequency of event A. Likewise, the slope can be calculated for the second trend line 440, which represents the rising pattern of the second cumulative function 430 obtained by accumulating the occurrence frequency of event B.

Because the graphs in FIG. 4 accumulate event frequencies, the first cumulative function 410 and the second cumulative function 430 necessarily rise. The present invention, however, aims to observe how the occurrence frequency of each event changes compared with normal operation of the mechanical equipment. A step that removes this rising pattern is therefore required.

FIG. 5 is a diagram for explaining the result of removing the rising pattern from the cumulative functions according to an embodiment of the present invention.

The categorical data analysis method according to an embodiment of the present invention converts discrete categorical data into continuous data by removing the rising pattern from the cumulative function. Specifically, the occurrence frequency of event A over time is converted into a residual: the difference between the cumulative frequency of event A and its predicted cumulative frequency. This yields data values with the rising pattern removed.

In FIG. 4, the first cumulative function 410 represents the cumulative frequency of event A, and the first trend line 420 represents its predicted cumulative frequency. Computing the difference between them yields the first residual function 510 shown in FIG. 5.

Likewise, the second cumulative function 430 in FIG. 4 represents the cumulative frequency of event B, and the second trend line 440 represents the predicted cumulative frequency of event B. Computing the difference between them yields the second residual function 520 shown in FIG. 5.

This can be expressed as follows:

$$\Psi(t) = \alpha t + e(t)$$

Here, Ψ is the cumulative function obtained by accumulating the occurrence frequency of a specific event, α is the slope of the trend line of the cumulative function, and e(t) is the difference between the actually measured value and the predicted value, i.e., the residual.

That is, the first residual function 510 and the second residual function 520 are the results of converting the discrete first and second categorical data into continuous data.

A time-series regression model can be used to remove the rising pattern described above. Time-series regression expresses the pattern of a specific variable's values over time. It works by calculating the trend line slope α that minimizes the sum of squared residuals (least squares).
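The slope fit and the residual computation can be sketched in a few lines. The sketch assumes a trend line through the origin (Ψ(t) ≈ αt) and time indices t = 1, ..., T; neither detail is fixed by the text. `fit_trend_slope` and `residual_function` are hypothetical names.

```python
def fit_trend_slope(psi):
    """Least-squares slope alpha of the no-intercept trend Psi(t) ~ alpha * t, t = 1..T."""
    if not psi:
        raise ValueError("need at least one cumulative value")
    ts = range(1, len(psi) + 1)
    return sum(t * p for t, p in zip(ts, psi)) / sum(t * t for t in ts)


def residual_function(psi):
    """e(t) = Psi(t) - alpha * t: the cumulative curve with its rising trend removed."""
    alpha = fit_trend_slope(psi)
    return [p - alpha * t for t, p in enumerate(psi, start=1)]


# Usage: detrend the cumulative counts of an intermittent event.
psi_a = [0, 0, 3, 3, 3, 5, 5, 5]
e_a = residual_function(psi_a)
```

A perfectly steady event rate yields a straight cumulative line and all-zero residuals. Bursts push e(t) above the trend, and quiet spells pull it below, so e(t) varies continuously even though the raw counts are mostly zero.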

Through this process, discrete categorical data can be converted into data with continuous values. The first residual function 510 of FIG. 5, obtained by converting the occurrence frequency of event A in FIG. 3, confirms that data with discrete values has been converted into data with continuous values.

Similarly, the occurrence frequency of event B, discrete categorical data in FIG. 3, has been converted into the continuous second residual function 520.

The following describes a method of detecting, in advance, an anomaly that may occur in the machine equipment being monitored, using the correlation between events that have been converted into continuous data.

FIG. 6 is a diagram for explaining residuals from which the correlation with other events has been excluded, according to an embodiment of the present invention.

FIGS. 3 to 5 described a method of converting discrete categorical data generated by machine equipment into continuous data. Events A and B that occur in the machine equipment may, however, be correlated with each other.

For example, if the occurrence of event B causes the occurrence of event A, it is normal for the occurrence frequency of event A to rise as the occurrence frequency of event B rises.

On the other hand, if the occurrence frequency of event A has risen even though event B has not occurred, it is highly likely that an abnormality has occurred in the machine equipment.

For example, in machine equipment an increase in temperature is accompanied by an increase in pressure, so it is natural that, as the temperature exceeds its preset threshold more often, the pressure also exceeds its preset threshold more often.

However, if events in which the pressure exceeds its preset threshold have become more frequent even though no event in which the temperature exceeds its preset threshold has occurred, it can be judged highly likely that an abnormality has occurred in the machine equipment.

That is, a correlation function is generated that represents the correlation between the first residual function 510, obtained by converting the occurrence frequency of event A into continuous data, and the second residual function 520, obtained by converting the occurrence frequency of event B into continuous data. Removing this correlation function from the first residual function 510 makes it possible to detect occurrences of event A that are unrelated to the occurrence of event B.

This can be expressed as follows:
$$e'(t) = \Phi(t) - f(t)$$

Here, e'(t) is the residual excluding the correlation with other events, Φ is the residual function obtained by converting the data into continuous data with the method described with reference to FIGS. 3 to 5, and f is the correlation function representing the correlation with the other residual functions.

Taking events A and B as an example, Φ(t) may be the first residual function, and f may be the correlation function representing the correlation between event A and event B.

According to an embodiment of the present invention, the function f, which represents the correlation between the categorical data, may be calculated with a regression model built using residual values as independent variables. The regression model may be multiple linear regression, a neural network model, a regression tree, a regularized regression technique, or the like; however, the present invention is not limited to these, and other general-purpose regression models may also be used.

For example, to generate a correlation function representing the correlation between the first categorical data associated with event A and the second categorical data associated with event B, the residual values making up the first residual function 510 can be expressed as a correlation function whose independent variable is the residual values making up the second residual function 520.
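The text allows any general-purpose regression model for f. As a sketch only, the Python below swaps in the simplest case, a one-variable ordinary least-squares line f(x) = b0 + b1·x with the event-B residuals as the independent variable, and then forms the final residual e'(t) = Φ(t) − f(t). Fitting f on an initial window assumed to be normal (`fit_len`) is an assumption of this sketch, not something the text specifies; the helper names and toy data (event B as "temperature over threshold", event A as "pressure over threshold", following the example above) are hypothetical.

```python
from statistics import fmean


def fit_correlation_function(phi_b, phi_a):
    """Ordinary least-squares line f(x) = b0 + b1*x of phi_a on phi_b."""
    mx, my = fmean(phi_b), fmean(phi_a)
    sxx = sum((x - mx) * (x - mx) for x in phi_b)
    sxy = sum((x - mx) * (y - my) for x, y in zip(phi_b, phi_a))
    b1 = sxy / sxx if sxx else 0.0  # a constant phi_b explains nothing
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x


def final_residual(phi_a, phi_b, fit_len=None):
    """e'(t) = phi_a(t) - f(phi_b(t)), the part of A not explained by B.

    Assumption: f is fitted on the first `fit_len` steps (a period taken
    to be normal) and applied to every step; None fits on all steps.
    """
    n = len(phi_a) if fit_len is None else fit_len
    f = fit_correlation_function(phi_b[:n], phi_a[:n])
    return [a - f(b) for a, b in zip(phi_a, phi_b)]


# Event B: temperature over threshold; event A: pressure over threshold.
phi_temp = [-1.0, 0.0, 1.0, 2.0, 0.0]
# Pressure tracks temperature for 4 steps, then jumps with no temperature rise.
phi_pressure = [2 * x + 0.5 for x in phi_temp[:4]] + [3.5]
print(final_residual(phi_pressure, phi_temp, fit_len=4))
# [0.0, 0.0, 0.0, 0.0, 3.0]
```

Only the last step, where pressure events rise without a matching rise in temperature events, leaves a nonzero final residual.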

In this regard, FIG. 6 shows a final residual function 620 from which the correlation with other events has been removed. Specifically, FIG. 6 shows the first residual function 510, in which the occurrence frequency of event A has been converted into continuous data, and the correlation function 610 between event A and event B, calculated using the regression model.

Calculating the final residual function 620 as the difference between the first residual function 510 and the correlation function 610 with event B reveals the independent variation of event A, unrelated to event B.

Thereafter, if monitoring this independent variation of event A shows that it has gone out of a preset range, it can be determined that a fault has occurred in the machine equipment being observed.

Here, the preset range is obtained by adding to, or subtracting from, a representative value of the normal statistic (such as its mean or median) a variation large enough to accommodate a certain amount of spread. Specifically, the reference value, upper threshold, and lower threshold used to determine, by monitoring the final residual function 620 for a specific event, whether a fault has occurred in the machine equipment can be calculated as follows:
$$\text{reference value} = \mu, \qquad \text{upper threshold} = \mu + k\sigma, \qquad \text{lower threshold} = \mu - k\sigma$$
where μ and σ are the mean and standard deviation of the statistic over the normal interval and k is a constant.

That is, an observed statistic is judged normal if it lies within the preset range centered on the mean of the normal interval and extending k times the standard deviation of the normal statistic on either side, where k is a constant; if it falls outside this range, it can be determined that a fault has occurred in the machine equipment.
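The rule above is a control-chart style band. The Python sketch below illustrates it under assumptions the text leaves open: k defaults to 3, the population standard deviation is used, and the "normal statistic" is supplied as a list of final-residual values from a period taken to be fault-free. The function names are hypothetical.

```python
from statistics import fmean, median, pstdev


def control_limits(normal_values, k=3.0, use_median=False):
    """Reference value and upper/lower thresholds from a normal period.

    reference = mean (or median) of the normal values
    upper     = reference + k * standard deviation
    lower     = reference - k * standard deviation
    """
    ref = median(normal_values) if use_median else fmean(normal_values)
    spread = k * pstdev(normal_values)  # population std; stdev() is an alternative
    return ref, ref + spread, ref - spread


def is_abnormal(value, limits):
    """True when a final-residual value leaves the [lower, upper] band."""
    _, upper, lower = limits
    return not lower <= value <= upper


limits = control_limits([-1.0, 1.0, -1.0, 1.0], k=3.0)
print(limits)                     # (0.0, 3.0, -3.0)
print(is_abnormal(2.5, limits))   # False
print(is_abnormal(-3.5, limits))  # True
```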

Referring to FIG. 6, the final residual function 620 is shown together with the upper threshold 630 and the lower threshold 640.

FIG. 7 is a diagram for explaining a process in which discrete categorical data is transformed into a final residual from which the correlation with other events has been excluded, according to an embodiment of the present invention.

In step S710, data received from the monitoring target is collected. The collected data relates to the occurrence frequency of events in the monitoring target and may be discrete categorical data.

In step S720, a cumulative function is generated by accumulating the discrete categorical data. Because the cumulative function is generated by sequentially summing the number of event occurrences, it has a pattern that rises over time.

However, this rising pattern is an inevitable consequence of the nature of a cumulative function and is not of interest in the present invention, so it is removed through time-series regression analysis.

Once the rising pattern has been removed in step S720, step S730 yields the discrete categorical data converted into continuous data. The continuous data obtained in step S730 can carry information on whether events occurred more or less often than on average.

That is, the continuous data in step S730 is the difference between the cumulative frequency of the events that actually occurred and the predicted cumulative frequency, so a residual greater than 0 means that more events occurred than in the average case, and a residual less than 0 means that fewer events occurred than in the average case.
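A small worked example of this sign convention, with hypothetical counts and assuming a least-squares trend line through the origin: suppose an event occurs 1, 1, 3 and 1 times in steps t = 1 to 4. Then

$$\Psi = (1, 2, 5, 6), \qquad \alpha = \frac{\sum_t t\,\Psi(t)}{\sum_t t^2} = \frac{1 + 4 + 15 + 24}{1 + 4 + 9 + 16} = \frac{44}{30} = \frac{22}{15} \approx 1.47$$

$$e(t) = \Psi(t) - \alpha t = \left(-\tfrac{7}{15},\ -\tfrac{14}{15},\ \tfrac{3}{5},\ \tfrac{2}{15}\right)$$

Before the burst the event occurs less often than the average rate α, so the residual is negative; the three events at t = 3 lift the cumulative count above the trend line, so the residual turns positive.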

In step S740, the final residual, excluding the correlation with other events, is calculated. Because the occurrence of a specific event A may be caused by the occurrence of event B, excluding the correlation with other events yields the independent variation in the occurrence frequency of event A.

Thereafter, if the independent occurrence frequency of event A goes beyond a preset range, it can be determined that there is an abnormality in the monitoring target. The categorical data analysis method described above makes it possible to analyze even discrete categorical data effectively.

FIG. 8 is a functional block diagram for explaining a categorical data analysis apparatus according to an embodiment of the present invention.

The categorical data analysis apparatus 800 shown in FIG. 8 includes a cumulative function generator 810, a residual function generator 820, a correlation function generator 830, a final residual function generator 840, and an abnormality detector 850.

FIG. 8 shows only the components related to the embodiment of the present invention. Those of ordinary skill in the art will therefore recognize that other general-purpose components may be included in addition to those shown in FIG. 8.

The cumulative function generator 810 generates a first cumulative function from the first categorical data and a second cumulative function from the second categorical data. The specific method of generating a cumulative function from discrete categorical data is as described with reference to FIG. 4.

The residual function generator 820 generates a first residual function by removing the rising pattern from the first cumulative function, and a second residual function by removing the rising pattern from the second cumulative function. To this end, the residual function generator 820 may include a trend line generator 821 that generates a first trend line representing the rising pattern of the first cumulative function and a second trend line representing the rising pattern of the second cumulative function.

The specific method of removing the rising pattern from a cumulative function to generate a residual function of continuous data is as described with reference to FIG. 5, so a duplicate description is omitted.

The correlation function generator 830 generates a correlation function representing the correlation between the first residual function and the second residual function. The specific method of generating the correlation function is as described with reference to FIG. 6, so a duplicate description is omitted.

The final residual function generator 840 generates a final residual function as the difference between the first residual function and the correlation function. The specific method of generating the final residual function is likewise described with reference to FIG. 6, so a duplicate description is omitted.

The abnormality detector 850 determines that an abnormality has occurred in the monitoring target if the final residual function is out of a preset range. To this end, the abnormality detector 850 according to an embodiment of the present invention may include a reference value calculator 851 that calculates a reference value from the mean or median of the first categorical data, an upper limit calculator 852 that calculates an upper limit reference value by adding an acceptable variation to the reference value, and a lower limit calculator 853 that calculates a lower limit reference value by subtracting the variation from the reference value.
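To show how the units of FIG. 8 could fit together, here is a Python sketch in which each method stands in for one unit. It is not the patented apparatus: the class and method names are hypothetical, the trend line is assumed to pass through the origin, the correlation function is a one-variable least-squares line standing in for any of the listed regression models, and the reference value and limits are computed from the final residual over an assumed fault-free window rather than directly from the raw categorical data.

```python
from dataclasses import dataclass
from itertools import accumulate
from statistics import fmean, median, pstdev


@dataclass
class CategoricalDataAnalyzer:
    """Hypothetical software counterpart of apparatus 800 (units 810-853)."""

    k: float = 3.0            # assumed band width in standard deviations
    use_median: bool = False  # 851: reference from median instead of mean

    def cumulative(self, counts):
        """810: cumulative function generator."""
        return list(accumulate(counts))

    def residual(self, psi):
        """820/821: least-squares trend alpha*t through the origin, then psi - alpha*t."""
        ts = range(1, len(psi) + 1)
        alpha = sum(t * p for t, p in zip(ts, psi)) / sum(t * t for t in ts)
        return [p - alpha * t for t, p in zip(ts, psi)]

    def correlation(self, phi_a, phi_b):
        """830: coefficients (b0, b1) of an OLS line of phi_a on phi_b."""
        mx, my = fmean(phi_b), fmean(phi_a)
        sxx = sum((x - mx) * (x - mx) for x in phi_b)
        sxy = sum((x - mx) * (y - my) for x, y in zip(phi_b, phi_a))
        b1 = sxy / sxx if sxx else 0.0
        return my - b1 * mx, b1

    def final_residual(self, phi_a, phi_b, coef):
        """840: first residual minus the correlation function."""
        b0, b1 = coef
        return [a - (b0 + b1 * b) for a, b in zip(phi_a, phi_b)]

    def limits(self, normal):
        """851-853: reference value, upper limit, lower limit."""
        ref = median(normal) if self.use_median else fmean(normal)
        spread = self.k * pstdev(normal)
        return ref, ref + spread, ref - spread

    def analyze(self, counts_a, counts_b, normal_len):
        """850: 1-based time steps whose final residual leaves the band.

        Assumption: the first `normal_len` steps are fault-free and are used
        both to fit the correlation function and to set the limits.
        """
        phi_a = self.residual(self.cumulative(counts_a))
        phi_b = self.residual(self.cumulative(counts_b))
        n = normal_len
        coef = self.correlation(phi_a[:n], phi_b[:n])
        final = self.final_residual(phi_a, phi_b, coef)
        _, upper, lower = self.limits(final[:n])
        return [t for t, e in enumerate(final, start=1) if not lower <= e <= upper]
```

With identical event streams for A and B, event A is fully explained by B, the final residual is zero at every step, and `CategoricalDataAnalyzer().analyze(c, c, 4)` reports no abnormal steps.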

FIG. 9 is a functional block diagram for explaining a categorical data analysis apparatus according to another embodiment of the present invention.

The categorical data analysis apparatus 900 shown in FIG. 9 includes a processor 910, a storage 920, a memory 930, a network interface 940, and a bus 950.

The processor 910 executes a categorical data analysis program. However, the programs that the processor 910 can execute are not limited to this one, and other general-purpose programs may also be executed.

The storage 920 stores the categorical data analysis program. The categorical data analysis program according to an embodiment of the present invention executes the steps of: generating a first cumulative function from first categorical data and a second cumulative function from second categorical data; generating a first residual function by removing a rising pattern from the first cumulative function and a second residual function by removing a rising pattern from the second cumulative function; generating a correlation function representing the correlation between the first and second residual functions; generating a final residual function as the difference between the first residual function and the correlation function; and determining that an abnormality has occurred in the monitoring target if the final residual function is out of a preset range.

The memory 930 loads the categorical data analysis program so that it can be executed by the processor 910.

Various computing devices may be connected to the network interface 940. For example, the machine equipment being monitored may be connected to it so that categorical data measured at the machine equipment is received.

The bus 950 serves as the data path connecting the processor 910, storage 920, memory 930, and network interface 940 described above.

Meanwhile, the method described above can be written as a computer-executable program and implemented on a general-purpose digital computer that runs the program from a computer-readable recording medium. In addition, the data structures used in the method can be recorded on a computer-readable recording medium by various means. Computer-readable recording media include storage media such as magnetic storage media (e.g., ROM, floppy disks, hard disks) and optically readable media (e.g., CD-ROMs, DVDs).

Those of ordinary skill in the art to which this embodiment pertains will understand that it may be implemented in modified forms without departing from the essential characteristics of the above description. The disclosed methods should therefore be considered in a descriptive rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within their scope of equivalents should be construed as being included in the present invention.

Claims

Generating a first cumulative function from first categorical data and a second cumulative function from second categorical data;
Removing a rising pattern from the first cumulative function to generate a first residual function, and removing a rising pattern from the second cumulative function to generate a second residual function;
Generating a correlation function indicating a correlation between the first residual function and the second residual function;
Generating a final residual function as a difference value between the first residual function and the correlation function; and
Determining that an abnormality has occurred in a monitoring target if the final residual function is out of a preset range.

The method according to claim 1,
Wherein generating the first residual function and the second residual function comprises:
Generating a first trend line representing a rising pattern of the first cumulative function and a second trend line representing a rising pattern of the second cumulative function; and
Generating the first residual function as a residual difference between the first trend line and the first cumulative function, and generating the second residual function as a residual difference between the second trend line and the second cumulative function.

The method of claim 2,
Wherein generating the first trend line and the second trend line comprises:
Calculating a slope of the first trend line and a slope of the second trend line by time-series linear regression.

The method of claim 3,
Wherein calculating the slope of the first trend line and the slope of the second trend line comprises:
Determining, as the slope of the first trend line, the slope that minimizes the sum of squared residuals (least squares), the residual being the difference between the first trend line and the first cumulative function; and
Determining, as the slope of the second trend line, the slope that minimizes the sum of squared residuals, the residual being the difference between the second trend line and the second cumulative function.

The method according to claim 1,
Wherein generating the correlation function comprises:
Generating the correlation function by using, as an independent variable of a regression model, the residual that is the difference between the second trend line and the second cumulative function.

The method of claim 5,
Wherein generating the correlation function comprises:
Using at least one of multiple linear regression, a neural network model, a regression tree, and a regularized regression technique.

The method according to claim 1,
Wherein determining that an abnormality has occurred in the monitoring target if the final residual function is out of the preset range comprises:
Calculating a reference value from an average value or a median value of the first categorical data;
Calculating an upper limit reference value by adding an allowable variation value to the reference value, and calculating a lower limit reference value by subtracting the variation value from the reference value; and
Determining that an abnormality has occurred in the monitoring target if the final residual function exceeds the upper limit reference value or falls below the lower limit reference value.

The method according to claim 1,
Wherein the first categorical data and the second categorical data are discontinuous discrete data.

A cumulative function generator configured to generate a first cumulative function from first categorical data and a second cumulative function from second categorical data;
A residual function generator configured to generate a first residual function by removing a rising pattern from the first cumulative function, and a second residual function by removing a rising pattern from the second cumulative function;
A correlation function generator configured to generate a correlation function indicating a correlation between the first residual function and the second residual function;
A final residual function generator configured to generate a final residual function as a difference value between the first residual function and the correlation function; and
An abnormality detector configured to determine that an abnormality has occurred in a monitoring target if the final residual function is out of a preset range.

The apparatus of claim 9,
Wherein the residual function generator comprises:
A trend line generator configured to generate a first trend line representing a rising pattern of the first cumulative function and a second trend line representing a rising pattern of the second cumulative function,
And wherein the residual function generator generates the first residual function as a residual difference between the first trend line and the first cumulative function, and generates the second residual function as a residual difference between the second trend line and the second cumulative function.

The apparatus of claim 9,
Wherein the correlation function generator
Generates the correlation function by using, as an independent variable of a regression model, the residual that is the difference between the second trend line and the second cumulative function.

The apparatus of claim 9,
Wherein the abnormality detector comprises:
A reference value calculation unit configured to calculate a reference value from an average value or a median value of the first categorical data;
An upper limit reference value calculation unit configured to calculate an upper limit reference value by adding an acceptable variation value to the reference value; and
A lower limit reference value calculation unit configured to calculate a lower limit reference value by subtracting the variation value from the reference value,
And wherein the abnormality detector determines that an abnormality has occurred in the monitoring target if the final residual function exceeds the upper limit reference value or falls below the lower limit reference value.

The apparatus of claim 9,
Wherein the first categorical data and the second categorical data are discontinuous discrete data.

One or more processors;
A memory into which a computer program executed by the processors is loaded; and
A storage storing a computer program capable of analyzing categorical data,
Wherein the computer program comprises:
An operation of generating a first cumulative function from first categorical data and a second cumulative function from second categorical data;
An operation of removing a rising pattern from the first cumulative function to generate a first residual function, and removing a rising pattern from the second cumulative function to generate a second residual function;
An operation of generating a correlation function indicating a correlation between the first residual function and the second residual function;
An operation of generating a final residual function as a difference value between the first residual function and the correlation function; and
An operation of determining that an abnormality has occurred in a monitoring target if the final residual function is out of a preset range.

In combination with a computing device,
Generating a first cumulative function from first categorical data and a second cumulative function from second categorical data;
Removing a rising pattern from the first cumulative function to generate a first residual function, and removing a rising pattern from the second cumulative function to generate a second residual function;
Generating a correlation function indicating a correlation between the first residual function and the second residual function;
Generating a final residual function as a difference value between the first residual function and the correlation function; and
Determining that an abnormality has occurred in a monitoring target if the final residual function is out of a preset range.