KR20210027214A

KR20210027214A - Methods and apparatus for predicting data

Info

Publication number: KR20210027214A
Application number: KR1020200110650A
Authority: KR
Inventors: 최재식; 한지연; 이교운
Original assignee: 울산과학기술원; 주식회사 인이지
Priority date: 2019-08-30
Filing date: 2020-08-31
Publication date: 2021-03-10
Also published as: KR102446854B1

Abstract

Disclosed are a method and device for predicting data. The method according to one embodiment of the present invention comprises the steps of: extracting change point information of given data by detecting a change in a covariance structure of a Gaussian process; updating the change point distribution in real time based on the change point information; and predicting data based on the updated change point distribution.

Description

Data prediction method and apparatus {METHODS AND APPARATUS FOR PREDICTING DATA}

The following embodiments relate to a method and apparatus for predicting data. Specifically, they detect Bayesian real-time change points in the covariance structure of a Gaussian process and use the detected change points to improve the accuracy of data prediction.

Change point detection (CPD), the analysis of sequential data to detect abrupt changes, is an important factor in improving the prediction of future events.

A change point is a specific position in a sequence at which the underlying distribution changes. Change points play an important role in many areas, including climate modeling, speech recognition, image analysis, and human activity recognition.

Conventionally, Bayesian change point detection has been used to detect change points. However, although existing Bayesian real-time change point detection updates the distribution over change points in real time, it is difficult to guarantee the accuracy of the detected change points, and the method is highly sensitive to parameter settings: depending on the parameters, it may fail to detect genuine change points or may detect a meaninglessly large number of them.

In addition, statistical tests have been studied for detecting change points when the mean of a Gaussian process changes, but statistical tests for changes in covariance, which can model a wider variety of changes, have not been studied, and the existing tests are difficult to apply in real time.

Embodiments aim to detect Bayesian real-time change points in the covariance structure of a Gaussian process.

Embodiments aim to detect a change in the covariance of a Gaussian process and to use it to update the change point distribution in real time and predict future data.

Embodiments aim to output a predicted data value and a probability distribution over change points.

Embodiments aim to improve the accuracy of future data prediction through a corrected change point probability distribution.

A data prediction method according to an embodiment includes: extracting change point information of given data by detecting a change in the covariance structure of a Gaussian process; updating a change point distribution in real time based on the change point information; and predicting data based on the updated change point distribution.

The extracting of the change point information may include: setting a first hypothesis asserting the absence of a change point; setting a second hypothesis asserting the existence of the change point; calculating a likelihood ratio between the first hypothesis and the second hypothesis; and selecting one of the first hypothesis and the second hypothesis based on the likelihood ratio.

The selecting may include selecting the second hypothesis when the likelihood ratio is greater than or equal to a predetermined threshold, and selecting the first hypothesis when the likelihood ratio is less than the threshold.

The updating may include, when the change point is detected, updating the change point distribution using data after the change point.

The extracting of the change point information may include extracting the change point information by applying a sliding window to the given data.

The updating may include correcting, using the change point information, an error that may be caused by a parameter.

A data prediction apparatus according to an embodiment includes: a memory storing at least one program; and a processor executing the at least one program, wherein the processor extracts change point information of given data by detecting a change in the covariance structure of a Gaussian process, updates a change point distribution in real time based on the change point information, and predicts data based on the updated change point distribution.

The processor may set a first hypothesis asserting the absence of a change point, set a second hypothesis asserting the existence of the change point, calculate a likelihood ratio between the first hypothesis and the second hypothesis, and select one of the first hypothesis and the second hypothesis based on the likelihood ratio.

The processor may select the second hypothesis when the likelihood ratio is greater than or equal to a predetermined threshold, and select the first hypothesis when the likelihood ratio is less than the threshold.

When the change point is detected, the processor may update the change point distribution using data after the change point.

The processor may extract the change point information by applying a sliding window to the given data.

The processor may correct, using the change point information, an error that may be caused by a parameter.

Embodiments may detect Bayesian real-time change points in the covariance structure of a Gaussian process.

Embodiments may detect a change in the covariance of a Gaussian process and use it to update the change point distribution in real time and predict future data.

Embodiments may output a predicted data value and a probability distribution over change points.

Embodiments may improve the accuracy of future data prediction through the corrected change point probability distribution.

FIG. 1 is a flowchart illustrating a data prediction method according to an embodiment.
FIG. 2 is a diagram illustrating the effect of a method of detecting change points in a covariance structure on the quality of Gaussian process regression according to an embodiment.
FIG. 3 is a diagram illustrating an output of a data prediction apparatus according to an embodiment.
FIG. 4 is a block diagram of a data prediction apparatus according to an embodiment.

The specific structural or functional descriptions disclosed in this specification are given only to illustrate embodiments according to the technical concept; the embodiments may be implemented in various other forms and are not limited to the embodiments described herein.

Terms such as "first" or "second" may be used to describe various components, but these terms should be understood only as distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening component is present. Expressions describing relationships between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the relevant technical field. Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this specification.

The embodiments may be implemented in various types of products, such as a personal computer, a laptop computer, a tablet computer, a smartphone, a television, a smart home appliance, an intelligent vehicle, a kiosk, and a wearable device. Hereinafter, embodiments are described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote the same members.

FIG. 1 is a flowchart illustrating a data prediction method according to an embodiment.

Referring to FIG. 1, steps 110 to 130 according to an embodiment may be performed by a data prediction apparatus. The data prediction apparatus may be implemented by one or more hardware modules, one or more software modules, or various combinations thereof.

The data prediction method according to an embodiment may be divided into a statistical test part, which detects a change in the covariance of a Gaussian process, and a Bayesian algorithm part, which uses the test result to update the change point distribution in real time and predict future data.

In step 110, the data prediction apparatus according to an embodiment extracts change point information of the given data by detecting a change in the covariance structure of a Gaussian process. The data prediction apparatus may set a first hypothesis asserting the absence of a change point, set a second hypothesis asserting the existence of the change point, calculate a likelihood ratio between the first hypothesis and the second hypothesis, and select one of the first hypothesis and the second hypothesis based on the likelihood ratio.

For example, to perform the statistical test, the data prediction apparatus may set a null hypothesis asserting the absence of a change point and an alternative hypothesis asserting the existence of a change point. The data prediction apparatus may calculate the likelihood ratio between the two hypotheses and reject the null hypothesis if the likelihood ratio is greater than a threshold. The threshold may be determined so that, given the probability distribution of the likelihood ratio under the stochastic process of each hypothesis, the error of the statistical test does not exceed a predetermined error level.
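The test described above can be sketched for zero-mean Gaussian data as follows. This is a minimal illustration rather than the claimed implementation: the function names, the use of NumPy, and the choice to work with twice the log-likelihood ratio are assumptions made here for clarity.

```python
import numpy as np


def gaussian_log_likelihood(x, cov):
    """Log density of a zero-mean multivariate Gaussian N(0, cov) evaluated at x."""
    n = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)          # log|cov|, numerically stable
    quad = x @ np.linalg.solve(cov, x)          # x^T cov^{-1} x
    return -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))


def likelihood_ratio_statistic(x, cov_null, cov_alt):
    """2 * log(p(x | H1) / p(x | H0)); larger values favor a change point."""
    return 2.0 * (gaussian_log_likelihood(x, cov_alt)
                  - gaussian_log_likelihood(x, cov_null))


def select_hypothesis(statistic, threshold):
    """Select the second hypothesis (change point) at or above the threshold."""
    return "H1" if statistic >= threshold else "H0"


# Hypothetical example: the alternative has four times the variance of the null.
x = np.array([2.0, 2.0])
stat = likelihood_ratio_statistic(x, np.eye(2), 4.0 * np.eye(2))
decision = select_hypothesis(stat, threshold=1.0)
```

The threshold value of 1.0 is purely illustrative; the text leaves the threshold to be chosen so that the test error stays within a predetermined level.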

To make the test applicable in a real-time algorithm, the data prediction apparatus may extract the change point information by applying a sliding window to the given data.
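A sliding window over a sequence can be sketched as below. The helper name `sliding_windows` and the symmetric half-width parameter are hypothetical choices for illustration; the source does not specify how the window is parameterized at this point.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(data: Sequence[float],
                    half_width: int) -> Iterator[Tuple[int, List[float]]]:
    """Yield (t, window) pairs, where window = data[t - m : t + m + 1].

    Only positions t that have a full window on both sides are visited.
    """
    for t in range(half_width, len(data) - half_width):
        yield t, list(data[t - half_width: t + half_width + 1])


# Each window would then be passed to the statistical test.
windows = list(sliding_windows([0.0, 1.0, 2.0, 3.0, 4.0], half_width=1))
```

In a streaming setting the same idea applies with a fixed-length buffer, so that the cost of each test does not grow with the length of the series.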

In step 120, the data prediction apparatus updates the change point distribution in real time based on the change point information, and in step 130, the data prediction apparatus predicts data based on the updated change point distribution.

The data prediction apparatus according to an embodiment may adjust the Bayesian real-time change point detection algorithm according to the result of the statistical test. For example, if the statistical test detects a change point in the covariance structure of the Gaussian process, the prediction model after the change point may be updated using the data after the change point, and the algorithm may be adjusted to give less weight to the data before the change point when predicting future data. Conversely, if the statistical test detects no change point, the data prediction apparatus may adjust the algorithm to give more weight to the data before the candidate change point.

The data prediction apparatus according to an embodiment may perform a sliding-window statistical test on the given data and update the change point probability distribution according to the result. Depending on its parameters, the existing Bayesian real-time change point detection algorithm may produce fewer or more change points than the actual number, but the presented statistical test can correct the effect of inappropriate parameters. The corrected change point probability distribution can improve the accuracy of future data prediction.

The data prediction apparatus according to an embodiment may detect change points in time series data whose covariance structure changes, and may use the change point information to perform more accurate real-time time series prediction.

FIG. 2 is a diagram illustrating the effect of a method of detecting change points in a covariance structure on the quality of Gaussian process regression according to an embodiment.

Referring to FIG. 2, drawing 210 shows samples of a Gaussian process with an intended change point (intended CP) in the middle. Drawing 220 shows samples of a Gaussian process model whose hyperparameters were learned using the entire data set. Drawing 230 shows samples of a Gaussian process model in which the covariance structure is broken at the change point and the hyperparameters are learned separately for each segment.

Drawings 220 and 230 show that fitting non-stationary data to a time-invariant Gaussian process produces an inaccurate model. Gaussian process regression with a structural break in the covariance structure is more expressive and may be better suited to the analysis of non-stationary data.

Below, the statistical test method is described in detail first, followed by the Bayesian algorithm.

To construct a test that detects a change in the covariance structure, the null hypothesis is defined as H_0: Cov(X_i, X_j) = K(i, j), and the alternative hypothesis H_{1,t} may be defined as

[…]

Here, K, K', and K'' are kernel functions, and Σ and Σ'_t denote the covariance matrices under H_0 and H_{1,t}, respectively. The likelihood ratio […] may be given by Equation 1 below.

[Equation 1]

Under the null hypothesis, that is, X ~ N(0, Σ),

[Equation 2]

where […] are the eigenvalues of […], and u_i and v_i each follow a chi-square distribution with 1 degree of freedom.

Under the alternative hypothesis, that is, […],

[Equation 3]

where […] are the eigenvalues of […], and u_i and v_i each follow a chi-square distribution with 1 degree of freedom.

Equations 2 and 3 show that the difference between two positive semi-definite quadratic terms can be expressed as the difference between a chi-square random variable with n degrees of freedom and a linear combination of independent chi-square random variables with 1 degree of freedom.
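The chi-square representation rests on a standard fact about Gaussian quadratic forms: for X ~ N(0, Σ) and a symmetric matrix A, the quantity X^T A X has the same distribution as a weighted sum of independent chi-square variables with 1 degree of freedom, with weights equal to the eigenvalues of Σ^{1/2} A Σ^{1/2}. The sketch below computes those weights numerically. The kernel, the noise term, and all function names are assumptions for illustration and are not taken from the source equations.

```python
import numpy as np


def rbf_cov(n, length_scale, noise=0.1):
    """Hypothetical covariance: RBF kernel on indices 0..n-1 plus a noise term."""
    idx = np.arange(n, dtype=float)
    d = idx[:, None] - idx[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2) + noise * np.eye(n)


def quadratic_form_weights(sigma, a):
    """Weights lambda_i with X^T A X ~ sum_i lambda_i * u_i for X ~ N(0, sigma).

    Uses the Cholesky factor L (sigma = L L^T); L^T A L has the same spectrum
    as sigma^{1/2} A sigma^{1/2}.
    """
    chol = np.linalg.cholesky(sigma)
    m = chol.T @ a @ chol
    return np.linalg.eigvalsh((m + m.T) / 2.0)   # symmetrize against round-off


sigma = rbf_cov(5, length_scale=1.0)       # covariance under the null
sigma_alt = rbf_cov(5, length_scale=3.0)   # a different covariance

# With A = sigma^{-1} all weights equal 1, giving a chi-square with n degrees
# of freedom; with A = sigma_alt^{-1} the weights are generally unequal.
w_null = quadratic_form_weights(sigma, np.linalg.inv(sigma))
w_alt = quadratic_form_weights(sigma, np.linalg.inv(sigma_alt))
```

The difference of the two quadratic forms is therefore a chi-square variable with n degrees of freedom minus a weighted sum of chi-square variables with 1 degree of freedom, which matches the structure described in the text.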

When the covariance structure breaks into two different kernels, H_0 and H_1 may be defined similarly to the above, except that K…(i, j) = 0. The corresponding covariance matrices may then be written as in Equation 4 below.

[Equation 4]

With X_a := X_{1:t} and X_b := X_{t+1:n}, K_rc denotes the covariance matrix between X_r and X_c for r, c ∈ {a, b}. The likelihood ratio test may be defined as in Equation 5 below.

[Equation 5]
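One way to picture a covariance matrix with a structural break is as a block matrix over the segments X_a and X_b. The sketch below is an assumption-laden illustration: it uses RBF kernels with hypothetical length scales for the two segments and sets the cross-covariance blocks K_ab and K_ba to zero, which is one reading of the broken case and is not confirmed by the source, whose Equation 4 is not reproduced here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """RBF kernel matrix between index vectors a and b (illustrative choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def broken_covariance(idx, t, ls_before, ls_after):
    """Block covariance for X_a = X_{1:t} and X_b = X_{t+1:n}.

    K_aa and K_bb use separate kernels; the cross blocks K_ab and K_ba are
    left at zero (an assumption made for this sketch).
    """
    n = idx.shape[0]
    a, b = idx[:t], idx[t:]
    cov = np.zeros((n, n))
    cov[:t, :t] = rbf_kernel(a, a, ls_before)   # K_aa
    cov[t:, t:] = rbf_kernel(b, b, ls_after)    # K_bb
    return cov


idx = np.arange(6, dtype=float)
cov_null = rbf_kernel(idx, idx, 1.0)                          # single kernel
cov_break = broken_covariance(idx, t=3, ls_before=1.0, ls_after=0.3)
```

Matrices built this way can serve as the null and alternative covariances in a likelihood ratio test, with a small diagonal noise term added if the matrices are close to singular.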

For the lemmas and theorems that follow, a constant C_t may be defined as follows.

For covariance matrices Σ and Σ'_t, […]. Here, […] denotes the smallest eigenvalue of a matrix M.

[…] satisfies […], where ∧ denotes the minimum operator.

Q_t is defined as […].

Under the null hypothesis, the probability that […] is correct is at least […]. This is expressed as Equation 6.

[Equation 6]

Under the alternative hypothesis, the probability that […] is correct is at least […]. This is expressed as Equation 7.

[Equation 7]

That is, the type I or type II error can be controlled to be less than δ/2.

When […] is satisfied, the conditional detection error probability may be bounded as in Equation 8 below.

[Equation 8]

Using Equation 8, it can be guaranteed that the likelihood ratio test for a general covariance kernel change is statistically correct within the error bound δ under the specified condition. Setting the threshold greater than or equal to the epsilon upper bound of the null distribution guarantees a bounded type I error. Setting the threshold less than or equal to the epsilon lower bound of the alternative distribution guarantees a bounded type II error.

There are three possible cases for the inequality between […] and […].

When […] > […], there may be no threshold that guarantees both the type I and type II error bounds.

When […] = […], there is exactly one threshold that guarantees both the type I and type II error bounds.

When […] < […], the thresholds that guarantee both the type I and type II error bounds may form a region.
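The three cases can be sketched as an interval check. This assumes that the two compared quantities are the upper bound of the null distribution and the lower bound of the alternative distribution mentioned in the preceding paragraph; the compared symbols are missing from the text, so this reading and the names `null_upper` and `alt_lower` are assumptions.

```python
from typing import Optional, Tuple


def feasible_thresholds(null_upper: float,
                        alt_lower: float) -> Optional[Tuple[float, float]]:
    """Return the closed interval of thresholds R with null_upper <= R <= alt_lower.

    R >= null_upper bounds the type I error; R <= alt_lower bounds the type II
    error. Returns None when no threshold satisfies both conditions.
    """
    if null_upper > alt_lower:
        return None                    # case ">": no valid threshold
    return (null_upper, alt_lower)     # case "=": one point; case "<": a region


no_threshold = feasible_thresholds(3.0, 2.0)
single_point = feasible_thresholds(2.0, 2.0)
region = feasible_thresholds(1.0, 2.0)
```

When the interval is a region, any value inside it controls both error types, so the choice within the region can be made on practical grounds.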

The Bayesian algorithm according to an embodiment may be as shown in Table 1 below.

[Table 1]

Algorithm 1 presents CBOCPD (Confirmatory Bayesian Online CPD), a theoretically justified online change detection algorithm. The main idea of CBOCPD is to overcome the limitation of the assumption that the interval between change points is independent of the data.

Referring to Table 1, lines 1 and 2 initialize the parameters, and lines 3 to 13 may be expressed as Equation 9 below.

[Equation 9]

Here, there may be two likelihood ratio tests, […] and […].

Because the thresholds computed theoretically from Equations 6 and 7 are not sufficient for practical use, empirical thresholds may be used.
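The source does not say how an empirical threshold is obtained. One common approach, shown here only as an assumption, is to simulate data under the null hypothesis and take a high quantile of the simulated test statistic so that the type I error is approximately δ. All names, the sample count, and the seed are hypothetical.

```python
import numpy as np


def lr_statistic(x, cov_null, cov_alt):
    """2 * log likelihood ratio for zero-mean Gaussians (alternative over null)."""
    _, logdet0 = np.linalg.slogdet(cov_null)
    _, logdet1 = np.linalg.slogdet(cov_alt)
    q0 = x @ np.linalg.solve(cov_null, x)
    q1 = x @ np.linalg.solve(cov_alt, x)
    return q0 - q1 + logdet0 - logdet1


def empirical_threshold(cov_null, cov_alt, delta, n_samples=2000, seed=0):
    """Monte Carlo (1 - delta) quantile of the statistic under the null."""
    rng = np.random.default_rng(seed)
    n = cov_null.shape[0]
    samples = rng.multivariate_normal(np.zeros(n), cov_null, size=n_samples)
    stats = np.array([lr_statistic(x, cov_null, cov_alt) for x in samples])
    return float(np.quantile(stats, 1.0 - delta))


# Hypothetical setting: the alternative doubles the variance of the null.
thr_strict = empirical_threshold(np.eye(3), 2.0 * np.eye(3), delta=0.01)
thr_loose = empirical_threshold(np.eye(3), 2.0 * np.eye(3), delta=0.2)
```

A smaller δ yields a larger threshold, trading fewer false alarms for a higher chance of missing a real change.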

If […] is satisfied, […] may be defined as […], and if […] is satisfied, […] may be defined as […]. The likelihood ratio test may be applied to a window W = x_{t-m:t+m} around t. […] is the time point in the window that maximizes the likelihood ratio ([…]). Here, C_W ⊆ {t-m, ..., t+m} is the set of change point candidates for the window.
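Selecting the time point that maximizes the likelihood ratio within the window can be sketched as a plain argmax over the candidate set. The helper names, the clipping of candidates to valid positions, and the toy statistic are illustrative assumptions.

```python
from typing import Callable, Iterable, Tuple


def window_candidates(t: int, m: int, n: int) -> range:
    """Candidate set {t-m, ..., t+m}, clipped to valid positions 1..n-1."""
    return range(max(1, t - m), min(n - 1, t + m) + 1)


def best_candidate(candidates: Iterable[int],
                   statistic: Callable[[int], float]) -> Tuple[int, float]:
    """Return (t_star, value) for the candidate maximizing the statistic."""
    best_t, best_value = None, float("-inf")
    for s in candidates:
        value = statistic(s)
        if value > best_value:
            best_t, best_value = s, value
    if best_t is None:
        raise ValueError("candidate set is empty")
    return best_t, best_value


# Toy statistic peaking at position 6; a real one would be a likelihood ratio.
t_star, value = best_candidate(window_candidates(5, 2, 20),
                               lambda s: -float((s - 6) ** 2))
```

Comparing `t_star` with the current time t is then a simple equality check, which is the condition the text uses to confirm a change point.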

If both likelihood ratio tests pass at […] and […] coincides with t, t is determined to be a change point, and the change probability […] in the BOCPD framework may be increased.

Conversely, if neither test passes, it is strongly believed that there is no change, and the change probability in the BOCPD framework may be reduced.
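The effect of raising or lowering the change probability can be illustrated with the standard BOCPD run-length recursion. In the sketch below, the confirmed and rejected values 1 - δ and δ are illustrative assumptions, as are the function names; the source states only that the change probability is increased or reduced.

```python
from typing import List


def adjusted_hazard(base_hazard: float, confirmed: bool, rejected: bool,
                    delta: float = 0.05) -> float:
    """Change probability after consulting the tests (values are assumptions)."""
    if confirmed:
        return 1.0 - delta     # tests agree that t is a change point
    if rejected:
        return delta           # tests agree that there is no change
    return base_hazard         # inconclusive: keep the BOCPD default


def update_run_length(run_length_probs: List[float],
                      predictive_probs: List[float],
                      hazard: float) -> List[float]:
    """One BOCPD step; index r of the result is P(run length = r | data)."""
    weighted = [p * pi for p, pi in zip(run_length_probs, predictive_probs)]
    change = hazard * sum(weighted)                  # run length resets to 0
    growth = [(1.0 - hazard) * w for w in weighted]  # run length grows by 1
    joint = [change] + growth
    total = sum(joint)
    return [j / total for j in joint]


posterior = update_run_length([1.0], [0.5], hazard=0.1)
confirmed_posterior = update_run_length(
    [1.0], [0.5], adjusted_hazard(0.1, confirmed=True, rejected=False))
```

With a confirmed change the posterior mass moves almost entirely to run length 0, so later predictions rely mainly on data observed after the change point.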

FIG. 3 is a diagram illustrating an output of a data prediction apparatus according to an embodiment.

Referring to FIG. 3, the data prediction apparatus according to an embodiment may output predicted values of future data and a probability distribution over change points.

More specifically, drawing 310 shows the output when the timescale […] is 200, and drawing 320 shows the output when the timescale […] is 25. Drawings 310 and 320 show that CBOCPD identifies changes in the smoothness of the data through statistical tests, whereas BOCPD captures too few or too many changes.

FIG. 4 is a block diagram of a data prediction apparatus according to an embodiment.

Referring to FIG. 4, the data prediction apparatus 400 according to an embodiment includes a processor 410. The data prediction apparatus 400 may further include a memory 430 and a communication interface 450. The processor 410, the memory 430, and the communication interface 450 may communicate with one another through a communication bus 405.

At least one program may be stored in the memory 430, and the processor 410 may execute the at least one program.

The processor 410 extracts change point information of given data by detecting a change in the covariance structure of a Gaussian process, updates the change point distribution in real time based on the change point information, and predicts data based on the updated change point distribution.

The memory 430 may be a volatile memory or a non-volatile memory.

Depending on the embodiment, the processor 410 may set a first hypothesis asserting the absence of a change point, set a second hypothesis asserting the existence of a change point, calculate a likelihood ratio between the first hypothesis and the second hypothesis, and select one of the first hypothesis and the second hypothesis based on the likelihood ratio.

The processor 410 may select the second hypothesis when the likelihood ratio is greater than or equal to a predetermined threshold, and select the first hypothesis when the likelihood ratio is less than the threshold.

When a change point is detected, the processor 410 may update the change point distribution using data after the change point.

The processor 410 may extract change point information by applying a sliding window to the given data.

The processor 410 may correct, using the change point information, an error that may be caused by a parameter.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art can apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A data prediction method comprising:
detecting a change in a covariance structure of a Gaussian process to extract change point information of given data;
updating a change point distribution in real time based on the change point information; and
predicting data based on the updated change point distribution.

The method of claim 1, wherein extracting the change point information comprises:
establishing a first hypothesis asserting the absence of a change point;
establishing a second hypothesis asserting the existence of the change point;
calculating a likelihood ratio between the first hypothesis and the second hypothesis; and
selecting one of the first hypothesis and the second hypothesis based on the likelihood ratio.

The method of claim 2, wherein the selecting comprises:
selecting the second hypothesis when the likelihood ratio is greater than or equal to a predetermined threshold value, and selecting the first hypothesis when the likelihood ratio is less than the threshold value.

The method of claim 1, wherein the updating comprises:
when a change point is detected, updating the change point distribution using data after the change point.

The method of claim 1, wherein extracting the change point information comprises:
extracting the change point information by applying a sliding window to the given data.

The method of claim 1, wherein the updating comprises:
correcting, using the change point information, an error that may arise from a parameter.

A computer program stored in a medium for executing, in combination with hardware, the method of any one of claims 1 to 6.

A data prediction apparatus comprising:
a memory in which at least one program is stored; and
a processor configured to execute the at least one program,
wherein the processor is configured to:
detect a change in a covariance structure of a Gaussian process to extract change point information of given data;
update a change point distribution in real time based on the change point information; and
predict data based on the updated change point distribution.

The apparatus of claim 8, wherein the processor is configured to:
establish a first hypothesis asserting the absence of a change point;
establish a second hypothesis asserting the existence of the change point;
calculate a likelihood ratio between the first hypothesis and the second hypothesis; and
select one of the first hypothesis and the second hypothesis based on the likelihood ratio.

The apparatus of claim 9, wherein the processor is configured to:
select the second hypothesis when the likelihood ratio is greater than or equal to a predetermined threshold value, and select the first hypothesis when the likelihood ratio is less than the threshold value.

The apparatus of claim 8, wherein the processor is configured to:
when a change point is detected, update the change point distribution using data after the change point.

The apparatus of claim 8, wherein the processor is configured to:
extract the change point information by applying a sliding window to the given data.

The apparatus of claim 8, wherein the processor is configured to:
correct, using the change point information, an error that may arise from a parameter.