KR20190013038A

KR20190013038A - System and method for trend predicting based on Multi-Sequences data Using multi feature extract technique

Info

Publication number: KR20190013038A
Application number: KR1020170096965A
Authority: KR
Inventors: 이동규; 김대한; 임홍순
Original assignee: 주식회사 빅트리
Priority date: 2017-07-31
Filing date: 2017-07-31
Publication date: 2019-02-11

Abstract

The present invention relates to a multi-sequence data prediction system using a multi-feature extraction technique and a prediction method thereof. According to the present invention, the prediction method comprises: a step of analyzing association rule patterns; a step of determining the association rule patterns with a high prediction likelihood; a step of generating a prediction model from the selected data; and a step of using the generated prediction model to predict the future trend when an associated pattern occurs. Accordingly, the trend that follows the occurrence of a feature pattern in sequential data can be predicted precisely.

Description

TECHNICAL FIELD

The present invention relates to a system and method for predicting the trend of multiple time series data using a multiple feature extraction technique, in the field of time series pattern analysis. More particularly, it relates to a system and method that define the feature elements intrinsic to time series data, extract those features and turn them into patterns, combine the patterns to analyze association rule patterns among multiple time series, and build a prediction model that predicts the trend of the multiple time series.

Time series data analysis uses data mining and statistical analysis techniques to build a model from past observations of disordered and diverse time series, without prior knowledge of how the measurements were made, and then uses that model to predict the future.

In forecasting by time series analysis, observed past data are analyzed to discover regular patterns. The model estimated from those patterns is then used to predict observations at future time points.

However, existing methods extract patterns from absolute values and shapes alone, rather than from the intrinsic characteristics of the time series to be predicted. They also attempt prediction with models that are limited to a single time series.

As a result, conventional time series analysis has two limitations. First, it can predict only predefined patterns, that is, regular, stereotyped patterns that occur frequently. Second, it can predict only time series that share the nature of the single time series that was analyzed.

Industrial development and computerization produce time series data in many forms, and the need for complex association analysis across large and diverse time series keeps growing. To overcome the limitations of single time series analysis, an efficient and scalable analysis method is needed, for example one based on pattern extraction from multiple time series and association analysis among them.

An object of the present invention is to provide a system and method that receive various types of data having time series attributes, extract features representing the characteristics of each time series, and use those features to model and predict associations with other time series.

Another object of the present invention is to provide a system and method that predict using a model with two components. The first is a feature extraction unit that analyzes and extracts the attributes common to time series data in order to define features. The second improves the accuracy of association analysis among diverse time series by using a time lag method.

To achieve the above objects, the present invention includes the following units:
a time series data input unit that sequentially receives the time series to be analyzed;
a data refining unit that reduces the differences between the absolute values of the input time series data;
a feature extraction unit that finds and defines features in the refined data;
a similar feature pattern cluster unit that clusters feature patterns using the times at which the extracted features appear in the time series, computes weights from the frequency of occurrence of each feature pattern, and encodes sets of similar patterns as characters;
a cluster storage unit that stores the feature pattern cluster data of each input single time series on a cluster time basis;
a multiple time series input unit that receives the pattern information of the analyzed single time series and organizes it into sets for multiple time series analysis;
an association analysis and association determination unit that analyzes the input multiple time series pattern sets with the AprioriAll algorithm and decides, from the degree of association, whether an association exists;
a prediction model generation unit that models the determined relationships among the multiple time series data as a multiple time series pattern model and stores that model; and
a trend prediction unit that predicts a trend from the generated multiple time series pattern model.

The time series received from the time series data input unit differ in their units and contain outliers. The data refining unit therefore removes the outliers using the following equation and normalizes the values to the range 0 to 1.

(Equation 1 - outlier removal formula)

TS is the time series data with outliers removed, a is the sequence compression coefficient, t is the time series value at each time unit, and τ is the period of the time series over which outliers are removed.

(Equation 2 - normalization formula)

In Equation 2, the left-hand side is the normalized time series data, TS is the sequence data with outliers removed, and μ is the mean of that outlier-removed sequence data.

The feature extraction unit extracts features according to three criteria in order to capture the intrinsic characteristics of time series data. The first feature extraction method captures the speed and magnitude of change in the time series and is computed with the following equation.

(Equation 3 - change rate and magnitude definition)

VM is the change rate and magnitude feature value. The two averaged difference terms are, respectively, the mean of the differences between t and t-1 in the time series that are less than 0, and the mean of those differences that are 0 or greater.

The second feature extraction method captures the volatility of the time series and is computed with the following equation.

(Equation 4 - volatility feature definition)

VolTS is the volatility feature value and N is the length of the time series used to compute the volatility. The rate-of-change term is the rate of change between times t and t-1, and the averaged term is the mean of those rates of change.

The third feature extraction method extracts partial time series shapes. The time series is split into segments of n days, and the 10 most frequent partial time series shapes become the feature values. The partial sequence extraction formula is as follows.

(Equation 5 - partial time series shape extraction definition)

SubTS is the partial time series shape feature value, i is the start value in time units, and n is the end value in time units.

The feature pattern cluster generation unit combines the three feature values from the feature extraction unit into a three-dimensional vector. It then applies a complete graph clustering technique, clusters similar pattern sets among the three-dimensional patterns, and encodes each cluster as a character.

Clustering uses two criteria: the time at which a feature occurred, and the frequency with which it occurred. In the frequency-based clustering result, high-frequency features receive high weights. The association analysis unit uses these weights when measuring associations among the multiple time series.

In the present invention, the association analysis and association determination unit aggregates the features of the data using the individual time series cluster data produced by the cluster generation unit. It then analyzes the associations among the multiple time series using a time lag technique and the AprioriAll association analysis algorithm.

According to the present invention, features are extracted independently from diverse time series such as weather temperature, stock prices, exchange rates, and economic indicators. These features are defined as change rate and magnitude, volatility, and time series shape, and the extracted features are then clustered. The clustered data are used to analyze associations among the multiple time series and to build a trend prediction model. As a result, the trend that follows the occurrence of a feature pattern in the target time series can be predicted accurately.

The approach has three advantages:
- Independently defined features allow time series of different natures to be analyzed on a common basis.
- Frequency-based clustering assigns weights to feature sets, and using those weights in the association analysis makes the analysis highly accurate.
- The time lag technique reveals leading and lagging effects between time series, which improves prediction accuracy.

Because the invention is embodied as a multiple time series data analysis method and system, it can be used in many industrial fields that require the analysis of multiple time series.

FIG. 1 is a block diagram of a multiple time series data based trend prediction system according to an embodiment of the present invention.
FIG. 2 is an internal configuration diagram of the data refining unit.
FIG. 3 is an internal configuration diagram of the feature extraction unit.
FIG. 4 is a flowchart of a multiple time series data based trend prediction method according to another embodiment of the present invention.
FIG. 5 shows, on a display unit, the process of clustering a set of similar feature patterns and encoding them as characters.
FIG. 6 illustrates the process of analyzing associations among multiple time series data using the time lag technique.
FIG. 7 shows an example of prediction using an association rule pattern included in the prediction model.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings.

However, the present invention is not limited to the embodiments disclosed below and may be implemented in various other forms.

These embodiments are provided so that this disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the invention pertains.

The present invention is defined only by the scope of the claims.

In some embodiments, well-known components, operations, and techniques are not described in detail, to avoid obscuring the present invention.

Throughout the specification, like reference numerals refer to like elements. The terms used herein describe the embodiments and are not intended to limit the present invention.

In this specification, singular forms include plural forms unless the context clearly indicates otherwise. A component or operation described as "comprising (or including)" does not exclude the presence or addition of one or more other components or operations.

Unless defined otherwise, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the invention belongs.

Terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless expressly so defined.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings.

As an example, the following describes the system and analysis method of the present invention applied to a stock price trend prediction system that uses financial data and weather data.

Referring to FIGS. 1 to 4, the multiple time series data based trend prediction system 100 using the multiple feature extraction technique has two parts:
- an individual time series feature cluster unit 200, which performs feature extraction and feature clustering on each individually input time series; and
- a multiple time series association analysis and prediction unit 300, which combines the results of the individual time series feature cluster unit, analyzes associations among the multiple time series, and generates a prediction model.

The data input for trend prediction are weather data and financial data (domestic and foreign stock indices and domestic and foreign economic indicators), all of which have a time series nature. The model is determined by analyzing meaningful associations among these data on the basis of the defined features.

Here, a time series means data observed and recorded at regular intervals over time. The individual time series feature clustering system 200 extracts features on a time basis and clusters similar features. The multiple time series association analysis and prediction unit 300 then analyzes and models the associations in the clustered data, and uses the resulting model to predict trend values that will be observed in the future.

The individual time series feature clustering system 200 analyzes each of the multiple time series independently to extract features, and groups similar features into clusters. To do this, it includes a time series data input unit 210, a data refining unit 220, a feature extraction unit 230, a feature pattern cluster generation unit 240, and a cluster storage unit 250.

The time series data input unit 210 receives time series data. It takes the entire period of the time series to be analyzed as input, and its input format varies with the source of the raw data, such as a database or a file.

For example, if the weather data are stored in a database and the domestic stock index data are stored in a file, both are received in the common format {date, value}.
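The format unification step can be sketched as follows. The patent specifies only the target format {date, value}. The function names, the `close` column name, and the toy values below are hypothetical and are not taken from the patent or from real market data.

```python
import csv
import io
from datetime import date


def from_db_rows(rows):
    """Convert (date, value) tuples, e.g. fetched from a database, to {date, value} records."""
    records = [{"date": d, "value": float(v)} for d, v in rows]
    return sorted(records, key=lambda r: r["date"])  # chronological order


def from_csv_text(text, date_col="date", value_col="close"):
    """Parse CSV text with ISO dates into {date, value} records (column names are hypothetical)."""
    reader = csv.DictReader(io.StringIO(text))
    records = [
        {"date": date.fromisoformat(row[date_col]), "value": float(row[value_col])}
        for row in reader
    ]
    return sorted(records, key=lambda r: r["date"])


# Toy inputs: one series arrives as database rows, the other as a CSV file.
weather = from_db_rows([(date(2017, 1, 3), -1.0), (date(2017, 1, 2), -3.5)])
kospi = from_csv_text("date,close\n2017-01-02,100.0\n2017-01-03,101.5\n")
```

After conversion, both sources share one record shape and one ordering, so the later refining steps need no source-specific logic.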

Because the input time series differ in the scale of their absolute values, the data refining unit 220 standardizes the raw time series so that each can be compared with the others. Standardization is performed sequentially by a DWT transformation unit 221 and a normalization unit 222.

FIG. 2 is an internal configuration diagram of the data refining unit.

The data refining unit 220 performs a DWT transformation 221 to remove outliers from the input time series and smooth it. The DWT transformation 221 is a wavelet-based filter that removes high-frequency components: it smooths the sequence by eliminating small ripples such as high-frequency noise. The smoothing can be performed using Equation 1.

Here, TS denotes the smoothed time series data with outliers removed, a is the compression coefficient that sets the degree of smoothing, t is the time series value at each time unit being smoothed, and τ is the period of the partial time series to be smoothed.
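Equation 1 is not reproduced in the source text. The sketch below therefore swaps in a common form of wavelet smoothing: a multi-level Haar DWT whose detail (high-frequency) coefficients are all set to zero before reconstruction. The `levels` parameter is an assumed stand-in for the compression coefficient a, and the function names are hypothetical.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    s = 1 / math.sqrt(2)
    approx = [(x[k] + x[k + 1]) * s for k in range(0, len(x), 2)]
    detail = [(x[k] - x[k + 1]) * s for k in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level from approximation and detail coefficients."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def dwt_smooth(series, levels=2):
    """Low-pass smoothing: decompose `levels` times, drop all details, reconstruct."""
    if not series:
        return []
    n = len(series)
    block = 2 ** levels
    padded_len = -(-n // block) * block                 # round n up to a multiple of 2**levels
    x = list(series) + [series[-1]] * (padded_len - n)  # edge-pad with the last value
    detail_lengths = []
    for _ in range(levels):
        x, d = haar_forward(x)
        detail_lengths.append(len(d))
    for length in reversed(detail_lengths):
        x = haar_inverse(x, [0.0] * length)             # zeroed high-frequency detail
    return x[:n]                                        # drop the padding
```

With Haar wavelets, zeroing the details at level L replaces each block of 2**L samples by its mean. For example, `dwt_smooth([1, 3, 5, 7], levels=1)` gives approximately `[2, 2, 6, 6]`. Thresholding the details instead of zeroing them would keep more of the shape; the patent does not say which variant it uses.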

Next, the normalization unit 222 receives the smoothed, outlier-removed data and converts the absolute values of the time series into normalized values between 0 and 1. The normalization can be performed using Equation 2.

Here, the left-hand side of Equation 2 denotes the normalized time series data, TS denotes the smoothed time series data with outliers removed, and μ denotes the mean of the smoothed time series data.
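Equation 2 involves the mean μ, but its exact form is not reproduced in the source text. As an assumption, the sketch below substitutes standard min-max scaling, which also maps a series onto the range 0 to 1. It is not the patent's formula, and `min_max_normalize` is a hypothetical name.

```python
def min_max_normalize(series):
    """Scale values linearly into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)  # avoid division by zero for flat input
    return [(v - lo) / (hi - lo) for v in series]
```

For example, `min_max_normalize([2, 4, 6])` returns `[0.0, 0.5, 1.0]`. Applied to both the temperature series and the KOSPI closing prices, this puts the two series on a common scale.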

Specifically, suppose the temperature series from the weather data and a time series refining period are input to the data refining unit 220. The DWT transformation unit 221 removes outliers within the specified period and extracts the smoothed data, and the normalization unit 222 outputs the values normalized to between 0 and 1. Likewise, when the KOSPI index closing prices from the financial data and a refining period are input, the same refining method outputs values normalized to between 0 and 1.

The feature extraction unit 230 receives the refined data output by the data refining unit 220. It extracts the intrinsic features of the time series using three sub-units: a change rate and magnitude feature extraction unit 231, a volatility feature extraction unit 232, and a partial time series shape extraction unit 233.

Here, the intrinsic features of time series data are the properties of change that time-indexed data exhibit as time passes. The feature extraction unit of the present invention defines these properties as the magnitude and speed of change, the volatility, and the shape of the changing time series, and extracts a feature for each.

FIG. 3 is an internal configuration diagram of the feature extraction unit.

Among the common attributes of time series data is the intensity of change relative to time. To capture it, the change rate and magnitude feature extraction unit 231 defines a change rate and a change magnitude:
- The change rate is computed from the mean of the falling changes and the mean of the rising changes within the same period.
- The change magnitude is the total amount of change within the period, based on the changes of all values between the start and end points of the partial time series. It is expressed as a ratio.

The change rate and magnitude feature extraction unit 231 extracts both using Equation 3.

Here, VM denotes the change rate and magnitude feature value. The two averaged difference terms are, respectively, the mean of the differences between t and t-1 that are less than 0 and the mean of those that are 0 or greater.

Specifically, the change rate and magnitude feature extraction unit 231 divides the temperature series received from the data refining unit 220 into partial time series, each delimited by a start point and an end point. For each partial time series, it extracts the feature value by computing the total change between data elements within the period and the mean of the differences. To extract the feature for successive partial time series, the start and end points are shifted and Equation 3 is applied again.

The extracted feature value is expressed as a percentage. Values close to 0% indicate little or no change within the period. Values close to 100% indicate fast or large change within the period.
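The source text does not give Equation 3 itself, so the sketch below computes only the inputs the description defines, for each sliding partial time series:
- the mean of the negative t vs. t-1 differences;
- the mean of the non-negative differences;
- the total absolute change.

How these are combined into the percentage VM is not recoverable, so the sketch stops before that step. The function names are hypothetical.

```python
from statistics import mean


def change_components(window):
    """Return (mean of negative diffs, mean of non-negative diffs, total absolute change)."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    falls = [d for d in diffs if d < 0]
    rises = [d for d in diffs if d >= 0]
    mean_fall = mean(falls) if falls else 0.0
    mean_rise = mean(rises) if rises else 0.0
    total_change = sum(abs(d) for d in diffs)
    return mean_fall, mean_rise, total_change


def sliding_components(series, width, step=1):
    """Shift the start/end points by `step` and compute the components per window."""
    return [
        change_components(series[i:i + width])
        for i in range(0, len(series) - width + 1, step)
    ]
```

For example, the window `[0.0, 0.5, 0.25, 0.75]` has differences 0.5, -0.25, and 0.5. The function therefore returns `(-0.25, 0.5, 1.25)`.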

Next, the volatility feature extraction unit 232 extracts volatility, another common attribute of time series data. Volatility describes the nature and degree of change of the series over time, measured regardless of the direction of change, and is computed with Equation 4.

VolTS denotes the volatility feature value and N is the length of the time series used to compute the volatility. The rate-of-change term is the rate of change between times t and t-1, and the averaged term is the mean of those rates of change.

Specifically, the volatility feature extraction unit 232 first specifies the length of the window over which volatility is computed in the temperature series received from the data refining unit 220. It then extracts the feature using the standard deviation and mean of that window.

The extracted volatility value is then normalized using the sigma (standard deviation) of a normal distribution, and the normalized value is taken as the final volatility feature value.
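The symbol definitions for Equation 4 (rates of change, their mean, and a length N) match the usual standard deviation of rates of change. The sketch below assumes the sample form with an N-1 denominator; the source does not confirm the denominator. It omits the final normal-distribution normalization, which the text does not specify precisely. The function names are hypothetical.

```python
from statistics import stdev


def rates_of_change(series):
    """(x_t - x_{t-1}) / x_{t-1} for consecutive values; undefined when x_{t-1} == 0."""
    if any(v == 0 for v in series[:-1]):
        raise ValueError("rate of change is undefined when a previous value is 0")
    return [(b - a) / a for a, b in zip(series, series[1:])]


def volatility(series):
    """Sample standard deviation of the rates of change (needs at least 3 values)."""
    return stdev(rates_of_change(series))
```

For `[100, 110, 99]`, the rates of change are 0.1 and -0.1, so the volatility is √0.02 ≈ 0.1414. Rates of change are undefined at zero, so this form suits raw positive values such as prices better than data already scaled to include 0.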

Next, the partial time series shape extraction unit 233 extracts shape, another common attribute of time series data, which describes how the series changes over time. A start point and an end point are specified, the start point is shifted in steps of 1 day or N days, and the shapes are extracted using Equation 5.

SubTS denotes the partial time series shape feature value, i is the time unit value at the start point, n is the time unit value at the end point, and SubTS_Array denotes the set of partial time series shapes of the time series.

Specifically, to extract partial time series shapes from the temperature series received from the data refining unit 220, the partial time series shape extraction unit 233 specifies three values: the start time unit value (i), the end time unit value (n), and the step by which the start point is shifted. It then extracts the partial time series shapes and forms the final partial time series shape feature as the shape set (SubTS_Array).
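The following sketch covers the sliding-window shape extraction and the "top 10 most frequent shapes" rule. The patent does not say how a shape is represented. As a hypothetical encoding, each window is written as a string of moves: U (up), D (down), or F (flat, within `flat_tol`). All names are illustrative.

```python
from collections import Counter


def shape_of(window, flat_tol=0.0):
    """Encode a window as a string of U (up), D (down), F (flat) moves."""
    moves = []
    for a, b in zip(window, window[1:]):
        d = b - a
        moves.append("F" if abs(d) <= flat_tol else ("U" if d > 0 else "D"))
    return "".join(moves)


def top_shapes(series, n, step=1, top_k=10, flat_tol=0.0):
    """Slide a length-n window by `step`, encode each window, keep the top_k shapes."""
    shapes = [
        shape_of(series[i:i + n], flat_tol)
        for i in range(0, len(series) - n + 1, step)
    ]
    return Counter(shapes).most_common(top_k)
```

For `[1, 2, 3, 2, 3, 4]` with `n=3`, the windows encode to UU, UD, DU, and UU, so `top_shapes` returns `[("UU", 2), ("UD", 1), ("DU", 1)]`. `Counter.most_common` orders equal counts by first occurrence.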

The feature pattern cluster generation unit 240 rebuilds the features received from the feature extraction unit 230 into a three-dimensional pattern vector, for example [change rate and magnitude feature, volatility feature, partial time series shape feature]. It applies a complete graph clustering algorithm to these pattern vectors to cluster the feature patterns, and encodes each clustered pattern set as a character.

The complete graph clustering algorithm is a graph clustering technique well known to those skilled in the art, so its detailed description is omitted.
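The patent does not detail its complete graph clustering. One plausible reading is used here as an assumption. Two vectors are linked when their Euclidean distance is within a threshold, and every cluster must be a complete subgraph (clique): all members are pairwise within the threshold. A simple greedy pass builds such clusters and labels each with a letter, as in the character encoding of FIG. 5. The result depends on input order. Encoding the shape feature as a numeric ID so that it fits a 3-D vector is also a hypothetical choice.

```python
import math
import string


def complete_link_clusters(vectors, threshold):
    """Greedy clique clustering: a vector joins the first cluster whose members are
    ALL within `threshold` of it; otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, v in enumerate(vectors):
        for members in clusters:
            if all(math.dist(v, vectors[j]) <= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def symbolize(vectors, threshold):
    """Label each vector with its cluster's character (A, B, ...; S26, S27, ... after Z)."""
    labels = [None] * len(vectors)
    for idx, members in enumerate(complete_link_clusters(vectors, threshold)):
        symbol = string.ascii_uppercase[idx] if idx < 26 else f"S{idx}"
        for j in members:
            labels[j] = symbol
    return labels


# Toy [change rate/magnitude, volatility, shape ID] vectors.
vectors = [(0.10, 0.20, 3), (0.12, 0.21, 3), (0.90, 0.80, 1)]
```

Here `symbolize(vectors, 0.1)` labels the two nearby vectors A and the distant one B. Because distances between shape IDs carry no meaning, a real implementation might instead require an exact match on the shape component.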

The pattern cluster generation unit 240 also clusters by the time attribute of the feature values delivered by the feature extraction unit 230. Features of the temperature data that appear in the same time window on different days are clustered together, for example the feature values for 3 p.m. to 6 p.m. on January 20 and those for 3 p.m. to 6 p.m. on January 21.
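The time-based grouping in the January 20/21 example can be sketched as a group-by on the time-of-day window. The input layout `(start, end, symbol)` and the function name are hypothetical, and the year 2017 is only a toy value.

```python
from collections import defaultdict
from datetime import datetime


def group_by_time_of_day(features):
    """features: iterable of (start: datetime, end: datetime, symbol).
    Groups features from different days that cover the same time-of-day window."""
    groups = defaultdict(list)
    for start, end, symbol in features:
        groups[(start.time(), end.time())].append((start.date(), symbol))
    return dict(groups)


features = [
    (datetime(2017, 1, 20, 15), datetime(2017, 1, 20, 18), "A"),
    (datetime(2017, 1, 21, 15), datetime(2017, 1, 21, 18), "B"),
    (datetime(2017, 1, 21, 9), datetime(2017, 1, 21, 12), "A"),
]
groups = group_by_time_of_day(features)
```

The 15:00-18:00 group then holds the January 20 and January 21 features, and the 09:00-12:00 window forms a separate group.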

The pattern cluster generation unit 240 also clusters the feature values delivered by the feature extraction unit 230 by the frequency of similar features. Features with a high similar-feature frequency receive high weights. The association determination unit 330 later uses these weights when measuring associations among the multiple time series.
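The patent states only that more frequent features get higher weights. The simplest scheme consistent with that statement is relative frequency, used here as a hypothetical choice.

```python
from collections import Counter


def frequency_weights(symbols):
    """Weight each cluster symbol by its relative frequency (hypothetical scheme)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}
```

For example, `frequency_weights("AABAC")` gives A a weight of 0.6 and B and C 0.2 each. A downstream association score could then scale each rule's support by the weights of its symbols.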

FIG. 5 shows how the feature pattern cluster generation unit 240 clusters feature vectors and encodes them as characters.

The cluster storage unit 250 stores the set of feature patterns generated by the feature pattern cluster generation unit 240 in the cluster DB 261.

The multiple time series association analysis prediction unit 300 analyzes the associations between multiple time series. It uses the individual time series feature pattern sets classified by the time series feature cluster unit 200. From the analysis results, it selects the associations with a high coefficient of determination and builds a prediction model across the multiple time series. It then uses that model to predict the trend that follows the occurrence of a feature pattern. To carry out this process, it includes:

- a multiple time series input unit 310,
- an association analysis unit 320,
- an association determining unit 330,
- a model generation unit 340,
- a trend prediction unit 350.

The multiple time series input unit 310 reads the feature pattern sets of the individual time series from the cluster database 251 and passes them to the association analysis unit 320. These sets were extracted by the time series feature cluster unit 200. The input time series data are sorted in ascending time order.
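The ascending sort, and the listing of same-time windows across series, can be sketched as follows. The sketch assumes daily (date, symbol) records per series. As an additional assumption not stated in the patent, it keeps only dates present in every series. `align_series` is a hypothetical name.

```python
from datetime import date

def align_series(series_by_name):
    """Sort the records of every series by date (ascending) and list, for each
    date present in all series, the symbol each series had on that date."""
    lookups = {name: dict(records) for name, records in series_by_name.items()}
    common_days = set.intersection(*(set(lookup) for lookup in lookups.values()))
    return [(day, {name: lookup[day] for name, lookup in lookups.items()})
            for day in sorted(common_days)]

series = {
    "temperature": [(date(2017, 1, 21), "B"), (date(2017, 1, 20), "A")],
    "kospi": [(date(2017, 1, 20), "X"), (date(2017, 1, 21), "Y"), (date(2017, 1, 22), "X")],
}
aligned = align_series(series)
```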

The association analysis unit 320 generates association patterns from the data received from the multiple time series input unit 310. It does so using a time lag technique and the AprioriAll association rule extraction algorithm. It passes the generated association patterns to the association determining unit 330.

Using the time lag technique, the association analysis unit 320 analyzes the patterns received from the multiple time series input unit 310 with respect to time. It builds a set of cases in which the time offsets between the received series differ; the number of cases depends on the number of series. Example cases include:

- combining the temperature feature set data and the KOSPI index feature set data for the same time window,
- shifting the time reference of the temperature feature set data back by one day,
- shifting the time reference of the KOSPI index feature set data back by one day.

It then applies the AprioriAll algorithm to extract the support, confidence, and association rule set of every association set.
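The lag cases described above can be enumerated as transaction lists: all series at the same time, then one series shifted back by one day, and so on. This is a sketch under the following assumptions:

- each series is an equal-length list of daily symbols,
- items are tagged with their series name,
- only one series is shifted per case.

The function names are hypothetical.

```python
def lagged_transactions(series, lead, lag):
    """One transaction per day t: series `lead` contributes its symbol from
    day t - lag, every other series contributes its symbol from day t."""
    n = len(next(iter(series.values())))
    transactions = []
    for t in range(lag, n):
        items = set()
        for name, symbols in series.items():
            day = t - lag if name == lead else t
            items.add(f"{name}:{symbols[day]}")
        transactions.append(frozenset(items))
    return transactions

def lag_cases(series, max_lag):
    """Same-time case plus, for each series in turn, shifts of 1..max_lag days."""
    cases = {(None, 0): lagged_transactions(series, None, 0)}
    for lead in series:
        for lag in range(1, max_lag + 1):
            cases[(lead, lag)] = lagged_transactions(series, lead, lag)
    return cases

series = {"temperature": ["A", "A", "B", "A"], "kospi": ["X", "Y", "Y", "X"]}
cases = lag_cases(series, max_lag=1)
```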

The AprioriAll algorithm computes the support of every itemset and outputs all association rules regardless of their support and confidence values. It is used so that the association determining unit 330 can work from unrestricted association rules. The conventional Apriori algorithm, by contrast, extracts only restricted association rule patterns.
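The behavior attributed to AprioriAll here can be illustrated by brute-force counting: support for every itemset, and every rule reported without minimum-support or minimum-confidence pruning. This is not the published level-wise Apriori/AprioriAll procedure, which prunes candidates with a minimum support threshold. It only reproduces the output the paragraph describes and is practical only for small transactions. The function name and sample items are hypothetical.

```python
from collections import Counter
from itertools import combinations

def all_rules(transactions):
    """Count support for every itemset that occurs, then emit every rule
    X -> Y (X, Y disjoint and non-empty, X | Y an observed itemset) with its
    (support, confidence), applying no minimum thresholds."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                counts[frozenset(subset)] += 1
    rules = {}
    for itemset, count in counts.items():
        if len(itemset) < 2:
            continue
        items = sorted(itemset)
        for k in range(1, len(items)):
            for lhs in combinations(items, k):
                lhs = frozenset(lhs)
                rules[(lhs, itemset - lhs)] = (count / n, count / counts[lhs])
    return rules

transactions = [
    frozenset({"temp:A", "kospi:Y"}),
    frozenset({"temp:A", "kospi:Y"}),
    frozenset({"temp:A", "kospi:X"}),
    frozenset({"temp:B", "kospi:X"}),
]
rules = all_rules(transactions)
```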

The association analysis unit 320 is described in detail with reference to FIG. 5.

The association analysis unit 320 sorts the feature pattern sets of the multiple time series, received from the multiple time series input unit 310, in chronological order. It lists them as sets of the same time window across the series (P01). It then generates one-day time-lag association rule patterns (P02) in two steps:

- applying a one-day time lag to extract the association rules and frequencies of the feature sets,
- extracting the support and confidence values from the occurrence frequencies.

In the same way, it generates association rule patterns with time lags of two days and five days (P03, P04).
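Steps P02 to P04 repeat the same rule extraction with lags of one, two, and five days. Below is a compact two-series sketch. It assumes daily symbol lists in ascending order and single-symbol rules of the form "leader on day t minus lag implies follower on day t". `lag_rules` and the sample symbols are hypothetical.

```python
from collections import Counter

def lag_rules(leader, follower, lag):
    """Rules 'leader symbol on day t - lag -> follower symbol on day t', each with
    (support, confidence), for two symbol lists in ascending daily order."""
    pairs = list(zip(leader[:len(leader) - lag], follower[lag:]))
    pair_counts = Counter(pairs)
    lhs_counts = Counter(a for a, _ in pairs)
    n = len(pairs)
    return {pair: (c / n, c / lhs_counts[pair[0]]) for pair, c in pair_counts.items()}

temperature = ["A", "B", "A", "B", "A", "B", "A", "B"]
kospi = ["X", "X", "Y", "X", "Y", "X", "Y", "X"]
# One rule table per lag, mirroring P02 (1 day), P03 (2 days), and P04 (5 days).
rules_by_lag = {lag: lag_rules(temperature, kospi, lag) for lag in (1, 2, 5)}
```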

The association determining unit 330 receives the association rule patterns generated by the association analysis unit 320. It determines which of them are meaningful as a prediction model.

Specifically, the association determining unit 330 combines two inputs:

- the pattern set weight values calculated by the feature pattern cluster generation unit 240,
- the association rule patterns, support values, and confidence values delivered by the association analysis unit 320.

From these, it selects the association rule patterns between the multiple time series that will finally be used as the prediction model. Equation 6 describes how these patterns are selected based on weighted support and weighted confidence.

Equation 6 defines the weighted support and the weighted confidence of each rule. A calculated weighted support is retained only if it is greater than or equal to the minimum support α; otherwise it is set to 0. Likewise, a weighted confidence is retained only if it is greater than or equal to the minimum confidence β; otherwise it is set to 0.
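The exact form of Equation 6 is not given here, so the weighting below is an assumption. Each rule's support and confidence are multiplied by the frequency weight of its antecedent pattern. Only the thresholding follows the description above: values are compared against α (minimum support) and β (minimum confidence), and sub-threshold values are set to 0. Names and sample numbers are hypothetical.

```python
def select_rules(rules, weights, min_support, min_confidence):
    """Hypothetical Equation 6 stand-in: scale support and confidence by the
    antecedent's frequency weight, zero out values below alpha/beta, and keep
    only rules whose weighted support and weighted confidence both survive."""
    selected = {}
    for (lhs, rhs), (support, confidence) in rules.items():
        w = weights.get(lhs, 0.0)
        wsup = w * support
        wconf = w * confidence
        wsup = wsup if wsup >= min_support else 0.0         # alpha threshold
        wconf = wconf if wconf >= min_confidence else 0.0   # beta threshold
        if wsup > 0 and wconf > 0:
            selected[(lhs, rhs)] = (wsup, wconf)
    return selected

rules = {("A", "X"): (0.5, 0.8), ("B", "Y"): (0.2, 0.9), ("A", "Y"): (0.3, 0.4)}
weights = {"A": 0.75, "B": 0.25}
selected = select_rules(rules, weights, min_support=0.1, min_confidence=0.5)
```

Rule B→Y fails the α test and rule A→Y fails the β test, so only A→X remains in the model.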

The prediction model generation unit 340 stores the association rule patterns determined by the association determining unit 330.

The trend prediction unit 350 compares the current feature pattern of the time series to be predicted against the association rule pattern prediction model. This model is stored by the prediction model generation unit 340. From the comparison, the unit predicts the trend.

The trend prediction unit 350 is described in detail with reference to FIG. 7.

A feature pattern 'K' of the time series data to be predicted is input to the trend prediction unit 350. The unit then classifies the multiple time series association rule patterns from the prediction model (P05). It predicts the future trend based on the multiple time series association rules associated with 'K' (P06).
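Steps P05 and P06 amount to a lookup: find the rules whose antecedent is the current pattern and rank their consequents. The sketch below assumes the model maps (antecedent, consequent) pairs to (weighted support, weighted confidence) and ranks by weighted confidence. The ranking criterion and the consequent labels are hypothetical.

```python
def predict_trend(model, current_pattern):
    """Collect rules whose antecedent equals the current pattern and return
    (consequent, weighted confidence) pairs, highest confidence first."""
    matches = [(rhs, wconf) for (lhs, rhs), (_, wconf) in model.items()
               if lhs == current_pattern]
    return sorted(matches, key=lambda m: m[1], reverse=True)

model = {
    ("K", "kospi:up"): (0.30, 0.70),
    ("K", "kospi:flat"): (0.10, 0.20),
    ("J", "kospi:down"): (0.25, 0.60),
}
ranked = predict_trend(model, "K")
```

An empty list means the model holds no rule for the current pattern, so no trend is predicted.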

The multiple time series data-based trend prediction method of the present invention, which uses the multiple feature extraction technique, is described with reference to FIG. 4. Detailed descriptions of parts that overlap with the description above are omitted.

First, several time series with a time attribute are selected for predictive analysis (S10).

Next, each selected series is input individually through the time series data input unit 210. The data are refined to standardize their value ranges (S20). For this standardization, the DWT conversion unit 221 removes outliers, and the normalization unit 222 normalizes the values to the range 0 to 1.
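The patent does not state which wavelet, decomposition level, or outlier rule the DWT conversion unit 221 uses. As a stand-in, the sketch below uses undecimated level-1 Haar detail coefficients, which are scaled differences of neighboring samples. A sample is treated as an isolated spike when the details on both sides exceed a threshold with opposite signs. It is then replaced by the mean of its neighbors, and the series is min-max normalized to [0, 1]. The threshold and replacement rule are assumptions.

```python
import math

def haar_details(x):
    """Undecimated level-1 Haar detail coefficients: (x[i] - x[i+1]) / sqrt(2)."""
    return [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(len(x) - 1)]

def refine(series, threshold):
    """Replace isolated spikes flagged by the Haar details, then min-max
    normalize the result into [0, 1]."""
    x = list(series)
    d = haar_details(series)
    for i in range(1, len(x) - 1):
        left, right = d[i - 1], d[i]
        # A spike at i makes both neighboring details large and of opposite sign.
        if abs(left) > threshold and abs(right) > threshold and left * right < 0:
            x[i] = (series[i - 1] + series[i + 1]) / 2
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)  # constant series: no range to normalize
    return [(v - lo) / (hi - lo) for v in x]

refined = refine([10.0, 11.0, 10.0, 60.0, 11.0, 10.0], threshold=5.0)
```

In this example the spike at 60.0 is replaced by 10.5 before normalization.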

Next, features representing the time series characteristics are extracted from the refined data (S20).

Specifically, the feature extraction unit 230 defines three time series features and extracts them for every series:

- a change-rate-and-scale feature,
- a volatility feature,
- a partial time-series shape feature.
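The three features are named but not defined numerically in this section. The sketch below assumes one window of normalized values and uses hypothetical definitions:

- net change per step for the change-rate-and-scale feature,
- the population standard deviation of step changes for volatility,
- the share of rising steps as a crude shape descriptor.

Together they form the three-dimensional vector passed to clustering. A window needs at least two values.

```python
import statistics

def window_features(window):
    """Hypothetical 3-D feature vector for one window of normalized values:
    (net change per step, volatility of step changes, share of rising steps)."""
    steps = [b - a for a, b in zip(window, window[1:])]
    rate_and_scale = (window[-1] - window[0]) / len(steps)
    volatility = statistics.pstdev(steps)
    shape = sum(s > 0 for s in steps) / len(steps)
    return (rate_and_scale, volatility, shape)

vec = window_features([0.0, 0.5, 0.25, 1.0])
```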

Next, the features obtained in the time series feature pattern extraction step (S20) are aggregated by time and frequency. They are then clustered into units of similar feature patterns (S30).

It is then checked whether feature pattern extraction and similar-feature pattern clustering have been completed for all of the selected time series (S40).

Once feature extraction and clustering are complete for all the time series, association rule patterns are classified using the AprioriAll algorithm (S50). The input is the individual time series, clustered into similar-feature pattern units.

Finally, association rule patterns with a high prediction probability are selected from the classified patterns using their weighted support and weighted confidence values. A prediction model is generated from the selected patterns (S60). Whenever an association rule pattern occurs in the multiple time series, the generated prediction model can then be used to attempt a trend prediction.

As described above, when individual time series are input, the present invention extracts and clusters the characteristic features of each series to define pattern sets. It then analyzes associations between the multiple time series through association rule patterns over those pattern sets. From this analysis, it can build a trend prediction model based on multiple time series data and use it to predict trends.

It will be apparent to those of ordinary skill in the art that many other modifications and applications are possible within the basic technical spirit and scope of the present invention.

100: Multiple time series data trend prediction system using multiple feature analysis
200: Individual time series data feature cluster unit
210: Time series data input unit
220: Data refining unit
230: Feature extraction unit
240: Feature pattern cluster generation unit
250: Cluster storage unit
300: Multiple time series association analysis prediction unit
310: Multiple time series input unit
320: Association analysis unit
330: Association determining unit
340: Model generation unit
350: Trend prediction unit

Claims

A multiple time series data trend prediction system using multiple feature analysis, comprising:
a time series data input unit that sequentially receives a plurality of time series data;
a data refining unit that removes the differences in absolute scale between the input time series data and standardizes them;
a feature extraction unit that defines and extracts three features common to the refined time series data (a change-rate-and-scale feature, a volatility feature, and a partial time-series shape feature);
a feature pattern cluster generation unit that clusters feature patterns by symbolizing sets of similar patterns and by calculating weights from two frequencies: how often the extracted features appear in the time series data and how often they appear in the feature patterns;
a cluster storage unit that stores the feature pattern cluster data by feature pattern generation time and by frequency;
a multiple time series input unit that receives the feature pattern cluster information of each time series stored in the cluster storage unit as multiple time series feature pattern cluster data;
an association analysis unit that infers association rule patterns from the data received from the multiple time series input unit using the AprioriAll association rule analysis algorithm;
an association determining unit that determines the association rule patterns to be used as a prediction model, using the association rule patterns inferred by the association analysis unit and the feature pattern frequency weight values in those patterns;
a prediction model generation unit that stores a prediction model combining the association rule patterns determined by the association determining unit; and
a trend prediction unit that predicts a trend using the association rule pattern prediction model stored in the prediction model generation unit.

The system according to claim 1,
wherein the data refining unit,
in order to reduce outliers among the time series data delivered to the time series data input unit and to ensure commonality across the time series, sequentially removes outliers using a discrete wavelet transform and normalizes the result to values between 0 and 1 using a normalization method, and delivers the normalized time series data to the feature extraction unit.

The system according to claim 1,
wherein the feature extraction unit
extracts time series characteristics by applying three feature extraction techniques, namely a change-rate-and-scale feature extraction technique, a volatility feature extraction technique, and a partial time-series shape extraction technique, each of which extracts a characteristic of the time attribute common to the refined time series data received from the data refining unit.

The system according to claim 1,
wherein the feature pattern cluster generation unit
reconstructs the features extracted by the feature extraction unit (the change-rate-and-scale feature, the volatility feature, and the partial time-series shape feature) into a three-dimensional vector; applies a complete graph clustering algorithm to perform clustering on a common time reference and to extract similar-pattern weight values through clustering on a frequency basis; and symbolizes the extracted sets of similar feature patterns to generate the final similarity clusters.

The system according to claim 1,
wherein the association analysis unit
receives, through the multiple time series input unit, the feature sets of the individual time series generated by the feature pattern cluster generation unit, and analyzes association rule patterns by combining association rule extraction that accounts for time differences using a time lag technique with the AprioriAll algorithm, which can extract all association rules between the time series together with their support and confidence values.

The system according to claim 1,
wherein the association determining unit
receives the multiple time series association rule patterns analyzed by the association analysis unit and the similar-feature pattern weight values extracted by the feature pattern cluster generation unit, calculates the weighted support and weighted confidence values of the time-lagged association rule patterns, and determines, based on those values, the multiple time series association rule patterns to be used as the prediction model.

The system according to claim 1,
wherein the trend prediction unit
predicts trends across the multiple time series using the association rule patterns of the prediction model determined by the association determining unit.

A method for predicting multiple time series data trends using multiple feature analysis, comprising:
selecting multiple time series data having a time attribute to be predicted;
receiving one time series of the selected multiple time series data; refining the data by removing outliers and standardizing using a DWT (discrete wavelet transform) technique and a normalization technique; and defining the characteristics of the time series by extracting three kinds of features, namely a change-rate-and-scale feature, a volatility feature, and a partial time-series shape feature;
clustering the defined features on a time basis and extracting frequency-based weights to obtain similar feature patterns;
checking whether feature pattern extraction and similar-feature pattern clustering have been completed for all of the multiple time series data;
receiving the completed similar-feature pattern cluster data of the multiple time series and analyzing association rule patterns using the time lag technique and the AprioriAll algorithm, which can analyze all association rule patterns;
determining association rule patterns according to their prediction probability, using the similar-feature weights extracted in the clustering step and the rule pattern data received from the association rule pattern analysis step, and generating the selected data as a prediction model; and
predicting a future trend using the generated prediction model when an associated pattern occurs.