KR101384047B1

KR101384047B1 - System and method for time series clustering using frequeny transform scheme

Info

Publication number: KR101384047B1
Application number: KR1020110070557A
Authority: KR
Inventors: 정윤영; 최재걸; 송아름
Original assignee: 네이버 주식회사
Priority date: 2011-07-15
Filing date: 2011-07-15
Publication date: 2014-04-10
Also published as: KR20110097741A

Abstract

Disclosed are a time series clustering system and method using a frequency transform technique. The time series clustering system may include a coefficient selector that selects frequency transform coefficients by applying a frequency transform technique to time series data for a keyword, a cluster generator that generates clusters from the time series data for the keyword using the frequency transform coefficients, and a pattern classifier that classifies each cluster into a time series pattern using characteristics of the cluster.

Description

System and method for time series clustering using frequency transform technique {SYSTEM AND METHOD FOR TIME SERIES CLUSTERING USING FREQUENY TRANSFORM SCHEME}

The present invention relates to a system and apparatus for clustering data that exhibit time series characteristics and, more particularly, to a system and method that cluster time series data for keywords using a frequency transform technique and classify the data according to time series patterns.

The keywords that users enter for searches vary over time. For example, searches for a summer-related keyword such as "vacation" may surge every summer, and searches for an autumn-related keyword such as "autumn leaf viewing" may surge every autumn. Thus, some keywords show a surge in search count in a particular season every year.

In addition, for a keyword such as the title of a movie released at a specific time, the search count may surge around the release date, and no such surge may appear afterward.

Advertisers and search engine administrators can use information on keywords that exhibit such time series characteristics to manage content and services more efficiently. Taking the time series characteristics of keywords into account can also strengthen marketing capabilities and improve pricing logic in keyword advertising.

Accordingly, there is a need for a way to make use of the time series characteristics of the many keywords that exhibit them.

The present invention can provide a time series clustering system and method that classify keywords according to time series patterns by clustering time series data for keywords, observed over time, using a frequency transform technique.

The present invention can provide a time series clustering system and method that identify the time series patterns of time series data more accurately by clustering time series data for keywords on the basis of wavelet analysis, which is well suited to representing discontinuous signals.

The present invention can provide a time series clustering system and method that classify time series data for keywords into time series patterns through a clustering process and provide pattern classification information, making it easy to identify user interest in keywords according to their time series patterns.

A time series clustering system according to an embodiment of the present invention may include a coefficient selector that selects frequency transform coefficients by applying a frequency transform technique to time series data for a keyword, a cluster generator that generates clusters from the time series data for the keyword using the frequency transform coefficients, and a pattern classifier that classifies each cluster into a time series pattern using characteristics of the cluster.

The time series clustering system according to an embodiment of the present invention may further include a classification information provider that provides pattern classification information of keywords for each preset theme.

A time series clustering method according to an embodiment of the present invention may include selecting frequency transform coefficients by applying a frequency transform technique to time series data for a keyword, generating clusters from the time series data for the keyword using the frequency transform coefficients, and classifying each cluster into a time series pattern using characteristics of the cluster.

The time series clustering method according to an embodiment of the present invention may further include providing pattern classification information of keywords for each preset theme.

According to an embodiment of the present invention, there are provided a time series clustering system and method that classify keywords according to time series patterns by clustering time series data for keywords, observed over time, using a frequency transform technique.

According to an embodiment of the present invention, there may be provided a time series clustering system and method that identify the time series patterns of time series data more accurately by clustering time series data for keywords on the basis of wavelet analysis, which is well suited to representing discontinuous signals.

According to an embodiment of the present invention, there may be provided a time series clustering system and method that classify time series data for keywords into time series patterns through a clustering process and provide pattern classification information, making it easy to identify user interest in keywords according to their time series patterns.

FIG. 1 is a diagram illustrating the overall process of extracting time series patterns using a time series clustering system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the overall configuration of a time series clustering system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of wavelet transform coefficients derived by applying a wavelet transform to time series data according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the overall process of classifying patterns from time series data for keywords according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of a process of selecting wavelet coefficients according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an example of clusters generated by grouping time series data according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a process of classifying clusters into patterns using a moving average method according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating an example of providing pattern classification information of keywords for each theme according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating the overall process of a time series clustering method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The present invention is not, however, restricted or limited by these embodiments. Like reference numerals in the drawings denote like elements. The time series clustering method according to an embodiment of the present invention may be performed by the components included in the time series clustering system.

FIG. 1 is a diagram illustrating the overall process of extracting time series patterns using a time series clustering system according to an embodiment of the present invention.

The time series clustering system 100 may analyze input time series data 101 to generate clusters and may classify each cluster into a time series pattern 103 using characteristics of the cluster. Here, time series data may refer to data recorded at regular intervals over time. For example, time series data may include a search engine's daily page views, click counts per service, or query counts for keywords. The time series clustering system 100 may perform time series analysis using the query count 102, which is time series data for a keyword.

The time series clustering system 100 according to an embodiment of the present invention may classify keywords that exhibit similar time series characteristics, based on time series data for the keywords, according to time series patterns. To do so, the time series clustering system 100 may decompose the time series data into signals by frequency and apply a mathematical function at each scale, thereby classifying the data into time series patterns. In particular, the time series clustering system 100 may apply a wavelet transform technique to classify the time series data for keywords into time series patterns.

FIG. 2 is a block diagram illustrating the overall configuration of a time series clustering system according to an embodiment of the present invention.

Referring to FIG. 2, the time series clustering system 100 may include a coefficient selector 201, a cluster generator 202, and a pattern classifier 203. The time series clustering system 100 may further include a classification information provider 204.

The coefficient selector 201 may select frequency transform coefficients by applying a frequency transform technique to time series data for a keyword.

Here, the time series data for a keyword may include query counts collected over time for the keyword. The coefficient selector 201 may select from among a plurality of wavelet coefficients extracted by applying a wavelet transform to the time series data for the keyword.

The coefficient selector 201 may select, for the time series data, frequency transform coefficients organized into scale bands according to time intervals. At step n, the scale band may be set to a time interval of 1/2^n (n = 1, 2, ..., k-1) of the total time interval (2^k) of the time series data.

The wavelet transform may refer to a set of filters having wavelet coefficients corresponding to a wavelet function. Specifically, the wavelet transform may refer to the process of obtaining wavelet coefficients by applying the wavelet function to each scale band. Combining the products of the wavelet coefficients and the wavelet function for a scale band yields the decomposed time series at that scale. The wavelet function can represent signals that are locally discontinuous in space. When the coefficient selector 201 applies the wavelet transform to the time series data, it may obtain as many wavelet coefficients as there are data points (2^k).
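As a minimal sketch of how a transform over 2^k samples yields 2^k coefficients, the following Python uses the Haar wavelet. The choice of Haar, the function name `haar_dwt`, and the sample query counts are assumptions for illustration; the text does not name a specific wavelet function.

```python
import math

def haar_dwt(series):
    """Full Haar decomposition of a series whose length is a power of two.

    Returns (approximation, details), where details[0] is the coarsest level
    (one coefficient) and each later level has twice as many coefficients.
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    approx = [float(x) for x in series]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details.insert(0, detail)  # coarser levels go to the front
    return approx[0], details

# Hypothetical daily query counts for one keyword (2^3 = 8 samples).
counts = [3, 5, 4, 4, 20, 22, 6, 4]
approx, details = haar_dwt(counts)
total_coefficients = 1 + sum(len(level) for level in details)
print(total_coefficients, [len(level) for level in details])
```

Under this assumed indexing, `details[n]` holds the 2^n coefficients of step n, each covering 1/2^n of the whole interval, and the single approximation value plus all detail levels add up to 2^k coefficients.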

The cluster generator 202 may generate clusters from the time series data for keywords using the frequency transform coefficients.

For example, the cluster generator 202 may generate clusters by clustering, by similar frequency band, the frequency transform coefficients derived by applying the frequency transform technique to the time series data.

The pattern classifier 203 may classify a cluster into a time series pattern using characteristics of the cluster.

Here, the pattern classifier 203 may classify the cluster into a time series pattern using the inflection characteristics of the cluster. For example, the pattern classifier 203 may classify the cluster into a time series pattern based on the positions and number of inflections that result from changes in the slope of the cluster.

Here, the pattern classifier 203 may classify a cluster into one of two time series patterns: multiple inflections or a one-time inflection. A cluster that fits neither pattern, because its inflections are irregular or its shape is steady, may be classified as other inflections.

For example, the pattern classifier 203 may classify as multiple inflections a cluster whose inflection period is in units of days, weeks, months, quarters, or years and whose number of inflections over the entire time interval is two or more. Various multiples of these units may apply (for example, two years or three months). For example, clusters of keywords associated with 1) sporting events held at preset yearly intervals (every year, every two years, every four years, and so on), such as the Olympics or the World Cup, 2) festivals, exhibitions, or fairs held at preset weekly, monthly, quarterly, or yearly intervals, such as expos or biennales, or 3) specific days that recur every year, such as public holidays or anniversaries, may be classified into the multiple-inflection time series pattern.

The pattern classifier 203 may classify as a one-time inflection a cluster that has exactly one inflection over the entire time interval. For example, clusters of keywords related to one-off content, such as a movie or drama that draws concentrated user interest at a specific point in time or during a specific time interval, may be classified into the one-time-inflection time series pattern. The present invention is not limited to the keyword cluster examples mentioned above, and various examples may apply depending on the configuration of the system.

The classification information provider 204 may provide pattern classification information of keywords for each preset theme. For example, the classification information provider 204 may provide, for each theme, pattern classification information that includes the types of keywords according to the cluster pattern and the query counts of those keywords.

FIG. 3 is a diagram illustrating an example of wavelet transform coefficients derived by applying a wavelet transform to time series data according to an embodiment of the present invention.

Referring to FIG. 3, a query count 301 for a keyword is shown. The query count 301 for the keyword may refer to the number of searches for the keyword contained in queries entered by users. For example, the query count 301 for the keyword may be recorded daily, weekly, monthly, quarterly, or yearly. The query count 301 shows that the number of searches for the keyword surges at a specific point in time.

FIG. 3 also shows scale bands 302 according to time intervals. The wavelet transform may refer to the process of obtaining wavelet coefficients by applying the wavelet function to each of the scale bands 302. At step n, the time interval of each scale band shrinks to 1/2^n of the total, and the number of scale bands grows to 2^n (n = 1, ..., k-1, total time interval = 2^k). Among the scale bands 302, the longer the time interval of a scale band, the lower the frequency component it represents.
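The relationship between step n, band length, and band count can be tabulated with a short sketch. The helper name `scale_bands` is hypothetical, and the example assumes a total interval of 2^10 = 1024 samples.

```python
def scale_bands(k):
    """List (step n, band length, number of bands) for n = 1 .. k-1,
    given a total time interval of 2**k samples."""
    total = 2 ** k
    return [(n, total // 2 ** n, 2 ** n) for n in range(1, k)]

bands = scale_bands(10)
for step, length, count in bands[:3]:
    print(f"step {step}: {count} bands of {length} samples")
```

Each step halves the band length and doubles the band count, so the product of the two always equals the total interval.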

As a result, when the wavelet transform is performed on the query count 301 for the keyword, wavelet coefficients 303 may be derived for each of the scale bands 302 according to time intervals. Multiplying the wavelet coefficients 303 of a scale band by the wavelet function and combining the products yields the decomposed time series data for that scale band.
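One way to picture the decomposed series is to rebuild the component that belongs to a single step. The sketch below assumes Haar wavelets (not specified in the text), under which each band's coefficient times the wavelet function reduces to plus or minus half the difference between the means of the band's two halves. The name `decomposed_component` and the four-sample series are hypothetical.

```python
def decomposed_component(series, step):
    """Part of the series carried by the 2**step Haar bands at one step."""
    band_len = len(series) // 2 ** step
    half = band_len // 2
    component = []
    for start in range(0, len(series), band_len):
        first = series[start:start + half]
        second = series[start + half:start + band_len]
        # Coefficient times Haar function: +amplitude on the first half of
        # the band, -amplitude on the second half.
        amplitude = (sum(first) / half - sum(second) / half) / 2
        component.extend([amplitude] * half + [-amplitude] * half)
    return component

series = [4, 2, 6, 0]
mean = sum(series) / len(series)
step0 = decomposed_component(series, 0)  # one band spanning the whole series
step1 = decomposed_component(series, 1)  # two bands of two samples each
rebuilt = [mean + a + b for a, b in zip(step0, step1)]
print(step1, rebuilt)
```

Step 0 (a single band over the whole interval) is included here only so that the overall mean plus all components adds back to the original series exactly.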

FIG. 4 is a diagram illustrating the overall process of classifying time series patterns from time series data for keywords according to an embodiment of the present invention.

Referring to FIG. 4, time series data 401 for keywords are shown. As examples of the time series data 401, time series data D₁ (401-1), D₂ (401-2), D₃ (401-3), D₄ (401-4), D₅ (401-5), and D₆ (401-6) are shown.

The time series clustering system 100 may select wavelet coefficients extracted by applying a wavelet transform to the time series data. The time series clustering system 100 may then generate clusters by clustering, by similar frequency band, the frequency transform coefficients derived by applying the frequency transform technique to the time series data. The wavelet coefficients may be organized into scale bands according to specific time intervals. At step n, the scale band may be set to a time interval of 1/2^n (n = 1, ..., k-1, total time interval = 2^k) of the entire time interval of the time series data 401. A cluster 402 may refer to a set of time series data 401 that exhibit similar time series characteristics.

Referring to FIG. 4, cluster C₁ (402-1) may be generated from time series data D₁ (401-1) and D₂ (401-2). Cluster C₂ (402-2) may be generated from time series data D₃ (401-3), D₄ (401-4), and D₅ (401-5). Cluster C₃ (402-3) may be generated from time series data D₆ (401-6).
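The grouping of D₁ through D₆ into three clusters can be illustrated with a small sketch. The text does not specify a clustering algorithm, so the greedy distance-threshold grouping below is a stand-in chosen for brevity; the function names, the threshold, and the coefficient vectors are all hypothetical.

```python
import math

def distance(u, v):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def group_by_similarity(features, threshold):
    """Greedy grouping: each series joins the first cluster whose seed vector
    lies within `threshold`; otherwise it starts a new cluster."""
    clusters = []  # list of (seed_vector, [series_name, ...])
    for name, vector in features.items():
        for seed, members in clusters:
            if distance(seed, vector) <= threshold:
                members.append(name)
                break
        else:
            clusters.append((vector, [name]))
    return [members for _, members in clusters]

# Hypothetical selected wavelet coefficients for six keyword series.
features = {
    "D1": [0.1, 0.0, 0.2],
    "D2": [0.0, 0.1, 0.1],
    "D3": [5.0, -5.0, 5.0],
    "D4": [5.1, -4.9, 5.2],
    "D5": [4.8, -5.2, 4.9],
    "D6": [-9.0, 0.0, 0.0],
}
clusters = group_by_similarity(features, threshold=1.0)
print(clusters)
```

Any method that groups series with similar coefficients, such as k-means or hierarchical clustering, could take the place of this greedy rule.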

The time series clustering system 100 may classify a cluster 402 into a time series pattern 403 using the inflection characteristics of the cluster 402. Specifically, the time series clustering system 100 may classify the cluster 402 into a time series pattern 403 according to the positions and number of inflections that result from changes in the slope of the cluster 402. Here, a portion of the cluster 402 where the slope increases and then decreases may be defined as an inflection.
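The definition of an inflection as a rise followed by a fall can be sketched with first differences. The function name `inflection_positions` and the sample series are hypothetical, and treating a flat stretch as part of the preceding rise is an assumption of this sketch.

```python
def inflection_positions(series):
    """Indices where the series stops rising and starts falling.

    For a flat-topped peak, the last index of the plateau is reported.
    """
    slopes = [b - a for a, b in zip(series, series[1:])]
    positions = []
    rising = False
    for i, slope in enumerate(slopes):
        if slope > 0:
            rising = True
        elif slope < 0:
            if rising:
                positions.append(i)  # series[i] is the top of the rise
            rising = False
    return positions

series = [0, 1, 3, 2, 0, 0, 4, 4, 1]
positions = inflection_positions(series)
print(positions)
```

The resulting list gives both the number of inflections (its length) and their positions, which are the two quantities the classification relies on.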

Referring to FIG. 4, cluster C₁ (402-1) may be classified into time series pattern P₁ (403-1), cluster C₂ (402-2) into time series pattern P₂ (403-2), and cluster C₃ (402-3) into time series pattern P₃ (403-3).

Here, time series pattern P₁ (403-1) may denote other inflections, time series pattern P₂ (403-2) may denote multiple inflections, and time series pattern P₃ (403-3) may denote a one-time inflection. The time series clustering system 100 may classify as multiple inflections a cluster whose inflection period is in units of days, weeks, months, quarters, or years and whose number of inflections over the entire time interval is two or more, and may classify as a one-time inflection a cluster that has exactly one inflection over the entire time interval. The remaining clusters, which are irregular or steady, may be classified as other inflections.

Multiple inflections may appear for keywords whose query counts concentrate at a recurring inflection period of days, weeks, months, quarters, or years. A one-time inflection may appear for keywords that are entered intensively at a specific point in time or during a specific period. Other inflections may appear for keywords that are not classified as multiple or one-time inflections (for example, irregular forms or forms that keep increasing or decreasing).

FIG. 5 is a diagram illustrating an example of a process of selecting wavelet coefficients according to an embodiment of the present invention.

The time series clustering system 100 may select frequency transform coefficients by applying a frequency transform technique to time series data for a keyword. For example, the time series clustering system 100 may select wavelet coefficients extracted by applying a wavelet transform. Specifically, the wavelet transform is the process of applying a wavelet function to the time series data to extract wavelet coefficients organized into scale bands according to time intervals.

The wavelet function may be expressed by Equation 1 below.

[Equation 1]

As Equation 1 shows, the wavelet function may be expressed as a combination of a pair of basis functions: the father wavelet function and the mother wavelet function. Here, j denotes the scale band, k denotes the time position, and a denotes a wavelet coefficient. The mother wavelet function can form a set of functions suited to various scales through dilation and translation. An arbitrary function can be expressed as a linear combination of the basis functions.
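For orientation, a standard textbook form of a wavelet expansion with the roles described above is sketched below. It is an assumption offered for readability and is not taken from Equation 1: the symbols $\phi$ (father wavelet), $\psi$ (mother wavelet), $f$ (arbitrary function), the starting scale $j_0$, and the superscripts that separate the two kinds of coefficient $a$ are all illustrative.

```latex
f(t) = \sum_{k} a^{\phi}_{j_0,k}\,\phi_{j_0,k}(t)
     + \sum_{j \ge j_0} \sum_{k} a^{\psi}_{j,k}\,\psi_{j,k}(t),
\qquad
\phi_{j,k}(t) = 2^{j/2}\,\phi\!\left(2^{j}t - k\right),
\quad
\psi_{j,k}(t) = 2^{j/2}\,\psi\!\left(2^{j}t - k\right)
```

In this form, j indexes the scale band and k the time position, and the factor $2^{j}$ inside each basis function is the dilation while the shift by k is the translation.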

Referring to FIG. 5, wavelet coefficients organized into scale bands according to time intervals are shown. With each additional step, the number of scale bands doubles while the time interval of each scale band is halved. Because time series analysis may become harder as the time interval of a scale band shrinks, the time series clustering system 100 may select the wavelet coefficients (W₁, W₂) that represent the scale bands up to step 2, as shown in FIG. 5. If the entire time interval is 1024 days, a step-1 scale band represents a time interval of 512 days, and a step-2 scale band represents a time interval of 256 days.
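Selecting only the step-1 and step-2 coefficients from a 1024-day series can be sketched as follows. The sketch assumes Haar wavelets, for which the coefficient of a band is the sum of its first half minus the sum of its second half, divided by the square root of the band length. The names `band_coefficients` and `select_features` and the synthetic series are hypothetical.

```python
import math

def band_coefficients(series, step):
    """Haar coefficients for the 2**step scale bands at the given step.
    Assumes len(series) is a power of two."""
    band_len = len(series) // 2 ** step
    half = band_len // 2
    coeffs = []
    for start in range(0, len(series), band_len):
        first = sum(series[start:start + half])
        second = sum(series[start + half:start + band_len])
        coeffs.append((first - second) / math.sqrt(band_len))
    return coeffs

def select_features(series, max_step=2):
    """Concatenate the coefficients of steps 1..max_step (W1, W2 by default)."""
    features = []
    for step in range(1, max_step + 1):
        features.extend(band_coefficients(series, step))
    return features

# Synthetic 1024-day query counts with a 30-day burst starting on day 300.
series = [0] * 1024
for day in range(300, 330):
    series[day] = 100

features = select_features(series)
print(len(features), features)
```

The result is a six-value feature vector (two 512-day bands plus four 256-day bands) regardless of how many of the 1024 daily values differ, which is what makes the coarse coefficients convenient inputs for clustering.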

In this case, when the wavelet coefficients representing the step-1 scale bands are selected for keyword time series data spanning 1024 days in total, the time series clustering system 100 may generate clusters by clustering, by similar frequency band, the frequency transform coefficients that correspond to the 512-day time intervals within the 1024 days of time series data.

FIG. 6 is a diagram illustrating an example of clusters generated by grouping time series data according to an embodiment of the present invention.

Cluster 601 shows a form in which the keyword search count surges at a specific time period. The specific time period may be yearly. For example, the keyword "ski" is searched intensively every winter within the 1024 days, so it may exhibit time series characteristics like those of cluster 601.

Cluster 602 shows keyword search counts that are irregular or steady regardless of time. Keywords unrelated to time may exhibit time series characteristics like those of cluster 602.

Cluster 603 shows a form in which the keyword search count surges at a specific point in time within the entire period. For example, keywords about one-off events (movies, dramas, sports matches, and so on) may exhibit time series characteristics like those of cluster 603.

도 7은 본 발명의 일실시예에 따라 이동 평균 방법을 이용하여 클러스터를 패턴으로 분류하는 과정을 설명하기 위한 도면이다.7 is a diagram illustrating a process of classifying clusters into patterns using a moving average method according to an embodiment of the present invention.

FIG. 7 shows original data 701 and smoothed data 702 of the time series data included in a cluster. For example, the time series clustering system 100 may smooth the original data 701 using a moving average method. Here, the moving average method may mean taking the average of n observations adjacent to a reference time point.
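As a non-limiting illustration, the moving average smoothing described above may be sketched as follows. The function name `moving_average`, the centered window, the restriction to odd n, and the truncation of the window at the edges are assumptions made for this sketch and are not specified in the description.

```python
def moving_average(values, n):
    """Centered moving average over a window of n observations (n odd).

    Near the edges, the window is truncated to the observations available.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    half = n // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)                # first index inside the window
        hi = min(len(values), i + half + 1)  # one past the last index
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical original data (701) and the smoothed data (702) derived from it.
original = [1, 2, 3, 4, 5]
smoothed = moving_average(original, 3)
```

A larger n suppresses more day-to-day noise but also flattens short surges, so the window would need to be chosen with the shortest inflection period of interest in mind.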

Then, the time series clustering system 100 may extract inflections by considering changes in the slope of the smoothed data 702, and may classify the cluster as either multiple inflection or single inflection according to the inflection positions and the number of inflections. The time series clustering system 100 may classify as multiple inflection a cluster whose inflection period is in units of days, weeks, months, quarters, or years and whose number of inflections during the entire time interval is two or more. The time series clustering system 100 may classify as single inflection a cluster whose number of inflections during the entire time interval is one. A cluster that belongs to neither multiple inflection nor single inflection may be classified as other inflection.
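As a non-limiting illustration, the classification by inflection count may be sketched as follows. The sketch assumes that an inflection is a point at which the slope of the smoothed data turns from positive to non-positive, and it approximates the periodicity condition by requiring roughly equal spacing between inflections. The names `find_inflections` and `classify_cluster` and the parameters `min_height` and `tolerance` are hypothetical and do not appear in the description.

```python
def find_inflections(smoothed, min_height):
    """Indices where the slope turns from positive to non-positive.

    Only points at or above min_height are kept (assumed noise filter).
    """
    inflections = []
    for i in range(1, len(smoothed) - 1):
        rising = smoothed[i] - smoothed[i - 1] > 0
        falling = smoothed[i + 1] - smoothed[i] <= 0
        if rising and falling and smoothed[i] >= min_height:
            inflections.append(i)
    return inflections


def classify_cluster(smoothed, min_height, tolerance=0.2):
    """Label a smoothed series as multiple, single, or other inflection."""
    inflections = find_inflections(smoothed, min_height)
    if len(inflections) == 1:
        return "single inflection"
    if len(inflections) >= 2:
        gaps = [b - a for a, b in zip(inflections, inflections[1:])]
        mean_gap = sum(gaps) / len(gaps)
        # Assumed periodicity test: every gap is close to the mean gap.
        if all(abs(g - mean_gap) <= tolerance * mean_gap for g in gaps):
            return "multiple inflection"
    return "other inflection"


# Hypothetical smoothed series.
periodic = [0, 1, 5, 1, 0, 0, 1, 5, 1, 0, 0, 1, 5, 1, 0]
one_time = [0, 0, 1, 6, 1, 0, 0, 0]
steady = [2] * 10
```

In this sketch a series with two or more irregularly spaced inflections falls into other inflection, which is one possible reading of the remainder category.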

FIG. 8 is a diagram for explaining an example of providing pattern classification information of keywords for each theme according to an embodiment of the present invention.

The time series clustering system 100 may provide pattern classification information of keywords for each preset theme. For example, the time series clustering system 100 may provide, for each theme, pattern classification information that includes the types of keywords and the query counts of the keywords according to the time series pattern of the cluster.

Referring to FIG. 8, a table 801 showing quantitative indicators of the pattern classification information of keywords for each theme is illustrated. Specifically, table 801 shows that, when the theme is "festival", the number of keywords classified as multiple inflection, because inflections occur during the entire time interval with an inflection period in units of years (every year, every two years, every four years, etc.), is A5 (X1% of the total), the number of keywords classified as single inflection because an inflection occurs once is Y1, and the number of keywords exhibiting irregular or steady time series characteristics is Z1.
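As a non-limiting illustration, the per-theme indicators of a table such as table 801 may be tallied as follows. The record layout (theme, keyword, pattern), the function name `summarize_by_theme`, and the sample keywords are hypothetical. The percentage is assumed to be the share of a pattern among all keywords of the same theme.

```python
from collections import Counter, defaultdict


def summarize_by_theme(records):
    """Count keywords per (theme, pattern) and compute each pattern's share.

    records: iterable of (theme, keyword, pattern) tuples.
    Returns {theme: {pattern: (count, percent_of_theme)}}.
    """
    counts = defaultdict(Counter)
    for theme, _keyword, pattern in records:
        counts[theme][pattern] += 1

    summary = {}
    for theme, pattern_counts in counts.items():
        total = sum(pattern_counts.values())
        summary[theme] = {
            pattern: (n, 100.0 * n / total)
            for pattern, n in pattern_counts.items()
        }
    return summary


# Hypothetical classification results for two themes.
records = [
    ("festival", "cherry blossom festival", "multiple inflection"),
    ("festival", "strawberry festival", "multiple inflection"),
    ("festival", "opening ceremony", "single inflection"),
    ("festival", "festival schedule", "other inflection"),
    ("mountain", "autumn leaves", "multiple inflection"),
]
summary = summarize_by_theme(records)
```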

In particular, for informational theme keywords that are entered intensively according to season, such as "festival", "mountain", and "cooking", the time series clustering system 100 may provide a seasonal distribution as shown in graph 802. In addition, the time series clustering system 100 may provide pattern classification information that includes, for each season, the types of representative keywords and the query counts (QC) of those keywords. For example, when the theme is "festival", pattern classification information such as cherry blossom festival (QC: aaaaa) and Nonsan Strawberry Festival (QC: bbbbb) may be provided for spring.

Such pattern classification information makes it possible to identify the types of keywords that users prefer in each season. By identifying these keywords, user-targeted advertisements can be provided at an appropriate time in each season, which may improve the advertising effect. Alternatively, an advertiser may perform user-targeted advertising using the time series information (keyword clusters) provided for each specific theme.

FIG. 9 is a flowchart illustrating the overall process of a time series clustering method according to an embodiment of the present invention.

In step S910, the time series clustering system 100 may select frequency transform coefficients by applying a frequency transform technique to time series data for a keyword. The time series data for a keyword may include query counts collected over time for the keyword.

For example, the time series clustering system 100 may select wavelet coefficients extracted by applying a wavelet transform to the time series data for a keyword. The wavelet transform may mean a set of filters having wavelet coefficients corresponding to a wavelet function. Specifically, the wavelet transform may mean a process of obtaining wavelet coefficients by applying the wavelet function to each scale band.
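As a non-limiting illustration, the extraction of wavelet coefficients may be sketched with a multi-level Haar decomposition. The Haar wavelet is assumed here purely for illustration, since this passage does not name a particular wavelet function, and the function name `haar_decompose` and the synthetic query counts are hypothetical.

```python
import math


def haar_decompose(signal, levels):
    """Multi-level Haar wavelet decomposition.

    Returns (approximation, details), where details[0] holds the
    coefficients of the first (finest) level.
    """
    approx = list(signal)
    details = []
    root2 = math.sqrt(2.0)
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2 != 0:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # Detail = scaled difference of each pair, approximation = scaled sum.
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return approx, details


# Synthetic daily query counts for one keyword over 1024 days.
query_counts = [100 + 50 * math.sin(2 * math.pi * day / 365) for day in range(1024)]
approx, details = haar_decompose(query_counts, levels=3)
```

Under this assumption, each level halves the number of coefficients, so a 1024-day series yields 512 first-level detail coefficients.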

In this case, the time series clustering system 100 may select, for the time series data, frequency transform coefficients organized into scale bands according to time interval. Here, according to level n, a scale band may be set to a time interval of 1/2ⁿ of the entire time interval of the time series data (n = 1, ..., k-1, where the total time = 2^k). Conversely, the number of scale bands may increase to 2ⁿ at level n.
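As a non-limiting illustration, the relationship between the level n, the length of a scale band, and the number of scale bands may be computed as follows. The function name `scale_bands` is hypothetical. The sketch assumes that the total time is an exact power of two, as in the 1024-day example.

```python
def scale_bands(total_days, level):
    """Return (band_length_in_days, number_of_bands) for a given level n.

    Assumes total_days == 2**k and 1 <= level <= k - 1.
    """
    k = total_days.bit_length() - 1
    if total_days != 2 ** k:
        raise ValueError("total_days must be a power of two")
    if not 1 <= level <= k - 1:
        raise ValueError("level must satisfy 1 <= level <= k - 1")
    band_length = total_days // 2 ** level  # 1/2**n of the whole interval
    band_count = 2 ** level                 # number of bands doubles per level
    return band_length, band_count


# For 1024 days (k = 10), level 1 gives 512-day bands.
first_level = scale_bands(1024, 1)
```

At every level the band length multiplied by the number of bands equals the total time, so the bands always cover the whole interval.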

In step S920, the time series clustering system 100 may generate clusters from the time series data for keywords using the frequency transform coefficients. Here, the frequency transform coefficients may be the wavelet coefficients determined through the wavelet transform.

The time series clustering system 100 may generate clusters by clustering, by similar frequency band, the frequency transform coefficients derived by applying the frequency transform technique to the time series data.
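As a non-limiting illustration, grouping keywords by the similarity of their frequency transform coefficients may be sketched with k-means clustering. This passage does not name a clustering algorithm, so k-means, the Euclidean distance, the deterministic initialization, and the sample coefficient vectors are all assumptions of this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors serve as initial centroids.

    Returns one cluster label per input vector.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical wavelet coefficient vectors, one per keyword.
keywords = ["ski", "weather", "snowboard", "news"]
coefficients = [
    [5.0, -5.0, 5.0, -5.0],   # strong periodic swing
    [0.1, 0.0, -0.1, 0.1],    # nearly flat
    [4.8, -5.1, 5.2, -4.9],   # strong periodic swing
    [0.0, 0.1, 0.0, -0.1],    # nearly flat
]
labels = kmeans(coefficients, k=2)
clusters = {}
for keyword, label in zip(keywords, labels):
    clusters.setdefault(label, []).append(keyword)
```

Clustering on a fixed number of coefficients rather than on the raw daily counts keeps the vectors short, which is one practical motivation for selecting coefficients before grouping.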

In step S930, the time series clustering system 100 may classify the clusters into time series patterns using characteristics of the clusters. For example, the time series clustering system 100 may classify a cluster into a time series pattern using the inflection characteristics of the cluster. Specifically, the time series clustering system 100 may classify the cluster into a time series pattern based on the inflection positions and the number of inflections that result from changes in the slope of the cluster.

For example, the time series clustering system 100 may classify the cluster into a time series pattern of either multiple inflection or single inflection. Here, the time series clustering system 100 may classify as multiple inflection a cluster whose inflection period is in units of days, weeks, months, quarters, or years and whose number of inflections during the entire time interval is two or more, and may classify as single inflection a cluster whose number of inflections during the entire time interval is one.

Multiple inflection may appear in keywords that are entered intensively at an inflection period in units of days, weeks, months, quarters, or years. Single inflection may appear in keywords that are entered intensively at one specific point in time. Other inflection may appear in keywords that are classified as neither multiple inflection nor single inflection (irregular, or continuously increasing or decreasing, forms).

In step S940, the time series clustering system 100 may provide pattern classification information of keywords for each preset theme. For example, the time series clustering system 100 may provide, for each theme, pattern classification information that includes the types of keywords and the query counts of the keywords according to the time series pattern of the cluster. Such pattern classification information makes it possible to identify the types of keywords that users prefer in each season.

For specific details not described with reference to FIG. 9, reference may be made to the descriptions of FIGS. 1 to 8.

In addition, the time series clustering method according to an embodiment of the present invention includes a computer-readable medium containing program instructions for performing operations implemented by various computers. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the present invention has been described above with reference to limited embodiments and drawings, the present invention is not limited to those embodiments, and a person having ordinary skill in the art to which the present invention pertains may make various modifications and variations from this description. Accordingly, the idea of the present invention should be understood only by the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the idea of the present invention.

100: time series clustering system
101: time series data
102: time series data (keyword QC)
103: time series pattern

Claims

A time series clustering system comprising:
a coefficient selector that selects frequency transform coefficients extracted by applying a frequency transform technique to time series data for a keyword;
a cluster generator that generates clusters by grouping the selected frequency transform coefficients of each keyword according to similarity; and
a pattern classifier that classifies the clusters into time series patterns using characteristics of the clusters.

A time series clustering method comprising:
selecting frequency transform coefficients extracted by applying a frequency transform technique to time series data for a keyword;
generating clusters by grouping the selected frequency transform coefficients of each keyword according to similarity; and
classifying the clusters into time series patterns using characteristics of the clusters.