KR101132450B1

KR101132450B1 - Realtime rush keyword and adaptive system

Info

Publication number: KR101132450B1
Application number: KR1020070122342A
Authority: KR
Inventors: 최재걸; 김동욱; 박영광
Original assignee: NHN Corp. (엔에이치엔(주))
Priority date: 2007-11-28
Filing date: 2007-11-28
Publication date: 2012-03-30
Also published as: KR20070117526A

Abstract

The present invention relates to a real-time rising keyword extraction method and system. The method and system predict each keyword's search count at a future time point, taking into account the temporal characteristics of keyword input tendency, and use the predicted (estimated) search counts to detect rising keywords in real time. According to the present invention, time-series data are generated for each keyword from its current input tendency considered over time. The generated time-series data are then used to predict, in real time, each keyword's estimated search count at a future time point.

Rising keyword, keyword input tendency, keyword, estimated search count, exponential smoothing

Description

Real-time rising keyword extraction method and real-time rising keyword extraction system {REALTIME RUSH KEYWORD AND ADAPTIVE SYSTEM}

FIG. 1 is a diagram illustrating the network connections of a real-time rising keyword extraction system according to an embodiment of the present invention.

FIG. 2 is a block diagram of a real-time rising keyword extraction system according to an embodiment of the present invention.

FIG. 3 is a diagram comparing the estimated search count predicted according to the present invention with the actually counted search count.

FIG. 4 is a diagram explaining the overall algorithm for extracting rising keywords according to the present invention.

FIG. 5 is a diagram illustrating an example of predicting each keyword's estimated search count and extracting rising keywords according to the present invention.

FIG. 6 is a flowchart illustrating in detail a real-time rising keyword extraction method according to an embodiment of the present invention.

FIG. 7 is a flowchart illustrating an example of calculating each keyword's estimated search count according to the present invention.

FIG. 8 is a flowchart illustrating an example of detecting rising keywords according to the present invention.

<Explanation of symbols for the main parts of the drawings>

200: Real-time rising keyword extraction system

210: Storage space 220: Log identification means

230: Counting means 240: Time-series generating means

250: Count estimation means 260: Keyword detection means

The present invention relates to a real-time rising keyword extraction method and system. They predict each keyword's search count at a future time point, taking into account the temporal characteristics of keyword input tendency, and detect rising keywords in real time using the predicted estimated search counts.

When a searcher enters a keyword, a search server providing a search service returns search results for that keyword to the searcher. Examples include summary information on websites that contain the keyword, articles that contain the keyword, and file names that contain the keyword.

A search service can receive a great variety of keywords. In practice, however, when keywords are ranked by input count, the top thousand or so account for a very large share of all searches, and the remaining keywords account for a very small share. A customization service that concentrates on providing results for a small number of high-volume top keywords can therefore offer searchers a higher level of search service.

Analyzing these top keywords, however, reveals two kinds:

1) keywords that rank near the top at all times (for example, "game", "go-stop", "celebrity");
2) keywords that enter the top ranks because their ranking rises sharply during a particular period.

For case 1), most Internet search service providers already offer searchers information optimized in their own way.

For case 2), however, existing systems cannot adapt the reference data used to judge rising keywords to the current situation. As a result, they fail to identify reliably the keywords whose search counts are rising sharply.

In particular, existing systems have often misidentified keywords with strong temporal characteristics as rising keywords. Take the keyword 'OO Bank': searches for it surge around the start of banking hours, so an existing system flags it as a rising keyword every morning. A keyword whose search count jumps only in the same time slot every day should not be judged a rising keyword.

There has therefore been an urgent need for a new technique with two properties. It should generate reference data that reflect the real-time situation and temporal factors. It should also adapt the rising-keyword decision logic as the reference data change, so that rising keywords are detected from the current keyword input tendency in real time.

The present invention has been devised to solve the above problems. An object of the present invention is to provide a real-time rising keyword extraction method and system that:

- generate time-series data for each keyword by considering the keyword's current input tendency over time; and
- use the generated time-series data to predict, in real time, each keyword's estimated search count at a future time point.

Another object of the present invention is to provide a real-time rising keyword extraction method and system that use the predicted estimated search counts to extract rising keywords. The extraction accounts for changes in search counts within a given time slot, which allows the input tendency for keywords to be monitored accurately.

A further object of the present invention is to provide a real-time rising keyword extraction method and system that depart from the conventional approach, in which rising keywords are judged only after keyword input tendency has been aggregated over a fixed period. Instead, they detect rising keywords by reflecting the current keyword input tendency in real time.

A still further object of the present invention is to provide a real-time rising keyword extraction method and system that quickly detect keywords whose input counts from searchers are surging. This increases both the number of searches and searchers' connection time, so that Internet search service providers can increase their operating profit.

To achieve the above objects, a real-time rising keyword extraction method according to the present invention includes the following steps:

1. recording log information collected from a search server in a storage space corresponding to its collection date and collection time;
2. for each collection time from collection time t back to an earlier collection time T, identifying the storage spaces corresponding to the collection dates from collection date n back to an earlier collection date N;
3. analyzing the log information recorded in the identified storage spaces and counting the searches for each keyword per collection time, which yields a time-series search count for each keyword;
4. calculating an estimated search count for each keyword at a future time point using the generated time-series search counts; and
5. detecting rising keywords using the calculated estimated search counts.

In addition, as a technical configuration for achieving the above objects, a real-time rising keyword extraction system according to the present invention includes the following components:

- a storage space that records log information collected from a search server according to its collection date and collection time;
- log identification means that, for each collection time from collection time t back to an earlier collection time T, identifies the storage spaces corresponding to the collection dates from collection date n back to an earlier collection date N;
- counting means that analyzes the log information recorded in each identified storage space and counts the searches for each keyword per collection time;
- time-series generating means that generates a time-series search count for each keyword from the counted search counts;
- count estimation means that calculates an estimated search count for each keyword at a future time point using the generated time-series search counts; and
- keyword detection means that detects rising keywords using the calculated estimated search counts.

Hereinafter, the real-time rising keyword extraction method and system are described with reference to the accompanying drawings.

The term "rising keyword", used throughout this specification, refers to a keyword entered into a search server whose search count has increased sharply at the present time compared with its usual search count. Rising keywords make searchers who access the search server aware of other searchers' keyword input tendency. In this way they show which keywords are attracting far more interest than usual.

In particular, this specification estimates each keyword's search count at a future time point from reference data, namely time-series data selected with both date and time of day taken into account. Keywords whose search counts have increased sharply relative to the predicted counts are then extracted as rising keywords.

In this way, the present embodiment analyzes search patterns not only by date but also by specific time of day. It then accurately identifies keywords whose search counts increase sharply in a pattern different from their usual one.

In addition, presenting rising keywords to searchers can prompt them to try a rising keyword at least once. This increases the number of searches and the time spent searching, and thereby generates more business profit.

The real-time rising keyword extraction system 100 detects, as rising keywords, keywords whose search counts are judged to have increased sharply relative to their usual search patterns. It ranks the detected rising keywords by degree of rise (for example, the magnitude of a rising index) and provides them to each search server 110.

In particular, the real-time rising keyword extraction system 100 predicts future search counts from keyword input tendency on specific dates and at specific times. It reflects the predicted search counts in real time so that rising keywords are detected quickly and accurately.

First, the search server 110 may be a search program, a search engine, or the like. It supports a search service so that the searcher 120 can easily reach websites holding the content the searcher is looking for. In response to a search request from the searcher 120, the search server 110 provides brief information about content providers (CPs) that can supply the requested information. This saves time spent searching for content and improves the accuracy of the retrieved material.

A search request may be generated when the searcher 120, connected to the search server 110, enters a search keyword in a search box. The search server 110 may then retrieve brief information about the content providers (CPs) corresponding to the entered keyword and provide it to the searcher 120.

In particular, the search server 110 may generate and maintain log information as a history record of the keywords entered by the searcher 120. The log information may include the search keyword entered by the searcher 120, the time at which it was entered, and the like. The search server 110 may generate or update the log information each time a search keyword is entered.

The search server 110 records the generated log information in a memory means (not shown). The real-time rising keyword extraction system 100 of the present invention reads this log information at predetermined intervals. After each read, the read log information may be deleted from the memory means to make room for log information generated afterwards.
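The log buffering described above can be sketched in Python as follows. This is an illustration only and assumes an in-memory list stands in for the memory means. The class and method names (`LogRecord`, `SearchLogBuffer`, `record`, `drain`) are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class LogRecord:
    """One search event: the keyword entered and when it was entered."""
    keyword: str
    timestamp: datetime


class SearchLogBuffer:
    """Hypothetical in-memory stand-in for the search server's memory means."""

    def __init__(self) -> None:
        self._records: List[LogRecord] = []

    def record(self, keyword: str, timestamp: datetime) -> None:
        # Called by the search server each time a search keyword is entered.
        self._records.append(LogRecord(keyword, timestamp))

    def drain(self) -> List[LogRecord]:
        # Called by the extraction system at each read interval: return the
        # buffered records and clear the buffer for logs created afterwards.
        records, self._records = self._records, []
        return records
```

Draining on read mirrors the "read, then delete" behavior in the text. A production system would more likely use a persistent queue.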

The searcher 120 is an Internet user who has a user terminal 130 for connecting to the search server 110. The searcher generates a search request for the websites of content providers (CPs) holding the desired content by entering a search keyword in the search box of the search server 110. The searcher 120 may also receive, through the search server 110, the list of rising keywords provided by the real-time rising keyword extraction system 100. This tells the searcher which keywords are gaining popularity much faster than usual.

The user terminal 130 stays connected to the real-time rising keyword extraction system 100 through a communication network 140 such as the Internet. It may visualize the keywords that the system 100 has detected as rising keywords and present them to the searcher 120.

The real-time rising keyword extraction system 100 continuously collects log information from one or more search servers 110 at predetermined intervals. It processes and analyzes the collected log information to predict future estimated search counts. It then uses the predicted estimated search counts to extract, as rising keywords, keywords whose search counts are about to increase sharply.

Hereinafter, the detailed configuration of the real-time rising keyword extraction system 200 of the present invention is described with reference to FIG. 2.

The real-time rising keyword extraction system 200 of the present invention may include a storage space 210, log identification means 220, counting means 230, time-series generating means 240, count estimation means 250, and keyword detection means 260.

The storage space 210 records the log information collected from the search server 110 according to its collection date and collection time. The real-time rising keyword extraction system 200 reads log information from the search server 110 at predetermined intervals. It records and keeps the read log information in the storage space 210 that corresponds to the read time (collection date and collection time).

For example, log information read at 15:00:00 on March 5 may be recorded in the storage space 210 corresponding to the collection date 'March 5' and the collection time '15:00:00'. A storage space may be a logical or physical recording means.

The read interval can be set flexibly by the operator of the system. For example, it may be set to the smallest permitted interval so that rising keywords are detected more accurately.

When recording log information in the storage space 210, the real-time rising keyword extraction system 200 may sort the read log information by date or by time slot. It then records the information in the storage space 210 corresponding to that date and time. This makes it easier to identify the storage space 210 corresponding to a specific time.
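A minimal sketch of the storage spaces follows. It assumes each storage space is keyed by a (collection date, collection time) pair and holds the keywords read at that time. It also assumes reads occur on a fixed 5-second grid, matching the 5 sec collection interval in the sliding-window example. `LogStore` and `bucket_key` are hypothetical names.

```python
from collections import defaultdict
from datetime import date, datetime, time
from typing import DefaultDict, List, Tuple

# Hypothetical key for one storage space: (collection date, collection time).
BucketKey = Tuple[date, time]


def bucket_key(collected_at: datetime, interval_sec: int = 5) -> BucketKey:
    """Map a read (collection) instant to its storage-space key.

    Assumes reads happen on a fixed grid of `interval_sec` seconds; the
    timestamp is floored onto that grid so a slightly late read still lands
    in the intended slot.
    """
    secs = collected_at.hour * 3600 + collected_at.minute * 60 + collected_at.second
    secs -= secs % interval_sec
    slot = time(secs // 3600, (secs % 3600) // 60, secs % 60)
    return collected_at.date(), slot


class LogStore:
    """Hypothetical storage spaces: keywords grouped by (date, time) key."""

    def __init__(self, interval_sec: int = 5) -> None:
        self.interval_sec = interval_sec
        self.spaces: DefaultDict[BucketKey, List[str]] = defaultdict(list)

    def write(self, collected_at: datetime, keywords: List[str]) -> BucketKey:
        # Record the keywords read at `collected_at` in the matching space.
        key = bucket_key(collected_at, self.interval_sec)
        self.spaces[key].extend(keywords)
        return key
```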

For each collection time from collection time t back to an earlier collection time T, the log identification means 220 identifies the storage spaces 210 corresponding to the collection dates from collection date n back to an earlier collection date N. In other words, it selects the log information to analyze in order to determine per-date search counts for a specific time.

To do this, the log identification means 220 may use a sliding window to identify the storage spaces 210 associated with a specific time slot. A sliding window shifts the analysis target as time passes. It keeps a time width set by the operator of the system and identifies the storage spaces 210 for the times that fall within that width.

For example, suppose the storage spaces 210 record log information for collection times at 5 sec intervals from 15:00:00 to 15:02:00. Suppose also that a sliding window with a time width of 30 sec is used to identify the analysis target. Relative to the reference time 15:02:00, the log identification means 220 then identifies the storage spaces holding log information generated within the last 30 sec, that is, the storage spaces 210 for 15:02:00, 15:01:55, and so on through 15:01:30.

The log identification means 220 may also identify the storage spaces 210 for the same collection times on dates set relative to a given reference date. In the above example, the reference date is March 5. The log identification means 220 may then identify the storage spaces 210 for the two preceding days, March 4 and March 3, at each of the identified collection times 15:02:00, 15:01:55 through 15:01:30.

Accordingly, the log identification means 220 identifies the storage spaces for the collection times 15:02:00, 15:01:55 through 15:01:30 on each of the collection dates March 5, March 4, and March 3. These storage spaces determine the log information to be analyzed.
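The identification in this example amounts to enumerating (date, time) keys: every 5-second collection time from the reference time back 30 seconds, on the reference date and the two preceding dates. Below is a sketch with the window width, step, and number of days as parameters. These parameter names and the function `window_keys` are hypothetical.

```python
from datetime import date, datetime, time, timedelta
from typing import List, Tuple


def window_keys(ref: datetime, width_sec: int = 30, step_sec: int = 5,
                days_back: int = 2) -> List[Tuple[date, time]]:
    """Hypothetical sketch: storage-space keys covered by the sliding window.

    Times: ref, ref - step, ..., ref - width (inclusive), i.e. t back to T.
    Days: the reference date and the `days_back` days before it (n back to N).
    """
    keys: List[Tuple[date, time]] = []
    for d in range(days_back + 1):
        day_ref = ref - timedelta(days=d)
        for off in range(0, width_sec + 1, step_sec):
            instant = day_ref - timedelta(seconds=off)
            # Using the instant's own date keeps windows that cross midnight correct.
            keys.append((instant.date(), instant.time()))
    return keys
```

For a reference time of 15:02:00 on March 5, this yields 7 collection times for each of 3 dates, or 21 storage-space keys.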

The counting means 230 analyzes the log information recorded in each identified storage space 210 and counts the searches for each keyword per collection time. Specifically, it extracts log information from the storage spaces 210 identified by the sliding window. It then counts how many times each keyword in that log information was entered at the collection time associated with the storage space 210. This lets the system 200 determine how many times a specific keyword was searched at a given collection time.

The search count produced by the counting means 230 may be either the sum of the inputs over the set collection dates (for example, March 3 through March 5) or the daily average.
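The counting step can be sketched as follows. The sketch assumes the storage spaces are a dictionary from (date, time) keys to lists of entered keywords, and that the identified keys are passed in. It supports both options mentioned above: summing over the collection dates, or averaging per day. The function name `count_by_time` is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date, time
from typing import Dict, Iterable, List, Set, Tuple


def count_by_time(spaces: Dict[Tuple[date, time], List[str]],
                  keys: Iterable[Tuple[date, time]],
                  average: bool = False) -> Dict[time, Counter]:
    """Hypothetical sketch: per-keyword search counts for each collection time.

    Counts for the same collection time on different collection dates are
    summed; with average=True the sum is divided by the number of dates seen
    for that time (the daily average mentioned in the text).
    """
    totals: Dict[time, Counter] = defaultdict(Counter)
    days: Dict[time, Set[date]] = defaultdict(set)
    for day, slot in keys:
        days[slot].add(day)
        totals[slot].update(spaces.get((day, slot), []))
    if average:
        return {slot: Counter({kw: c / len(days[slot]) for kw, c in cnt.items()})
                for slot, cnt in totals.items()}
    return dict(totals)
```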

The time-series generating means 240 generates a time-series search count for each keyword from the counted search counts. That is, it arranges a specific keyword's search counts at successive times to form that keyword's time series.

For example, suppose the keyword 'school' was searched 100 times, 120 times, ..., 40 times at the successive collection times 15:02:00, 15:01:55 through 15:01:30, respectively. The time-series generating means 240 may then generate '100, 120, ..., 40' as the time-series search count for the keyword 'school'.
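Building one keyword's series from per-time counts is a matter of ordering the collection times. The text lists the example most recent first. This sketch instead assumes oldest-first order, which is the usual input order for forecasting. It also assumes a missing keyword counts as 0. `keyword_series` is a hypothetical name.

```python
from collections import Counter
from datetime import time
from typing import Dict, List


def keyword_series(counts: Dict[time, Counter], keyword: str) -> List[float]:
    """Hypothetical sketch: one keyword's counts over consecutive collection times.

    Returned oldest first, so the last element is the most recent time t;
    times at which the keyword was not entered contribute 0 (Counter default).
    """
    return [counts[slot][keyword] for slot in sorted(counts)]
```

With the 'school' example, the counts at 15:01:30, ..., 15:01:55, 15:02:00 come out as `[40, ..., 120, 100]`.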

The count estimation means 250 calculates an estimated search count for each keyword at a future time point using the generated time-series search counts. That is, it uses the time-series data (search counts over time) to predict the daily search counts that will occur in the future.

To calculate the estimated search count, the count estimation means 250 can use T(q), a keyword q's time-series search counts over the past n days, to predict the search count T_{k+1}(q) at the current time point k+1. The count estimation means 250 may use various prediction methods. The operator of the system may select the method that best captures the characteristics of the keywords' time-series data.

The present embodiment uses exponential smoothing as an example, because it is easy to implement in a real system and can predict the estimated search count in O(n) time. Other prediction methods, such as ARIMA or EWMA, may of course be used instead.

Assume that the time-series data follow a model with unknown parameters that can change over time.


Given observations from time point 1 to time point n, the count estimation means 250 calculates the estimated search count l time points ahead according to Equation 1.
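Equation 1 itself is not reproduced in this text. The following is a reconstruction that assumes the usual weighted least-squares derivation of simple exponential smoothing. Here $T_t$ is the keyword's observed count at time point $t$, and $\hat{T}_n(l)$ is the estimate made at time point $n$ for $l$ steps ahead. In that form the weights decay geometrically and c rescales them to sum to 1:

```latex
\hat{T}_n(l) = c \sum_{j=0}^{n-1} (1-w)^{j}\, T_{n-j},
\qquad
c = \Bigl(\sum_{j=0}^{n-1} (1-w)^{j}\Bigr)^{-1} = \frac{w}{1-(1-w)^{n}}
```

Since $(1-w)^n \to 0$ as $n$ grows, $c \to w$, which matches the description of c below. Under this local constant-mean model, the estimate is the same for every lead $l$.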

Here, c is a normalizing constant that makes the weights sum to 1. When n is sufficiently large, c becomes equal to w. In this case, the count estimation means 250 may calculate the estimated search count according to Equation 2.
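Equation 2 is likewise not reproduced in this text. The reconstruction below uses $T_t$ for the keyword's observed count at time point $t$ and $\hat{T}_n(l)$ for the estimate made at time point $n$ for $l$ steps ahead. Replacing the normalizing constant with $w$ gives the standard simple exponential smoothing form, which also admits a one-step recursive update:

```latex
\hat{T}_n(l) = w \sum_{j=0}^{n-1} (1-w)^{j}\, T_{n-j}
             = w\, T_n + (1-w)\, \hat{T}_{n-1}(l)
```

The recursion needs only the previous estimate and the newest observation. Each update is therefore constant-time, and a pass over n observations is O(n).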

Here, w is the smoothing constant. Values between 0.05 and 0.3 are typically used. The present embodiment uses a value close to 0.3 for w, which the applicant determined to be optimal through experiments and statistics.
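A minimal sketch of one-step-ahead forecasting by simple exponential smoothing follows, with w defaulting to 0.3 as in the embodiment. It uses the recursive form S_i = w·x_i + (1 − w)·S_{i−1}. Seeding the recursion with the first observation is an assumption; the text does not specify initialization. `exp_smooth_forecasts` is a hypothetical name.

```python
from typing import List, Sequence


def exp_smooth_forecasts(series: Sequence[float], w: float = 0.3) -> List[float]:
    """Hypothetical sketch: one-step-ahead forecasts by simple exponential smoothing.

    forecasts[i] is the estimate made after observing series[i], i.e. the
    predicted count for time i + 1, via S_i = w * x_i + (1 - w) * S_{i-1}.
    The recursion is seeded with the first observation (an assumption), and
    the whole pass takes O(n) time.
    """
    if not 0.0 < w <= 1.0:
        raise ValueError("smoothing constant w must be in (0, 1]")
    if not series:
        return []
    level = float(series[0])
    forecasts = [level]
    for x in series[1:]:
        level = w * x + (1.0 - w) * level
        forecasts.append(level)
    return forecasts
```

For example, with `w = 0.3` and observations `[100, 200]`, the second forecast is 0.3 × 200 + 0.7 × 100 = 130. A larger w reacts faster to a sudden jump but also tracks noise more closely.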

The count estimation means 250 also uses the SSE to estimate the error range of the estimated search count predicted by exponential smoothing. The SSE is the sum of the squared errors between the estimated search count predicted at each time point and the actually measured search count. It measures dispersion in the same sense as the variance or standard deviation. The count estimation means 250 may calculate it according to Equation 3.
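Equation 3 is not reproduced in this text either. The reconstruction below assumes, as is standard, that the errors are one-step-ahead forecast errors: the observed count $T_t$ minus the estimate $\hat{T}_{t-1}(1)$ made one time point earlier. For observations at time points 1 through n, it takes the form:

```latex
\mathrm{SSE} = \sum_{t=2}^{n} \bigl(T_t - \hat{T}_{t-1}(1)\bigr)^{2}
```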

The count estimation means 250 may also derive a standard deviation σ from the mean of the computed SSE, according to Equation 4. This σ is used in the spike index described below.
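The following sketch computes the SSE of one-step-ahead forecasts and a σ taken from its mean. It makes two assumptions. First, the first observation seeds the smoothed level. Second, σ is the square root of the SSE divided by the number of forecasts made (n − 1). The exact Equations 3 and 4 may differ from this, and the function name is hypothetical.

```python
import math


def one_step_sse(series, w=0.3):
    """Return (SSE, sigma) of one-step-ahead exponential-smoothing forecasts.

    Assumptions: the first observation seeds the level, and sigma is the square
    root of the mean squared error over the n - 1 forecasts that were made.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = float(series[0])  # forecast for the next time point
    sse = 0.0
    for z in series[1:]:
        error = z - level  # actual minus predicted
        sse += error * error
        level = w * z + (1.0 - w) * level  # update the smoothed level
    sigma = math.sqrt(sse / (len(series) - 1))
    return sse, sigma
```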

For the keyword 'OO Bank', Fig. 3 compares two series: the search counts that exponential smoothing estimated for future time points, and the actually measured search counts.

As shown in Fig. 3, the one-step-ahead (lag 1) estimates computed by exponential smoothing roughly match the measured search counts.

For example, at future time point '45' the count estimation means 250 predicts 3,100 searches. This lies within the allowable error range of the measured count of about 3,200. The example shows that, in this embodiment, exponential smoothing on time-series data predicts the search count at a given future time point fairly accurately.

The keyword detection means 260 detects rising keywords using the computed estimated search counts. It calculates a spike index for each keyword from the estimated search count. It then uses that index to extract keywords whose search counts have increased sharply.

The spike index may be expressed by Equation 5. The keyword detection means 260 may select the keywords with relatively high spike indices and extract them as rising keywords.

To calculate the spike index, the keyword detection means 260 computes several factors for each keyword. It then sets the coefficient (weight) applied to each factor according to the current keyword input status. For example, it may adjust these weights automatically according to the computed estimated search counts.

As shown in Equation 5, the keyword detection means 260 computes two kinds of factors: the spike-related factors DPA and SQD, and the total-search-count-related factors MPR and CPM.

For example, suppose the number of keywords with computed estimated search counts is at or below a preset number. The keyword detection means 260 then judges that all keywords under analysis will show high search counts, and it extracts rising keywords mainly from those whose counts have increased sharply. It may therefore set the weights on the spike-related factors higher than the weights on the total-search-count-related factors, so that DPA and SQD are emphasized.

The factor DPA may be computed as

$$\mathrm{DPA}(query) = \frac{R(query) - \mathrm{Exp}(query)}{\mathrm{Exp}(query)}$$

For each keyword with a computed estimated search count, the keyword detection means 260 counts two values: the real-time search count R(query) and the predicted estimated search count Exp(query). It divides the difference R(query) − Exp(query) by Exp(query) to obtain DPA. DPA is thus the gap between a query's real-time count and its reference count, normalized by the reference count. In other words, it compares the current count with the average (expected) count. A larger value means the keyword is being searched well above its average, that is, it is rising sharply.

The factor SQD may be computed as

$$\mathrm{SQD}(query) = \frac{R(query) - \mathrm{Exp}(query)}{\sigma(query)}$$

For each keyword with a computed estimated search count, the keyword detection means 260 calculates the deviation and divides it by the query's error range σ(query) to obtain SQD. The deviation is the difference between the real-time search count R(query) and the estimated search count Exp(query). Because SQD divides the deviation by the standard deviation, it gives the keyword's position on the standard normal distribution. A larger SQD indicates a steeper rise.
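Both spike-related factors are simple arithmetic on R(query), Exp(query) and σ(query), as this minimal sketch shows. The function names are hypothetical. The text does not say how to handle a zero denominator, so raising an error in that case is an added assumption.

```python
def dpa(real_count, expected_count):
    """DPA = (R(query) - Exp(query)) / Exp(query): deviation relative to the expected count."""
    if expected_count <= 0:
        raise ValueError("expected count must be positive")
    return (real_count - expected_count) / expected_count


def sqd(real_count, expected_count, sigma):
    """SQD = (R(query) - Exp(query)) / sigma(query): deviation in standard-deviation units."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (real_count - expected_count) / sigma
```

For example, a query with R = 300, Exp = 200 and σ = 50 gives DPA = 0.5 and SQD = 2.0.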

Conversely, suppose the number of keywords with computed estimated search counts is at or above the preset number. The keyword detection means 260 then judges that most keywords under analysis will have low search counts. It extracts rising keywords by identifying those with relatively high search counts.

It may therefore set the weights on the total-search-count-related factors higher than the weights on the spike-related factors, so that the total-search-count-related factors MPR and CPM are emphasized.
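The weighting rule described above can be sketched as a switch on the number of keywords analyzed. The threshold and the weight values are hypothetical placeholders, because the text only states which pair of factors gets the larger weights. The text describes one case as "at or below" the preset number and the other as "at or above" it, so the sketch arbitrarily assigns the boundary case to the spike-related factors.

```python
def choose_weights(num_keywords, threshold=1000, high=0.35, low=0.15):
    """Return weights alpha, beta, gamma, delta for DPA, MPR, SQD, CPM.

    threshold, high and low are hypothetical values, not taken from the source.
    """
    if num_keywords <= threshold:
        # Few keywords: emphasize the spike-related factors DPA (alpha) and SQD (gamma).
        return {"alpha": high, "beta": low, "gamma": high, "delta": low}
    # Many keywords: emphasize the total-count factors MPR (beta) and CPM (delta).
    return {"alpha": low, "beta": high, "gamma": low, "delta": high}
```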

The factor MPR may be computed as

$$\mathrm{MPR}(query) = \frac{1}{\mathrm{Rank}(query)}$$

The keyword detection means 260 determines the query rank Rank(query) of each keyword with a computed estimated search count. It takes the reciprocal of that rank to obtain MPR. MPR is therefore the reciprocal of the query's rank aggregated in real time, and a larger MPR means a higher rank. MPR need not be used directly; it may be suitably transformed and used as a measure of the current state of the ranking.
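The following sketch computes MPR. It makes two assumptions: rank 1 goes to the keyword with the highest real-time count, and ties are broken by keyword name so the result is deterministic. The text allows the raw reciprocal to be transformed before use, but no transformation is applied here. The function name is hypothetical.

```python
def mpr_scores(real_counts):
    """Map each keyword to MPR = 1 / Rank(query), ranking by real-time count (rank 1 = highest)."""
    # Sort by descending count, then by keyword name to break ties deterministically.
    ordered = sorted(real_counts, key=lambda kw: (-real_counts[kw], kw))
    return {kw: 1.0 / rank for rank, kw in enumerate(ordered, start=1)}
```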

The factor CPM may be computed as

$$\mathrm{CPM}(query) = \frac{R(query)}{\mathrm{MinimumCnt}}$$

The keyword detection means 260 counts the real-time search count R(query) of each keyword with a computed estimated search count. It divides that count by MinimumCnt, the minimum search count a keyword needs in order to qualify as rising, to obtain CPM. CPM therefore reflects the actual search count: it is the actual count divided by a fixed amount (the minimum count at which a rise can be recognized). It can be paired with the factor that reflects a keyword's actual rank. The larger the value, the more prominently the keyword's current state is reflected.
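The following sketch computes CPM and combines the four factors using one plausible reading of Equation 5 as a weighted sum: spike = α·DPA + β·MPR + γ·SQD + δ·CPM. The weighted-sum form is inferred from two places in the text: the coefficient names assigned to each factor, and the step of rescaling the factors and summing them. Treat it as an assumption. The rescaling step is omitted, and any MinimumCnt value passed in is hypothetical.

```python
def cpm(real_count, min_count):
    """CPM = R(query) / MinimumCnt: real-time count relative to the minimum spike count."""
    if min_count <= 0:
        raise ValueError("min_count must be positive")
    return real_count / min_count


def spike_index(factors, weights):
    """Assumed form of Eq. 5: alpha*DPA + beta*MPR + gamma*SQD + delta*CPM."""
    return (weights["alpha"] * factors["DPA"]
            + weights["beta"] * factors["MPR"]
            + weights["gamma"] * factors["SQD"]
            + weights["delta"] * factors["CPM"])
```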

Because the spike index adjusts its coefficients (weights) automatically, the invention tends to extract rising popular keywords that carry real informational value under the current keyword input status.

That is, the keyword detection means 260 may rank the keywords with computed estimated search counts by their spike indices. It then extracts the keywords within a preset rank as rising keywords.
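Selecting rising keywords by rank is a top-k query over the spike indices. The sketch below uses the standard library's `heapq.nlargest`. The cutoff k is a hypothetical parameter standing in for the preset rank.

```python
import heapq


def top_rising(spike_indices, k=10):
    """Return the k keywords with the highest spike index, highest first."""
    return heapq.nlargest(k, spike_indices, key=spike_indices.get)
```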

The real-time rising keyword extraction system 200 then compiles the extracted rising keywords into a list. It provides the list to the search server 110 to which a searcher 120 is connected. The search server 110 can then present the searcher 120 with information about keywords whose search counts have increased sharply.

Thus, the present invention can predict each keyword's future estimated search count from the search input tendency of keywords entered during a given time period.

The invention also uses the predicted estimated search counts to extract rising keywords that reflect changes in search counts over a given time period. This makes it possible to monitor keyword input tendencies accurately.

Fig. 3 uses exponential smoothing as the example method for predicting estimated search counts. Other methods, such as ARIMA or EWMA, may of course be used instead.

First, in step ①, the real-time rising keyword extraction system 200 periodically collects the log information accumulated on the search servers 110 into a designated analysis device. It gathers the log information from each search server 110 in real time and supplies it to the analysis equipment. For example, the system may fetch log information from the search servers 110 every 5 seconds.

In step ②, the real-time rising keyword extraction system 200 parses the log information collected in the analysis equipment and stores the required data as the data for time t. It may record the collected log information in the storage space 210 that corresponds to collection date n and collection time t.

In step ③, the real-time rising keyword extraction system 200 assumes that the collected real-time data alone provide no baseline for judging a real-time spike. It therefore aggregates earlier data over a set period and prepares, for each query, a reference against which a spike can be judged.

In step ④, the real-time rising keyword extraction system 200 sums the data aggregated at each time point t over the length of the time window. For example, with a 300-second window, it may sum the data generated during the 300 seconds before the current time. The window slides at fixed intervals, so the data under analysis change continuously. If the window slides every 5 seconds, for instance, the system may rerun the algorithm on new data every 5 seconds.
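Step ④ can be sketched with a deque of per-interval buckets and a running total. Each slide then only touches the keywords in the buckets added and dropped, instead of re-summing the whole window. The sketch assumes one bucket of per-keyword counts arrives per slide interval (5 seconds in the example), and the class and method names are hypothetical.

```python
from collections import Counter, deque


class SlidingWindow:
    """Keeps the last `size` buckets of per-keyword counts and their running total."""

    def __init__(self, window_seconds=300, slide_seconds=5):
        self.size = window_seconds // slide_seconds  # e.g. 300 / 5 = 60 buckets
        self.buckets = deque()
        self.totals = Counter()

    def slide(self, bucket):
        """Add the newest bucket (keyword -> count) and drop the oldest once the window is full."""
        self.buckets.append(bucket)
        self.totals.update(bucket)
        if len(self.buckets) > self.size:
            expired = self.buckets.popleft()
            for kw, count in expired.items():
                self.totals[kw] -= count
                if self.totals[kw] <= 0:
                    del self.totals[kw]  # keep only keywords still present in the window
        return self.totals
```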

In step ⑤, the real-time rising keyword extraction system 200 compares the summed data in the time window with the previously generated reference data. It then extracts the keywords whose search counts have increased sharply.

In step ⑥, the real-time rising keyword extraction system 200 stores basic information about the data at time t, such as each query's search velocity and search acceleration. This information is displayed alongside the keywords judged to be rising when they are shown to the searcher 120.

In step ⑦, the real-time rising keyword extraction system 200 extracts the rising queries each time the time window slides. It may then combine the extracted rising queries with the query information to produce the final result for the searcher 120.

Fig. 5(i) illustrates an example of counting keyword searches day by day for a specific time period within the sliding window's time span.

The real-time rising keyword extraction system 200 uses a sliding window with a 10-second span to identify all storage spaces 210 for times t from '15:00:00' to '15:00:10'. It then retrieves log information only from the storage spaces 210 for the configured collection days (n = 3 in Fig. 5). For example, from the storage spaces 210 for collection time '15:00:00', it may retrieve the following log information:
- March 3: the keywords 'gift', 'freestyle', and 'sponge'
- March 4: 'gift' and 'freestyle'
- March 5: 'gift' and 'acupressure'

Using the retrieved log information, the real-time rising keyword extraction system 200 sums each keyword's search counts per collection time to build that keyword's time-series data. For the keyword 'gift', it counts 261 searches at '15:00:00', 161 at '15:00:05', and 179 at '15:00:10'. These counts form the time series for 'gift'.
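Selecting storage spaces and building the time series, as in Fig. 5, amounts to filtering a store keyed by (collection day, collection time) and summing over the selected days for each time slot. The sketch below uses a hypothetical in-memory dict in place of the storage spaces 210. The sample data in the test are invented for illustration; they are not the per-day values from Fig. 5.

```python
def build_time_series(storage, keyword, times, days):
    """Sum a keyword's counts over the selected collection days for each collection time.

    storage maps (day, time) -> {keyword: count}; it is a hypothetical stand-in
    for the storage spaces 210. Missing slots or keywords count as zero.
    """
    series = []
    for t in times:  # collection times inside the sliding window
        total = 0
        for d in days:  # only the configured collection days (e.g. n = 3)
            total += storage.get((d, t), {}).get(keyword, 0)
        series.append(total)
    return series
```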

The real-time rising keyword extraction system 200 then applies exponential smoothing to the time series to predict the future estimated search count. Fig. 5(ii) illustrates an example in which the time series is substituted into the exponential smoothing formula (Equation 1 or Equation 2) to predict the estimated search count at time point l.

When predicting estimated search counts by exponential smoothing, the real-time rising keyword extraction system 200 may predict both the search count and the standard deviation expected at the current time point (k+1). It then uses these two values as the reference data. The system must also decide how far back the time-series data used for the reference data should extend.

If the time series runs up to time k, immediately before the current time (k+1), the real-time rising keyword extraction system 200 generates reference data that follow the most recent search tendency. In that case, an extracted rising keyword may stay in the rising state only very briefly.

Conversely, if the data are too old relative to the current time, the real-time rising keyword extraction system 200 cannot reflect recent search tendencies. A keyword once judged to be rising then stays in the rising state for a long time.

The real-time rising keyword extraction system 200 therefore needs to balance these two effects and refresh the reference data automatically at suitable intervals. In this embodiment, the reference data are refreshed every 12 hours, the interval the applicant found to be optimal in experiments.

Fig. 5(ii) is a table comparing two values for each keyword selected by the sliding window: its search count, and the estimated search count predicted by exponential smoothing.

In Fig. 5(i), the time-series data (consecutively counted search counts) for the keyword 'gift' are 261 at '15:00:00', 161 at '15:00:05', and 179 at '15:00:10', for a total of 601. Substituting this time series into the exponential smoothing method of the invention, the real-time rising keyword extraction system 200 may compute 76,567.4 as the estimated daily search count for 'gift' at a future time point. In other words, the system assumes that searches continue at the current rate all day and predicts how many searches will occur. This estimated search count provides the basis for judging rising keywords.

The real-time rising keyword extraction system 200 then judges rising keywords from the real-time estimated search counts and the reference data. In this embodiment, it does so by calculating the spike index. It rescales the factors DPA, MPR, SQD, and CPM to a common scale, sums them to obtain the spike index, and ranks the keywords by that index.

The real-time rising keyword extraction system 200 also treats keywords ranked within a preset range as rising keywords. It may generate a rising keyword list with the higher-ranked keywords at the top and provide it to the search server 110 to which the searcher 120 is connected. Keywords extracted first because of their higher spike indices receive higher ranks, and the search server 110 shows the resulting list to searchers 120 who access it. The system can also regenerate the list at fixed intervals, such as the log collection period. This keeps the list shown to the searcher 120 continuously updated and delivers the latest information on current keyword search tendencies more quickly.

In this embodiment, the estimated search count predicted for each keyword is used to detect rising keywords. This makes it possible to monitor how the number of inputs for any given keyword trends over time.

The invention also detects, in real time, the keywords whose search counts from searchers 120 are currently rising sharply. It can therefore quickly and accurately identify information about the most popular keywords of the moment.

The workflow of the real-time rising keyword extraction system 200 according to an embodiment of the present invention is described in detail below.

The real-time rising keyword extraction method of the present invention is performed by the real-time rising keyword extraction system 200 described above.

First, the real-time rising keyword extraction system 200 records the log information collected from the search server 110 in the storage space 210 that corresponds to the collection date and collection time (S610). In this step (S610), the log information accumulated in the memory of the search server 110 is read at a given time interval. It is then recorded in the storage space 210 for collection date n and collection time t.
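Step S610 can be sketched as filing each log record under its collection date and a collection time rounded down to the collection interval. The 5-second interval follows the example in step ①. The record format, a (timestamp, keyword) pair, and the function name are hypothetical assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime


def bucket_logs(records, interval_seconds=5):
    """Group (datetime, keyword) records into {(date, slot_time): Counter(keyword -> count)}."""
    storage = defaultdict(Counter)
    for ts, keyword in records:
        seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
        slot = seconds - seconds % interval_seconds  # round down to the collection interval
        slot_time = f"{slot // 3600:02d}:{slot % 3600 // 60:02d}:{slot % 60:02d}"
        storage[(ts.date().isoformat(), slot_time)][keyword] += 1
    return storage
```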

Next, the real-time rising keyword extraction system 200 identifies, in the search server, the storage spaces 210 to analyze (S620). These cover each collection time from t back to an earlier collection time T, on each collection day from n back to an earlier collection day N. This step (S620) uses a sliding window with a predetermined time span to select the storage spaces 210. The system first identifies, in the search server 110, the storage spaces 210 for the collection times in the window. From those, it then selects the storage spaces 210 for collection days n through N. In this way, the system can select the storage space 210 for each given date within a specific time period (see Fig. 5).

The real-time rising keyword extraction system 200 then analyzes the log information recorded in each identified storage space 210. It counts each keyword's searches per collection time to generate the time-series search counts (S630). This step (S630) builds a time series for each keyword under analysis by determining how many searches for it occurred in a specific time period over the preceding days. In Fig. 5, for example, the time series for the keyword 'gift' consists of 261 searches at '15:00:00', 161 at '15:00:05', and 179 at '15:00:10'.

The real-time rising keyword extraction system 200 then uses the time-series search counts to compute each keyword's future estimated search count (S640). This step (S640) estimates future search counts from the keyword search tendency within the sliding window's time span. This embodiment uses exponential smoothing for this computation.

The real-time rising keyword extraction system 200 computes the future estimated search count according to Equation 1 or Equation 2 (S710). This step (S710) uses exponential smoothing to predict the estimated search count in O(n) time.

That is, the real-time rising keyword extraction system 200 may compute the estimated search count l time points ahead according to Equation 1.

Here, c is a normalizing constant that makes the weights sum to 1. When n is sufficiently large, the real-time rising keyword extraction system 200 sets c equal to w and computes the estimated search count l time points ahead according to Equation 2. As described above, w may be set to a value close to 0.3 in this embodiment.

Returning to Fig. 6, the real-time rising keyword extraction system 200 detects rising keywords using the computed estimated search counts (S650). This step (S650) calculates a spike index for each keyword from its estimated search count and extracts rising keywords according to that index.

Fig. 8 is a flowchart illustrating an example of detecting rising keywords according to the present invention.

First, the real-time rising keyword extraction system 200 calculates a spike index for each keyword from the computed estimated search count (S810). This step (S810) assigns each keyword a spike index using the spike index formula, Equation 5, as the basis for judging how sharply the keyword is rising. For each keyword, the system computes the spike-related factors DPA and SQD and the total-search-count-related factors MPR and CPM. Each factor is computed as described above for the keyword detection means 260, so the details are omitted here.

However, in this step (S810), the real-time rising keyword extraction system 200 can automatically adjust the magnitudes of the coefficients (weights) α, β, γ, δ applied to the factors constituting the rising index according to the current keyword input status.

If there are few keywords to analyze and the estimated number of searches calculated for each keyword is high, the real-time rising keyword extraction system 200 may make the weights applied to the rising-related factors relatively larger than the weights applied to the total-search-count-related factors. That is, the real-time rising keyword extraction system 200 calculates the rising index with emphasis on the rising-related factor DPA (coefficient α) or factor SQD (coefficient γ).

Conversely, if there are many keywords to analyze and the estimated number of searches calculated for each keyword is low, the real-time rising keyword extraction system 200 may make the weights applied to the total-search-count-related factors relatively larger than the weights applied to the rising-related factors. That is, the real-time rising keyword extraction system 200 calculates the rising index with emphasis on the total-search-count-related factor MPR (coefficient β) or factor CPM (coefficient δ).
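The adaptive weighting described above can be illustrated with a short Python sketch. It rests on loudly stated assumptions: the rising-index formula is not reproduced in this text, so a linear combination α·DPA + β·MPR + γ·SQD + δ·CPM is assumed; the factor values are taken as already computed (their definitions belong to the earlier description of the keyword detecting means 260); and the thresholds `few_keywords` and `high_estimate` as well as the specific weight values are hypothetical placeholders chosen only to show the direction of the adjustment.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence, Tuple

Weights = Tuple[float, float, float, float]  # (alpha, beta, gamma, delta)


@dataclass
class Factors:
    """Factor values for one keyword, assumed already computed as described
    for the keyword detecting means 260 (definitions not reproduced here)."""
    dpa: float  # rising-related, weighted by alpha
    mpr: float  # total-search-count-related, weighted by beta
    sqd: float  # rising-related, weighted by gamma
    cpm: float  # total-search-count-related, weighted by delta


def choose_weights(estimates: Sequence[float], few_keywords: int = 1000,
                   high_estimate: float = 50.0) -> Weights:
    """Hypothetical adaptive rule; thresholds and weights are illustrative.

    estimates: estimated number of searches, one entry per analyzed keyword.
    """
    avg = mean(estimates)
    if len(estimates) <= few_keywords and avg >= high_estimate:
        # Few keywords, high estimates: emphasize DPA (alpha) and SQD (gamma).
        return (0.35, 0.15, 0.35, 0.15)
    if len(estimates) > few_keywords and avg < high_estimate:
        # Many keywords, low estimates: emphasize MPR (beta) and CPM (delta).
        return (0.15, 0.35, 0.15, 0.35)
    # Otherwise fall back to equal weights.
    return (0.25, 0.25, 0.25, 0.25)


def rising_index(f: Factors, weights: Weights) -> float:
    """Assumed linear combination of the four factors."""
    alpha, beta, gamma, delta = weights
    return alpha * f.dpa + beta * f.mpr + gamma * f.sqd + delta * f.cpm
```

Keeping the weight selection separate from the index calculation mirrors the text, where the same factors are reused and only the coefficients α, β, γ, δ change with the input status.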

In addition, the real-time rising keyword extraction system 200 ranks the keywords using the calculated rising index and extracts the keywords within a selected rank as popular keywords (S820). In this step (S820), the keywords that are currently most popular among the searchers 120 accessing the search server 110 are detected and arranged in order of rising index to generate a rising keyword list.

Thereafter, the real-time rising keyword extraction system 200 may control the generated rising keyword list so that it is displayed to the searchers 120 through the search server 110.
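Step S820 amounts to sorting keywords by rising index and keeping those within the selected rank. A minimal Python sketch, assuming the rising indices are already available in a dictionary keyed by keyword; the function name `rising_keyword_list`, the parameter `top_k` (standing in for the selected rank), and the alphabetical tie-break are illustrative assumptions, since the text does not specify tie handling.

```python
from typing import Dict, List


def rising_keyword_list(indices: Dict[str, float], top_k: int = 10) -> List[str]:
    """Rank keywords by rising index, highest first, and keep the top_k."""
    # Ties break alphabetically so the list is deterministic (an assumption;
    # the text does not specify tie handling).
    ranked = sorted(indices.items(), key=lambda kv: (-kv[1], kv[0]))
    return [keyword for keyword, _ in ranked[:top_k]]
```

For example, `rising_keyword_list({"a": 1.0, "b": 3.0, "c": 2.0}, top_k=2)` returns `["b", "c"]`, which would then be the list exposed to the searchers through the search server.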

According to the present invention, the estimated number of searches for each keyword at a future time point can be predicted using the search input tendency of the keywords entered during a predetermined time period.

In addition, according to the present invention, instead of the conventional approach of judging rising keywords only after aggregating keyword input tendencies over a fixed period, rising keywords can be detected by reflecting the current keyword input tendency in real time.

Embodiments of the present invention include computer-readable media containing program instructions for performing various computer-implemented operations. The computer-readable media may contain program instructions, local data files, local data structures, and the like, alone or in combination. The media may be specially designed and constructed for the present invention, or may be of a kind well known and available to those skilled in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The media may also be transmission media, such as optical or metal wires or waveguides, including a carrier wave that transmits signals specifying program instructions, local data structures, and the like. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

While specific embodiments of the present invention have been described, various modifications are possible without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments but should be defined by the claims below and their equivalents.

As can be seen from the above description, the present invention can provide a real-time rising keyword extraction method and a real-time rising keyword extraction system that generate time-series data for each keyword in consideration of the keyword's current input tendency over time and use the generated time-series data to predict, in real time, the estimated number of searches for each keyword at a future time point.

In addition, the present invention can provide a real-time rising keyword extraction method and a real-time rising keyword extraction system that extract rising keywords in consideration of changes in the number of searches during a predetermined time period using the predicted estimated number of searches, thereby providing an environment in which the input tendency of keywords can be monitored accurately.

In addition, the present invention can provide a real-time rising keyword extraction method and a real-time rising keyword extraction system that, instead of the conventional approach of judging rising keywords only after aggregating keyword input tendencies over a fixed period, detect rising keywords by reflecting the current keyword input tendency in real time.

In addition, the present invention can provide a real-time rising keyword extraction method and a real-time rising keyword extraction system that quickly detect keywords whose input count from searchers is rising sharply, thereby increasing the number of searches and the searchers' connection time and enabling Internet search service providers to increase their operating profit.

Claims

A real-time rising keyword extraction method performed by a real-time rising keyword extraction system comprising counting means, time series generating means, number estimating means, and keyword detecting means, the method comprising:

Analyzing, by the counting means, log information of stored keywords according to collection time and collection date, and counting the number of searches for each keyword by collection date and collection time;

Generating, by the time series generating means, a time-series search count for each keyword based on the number of searches counted by collection date and collection time;

Calculating, by the number estimating means, an estimated number of searches for each keyword at a future time point using the time-series search count; and

Detecting, by the keyword detecting means, a rising keyword whose time-series search count changes sharply in a preset time period, using the calculated estimated number of searches,

wherein, in calculating the estimated number of searches, the estimated number of searches is calculated according to

or

where Zn(l) is the estimated number of searches, c is a normalization constant, and w is a smoothing constant.

The method of claim 1, wherein the log information is identified using a sliding window having a predetermined time width.

The method of claim 1, wherein calculating the estimated number of searches for each keyword at the future time point comprises calculating an estimated number of searches occurring at a current time k + 1 using the time-series search counts of each keyword over the past N days.

The method of claim 1, wherein calculating the estimated number of searches for each keyword at the future time point comprises determining the estimated number of searches at the future time point by applying the time-series search counts from past time points to at least one of exponential smoothing, ARIMA (Autoregressive Integrated Moving Average), and EWMA (Exponentially Weighted Moving Average) methods.

The method of claim 1, wherein detecting the rising keyword comprises detecting the rising keyword based on the degree to which the time-series search count increases sharply relative to the search pattern corresponding to the collection date and collection time of each keyword.

The method of claim 1, wherein detecting the rising keyword comprises calculating a rising index using the real-time ranking of each keyword.

The method of claim 1, wherein detecting the rising keyword comprises calculating a rising index using a ratio between the number of searches input in real time and a minimum number of searches set for detection as a rising keyword.

The method of claim 1, wherein detecting the rising keyword uses the difference between the number of searches input in real time and the estimated number of searches.

A real-time rising keyword extraction system, comprising:

Counting means for analyzing log information of stored keywords according to collection time and collection date and counting the number of searches for each keyword by collection date and collection time;

Time series generating means for generating a time-series search count for each keyword based on the number of searches counted by collection date and collection time;

Number estimating means for calculating an estimated number of searches for each keyword at a future time point using the time-series search count; and

Keyword detecting means for detecting a rising keyword whose time-series search count changes sharply in a preset time period, using the calculated estimated number of searches,

wherein the number estimating means calculates the estimated number of searches according to

or

where Zn(l) is the estimated number of searches, c is a normalization constant, and w is a smoothing constant.

The system of claim 9, wherein the log information is identified using a sliding window having a predetermined time width.

The system of claim 9, wherein the number estimating means calculates an estimated number of searches occurring at a current time k + 1 using the time-series search counts of each keyword over the past N days.

The system of claim 9, wherein the number estimating means determines the estimated number of searches at the future time point by applying the time-series search counts from past time points to at least one of exponential smoothing, ARIMA (Autoregressive Integrated Moving Average), and EWMA (Exponentially Weighted Moving Average) methods.

The system of claim 9, wherein the keyword detecting means detects the rising keyword based on the degree to which the time-series search count increases sharply relative to the search pattern corresponding to the collection date and collection time of each keyword.

The system of claim 9, wherein the keyword detecting means calculates a rising index using the real-time ranking of each keyword.

The system of claim 9, wherein the keyword detecting means calculates a rising index using a ratio between the number of searches input in real time and a minimum number of searches set for detection as a rising keyword.

The system of claim 9, wherein the keyword detecting means calculates a rising index using the difference between the number of searches input in real time and the estimated number of searches.