KR20120089963A

KR20120089963A - Method and system for providing representative phrases for real-time popular keywords

Info

Publication number: KR20120089963A
Application number: KR1020100140403A
Authority: KR
Inventors: 신재승; 박영섭; 최재걸; 노원숙
Original assignee: 엔에이치엔(주)
Priority date: 2010-12-31
Filing date: 2010-12-31
Publication date: 2012-08-16
Also published as: KR101220080B1

Abstract

PURPOSE: A method and a system for providing a representative phrase for a real-time popular keyword are provided, which explain why a keyword is popular by supplying a first candidate representative phrase related to the reason the popular keyword is exposed. CONSTITUTION: A first candidate representative phrase generation unit (210) generates a first candidate representative phrase for a popular keyword based on documents. A second candidate representative phrase generation unit (220) generates a second candidate representative phrase for the popular keyword based on broadcast data. A representative phrase determination unit (240) determines the representative phrase for the popular keyword using the first and second candidate representative phrases.

Description

METHOD AND SYSTEM FOR PROVIDING REPRESENTATIVE PHRASES FOR REAL-TIME POPULAR KEYWORDS

The present invention relates to a method and system for providing a representative phrase for a real-time popular keyword and, more particularly, to a method and system for extracting, based on documents and broadcast data, a representative phrase that indicates why a popular keyword exposed on a site surged in real time.

In general, most web pages that offer keyword search display popular keywords, that is, search terms whose query volume is surging in real time at the current moment. The web page displays the selected popular keywords together with their rankings.

In particular, a popular keyword is typically displayed on the web page simply as a celebrity's name or as the name of an international or domestic organization. Accordingly, a user who wants to know what a popular keyword is about can select that keyword, for example with a mouse click. The web page may then display articles, documents, and the like related to the selected popular keyword.

Thus, not only when the user wants details about a popular keyword but also when the user merely wants an overview, such as why it became a popular keyword, the user has the inconvenience of selecting the popular keyword and reading through all of the many articles about it.

In addition, most users run a search while watching television or listening to the radio because they are curious about a cast member, about when a particular broadcast program starts, or about the content of a particular program airing today. As a result, most real-time popular keywords are celebrity names or broadcast program titles.

Therefore, there is a need for a way to inform the user why a keyword became a real-time popular keyword by providing the broadcast program that corresponds to it.

There is also a need for a way to provide, in real time, the reason a keyword became popular without requiring the user to select the popular keyword and then its related articles step by step.

The present invention provides a method and system that use documents to provide a first candidate representative phrase related to why a popular keyword is exposed in real time, thereby resolving the user's curiosity about why it is presented as a popular keyword.

The present invention provides a method and system that use broadcast data to provide a second candidate representative phrase related to why a popular keyword is exposed in real time, thereby resolving the user's curiosity about why it is presented as a popular keyword.

The present invention provides a method and system that determine the final representative phrase for a popular keyword from the first candidate representative phrase and the second candidate representative phrase in consideration of popularity and broadcast suitability, thereby indicating more accurately why the popular keyword is exposed.

A method of providing a representative phrase according to an embodiment of the present invention may include generating a first candidate representative phrase for a popular keyword based on documents, generating a second candidate representative phrase for the popular keyword based on broadcast data, and determining a final representative phrase for the popular keyword using the first candidate representative phrase and the second candidate representative phrase.

In addition, the determining of the final representative phrase may determine, as the final representative phrase, whichever of the first candidate representative phrase and the second candidate representative phrase has the higher suitability for the popular keyword.

The method may further include calculating the popularity of the document used to generate the first candidate representative phrase.

In addition, the determining of the representative phrase may determine, as the final representative phrase, the first candidate representative phrase that has been determined, using the popularity of the document and a preset reference popularity, to have the higher suitability for the popular keyword.

The method may further include calculating a time score of the second candidate representative phrase using the broadcast time of the broadcast data used to generate the second candidate representative phrase and the exposure time of the popular keyword.
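The text here does not give a formula for the time score, so the following is only a hypothetical sketch of one way to relate a program's broadcast window to the moment a keyword was exposed. The function name `time_score`, the linear decay, and the two-hour `max_gap` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def time_score(start, end, exposed_at, max_gap=timedelta(hours=2)):
    """Hypothetical time score in [0, 1].

    Returns 1.0 when the keyword was exposed while the program was on air,
    and decays linearly to 0.0 as the gap to the broadcast window grows
    toward max_gap (an assumed cutoff, not taken from the source).
    """
    if start <= exposed_at <= end:
        return 1.0
    # Distance between the exposure time and the nearest edge of the window.
    gap = start - exposed_at if exposed_at < start else exposed_at - end
    if gap >= max_gap:
        return 0.0
    return 1.0 - gap / max_gap  # timedelta / timedelta yields a float

start = datetime(2010, 12, 31, 21, 0)
end = datetime(2010, 12, 31, 22, 0)
print(time_score(start, end, datetime(2010, 12, 31, 21, 30)))  # on air
print(time_score(start, end, datetime(2010, 12, 31, 23, 0)))   # one hour after the end
```

A score like this could then be compared against the preset reference time score mentioned in the text.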

In addition, the determining of the representative phrase may determine, as the final representative phrase, the second candidate representative phrase that has been determined, using the time score of the second candidate representative phrase and a preset reference time score, to have the higher suitability for the popular keyword.

In addition, the generating of the first candidate representative phrase may include determining reference words of the documents that include the popular keyword, determining a representative reference word among the reference words, expanding the representative reference word by combining it with a word that immediately precedes or follows it, and generating the first candidate representative phrase using the expanded representative reference word.

In addition, the expanding of the representative reference word may include calculating, over the documents that include the popular keyword, the conditional probability that a word immediately preceding or following the representative reference word is included, and expanding the representative reference word based on the conditional probability.
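The conditional-probability expansion can be sketched as follows. This is a minimal illustration, assuming the documents are already tokenized and that the probability is estimated per occurrence as count(neighbor adjacent to the word) / count(word). The function name `expand_word`, the `threshold` parameter, and the single-step expansion are assumptions, not details given in the text.

```python
from collections import Counter

def expand_word(tokenized_docs, rep_word, threshold=0.5):
    """Extend rep_word with its most likely preceding and following neighbor.

    P(neighbor | rep_word) is estimated as
    count(neighbor adjacent to rep_word) / count(rep_word).
    A neighbor is attached only if that estimate reaches the threshold.
    """
    prev_counts, next_counts = Counter(), Counter()
    total = 0
    for tokens in tokenized_docs:
        for i, tok in enumerate(tokens):
            if tok != rep_word:
                continue
            total += 1
            if i > 0:
                prev_counts[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                next_counts[tokens[i + 1]] += 1

    phrase = [rep_word]
    if total == 0:
        return phrase
    if prev_counts:
        word, n = prev_counts.most_common(1)[0]
        if n / total >= threshold:
            phrase.insert(0, word)
    if next_counts:
        word, n = next_counts.most_common(1)[0]
        if n / total >= threshold:
            phrase.append(word)
    return phrase

docs = [
    ["A", "world", "tour", "announced"],
    ["A", "world", "tour", "dates"],
    ["A", "tour", "cancelled"],
]
# "world" precedes "tour" in 2 of 3 occurrences; no following word repeats.
print(expand_word(docs, "tour", threshold=0.6))  # ['world', 'tour']
```

Repeating the same step on the expanded phrase would grow it further; the sketch stops after one step to stay short.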

In addition, the determining of the representative reference word may include analyzing the morphemes of the documents that include the popular keyword to count the frequency of each reference word, and determining the representative reference word based on the counted frequencies.
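The frequency-based selection can be sketched as follows. This is a minimal illustration, assuming morphological analysis has already produced a token list per document. The function name `pick_representative_word` and the choice to exclude the popular keyword itself from the count are assumptions made for this sketch.

```python
from collections import Counter

def pick_representative_word(tokenized_docs, keyword):
    """Return the most frequent reference word across the documents.

    The popular keyword itself is skipped (an assumption of this sketch),
    since it appears in every collected document by construction.
    """
    counts = Counter()
    for tokens in tokenized_docs:
        counts.update(t for t in tokens if t != keyword)
    if not counts:
        return None
    # most_common(1) returns a one-element list of (word, count).
    return counts.most_common(1)[0][0]

docs = [
    ["A", "concert", "ticket"],
    ["A", "concert", "tour"],
    ["A", "album"],
]
print(pick_representative_word(docs, "A"))  # concert
```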

The method may further include collecting documents that include the popular keyword and performing clustering on the collected documents. The generating of the first candidate representative phrase may then generate the first candidate representative phrase by expanding the representative reference word based on the documents belonging to a cluster.

In addition, the generating of the first candidate representative phrase may further include assigning weights to the collected documents and determining the exposure priority of the clusters using the weights.

In addition, the generating of the second candidate representative phrase may include determining, in the broadcast data, a broadcast program that includes the popular keyword, and generating the second candidate representative phrase using the broadcast time of that broadcast program.

In addition, the generating of the second candidate representative phrase may include, when the popular keyword is a combination of several words, separating the popular keyword into a plurality of words by morphological analysis, selecting one broadcast program, based on the separated words, from among the broadcast programs that include the popular keyword, and generating the second candidate representative phrase using the broadcast time of the selected broadcast program.

In addition, the generating of the second candidate representative phrase may include assigning a matching score, a time score, and a broadcast station weight to each broadcast program that includes the popular keyword, calculating a final matching score for the broadcast program using at least one of the matching score, the time score, the broadcast station weight, and the viewer rating of the broadcast program, selecting one broadcast program, based on the final matching score, from among the broadcast programs that include the popular keyword, and generating the second candidate representative phrase using the broadcast time of the selected broadcast program.
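The text names the inputs to the final matching score but not how they are combined. The sketch below assumes, purely for illustration, a product of the three scores boosted by the viewer rating. The `Program` class, the formula, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Program:
    title: str
    matching_score: float   # how well the title matches the keyword
    time_score: float       # closeness of broadcast time to exposure time
    station_weight: float   # per-station weight
    rating: float           # viewer rating in percent

def final_matching_score(p):
    # Assumed combination: multiply the scores and boost by the rating.
    return p.matching_score * p.time_score * p.station_weight * (1.0 + p.rating / 100.0)

def select_program(programs):
    """Pick the program with the highest final matching score, or None."""
    return max(programs, key=final_matching_score) if programs else None

candidates = [
    Program("Show X", 1.0, 0.8, 1.0, 10.0),
    Program("Show Y", 0.5, 1.0, 1.2, 20.0),
]
print(select_program(candidates).title)  # Show X
```

A weighted sum would serve equally well here; the product simply makes any zero score disqualify a program.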

The method may further include providing the final representative phrase in combination with the popular keyword.

A system for providing a representative phrase according to an embodiment of the present invention may include a first candidate representative phrase generation unit that generates a first candidate representative phrase for a popular keyword based on documents, a second candidate representative phrase generation unit that generates a second candidate representative phrase for the popular keyword based on broadcast data, and a representative phrase determination unit that determines a final representative phrase for the popular keyword using the first candidate representative phrase and the second candidate representative phrase.

In addition, the representative phrase determination unit may determine, as the final representative phrase, whichever of the first candidate representative phrase and the second candidate representative phrase has the higher suitability for the popular keyword.

The system may further include a popularity calculation unit that calculates the popularity of the document used to generate the first candidate representative phrase. The representative phrase determination unit may then determine, as the final representative phrase, the first candidate representative phrase that has been determined, using the popularity of the document and a preset reference popularity, to have the higher suitability for the popular keyword.

The system may further include a time score calculation unit that calculates a time score of the second candidate representative phrase using the broadcast time of the broadcast data used to generate the second candidate representative phrase and the exposure time of the popular keyword. The representative phrase determination unit may then determine, as the final representative phrase, the second candidate representative phrase that has been determined, using the time score of the second candidate representative phrase and a preset reference time score, to have the higher suitability for the popular keyword.

In addition, the first candidate representative phrase generation unit may include a reference word determination unit that analyzes the morphemes of the documents that include the popular keyword to determine reference words and determines a representative reference word according to the frequency of the determined reference words, and a reference word expansion unit that expands the representative reference word by combining it with a word that immediately precedes or follows it and generates the first candidate representative phrase using the expanded representative reference word.

The system may further include a document collection unit that collects documents that include the popular keyword, and a clustering unit that performs clustering on the collected documents.

In addition, the second candidate representative phrase generation unit may include a broadcast program determination unit that determines, in the broadcast data, a broadcast program that includes the popular keyword, and a generation unit that generates the second candidate representative phrase using the broadcast time of the determined broadcast program.

In addition, the second candidate representative phrase generation unit may include a broadcast program determination unit that, when the popular keyword is a combination of several words, separates the popular keyword into a plurality of words by morphological analysis and selects one broadcast program, based on the separated words, from among the broadcast programs that include the popular keyword, and a generation unit that generates the second candidate representative phrase using the broadcast time of the selected broadcast program.

In addition, the second candidate representative phrase generation unit may include a matching score calculation unit that assigns a matching score, a time score, and a broadcast station weight to each broadcast program that includes the popular keyword and calculates a final matching score for the broadcast program using at least one of the matching score, the time score, the broadcast station weight, and the viewer rating of the broadcast program, a broadcast program determination unit that selects one broadcast program, based on the final matching score, from among the broadcast programs that include the popular keyword, and a generation unit that generates the second candidate representative phrase using the broadcast time of the selected broadcast program.

The system may also include a representative phrase providing unit that provides the representative phrase in combination with the popular keyword.

According to an embodiment of the present invention, providing a first candidate representative phrase, derived from documents and related to why a popular keyword is exposed in real time, can resolve the user's curiosity about why it is presented as a popular keyword.

According to an embodiment of the present invention, providing a second candidate representative phrase, derived from broadcast data and related to why a popular keyword is exposed in real time, can resolve the user's curiosity about why it is presented as a popular keyword.

According to an embodiment of the present invention, determining the final representative phrase for a popular keyword from the first candidate representative phrase and the second candidate representative phrase in consideration of popularity and broadcast suitability allows the reason the popular keyword is exposed to be provided more accurately.

FIG. 1 is a diagram provided to explain a process of generating a representative phrase according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a representative phrase providing system according to an embodiment of the present invention.
FIG. 3 is a flowchart provided to explain the operation of the representative phrase providing system of FIG. 2 according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the detailed configuration of a first candidate representative phrase generation unit according to an embodiment of the present invention.
FIG. 5 is a diagram showing a screen in which a first candidate representative phrase is combined with a popular keyword and displayed on a web page.
FIG. 6 is a flowchart provided to explain a method of generating a first candidate representative phrase according to an embodiment of the present invention.
FIG. 7 is a block diagram showing the detailed configuration of a second candidate representative phrase generation unit according to an embodiment of the present invention.
FIG. 8 is a diagram showing a screen in which a second candidate representative phrase is combined with a popular keyword and displayed on a web page.
FIG. 9 is a flowchart provided to explain a process of providing a second candidate representative phrase using a broadcast time according to an embodiment of the present invention.
FIG. 10 is a flowchart provided to explain a process of generating a second candidate representative phrase for a popular keyword that is a combination of several words.
FIG. 11 is a flowchart provided to explain a process of generating a second candidate representative phrase using a matching score according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram provided to explain a process of generating a representative phrase according to an embodiment of the present invention.

Referring to FIG. 1, the representative phrase providing system 30 may collect documents 10 related to a real-time popular keyword 40 and broadcast data 20 related to the real-time popular keyword 40. Here, the representative phrase providing system 30 may collect the documents 10, such as text documents that include the popular keyword, articles related to the popular keyword, and still images of the popular keyword, from portal sites, news sites, and communities such as blogs, cafes, and microblogs.

The representative phrase providing system 30 may then analyze the morphemes of the collected documents to generate a first candidate representative phrase for the popular keyword. Here, the first candidate representative phrase is a phrase, generated from documents, that indicates why the popular keyword to be exposed on the web page became a real-time popular keyword.

When collecting broadcast data, the representative phrase providing system 30 may periodically collect broadcast data from broadcasting stations at preset intervals. Here, the representative phrase providing system 30 may collect broadcast data 20, such as broadcast schedules and electronic program guides, from broadcasting station sites, news sites, and the like related to terrestrial TV, cable TV, radio, Internet, and satellite TV broadcasting.

The representative phrase providing system 30 may then determine, in the collected broadcast data, a broadcast program that includes the popular keyword, and may generate a second candidate representative phrase for the popular keyword based on the determined broadcast program. Here, the second candidate representative phrase is a phrase, generated from broadcast data, that indicates why the popular keyword to be exposed on the web page became a real-time popular keyword.

For example, the representative phrase providing system 30 may generate the second candidate representative phrase using the broadcast time of the determined broadcast program. Here, the broadcast time may include the broadcast start time and the broadcast end time of the broadcast program.

The representative phrase providing system 30 may determine the representative phrase for the popular keyword using the first candidate representative phrase and the second candidate representative phrase. In doing so, the representative phrase providing system 30 may determine, as the final representative phrase, whichever of the two candidates has the higher suitability for the popular keyword. Here, the first candidate representative phrase and the second candidate representative phrase are the candidates most likely to become the representative phrase that explains why the popular keyword surged in real time.

For example, the representative phrase providing system 30 may determine the suitability of the first candidate representative phrase based on the popularity of the document used to generate it. The representative phrase providing system 30 may then determine the first candidate representative phrase as the final representative phrase based on that suitability.

If the first candidate representative phrase is determined not to be suitable, the representative phrase providing system 30 may determine the suitability of the second candidate representative phrase based on its time score. The representative phrase providing system 30 may then determine the second candidate representative phrase as the final representative phrase based on that suitability.
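The fallback order described above can be sketched as follows. This is a minimal illustration in which the scores are assumed to be precomputed. The function name, the use of `>=` against the reference values, and returning `None` when neither candidate qualifies are assumptions of this sketch.

```python
def choose_representative_phrase(first, popularity, second, time_score,
                                 ref_popularity, ref_time_score):
    """Prefer the document-based candidate; fall back to the broadcast-based one.

    first / second: candidate phrases (or None if a candidate was not generated).
    popularity: popularity of the documents behind the first candidate.
    time_score: time score of the second candidate.
    """
    if first is not None and popularity >= ref_popularity:
        return first
    if second is not None and time_score >= ref_time_score:
        return second
    return None  # neither candidate is suitable (assumed behavior)

print(choose_representative_phrase(
    "phrase from documents", 0.1,
    "phrase from broadcast data", 0.9,
    ref_popularity=0.3, ref_time_score=0.5,
))  # phrase from broadcast data
```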

The representative phrase providing system 30 may then combine the final representative phrase with the popular keyword and expose the result on a web page.

FIG. 2 is a block diagram showing the configuration of a representative phrase providing system according to an embodiment of the present invention.

Referring to FIG. 2, the representative phrase providing system 200 may include a first candidate representative phrase generation unit 210, a second candidate representative phrase generation unit 220, a popularity calculation unit 230, a representative phrase determination unit 240, a time score calculation unit 250, and a representative phrase providing unit 260.

First, the first candidate representative phrase generation unit 210 may generate a first candidate representative phrase for a popular keyword based on the document 110. Here, the first candidate representative phrase is a phrase that indicates the basis on which the popular keyword is presented, and may be extracted based on the document 110.

For example, the first candidate representative phrase generation unit 210 may generate the first candidate representative phrase by analyzing the morphemes of documents related to the real-time popular keyword. The detailed process of generating the first candidate representative phrase is described later with reference to FIGS. 4 to 6.

Next, the second candidate representative phrase generation unit 220 may generate a second candidate representative phrase for the popular keyword based on the broadcast data 120. Here, the second candidate representative phrase is a phrase that indicates the basis on which the popular keyword is presented, and may be extracted based on the broadcast data 120.

For example, the second candidate representative phrase generation unit 220 may determine, in the broadcast data 120, a broadcast program that includes the popular keyword, and may generate the second candidate representative phrase based on the broadcast time of the determined broadcast program. The detailed process of generating the second candidate representative phrase is described later with reference to FIGS. 7 to 11.

The popularity calculation unit 230 may extract, from among the documents collected in relation to the popular keyword, the document used to generate the first candidate representative phrase. The popularity calculation unit 230 may then calculate the popularity of the extracted document.

For example, the popularity calculation unit 230 may calculate the popularity of the document using a click-through rate (CTR), that is, a ratio of clicks to search impressions obtained by combining the click count and the impression count of the extracted document. In this case, the popularity calculation unit 230 may calculate the click count from the click log data of the extracted document, and may calculate the impression count from the search log data of the extracted document. Here, the click log data and the search log data may be limited to the most recent 30 minutes.
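The CTR-based popularity described above can be sketched as follows. This is only an illustrative sketch: the function name `document_ctr`, the representation of each log as a list of timestamps, and the choice to return 0.0 when there are no recent impressions are assumptions made here, not details given in the specification.

```python
from datetime import datetime, timedelta


def document_ctr(click_times, impression_times, now, window=timedelta(minutes=30)):
    """Return clicks / impressions for one document within the recent window.

    click_times and impression_times are hypothetical lists of datetime
    objects taken from the click log and the search log, respectively.
    """
    cutoff = now - window
    # Count only log entries that fall inside the most recent window.
    clicks = sum(1 for t in click_times if cutoff <= t <= now)
    impressions = sum(1 for t in impression_times if cutoff <= t <= now)
    # Assumption: a document with no recent impressions has popularity 0.0.
    return clicks / impressions if impressions else 0.0
```

With two clicks and four impressions inside the 30-minute window, the function returns 0.5; older log entries are ignored.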

Then, based on the suitability of the first candidate representative phrase and the suitability of the second candidate representative phrase, the representative phrase determination unit 240 may determine, as the final representative phrase, whichever of the two candidates has the higher suitability for the popular keyword.

For example, when the document popularity and a preset reference popularity are used to determine suitability and only one document has been extracted, the popularity calculation unit 230 may take the popularity of that single document as the final popularity. When a plurality of documents have been extracted, the popularity calculation unit 230 may calculate the final popularity by combining the popularities of the plurality of documents.
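The specification does not state how multiple document popularities are combined. The sketch below assumes an arithmetic mean purely for illustration; the function name `final_popularity` is likewise hypothetical, and another combination (for example a sum or a maximum) would fit the text equally well.

```python
def final_popularity(doc_popularities):
    """Combine per-document popularities into a single final popularity."""
    if not doc_popularities:
        raise ValueError("at least one document popularity is required")
    if len(doc_popularities) == 1:
        # A single extracted document: its popularity is the final popularity.
        return doc_popularities[0]
    # Assumption: several documents are combined by arithmetic mean.
    return sum(doc_popularities) / len(doc_popularities)
```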

Then, the representative phrase determination unit 240 may determine the suitability of the first candidate representative phrase using the final popularity and the reference popularity.

For example, when the reference popularity is preset to α, the representative phrase determination unit 240 may compare the final popularity with the reference popularity α. When the final popularity is greater than or equal to the reference popularity α, the representative phrase determination unit 240 may determine that the first candidate representative phrase is more suitable for the popular keyword than the second candidate representative phrase, and may then determine the first candidate representative phrase as the final representative phrase. In other words, when the final popularity is greater than or equal to the reference popularity α, the representative phrase determination unit 240 may determine that the document used to generate the first candidate representative phrase has gained popularity. Here, the reference popularity α may be changed.

As another example, when the final popularity is less than the reference popularity α, the representative phrase determination unit 240 may determine that the first candidate representative phrase is not suitable as the final representative phrase for the popular keyword.

Then, the time score calculation unit 250 may calculate a time score of the second candidate representative phrase. In this case, the time score calculation unit 250 may calculate the time score based on 1) the broadcast time of the broadcast program used to generate the second candidate representative phrase, and 2) the exposure time of the popular keyword. Here, the broadcast time may include a broadcast start time and a broadcast end time.

For example, when the exposure time is at or after the broadcast start time and at or before the broadcast end time, the time score calculation unit 250 may set the time score of the second candidate representative phrase to a preset reference score β. That is, when the popular keyword is exposed while the broadcast program is on air, the time score of the second candidate representative phrase is the reference score β.

As another example, when the exposure time is before the broadcast start time or after the broadcast end time, the time score calculation unit 250 may calculate the time score of the second candidate representative phrase by subtracting an error score δ from the reference score β for each error unit γ. Here, the error unit γ and the error score δ are preset and may be changed.

When the exposure time is before the broadcast start time, the time score calculation unit 250 may calculate the difference between the exposure time and the broadcast start time, and may then subtract the error score from the reference score once for each error unit contained in that difference. For example, when the difference is 20 minutes, the error unit γ is preset to 10 minutes, and the error score δ is preset to 5, the difference corresponds to 2 error units (20 minutes ÷ 10 minutes), so the time score calculation unit 250 subtracts the product of 2 and the error score 5 from the reference score: β − (2 × 5). The result, β − 10, is the time score of the second candidate representative phrase.
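The time score rule can be illustrated with a short sketch. The function name `time_score`, the default values, and the rounding of a partial error unit up to a whole unit are assumptions introduced here; the specification only gives an example in which the difference is an exact multiple of γ.

```python
import math
from datetime import datetime, timedelta


def time_score(exposure, start, end, beta, gamma=timedelta(minutes=10), delta=5):
    """Time score of the second candidate representative phrase.

    exposure: exposure time of the popular keyword
    start, end: broadcast start and end times
    beta: reference score; gamma: error unit; delta: error score
    """
    if start <= exposure <= end:
        # Keyword exposed while the program is on air: full reference score.
        return beta
    # Distance from the nearest edge of the broadcast interval.
    gap = start - exposure if exposure < start else exposure - end
    # Assumption: a partial error unit counts as a whole unit.
    units = math.ceil(gap / gamma)
    return beta - units * delta
```

With a hypothetical β of 100, a broadcast from 22:00 to 23:00, and an exposure time of 21:40, the difference is 20 minutes, which is 2 error units, and the function returns 100 − (2 × 5) = 90, matching the β − 10 example in the text.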

Then, the representative phrase determination unit 240 may use the time score of the second candidate representative phrase and a preset reference time score τ to determine whether the second candidate representative phrase is more suitable for the popular keyword than the first candidate representative phrase.

For example, when the time score of the second candidate representative phrase exceeds the reference time score, the representative phrase determination unit 240 may determine that the second candidate representative phrase is more suitable for the popular keyword, and may accordingly determine the second candidate representative phrase as the final representative phrase. Conversely, when the time score of the second candidate representative phrase is less than or equal to the reference time score, the representative phrase determination unit 240 may determine that the second candidate representative phrase is not suitable as the final representative phrase for the popular keyword.
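Taken together, the two threshold checks form a two-stage decision, which might be sketched as below. The function and parameter names are hypothetical, and returning `None` when neither candidate qualifies is an assumption of this sketch.

```python
def choose_representative(first, second, final_popularity, alpha, time_score, tau):
    """Pick the final representative phrase from the two candidates.

    first, second: candidate representative phrases (strings)
    alpha: reference popularity; tau: reference time score
    Returns None when neither candidate qualifies.
    """
    if final_popularity >= alpha:
        # The source document gained popularity: use the document-based phrase.
        return first
    if time_score > tau:
        # The time score exceeds the reference: use the broadcast-based phrase.
        return second
    return None
```

Checking the document-based candidate first means the time score only matters when the final popularity falls below α.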

The representative phrase providing unit 260 may provide the finally determined representative phrase in combination with the popular keyword. The user can then see, from the finally determined representative phrase, why the popularity of the keyword has surged in real time.

For example, when the first candidate representative phrase is "Secret date with Hong Kong actress BBB" and it has been determined as the final representative phrase for the popular keyword, the representative phrase providing unit 260 may, as shown in FIG. 5, combine the representative phrase "Secret date with Hong Kong actress BBB" (521) with the popular keyword "AAA" (510) and expose them on a web page. In this case, the representative phrase providing unit 260 may also combine a photo (522) related to the popular keyword "AAA" with the popular keyword (510) and the representative phrase (521) and expose them on the web page.

As another example, when the second candidate representative phrase is "Broadcast starts at 22:00" and it has been determined as the final representative phrase for the popular keyword, the representative phrase providing unit 260 may, as shown in FIG. 8, combine the popular keyword "XXX" (810) with the representative phrase "Broadcast starts at 22:00" (821) and expose them on a web page. In this case, the representative phrase providing unit 260 may also combine a pictorial photo (822) of the broadcast program for the popular keyword "XXX" with the popular keyword (810) and the representative phrase (821) and expose them on the web page.

In FIG. 2 above, the representative phrase providing system 200 has been described as including the first candidate representative phrase generation unit 210, the second candidate representative phrase generation unit 220, the popularity calculation unit 230, the representative phrase determination unit 240, the time score calculation unit 250, and the representative phrase providing unit 260. This is only one embodiment, however, and the configuration of the representative phrase providing system used in this specification may vary according to the intention of a user or operator or the conventions of the field to which the present invention belongs. Accordingly, the configuration of the representative phrase providing system is not limited by FIG. 2.

For example, a representative phrase providing system according to an embodiment of the present invention may consist of a first candidate representative phrase generation unit, a second candidate representative phrase generation unit, and a representative phrase determination unit. In this case, the representative phrase providing system may further include a representative phrase providing unit.

As another example, a representative phrase providing system according to an embodiment of the present invention may consist of a first candidate representative phrase generation unit, a second candidate representative phrase generation unit, a popularity calculation unit, and a representative phrase determination unit. In this case, the representative phrase providing system may further include a representative phrase providing unit.

As yet another example, a representative phrase providing system according to an embodiment of the present invention may consist of a first candidate representative phrase generation unit, a second candidate representative phrase generation unit, a time score calculation unit, and a representative phrase determination unit. In this case, the representative phrase providing system may further include a representative phrase providing unit.

FIG. 3 is a flowchart illustrating the operation of the representative phrase providing system of FIG. 2 according to an embodiment of the present invention.

First, in step 310, the first candidate representative phrase generation unit 210 may generate a first candidate representative phrase based on documents. In this case, the first candidate representative phrase generation unit 210 may generate the first candidate representative phrase by performing morphological analysis on documents related to the real-time popular keyword.

Next, in step 320, the second candidate representative phrase generation unit 220 may generate a second candidate representative phrase based on broadcast data. For example, the second candidate representative phrase generation unit 220 may determine, from among a plurality of broadcast programs included in the broadcast data, a broadcast program that includes the popular keyword, and may then generate the second candidate representative phrase based on the broadcast time of the determined broadcast program.

Then, in step 330, the popularity calculation unit 230 may calculate the popularity of the one or more documents used to generate the first candidate representative phrase.

In this case, the popularity calculation unit 230 may extract, from among the documents collected in relation to the popular keyword, the document used to generate the first candidate representative phrase. The popularity calculation unit 230 may then calculate the popularity of the document using a click-through rate (CTR), that is, a ratio of clicks to search impressions obtained by combining the click count and the impression count of the extracted document. The click count may be calculated from the click log data of the extracted document, and the impression count may be calculated from the search log data of the extracted document. Here, the click log data and the search log data may be data generated within the most recent 30 minutes, but the present invention is not limited to this time period.

Thus, when a single document was used to generate the first candidate representative phrase, the popularity calculation unit 230 may calculate one document popularity. When a plurality of documents were used to generate the first candidate representative phrase, the popularity calculation unit 230 may calculate a document popularity for each of the documents.

Accordingly, in step 335, the popularity calculation unit 230 may calculate the final popularity by combining the plurality of document popularities. When only one document popularity has been calculated, the popularity calculation unit 230 may take that document popularity as the final popularity.

Next, based on the suitability of the first candidate representative phrase and the suitability of the second candidate representative phrase, the representative phrase determination unit 240 may determine, as the final representative phrase, whichever of the two candidates has the higher suitability for the popular keyword.

In step 340, the representative phrase determination unit 240 may use the final popularity and the preset reference popularity to determine whether the first candidate representative phrase has the higher suitability for the popular keyword.

When the final popularity is greater than or equal to the reference popularity, in step 350, the representative phrase determination unit 240 may determine that the first candidate representative phrase is more suitable for the popular keyword than the second candidate representative phrase, and may determine the first candidate representative phrase as the final representative phrase. In other words, when the final popularity is greater than or equal to the reference popularity α, the representative phrase determination unit 240 may determine that the document used to generate the first candidate representative phrase has gained popularity.

When the final popularity is less than the reference popularity, the representative phrase determination unit 240 may determine that the first candidate representative phrase does not have the higher suitability for the popular keyword. That is, the representative phrase determination unit 240 may determine that the first candidate representative phrase is not suitable as the final representative phrase.

Then, in step 360, the time score calculation unit 250 may calculate the time score of the second candidate representative phrase. In this case, the time score calculation unit 250 may calculate the time score based on the broadcast time of the broadcast program used to generate the second candidate representative phrase and the exposure time of the popular keyword. Here, the broadcast time may include a broadcast start time and a broadcast end time.

For example, when the exposure time is at or after the broadcast start time and at or before the broadcast end time, the time score calculation unit 250 may set the time score of the second candidate representative phrase to the preset reference score β.

As another example, when the exposure time is before the broadcast start time or after the broadcast end time, the time score calculation unit 250 may calculate the time score of the second candidate representative phrase by subtracting the error score δ from the reference score β for each error unit γ.

Similarly, when the exposure time is after the broadcast end time, the time score calculation unit 250 may calculate the difference between the exposure time and the broadcast end time, and may then subtract the error score from the reference score once for each error unit contained in that difference to obtain the time score of the second candidate representative phrase.
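The three cases of step 360 can be summarized in a single piecewise expression. The symbols S (time score), t (exposure time), t_s (broadcast start time), and t_f (broadcast end time) are notation introduced here for illustration; β, γ, and δ are as defined in the text. How a difference that is not a whole multiple of γ should be rounded is not specified, so the quotient is left unrounded.

```latex
S(t) =
\begin{cases}
\beta, & t_s \le t \le t_f \\[4pt]
\beta - \delta \cdot \dfrac{t_s - t}{\gamma}, & t < t_s \\[8pt]
\beta - \delta \cdot \dfrac{t - t_f}{\gamma}, & t > t_f
\end{cases}
```

For a difference of 20 minutes with γ = 10 minutes and δ = 5, the second or third case gives S = β − 5 · 2 = β − 10.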

Next, in step 370, the representative phrase determination unit 240 may use the time score and the preset reference time score τ to determine whether the second candidate representative phrase is more suitable for the popular keyword than the first candidate representative phrase.

When the time score of the second candidate representative phrase is less than or equal to the reference time score, the representative phrase determination unit 240 may determine that the second candidate representative phrase does not have the higher suitability for the popular keyword. That is, the representative phrase determination unit 240 may determine that the second candidate representative phrase is not suitable as the final representative phrase.

When the time score of the second candidate representative phrase exceeds the reference time score, in step 380, the representative phrase determination unit 240 may determine that the second candidate representative phrase is more suitable for the popular keyword than the first candidate representative phrase. That is, when the final popularity is less than the reference popularity and the time score of the second candidate representative phrase exceeds the reference time score, the second candidate representative phrase may be determined to be more suitable for the popular keyword than the first candidate representative phrase. Accordingly, the representative phrase determination unit 240 may determine the second candidate representative phrase as the final representative phrase for the popular keyword.

Then, in step 390, the representative phrase providing unit 260 may provide the representative phrase in combination with the popular keyword.

For example, when the first candidate representative phrase has been determined as the final representative phrase for the popular keyword, the representative phrase providing unit 260 may, as shown in FIG. 5, combine the first candidate representative phrase with the popular keyword and expose them on a web page.

As another example, when the second candidate representative phrase has been determined as the final representative phrase for the popular keyword, the representative phrase providing unit 260 may, as shown in FIG. 8, combine the second candidate representative phrase with the popular keyword and expose them on a web page.

Meanwhile, when the time score of the second candidate representative phrase is less than or equal to the reference time score in step 370 of FIG. 3, the representative phrase determination unit 240 may determine neither the first nor the second candidate representative phrase as the final representative phrase for the popular keyword. When neither candidate has been determined as the final representative phrase, the representative phrase determination unit 240 may randomly select one of the first and second candidate representative phrases and determine it as the final representative phrase for the popular keyword. The representative phrase providing unit 260 may then combine the randomly selected representative phrase with the popular keyword and expose them.

Alternatively, when neither the first nor the second candidate representative phrase has been determined as the final representative phrase, the representative phrase determination unit 240 may determine both the first and second candidate representative phrases as final representative phrases for the popular keyword. The representative phrase providing unit 260 may then combine the first candidate representative phrase, the second candidate representative phrase, and the popular keyword and expose them.

As a further alternative, when neither the first nor the second candidate representative phrase has been determined as the final representative phrase, the representative phrase providing unit 260 may expose only the popular keyword, without any representative phrase.
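The three fallback behaviors for the case in which neither candidate qualifies could be expressed as a selectable policy, as in the sketch below. The policy names `"random"`, `"both"`, and `"none"`, and the function name `fallback_phrases`, are hypothetical labels introduced here; the specification describes the behaviors but does not name them or say how one is chosen.

```python
import random


def fallback_phrases(first, second, policy, rng=random):
    """Return the list of phrases to expose when neither candidate qualified.

    policy: "random" exposes one candidate chosen at random,
            "both" exposes both candidates,
            "none" exposes the popular keyword alone.
    """
    if policy == "random":
        return [rng.choice([first, second])]
    if policy == "both":
        return [first, second]
    if policy == "none":
        return []
    raise ValueError(f"unknown fallback policy: {policy!r}")
```

Returning a list in every case lets the providing step treat zero, one, or two phrases uniformly when combining them with the popular keyword.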

이상의 도 3에서는, 인기도 계산, 시간 점수 계산을 이용하여 제1 및 제2 후보 대표 문구의 적합도를 결정하고, 결정된 적합도에 기초하여 최종적인 대표 문구를 결정 및 제공하는 구성에 대해 설명하였으나, 이는 실시예에 해당되며, 본 명세서에서 사용되는 대표 문구 제공 방법의 구성은 사용자, 운용자의 의도 또는 본 발명이 속하는 분야의 관례 등에 따라 달라질 수 있다. 따라서, 본 대표 문구 제공 방법의 구성이 도 3에 의해 제한되거나 한정되는 것은 아니다.In FIG. 3, the configuration of determining the suitability of the first and second candidate representative phrases using the popularity calculation and the time score calculation, and determining and providing the final representative phrase based on the determined fitness, has been described. For example, the configuration of the representative phrase providing method used in the present specification may vary according to a user, an operator's intention, or a custom in the field to which the present invention belongs. Therefore, the configuration of the present representative phrase providing method is not limited or limited by FIG. 3.

일례로, 본 발명의 일실시예에 따른 대표 문구 제공 방법은, 문서를 기초로 인기 키워드에 대한 제1 후보 대표 문구를 생성하는 단계, 방송 데이터를 기초로 인기 키워드에 대한 제2 후보 대표 문구를 생성하는 단계, 및 제1 후보 대표 문구 및 제2 후보 대표 문구를 이용하여 인기 키워드에 대해 최종적인 대표 문구를 결정하는 단계를 포함할 수 있다. 이 때, 대표 문구 제공 방법은 최종적인 대표 문구와 인기 키워드를 결합하여 제공하는 단계를 더 포함할 수 있다.For example, the method of providing a representative phrase according to an embodiment of the present invention may include generating a first candidate representative phrase for a popular keyword based on a document, and displaying a second candidate representative phrase for a popular keyword based on broadcast data. And generating a final representative phrase for the popular keyword using the first candidate representative phrase and the second candidate representative phrase. In this case, the representative phrase providing method may further include providing the final representative phrase by combining the popular keyword.

다른 예로, 대표 문구 제공 방법은, 문서를 기초로 인기 키워드에 대한 제1 후보 대표 문구를 생성하는 단계, 방송 데이터를 기초로 인기 키워드에 대한 제2 후보 대표 문구를 생성하는 단계, 제1 후보 대표 문구를 생성하기 위해 이용된 문서의 인기도를 계산하는 단계, 제1 후보 대표 문구 및 제2 후보 대표 문구를 이용하여 인기 키워드에 대해 최종적인 대표 문구를 결정하는 단계를 포함할 수 있다. 이 때, 대표 문구 제공 방법은 최종적인 대표 문구와 인기 키워드를 결합하여 제공하는 단계를 더 포함할 수 있다.As another example, the representative phrase providing method may include generating a first candidate representative phrase for a popular keyword based on a document, generating a second candidate representative phrase for a popular keyword based on broadcast data, and first candidate representative Calculating the popularity of the document used to generate the phrase, and determining the final representative phrase for the popular keyword using the first candidate representative phrase and the second candidate representative phrase. In this case, the representative phrase providing method may further include providing the final representative phrase by combining the popular keyword.

또 다른 예로, 본 발명의 일실시예에 따른 대표 문구 제공 방법은, 문서를 기초로 인기 키워드에 대한 제1 후보 대표 문구를 생성하는 단계, 방송 데이터를 기초로 인기 키워드에 대한 제2 후보 대표 문구를 생성하는 단계, 제2 후보 대표 문구를 생성하기 위해 이용된 방송 데이터의 방송 시간과 인기 키워드의 노출 시간을 이용하여 제2 후보 대표 문구의 시간 점수를 계산하는 단계, 그리고 제1 후보 대표 문구 및 제2 후보 대표 문구를 이용하여 인기 키워드에 대해 최종적인 대표 문구를 결정하는 단계를 포함할 수 있다. 이때, 대표 문구 제공 방법은 최종적인 대표 문구와 인기 키워드를 결합하여 제공하는 단계를 더 포함할 수 있다.As another example, the representative phrase providing method according to an embodiment of the present invention, generating a first candidate representative phrase for the popular keyword based on the document, the second candidate representative phrase for the popular keyword based on the broadcast data Calculating a time score of the second candidate representative phrase using the broadcast time of the broadcast data used to generate the second candidate representative phrase and the exposure time of the popular keyword, and the first candidate representative phrase and The method may include determining a final representative phrase for the popular keyword by using the second candidate representative phrase. In this case, the representative phrase providing method may further include providing a final representative phrase by combining the popular keyword.

As yet another example, the method for providing a representative phrase may include: generating a first candidate representative phrase for a popular keyword based on documents; generating a second candidate representative phrase for the popular keyword based on broadcast data; calculating the popularity of the documents used to generate the first candidate representative phrase; calculating a time score for the second candidate representative phrase using the broadcast time of the broadcast data used to generate it and the exposure time of the popular keyword; and determining a final representative phrase for the popular keyword using the first and second candidate representative phrases. The method may further include combining the final representative phrase with the popular keyword and providing the result.

Hereinafter, with reference to FIGS. 4 to 6, the configuration for generating the first candidate representative phrase will be described in detail, together with the configuration for combining the first candidate representative phrase with the popular keyword and providing the result when the first candidate representative phrase is determined as the representative phrase.

FIG. 4 is a block diagram illustrating the detailed configuration of a first candidate representative phrase generation unit according to an embodiment of the present invention.

Referring to FIG. 4, the representative phrase providing system of FIG. 2 may further include a document collection unit 410 and a clustering unit 420.

First, the document collection unit 410 may collect documents related to a popular keyword that is to be exposed on a web page as a real-time trending search term, from various portal sites, news sites, blogs, microblogs, and the like.

For example, the document collection unit 410 may collect documents for the popular keyword AAA as shown in Table 1 below.

[Table 1]
Document | Text (English gloss)
NNN_news#0 | AAA, 홍콩, 최고 여배우, 'BBB'와 中서 비밀 데이트? (AAA on a secret date in China with Hong Kong's top actress 'BBB'?)
NNN_news#1 | AAA, 홍콩 여배우 BBB와 몰래 데이트? (AAA secretly dating Hong Kong actress BBB?)
MMM_news#3 | AAA-BBB, 연인? 콘서트 항상 인기 폭발 (AAA-BBB, lovers? Concerts always hugely popular)
MMM_news#1 | AAA, 홍콩 여배우 BBB와 비밀 데이트? (AAA on a secret date with Hong Kong actress BBB?)
LLL#0 | AAA, BBB 콘서트 영상 인기 (AAA and BBB concert video popular)

The clustering unit 420 may cluster the collected documents using N-gram morphological analysis. Here, an N-gram is a sequence of up to N consecutive words joined together: a 1-gram is a single word, and a 2-gram is two consecutive words combined. Clustering on the basis of N-gram morphological analysis can improve the quality of the clusters.

With 1-gram morphological analysis alone, incorrect clusters may be generated. When N-grams of up to five words are used, however, documents that are closely related with respect to the keyword AAA are more likely to be grouped together. For example, the clustering unit 420 may generate cluster input data for NNN_news#0 of Table 1 using 5-gram morphological analysis. The generated cluster input data may be as shown in Table 2 below.

[Table 2]
Document | Cluster input data
NNN_news#0 | AAA AAA홍콩 AAA홍콩최고 AAA홍콩최고여배우 AAA홍콩최고여배우BBB AAA홍콩최고여배우BBB와 AAA홍콩최고여배우BBB와中 홍콩 홍콩최고 홍콩최고여배우 홍콩최고여배우BBB 홍콩최고여배우BBB와 홍콩최고여배우BBB와中 홍콩최고여배우BBB와中서 홍콩최고여배우BBB와中서비밀 최고 최고여배우 최고여배우BBB와 최고여배우BBB와中 최고여배우BBB와中서 최고여배우BBB와中서비밀 여배우 여배우BBB 여배우BBB와 여배우BBB와中 여배우BBB와中서 여배우BBB와中서비밀 여배우BBB와中서비밀데이트 BBB BBB와 BBB와中 BBB와中서 BBB와中서비밀 BBB와中서비밀데이트 中 中서 中서비밀 中서비밀데이트 서 서비밀 서비밀데이트 비밀 비밀데이트 데이트

Although Table 2 shows the cluster input data generated for document NNN_news#0 only, the clustering unit 420 may generate cluster input data for all of the collected documents using N-gram morphological analysis.
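The generation of cluster input data described above can be sketched as follows. This is a minimal illustration, not the implementation of the clustering unit 420: the function name `ngram_features`, the pre-tokenized morpheme list, and the strict cap of five morphemes per N-gram are all assumptions, since the text does not specify the morphological analyzer or the exact tokenization behind Table 2.

```python
def ngram_features(morphemes, max_n=5):
    """Return every contiguous n-gram (1 <= n <= max_n) as a joined string.

    `morphemes` is assumed to be the output of a morphological analyzer;
    the analyzer itself is outside the scope of this sketch.
    """
    features = []
    for start in range(len(morphemes)):
        for n in range(1, max_n + 1):
            if start + n > len(morphemes):
                break  # ran past the end of the document
            features.append("".join(morphemes[start:start + n]))
    return features


# Hypothetical morpheme sequence for document NNN_news#0 of Table 1.
nnn_news_0 = ["AAA", "홍콩", "최고", "여배우", "BBB", "와", "中", "서", "비밀", "데이트"]
features = ngram_features(nnn_news_0)
```

Joining the morphemes without spaces mirrors the entries of Table 2 (for example "AAA홍콩최고"), so each N-gram can be treated as a single categorical feature during clustering.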

The clustering unit 420 may then cluster all of the collected documents using the generated cluster input data and a categorical clustering technique. In doing so, the clustering unit 420 may use a hierarchical clustering technique based on the Jaccard coefficient. For example, by applying this hierarchical clustering technique to the documents collected in Table 1, the clustering unit 420 may generate the two clusters shown in Tables 3 and 4 below.

[Table 3] Cluster A
Document | Text (English gloss)
MMM_news#1 | AAA, 홍콩 여배우 BBB와 비밀 데이트? (AAA on a secret date with Hong Kong actress BBB?)
NNN_news#0 | AAA, 홍콩, 최고 여배우, 'BBB'와 中서 비밀 데이트? (AAA on a secret date in China with Hong Kong's top actress 'BBB'?)
NNN_news#1 | AAA, 홍콩 여배우 BBB와 몰래 데이트? (AAA secretly dating Hong Kong actress BBB?)

[Table 4] Cluster B
Document | Text (English gloss)
LLL#0 | AAA, BBB 콘서트 영상 인기 (AAA and BBB concert video popular)
MMM_news#3 | AAA-BBB, 연인? 콘서트 항상 인기 폭발 (AAA-BBB, lovers? Concerts always hugely popular)

According to Table 3, cluster A contains the documents about a date between the popular keyword AAA and BBB. According to Table 4, cluster B contains the documents about a concert involving the popular keyword AAA and BBB.
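The clustering step can be illustrated with a small agglomerative procedure over Jaccard similarity. The sketch below makes several assumptions that the text does not confirm: average linkage, a stopping threshold of 0.4, and single-morpheme feature sets instead of the full N-gram input data (chosen only to keep the example short). The morpheme lists are hypothetical tokenizations of the Table 1 documents.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two feature sets."""
    return len(a & b) / len(a | b)


def hierarchical_clusters(features, threshold=0.4):
    """Merge the most similar pair of clusters until no pair reaches `threshold`.

    `features` maps a document id to its feature set. Average linkage and the
    threshold value are assumptions made for this sketch.
    """
    clusters = [[doc_id] for doc_id in features]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [jaccard(features[x], features[y])
                    for x in clusters[i] for y in clusters[j]]
            sim = sum(sims) / len(sims)
            if sim > best_sim:
                best_sim, best_pair = sim, (i, j)
        if best_sim < threshold:
            break  # remaining clusters are too dissimilar to merge
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return [set(c) for c in clusters]


# Hypothetical morpheme sets for the documents of Table 1.
docs = {
    "NNN_news#0": {"AAA", "홍콩", "최고", "여배우", "BBB", "와", "中", "서", "비밀", "데이트"},
    "NNN_news#1": {"AAA", "홍콩", "여배우", "BBB", "와", "몰래", "데이트"},
    "MMM_news#3": {"AAA", "BBB", "연인", "콘서트", "항상", "인기", "폭발"},
    "MMM_news#1": {"AAA", "홍콩", "여배우", "BBB", "와", "비밀", "데이트"},
    "LLL#0": {"AAA", "BBB", "콘서트", "영상", "인기"},
}
clusters = hierarchical_clusters(docs)
```

Under these assumptions the three dating-related documents end up in one cluster and the two concert-related documents in another, matching the grouping of Tables 3 and 4.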

The first candidate representative phrase generation unit 430 may generate a first candidate representative phrase for the popular keyword by composing, for each of the generated clusters, one sentence that represents the cluster. Here, the first candidate representative phrase generation unit 430 may include a reference word determination unit 431 and a reference word expansion unit 432.

The reference word determination unit 431 may determine one or more reference words by analyzing the morphemes of all documents included in a cluster. For example, for cluster A, the reference word determination unit 431 may analyze the morphemes of documents MMM_news#1, NNN_news#0, and NNN_news#1 and determine "홍콩" (Hong Kong), "데이트" (date), "BBB", "여배우" (actress), "비밀" (secret), and "몰래" (secretly) as reference words.

The reference word determination unit 431 may then count how often each determined reference word appears in documents MMM_news#1, NNN_news#0, and NNN_news#1. For example, the reference words determined for cluster A and their counted frequencies may be as shown in Table 5 below.

[Table 5]
Reference word | Frequency
홍콩 (Hong Kong) | 232
데이트 (date) | 224
BBB | 215
여배우 (actress) | 152
비밀 (secret) | 142
몰래 (secretly) | 70

The reference word determination unit 431 may determine the two reference words with the highest counted frequencies as the representative reference words. For example, according to Table 5, the reference word determination unit 431 may determine the two most frequent words, "홍콩" (Hong Kong) and "데이트" (date), as the representative reference words.
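Selecting the representative reference words amounts to a top-k selection over the frequency table. The short sketch below uses the frequencies of Table 5; the function name `representative_reference_words` and the parameter `k` are illustrative names, not terms from the specification.

```python
def representative_reference_words(frequencies, k=2):
    """Return the k reference words with the highest frequency, highest first."""
    return sorted(frequencies, key=frequencies.get, reverse=True)[:k]


# Frequencies of the reference words of cluster A, taken from Table 5.
table_5 = {"홍콩": 232, "데이트": 224, "BBB": 215, "여배우": 152, "비밀": 142, "몰래": 70}
top_two = representative_reference_words(table_5)
```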

The reference word expansion unit 432 may calculate the conditional probability that a word adjacent to a determined representative reference word appears in the documents belonging to the cluster. In doing so, the reference word expansion unit 432 may use bigram conditional probabilities to calculate the probability that the word immediately preceding or following the representative reference word appears together with it.

For example, the reference word expansion unit 432 may calculate, for the documents belonging to cluster A, the conditional probabilities of the words "여배우" (actress) and "최고" (top) following the representative reference word "홍콩" (Hong Kong). Using Equation 1, the reference word expansion unit 432 may calculate the conditional probability that "여배우" follows "홍콩" in the documents of cluster A as 2/3. More specifically, among the documents MMM_news#1, NNN_news#0, and NNN_news#1 belonging to cluster A, the documents containing "홍콩 여배우" are MMM_news#1 and NNN_news#1. Accordingly, the reference word expansion unit 432 may calculate the conditional probability of "홍콩 여배우" as 2/3.

In the same way, the reference word expansion unit 432 may calculate the conditional probability that "최고" (top) follows "홍콩" in the documents of cluster A as 1/3. More specifically, since NNN_news#0 is the only document in cluster A that contains "홍콩 최고", the reference word expansion unit 432 may use Equation 1 to calculate the conditional probability of "홍콩 최고" as 1/3.
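The two probabilities worked out above can be reproduced with a few lines of code. Because Equation 1 is not reproduced in this text, the sketch assumes the definition implied by the worked example: the number of documents in the cluster that contain the word pair, divided by the number of documents in the cluster. The token lists are a hypothetical tokenization in which particles stay attached ("BBB와", "中서"), as in the examples of the description.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical token sequences for the documents of cluster A (Table 3).
CLUSTER_A = [
    ["AAA", "홍콩", "여배우", "BBB와", "비밀", "데이트"],                  # MMM_news#1
    ["AAA", "홍콩", "최고", "여배우", "BBB와", "中서", "비밀", "데이트"],  # NNN_news#0
    ["AAA", "홍콩", "여배우", "BBB와", "몰래", "데이트"],                  # NNN_news#1
]


def next_word_probabilities(docs, word):
    """Share of documents in which each candidate word directly follows `word`.

    Assumed reading of Equation 1: documents containing the pair, divided by
    the number of documents in the cluster.
    """
    counts = Counter()
    for tokens in docs:
        followers = {tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == word}
        counts.update(followers)  # each document is counted at most once per follower
    return {w: Fraction(c, len(docs)) for w, c in counts.items()}


probs = next_word_probabilities(CLUSTER_A, "홍콩")
```

Exact fractions are used so the values can be compared directly with the 2/3 and 1/3 stated in the text.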

If the largest of the calculated conditional probabilities is greater than or equal to a preset reference probability, the reference word expansion unit 432 may extend the representative reference word with the word corresponding to that largest probability.

For example, when the reference probability is preset to 1/2, the reference word expansion unit 432 may compare the calculated conditional probabilities and select the largest one. Comparing the conditional probabilities 2/3 and 1/3, the reference word expansion unit 432 selects 2/3 and then determines whether the selected conditional probability 2/3 is greater than or equal to the reference probability 1/2. Since 2/3 is greater than or equal to 1/2, the reference word expansion unit 432 may extend the representative reference word to "홍콩 여배우" (Hong Kong actress), the phrase corresponding to the conditional probability 2/3. That is, the representative reference word is extended from "홍콩" to "홍콩 여배우".

The reference word expansion unit 432 may continue to extend the representative reference word until the conditional probability of the word to be added falls below the reference probability.

For example, the reference word expansion unit 432 may calculate the conditional probability that a word follows the newly added word "여배우" (actress) in the documents belonging to cluster A. In the same manner as Equation 1, the reference word expansion unit 432 may calculate the conditional probability P(BBB와|여배우) as 1. More specifically, since every document in cluster A contains the word "BBB와" (with BBB) immediately after "여배우", the reference word expansion unit 432 may calculate P(BBB와|여배우) as 3/3 = 1. Because this conditional probability of 1 is greater than or equal to the preset reference probability 1/2, the reference word expansion unit 432 may extend the representative reference word from "홍콩 여배우" to "홍콩 여배우 BBB와".

If the conditional probability of the word to be added is less than the reference probability, the reference word expansion unit 432 may stop extending the representative reference word. For example, when bigram conditional probabilities are used, the conditional probabilities of the words that follow "BBB와" in the documents of cluster A are 1/3, 1/3, and 1/3, all below the reference probability 1/2, so the reference word expansion unit 432 may stop extending the representative reference word.

More specifically, the word following "BBB와" is "비밀" (secret) in document MMM_news#1, "中서" (in China) in NNN_news#0, and "몰래" (secretly) in NNN_news#1. Accordingly, the reference word expansion unit 432 may calculate the conditional probabilities of the words following "BBB와" in the documents of cluster A as 1/3, 1/3, and 1/3. Since the calculated conditional probabilities are all equal, the reference word expansion unit 432 may select any one of them, 1/3, and compare it with the reference probability 1/2. Because the selected conditional probability 1/3 is less than the reference probability 1/2, the reference word expansion unit 432 may stop extending the representative reference word. In this way, the reference word expansion unit 432 may use bigram conditional probabilities to extend the representative reference word "홍콩" up to "홍콩 여배우 BBB와" (with Hong Kong actress BBB).

Once the extension of the most frequent representative reference word "홍콩" is complete, the reference word expansion unit 432 may extend the second most frequent representative reference word, "데이트" (date). As with "홍콩", the reference word expansion unit 432 may extend "데이트" using bigram conditional probabilities.

For example, for the words that precede "데이트" in the documents of cluster A, the reference word expansion unit 432 may calculate the conditional probabilities of "비밀 데이트" (secret date) and "몰래 데이트" (secretly dating) as 2/3 and 1/3, respectively. The reference word expansion unit 432 may then select the larger of the calculated conditional probabilities, 2/3. Since the selected conditional probability 2/3 is greater than or equal to the reference probability 1/2, the reference word expansion unit 432 may extend the representative reference word to "비밀 데이트", the phrase corresponding to 2/3.

As with "홍콩", the reference word expansion unit 432 may continue to extend the representative reference word until the bigram conditional probability of the word to be added falls below the reference probability. As a result, the reference word expansion unit 432 may extend the representative reference word "데이트" to its final form, "비밀 데이트".

The reference word expansion unit 432 may then generate the first candidate representative phrase by combining the extended representative reference words. For example, the reference word expansion unit 432 may combine "홍콩 여배우 BBB와" and "비밀 데이트" to generate "홍콩 여배우 BBB와 비밀 데이트" (secret date with Hong Kong actress BBB) as the first candidate representative phrase. When the first candidate representative phrase is determined as the representative phrase, the representative phrase providing unit may combine the first candidate representative phrase "홍콩 여배우 BBB와 비밀 데이트" with the popular keyword "AAA" and expose the result on the web page.
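The expansion loop and the final combination described above can be sketched as follows. As before, the conditional probability is assumed to be the share of cluster documents in which the neighbouring word is directly adjacent, the token lists are a hypothetical tokenization of Table 3, and the names `expand` and `first_candidate` are illustrative. The guard against re-adding a word that is already in the phrase is an addition of this sketch to rule out endless loops; the description does not mention it.

```python
from collections import Counter

# Hypothetical token sequences for the documents of cluster A (Table 3).
CLUSTER_A = [
    ["AAA", "홍콩", "여배우", "BBB와", "비밀", "데이트"],                  # MMM_news#1
    ["AAA", "홍콩", "최고", "여배우", "BBB와", "中서", "비밀", "데이트"],  # NNN_news#0
    ["AAA", "홍콩", "여배우", "BBB와", "몰래", "데이트"],                  # NNN_news#1
]


def expand(docs, word, threshold=0.5, forward=True):
    """Grow `word` one neighbour at a time while the best neighbour's
    conditional probability stays at or above `threshold`."""
    step = 1 if forward else -1
    phrase, current = [word], word
    while True:
        counts = Counter()
        for tokens in docs:
            neighbours = {tokens[i + step] for i, tok in enumerate(tokens)
                          if tok == current and 0 <= i + step < len(tokens)}
            counts.update(neighbours)  # one vote per document
        if not counts:
            break  # no neighbouring word exists in any document
        best, n = counts.most_common(1)[0]
        if n / len(docs) < threshold or best in phrase:
            break  # below the reference probability (or would loop)
        phrase = phrase + [best] if forward else [best] + phrase
        current = best
    return " ".join(phrase)


head = expand(CLUSTER_A, "홍콩", forward=True)     # extend with following words
tail = expand(CLUSTER_A, "데이트", forward=False)  # extend with preceding words
first_candidate = f"{head} {tail}"
```

With the reference probability fixed at 1/2, the forward pass stops after "BBB와" and the backward pass stops after "비밀", so the combined string matches the first candidate representative phrase given in the text.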

In the same way, the first candidate representative phrase generation unit 430 may determine one or more reference words by analyzing the morphemes of all documents included in cluster B. For example, the reference word determination unit 431 may analyze the morphemes of documents LLL#0 and MMM_news#3 and determine "BBB", "콘서트" (concert), "영상" (video), "인기" (popular), "폭발" (explosion), and "연인" (lovers) as reference words. The reference word determination unit 431 may count how often each determined reference word appears in documents LLL#0 and MMM_news#3, and determine two or more reference words as representative reference words in descending order of frequency. The reference word expansion unit 432 may then calculate the conditional probability that a word adjacent to a determined representative reference word appears in the documents belonging to cluster B.

If the largest of the calculated conditional probabilities is greater than or equal to the preset reference probability, the reference word expansion unit 432 may extend the representative reference word with the word corresponding to that largest probability. The reference word expansion unit 432 may continue to extend the representative reference word until the conditional probability of the word to be added falls below the reference probability. Finally, the reference word expansion unit 432 may combine the extended representative reference words to generate the first candidate representative phrase for the documents included in cluster B.

When the first candidate representative phrase is determined as the representative phrase, the representative phrase providing unit may combine each of the first candidate representative phrases generated for cluster A and cluster B with the popular keyword "AAA" and expose them on the web page. For example, the representative phrase providing unit may assign exposure priorities to cluster A and cluster B, and expose on the web page the popular keyword combined with the first candidate representative phrase of the cluster with the higher exposure priority.

More specifically, when the exposure priority of cluster A is 80 and that of cluster B is 60, and a search for the popular keyword "AAA" is requested, the representative phrase providing unit may combine the representative phrase of cluster A with the popular keyword "AAA" and expose the result on the web page. The representative phrase providing unit may determine the exposure priorities using the weights of the documents included in the clusters. For example, the representative phrase providing unit may assign higher exposure priorities to clusters whose documents have a higher total weight. The representative phrase providing unit may then select the cluster with the highest exposure priority among the clusters, combine the first candidate representative phrase of the selected cluster with the popular keyword, and expose the result on the web page.
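The priority rule described above (sum the document weights per cluster, then pick the largest) can be illustrated briefly. The individual document weights below are invented so that the totals match the priorities 80 and 60 used in the example; the description does not state how document weights are obtained at this point.

```python
def exposure_priorities(cluster_weights):
    """Exposure priority of each cluster = sum of its document weights."""
    return {name: sum(weights) for name, weights in cluster_weights.items()}


# Hypothetical per-document weights, chosen only to reproduce 80 and 60.
cluster_weights = {"A": [30, 30, 20], "B": [40, 20]}
priorities = exposure_priorities(cluster_weights)
selected = max(priorities, key=priorities.get)  # cluster whose phrase is exposed
```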

As another example, the representative phrase providing unit may use cluster size to choose which cluster's first candidate representative phrase is combined with the popular keyword and exposed on the web page. Here, the size of a cluster is the number of documents belonging to it. That is, of cluster A and cluster B, the representative phrase providing unit may combine the first candidate representative phrase of the cluster containing more documents with the popular keyword and expose the result on the web page.

The representative phrase providing unit may also calculate the size of a cluster by weighting each document according to its creation time, giving higher weights to more recently created documents. In that case, even if cluster A contains fewer documents than cluster B, the size of cluster A may be larger than the size of cluster B when cluster A contains more recently created documents. The representative phrase providing unit may then combine the first candidate representative phrase of cluster A with the popular keyword and expose the result.

Likewise, even if cluster B contains fewer documents than cluster A, the size of cluster B may be larger than the size of cluster A when cluster B contains more recently created documents than cluster A. In that case, the representative phrase providing unit may combine the first candidate representative phrase of cluster B with the popular keyword and expose the result.
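A time-weighted cluster size of this kind can be sketched as below. The description only says that more recent documents receive higher weights; the exponential decay, the six-hour half-life, and the document ages are assumptions made purely for illustration.

```python
def weighted_cluster_size(ages_in_hours, half_life_hours=6.0):
    """Sum of recency weights; a document's weight halves every `half_life_hours`.

    The exponential form is an assumption; the description does not fix the
    weighting function.
    """
    return sum(0.5 ** (age / half_life_hours) for age in ages_in_hours)


# Hypothetical document ages: cluster A has fewer but much newer documents.
cluster_a_ages = [1, 2]
cluster_b_ages = [24, 30, 48]

size_a = weighted_cluster_size(cluster_a_ages)
size_b = weighted_cluster_size(cluster_b_ages)
selected = "A" if size_a >= size_b else "B"
```

Here cluster A is selected even though it holds fewer documents, which is the situation the two preceding paragraphs describe.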

The representative phrase providing unit may expose the first candidate representative phrases of all popular keywords on the web page, or it may expose only the first candidate representative phrase of the popular keyword that has been activated by a mouse, touch, pointer, or the like among the popular keywords exposed on the web page.

For example, as shown in FIG. 5, when the mouse points to the AAA keyword 510 in a list 500 of real-time popular keywords, the representative phrase providing unit may expose the popular keyword AAA 510 on the web page together with the first candidate representative phrase 521 for AAA.

The representative phrase providing unit may also expose a photo 522, such as a still image related to the popular keyword, on the web page 520 together with the representative phrase 521 for the popular keyword. To this end, the representative phrase providing unit may weight the collected documents according to their creation time. For example, the representative phrase providing unit may assign a higher weight the more recent the creation time. The representative phrase providing unit may then extract, from the documents included in the cluster, those documents that contain a photo related to the popular keyword. Among the extracted documents, the photo included in the document with the highest weight may be provided in combination with the popular keyword and its first candidate representative phrase.

The process of generating the first candidate representative phrase by extending the representative reference words using a preset reference probability has been described above. Alternatively, the reference word expansion unit 432 may calculate the reference probability using Equation 2, and may then extend the representative reference words by comparing the calculated reference probability with the conditional probabilities.

In Equation 2, N(K_Cluster) is the number of documents included in cluster K, and α is an arbitrary variable. For example, a real number such as 0.3, 0.4, 0.5, or 0.7 may be used for α.

According to Equation 2, the reference word expansion unit 432 may calculate the reference probability based on the number of documents belonging to the cluster. For example, when α = 0.5 is used, since cluster A contains three documents, the reference word expansion unit 432 may calculate the reference probability for cluster A by applying a document count of 3 and α = 0.5 to Equation 2.

FIG. 6 is a flowchart illustrating a method of generating a first candidate representative phrase according to an embodiment of the present invention.

First, in step 610, the document collection unit 410 may collect documents related to a real-time popular keyword to be exposed on a web page, through portals and the like.

Next, in step 620, the clustering unit 420 may analyze the morphemes of the collected documents and cluster them.

Then, in step 630, the reference word determination unit 431 may analyze the morphemes of the documents included in a cluster and determine reference words for each document. Here, the reference word determination unit 431 may count how often each reference word determined through the morpheme analysis appears in the documents. For example, the reference words determined for the documents in cluster A, and their frequencies, may be as shown in Table 5 above.

Next, in step 640, the reference word determination unit 431 may determine representative reference words based on the frequencies of the reference words. For example, among the reference words determined according to Table 5 above, the reference word determination unit 431 may select the two most frequent ones, "Hong Kong" and "date", as the representative reference words.

Then, in step 650, the reference word expansion unit 432 may calculate the conditional probability of each determined representative reference word using a bi-gram conditional probability. Here, the conditional probability is the probability that the documents in the cluster contain the representative reference word together with a word that immediately precedes or follows it.

For example, the reference word expansion unit 432 may calculate the conditional probabilities of the words that immediately follow the representative reference word "Hong Kong" in the documents of cluster A as 2/3 (for the word "actress") and 1/3. The reference word expansion unit 432 may then select the larger of the calculated conditional probabilities, 2/3.

Next, in step 660, the reference word expansion unit 432 may determine whether the selected conditional probability is greater than or equal to the preset reference probability. For example, when the reference probability is preset to 1/2, the reference word expansion unit 432 may determine whether the conditional probability 2/3 calculated for "Hong Kong actress" is greater than or equal to the reference probability 1/2.

When the calculated conditional probability is determined to be greater than or equal to the reference probability, the reference word expansion unit 432 may expand the representative reference word with the word corresponding to the selected conditional probability. Then, in step 680, the reference word expansion unit 432 may continue to expand the expanded representative reference word in the same manner as in steps 650 to 660. For example, the reference word expansion unit 432 may calculate the conditional probability that "BBB와" ("with BBB") follows "actress", compare it with the reference probability, and thereby expand the representative reference word up to "BBB와".

When the calculated conditional probability is determined to be less than the reference probability, the reference word expansion unit 432 may terminate the expansion of the representative reference word in step 670.

For example, the reference word expansion unit 432 may calculate the conditional probabilities of the words that follow "BBB와" in the documents of cluster A. Because every calculated conditional probability is 1/3, which is less than the reference probability 1/2, the reference word expansion unit 432 may terminate the expansion. The expanded representative reference word is then "Hong Kong actress BBB와" (Hong Kong actress, with BBB).

Likewise, the reference word expansion unit 432 may expand the representative reference word "date" determined in step 640 in the same manner as "Hong Kong". The reference word expansion unit 432 may thus expand this representative reference word to "secret date".

Next, in step 690, the reference word expansion unit 432 may generate the first candidate representative phrase by combining the expanded representative reference words. For example, the reference word expansion unit 432 may combine "Hong Kong actress BBB와", expanded from the representative reference word "Hong Kong", with "secret date", expanded from the representative reference word "date", to generate "홍콩 여배우 BBB와 비밀 데이트" ("secret date with Hong Kong actress BBB") as the first candidate representative phrase.
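
Steps 650 to 690 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the three tokenized documents are hypothetical stand-ins for cluster A (Table 5 itself is not reproduced), the conditional probability is taken to be the fraction of cluster documents in which a given word immediately follows (or precedes) the current word, and the names `bigram_doc_fraction` and `expand` are invented for this example.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical tokenized documents standing in for cluster A.
DOCS = [
    ["AAA", "Hong Kong", "actress", "BBB", "secret", "date"],
    ["Hong Kong", "actress", "BBB", "dinner", "secret", "date"],
    ["Hong Kong", "trip", "AAA", "date", "rumor"],
]

def bigram_doc_fraction(docs, anchor, forward=True):
    """Map each neighbour of `anchor` to the fraction of documents
    in which that neighbour directly follows (or precedes) `anchor`."""
    counts = Counter()
    for tokens in docs:
        neighbours = set()
        for i, tok in enumerate(tokens):
            if tok != anchor:
                continue
            j = i + 1 if forward else i - 1
            if 0 <= j < len(tokens):
                neighbours.add(tokens[j])
        counts.update(neighbours)  # count each document at most once
    return {w: Fraction(c, len(docs)) for w, c in counts.items()}

def expand(docs, seed, reference_prob, forward=True):
    """Grow a phrase from `seed` while the best neighbour's probability
    is at least `reference_prob` (steps 650 to 680)."""
    phrase = [seed]
    while True:
        anchor = phrase[-1] if forward else phrase[0]
        probs = bigram_doc_fraction(docs, anchor, forward)
        if not probs:
            break
        best, prob = max(probs.items(), key=lambda kv: kv[1])
        if prob < reference_prob or best in phrase:
            break  # step 670: stop expanding
        if forward:
            phrase.append(best)
        else:
            phrase.insert(0, best)
    return phrase

head = expand(DOCS, "Hong Kong", Fraction(1, 2), forward=True)
tail = expand(DOCS, "date", Fraction(1, 2), forward=False)
candidate_phrase = " ".join(head + tail)  # step 690: combine
print(candidate_phrase)
```

With these hypothetical documents, "Hong Kong" grows forward to "Hong Kong actress BBB" and "date" grows backward to "secret date", mirroring the example in the description.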

When the first candidate representative phrase is determined as the representative phrase, the representative phrase providing unit may provide the generated first candidate representative phrase in combination with the popular keyword.

For example, as shown in FIG. 5, the representative phrase providing unit may expose on the web page the representative phrases for all popular keywords included in the popular keyword list 500. Alternatively, the representative phrase providing unit may expose only the representative phrase for the popular keyword that has been activated, for example by a mouse, among the popular keywords included in the popular keyword list 500.

As another example, the representative phrase providing unit may expose a picture 522, such as a still image related to the popular keyword, together with the first candidate representative phrase 521. Here, the representative phrase providing unit may assign weights to the collected documents according to their creation times, and expose on the web page the picture included in the document with the highest weight together with the first candidate representative phrase for the popular keyword. When the document with the highest weight contains no picture, the representative phrase providing unit may use a picture included in the document with the next highest weight. The representative phrase providing unit may assign a higher weight the more recently a document was created.

FIG. 6 above has been described in terms of generating a first candidate representative phrase for a single cluster A and providing it in combination with the popular keyword. However, the representative phrase providing method may generate first candidate representative phrases for a plurality of clusters. When a first candidate representative phrase is determined as the representative phrase, the representative phrase providing unit may combine at least one of the generated first candidate representative phrases with the popular keyword and expose it on the web page.

For example, the representative phrase providing unit may combine each of the first candidate representative phrases generated for cluster A and cluster B with the popular keyword "AAA" and expose them on the web page. Here, the representative phrase providing unit may assign exposure priorities to cluster A and cluster B, and expose on the web page the first candidate representative phrase of the cluster with the higher exposure priority in combination with the popular keyword.

More specifically, when the exposure priority of cluster A is 80 and that of cluster B is 60, and a search for the popular keyword "AAA" is requested, the representative phrase providing unit may combine the first candidate representative phrase of cluster A with the popular keyword "AAA" and expose it on the web page. Here, the representative phrase providing unit may determine the exposure priorities using the weights of the documents included in the clusters. For example, the representative phrase providing unit may assign higher exposure priorities in descending order of the sum of the weights of the documents included in each cluster. The representative phrase providing unit may then select the cluster with the highest exposure priority and expose the first candidate representative phrase of the selected cluster in combination with the popular keyword.

Likewise, even if cluster B contains fewer documents than cluster A, when cluster B contains more recently created documents than cluster A, the size of cluster B, measured by the sum of its document weights, may be larger than that of cluster A. In that case, the representative phrase providing unit may expose the first candidate representative phrase of cluster B in combination with the popular keyword.
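
The weight-based exposure priority can be illustrated with a short sketch. The description only states that more recent documents receive higher weights and that the priority follows the sum of those weights; the specific weight function `1 / (1 + age in hours)`, the timestamps, and the function names below are assumptions made for illustration.

```python
from datetime import datetime

def recency_weight(created, now):
    """Hypothetical weight: newer documents score higher."""
    age_hours = max((now - created).total_seconds() / 3600, 0.0)
    return 1.0 / (1.0 + age_hours)

def exposure_priority(creation_times, now):
    """Exposure priority of a cluster as the sum of its document weights."""
    return sum(recency_weight(t, now) for t in creation_times)

now = datetime(2010, 3, 12, 12, 0)
# Hypothetical clusters: A has more documents, B has more recent ones.
clusters = {
    "A": [datetime(2010, 3, 12, 2, 0),
          datetime(2010, 3, 12, 3, 0),
          datetime(2010, 3, 12, 4, 0)],
    "B": [datetime(2010, 3, 12, 11, 0),
          datetime(2010, 3, 12, 11, 30)],
}

priorities = {name: exposure_priority(times, now)
              for name, times in clusters.items()}
selected = max(priorities, key=priorities.get)
print(selected, priorities)
```

Under this assumed weighting, cluster B outranks cluster A despite holding fewer documents, which is the situation the preceding paragraph describes.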

So far, the configuration for generating the first candidate representative phrase and, when it is determined as the representative phrase, providing it in combination with the popular keyword has been described with reference to FIGS. 4 to 6. In FIGS. 4 to 6, the two most frequent reference words are determined as representative reference words, which are then expanded based on conditional probabilities to generate the representative phrase. This is only one embodiment: the reference word determination unit 431 may instead determine the single most frequent reference word as the representative reference word, and the reference word expansion unit 432 may expand that word to generate the representative phrase.

Hereinafter, with reference to FIGS. 7 to 11, the configuration for generating the second candidate representative phrase and, when the second candidate representative phrase is determined as the representative phrase, providing it in combination with the popular keyword will be described.

FIG. 7 is a block diagram illustrating the detailed configuration of the second candidate representative phrase generation unit according to an embodiment of the present invention.

According to FIG. 7, the representative phrase providing system of FIG. 2 may further include a broadcast data collection unit and a database.

The broadcast data collection unit 710 may collect broadcast data from broadcasting stations. Here, the broadcasting stations may include terrestrial TV, cable TV, radio, Internet, and satellite TV broadcasters. The broadcast data collection unit 710 may collect the broadcast data periodically at preset times, such as every morning, every noon, every afternoon, or every Thursday.

The broadcast data collection unit 710 may store the collected broadcast data in the database 720 in the format of Table 6 below.

Broadcasting station | Program name | Start time | End time | Performer name | Character name | Broadcast content
AAA | AAA News | 20100312 20:00 | 20100312 21:00 | KKK | - | election, voting
BBB2 | XXX | 20100312 22:00 | 20100312 23:20 | OOO, PPP | QQQ, RRR | slave, escape, general
BBB1 | YYY | 20100312 22:00 | 20100312 23:00 | JJJ | - | glasses, secret
BBB2 | ZZZ | 20100312 21:00 | 20100312 22:00 | VVV, PPP | - | rice planting, return to farming

The second candidate representative phrase generation unit 730 may determine, from the broadcast data, a broadcast program that includes the popular keyword. The second candidate representative phrase generation unit 730 may then calculate a difference value as the absolute value of the difference between the exposure time of the popular keyword and the broadcast time of the determined broadcast program. Here, the broadcast time may include a broadcast start time and a broadcast end time.

For example, when the exposure time of the popular keyword is before the broadcast start time, the second candidate representative phrase generation unit 730 may calculate the difference value as the absolute value of the difference between the exposure time and the broadcast start time.

As another example, when the exposure time of the popular keyword is after the broadcast end time, the second candidate representative phrase generation unit 730 may calculate the difference value as the absolute value of the difference between the exposure time and the broadcast end time.

As yet another example, when the exposure time of the popular keyword falls between the broadcast start time and the broadcast end time, the second candidate representative phrase generation unit 730 may calculate the difference value as the absolute value of the difference between the exposure time and either the broadcast start time or the broadcast end time. The difference value is compared with a preset reference value. Because many searches for a broadcast program are performed from two hours before its start time until two hours after its end time, the reference value may be preset to two hours.

The second candidate representative phrase generation unit 730 may then generate the representative phrase for the popular keyword by comparing the calculated difference value with the preset reference value. For example, when the calculated difference value is less than or equal to the preset reference value, the second candidate representative phrase generation unit 730 may generate the second candidate representative phrase based on the broadcast time of the determined broadcast program.

More specifically, the second candidate representative phrase generation unit 730 may include a broadcast program determination unit 731, a matching score calculation unit 732, and a generation unit 733.

The broadcast program determination unit 731 may match the popular keyword against the broadcast data to determine a broadcast program that includes the popular keyword.

For example, when "XXX" is exposed on the web page as a popular keyword, the broadcast program determination unit 731 may determine, from the broadcast data of Table 6, the broadcast program that includes the popular keyword "XXX".

When the exposure time is before the broadcast start time, the matching score calculation unit 732 may calculate the difference value as the absolute value of the difference between the broadcast start time of the determined broadcast program and the exposure time. If the calculated difference value is less than or equal to the preset reference value, the generation unit 733 may determine that the broadcast program "XXX" is the reason the keyword "XXX" became a real-time popular keyword.

Accordingly, the generation unit 733 may generate the second candidate representative phrase for the popular keyword based on the broadcast program determined as the cause of the popular keyword.

For example, based on the broadcast start time of the broadcast program "XXX", the generation unit 733 may generate "Broadcast from 22:00" as the second candidate representative phrase.

As another example, when "QQQ" is exposed on the web page as a popular keyword, the broadcast program determination unit 731 may determine, from the broadcast data of Table 6, the broadcast program "XXX" that includes the popular keyword "QQQ". When the exposure time is after the broadcast end time, the matching score calculation unit 732 may calculate the difference value as the absolute value of the difference between the broadcast end time of the determined broadcast program and the exposure time. If the calculated difference value is less than or equal to the preset reference value, the generation unit 733 may determine that the broadcast program "XXX" is the reason the keyword "QQQ" became a real-time popular keyword.

Accordingly, the generation unit 733 may generate the second candidate representative phrase based on the broadcast program determined as the cause of the popular keyword. For example, based on the broadcast time of the broadcast program "XXX", the generation unit 733 may generate "Broadcast from 22:00 to 23:20" as the second candidate representative phrase.

Meanwhile, when the exposure time falls between the broadcast start time and the broadcast end time, that is, after the broadcast start time and before the broadcast end time, the matching score calculation unit 732 may calculate the difference value as the absolute value of the difference between the exposure time and either the broadcast start time or the broadcast end time of the determined broadcast program. The generation unit 733 may then generate the second candidate representative phrase using the broadcast time of the broadcast program determined according to the difference value.
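
The three cases for the difference value and the comparison with the two-hour reference value can be sketched as follows. This is a minimal sketch, not the claimed implementation: the function names are invented, the choice of the nearer of the start and end times for an exposure during the broadcast is an assumption (the description only says "either"), and the phrase used for an exposure during the broadcast is likewise assumed.

```python
from datetime import datetime, timedelta

REFERENCE_VALUE = timedelta(hours=2)  # preset reference value from the description

def difference_value(exposure, start, end):
    """Absolute time gap between keyword exposure and the broadcast."""
    if exposure <= start:          # exposed before the broadcast starts
        return start - exposure
    if exposure >= end:            # exposed after the broadcast ends
        return exposure - end
    # Exposed during the broadcast: assume the nearer boundary is used.
    return min(exposure - start, end - exposure)

def second_candidate_phrase(exposure, start, end, reference=REFERENCE_VALUE):
    """Return a phrase built from the broadcast time, or None if the
    program is too far from the exposure time to explain the keyword."""
    if difference_value(exposure, start, end) > reference:
        return None
    if exposure <= start:
        return f"Broadcast from {start:%H:%M}"
    return f"Broadcast from {start:%H:%M} to {end:%H:%M}"

# Program "XXX" from Table 6.
start = datetime(2010, 3, 12, 22, 0)
end = datetime(2010, 3, 12, 23, 20)

print(second_candidate_phrase(datetime(2010, 3, 12, 21, 0), start, end))
print(second_candidate_phrase(datetime(2010, 3, 13, 0, 30), start, end))
print(second_candidate_phrase(datetime(2010, 3, 12, 18, 0), start, end))
```

The first call corresponds to the keyword "XXX" example (exposure before the start), the second to the "QQQ" example (exposure after the end), and the third shows a program rejected because the gap exceeds the reference value.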

Hereinafter, still referring to FIG. 7, the process of generating the second candidate representative phrase when the popular keyword is a combination of several words will be described.

First, when a popular keyword composed of several words is exposed, the broadcast program determination unit 731 may split the popular keyword into a plurality of words through morpheme analysis. The broadcast program determination unit 731 may then match the split words against the broadcast data to determine a broadcast program that includes a split word. Here, the broadcast program determination unit 731 may keep splitting the words until a broadcast program that includes a split word is found in the broadcast data.

For example, when the popular keyword has the multi-word structure "XXX PPP QQQ", the broadcast program determination unit 731 may split it into "XXX", "PPP", and "QQQ" through morpheme analysis. The broadcast program determination unit 731 may then determine the broadcast programs that include the split words by matching them against the broadcast data.

Taking "PPP" as an example, the broadcast program determination unit 731 may determine "XXX" and "ZZZ" as the broadcast programs in the broadcast data that include the split word "PPP". Because broadcast programs that include the split word exist in the broadcast data, the broadcast program determination unit 731 may stop splitting the word "PPP" further. If no broadcast program in the broadcast data included the split word, the broadcast program determination unit 731 would continue to split the word "PPP".

When a plurality of broadcast programs include a split word, the broadcast program determination unit 731 may select, among the determined broadcast programs, the one that includes the largest number of split words.

For example, when "XXX" and "ZZZ" are determined as the broadcast programs that include the split word "PPP", the broadcast program determination unit 731 may count the number of split words included in each of the broadcast programs "XXX" and "ZZZ". The broadcast program determination unit 731 may count three split words, "XXX", "PPP", and "QQQ", for the broadcast program "XXX", and one split word, "PPP", for "ZZZ". Accordingly, the broadcast program determination unit 731 may select the broadcast program "XXX", which includes the most split words.

As another example, when a plurality of broadcast programs include split words and those programs include the same number of split words, the broadcast program determination unit 731 may select one of them based on their broadcast times. Here, the broadcast program determination unit 731 may compare the broadcast start time and the broadcast end time of each program with the exposure time, and select the broadcast program whose broadcast time is closest to the exposure time.

Likewise, when the popular keyword consists of a single word and a plurality of broadcast programs include it, the broadcast program determination unit 731 may select the broadcast program whose broadcast time is closest to the exposure time.
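
The selection rule for multi-word keywords (most matched words first, then the broadcast time closest to the exposure time) can be sketched as below. The program records are a simplified rendering of two rows of Table 6; the field name `terms`, the helper names, and the use of the nearer of the start and end times as the "closest broadcast time" are assumptions for illustration.

```python
from datetime import datetime

# Simplified records for programs "XXX" and "ZZZ" from Table 6.
PROGRAMS = [
    {"name": "XXX",
     "start": datetime(2010, 3, 12, 22, 0), "end": datetime(2010, 3, 12, 23, 20),
     "terms": {"XXX", "OOO", "PPP", "QQQ", "RRR", "slave", "escape", "general"}},
    {"name": "ZZZ",
     "start": datetime(2010, 3, 12, 21, 0), "end": datetime(2010, 3, 12, 22, 0),
     "terms": {"ZZZ", "VVV", "PPP", "rice planting", "return to farming"}},
]

def matched_words(program, words):
    """Split words of the keyword that appear in the program's data."""
    return [w for w in words if w in program["terms"]]

def time_gap_seconds(program, exposure):
    """Distance from the exposure time to the nearer of start and end."""
    return min(abs((exposure - program["start"]).total_seconds()),
               abs((exposure - program["end"]).total_seconds()))

def select_program(programs, words, exposure):
    """Prefer more matched words; break ties by the smaller time gap."""
    candidates = [p for p in programs if matched_words(p, words)]
    if not candidates:
        return None
    return max(candidates,
               key=lambda p: (len(matched_words(p, words)),
                              -time_gap_seconds(p, exposure)))

exposure = datetime(2010, 3, 12, 21, 10)
multi = select_program(PROGRAMS, ["XXX", "PPP", "QQQ"], exposure)
single = select_program(PROGRAMS, ["PPP"], exposure)
print(multi["name"], single["name"])
```

For the keyword "XXX PPP QQQ", program "XXX" wins with three matched words. For the single word "PPP", both programs match once, so the assumed exposure time of 21:10 tips the choice to "ZZZ", whose start time is nearer.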

The matching score calculation unit 732 may then calculate the difference value as the absolute value of the difference between the broadcast time of the selected broadcast program and the exposure time. The generation unit 733 may generate the second candidate representative phrase using the broadcast time of the broadcast program determined based on the difference value. Here, the broadcast time may include a broadcast start time and a broadcast end time. The process of calculating the difference value is the same as that performed by the matching score calculation unit 732 when the popular keyword consists of a single word, as described above, so a redundant description is omitted.

Meanwhile, when a plurality of broadcast programs are determined to include the popular keyword, the broadcast program determination unit 731 may also select one of them using matching scores of the broadcast programs. Here, the popular keyword may consist of a single word or of a combination of several words.

Here, the matching score calculation unit 732 may assign a matching score according to which of the items constituting the broadcast data includes the popular keyword, and calculate a final matching score using the matching score and weights.

First, the matching score calculation unit 732 may assign a matching score to each broadcast program that includes the popular keyword according to whether the popular keyword appears in the program name, the cast, the character names, or the broadcast content. For example, the matching score calculation unit 732 may assign a matching score of 100 when the popular keyword is included in the program name, 80 when it is included in the cast, 80 when it is included in the character names, and 50 when it is included in the broadcast content.

The matching score calculation unit 732 may also assign a time score to each broadcast program using the time at which the popular keyword was exposed and the broadcast time of the broadcast program that includes the popular keyword. For example, when the exposure time coincides with the broadcast time within a preset error range, the matching score calculation unit 732 may assign a time score of 50. The matching score calculation unit 732 may then subtract 5 points from 50 for every 10 minutes by which the exposure time deviates from the broadcast time of the broadcast program.

In addition, the matching score calculation unit 732 may assign a broadcasting station weight to each broadcast program according to the type of broadcast to which the program belongs. Here, the matching score calculation unit 732 may assign weights that decrease in the order of terrestrial broadcasting, cable broadcasting, and radio broadcasting. For example, the matching score calculation unit 732 may assign a weight of 1 to terrestrial broadcasting, 0.1 to cable broadcasting, and 0.05 to radio broadcasting.

Finally, the matching score calculator 732 may calculate a final matching score for each broadcast program that includes the popular keyword by using the matching score, the time score, and the broadcasting station weight assigned to the program, together with the audience rating of the program. For example, the matching score calculator 732 may calculate the final matching score of each broadcast program by using Equation 3 below.

[Equation 3]

Final matching score = (P + T) × S × A

In Equation 3, P is the matching score assigned according to the program name, cast, character names, and broadcast content, T is the time score, S is the broadcasting station weight, and A is the audience rating.

According to Equation 3, the matching score calculator 732 may calculate the final matching score of each broadcast program by multiplying the sum of the matching score and the time score by the broadcasting station weight, and then multiplying the result by the audience rating. For example, when the audience ratings of the broadcast programs "XXX" and "ZZZ", both of which include the popular keyword "PPP", are 30% and 10%, the matching score calculator 732 may multiply (P+T)×S by 0.3 to obtain the final matching score of "XXX" and multiply (P+T)×S by 0.1 to obtain the final matching score of "ZZZ".
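As a worked illustration of the calculation described for Equation 3, the sketch below multiplies (P + T) by the station weight and the audience rating and then picks the highest-scoring program. The dictionary keys, the station type labels, and the function names are hypothetical; the weights and ratings are the example values from the description.

```python
# Example station weights from the description; the keys are hypothetical labels.
STATION_WEIGHTS = {"terrestrial": 1.0, "cable": 0.1, "radio": 0.05}


def final_matching_score(p: float, t: float, station: str, rating: float) -> float:
    """Final matching score = (P + T) * S * A."""
    return (p + t) * STATION_WEIGHTS[station] * rating


def select_best(programs: list) -> dict:
    """Return the program with the highest final matching score."""
    return max(
        programs,
        key=lambda prog: final_matching_score(
            prog["p"], prog["t"], prog["station"], prog["rating"]
        ),
    )


programs = [
    {"name": "XXX", "p": 100, "t": 50, "station": "terrestrial", "rating": 0.3},
    {"name": "ZZZ", "p": 100, "t": 50, "station": "terrestrial", "rating": 0.1},
]
print(select_best(programs)["name"])  # XXX, because only the rating differs
```

With identical P, T, and S, the program with the higher audience rating wins, which matches the "XXX" versus "ZZZ" example.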

The broadcast program determiner 731 may then select, from among the broadcast programs that include the popular keyword, the broadcast program with the highest final matching score.

Next, the generation unit 733 may generate a second candidate representative phrase for the popular keyword based on the broadcast time of the selected broadcast program. For example, as shown in FIG. 8, the generation unit 733 may generate "Broadcast from 22:00" as the second candidate representative phrase 821 based on the broadcast time of the selected broadcast program "XXX".

When the second candidate representative phrase is determined as the representative phrase, the representative phrase provider may provide the second candidate representative phrase 821 in combination with the popular keyword 810. For example, the representative phrase provider may expose a photo 822 of the popular keyword, such as a still image, on the web page together with the second candidate representative phrase 821. Here, the representative phrase provider may expose on the web page the representative phrase and photo of whichever popular keyword in the list 800 of real-time popular keywords is activated by a mouse, pointer, or the like.

FIG. 9 is a flowchart illustrating a process of providing a second candidate representative phrase by using a broadcast time according to an embodiment of the present invention.

Referring to FIG. 9, in operation 910, the broadcast data collector 720 may periodically collect broadcast data from broadcasting stations and store the broadcast data in the database 710. Here, the broadcasting stations may include terrestrial TV, cable TV, radio, Internet, and satellite TV broadcasters.

Next, in operation 920, the broadcast program determiner 731 may determine a broadcast program that includes the popular keyword in the broadcast data. For example, from broadcast data structured as shown in Table 6, the broadcast program determiner 731 may determine the broadcast program "XXX", which includes the popular keyword "QQQ" exposed on the web page.

In operation 930, it is checked whether the time at which the popular keyword was exposed falls between the broadcast start time and the broadcast end time of the determined broadcast program. When it does not, in operation 940, the matching score calculator 732 may calculate a difference value as the absolute value of the difference between the broadcast time of the determined broadcast program and the exposure time.

For example, when the exposure time is earlier than the broadcast start time or later than the broadcast end time, the matching score calculator 732 may calculate the difference between the broadcast start time and the exposure time, or the difference between the broadcast end time and the exposure time, respectively.

In operation 950, the matching score calculator 732 may check whether the calculated difference value is less than or equal to a preset reference value. For example, because keywords related to a broadcast program are likely to be searched as popular keywords during the two hours before the broadcast start time and the two hours after the broadcast end time, the reference value may be preset to two hours.

When the difference value is less than or equal to the reference value, in operation 960, the generation unit 733 may generate a second candidate representative phrase for the popular keyword based on the broadcast time of the determined broadcast program.

For example, when the exposure time is earlier than the broadcast start time of the determined broadcast program "XXX", the generation unit 733 may use the broadcast start time of "XXX" to generate "Broadcast starts at 22:00" as the second candidate representative phrase.

As another example, when the exposure time is later than the broadcast end time of the determined broadcast program "XXX", the generation unit 733 may use the broadcast start time and the broadcast end time of "XXX" to generate "Broadcast from 22:00 to 23:20" as the second candidate representative phrase.
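Operations 930 to 960 can be sketched as below. This is a minimal sketch under stated assumptions: the function name `second_candidate_phrase` and the English phrase templates are hypothetical, the two-hour reference value is the example given in the description, and the wording used when the exposure time falls inside the broadcast window is borrowed from the FIG. 8 example because the description does not quote a phrase for that case.

```python
from datetime import datetime, timedelta
from typing import Optional

REFERENCE_GAP = timedelta(hours=2)  # example reference value from the description


def second_candidate_phrase(
    exposure: datetime, start: datetime, end: datetime
) -> Optional[str]:
    """Build a phrase from the broadcast time, or return None if the gap is too large."""
    if start <= exposure <= end:
        # Assumption: reuse the FIG. 8 style phrase while the program is on air.
        return f"Broadcast from {start:%H:%M}"
    if exposure < start:
        gap = start - exposure  # keyword exposed before the program starts
        phrase = f"Broadcast starts at {start:%H:%M}"
    else:
        gap = exposure - end  # keyword exposed after the program ends
        phrase = f"Broadcast from {start:%H:%M} to {end:%H:%M}"
    return phrase if gap <= REFERENCE_GAP else None


start = datetime(2010, 12, 31, 22, 0)
end = datetime(2010, 12, 31, 23, 20)
print(second_candidate_phrase(datetime(2010, 12, 31, 21, 0), start, end))
print(second_candidate_phrase(datetime(2010, 12, 31, 23, 50), start, end))
```

Returning `None` when the gap exceeds the reference value mirrors the flowchart, in which no second candidate representative phrase is generated in that case.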

When the representative phrase determination unit determines the second candidate representative phrase as the representative phrase, the representative phrase provider may provide the second candidate representative phrase in combination with the popular keyword. For example, the representative phrase provider may combine the popular keyword "QQQ" with the second candidate representative phrase "Broadcast starts at 22:00" and expose them on the web page. The representative phrase provider may also combine a promotional photo of the broadcast program "XXX" associated with the popular keyword "QQQ" with the popular keyword and the second candidate representative phrase and expose them on the web page.

Meanwhile, in operation 960, when a plurality of broadcast programs have been determined, the broadcast program determiner 731 may select one of them based on their broadcast times. For example, the broadcast start times and the broadcast end times of the determined broadcast programs may each be compared with the exposure time, and the broadcast program whose broadcast time is closest to the exposure time may be selected. The generation unit 733 may then generate the second candidate representative phrase by using the broadcast time of the selected broadcast program.

FIG. 10 is a flowchart illustrating a process of generating a second candidate representative phrase for a popular keyword composed of several words.

Referring to FIG. 10, in operation 1010, the broadcast program determiner 731 may perform morphological analysis on a popular keyword composed of several words to separate it into its constituent words.

For example, when the popular keyword has the multi-word structure "XXX PPP QQQ", the broadcast program determiner 731 may perform morphological analysis on the popular keyword and separate it into "XXX", "PPP", and "QQQ".

Next, in operation 1020, the broadcast program determiner 731 may determine the broadcast programs in the broadcast data that include the separated words. For example, the broadcast program determiner 731 may determine "XXX" and "ZZZ" as broadcast programs that include the separated word "PPP". The broadcast program determiner 731 may then count the number of separated words included in each of the broadcast programs "XXX" and "ZZZ". The broadcast program determiner 731 may count that the broadcast program "XXX" includes three separated words, "XXX", "PPP", and "QQQ", and that "ZZZ" includes one separated word, "PPP". Accordingly, the broadcast program determiner 731 may select the broadcast program "XXX", which includes the most separated words.

In operation 1030, it is checked whether a plurality of broadcast programs have been determined, and in operation 1040, whether a plurality of broadcast programs include the maximum number of separated words. When both conditions hold, in operation 1050, the broadcast program determiner 731 may select, from among the determined broadcast programs, the one whose broadcast time is closest to the exposure time of the popular keyword. Here, the broadcast program determiner 731 may compare both the broadcast start time and the broadcast end time of each of the determined broadcast programs with the exposure time and select the one broadcast program closest to the exposure time.

When, in operation 1040, only one broadcast program includes the maximum number of separated words, in operation 1060, the broadcast program determiner 731 may select that broadcast program. For example, because the broadcast program "XXX" includes three separated words and the broadcast program "ZZZ" includes one separated word, the broadcast program determiner 731 may select "XXX".
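The selection logic of operations 1010 to 1060 can be sketched as follows. The sketch substitutes plain whitespace splitting for morphological analysis, which is a simplification and not the analyzer the description refers to. The program fields (`text`, `start_min`, `end_min`, expressed as minutes since midnight) and the function names are hypothetical.

```python
from typing import Optional


def split_keyword(keyword: str) -> list:
    # Stand-in for morphological analysis: split on whitespace only.
    return keyword.split()


def time_gap(program: dict, exposure_min: int) -> int:
    """Smaller of the gaps between the exposure time and the start or end time."""
    return min(
        abs(program["start_min"] - exposure_min),
        abs(program["end_min"] - exposure_min),
    )


def select_program(keyword: str, programs: list, exposure_min: int) -> Optional[dict]:
    words = split_keyword(keyword)
    # Count how many separated words each program's broadcast data contains.
    counted = [(sum(w in p["text"] for w in words), p) for p in programs]
    counted = [(c, p) for c, p in counted if c > 0]
    if not counted:
        return None
    best = max(c for c, _ in counted)
    tied = [p for c, p in counted if c == best]
    # A single best match is returned as is; ties go to the closest broadcast time.
    return min(tied, key=lambda p: time_gap(p, exposure_min))


programs = [
    {"name": "XXX", "text": "XXX PPP QQQ", "start_min": 1320, "end_min": 1400},
    {"name": "ZZZ", "text": "ZZZ PPP", "start_min": 1200, "end_min": 1260},
]
print(select_program("XXX PPP QQQ", programs, 1300)["name"])  # XXX (3 words vs 1)
```

For the single-word keyword "PPP", both programs contain one separated word, so the tie is broken by proximity to the exposure time.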

Next, in operation 1070, the matching score calculator 732 may calculate the difference value between the broadcast time of the selected broadcast program and the exposure time. When the difference value is less than or equal to the preset reference value, in operation 1090, the generation unit 733 may generate the second candidate representative phrase by using the broadcast time of the selected broadcast program. When the representative phrase determination unit determines the second candidate representative phrase as the representative phrase, the representative phrase provider may provide the second candidate representative phrase in combination with the popular keyword. Because operations 1070 to 1090 duplicate operations 940 to 960 described above, a detailed description is omitted.

FIG. 11 is a flowchart illustrating a process of generating a second candidate representative phrase by using a matching score according to an embodiment of the present invention.

Referring to FIG. 11, in operation 1110, the broadcast program determiner 731 may determine the broadcast programs that include the popular keyword in the collected broadcast data.

Next, in operation 1120, the matching score calculator 732 may assign a matching score to each of the determined broadcast programs. Here, the matching score calculator 732 may assign the matching score depending on which item of the broadcast data contains the popular keyword: the program name, the cast, the character names, or the broadcast content.

For example, the matching score calculator 732 may assign a matching score of 100 when the popular keyword is included in the program name, 80 when it is included in the cast, 80 when it is included in the character names, and 50 when it is included in the broadcast content.

In operation 1130, the matching score calculator 732 may assign a time score to each of the determined broadcast programs.

For example, when the time at which the popular keyword was exposed coincides with the broadcast time of a determined broadcast program, the matching score calculator 732 may assign a time score of 50 to that broadcast program. Even when the exposure time of the popular keyword does not coincide exactly with the broadcast time, the matching score calculator 732 may assign a time score of 50 if the two coincide within a preset error range.

When the broadcast time of the broadcast program and the exposure time of the popular keyword do not coincide, the matching score calculator 732 may subtract 5 points from 50 for every 10 minutes of difference to obtain the time score. For example, when the broadcast start time of the broadcast program and the exposure time of the popular keyword differ by 30 minutes, the matching score calculator 732 may assign a time score of 35.
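The time score rule above can be sketched as a small function. Times are given as minutes since midnight; the function name, the `tolerance_min` parameter standing in for the preset error range, and the floor at zero for very large gaps are assumptions not stated in the description.

```python
def time_score(exposure_min: int, broadcast_min: int, tolerance_min: int = 0) -> int:
    """Return 50 for a match, minus 5 points per full 10 minutes of difference."""
    gap = abs(exposure_min - broadcast_min)
    if gap <= tolerance_min:
        return 50  # coincides within the preset error range
    # Assumption: the score is not allowed to drop below zero.
    return max(0, 50 - 5 * (gap // 10))


# 22:30 exposure against a 22:00 broadcast start: 50 - 3 * 5
print(time_score(22 * 60 + 30, 22 * 60))  # 35
```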

Next, in operation 1140, the matching score calculator 732 may assign a broadcasting station weight to each broadcast program. For example, the matching score calculator 732 may assign a different weight depending on whether the determined broadcast program is a terrestrial broadcast, a cable broadcast, or a radio broadcast.

In operation 1150, the matching score calculator 732 may calculate a final matching score for each broadcast program. Here, as in Equation 3 above, the matching score calculator 732 may use the audience rating of each determined broadcast program to calculate its final matching score.

For example, the matching score calculator 732 may calculate the sum of the assigned matching score and time score and multiply the sum by the broadcasting station weight and the audience rating to obtain the final matching score of each determined broadcast program.

Then, in operation 1160, the broadcast program determiner 731 may select one of the determined broadcast programs based on the calculated final matching scores. For example, the broadcast program determiner 731 may select the broadcast program with the highest final matching score.

Next, in operation 1170, the matching score calculator 732 may calculate the difference value between the broadcast time of the selected broadcast program and the exposure time.

In operation 1180, when the difference value is less than or equal to the preset reference value, in operation 1190, the generation unit 733 may generate the second candidate representative phrase by using the broadcast time of the selected broadcast program.

When the second candidate representative phrase is determined as the representative phrase, the representative phrase provider may provide the generated second candidate representative phrase in combination with the popular keyword. Because operations 1170 to 1190 duplicate operations 940 to 960 described above, a detailed description is omitted.

As described above, the representative phrase providing system according to an embodiment of the present invention may generate a first candidate representative phrase based on documents and a second candidate representative phrase based on broadcast data. Based on the final popularity or on the time score of the second candidate representative phrase, the system may determine one of the first candidate representative phrase and the second candidate representative phrase as the final representative phrase for the popular keyword. The representative phrase providing system may then combine the determined representative phrase with the popular keyword and expose them on the web page.

When neither the first candidate representative phrase nor the second candidate representative phrase is determined as the representative phrase for the popular keyword, the representative phrase providing system may provide both candidate representative phrases in combination with the popular keyword. Alternatively, the representative phrase providing system may randomly select one of the first candidate representative phrase and the second candidate representative phrase as the final representative phrase for the popular keyword. The representative phrase providing system may also determine neither candidate as the final representative phrase and expose only the popular keyword on the web page. That is, in FIGS. 5 and 8, only the list of popular keywords may be exposed on the web page, without any representative phrase.

The methods according to embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention or may be known and available to those skilled in the art of computer software.

Although the present invention has been described with reference to limited embodiments and drawings, the present invention is not limited to those embodiments, and those of ordinary skill in the art to which the present invention pertains may make various modifications and variations based on this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims below and their equivalents.

200: representative phrase providing system
210: first candidate representative phrase generation unit
220: second candidate representative phrase generation unit
230: popularity calculator
240: representative phrase determination unit
250: time score calculator
260: representative phrase provider

Claims

A method of providing a representative phrase, the method comprising:
generating a first candidate representative phrase for a popular keyword based on a document;
generating a second candidate representative phrase for the popular keyword based on broadcast data; and
determining a final representative phrase for the popular keyword by using the first candidate representative phrase and the second candidate representative phrase.

The method of claim 1,
wherein the determining of the final representative phrase comprises
determining, as the final representative phrase, whichever of the first candidate representative phrase and the second candidate representative phrase has the higher suitability for the popular keyword.

The method of claim 3,
further comprising
calculating a popularity of the document used to generate the first candidate representative phrase.

The method of claim 3,
wherein the determining of the final representative phrase comprises
determining the first candidate representative phrase as the final representative phrase based on the popularity of the document and a predetermined reference popularity.

The method of claim 2,
further comprising
calculating a time score of the second candidate representative phrase by using the broadcast time of the broadcast data used to generate the second candidate representative phrase and the exposure time of the popular keyword.

The method of claim 5,
wherein the determining of the final representative phrase comprises
determining the second candidate representative phrase, as having the higher suitability for the popular keyword, as the final representative phrase by using the time score of the second candidate representative phrase and a predetermined reference time score.

The method of claim 1,
wherein the generating of the first candidate representative phrase comprises:
determining reference words of documents that include the popular keyword;
determining a representative reference word among the reference words;
expanding the representative reference word by combining it with a word that consecutively precedes or follows it; and
generating the first candidate representative phrase by using the expanded representative reference word.

The method of claim 7,
wherein the expanding of the representative reference word comprises:
calculating a conditional probability that a word consecutively preceding or following the representative reference word is included in the documents that include the popular keyword; and
expanding the representative reference word based on the conditional probability.

The method of claim 7,
wherein the determining of the representative reference word comprises:
counting the frequency of each reference word by performing morphological analysis on the documents that include the popular keyword; and
determining the representative reference word based on the counted frequency.

The method of claim 1,
further comprising:
collecting documents that include the popular keyword; and
performing clustering on the collected documents,
wherein the generating of the first candidate representative phrase comprises
generating the first candidate representative phrase by expanding a representative reference word based on the documents belonging to a cluster.

The method of claim 10,
wherein the generating of the first candidate representative phrase further comprises:
assigning weights to the collected documents; and
determining an exposure priority of a cluster by using the weights.

The method of claim 1,
wherein the generating of the second candidate representative phrase comprises:
determining a broadcast program that includes the popular keyword in the broadcast data; and
generating the second candidate representative phrase by using a broadcast time of the broadcast program.

The method of claim 12,
wherein the generating of the second candidate representative phrase further comprises,
when there are a plurality of broadcast programs, selecting, from among the plurality of broadcast programs, the broadcast program whose broadcast start time and broadcast end time are closest to the time at which the popular keyword was exposed,
and wherein the generating of the second candidate representative phrase by using the broadcast time comprises
generating the second candidate representative phrase by using the broadcast time of the selected broadcast program.

The method of claim 1,
wherein the generating of the second candidate representative phrase comprises:
when the popular keyword is a combination of several words, separating the popular keyword into a plurality of words by morphological analysis;
selecting one broadcast program, based on the separated words, from among the broadcast programs that include the popular keyword; and
generating the second candidate representative phrase by using a broadcast time of the selected broadcast program.

The method of claim 1,
wherein the generating of the second candidate representative phrase comprises:
assigning a matching score, a time score, and a broadcasting station weight to each broadcast program that includes the popular keyword;
calculating a final matching score for the broadcast program by using at least one of the matching score, the time score, the broadcasting station weight, and the audience rating of the broadcast program;
selecting one broadcast program, based on the final matching score, from among the broadcast programs that include the popular keyword; and
generating the second candidate representative phrase by using a broadcast time of the selected broadcast program.

The method of claim 1,
further comprising
providing the final representative phrase in combination with the popular keyword.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 16.

A system for providing a representative phrase, the system comprising:
a first candidate representative phrase generation unit that generates a first candidate representative phrase for a popular keyword based on a document;
a second candidate representative phrase generation unit that generates a second candidate representative phrase for the popular keyword based on broadcast data; and
a representative phrase determination unit that determines a final representative phrase for the popular keyword by using the first candidate representative phrase and the second candidate representative phrase.

The system of claim 18,
wherein the representative phrase determination unit
determines, as the final representative phrase, whichever of the first candidate representative phrase and the second candidate representative phrase has the higher suitability for the popular keyword.

The system of claim 19,
further comprising
a popularity calculator that calculates a popularity of the document used to generate the first candidate representative phrase.

The system of claim 20,
wherein the representative phrase determination unit uses the popularity of the document and a predetermined reference popularity to determine, as the final representative phrase, the first candidate representative phrase when it is judged to have the higher suitability for the popular keyword.

The system of claim 18, further comprising:
a time score calculation unit that calculates a time score of the second candidate representative phrase by using the exposure time of the popular keyword and the broadcast time of the broadcast data used to generate the second candidate representative phrase.

The system of claim 22,
wherein the representative phrase determination unit uses the time score of the second candidate representative phrase and a predetermined reference time score to determine, as the final representative phrase, the second candidate representative phrase when it is judged to have the higher suitability for the popular keyword.
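
The determination recited in the preceding dependent claims can be sketched as follows. This is an illustration under stated assumptions: a candidate is treated as eligible when its measure meets its reference value, and "suitability" is modeled as the ratio of the measure to its reference. Neither rule is stated in the claims, and the function name and arguments are hypothetical.

```python
from typing import Optional


def determine_final_phrase(
    first_candidate: Optional[str],
    popularity: float,
    reference_popularity: float,
    second_candidate: Optional[str],
    time_score: float,
    reference_time_score: float,
) -> Optional[str]:
    """Pick the candidate phrase judged more suitable for the popular keyword."""
    if reference_popularity <= 0 or reference_time_score <= 0:
        raise ValueError("reference values must be positive")

    scored = []
    # ASSUMPTION: the document-based candidate is eligible when the document
    # popularity reaches the reference popularity.
    if first_candidate is not None and popularity >= reference_popularity:
        scored.append((popularity / reference_popularity, first_candidate))
    # ASSUMPTION: the broadcast-based candidate is eligible when its time
    # score reaches the reference time score.
    if second_candidate is not None and time_score >= reference_time_score:
        scored.append((time_score / reference_time_score, second_candidate))

    if not scored:
        return None
    # ASSUMPTION: suitability is compared via the ratio to the reference.
    return max(scored, key=lambda item: item[0])[1]
```

With a popularity of 30 against a reference of 10 and a time score of 0.6 against a reference of 0.5, the sketch returns the first candidate; if the popularity falls below its reference, it falls back to the second candidate.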

The system of claim 18,
wherein the first candidate representative phrase generation unit comprises:
a reference word determination unit that determines reference words by analyzing the morphemes of documents including the popular keyword, and determines a representative reference word according to the frequency of the determined reference words; and
a reference word expansion unit that expands the representative reference word by combining it with a word consecutive before or after it, and generates the first candidate representative phrase using the expanded representative reference word.

The system of claim 24,
wherein the reference word expansion unit calculates a conditional probability that a word consecutive before or after the representative reference word appears with it in the documents including the popular keyword, and expands the representative reference word based on the conditional probability.

The system of claim 24,
wherein the reference word determination unit analyzes the morphemes of the documents including the popular keyword to count the frequency of each reference word, and determines the representative reference word based on the counted frequency.
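
The frequency count and the conditional-probability expansion recited above can be sketched in Python. Whitespace tokenization stands in for morphological analysis, and the threshold of 0.6, the maximum phrase length, and both function names are assumptions made for illustration only.

```python
from collections import Counter
from typing import List


def representative_reference_word(docs: List[str], keyword: str) -> str:
    """Most frequent token, other than the keyword, in documents with the keyword."""
    # ASSUMPTION: str.split() replaces a real morphological analyzer.
    counts = Counter(tok for doc in docs for tok in doc.split() if tok != keyword)
    return counts.most_common(1)[0][0]


def expand_reference_word(docs: List[str], word: str,
                          threshold: float = 0.6, max_len: int = 5) -> str:
    """Grow the phrase with the adjacent word whose conditional probability
    P(adjacent word | current phrase) is highest and at least `threshold`."""
    phrase = [word]
    token_lists = [doc.split() for doc in docs]
    while len(phrase) < max_len:
        n = len(phrase)
        total = 0
        before, after = Counter(), Counter()
        for toks in token_lists:
            for i in range(len(toks) - n + 1):
                if toks[i:i + n] == phrase:
                    total += 1
                    if i > 0:
                        before[toks[i - 1]] += 1
                    if i + n < len(toks):
                        after[toks[i + n]] += 1
        if total == 0:
            break
        best = None  # (probability, side, adjacent word)
        for side, counter in (("before", before), ("after", after)):
            if counter:
                adjacent, count = counter.most_common(1)[0]
                prob = count / total
                if prob >= threshold and (best is None or prob > best[0]):
                    best = (prob, side, adjacent)
        if best is None:
            break
        _, side, adjacent = best
        phrase = [adjacent] + phrase if side == "before" else phrase + [adjacent]
    return " ".join(phrase)
```

For the sample documents "ABC wins best actor award", "ABC best actor award speech", and "ABC award ceremony tonight" with the keyword "ABC", the sketch picks "award" as the representative reference word and expands it to "best actor award".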

The system of claim 18, further comprising:
a document collection unit that collects documents including the popular keyword; and
a clustering unit that clusters the collected documents,
wherein the first candidate representative phrase generation unit generates the first candidate representative phrase by expanding a representative reference word based on the documents belonging to a cluster.

The system of claim 18,
wherein the second candidate representative phrase generation unit comprises:
a broadcast program determination unit that determines, in the broadcast data, a broadcast program including the popular keyword; and
a generation unit that generates the second candidate representative phrase using the broadcast time of the determined broadcast program.

The system of claim 28,
wherein, when there are a plurality of broadcast programs, the broadcast program determination unit selects, from among the plurality of broadcast programs, the broadcast program whose broadcast start time and broadcast end time are closest to the time at which the popular keyword was exposed, and
the generation unit generates the second candidate representative phrase using the broadcast time of the selected broadcast program.
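
Selecting the broadcast program closest to the exposure time might look like the sketch below. The distance rule (zero when the keyword is exposed during the broadcast, otherwise the gap to the nearer of the start and end times) and the wording of the generated phrase are assumptions; `Broadcast`, `time_distance`, and `second_candidate_phrase` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Broadcast:
    """Hypothetical broadcast program that mentions the popular keyword."""
    title: str
    start: datetime
    end: datetime


def time_distance(b: Broadcast, exposed_at: datetime) -> float:
    """Seconds between the exposure time and the program's air time."""
    if b.start <= exposed_at <= b.end:
        return 0.0  # keyword was exposed while the program was on air
    return min(abs((exposed_at - b.start).total_seconds()),
               abs((exposed_at - b.end).total_seconds()))


def second_candidate_phrase(programs: List[Broadcast],
                            exposed_at: datetime) -> Optional[str]:
    """Build a phrase from the broadcast time of the closest program."""
    if not programs:
        return None
    chosen = min(programs, key=lambda b: time_distance(b, exposed_at))
    # ASSUMPTION: this phrase template is illustrative only.
    return f"{chosen.title} ({chosen.start:%H:%M}-{chosen.end:%H:%M} broadcast)"
```

For a keyword exposed at 22:10, a program aired 21:00-22:00 is chosen over one aired 18:00-19:00, since its end time is only ten minutes away.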

The system of claim 18,
wherein the second candidate representative phrase generation unit comprises:
a broadcast program determination unit that, when the popular keyword is a combination of words, morphologically analyzes the popular keyword to separate it into a plurality of words, and selects one broadcast program from among the broadcast programs including the popular keyword based on the separated words; and
a generation unit that generates the second candidate representative phrase using the broadcast time of the selected broadcast program.
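
Handling a compound keyword, as recited above, can be sketched as follows. Whitespace splitting stands in for morphological analysis, and ranking programs by the number of separated words they contain is an assumed selection rule; the function name and the sample data are hypothetical.

```python
from typing import Mapping, Optional


def select_program_by_words(keyword: str,
                            program_texts: Mapping[str, str]) -> Optional[str]:
    """Return the title of the program whose text contains the most keyword words.

    `program_texts` maps a program title to its descriptive text.
    """
    # ASSUMPTION: str.split() replaces a real morphological analyzer.
    words = keyword.split()

    def hits(text: str) -> int:
        tokens = set(text.split())
        return sum(1 for w in words if w in tokens)

    best_title, best_hits = None, 0
    for title, text in program_texts.items():
        h = hits(text)
        if h > best_hits:  # ties keep the earlier program
            best_title, best_hits = title, h
    return best_title


if __name__ == "__main__":
    programs = {
        "Morning Talk": "interview with Kim",
        "Sports Tonight": "Kim Yuna gala show highlights",
    }
    print(select_program_by_words("Kim Yuna gala", programs))
```

The broadcast time of the program returned here would then feed the generation unit in the same way as in the single-word case.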

The system of claim 18,
wherein the second candidate representative phrase generation unit comprises:
a matching score calculation unit that assigns a matching score, a time score, and a broadcast station weight to each broadcast program including the popular keyword, and calculates a final matching score for the broadcast program by using at least one of the matching score, the time score, the broadcast station weight, and the viewer rating of the broadcast program;
a broadcast program determination unit that selects one broadcast program from among the broadcast programs including the popular keyword based on the final matching score; and
a generation unit that generates the second candidate representative phrase using the broadcast time of the selected broadcast program.

The system of claim 18, further comprising:
a representative phrase providing unit that provides the final representative phrase in combination with the popular keyword.