KR101120040B1

KR101120040B1 - Apparatus for recommending related query and method thereof

Info

Publication number: KR101120040B1
Application number: KR1020080127490A
Authority: KR
Inventors: 허정; 황이규; 이충희; 오효정; 임수종; 김현기; 윤여찬; 최미란; 이창기; 장명길
Original assignee: 한국전자통신연구원
Priority date: 2008-12-15
Filing date: 2008-12-15
Publication date: 2012-03-23
Also published as: KR20100068964A

Abstract

The present invention relates to an apparatus and method for recommending related queries in a search engine. Using the click log of an existing search engine, the relationship between a query and the URLs selected for it is quantified. Using the time information recorded in the click log and classification information about the documents at the selected URLs, related queries associated with the initial query entered by a user are grouped by category. Presenting a variety of related queries that are highly related to the initial query in this way lets the user easily find the desired information, and taking into account the time at which each URL was selected allows recent information trends to be reflected.

Search, query, click log, grouping, relevance

Description

Related query recommendation apparatus and method {APPARATUS FOR RECOMMENDING RELATED QUERY AND METHOD THEREOF}

The present invention relates to a method for providing related queries in a search engine. More particularly, it relates to a related query recommendation apparatus and method that, when a user enters a query to find specific information in a search engine, quantifies the relationship between queries and selected URLs using the click log of an existing search engine, and groups (classification) by category the related queries associated with the user's initial query using the time information recorded in the click log and classification information about the selected URL documents, so that related queries highly related to the initial query can be presented.

The present invention is derived from research conducted as part of the IT New Growth Engine Core Technology Development Program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: 2008-S-020-01, Project title: Development of Web QA Technology].

Recently, with the development of web technology, the number of web documents available to users has grown exponentially, and it has become difficult for users to find the information they want. Various web search engines have been developed to help users find the web documents they want, but general users who lack knowledge of web document search have had great difficulty choosing the queries they must enter for a web search.

To address this difficulty in choosing queries, existing portal sites that provide search services offer a query autocomplete function. It presents various queries containing some of the letters or syllables the user has typed so that the user can select one. Because the suggested queries are chosen only by whether they contain the same pattern of letters or syllables, regardless of semantic relevance, they can help with choosing an initial query.

However, queries that are semantically related to the user's query but do not share its letter or syllable patterns cannot be recommended this way.

In addition, to recommend queries by semantic relevance, conventional approaches extract co-occurrence information between words from a large document collection and use it to provide related queries, or use various lexical concept structures (e.g., WordNet) to recommend words semantically related to the query.

However, related query recommendation based on such large-scale language resources leaves out the user intent and preference information observed in an actual search engine, and therefore could not provide other queries highly related to the user's query in a way that matches the user's intent.

Accordingly, the present invention aims to provide a related query recommendation apparatus and method that, when a user enters a query to find specific information in a search engine, quantifies the relationship between queries and selected URLs using the click log of an existing search engine, and groups by category the related queries associated with the user's initial query using the time information recorded in the click log and classification information about the selected URL documents, so that related queries highly related to the initial query can be presented.

The present invention is a related query recommendation apparatus comprising: an information extraction unit that refers to a click log and extracts a plurality of different queries, the URL information selected for each query, and the time at which each URL was selected; a frequency information calculation unit that calculates how frequently each URL was selected; an index unit that calculates the relationship between the URLs and the queries and generates click log index data; and a server control unit that, when a query is input, refers to the click log index data to search for related URLs and related queries, classifies the related queries by category, and recommends related queries that are relatively highly related to the input query.

The present invention is also a click log index registration method for related query recommendation, comprising: referring to a click log and extracting a plurality of different queries entered by users, the URL information selected after each query was entered, and the time at which each URL was selected; calculating how frequently each URL was selected; and calculating the relationship between the URLs and the queries and storing it as index data.

In an apparatus and method for recommending related queries in a search engine, the present invention quantifies the relationship between a query and the selected URLs using the click log of an existing search engine, and groups by category the related queries associated with the user's initial query using the time information recorded in the click log and classification information about the selected URL documents. By presenting a variety of related queries that are highly related to the initial query, it has the advantage that the user can easily find the desired information. It also has the advantage that recent information trends can be reflected by taking into account the time at which each URL was selected.

Hereinafter, the operating principle of the present invention is described in detail with reference to the accompanying drawings. In the following description, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intentions or customs of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

The technical gist of the present invention is as follows. The relationship between a query and the selected URLs is quantified using the click log, and the related queries associated with the user's initial query are grouped by category using the time information recorded in the click log and classification information about the selected URL documents, so that related queries highly related to the initial query can be presented. Through this technique the object of the present invention can be readily achieved.

FIG. 1 shows the configuration of a related query recommendation apparatus according to an embodiment of the present invention. The apparatus includes an information extraction unit 112, a frequency information calculation unit 114, an index unit 116, a click log index data DB (database) 120, a click log DB 118, and the like.

Hereinafter, the operation of each component of the related query recommendation apparatus is described in detail with reference to FIG. 1.

First, the click log DB 118 stores the various queries input from client terminals 100 connected to the search engine server 130 through a wired/wireless communication network including the Internet, together with the URL (Uniform Resource Locator) selected after each query was entered and the time at which that URL was selected. The time information stored in the click log DB 118 reflects the temporal trend of queries: by raising the weight of the queries and selected URLs that have been clicked most in the recent period, the temporal trend is quantified and can be reflected in related query extraction.

The information extraction unit 112 refers to the click log DB 118 and extracts the queries entered by users through the plurality of client terminals 100, the URL information selected for each query, and the time at which each URL was selected.

The frequency information calculation unit 114 calculates the frequencies and occurrence probabilities of the queries and selected URLs extracted by the information extraction unit 112. The index unit 116 derives the relationship structure between the queries and selected URLs from these frequencies and probabilities and stores it in the click log index data DB 120.
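
The frequency and probability calculation described above can be sketched as follows. This is a minimal illustration only: the record layout, the sample data, and the function name `build_index` are assumptions made for the example and are not taken from the specification.

```python
from collections import Counter, defaultdict

# Hypothetical click-log records: (query, selected URL, selection time in epoch seconds).
CLICK_LOG = [
    ("sonata", "URL1", 1_228_000_000),
    ("sonata", "URL2", 1_228_100_000),
    ("sonata", "URL2", 1_228_200_000),
    ("hyundai motor", "URL1", 1_228_300_000),
    ("hyundai motor", "URL2", 1_228_400_000),
    ("moonlight sonata", "URL4", 1_228_500_000),
]

def build_index(click_log):
    """Count query/URL selections and turn them into conditional probabilities."""
    pair_count = Counter()   # how often a URL was selected for a query
    query_count = Counter()  # how often a query led to any selection
    url_count = Counter()    # how often a URL was selected overall
    for query, url, _timestamp in click_log:
        pair_count[(query, url)] += 1
        query_count[query] += 1
        url_count[url] += 1

    p_url_given_q = defaultdict(dict)  # query -> {url: P(url | query)}
    p_q_given_url = defaultdict(dict)  # url -> {query: P(query | url)}
    for (query, url), n in pair_count.items():
        p_url_given_q[query][url] = n / query_count[query]
        p_q_given_url[url][query] = n / url_count[url]
    return p_url_given_q, p_q_given_url

p_url_given_q, p_q_given_url = build_index(CLICK_LOG)
```

Storing both directions (query to URL and URL to query) lets a later lookup go from an initial query to its URLs and from those URLs back to other queries without rescanning the log.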

The server control unit 110 controls the operation of the search engine server 130, and controls the information extraction unit 112, the frequency information calculation unit 114, and the index unit 116 to present related queries that are highly related to the input query.

That is, when an initial query is entered by the user of a client terminal 100 connected through the wired/wireless communication network, the server control unit 110 refers to the click log index data to search for related URLs and related queries, and then uses the related URLs to search for a plurality of related queries associated with the initial query. It then calculates the relevance between the initial query and each related query, classifies and groups the related queries by category, and displays the more relevant related queries first according to the calculated relevance, so that the user can find the desired information more easily.

FIG. 2 shows the operation control flow by which the search engine server 130 recommends related queries for an input query according to an embodiment of the present invention. Hereinafter, an embodiment of the present invention is described in detail with reference to FIGS. 1 and 2.

First, in step S200, the server control unit 110 receives an initial query from a client terminal connected through the wired/wireless communication network.

Next, in step S202, the server control unit 110 refers to the click log index data DB 120 and searches for URLs associated with the input initial query. In step S204, it uses the retrieved related URL information to search the click log index data DB 120 for the various queries associated with those URLs and extracts them as related queries for the initial query.

Then, in step S206, the server control unit 110 calculates the relationship between the initial query and the related URLs and the relationship between the related URLs and the related queries, and from these derives the relevance between the initial query and each related query.

Next, in step S208, the server control unit 110 collects the web pages of the related URLs found to be associated with the initial query. In step S210, it classifies the collected web pages by category based on a predefined type structure and groups the related queries associated with each URL by category.

Next, in step S212, the server control unit 110 displays the related queries grouped by category in descending order of the relevance to the initial query calculated above.
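
Steps S200 to S212 can be sketched end to end as below. The index contents, the per-URL category table (a stand-in for the page collection and classification of S208 to S210), the scoring by summed products, and the function name `recommend` are all assumptions made for this illustration, not details given in the specification.

```python
from collections import defaultdict

# Hypothetical click log index: query -> {url: P(url | query)}.
P_URL_GIVEN_Q = {"sonata": {"URL1": 0.25, "URL2": 0.5, "URL4": 0.25}}
# Hypothetical click log index: url -> {query: P(query | url)}.
P_Q_GIVEN_URL = {
    "URL1": {"sonata": 0.5, "hyundai motor": 0.5},
    "URL2": {"sonata": 0.25, "hyundai motor": 0.5, "genesis": 0.25},
    "URL4": {"sonata": 0.5, "moonlight sonata": 0.5},
}
# Stand-in for S208-S210: the category assigned to each URL's web page.
URL_CATEGORY = {"URL1": "car", "URL2": "car", "URL4": "music"}

def recommend(initial_query):
    """Return {category: [(related_query, relevance), ...]} sorted by relevance."""
    scores = defaultdict(float)
    category_of = {}
    # S202: look up the URLs associated with the initial query.
    for url, p_url in P_URL_GIVEN_Q.get(initial_query, {}).items():
        # S204: look up the queries associated with each of those URLs.
        for query, p_query in P_Q_GIVEN_URL.get(url, {}).items():
            if query == initial_query:
                continue
            # S206: accumulate relevance over every URL linking the two queries.
            scores[query] += p_url * p_query
            # S210: a query takes the category of the first URL it is reached through.
            category_of.setdefault(query, URL_CATEGORY[url])
    groups = defaultdict(list)
    for query, score in scores.items():
        groups[category_of[query]].append((query, score))
    # S212: within each category, show the most relevant queries first.
    return {cat: sorted(items, key=lambda item: -item[1]) for cat, items in groups.items()}

result = recommend("sonata")
```

With this toy data, 'hyundai motor' is reached through both URL1 and URL2, so its accumulated score exceeds that of 'genesis', which is reached through URL2 only.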

As a result, when entering an initial query the user receives more accurate recommendations of related queries that are highly related to it, and can use the recommended related queries to find the desired information more easily.

FIG. 3 is a graph representation of the associations between queries and selected URLs in the click log DB 118.

FIG. 3 shows the various queries linked to four URLs (URL1, URL2, URL3, URL4). For example, when a user enters the query 'Sonata', it can be seen that 'Sonata' is linked to 'URL1', 'URL2', and 'URL4'.

Here, the queries linked to 'URL1' are 'Hyundai Mobis', 'Kia Motors', 'Chung Mong-koo', and 'Hyundai Motor'; the queries linked to 'URL2' are 'Hyundai Motor', 'Terracan', 'Grandeur', 'Santa Fe', 'Genesis', 'Equus', and 'Avante'; and the queries linked to 'URL4' include 'Corelli' and 'Moonlight Sonata'.

FIG. 4 shows, as a hierarchical structure, the relationships between the URLs and queries associated with 'Sonata' in FIG. 3; examples of the queries associated with 'Sonata' appear at the leaf nodes.

Referring to FIG. 4, 'Hyundai Motor' in particular is linked to both 'URL1' and 'URL2', and will therefore have a probabilistically high relevance.

The relevance between the query entered by the user and the related URLs, and between the URLs and the related queries, is calculated using a Bayesian model as in [Equation 1] below.

In [Equation 1] above, P(URL|Q₁) is the probability that the URL appears given that Q₁ has appeared, and P(Q₂|URL) is the probability that Q₂ appears given that the URL has appeared. Applying the selection time of the URL stored in the click log DB 118 as a weight yields [Equation 2] below.
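
The equation itself is not reproduced in this text. Based only on the two conditional probabilities defined here, one plausible reading, stated as an assumption rather than as the formula of the specification, chains the two probabilities through every URL that links the initial query Q₁ to a candidate query Q₂:

```latex
% Assumed form: relevance of Q_2 to Q_1, summed over the URLs linking them.
P(Q_2 \mid Q_1) \approx \sum_{\mathrm{URL}} P(\mathrm{URL} \mid Q_1)\, P(Q_2 \mid \mathrm{URL})

% Worked instance for Q_1 = Sonata and Q_2 = Hyundai Motor,
% which FIG. 3 links through URL1 and URL2:
P(\text{Hyundai Motor} \mid \text{Sonata}) \approx
  P(\mathrm{URL1} \mid \text{Sonata})\, P(\text{Hyundai Motor} \mid \mathrm{URL1})
  + P(\mathrm{URL2} \mid \text{Sonata})\, P(\text{Hyundai Motor} \mid \mathrm{URL2})
```

Under this reading, a query reachable through two URLs accumulates two terms, which is consistent with the statement that 'Hyundai Motor' has a probabilistically high relevance.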

In [Equation 2] above, TW(Q₁, URL) is the weight for the selection time of Q₁ and the URL; it is calculated as the average of the T(Q₁, URL) selection-time weights, as in [Equation 3] below.

In [Equation 3] above, the current time and T(Q₁, URL) are both expressed as the number of seconds elapsed since 1970.01.01.
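
The time weighting can be illustrated with a short sketch. The text states only that times are measured in seconds since 1970.01.01 and that TW is an average of selection-time weights; the per-selection weight used here (selection time divided by current time) is a hypothetical choice made for the example, and `time_weight` is an illustrative name.

```python
from datetime import datetime, timezone

def epoch_seconds(moment):
    """Seconds elapsed since 1970-01-01 00:00:00 UTC."""
    return moment.timestamp()

def time_weight(selection_times, now):
    """Average per-selection weight; more recent selections give a value closer to 1.

    ASSUMPTION: each selection is weighted by (selection time / current time),
    both in epoch seconds. The specification's own formula is not reproduced here.
    """
    now_seconds = epoch_seconds(now)
    ratios = [epoch_seconds(t) / now_seconds for t in selection_times]
    return sum(ratios) / len(ratios)

now = datetime(2008, 12, 15, tzinfo=timezone.utc)
recent = [datetime(2008, 12, 14, tzinfo=timezone.utc)]
older = [datetime(2007, 12, 15, tzinfo=timezone.utc)]
recent_weight = time_weight(recent, now)
older_weight = time_weight(older, now)
```

Any monotonically increasing function of the selection time would serve the same purpose of ranking recently selected query/URL pairs higher; the ratio is used here only because it is bounded by 1.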

FIG. 5 shows an example of dividing related queries into classes according to a semantic structure in an embodiment of the present invention; it can be applied to the example of FIG. 4.

Referring to FIG. 5, the query 'Sonata' is ambiguous: 'URL1' and 'URL2' relate to cars, while 'URL4' relates to music. That is, 'Sonata' is a query that is ambiguous between cars and music, and recommending related queries without resolving this ambiguity would add to the user's semantic confusion and prevent query recommendation from being fully effective.

Therefore, the present invention collects the documents at the URLs related to the query and classifies those documents by category. Once the type of a document is determined, all related queries linked to its URL are classified into the same category. As a result, 'URL1' and 'URL2' of FIG. 4 are classified as 'car' among the types of FIG. 5, and all related queries linked to 'URL1' and 'URL2' are therefore classified into the 'car' category. 'URL4', on the other hand, is determined to be in the 'music' category, and its related queries are classified into the 'music' category.
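
The propagation of a document's category to every query linked to its URL can be sketched as follows. The keyword lists are a hypothetical stand-in for whatever document classifier is actually used, and the names `classify_page` and `group_queries` are illustrative only.

```python
# Hypothetical keyword lists standing in for a real document classifier.
CATEGORY_KEYWORDS = {
    "car": {"engine", "sedan", "dealer"},
    "music": {"piano", "composer", "concerto"},
}

def classify_page(text):
    """Pick the category whose keywords overlap most with the page text."""
    words = set(text.lower().split())
    return max(CATEGORY_KEYWORDS, key=lambda cat: len(words & CATEGORY_KEYWORDS[cat]))

def group_queries(url_pages, url_queries):
    """Assign every query linked to a URL the category of that URL's page."""
    groups = {}
    for url, text in url_pages.items():
        category = classify_page(text)
        groups.setdefault(category, set()).update(url_queries.get(url, ()))
    return groups

# Illustrative page texts and query links (not taken from the specification).
pages = {
    "URL1": "sedan engine review",
    "URL2": "dealer sedan prices",
    "URL4": "piano composer biography",
}
links = {
    "URL1": ["hyundai motor", "kia motors"],
    "URL2": ["hyundai motor", "genesis"],
    "URL4": ["corelli", "moonlight sonata"],
}
groups = group_queries(pages, links)
```

Because the category is decided once per document, a query never has to be classified on its own text, which matters for short, ambiguous queries such as 'Sonata'.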

FIG. 6 shows an example in which the related queries, after being classified according to FIG. 5 and grouped, are presented as recommended queries for the initial query.

Referring to FIG. 6, 'Hyundai Motor' is linked through both 'URL1' and 'URL2', so its relevance value is high and it is ranked near the top. In addition, because 'Genesis' is the query that has been issued most often recently, it is ranked higher by its time weight even if it has the same occurrence probability as other queries. In this way the temporal trend is reflected in the related queries.

As described above, in an apparatus and method for recommending related queries in a search engine, the present invention quantifies the relationship between a query and the selected URLs using the click log of an existing search engine, and groups by category the related queries associated with the user's initial query using the time information recorded in the click log and classification information about the selected URL documents. By presenting a variety of related queries that are highly related to the initial query, it lets the user easily find the desired information, and by taking into account the time at which each URL was selected it allows recent information trends to be reflected.

Although specific embodiments have been described above, various modifications may be made without departing from the scope of the present invention. Accordingly, the scope of the invention should be determined not by the described embodiments but by the claims.

FIG. 1 is a block diagram of a related query recommendation apparatus in a search engine according to an embodiment of the present invention;

FIG. 2 is an operation control flowchart for recommending related queries according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating the associations between queries and URLs according to an embodiment of the present invention;

FIG. 4 is a hierarchical structure diagram showing the relationships between related URLs and queries according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating an example of class division of related queries according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating an example of presenting recommended queries according to an embodiment of the present invention.

<Brief description of the main reference numerals in the drawings>

100: client terminal 130: search engine server

110: server control unit 112: information extraction unit

114: frequency information calculation unit 116: index unit

118: click log DB 120: click log index data DB

Claims

delete

A related query recommendation apparatus, comprising:

an information extraction unit that refers to a click log and extracts a plurality of different queries, the URL information selected for each query, and the time at which each URL was selected;

a frequency information calculation unit that calculates how frequently each URL was selected;

an index unit that calculates the relationship between the URLs and the queries and generates click log index data; and

a server control unit that, when a query is input, refers to the click log index data to search for related URLs and related queries, classifies the related queries by category, and recommends related queries that are relatively highly related to the input query,

wherein the server control unit

calculates the relevance of each related query by adding the number of times the URL related to that related query was selected and a weight for the most recent time at which that URL was selected.

delete

A method for recommending a related query in a related query recommendation apparatus, the method comprising:

searching, by the related query recommendation apparatus, for related URLs when a query is input;

retrieving, by the related query recommendation apparatus, a plurality of related queries associated with the query using the retrieved related URL information;

calculating, by the related query recommendation apparatus, the relevance between the query and the retrieved related queries;

classifying, by the related query recommendation apparatus, the related queries by category; and

displaying, by the related query recommendation apparatus, the classified related queries ranked according to the relevance,

wherein calculating the relevance comprises:

calculating, by the related query recommendation apparatus, the number of times a URL related to the related query was selected;

extracting, by the related query recommendation apparatus, the most recent time at which the URL was selected; and

calculating, by the related query recommendation apparatus, the relevance to the query from the number of URL selections and a weight for the URL selection time.

The method of claim 7, wherein

the query

is input from an arbitrary client terminal through a wired/wireless communication network.