KR100931693B1

KR100931693B1 - Keyword searching method

Info

Publication number: KR100931693B1
Application number: KR1020070128449A
Authority: KR
Inventors: 김동욱; 정민성; 박한영
Original assignee: 주식회사 다음커뮤니케이션
Priority date: 2007-12-11
Filing date: 2007-12-11
Publication date: 2009-12-14
Also published as: KR20090061436A

Abstract

The present invention relates to a keyword search method that uses the collective intelligence of search service users on the Internet to search quickly and accurately for keywords that satisfy a user's search intent. When a user searches for a keyword, the method retrieves each web document containing that keyword together with an indication of whether the document also contains a recommended keyword, applies a weight to the score of each web document that contains a recommended keyword, and adjusts the ranking list of the web documents containing the searched keyword accordingly. Summary information for each web document is then generated according to the adjusted ranking list and provided to the user, so that web documents highly correlated with the keywords the user actually wants can be shown more quickly and accurately.

Keyword search, search intent, search engine, collective intelligence, index

Description

Keyword searching method {KEYWORD SEARCHING METHOD}

The present invention relates to a keyword search method, and more particularly to a keyword search method that uses the collective intelligence of search service users on the Internet to search quickly and accurately for keywords that satisfy a user's search intent.

In general, to obtain desired information on the Internet (for example, a web site or a web document), a user connects through a terminal to a search site that provides a search service, enters a search keyword in the search box provided by the site, and runs the search. The site's search engine then retrieves web sites or web documents containing the search keyword and presents the results as summary information.

When a user tries to obtain desired information this way, the keyword entered in the search box must adequately express the user's search intent. However, because of the ambiguity of Korean (Hangul) words, morphological analysis errors, mistakes by document authors, and the like, an unsuitable search keyword ends up being entered, and the entered keyword fails to reflect the user's search intent sufficiently.

The first problem, caused by ambiguity, is examined in more detail below using examples.

Suppose a search is performed with the keyword '다음' (Daum, which is also the ordinary Korean word for 'next'). Even though the user entered '다음' intending to obtain information about 'Daum Communications', a conventional search engine returns results based solely on whether a document contains the entered keyword. It therefore mainly returns documents containing '다음' in the sense of 'next', the opposite of 'previous' ('이전'), and may return only a few documents in which '다음' refers to the company and its Internet services.

In other words, the other search terms entered by users who habitually search for '다음' to obtain information about 'Daum Communications', such as '다음검색' (Daum Search), '미디어다음' (Media Daum), and '다음카페' (Daum Cafe), refer to '다음' as a company or Internet service. Nevertheless, when '다음' is searched in a search engine, the word may appear in Korean documents more often in the sense of 'next', the opposite of 'previous'.

As another example, when a search is performed with the keyword 'LA', the user may have entered 'LA' intending to obtain information about the US city, yet the engine mainly retrieves documents containing strings such as 'la traviata', which lowers accuracy.

The second problem, caused by morphological analysis errors and document authoring errors, is examined in more detail below using examples.

Suppose a search is performed with the keyword '에도' (Edo), which the user entered intending to obtain information related to Japan. The morphological analyzer of a conventional search engine can judge probabilistically whether '에도' is used as a noun or as a postpositional particle (meaning roughly 'even in' or 'even by'). However, web documents contain numerous authoring errors, as in the sentence '편파 판정 에도 좌절하기 않았다' ('was not discouraged even by the biased judging'), in which the particle is written apart from the noun it belongs to. As a result, the engine mainly returns documents containing '에도' as a particle and only a few containing '에도' as a noun, and in many cases the morphological analyzer cannot produce accurate search results.

In other words, the other search terms entered by users who habitually search for '에도' to obtain information related to Japan, such as '에도가와 란포' (Edogawa Ranpo) and '에도시대' (the Edo period), use '에도' as a noun. However, because '에도' is used more often as a particle in Korean documents, it is difficult to find Japan-related documents in the results of a keyword search for '에도'.

As another example, when a search is performed with the keyword '가야' (Gaya), the user may have entered it intending to obtain information about 'history' or 'Gaya CC', yet the engine mainly retrieves documents containing '가야' as a verb form, such as '가야 할 길' ('the road one must take') and '혼수는 얼마나 해 가야 하나요' ('how much should I prepare in wedding goods?').

Thus, conventional keyword search fails to satisfy the user's search intent sufficiently because of the ambiguity of Korean words and errors in morphological analysis and document authoring.

The technical problem addressed by the present invention is to solve the problems described above by providing a keyword search method that uses the collective intelligence of search service users on the Internet to search quickly and accurately for keywords that satisfy the user's search intent.

A further object of the present invention is to improve search accuracy by having the search engine use the collective intelligence of users to perform, quickly and accurately, keyword searches that fully satisfy the user's search intent, so that only web documents very highly correlated with the keywords the user actually wants are retrieved.

To solve these problems, according to the present invention, the patterns of the search terms that users enter when searching for a particular keyword (that is, the collective intelligence of users) are analyzed and used to extract related keywords that reflect the user's search intent well. These are stored in a database as recommended keywords for each search term. The web documents to be searched are analyzed in advance to form an inverted index structure (that is, data) that records, among the web documents containing a search term, which ones also contain a recommended keyword. At actual search time, the recommended keywords for the search term are looked up and the inverted index structure is used to expose only highly relevant web documents, so that web documents highly correlated with the keywords the user actually wants can be shown quickly and accurately.

According to one aspect of the present invention, there is provided a keyword search method in which a search engine searches for a keyword entered by a user, the method comprising: upon receiving the keyword entered by the user, retrieving web documents containing the keyword and checking whether each web document contains a recommended keyword corresponding to the keyword, using identification information that indicates, for each web document containing the keyword, whether the recommended keyword is included in it; adding a score weight to the web documents that contain the recommended keyword, scoring according to whether the weight was added, matching information between the web document and the keyword, or page ranking, and generating a list of a predetermined number of web documents in descending order of score; and generating summary information for the web documents in the list and providing it to the user.

The keyword search method further comprises: setting, using the collective intelligence of users, recommended keywords corresponding to the keyword to be searched and storing them in a database; setting information on the ID of each web document containing the keyword to be searched; setting identification information indicating whether the recommended keyword is included in each web document containing the keyword to be searched; storing the ID information and identification information set for each web document in a database; and setting information on the body of the web document corresponding to each web document ID and storing it in a database. Here, the identification information is a flag bit used to give a score weight to those web documents, among the web documents containing the keyword to be searched, that also contain the recommended keyword.
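The specification does not say how the flag bit is stored alongside the web document ID. As one possible illustration only, the sketch below packs the flag into the lowest bit of an integer posting entry. The helper names `pack_posting` and `unpack_posting` and the bit layout are assumptions made for this example, not part of the patent.

```python
from __future__ import annotations


def pack_posting(doc_id: int, has_recommended: bool) -> int:
    """Store the flag in the lowest bit and the document ID in the remaining bits.

    Hypothetical layout chosen for illustration; the patent only states that
    the identification information is a flag bit.
    """
    if doc_id < 0:
        raise ValueError("doc_id must be non-negative")
    return (doc_id << 1) | int(has_recommended)


def unpack_posting(entry: int) -> tuple[int, bool]:
    """Recover (doc_id, flag) from a packed posting entry."""
    return entry >> 1, bool(entry & 1)


if __name__ == "__main__":
    entry = pack_posting(42, True)
    print(entry, unpack_posting(entry))
```

Keeping the flag next to the document ID means that reading a keyword's posting list already tells the searcher which documents to weight, without a second lookup.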

The keyword search method further comprises periodically resetting the information on the recommended keywords corresponding to the keyword to be searched, the ID and identification information of each web document containing the keyword to be searched, and the information on the web document body corresponding to each web document ID.

Meanwhile, the step of checking whether the recommended keyword is included comprises: retrieving from the database the ID of each web document containing the keyword entered by the user; and reading from the database the identification information for each such web document ID.

In addition, the step of generating the list of a predetermined number of web documents comprises: checking, from the identification information, whether each web document containing the keyword entered by the user also contains a recommended keyword corresponding to that keyword; if so, adding a score weight to each web document that contains the recommended keyword; scoring each web document containing the keyword entered by the user according to whether the weight was added, the degree of matching with the keyword entered by the user, or page ranking; and generating a list of a predetermined number of web document IDs in descending order of score.

Alternatively, the step of generating the list of a predetermined number of web documents comprises: scoring each web document containing the keyword entered by the user according to the degree of matching with that keyword or page ranking; checking, from the identification information, whether each such web document contains a recommended keyword corresponding to the keyword entered by the user; if so, applying a weight to the score of each web document that contains the recommended keyword, thereby readjusting the scores of the web documents containing the keyword entered by the user; and generating a list of a predetermined number of web document IDs in descending order of readjusted score.
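The two alternatives for building the list differ only in when the weight is applied. The following is a minimal sketch of both orderings, under several assumptions that the specification leaves open: the base score is taken as the sum of a match score and a page-rank value, the weight is additive in the first alternative and multiplicative in the second, and `BOOST`, `Candidate`, and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    doc_id: int
    match_score: float  # degree of match with the query (assumed precomputed)
    page_rank: float    # page ranking signal (assumed precomputed)
    flag: int           # 1 if the document contains a recommended keyword


BOOST = 0.5  # hypothetical weight value; the patent does not give one


def rank_weight_first(cands: list[Candidate], top_n: int) -> list[int]:
    """First alternative: the weight is one term of a single scoring pass."""
    scored = [(c.match_score + c.page_rank + BOOST * c.flag, c.doc_id) for c in cands]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ID breaks ties
    return [doc_id for _, doc_id in scored[:top_n]]


def rank_then_readjust(cands: list[Candidate], top_n: int) -> list[int]:
    """Second alternative: score first, then readjust flagged documents."""
    scores = {c.doc_id: c.match_score + c.page_rank for c in cands}
    for c in cands:
        if c.flag:
            scores[c.doc_id] *= 1.0 + BOOST  # readjust only flagged documents
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]


if __name__ == "__main__":
    sample = [
        Candidate(1, 0.9, 0.3, 0),
        Candidate(2, 0.7, 0.3, 1),
        Candidate(3, 0.5, 0.1, 0),
    ]
    print(rank_weight_first(sample, 2), rank_then_readjust(sample, 2))
```

In the sample data, document 2 has a lower base score than document 1 but carries the flag, so both orderings place it first.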

According to another aspect of the present invention, there is provided a method of searching for a keyword entered by a user, comprising: setting, using the collective intelligence of users, recommended keywords corresponding to the keyword to be searched by a user and storing them in a database; setting information on the ID of each web document containing the keyword to be searched; setting identification information indicating whether the recommended keyword is included in each web document containing the keyword to be searched; storing the ID information and identification information set for each web document in a database; setting information on the body of the web document corresponding to each web document ID and storing it in a database; when a user searches for a keyword, retrieving from the database the ID and identification information of each web document containing the keyword entered by the user; checking, from the identification information, whether each such web document contains a recommended keyword corresponding to the keyword entered by the user; if so, adding a score weight to each web document that contains the recommended keyword; scoring each web document containing the keyword entered by the user according to whether the weight was added, the degree of matching with the keyword entered by the user, or page ranking; generating a list of a predetermined number of web document IDs in descending order of score; and generating, according to the generated web document ID list, summary information for each web document containing the keyword entered by the user and providing it to the user.

Thus, according to the present invention, recommended words reflecting users' search intent are stored in a database for each search term, and an inverted index structure is formed that records, among the web documents containing a search term, those that also contain a recommended word. When a particular keyword is searched, the database and the inverted index structure are used to give a score weight to the web documents that contain a recommended word, among those containing the keyword, so that the web document ranking list is adjusted in order of correlation with the keywords the user wants and provided to the user. Web documents highly correlated with the keywords the user actually wants can thereby be shown more quickly and accurately.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. However, the present invention may be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted in order to describe the invention clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", and "... module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

A keyword search method according to an embodiment of the present invention will now be described in detail with reference to the drawings.

FIG. 1 is a schematic conceptual diagram of a system for keyword search according to an embodiment of the present invention.

As shown in FIG. 1, the system for keyword search according to an embodiment of the present invention includes a user terminal 110, a wired/wireless communication network 120, a search engine 130, and a database 140.

When a user enters a search term to search for a particular keyword, the user terminal 110 forwards the entered search term to the search engine 130 through the wired/wireless communication network 120 and presents the search results received from the search engine 130 to the user.

The wired/wireless communication network 120 connects the user terminal 110 and the search engine 130 so that they can communicate.

The search engine 130 retrieves web documents containing the search term received from the user terminal 110, identifies whether the retrieved web documents contain a recommended keyword corresponding to that search term, gives a score weight to the web documents that contain the recommended keyword, and then scores the web documents containing the search term. It generates a list of web document identifiers (IDs) according to the scoring, generates summary information for the search results (that is, the web documents containing the search term) according to the ID list, and transmits it as the search result to the user terminal 110 through the wired/wireless communication network 120.

The search engine 130 includes a query broker 131, a searcher 132, and a summary 133.

The query broker 131 receives the search term entered by the user from the user terminal 110 through the wired/wireless communication network 120 and passes it to the searcher 132. It passes the web document ID list received from the searcher 132 to the summary 133, and transmits the summary information for each web document received from the summary 133 to the user terminal 110 through the wired/wireless communication network 120 as the search result.

The searcher 132 searches the database 140 for web documents containing the search term passed from the query broker 131, checks in the database 140 whether each retrieved web document contains a recommended keyword, weights the score of each retrieved web document according to the result, and then scores the retrieved web documents. It generates a web document ID list according to the scoring and passes it to the query broker 131.

The summary 133 receives the web document ID list from the query broker 131, reads from the database 140 the body of the web document corresponding to each ID in the list, generates summary information for each web document according to the list, and passes it to the query broker 131.
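The message flow between the three components of the search engine 130 can be sketched as follows. This is only an illustration of the hand-offs described above: the class and method names are hypothetical, the searcher's ranking is replaced by a precomputed lookup, and a truncated body stands in for the summary information, whose actual generation the specification does not detail.

```python
from __future__ import annotations


class Searcher:
    """Stand-in for searcher 132: returns ranked document IDs for a query."""

    def __init__(self, ranked_ids: dict[str, list[int]]) -> None:
        # Ranking is assumed to be precomputed here; the real searcher scores documents.
        self._ranked_ids = ranked_ids

    def search(self, query: str) -> list[int]:
        return list(self._ranked_ids.get(query, []))


class Summary:
    """Stand-in for summary 133: builds summary information from document bodies."""

    def __init__(self, bodies: dict[int, str], width: int = 30) -> None:
        self._bodies = bodies
        self._width = width

    def summarize(self, doc_ids: list[int]) -> list[dict]:
        # A truncated body is used as the summary (an assumption for illustration).
        return [
            {"doc_id": doc_id, "summary": self._bodies[doc_id][: self._width]}
            for doc_id in doc_ids
        ]


class QueryBroker:
    """Stand-in for query broker 131: relays data between the two components."""

    def __init__(self, searcher: Searcher, summary: Summary) -> None:
        self._searcher = searcher
        self._summary = summary

    def handle(self, query: str) -> list[dict]:
        doc_ids = self._searcher.search(query)   # query -> ranked ID list
        return self._summary.summarize(doc_ids)  # ID list -> summary information


if __name__ == "__main__":
    broker = QueryBroker(
        Searcher({"daum": [2, 1]}),
        Summary({1: "first body", 2: "second body text"}),
    )
    print(broker.handle("daum"))
```

The broker holds no ranking or summarizing logic of its own, which mirrors the description: it only forwards the query, the ID list, and the summaries.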

The database 140 stores, in database form: information on the recommended keywords, reflecting users' search intent, that correspond to each search term; information on the IDs of the web documents corresponding to each search term; identification information for the IDs of web documents that contain the recommended keyword corresponding to each search term; and information on the web document body corresponding to each web document ID. The database 140 may also be implemented as separate memories according to function or role.

A keyword search method according to an embodiment of the present invention is described in detail below with reference to the flowchart of FIG. 2.

First, some preparation is needed before a user searches for a particular keyword. So that search results fully satisfying the user's search intent can be obtained more quickly and accurately, the recommended words corresponding to the keywords a user may enter are stored in a database. During indexing, the recommended words corresponding to each such keyword are read, the documents containing the keyword are found, and identification information is added to mark, among the web documents found, the IDs of those that contain a recommended word and should therefore receive a score weight. This forms the inverted index structure.

First, a set of related search terms, used to identify the user's search intent more accurately when the user enters a search term, is generated and stored in the database 140.

In other words, for each keyword that a user may enter for a search (hereinafter, a 'query'), the collective intelligence of users is used to analyze related keywords (that is, recommended keywords) that better reflect the user's search intent. Information on the recommended keywords corresponding to each query is set, and the set information is stored in database form in the database 140 (or in a separately provided memory) (S201).
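The specification does not state how the collective-intelligence analysis in step S201 is performed. As one hedged possibility, consistent with the background examples in which users searching for '다음' also enter '다음카페' or '다음검색', the sketch below counts logged queries that contain the keyword and keeps the most frequent ones. The function name `recommend_keywords`, the substring heuristic, and the `top_k` parameter are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter


def recommend_keywords(query_log: list[str], keyword: str, top_k: int = 3) -> list[str]:
    """Return the most frequent logged queries that extend the given keyword.

    Assumed heuristic: a logged query is related if it contains the keyword
    and is not the keyword itself.
    """
    counts = Counter(q for q in query_log if keyword in q and q != keyword)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [q for q, _ in ranked[:top_k]]


if __name__ == "__main__":
    log = ["다음카페", "다음카페", "다음검색", "다음", "미디어다음", "다음카페", "다음검색", "날씨"]
    print(recommend_keywords(log, "다음", top_k=2))
```

A production system would likely use richer signals, such as queries issued in the same session, but the output has the same shape: a per-query list of recommended keywords to store in the database 140.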

Second, for the queries a user may enter, all web documents subject to search are analyzed in advance, and the resulting data values are stored in the database 140 for later use in indexing.

In other words, morphological analysis of the queries a user may enter makes it possible to index the web documents containing each query and, at the same time, to identify which of the indexed web documents contain a recommended keyword corresponding to the query.

To this end, keywords are extracted from all web documents subject to search, and information on the web document IDs corresponding to each keyword is set so that, given a keyword, the ID of each web document containing it can be found (S202).
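Step S202 amounts to building a keyword-to-document-ID mapping. A minimal sketch follows, assuming whitespace tokenization as a stand-in for the morphological analysis the specification mentions; the function name `build_keyword_index` and the sample documents are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def build_keyword_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each token to the sorted IDs of the documents that contain it.

    Tokens are split on whitespace here; a real Korean index would use a
    morphological analyzer instead (assumption for illustration).
    """
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, body in docs.items():
        for token in body.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


if __name__ == "__main__":
    docs = {1: "daum cafe guide", 2: "next page daum", 3: "edo period"}
    print(build_keyword_index(docs)["daum"])
```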

Next, it is checked whether the web documents containing a query that a user may enter also contain a recommended keyword corresponding to that query. Identification information (for example, a flag bit) indicating whether a recommended keyword is included in each web document corresponding to each keyword is set, so that, given a keyword, the web documents that contain a corresponding recommended keyword can be identified and given a score weight (S203).

For example, if a web document containing the query entered by the user also contains a recommended keyword corresponding to that query, the flag bit is set to '1'. Otherwise, the flag bit is set to '0'.

Then, the web document ID information set for each keyword in step S202 and the identification information set in step S203, which indicates which of those web document IDs include the recommended keyword corresponding to each keyword, are compiled into a database and stored in the database 140 (or in a separately provided memory) (S204). The structure of the database 140 that holds the web document ID information for each keyword together with this identification information is referred to as an 'inverted index structure'.
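Steps S202 to S204 can be illustrated with a minimal Python sketch. It is not the implementation described in the patent: the function name `build_inverted_index`, the dictionary layout, and the sample documents are hypothetical, and whitespace tokenization stands in for the morphological analysis mentioned above.

```python
from collections import defaultdict


def build_inverted_index(docs, recommended):
    """Build keyword -> {doc_id: flag bit}.

    docs: {doc_id: text}; recommended: {keyword: recommended keyword}.
    Both layouts are assumptions made for this sketch.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Whitespace split is a stand-in for morphological analysis.
        tokens = set(text.lower().split())
        for keyword in tokens:
            rec = recommended.get(keyword)
            # S203: flag bit is 1 only if the document also contains the
            # recommended keyword paired with this keyword.
            index[keyword][doc_id] = 1 if rec is not None and rec in tokens else 0
    return index


# Hypothetical sample data.
docs = {
    1: "jaguar car dealer price",
    2: "jaguar habitat rainforest",
    3: "car rental price",
}
recommended = {"jaguar": "car"}
index = build_inverted_index(docs, recommended)
```

Storing the flag next to each document ID means the search-time check needs only one lookup per keyword; the document body does not have to be scanned again.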

Third, information on the web document body corresponding to each web document ID is set and compiled into a database so that the body of each web document can be found when its web document ID is given, and it is stored in the database 140 (or in a separately provided memory) (S205).

Later, the database 140 is updated whenever any of the following is deleted, changed, or modified: the information on the recommended keywords that reflect the user's search intent for each query, the information on the IDs of the web documents corresponding to each query, the identification information on the IDs of the web documents that include the recommended keyword for each query, or the information on the web document body corresponding to each web document ID. Alternatively, steps S201 through S205 may be repeated periodically to reset all of this information and update the database 140 with the reset information, so that the database continues to reflect the user's search intent well.

Constructing the database 140 as described above has the advantage that, for a query consisting of a single term, web documents matching the user's search intent can be retrieved more quickly and accurately.

Thereafter, to actually search for a specific keyword, the user connects to the search site through the wired/wireless communication network 120 using the user terminal 110, enters a query for that keyword in the search box provided by the search site, and selects the search command.

In response to the user's search command, the user terminal 110 generates a search command message containing the query entered by the user and transmits it to the search site through the wired/wireless communication network 120.

Accordingly, the search engine 130 of the search site receives the search command message transmitted from the user terminal 110 through the wired/wireless communication network 120. The query broker 131 provided in the search engine 130 receives the query entered by the user (S206) and passes it to the searcher 132 provided in the search engine 130.

The searcher 132 receives the query from the query broker 131 and performs the index operation using the inverted index structure set up in step S204. Specifically, the searcher 132 retrieves the web documents containing the query entered by the user, identifies whether the retrieved web documents include the recommended keyword, gives a score weight to the web documents that include it, scores the retrieved web documents, generates a list of web document IDs according to the scores, and passes the list to the query broker 131.

In more detail, the searcher 132 receives the query entered by the user from the query broker 131 and reads (that is, retrieves) from the inverted index structure of the database 140 the ID of each web document containing that query (that is, the web document IDs corresponding to the query) (S207).

The searcher 132 then reads, from the inverted index structure (that is, the identification information) of the database 140, whether each web document containing the query includes the recommended keyword (S208). That is, the searcher 132 reads the identification information previously set in the database 140 (the identification information on the IDs of the web documents that include the recommended keyword corresponding to each query) and identifies whether the recommended keyword corresponding to the query entered by the user is included in each web document containing that query (S209). For example, the searcher 132 checks whether the flag bit previously set in the database 140 is '1'.

If step S209 finds that the recommended keyword corresponding to the query entered by the user is included in a web document containing that query, the searcher 132 adds a score weight to the ID of each web document that includes the recommended keyword (S210).

The searcher 132 then scores each web document ID containing the query entered by the user based on the degree of matching with the query, page ranking, whether the weight was applied, and so on. Alternatively, the searcher 132 ranks each web document ID read in step S208 (S211).

In another approach, the searcher 132 first scores or ranks each web document ID read in step S208 based on the degree of matching with the query entered by the user, page ranking, and so on. The searcher 132 then checks, in the database 140, the identification information on the web document IDs that include the recommended keyword corresponding to the query, and identifies whether that recommended keyword is included in each web document containing the query.

If the recommended keyword corresponding to the query entered by the user is included in a web document containing that query, the searcher 132 gives an additional score weight to each web document ID that includes the recommended keyword, and thereby readjusts the scores or re-ranks the web document IDs read in step S208.

Next, the searcher 132 selects a predetermined number (for example, 20,000) of web document IDs in descending order of score and generates a list. Alternatively, the searcher 132 selects a predetermined number (for example, 20,000) of the top-ranked web document IDs and generates a list (S212). It then passes the list of web document IDs to the query broker 131.
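The weighting and selection in steps S210 to S212 can be sketched as follows. This is an illustration only: the function `rank_documents`, the additive weight of 0.5, and the sample base scores are assumptions, since the text does not specify how the weight is combined with the matching degree or page ranking.

```python
def rank_documents(postings, base_scores, weight=0.5, top_n=20000):
    """Return up to top_n document IDs in descending order of score.

    postings: {doc_id: flag bit} read from the inverted index (S207-S209).
    base_scores: {doc_id: float} standing in for matching degree and
    page ranking; how these are computed is not covered here.
    """
    scored = []
    for doc_id, flag in postings.items():
        score = base_scores.get(doc_id, 0.0)
        if flag == 1:
            # S210: documents that include the recommended keyword get
            # an extra weight (additive here, which is an assumption).
            score += weight
        scored.append((doc_id, score))
    # S211: highest score first; ties are broken by document ID.
    scored.sort(key=lambda item: (-item[1], item[0]))
    # S212: keep only the predetermined number of IDs.
    return [doc_id for doc_id, _ in scored[:top_n]]


# Hypothetical sample data.
postings = {1: 1, 2: 0, 3: 0}
base_scores = {1: 0.4, 2: 0.6, 3: 0.3}
ranked = rank_documents(postings, base_scores, weight=0.5, top_n=2)
unweighted = rank_documents(postings, base_scores, weight=0.0, top_n=2)
```

With these sample values, document 1 overtakes document 2 only because its flag bit is set. The alternative described above, which scores first and then readjusts, would give the same ordering when the weight is additive.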

The query broker 131 receives the list of web document IDs from the searcher 132 and passes it to the summary 133 provided in the search engine 130.

The summary 133 receives the list of web document IDs from the query broker 131, generates summary information for the search results of the searcher 132 (that is, for each retrieved web document), and passes it to the query broker 131.

In other words, the summary 133 receives the web document ID list from the query broker 131, uses each web document ID in the list to read (that is, retrieve) from the database 140 the body of the corresponding web document (S213), generates summary information for each web document containing the query entered by the user, following the received web document ID list (S214), and passes the generated summary information to the query broker 131.
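The text does not say how the summary information is produced. As one possible reading of steps S213 and S214, the sketch below looks up each body by ID and cuts a short snippet around the first occurrence of the query. The function `summarize`, the `width` parameter, and the sample bodies are hypothetical.

```python
def summarize(doc_ids, bodies, query, width=30):
    """Return [(doc_id, snippet)] in the same order as doc_ids.

    bodies: {doc_id: body text}, standing in for the body store of S205.
    """
    summaries = []
    for doc_id in doc_ids:
        body = bodies[doc_id]  # S213: read the body by document ID
        pos = body.lower().find(query.lower())
        if pos >= 0:
            start = max(pos - width, 0)
            end = pos + len(query) + width
        else:
            # Query not found verbatim: fall back to the leading text.
            start, end = 0, 2 * width
        # S214: the snippet serves as the summary information.
        summaries.append((doc_id, body[start:end].strip()))
    return summaries


# Hypothetical sample data; the ID list order comes from the searcher.
bodies = {
    1: "The jaguar is a large cat native to the Americas.",
    2: "Buy a used jaguar car at a low price today.",
}
result = summarize([2, 1], bodies, "jaguar", width=10)
```

The output keeps the order of the incoming ID list, so the ranking decided by the searcher carries through to what the user sees.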

The query broker 131 receives the summary information of each web document from the summary 133 and transmits it as the search result to the user terminal 110 through the wired/wireless communication network 120 (S215). The user terminal 110 presents the search result received from the search engine 130 to the user, so that web documents highly correlated with the keywords the user actually wants can be shown more quickly and accurately.
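The hand-offs between the query broker 131, the searcher 132, and the summary 133 in steps S206 to S215 can be sketched as three small classes. The class and method names, the additive weight, and the leading-characters summary are placeholders chosen for this illustration and are not taken from the patent.

```python
class Searcher:
    def __init__(self, index, base_scores, weight=0.5):
        self.index = index              # keyword -> {doc_id: flag bit}
        self.base_scores = base_scores  # doc_id -> base score (assumed given)
        self.weight = weight

    def search(self, query, top_n):
        postings = self.index.get(query, {})
        scored = {
            doc_id: self.base_scores.get(doc_id, 0.0)
            + (self.weight if flag == 1 else 0.0)
            for doc_id, flag in postings.items()
        }
        return sorted(scored, key=lambda d: (-scored[d], d))[:top_n]


class Summary:
    def __init__(self, bodies):
        self.bodies = bodies  # doc_id -> body text

    def summarize(self, doc_ids, length=40):
        # Placeholder summary: the leading characters of each body.
        return [(doc_id, self.bodies[doc_id][:length]) for doc_id in doc_ids]


class QueryBroker:
    def __init__(self, searcher, summary):
        self.searcher = searcher
        self.summary = summary

    def handle(self, query, top_n=20000):
        doc_ids = self.searcher.search(query, top_n)  # S207-S212
        return self.summary.summarize(doc_ids)        # S213-S214, returned for S215


# Hypothetical sample data.
index = {"jaguar": {1: 1, 2: 0}}
base_scores = {1: 0.4, 2: 0.6}
bodies = {1: "Jaguar car dealer", 2: "Jaguar habitat"}
broker = QueryBroker(Searcher(index, base_scores), Summary(bodies))
results = broker.handle("jaguar")
```

In this sketch the broker only routes data: it never scores or summarizes, which matches the division of roles described above.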

The embodiment of the present invention described above is a keyword search method in which the search engine uses the collective intelligence of users to compile a database of recommended words that reflect the user's search intent for the search terms users enter. During indexing, morphological analysis of the search terms users can enter is used to form an inverted index structure that indexes the documents containing each keyword and, among those, the documents containing the recommended word. When a search term is entered, the recommended word corresponding to it is read, the IDs of the documents containing the search term are looked up, and among those the IDs of the documents containing the recommended word are looked up and given a score weight. The documents are then scored and an ID list is generated, and summary information for the search results is generated and provided according to the ID list, so that web documents highly correlated with the keywords the user actually wants can be shown more quickly and accurately.

However, embodiments of the present invention are not implemented only through the apparatus and/or method described above. They may also be implemented through a program that realizes functions corresponding to the configuration of the embodiment, a recording medium on which that program is recorded, and the like. Such implementations can be readily realized by those skilled in the art to which the present invention pertains from the description of the embodiments above.

Although embodiments of the present invention have been described in detail above, the scope of rights of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of rights of the present invention.

FIG. 1 is a schematic block diagram of a system for keyword search according to an embodiment of the present invention.

FIG. 2 is a flowchart of a keyword search method according to an embodiment of the present invention.

Claims

A keyword search method in which a search engine searches for a keyword entered by a user, comprising:

upon receiving the keyword entered by the user, retrieving web documents containing the keyword and checking whether a recommended keyword corresponding to the keyword is included in each web document, using identification information indicating whether the recommended keyword is included in each web document containing the keyword;

adding a score weight to each web document that includes the recommended keyword, scoring the web documents according to whether the weight was added, the degree of matching between the web document and the keyword, or page ranking, and generating a list of a predetermined number of web documents in descending order of score; and

generating summary information for the web documents included in the list and providing the summary information to the user.

The method of claim 1, further comprising:

setting recommended keywords corresponding to a keyword to be searched by using the collective intelligence of users, and compiling them into a database;

setting information on the ID of each web document containing the keyword to be searched;

setting identification information indicating whether the recommended keyword is included in each web document containing the keyword to be searched;

compiling the set ID information and identification information of each web document into a database; and

setting information on the body of the web document corresponding to each web document ID, and compiling it into a database.

The method of claim 2,

wherein the identification information is a flag bit for giving a score weight to the web documents that include the recommended keyword among the web documents containing the keyword to be searched.

The method of claim 2,

further comprising periodically resetting the information on the recommended keywords corresponding to the keyword to be searched, the ID and identification information of each web document containing the keyword to be searched, and the information on the web document body corresponding to each web document ID.

The method of claim 2,

wherein checking whether the recommended keyword is included comprises:

retrieving from the database the ID of each web document containing the keyword entered by the user; and

reading from the database the identification information for each web document ID containing the keyword entered by the user.

The method of claim 5,

wherein generating the list of the predetermined number of web documents comprises:

checking, from the identification information, whether the recommended keyword corresponding to the keyword entered by the user is included in each web document containing the keyword entered by the user;

if the recommended keyword is included in a web document containing the keyword entered by the user, adding a score weight to each web document that includes the recommended keyword;

scoring each web document containing the keyword entered by the user according to whether the weight was added, the degree of matching with the keyword entered by the user, or page ranking; and

generating a list of a predetermined number of web document IDs in descending order of score.

The method of claim 5,

wherein generating the list of the predetermined number of web documents comprises:

scoring each web document containing the keyword entered by the user according to the degree of matching with the keyword entered by the user or page ranking;

if the recommended keyword is included in a web document containing the keyword entered by the user, applying a weight to the score of each web document that includes the recommended keyword, thereby readjusting the scores of the web documents containing the keyword entered by the user; and

generating a list of a predetermined number of web document IDs in descending order of the readjusted scores.

A keyword search method for searching for a keyword entered by a user, comprising:

setting recommended keywords corresponding to a keyword to be searched by the user by using the collective intelligence of users, and compiling them into a database;

setting information on the ID of each web document containing the keyword to be searched by the user;

setting identification information indicating whether the recommended keyword is included in each web document containing the keyword to be searched by the user;

compiling the set ID information and identification information of each web document into a database;

when the user searches for the keyword, retrieving from the database the ID and identification information of each web document containing the keyword entered by the user;

if the recommended keyword is included in a web document containing the keyword entered by the user, adding a score weight to each web document that includes the recommended keyword;

scoring each web document containing the keyword entered by the user according to whether the weight was added, the degree of matching with the keyword entered by the user, or page ranking;

generating a list of a predetermined number of web document IDs in descending order of score; and

generating summary information for each web document containing the keyword entered by the user according to the generated web document ID list, and providing the summary information to the user.