KR19990048714A

KR19990048714A - How to Determine Priority of Similar Documents in Internet Information Retrieval

Info

Publication number: KR19990048714A
Application number: KR1019970067480A
Authority: KR
Inventors: 승현석
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1997-12-10
Filing date: 1997-12-10
Publication date: 1999-07-05

Abstract

A. Technical field of the invention described in the claims

The present invention relates to a method for determining the priority of similar documents in Internet information retrieval.

B. Technical problem to be solved by the invention

To provide a method of preferentially displaying, on a web browser screen, documents that are more representative of and more relevant to the entered keyword.

C. Summary of the solution

In general, a document located in a lower subdirectory (that is, deeper in the directory tree) tends to contain detailed content, whereas a document located in a higher directory is often an introduction or opening page. Exploiting this, the invention awards bonus points according to directory level so that higher-level documents are displayed with greater priority, and thereby displays more representative and more relevant documents first.

D. Principal use of the invention

It is used for Internet information retrieval.

Description

How to Determine Priority of Similar Documents in Internet Information Retrieval

The present invention relates to an Internet information retrieval method and, more particularly, to a method of determining the priority of similar documents during information retrieval.

In general, a variety of search engines are available on the Internet for searching vast amounts of information. A user simply enters a keyword to search for, and the search engine retrieves the documents matching that keyword from databases located around the world and displays them on the user's web browser screen.

FIG. 1 is a schematic diagram of a conventional Internet information retrieval system of the kind described above. Referring to FIG. 1, information is searched for on the Internet as follows. First, the user 100 connects to the search engine 104 through the web browser 102 and enters a keyword corresponding to the desired information. The search engine 104 then searches the Internet-connected sites 106, 108, 110, and 112 located around the world, collects material matching the keyword entered by the user 100, and displays the collected documents on the screen of the web browser 102. The user 100 then selects and views, from among the displayed documents, the content most relevant to what he or she is looking for.

When displaying documents retrieved over the Internet on the web browser screen, the search engine should present them in order of relevance to the information the user wants, so that the user can reach the desired information among the displayed documents more quickly.

However, the methods that conventional search engines use to display highly relevant documents first, namely keyword frequency checking, which scores a document by how many times the keyword appears in it, and keyword position checking, which scores a document by where the keyword appears in it, make it difficult to rank documents against one another. As a result, they have not satisfactorily displayed the documents that meet the user's needs in priority order, and users have had to hunt for likely relevant documents on their own, regardless of the displayed order.
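The two conventional checks described above can be sketched as follows to show why they struggle to separate documents. This is a minimal illustration under stated assumptions, not any particular engine's implementation: the function names, the whole-word case-insensitive matching, and the linear position formula are all hypothetical. Two different pages that each mention the keyword once, at the very start, receive identical scores.

```python
import re


def _keyword_pattern(keyword: str) -> re.Pattern:
    # Assumption: whole-word, case-insensitive matching.
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def keyword_frequency(text: str, keyword: str) -> int:
    """Keyword frequency check: number of times the keyword appears."""
    return len(_keyword_pattern(keyword).findall(text))


def keyword_position_score(text: str, keyword: str) -> float:
    """Keyword position check: 1.0 if the first match is at the start,
    falling linearly toward 0.0 at the end; 0.0 if the keyword is absent."""
    match = _keyword_pattern(keyword).search(text)
    if match is None:
        return 0.0
    return 1.0 - match.start() / len(text)


def baseline_score(text: str, keyword: str) -> float:
    """Hypothetical combination of the two conventional checks."""
    return keyword_frequency(text, keyword) + keyword_position_score(text, keyword)


if __name__ == "__main__":
    home = "Samsung home page"
    olympic = "Samsung Olympic report 1997"
    # Each page mentions the keyword once, at offset 0, so the scores tie.
    print(baseline_score(home, "samsung"), baseline_score(olympic, "samsung"))
```

Neither check looks at where a page sits on its site, so nothing breaks the tie between an overview page and a narrowly focused one.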

As described above, because the keyword frequency and keyword position checks used in conventional search engines make it difficult to rank documents against one another, they fail to display the retrieved documents in order of relevance to the user's needs. Users therefore have to find likely relevant documents on their own.

It is therefore an object of the present invention to provide a method that enables search engines to display first the documents that are more relevant to the user's needs.

FIG. 1 is a schematic diagram of a conventional Internet information retrieval system; and

FIG. 2 is a flowchart of a process for determining the priority of similar documents according to an embodiment of the present invention.

To achieve the above object, the present invention exploits the observation that a document in a lower subdirectory (that is, deeper in the directory tree) generally contains detailed content, whereas a document in a higher directory is often an introduction or opening page. It awards bonus points according to directory level so that higher-level documents are given display priority, and thereby displays more representative and relevant documents first.
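One way to turn "directory level" into a bonus is to count the directory segments in the document's URL path. The sketch below is an illustration only: the patent does not say how depth is measured or how large the bonus should be, so `directory_depth`, `directory_bonus`, `max_bonus`, and `step` are hypothetical names and values.

```python
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Number of directory levels above the document in the URL path.

    Assumption: the last path segment is a file name unless the path ends
    with '/'. So 'http://www.samsung' -> 0, '/news/index.html' -> 1,
    and '/news/1997/olympic.html' -> 2.
    """
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        return 0
    return len(segments) if path.endswith("/") else len(segments) - 1


def directory_bonus(url: str, max_bonus: float = 3.0, step: float = 1.0) -> float:
    """Hypothetical linear bonus: highest at the site root, never negative."""
    return max(0.0, max_bonus - step * directory_depth(url))


if __name__ == "__main__":
    for url in ("http://www.samsung",
                "http://www.samsung/news/index.html",
                "http://www.samsung/news/1997/olympic.html"):
        print(url, directory_depth(url), directory_bonus(url))
```

A linear, clamped bonus keeps shallow pages ahead without letting depth alone swamp the keyword-based scores; a real engine might instead tune these values empirically.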

Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings. In the following description and drawings, many specific details, such as a concrete processing flow, are set forth to provide a more thorough understanding of the invention. It will be apparent to those skilled in the art that the invention may be practiced without these specific details. Detailed descriptions of well-known functions and configurations that could unnecessarily obscure the subject matter of the invention are omitted.

FIG. 2 is a flowchart of a process for preferentially displaying more relevant documents according to an embodiment of the present invention. An embodiment of the invention will now be described in detail with reference to FIGS. 1 and 2.

First, when the search engine 104 receives a keyword from the user 100 in step 200 of FIG. 2, it proceeds to step 202 and searches the data indexes of the sites 106, 108, 110, and 112, connected around the world, for documents related to the received keyword. In step 204, the search engine 104 checks the keyword frequency in the related documents retrieved in step 202 and gives a high score to documents containing many occurrences of the keyword. A high score means that the document is displayed with priority when the retrieved documents are shown on the screen of the web browser 102; being displayed first, in turn, indicates that the document is highly relevant. In step 206, the search engine 104 checks the position of the keyword entered by the user 100 in the retrieved related documents and gives a high score to documents in which the keyword appears nearer the beginning. In step 208, the search engine 104 checks where each retrieved document is located in the directory hierarchy, that is, whether it sits in a relatively lower or higher directory, and gives a high score to documents located in higher directories. For example, giving a higher score to documents in higher directories works as follows:

1. http://www.samsung/news/1997/olympic.html

2. http://www.samsung/news/index.html

3. http://www.samsung

If the three documents above are retrieved when the keyword "samsung" is entered, document 3, which resides in the highest directory, is judged to be the most relevant. The reason is that, as noted above, the document in the highest directory is the most representative one. Having assigned scores in steps 204 through 208 according to keyword frequency, keyword position, and the directory location of the document containing the keyword, the search engine 104 proceeds to step 210, combines the results, and displays the documents on the screen of the web browser 102 in order, starting with the most relevant document, that is, the one with the highest total score.
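Steps 204 through 210 can be put together in a short sketch. It assumes plain-text documents keyed by URL, equal weights for the three scores, and a linear directory bonus; the patent fixes none of these, so every name and constant here (`Document`, `rank`, `MAX_DIRECTORY_BONUS`) is hypothetical. Because each sample page mentions "samsung" exactly once at the start, only the directory bonus separates them, and the root page ranks first as in the example.

```python
import re
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse

MAX_DIRECTORY_BONUS = 3.0  # hypothetical bonus for a document at the site root


@dataclass
class Document:
    url: str
    text: str


def _pattern(keyword: str) -> re.Pattern:
    # Assumption: whole-word, case-insensitive keyword matching.
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def frequency_score(doc: Document, keyword: str) -> float:
    """Step 204: more keyword occurrences -> higher score."""
    return float(len(_pattern(keyword).findall(doc.text)))


def position_score(doc: Document, keyword: str) -> float:
    """Step 206: keyword nearer the start -> higher score (0.0 if absent)."""
    match = _pattern(keyword).search(doc.text)
    if match is None:
        return 0.0
    return 1.0 - match.start() / len(doc.text)


def directory_score(doc: Document) -> float:
    """Step 208: document in a higher directory -> higher score."""
    path = urlparse(doc.url).path
    segments = [s for s in path.split("/") if s]
    # Treat the last segment as a file name unless the path ends with '/'.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return max(0.0, MAX_DIRECTORY_BONUS - len(segments))


def rank(docs: List[Document], keyword: str) -> List[Document]:
    """Step 210: sum the three scores and sort, highest total first."""
    def total(doc: Document) -> float:
        return frequency_score(doc, keyword) + position_score(doc, keyword) + directory_score(doc)
    return sorted(docs, key=total, reverse=True)


if __name__ == "__main__":
    results = [
        Document("http://www.samsung/news/1997/olympic.html", "Samsung Olympic coverage"),
        Document("http://www.samsung/news/index.html", "Samsung news"),
        Document("http://www.samsung", "Samsung home"),
    ]
    for doc in rank(results, "samsung"):
        print(doc.url)
```

Summing unweighted scores is the simplest reading of "combining the results"; in practice the relative weights would decide how strongly directory level can override keyword evidence.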

Accordingly, the user can view first, on the web browser screen, the documents that are more representative of and more relevant to the keyword he or she entered.

As described above, because the search engine gives display priority to documents in higher directories when ordering the retrieved documents, the present invention has the advantage that the user can view first, on the web browser screen, the documents that are more representative of and more relevant to the keyword he or she entered.

Claims

A method of determining the priority of similar documents in Internet information retrieval, the method comprising:

a first process of receiving a keyword from a user and searching the data index of each site connected to the Internet for documents related to the keyword;

a second process of checking the frequency of the keyword in the retrieved related documents and giving a high score to documents in which the keyword appears frequently;

a third process of checking the position of the keyword in the retrieved documents, which have been scored according to keyword frequency, and giving a high score to documents in which the keyword appears near the beginning;

a fourth process of giving a high score to documents located in higher directories among the retrieved documents, which have been scored according to the position of the keyword in the document; and

a fifth process of displaying the retrieved documents on the web browser screen in descending order of the scores given in the second, third, and fourth processes.