KR100421938B1

KR100421938B1 - Method of measure for retrieval efficiency in computing system

Info

Publication number: KR100421938B1
Application number: KR10-2000-0058765A
Authority: KR
Inventors: 김회율; 서창덕
Original assignee: 김회율
Priority date: 2000-10-06
Filing date: 2000-10-06
Publication date: 2004-03-10
Also published as: KR20020038966A

Abstract

1. Technical Field of the Invention

The present invention relates to a method for evaluating retrieval efficiency in a computing system. It also relates to a computer-readable recording medium storing a program that implements the method.

2. Technical Problem to Be Solved by the Invention

The invention provides an evaluation method, and a computer-readable recording medium storing a program that implements it, for use in a general computing environment. The method objectively evaluates the retrieval efficiency of a ranking system, that is, a system that assigns ranks to its search results. This allows the relative merits of proposed search algorithms to be judged more finely and accurately.

3. Summary of the Solution

The invention is a method for evaluating the retrieval efficiency of a search algorithm when data are retrieved from a database. The evaluation is based on the order in which relevant and nonrelevant items are returned. The method comprises the following steps:
- a first step of receiving a search pattern that indicates whether each item retrieved within a measurement range is actually relevant;
- a second step of accumulating, from the input search pattern, distances determined by the ranks of the nonrelevant items within the measurement range;
- a third step of accumulating the distances of all items within the measurement range; and
- a fourth step of dividing the accumulated distance of the nonrelevant items by the accumulated distance of all items within the measurement range, thereby normalizing it, and outputting the result as a normalized evaluation measure (NDS).

4. Principal Use of the Invention

The invention is used to objectively evaluate the retrieval efficiency of ranking systems. It thereby determines which of the proposed search algorithms performs better.

Description

Method of measure for retrieval efficiency in computing system

The present invention relates to an evaluation method for use in a general computing environment. When data are retrieved from a database, the method evaluates the retrieval efficiency of the search algorithm from the order in which relevant and nonrelevant items are returned. The invention also relates to a computer-readable recording medium storing a program that implements the method.

Measuring the retrieval efficiency of the many existing search algorithms requires evaluating their search results. The present invention is a technique for evaluating the retrieval efficiency of systems that assign ranks to those results.

Some conventional methods require two evaluation values. Such a measure is not single-valued. When two search algorithms are compared and each wins on one of the two values, it is therefore difficult to decide which is better.

Methods that evaluate only the number of relevant items retrieved within a given range are also only rough. The same number of relevant items may be concentrated in the top ranks or in the bottom ranks, yet both cases receive the same score.
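The following minimal sketch illustrates the weakness of count-only evaluation noted above. It scores two search patterns by the fraction of relevant items within the measurement range. This is our own example, not a measure defined in this patent. The helper name `relevant_count_score` is hypothetical, and patterns are written as strings with "O" for a relevant item and "X" for a nonrelevant one.

```python
def relevant_count_score(pattern: str) -> float:
    """Fraction of ranks in the measurement range that hold a relevant item ('O')."""
    if not pattern:
        raise ValueError("the measurement range must contain at least one rank")
    return pattern.count("O") / len(pattern)


front_loaded = "OOOXXXXXXX"  # relevant items at ranks 1-3
back_loaded = "XXXXXXXOOO"   # relevant items at ranks 8-10

# Both patterns receive the same score, even though in the second one the user
# must scan seven nonrelevant items before reaching any relevant one.
print(relevant_count_score(front_loaded))  # 0.3
print(relevant_count_score(back_loaded))   # 0.3
```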

The widely used average rank is even less accurate. Other methods are also limited: they cannot objectively quantify the efficiency of a retrieved pattern, that is, the amount of effort the user must spend examining it.

There is therefore a clear need for a way to objectively evaluate the retrieval efficiency of ranking systems, which assign ranks to search results, so that proposed search algorithms can be compared more finely and accurately.

The present invention was proposed to meet this need. Its object is to provide an evaluation method for objectively evaluating the retrieval efficiency of ranking systems in a general computing environment, so that the relative merits of proposed search algorithms can be judged more finely and accurately. A further object is to provide a computer-readable recording medium storing a program that implements the method.

FIG. 1 is an exemplary configuration diagram of a hardware system to which the present invention is applied.

FIG. 2 is a flowchart of an embodiment of the method for evaluating retrieval efficiency according to the present invention.

FIG. 3 is an example of an execution screen of the evaluation measure according to an embodiment of the present invention.

FIG. 4 shows examples of the various distance functions used in the retrieval-efficiency evaluation method of the present invention.

FIG. 5 is a flowchart of an embodiment of the automatic search-pattern generation process of the evaluation method. The process covers all ${}_nC_r$ cases in which r relevant items are retrieved within n ranks.

FIG. 6 is an explanatory diagram of an embodiment showing the order in which search patterns are generated when three relevant items are retrieved within six ranks (${}_6C_3$).

FIG. 7 is an explanatory diagram of an embodiment showing the recursive calls that generate the search patterns when three relevant items are retrieved within six ranks (${}_6C_3$).

FIG. 8 is an explanatory diagram of an embodiment showing the recursive calls that generate the search patterns for all ${}_nC_r$ cases in which r relevant items are retrieved within n ranks.

* Reference numerals for the main parts of the drawings

11: central processing unit    12: main memory

13: auxiliary memory    14: input device

15: output device    16: peripheral device

To achieve the above object, the present invention provides a method for evaluating the retrieval efficiency of a search algorithm when data are retrieved from a database. The evaluation is based on the order in which relevant and nonrelevant items are returned. The method comprises:
- a first step of receiving a search pattern that indicates whether each item retrieved within a measurement range is actually relevant;
- a second step of accumulating, from the input search pattern, distances determined by the ranks of the nonrelevant items within the measurement range;
- a third step of accumulating the distances of all items within the measurement range; and
- a fourth step of dividing the accumulated distance of the nonrelevant items by the accumulated distance of all items within the measurement range, thereby normalizing it, and outputting a normalized evaluation measure (NDS).

The method may further comprise a fifth step of automatically generating every search pattern that can occur under given conditions. This step is used to analyze how the characteristics of existing evaluation measures differ from those of the normalized measure.

To achieve the above object, the present invention also provides a computer-readable recording medium storing a program. The program directs a computing system having a processor to evaluate the retrieval efficiency of a search algorithm from the order in which relevant and nonrelevant items are returned when data are retrieved from a database. To do so, it implements:
- a first function of receiving a search pattern that indicates whether each item retrieved within a measurement range is actually relevant;
- a second function of accumulating, from the input search pattern, distances determined by the ranks of the nonrelevant items within the measurement range;
- a third function of accumulating the distances of all items within the measurement range; and
- a fourth function of dividing the accumulated distance of the nonrelevant items by the accumulated distance of all items within the measurement range, thereby normalizing it, and outputting a normalized evaluation measure (NDS).

The present invention further provides a computer-readable recording medium storing a program that additionally implements a fifth function. This function automatically generates every search pattern that can occur under given conditions, in order to analyze how the characteristics of existing evaluation measures differ from those of the normalized measure.

The present invention relates to an evaluation method that evaluates the retrieval efficiency of a search algorithm from the order in which relevant and nonrelevant items are returned when data are retrieved from a database. The retrieved items are either relevant or nonrelevant. The ranks assigned to them are used to quantify the efficiency of the search algorithm as a value between 0 and 1, so that it can be compared with other algorithms.

The evaluation method can take an arbitrary search result as input and compute its score. It also incorporates an automatic search-pattern generation process, which generates every type of search result (search pattern) that can occur under given conditions. These patterns are used for evaluation, namely for analyzing how the characteristics of existing evaluation measures differ from those of the proposed measure.

The above objects, features and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is an exemplary configuration diagram of a hardware system to which the present invention is applied.

As shown in FIG. 1, the hardware system includes:
- a central processing unit 11;
- a main memory 12 connected to the central processing unit 11;
- an auxiliary memory 13 connected to the main memory 12;
- an input device 14 and an output device 15 connected to the central processing unit 11; and
- a peripheral device 16 connected to the main memory 12.

The roles of these components are as follows:
- The central processing unit 11 controls and manages the overall operation of the computer.
- The main memory 12 and the auxiliary memory 13 store the programs executed by the central processing unit 11 and the various data used or generated during operation.
- The input and output devices 14 and 15 handle data input and output with the user.
- The peripheral device 16 provides a communication interface and similar functions.

The auxiliary memory 13 stores large amounts of data. The input and output devices 14 and 15 include an ordinary keyboard, display device, printer and the like.

A computer hardware environment configured as above is well known in the art, so it is not described in detail here. The main memory 12 of this hardware system, however, stores an evaluation-measure program. The program objectively evaluates the retrieval efficiency of ranking systems, which assign ranks to search results, so that the relative merits of proposed search algorithms can be judged more finely and accurately. It runs under the control of the central processing unit 11.

The evaluation-measure program of the present invention is now described in more detail.

The evaluation-measure program can take an arbitrary search result as input and compute its score. It also contains an automatic search-pattern generation program for use in evaluation. That program generates every type of search result that can occur under given conditions.

The program evaluates retrieval efficiency from the search results and quantifies it as a normalized value between 0 and 1. To do so, the search results must be quantified in terms of the effort the person who issued the query spends examining them.

To see as many relevant items as desired, the user must scan the results down to some rank. Those results contain nonrelevant items as well as relevant ones. The effort spent viewing relevant items is unavoidable. A system therefore has better retrieval efficiency the more it minimizes the effort spent viewing nonrelevant items.

By these criteria, a result is better when it contains more relevant items, fewer nonrelevant items, a better (lower-numbered) average rank for the relevant items, and tighter clustering of them. A measure must reflect all of these criteria.

The more relevant items are retrieved within the measurement range, the fewer nonrelevant items it contains. Moreover, the effort the user spends on a single nonrelevant item is greater the nearer the front it appears. A nonrelevant item near the front pushes the relevant items further back, so the user's viewing effort grows because of it.

The evaluation-measure program introduces a notion of distance based on the ranks of the nonrelevant items within the measurement range. It sums these distances and normalizes the sum to a value between 0 and 1. This avoids the problems that arise from using ranks directly.

The distance decreases as the rank increases. Because it is defined through a distance function, the function can be chosen to suit the kind of retrieval efficiency a particular database aims for.

In addition, an automatic search-pattern generation program generates every search pattern possible under given conditions. These patterns are used to analyze how the characteristics of existing evaluation measures differ from those of the proposed measure.

FIG. 2 is a flowchart of an embodiment of the method for evaluating retrieval efficiency according to the present invention.

The method proceeds as follows:
1. It receives a search pattern from the user (201).
2. It evaluates the pattern (202 to 204).
3. It outputs the result as a numerical value (205), together with the values of existing evaluation measures for comparison and reference.

Here, a search pattern indicates whether each item retrieved within the measurement range is actually relevant. An execution screen of the evaluation measure is shown in FIG. 3.

Here, NDS (Normalized Distance Sum) denotes the value of the normalized evaluation measure.

The distribution type in FIG. 3 is as follows. The measurement range extends to rank 10. All five relevant items were retrieved within the top 10, at ranks 1, 2, 3, 4 and 10. Five nonrelevant items were retrieved at ranks 5, 6, 7, 8 and 9. The resulting retrieval-efficiency values are shown in FIG. 3.

Search patterns are described in more detail below. A user enters data (keywords, pictures, etc.) in the query field of a search system and presses the search button to find related or similar data in the database. The system then outputs results in descending order of similarity. The lower the similarity, the more likely a result is to be wrong from the user's point of view. Only the results down to a certain rank are therefore regarded as search results.

Even for the same query and database, different search methods can produce different results. To measure the efficiency of the results, only the top portion of the ranking is considered; this portion is called the measurement range. It contains data that are actually relevant, along with irrelevant data that the search system found by mistake.

For example, mark relevant data "O" and irrelevant data "×", and set the measurement range to 5. The search pattern of system A might then be "OO××O" (relevant data at ranks 1, 2 and 5), and that of system B "OOO××" (relevant data at ranks 1, 2 and 3). Many other forms are possible.

Retrieval efficiency is evaluated on such search patterns. In the present invention, a search pattern is entered through the program of FIG. 3, and the retrieval efficiency is computed with each evaluation measure. The search pattern entered by the user is therefore data such as "OOO××" above, entered into the program of FIG. 3 to compute retrieval efficiency from a search result. FIG. 3 shows an example in which the measurement range of the search results was set to rank 25. The search pattern entered has the actually relevant data at ranks 1, 2, 3, 4 and 10.
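A search pattern is simply a sequence of relevance marks, one per rank. The following minimal sketch shows how a program might read such a pattern. It assumes that ASCII "X" may be used in place of "×". The function names are hypothetical and not part of the patent's program.

```python
def parse_pattern(pattern: str) -> list[bool]:
    """Convert a search pattern such as 'OOXXO' into per-rank relevance flags.

    Index 0 corresponds to rank 1. 'O' marks a relevant item; 'X' or '×' a nonrelevant one.
    """
    flags = []
    for ch in pattern:
        if ch == "O":
            flags.append(True)
        elif ch in ("X", "×"):
            flags.append(False)
        else:
            raise ValueError(f"unexpected symbol {ch!r} in search pattern")
    return flags


def relevant_ranks(pattern: str) -> list[int]:
    """1-based ranks at which relevant items were retrieved."""
    return [rank for rank, rel in enumerate(parse_pattern(pattern), start=1) if rel]


print(relevant_ranks("OO××O"))  # system A -> [1, 2, 5]
print(relevant_ranks("OOOXX"))  # system B -> [1, 2, 3]
```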

As a single measure of retrieval efficiency, the present invention satisfies all four of the following conditions and yields a normalized value between 0 and 1.

[Condition 1] Many relevant items should be retrieved.
[Condition 2] Few nonrelevant items should be retrieved.
[Condition 3] Relevant items should appear in the top ranks.
[Condition 4] Relevant items should be clustered together.

Unlike other evaluation measures, the present invention does not use the number or ranks of relevant and nonrelevant items directly; it uses distances. The distance assigned to the rank of each retrieved item is given in [Table 1] below.

[Table 1] Distance d_i assigned to rank i

| Rank i | 1 | 2 | 3 | ... | i | ... | m-2 | m-1 | m | m+1, ... |
| d_i | m | m-1 | m-2 | ... | m-i+1 | ... | 3 | 2 | 1 | 0 |

Thus, when an item retrieved within the measurement range m has rank i, its distance is given by [Equation 1]:

$$ d_i = \begin{cases} m - i + 1, & 1 \le i \le m \\ 0, & i > m \end{cases} $$
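Table 1 and [Equation 1] translate directly into a one-line function. The following is a minimal Python sketch; the name `rank_distance` is hypothetical.

```python
def rank_distance(i: int, m: int) -> int:
    """Distance d_i of [Table 1] / [Equation 1] for rank i and measurement range m.

    Ranks 1..m receive m, m-1, ..., 1; ranks beyond m receive 0.
    """
    if i < 1:
        raise ValueError("ranks start at 1")
    return m - i + 1 if i <= m else 0


print([rank_distance(i, 5) for i in range(1, 7)])  # [5, 4, 3, 2, 1, 0]
```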

Next, the distances ([Equation 1]) of the nonrelevant items within the measurement range m are summed. Items retrieved after all relevant items have been found are not counted as nonrelevant. The summation limit l is therefore set as follows:
- If all relevant items have been retrieved, l is the rank of the last relevant item retrieved.
- If only some of them have been retrieved, l is m.

Summing the distances of the nonrelevant items within this limit l gives B in [Equation 2] (202):

$$ B = \sum_{i \in N,\; i \le l} d_i, \qquad l = \begin{cases} r_L, & \text{if all relevant items are retrieved} \\ m, & \text{otherwise} \end{cases} $$

Here N is the set of ranks occupied by nonrelevant items, and r_L is the rank of the last relevant item.
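The rule for the summation limit needs to know whether every relevant item was found. The following minimal sketch therefore assumes that the caller supplies `total_relevant`, the number of relevant items that exist for the query. That parameter and both function names are our own, hypothetical choices.

```python
def summation_limit(pattern: str, total_relevant: int) -> int:
    """Limit l of [Equation 2]: r_L if every relevant item was retrieved, else m."""
    relevant = [i for i, ch in enumerate(pattern, start=1) if ch == "O"]
    if len(relevant) > total_relevant:
        raise ValueError("pattern contains more relevant items than exist")
    if relevant and len(relevant) == total_relevant:
        return relevant[-1]  # r_L: rank of the last relevant item
    return len(pattern)      # m: only part of the relevant set was retrieved


def nonrelevant_distance_sum(pattern: str, total_relevant: int) -> int:
    """B of [Equation 2]: sum of d_i = m - i + 1 over nonrelevant ranks i <= l."""
    m = len(pattern)
    l = summation_limit(pattern, total_relevant)
    return sum(m - i + 1
               for i, ch in enumerate(pattern, start=1)
               if ch != "O" and i <= l)


# Three relevant items exist and all were found, so nonrelevant items after rank 5 are ignored.
print(nonrelevant_distance_sum("OOXXOXXXXX", 3))  # 8 + 7 = 15
# Four exist but only three were found, so the sum runs to m = 10.
print(nonrelevant_distance_sum("OOXXOXXXXX", 4))  # 30
```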

To normalize B in [Equation 2], it is divided by C, the sum of the distances of all items within the measurement range (B/C). This sum is given by [Equation 3] (203):

$$ C = \sum_{i=1}^{m} d_i = \sum_{i=1}^{m} (m - i + 1) = \frac{m(m+1)}{2} $$
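The denominator C is an arithmetic series. The following short sketch checks the direct sum against the closed form m(m+1)/2; the function name is hypothetical.

```python
def total_distance_sum(m: int) -> int:
    """C of [Equation 3]: sum of d_i = m - i + 1 over all ranks 1..m."""
    return sum(m - i + 1 for i in range(1, m + 1))


# The series m + (m-1) + ... + 1 has the closed form m(m+1)/2.
for m in (1, 5, 10, 25):
    assert total_distance_sum(m) == m * (m + 1) // 2

print(total_distance_sum(10))  # 55
```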

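The normalized evaluation measure NDS is B ([Equation 2]) divided by C ([Equation 3]). It is given by [Equation 4] (204):

$$ \mathrm{NDS} = \frac{B}{C} = \frac{2}{m(m+1)} \sum_{i \in N,\; i \le l} (m - i + 1) $$

NDS equals 0 when no nonrelevant item lies within the summation limit. It equals 1 when every rank in the measurement range is nonrelevant. Lower values therefore indicate better retrieval efficiency.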
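The steps above (202 to 204) combine into one function. The following minimal sketch applies it to the FIG. 3 distribution type, assuming, as the text states, that the five relevant items make up the whole relevant set. The value 0.3636 (= 20/55) is our own computation from the definitions of B and C, with C = m(m+1)/2. It is not a value read from FIG. 3. The name `nds` is hypothetical.

```python
def nds(pattern: str, total_relevant: int) -> float:
    """Normalized Distance Sum of [Equation 4]; 0 is ideal, 1 is worst."""
    m = len(pattern)
    relevant = [i for i, ch in enumerate(pattern, start=1) if ch == "O"]
    # Summation limit l: r_L if all relevant items were retrieved, otherwise m.
    l = relevant[-1] if relevant and len(relevant) == total_relevant else m
    b = sum(m - i + 1 for i, ch in enumerate(pattern, start=1) if ch != "O" and i <= l)
    c = m * (m + 1) // 2  # [Equation 3]
    return b / c


# FIG. 3 distribution type: five relevant items, at ranks 1-4 and 10, with m = 10.
print(round(nds("OOOOXXXXXO", 5), 4))  # 20 / 55 -> 0.3636
print(nds("OOOOOXXXXX", 5))            # best case: 0.0
print(nds("XXXXXXXXXX", 5))            # worst case: 1.0
```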

[Equation 4] incorporates [Equation 1]. As [Equation 1] shows, the distance therefore decreases at a constant rate as the rank increases. This is the most common case, but sometimes one may want to measure with a different rate of decrease.

The measure therefore needs to be extended so that it can also use a distance function that decreases along a curve, as in FIG. 4, rather than linearly. Whether linear or curved, however, the function must be monotonically decreasing: the distance for a later rank must always be smaller than the distance for an earlier rank.

The distance functions shown in FIG. 4 are expressed with circular and trigonometric functions, whose values lie between 0 and 1. The ranks 1 to m are therefore first normalized to values between 0 and 1 with [Equation 5]:

$$ r_i = \frac{i - 1}{m} $$

With the measurement range extending to rank 10 (m = 10), [Equation 5] maps any rank i from 1 to 10 onto the interval from 0 to 1. Substituting i = 1 into [Equation 5] gives r_1 = 0; i = 2 gives 0.1, and i = 10 gives 0.9. The desired distance function f is then applied to the normalized rank r_i to obtain the normalized distance f(r_i).
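[Equation 5] as a small Python function reproduces the three values quoted above for m = 10. The name `normalized_rank` is hypothetical.

```python
def normalized_rank(i: int, m: int) -> float:
    """[Equation 5]: map rank i in 1..m onto [0, 1) as (i - 1) / m."""
    if not 1 <= i <= m:
        raise ValueError("rank must lie within the measurement range")
    return (i - 1) / m


print([normalized_rank(i, 10) for i in (1, 2, 10)])  # [0.0, 0.1, 0.9]
```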

The distance functions are given by [Equation 6]. The case corresponding to [Equation 1] is the first function of [Equation 6], which reverses r: f(r) = 1 - r.
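Only the linear function can be recovered from the text. The following sketch shows it next to a cosine curve that we chose ourselves as a stand-in for the curved shapes of FIG. 4. The cosine is not taken from [Equation 6]. The sketch also checks the monotonic-decrease requirement stated earlier. All names are hypothetical.

```python
import math


def linear_distance(r: float) -> float:
    """First function of [Equation 6]: f(r) = 1 - r (matches [Equation 1] up to a factor m)."""
    return 1.0 - r


def cosine_distance(r: float) -> float:
    """Hypothetical curved alternative, NOT taken from the patent: cos(pi * r / 2)."""
    return math.cos(math.pi * r / 2)


def is_monotonically_decreasing(f, m: int) -> bool:
    """Check the required property: later ranks get strictly smaller distances."""
    values = [f((i - 1) / m) for i in range(1, m + 1)]
    return all(a > b for a, b in zip(values, values[1:]))


m = 10
# Scaling the linear function by m recovers d_i = m - i + 1.
print([round(m * linear_distance((i - 1) / m), 9) for i in range(1, m + 1)])
print(is_monotonically_decreasing(linear_distance, m),
      is_monotonically_decreasing(cosine_distance, m))
```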

The NDS extended to accept these distance functions is computed with [Equation 7]. When the distance function f is the linear function f(r) = 1 - r, this NDS reduces to [Equation 4].

$$ \mathrm{NDS}_f = \frac{\sum_{i \in N,\; i \le l} f\!\left(\frac{i-1}{m}\right)}{\sum_{i=1}^{m} f\!\left(\frac{i-1}{m}\right)}, \qquad l = \begin{cases} r_L, & \text{if all relevant items are retrieved} \\ m, & \text{otherwise} \end{cases} $$

r_L: rank of the last relevant item

m: last rank of the measurement range

If a search pattern contains r relevant items (O) within n ranks, the number of possible patterns is ${}_nC_r$. Measuring the evaluation values of all these patterns and plotting them shows how an evaluation measure behaves as the measurement range or the number of relevant images changes. The process that generates these patterns automatically is shown in FIG. 5.
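For comparison, all ${}_nC_r$ patterns can also be enumerated with Python's standard `itertools.combinations`. This is a library-based alternative, not the recursive procedure of FIG. 5. Each generated pattern could then be scored with NDS and with existing measures for plotting. The function name `all_patterns` is hypothetical.

```python
from itertools import combinations
from math import comb


def all_patterns(n: int, r: int) -> list[str]:
    """Every search pattern with r relevant items ('O') among n ranks."""
    patterns = []
    for positions in combinations(range(n), r):  # lexicographic order
        chars = ["X"] * n
        for p in positions:
            chars[p] = "O"
        patterns.append("".join(chars))
    return patterns


patterns = all_patterns(6, 3)
print(len(patterns), comb(6, 3))  # 20 20
print(patterns[0], patterns[-1])  # OOOXXX XXXOOO
```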

FIG. 5 shows the automatic search-pattern generation process for all ${}_nC_r$ cases in which r relevant items are retrieved within n ranks. The key to this process is recursion. For example, drawing the patterns for ${}_6C_3$ reveals the regular structure shown in FIG. 6.

FIG. 6 shows that the patterns repeat in fixed units, grouped by the rank of the first "O":
- first "O" at rank 1: patterns p1 to p10 (group G1);
- first "O" at rank 2: patterns p11 to p16 (G2);
- first "O" at rank 3: patterns p17 to p19 (G3);
- first "O" at rank 4: pattern p20 (G4).

In other words, the 20 patterns of ${}_6C_3$ fall into four main groups, as FIG. 6 shows. Each group repeats the generation pattern of the whole set. Subdividing G1 further yields G11, G12, G13 and G14, in which the second "O" is at rank 2, 3, 4 and 5, respectively. Continued subdivision eventually leaves only the placement of the last "O", which cannot be subdivided further.

This shows that a recursive algorithm is appropriate. If the patterns of FIG. 6 are denoted ${}_6C_3$, then G1 can be written (1, ${}_5C_2$). This means that the first "O" is at rank 1 and two more "O"s are generated within the remaining five ranks. In the same way, G2, G3 and G4 are written (2, ${}_4C_2$), (3, ${}_3C_2$) and (4, ${}_2C_2$).

Since G1 can be subdivided further, G11, G12, G13 and G14 are written (2, ${}_4C_1$), (3, ${}_3C_1$), (4, ${}_2C_1$) and (5, ${}_1C_1$). With this notation, FIG. 6 can be expressed as FIG. 7.
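Counting the patterns in each group confirms that the groups partition the whole set. Splitting on the rank i of the first "O" gives the standard identity below. This is a worked check added here, not a formula stated in the patent.

$$
{}_6C_3 = {}_5C_2 + {}_4C_2 + {}_3C_2 + {}_2C_2 = 10 + 6 + 3 + 1 = 20
$$

$$
{}_nC_r = \sum_{i=1}^{n-r+1} {}_{n-i}C_{r-1}
$$

The same split applied to G1 gives ${}_5C_2 = {}_4C_1 + {}_3C_1 + {}_2C_1 + {}_1C_1 = 4 + 3 + 2 + 1 = 10$, one term for each of G11 to G14.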

가장 마지막까지 세분화시킨 이후 마지막 "Ｏ"를 특정 위치에 생성시키면 되는데, (2,₄C₁)의 경우는 (3,₃C₀), (4,₂C₀), (5,₁C₀), (6,₀C₀)로도 표기할 수 있지만_nC₀=1이므로 더이상 조합기호를 사용하지 않고 (3,1) (4,1) (5,1) (6,1)로 표기한다. 이는 각각 순위 3부터 6까지의 위치에 마지막 남은 하나의 "Ｏ"를 생성함을 뜻한다.After subdividing to the end, the last "O" can be created at a specific location. For (2, ₄ C ₁ ), (3, ₃ C ₀ ), (4, ₂ C ₀ ), (5, ₁ C ₀ ), (hereinafter referred to as 6, it can be expressed also ₀ C _0), but because it is C ₀ = 1 _n without the use of any more symbol combination (3,1) (4,1) (5,1) (6,1) . This means that the last remaining one "O" is generated at positions 3 to 6, respectively.

FIG. 7 reveals a general rule for the subdivision. Let ₙCᵣ be the number of patterns in which r "O"s can occur within a range of n. The subdivision process can then be generalized as shown in FIG. 8:

- A rightward recursive call subdivides further. It goes from (i, ₙCᵣ) to (i+1, ₙ₋₁Cᵣ₋₁): i increases by 1, and n and r each decrease by 1.
- A downward move between subdivided groups goes from the first group (i, ₙCᵣ) to (i+1, ₙ₋₁Cᵣ): i increases by 1, n decreases by 1, and r stays fixed.

Rightward calls stop when r = 1, and downward calls stop when n = r. For example, the rightward call from G1 = (1, ₅C₂) yields G11 = (2, ₄C₁). Downward moves from G11 then yield G12 = (3, ₃C₁), G13 = (4, ₂C₁), and G14 = (5, ₁C₁), where n = r ends the descent. The notation (i, ₙCᵣ) means that an "O" is placed at rank i, and pattern generation for ₙCᵣ is then repeated over the ranks that follow.
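The right/down rules above can be sketched in Python as two mutually recursive functions. This is an illustrative sketch, not the program of FIG. 5. The function names `right`, `down`, and `generate_patterns` are hypothetical, and so is the representation of a pattern as a tuple of the ranks holding an "O".

```python
from typing import List, Tuple

Pattern = Tuple[int, ...]  # 1-based ranks that hold an "O" (assumed representation)


def right(i: int, n: int, r: int, prefix: Pattern, out: List[Pattern]) -> None:
    """Group (i, nCr): an "O" is already at rank i (last entry of prefix),
    and r more "O"s must go somewhere in the n ranks i+1 .. i+n."""
    if r == 0:
        out.append(prefix)  # nothing left to place
    elif r == 1:
        # Rightward calls stop at r = 1: place the last "O" at (i+1,1) .. (i+n,1).
        for j in range(i + 1, i + n + 1):
            out.append(prefix + (j,))
    else:
        # Rightward call: the first subgroup is (i+1, (n-1)C(r-1)).
        down(i + 1, n - 1, r - 1, prefix, out)


def down(i: int, n: int, r: int, prefix: Pattern, out: List[Pattern]) -> None:
    """Expand group (i, nCr), then move down to its sibling (i+1, (n-1)Cr)."""
    right(i, n, r, prefix + (i,), out)
    if n > r:  # downward calls stop once n == r
        down(i + 1, n - 1, r, prefix, out)


def generate_patterns(m: int, k: int) -> List[Pattern]:
    """All placements of k "O"s among ranks 1..m (mCk patterns)."""
    if not 0 <= k <= m:
        raise ValueError("need 0 <= k <= m")
    if k == 0:
        return [()]
    out: List[Pattern] = []
    # The top level mCk splits into G1 = (1, (m-1)C(k-1)), G2, ... going downward.
    down(1, m - 1, k - 1, (), out)
    return out
```

`generate_patterns(6, 3)` returns the 20 rank triples in order. It begins with the four members of G11 = (2, ₄C₁): (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 2, 6). Downward moves are also recursive here, so the recursion depth grows with the range m. This form is meant for the small ranges used in the figures.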

FIG. 5 expresses FIG. 8 as an algorithm. The subdivision of FIG. 8, both rightward and downward, is represented by the loop block of the for statement on the right of FIG. 5, which is selected when r is not 1. The placement of the last "O", shown in the right-hand part of FIG. 8, corresponds to the for statement on the left of FIG. 5, which is selected when r = 1.

The method of the present invention described above can be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.).
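The loop structure described for FIG. 5 can be sketched as follows. FIG. 5 itself is not reproduced in this text, so this is an interpretation of the description, not a transcription. The names `expand` and `generate_patterns_loop` are hypothetical, as are the tuple-of-ranks representation and the use of i = 0 as a virtual starting rank.

```python
from typing import List, Tuple

Pattern = Tuple[int, ...]  # 1-based ranks that hold an "O" (assumed representation)


def expand(i: int, n: int, r: int, prefix: Pattern, out: List[Pattern]) -> None:
    """Expand group (i, nCr): r "O"s remain to be placed in ranks i+1 .. i+n.
    The top level uses i = 0 as a virtual starting rank."""
    if r == 0:
        out.append(prefix)
    elif r == 1:
        # "Left" for loop (selected when r = 1): put the last "O" at each remaining rank.
        for j in range(i + 1, i + n + 1):
            out.append(prefix + (j,))
    else:
        # "Right" for loop (selected when r != 1): k = 0 is the rightward call
        # (i+1, (n-1)C(r-1)); each further k is one downward step to
        # (i+1+k, (n-1-k)C(r-1)), ending when n-1-k == r-1.
        for k in range(n - r + 1):
            pos = i + 1 + k
            expand(pos, n - 1 - k, r - 1, prefix + (pos,), out)


def generate_patterns_loop(m: int, k: int) -> List[Pattern]:
    """All placements of k "O"s among ranks 1..m, via the loop-based expansion."""
    if not 0 <= k <= m:
        raise ValueError("need 0 <= k <= m")
    out: List[Pattern] = []
    expand(0, m, k, (), out)
    return out
```

At the top level, `expand(0, 6, 3, ...)` calls `expand(1, 5, 2, ...)`, `expand(2, 4, 2, ...)`, `expand(3, 3, 2, ...)`, and `expand(4, 2, 2, ...)`, which correspond to G1 through G4. The loop handles downward moves, so the recursion depth grows only with the number of "O"s, not with the range m.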

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings. Those of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from its technical spirit.

As described above, the present invention measures search efficiency objectively and accurately from ranked search results. This makes it possible to compare the search efficiency of ranking systems. Existing methods may fail to tell which algorithm is better or may rank them incorrectly. The proposed method evaluates them in finer detail and more accurately, so it can quantify and compare the search performance of the many proposed search algorithms, including Internet search engines.

Claims

A search efficiency evaluation method in a computing system, for evaluating the search efficiency of a search algorithm based on the order in which relevant and irrelevant items are retrieved when data in a database is searched, the method comprising:

a first step of receiving a search pattern indicating whether each item retrieved within a measurement range is actually relevant;

a second step of accumulating, from the input search pattern, distances according to the ranks of the non-relevant items within the measurement range;

a third step of accumulating the distances of all items within the measurement range; and

a fourth step of dividing the cumulative distance of the non-relevant items by the cumulative distance of all items within the measurement range, thereby normalizing the cumulative distance according to the ranks of the non-relevant items, and outputting a normalized evaluation scale (NDS).
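The four steps of claim 1 can be sketched as follows. The patent's own rank and distance normalization equations are not reproduced in this text, so this sketch makes several assumptions:

- The distance function `linear_distance` is hypothetical, with d = (m − rank + 1)/m.
- The "O"/"X" string encoding is assumed.
- The whole pattern is taken as the measurement range.

The distance function is a parameter so another definition can be substituted.

```python
from fractions import Fraction
from typing import Callable

Distance = Callable[[int, int], Fraction]


def linear_distance(rank: int, m: int) -> Fraction:
    """HYPOTHETICAL normalized distance in (0, 1]: rank 1 -> 1, rank m -> 1/m.
    Stands in for the patent's own normalization equations."""
    return Fraction(m - rank + 1, m)


def nds(pattern: str, distance: Distance = linear_distance) -> Fraction:
    """Normalized score for a search pattern such as "OXOO".

    Step 1: the pattern itself is the input. Assumed encoding: "O" at index k
    means the item at rank k+1 is relevant, "X" means it is not. The
    measurement range is taken to be the whole pattern (an assumption).
    """
    if not pattern or set(pattern) - {"O", "X"}:
        raise ValueError("pattern must be a non-empty string of 'O' and 'X'")
    m = len(pattern)
    # Step 2: accumulate distances by rank for the non-relevant items.
    irrelevant = sum(
        (distance(rank, m) for rank, mark in enumerate(pattern, start=1) if mark == "X"),
        Fraction(0),
    )
    # Step 3: accumulate distances for every item in the measurement range.
    total = sum((distance(rank, m) for rank in range(1, m + 1)), Fraction(0))
    # Step 4: normalize; 0 means no irrelevant items, 1 means all irrelevant.
    return irrelevant / total
```

Under the assumed linear distance, `nds("XOO")` is 1/2 while `nds("OOX")` is 1/6. A single irrelevant item counts for more when it appears at a higher rank.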

The method of claim 1, further comprising:

a fifth step of automatically generating all search patterns that can occur under a given condition, in order to analyze the differences in characteristics between existing evaluation scales and the normalized evaluation scale.
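The following sketch illustrates why claim 2 generates every pattern for a fixed condition. It enumerates all patterns with 3 relevant items among 6 ranks and scores each one with two measures:

- Precision, as a stand-in for an "existing evaluation scale". The text does not say which existing scales were compared.
- An NDS-style score that uses a hypothetical linear distance d = (m − rank + 1)/m.

`itertools.combinations` replaces the recursive generator of claim 3 to keep the block short. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from typing import List


def all_patterns(m: int, k: int) -> List[str]:
    """Every "O"/"X" pattern of length m with exactly k relevant ("O") items.
    itertools.combinations stands in for the recursive generator of claim 3."""
    patterns = []
    for ranks in combinations(range(1, m + 1), k):
        chosen = set(ranks)
        patterns.append("".join("O" if r in chosen else "X" for r in range(1, m + 1)))
    return patterns


def nds(pattern: str) -> Fraction:
    """NDS-style score with a HYPOTHETICAL linear distance d = (m - rank + 1) / m."""
    m = len(pattern)
    d = [Fraction(m - rank + 1, m) for rank in range(1, m + 1)]
    irrelevant = sum((d[i] for i, mark in enumerate(pattern) if mark == "X"), Fraction(0))
    return irrelevant / sum(d)


def precision(pattern: str) -> Fraction:
    """Conventional precision over the whole measurement range."""
    return Fraction(pattern.count("O"), len(pattern))


if __name__ == "__main__":
    pats = all_patterns(6, 3)
    for p in sorted(pats, key=nds):
        print(p, precision(p), nds(p))
    print(len(pats), "patterns,",
          len({precision(p) for p in pats}), "distinct precision value(s),",
          len({nds(p) for p in pats}), "distinct NDS values")
```

Under these assumptions all 20 patterns share the same precision (1/2). The NDS-style score takes 10 different values, from 2/7 for OOOXXX to 5/7 for XXXOOO.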

The method of claim 2,

wherein the fifth step automatically generates, by recursive calls, all search patterns that can occur under the given condition.

The method according to any one of claims 1 to 3,

wherein the normalized evaluation scale (NDS) is a value that quantifies the search efficiency of the search algorithm as a normalized number between zero (0) and one (1).

The method according to any one of claims 1 to 3,

wherein the normalized evaluation scale (NDS) is a value, normalized between 0 and 1, that quantifies the search efficiency of the search algorithm according to the following equation:

[Equation image not reproduced in source]

(where r_L is the rank of the last relevant item and m is the last rank in the measurement range).

The method according to any one of claims 1 to 3,

wherein the normalized evaluation scale (NDS) is given by the following equation:

[Equation image not reproduced in source]

(where r_L is the rank of the last relevant item, m is the last rank in the measurement range, and d_ni is the normalized distance).

The method of claim 4,

wherein, in the fourth step, normalizing the cumulative distance according to the ranks of the non-relevant items comprises normalizing each rank r to a value r_ni between zero (0) and one (1) by the following equation:

[Equation image not reproduced in source]

(where i is any rank and m is the measurement range).

The method of claim 7,

wherein the distance d is derived from the normalized rank r_ni and normalized to a value d_ni between zero (0) and one (1) by the following equation:

[Equation image not reproduced in source]

A computer-readable recording medium having recorded thereon a program that, in a computing system having a processor, evaluates the search efficiency of a search algorithm based on the order in which relevant and irrelevant items are retrieved when data in a database is searched, the program realizing:

a first function of receiving a search pattern indicating whether each item retrieved within a measurement range is actually relevant;

a second function of accumulating, from the input search pattern, distances according to the ranks of the non-relevant items within the measurement range;

a third function of accumulating the distances of all items within the measurement range; and

a fourth function of dividing the cumulative distance of the non-relevant items by the cumulative distance of all items within the measurement range, thereby normalizing the cumulative distance according to the ranks of the non-relevant items, and outputting a normalized evaluation scale (NDS).

The computer-readable recording medium of claim 9, wherein the program further realizes:

a fifth function of automatically generating all search patterns that can occur under a given condition, in order to analyze the differences in characteristics between existing evaluation scales and the normalized evaluation scale.