KR20030089934A

KR20030089934A - Range search binary method in integer type data structure and apparatus thereof

Info

Publication number: KR20030089934A
Application number: KR1020020027861A
Authority: KR
Inventors: 박경준; 류원; 이은준
Original assignee: 한국전자통신연구원; 주식회사 케이티
Priority date: 2002-05-20
Filing date: 2002-05-20
Publication date: 2003-11-28
Also published as: KR100484484B1

Abstract

PURPOSE: A binary search method using interval search in a data structure having integer-type data, and an apparatus therefor, are provided. When a large volume of integer data is searched, the method first identifies the interval that can contain the sought data instead of taking the whole data set as the search interval, and then searches within that reduced range, which reduces the number of comparisons and retrieves the desired data faster. CONSTITUTION: A record sorting unit receives data from an input unit (step 110) and sorts the received records in ascending or descending order of key value (step 120). The sorted set of records is stored in a storage unit. An interval setting unit comprises an upper/lower bound calculation unit and an interval calculation unit. The upper/lower bound calculation unit receives the key value of the data to be searched for (step 130) and calculates the lower and upper bounds for that key value (step 143). The interval calculation unit calculates the range of key values to be searched (step 145). A key search unit determines the range of array indices of the records that can hold the sought key value among the sorted records (step 153) and searches that range for the record having the key value (step 155). If the record is found, it is output through an output unit (step 160).

Description

Binary search method using interval search in a data structure having integer data, and apparatus therefor

The present invention relates to the field of retrieving data stored in a computer or the like, and in particular to a binary search method using interval search, and an apparatus therefor, that improve the speed of the binary search algorithm on data consisting of integers and thereby reduce search time.

In conventional binary search, suppose a record whose key is K is to be found in a set R of records R₁, R₂, ..., R_n with integer keys K₁, K₂, ..., K_n sorted in ascending or descending order. The search proceeds by repeatedly selecting the key K_m (m ≤ n) located in the middle of the current ordering and comparing it with the given key K. Because the position of the given key is unknown, the initial search interval is the entire span from record R₁ to R_n. If, however, it could be determined in advance that the sought key K lies within a particular range of the set R, the search range could be narrowed; a narrower range lowers the average number of comparisons, so a binary search over that range would be faster than one over the entire span.

The following shows the binary search algorithm, one of several methods of data retrieval.

    int Binary_Search(int x, int v[], int n)
    {
        int low, mid, high;

        low = 0;                      /* --- (1) */
        high = n - 1;                 /* --- (2) */
        while (low <= high) {
            mid = (low + high) / 2;   /* middle record of the current interval */
            if (x < v[mid])
                high = mid - 1;       /* continue in the lower half */
            else if (x > v[mid])
                low = mid + 1;        /* continue in the upper half */
            else
                return mid;           /* record found */
        }
        return 0;                     /* record not in the set */
    }

In the above binary search routine, x is the variable holding the given key K to be searched for; n is the number of records, that is, the size of the set R; and v[] is the set of records arranged by key value, assumed here to be in ascending order. mid is the variable indicating the middle record to be compared, and low and high are the variables indicating the lower and upper bounds of the records in v[] (the set R) that remain candidates for comparison. The function value on termination is the number of the record found, or 0 to indicate that the sought record is not in the set R. If K > K_mid, the candidates are R_{mid+1}, R_{mid+2}, ..., R_n, so the lower bound low becomes mid + 1; if K < K_mid, the candidates are R₁, R₂, ..., R_{mid-1}, so the upper bound high becomes mid - 1. If the desired record does not exist, low > high eventually occurs during execution, violating the condition low ≤ high, and the algorithm terminates. As equations (1) and (2) show, when the low (lower bound) and high (upper bound) values are assigned, binary search cannot know where the given key lies, so it has no choice but to take the entire set R, from record R₁ to R_n, as the initial search interval.
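As a baseline for the comparison that follows, the conventional routine can be sketched in standard C. This is a minimal sketch rather than the patent's listing: the name `binary_search_all` is hypothetical, the not-found value is changed to -1 (an assumption, so that a hit at index 0 can be told apart from a miss), and the midpoint is computed as `low + (high - low) / 2` to avoid integer overflow on very large arrays.

```c
/* Conventional binary search over the whole ascending array v[0..n-1].
 * Returns the 0-based index of x, or -1 if x is absent.
 * Assumption of this sketch: -1 replaces the listing's 0 as "not found". */
int binary_search_all(int x, const int v[], int n)
{
    int low = 0;          /* (1): the search always starts at the first record */
    int high = n - 1;     /* (2): and always ends at the last record */

    while (low <= high) {
        int mid = low + (high - low) / 2;   /* overflow-safe midpoint */
        if (x < v[mid])
            high = mid - 1;
        else if (x > v[mid])
            low = mid + 1;
        else
            return mid;
    }
    return -1;
}
```

Whatever the key, the initial interval here is the full array, which is the limitation the description goes on to address.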

The technical problem to be solved by the present invention is to provide a binary search method using interval search, and an apparatus therefor, which, when a large volume of integer data is searched, do not take the entire span as the initial search interval but instead determine in advance the interval in which the sought data can exist, narrow the search range, and search within the reduced range, thereby reducing the number of comparisons and retrieving the desired data faster.

Another technical problem to be solved by the present invention is to provide a computer-readable recording medium on which a program for executing the method on a computer is recorded.

FIG. 1 is a flowchart of the binary search method using interval search according to the present invention.

FIG. 2 is a block diagram of a binary search apparatus that retrieves data by interval search according to the present invention.

FIG. 3 shows the lower bound (FIG. 3A), the upper bound (FIG. 3B), and the range (FIG. 3C) of the key values that each record can have in a sorted set of records.

FIG. 4 shows the equations for obtaining the upper and lower bounds of the records in order to determine the search interval according to the present invention.

FIG. 5 shows the types of distribution of the key values that each record can have: a distribution in which the key range size is larger than the number of records (FIG. 5A), one in which the key range size equals the number of records (FIG. 5B), one in which the key range size is smaller than the number of records (FIG. 5C), and one in which the key range size is 1 (FIG. 5D).

To achieve the above object, a binary search method using interval search according to the present invention comprises: (a) sequentially sorting records having integer key values; (b) receiving the key value of a record to be searched for among the records sorted in step (a) and determining the range of that key value; and (c) determining the order range of the records located within the range of the key value and searching that order range for the record having the received key value.

To achieve the above object, a binary search apparatus using interval search according to the present invention comprises: a record sorting unit that sequentially sorts records having integer key values; an interval setting unit that receives the key value of a record to be searched for among the sorted records and determines the range of that key value; and a key search unit that determines the order range of the records located within the range of the key value and searches that order range for the record having the received key value.

Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 shows the flow of the binary search method using interval search according to the present invention: integer-type data is received and sorted by key, and the interval of records to be searched is set before the search is performed.

FIG. 1 and FIG. 2 are described together below.

The record sorting unit 220 receives data from the input unit 210 (step 110) and sorts the records in descending or ascending order of key value (step 120). The description here concerns a set of records sorted in ascending order.

Once the set of records is sorted, it is stored in the storage unit 270. The interval setting unit 240 consists of an upper/lower bound calculation unit 243 and an interval calculation unit 245. The upper/lower bound calculation unit 243 receives the key value of the data to be searched for (step 130) and calculates the lower and upper bounds for that key value (step 143), and the interval calculation unit 245 calculates the range of key values to be searched (step 145). The calculation of the lower bound, the upper bound, and the range of the key value is described in detail with reference to FIG. 3.

The key search unit 250 determines the range of array indices of the records that can hold the sought key value among the sorted records (step 153) and searches within that range for the record having the key value (step 155). The calculation of the record range that serves as the search interval is described in detail with reference to FIG. 4. When the record is found, the retrieved data is output through the output unit 260.
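The flow of steps 110 to 155 can be sketched end to end in C. This is a minimal sketch under stated assumptions, not the apparatus itself: the `Record` layout, the function names `sort_records` and `find_record`, and the -1 not-found value are hypothetical, keys are assumed to be distinct, and the index bounds use the 0-based forms that the description gives for equations (3) and (4).

```c
#include <stdlib.h>

/* Hypothetical record layout: an integer key plus an opaque payload. */
typedef struct {
    int key;
    const char *payload;
} Record;

/* qsort comparator: ascending by key, written without subtraction
 * so that it cannot overflow. */
static int by_key(const void *a, const void *b)
{
    int ka = ((const Record *)a)->key;
    int kb = ((const Record *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Step 120: sort the records in ascending key order. */
void sort_records(Record *r, int n)
{
    qsort(r, (size_t)n, sizeof r[0], by_key);
}

/* Steps 130-155: derive the index interval for key x, then binary-search
 * only that interval. Assumes r is sorted and keys are distinct.
 * Returns the 0-based index, or -1 if no record has key x. */
int find_record(const Record *r, int n, int x)
{
    if (n <= 0 || x < r[0].key || x > r[n - 1].key)
        return -1;                                    /* key cannot be present */

    long long high = (long long)x - r[0].key;             /* last possible index  */
    long long low  = (long long)x - r[n - 1].key + n - 1; /* first possible index */
    if (high > n - 1) high = n - 1;
    if (low < 0) low = 0;

    while (low <= high) {                             /* step 155 */
        long long mid = low + (high - low) / 2;
        if (x < r[mid].key)
            high = mid - 1;
        else if (x > r[mid].key)
            low = mid + 1;
        else
            return (int)mid;
    }
    return -1;
}
```

A caller would run `sort_records` once after input and then call `find_record` for each key; the storage and output units of FIG. 2 are outside the scope of this sketch.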

FIG. 3A shows the lower bound of the key value that a record can have. Assume that records R₁, R₂, ..., R_n with keys K₁, K₂, ..., K_n are given, that they are sorted in ascending order of key, and that all key values are integers. The following examines how to determine in advance, exactly, in which interval of the integer set R a given K lies when the record R_k whose key value is K is sought.

The lower bound and the upper bound that each record in the set R can have are calculated. First, in FIG. 3A, let K₁ be the key of record R₁, which holds the lowest key value, and K_n the key of record R_n, which holds the highest. Because the records are sorted in ascending order, the value K that can occupy record R₂ is greater than K₁ and less than K_n. That is, R₂ can hold values starting from K₁+1, which is greater than K₁, and likewise R₃ can hold values starting from K₁+2, which is greater than K₁+1. Applying this process up to R_{n-1}, record R_{n-1} can hold values starting from K₁+(N-1)-1, which is greater than K₁+(N-1)-2. The last record, R_n, holds the value K_n. The lower bound of record R_k is therefore K₁+(k-1), where 1 < k < N and N is the total number of records.

FIG. 3B shows the upper bound that each record can have. In the set R, records R₁ and R_n are already determined, so both the lower and the upper bound are K₁ for R₁ and K_n for R_n. Since R_n holds K_n, the upper bound of R_{n-1} is K_n-1. By the same reasoning the upper bound of R_{n-2} is K_n-2 and that of R_{n-3} is K_n-3. Applying this process down to R₂, the upper bound of record R₂ is K_n-(N-2). Generalizing, the upper bound of record R_k is K_n-(N-k).

To summarize, the lower bound that record R_k in the set R can take is K₁+(k-1) and the upper bound is K_n-(N-k). That is, if K is the key value that the k-th record can take, the range of values K can have is K₁+(k-1) ≤ K ≤ K_n-(N-k), where 1 < k < N.
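The two bounds can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, positions are 1-based as in the text, and it assumes the keys are distinct and sorted in ascending order, which is what the bounds themselves presuppose.

```c
/* Lowest key the k-th record (1-based) can hold: K1 + (k - 1). */
int key_lower_bound(int k1, int k)
{
    return k1 + (k - 1);
}

/* Highest key the k-th record (1-based) can hold: Kn - (N - k). */
int key_upper_bound(int kn, int n, int k)
{
    return kn - (n - k);
}

/* Returns 1 if every record of v[0..n-1] lies inside its bounds,
 * 0 otherwise. v[k - 1] stores the k-th record. */
int all_within_bounds(const int v[], int n)
{
    for (int k = 1; k <= n; k++) {
        int key = v[k - 1];
        if (key < key_lower_bound(v[0], k) ||
            key > key_upper_bound(v[n - 1], n, k))
            return 0;
    }
    return 1;
}
```

An array with repeated keys fails the check, which makes the distinct-key assumption behind the derivation explicit.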

In a set R of N records, the size of the range of key values (Key Range Size, hereinafter H) that each record from R₂ to R_{n-1}, that is, every record except the first and the last, can have is H = {K_n-(N-k)} - {K₁+(k-1)} + 1 (upper bound - lower bound + 1). Simplifying gives H = K_n-K₁-N+2, so every such record has a range of the same size. Drawn as a figure, the distribution therefore always takes the shape of a parallelogram, as in FIG. 3C.
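The simplification can be written out step by step; the k terms cancel, which is why H does not depend on the position of the record:

```latex
\begin{aligned}
H &= \bigl(K_n - (N - k)\bigr) - \bigl(K_1 + (k - 1)\bigr) + 1 \\
  &= K_n - N + k - K_1 - k + 1 + 1 \\
  &= K_n - K_1 - N + 2 .
\end{aligned}
```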

To find out in which interval of the set R a record with key value K lies, take the record number (n) as the x-axis and the key value (K) as the y-axis, as in FIG. 4, with K₁ as the lowest key and K_n as the highest. Two straight-line equations then arise. One is a line of the form y = x, which here is written K = (K₁-1)+n; the other is a line of the form y = x + b, which is written K = (K₁-2)+n+H. The first line is the lower bound K₁+(n-1) of record n, and the second is its upper bound K_n-(N-n), since K₁-2+H = K_n-N. For a given K, each equation yields one value of n. Call the n obtained from K = (K₁-1)+n the high value and the n obtained from K = (K₁-2)+n+H the low value. Then low is the first position at which K can exist, that is, the variable low, and high is the last position at which K can exist, that is, the variable high.

Of course, the low and high values vary with N and with the distribution of the keys. When n starts from 0 (that is, n ≥ 0), the two equations become K = K₁+n and K = (K₁-1)+n+H, respectively.
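Solving the two 0-based lines for n gives the index interval directly. The following is a minimal sketch with hypothetical names (`key_range_size`, `index_interval`); the clamping to the valid index range mirrors what the Range_Search listing later in the description does.

```c
/* Key range size shared by all interior records: H = Kn - K1 - N + 2. */
int key_range_size(int k1, int kn, int n)
{
    return kn - k1 - n + 2;
}

/* 0-based index interval [*low, *high] that can hold key k.
 *   K = K1 + n            gives  high = K - K1
 *   K = (K1 - 1) + n + H  gives  low  = K - K1 - H + 1
 * Both results are clamped to [0, n - 1]. */
void index_interval(int k, int k1, int kn, int n, int *low, int *high)
{
    int h = key_range_size(k1, kn, n);

    *high = k - k1;
    *low  = k - k1 - h + 1;
    if (*high > n - 1) *high = n - 1;
    if (*low < 0)      *low = 0;
}
```

For the lowest or highest key in the set the interval collapses to a single index, so one comparison suffices.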

The distribution of key values that the records in the set R can have falls into four types. First, FIG. 5A shows the type H > N, in which the key range size (H) is large compared with the number of records N. If the sought K lies between K₁ and K₁+N (K₁ ≤ K ≤ K₁+N) or between K₁+H and K_n (K₁+H < K ≤ K_n), the search interval is narrower than in the conventional binary search algorithm, so the average number of comparisons falls and the search is relatively fast. If, however, K lies between K₁+N and K₁+H (K₁+N ≤ K ≤ K₁+H), the interval is the same as in the conventional algorithm and the search runs at almost the same speed. The region in which the search interval is the same as in the conventional algorithm has size K_n-K₁-2N (513), and its share is (K_n-K₁-2N)/(K_n-K₁); if K_n-K₁ is assumed to be vastly larger than N, that is, K_n-K₁ >> N, this ratio approaches 1. In this limiting case the technique has almost the same search time as the conventional binary search algorithm. The worst case is N = 1 (N cannot be 0) with the key values spread over a very wide range, and then both methods need exactly one comparison.

FIG. 5B shows the type H = N. Only when the key value K is N is the search interval the same as in the conventional binary search algorithm; in the intervals K₁ < K < N and N < K < K_n the search interval shrinks, in effect reducing N, so the number of comparisons falls. A feature of this distribution is that the interval narrows and the number of comparisons falls steadily as the key value approaches the lowest value K₁ or the highest value K_n, and at K₁ or K_n the interval converges to a single point, which is the case N = 1. The maximum number of comparisons is then log₂(1)+1 = 1, so the search completes with a single comparison. This feature appears in every distribution.

FIG. 5C shows the type H < N. The search interval is always smaller than in the conventional binary search algorithm, so the average number of comparisons falls by log₂N - log₂R, where R is the size of the determined search interval. Because the reduction is log₂(N/R), for a fixed R the gap in the number of comparisons keeps growing as N increases. In other words, the larger N is, the greater the relative gain in average search speed.
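The size of the saving can be illustrated with worst-case probe counts. The sketch below is an assumption-laden illustration: `max_comparisons` is a hypothetical helper that counts loop iterations of a binary search as floor(log₂(size)) + 1, matching the log₂(1) + 1 = 1 figure quoted for FIG. 5B, whereas the text above speaks of average counts.

```c
/* Worst-case number of probes for a binary search over `size` records:
 * floor(log2(size)) + 1, or 0 for an empty interval.
 * Computed with integer halving, so no floating point is needed. */
int max_comparisons(int size)
{
    int count = 0;

    while (size > 0) {
        count++;
        size /= 2;
    }
    return count;
}
```

With N = 2^20 records and a determined interval of R = 16 records, the worst case drops from 21 probes to 5, a saving of 16 = log₂(N/R).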

FIG. 5D shows the type H = 1. With H equal to 1 the keys follow a uniform distribution of the form y = x, and under this distribution the low and high values coincide, so the maximum number of comparisons needed to find the record for any key value K is always 1. The count is independent of N: even when N is extremely large, a single comparison finds the record whose key value is K. Comparing the four distribution types by average search speed, the speed increases in the order FIG. 5A, FIG. 5B, FIG. 5C, FIG. 5D. Therefore, for any of these distributions, the search speed compares favorably with conventional binary search, and the gap widens as N grows.

Summarizing the above, the improved binary search algorithm can be written as follows.

    int Range_Search(int x, int v[], int n)
    {
        int low, mid, high;

        high = x - v[0];                    /* --- (3) */
        if (high >= n - 1) high = n - 1;    /* clamp to the last record */
        low = x - v[n-1] + n - 1;           /* --- (4) */
        if (low <= 0) low = 0;              /* clamp to the first record */
        while (low <= high) {
            mid = (low + high) / 2;         /* middle record of the current interval */
            if (x < v[mid])
                high = mid - 1;
            else if (x > v[mid])
                low = mid + 1;
            else
                return mid;                 /* record found */
        }
        return 0;                           /* record not in the set */
    }

In the above routine, x is the variable holding the given key K to be searched for; n is the number of records, that is, the size of the set R; and v[] is the set of records arranged by key value, assumed here to be in ascending order. mid is the variable indicating the middle record to be compared, and low and high are the variables indicating the lower and upper bounds of the records in v[] (the set R) that remain candidates for comparison. The function value on termination is the number of the record found, or 0 to indicate that the sought record is not in the set R. If K > K_mid, the candidates are R_{mid+1}, R_{mid+2}, ..., R_n, so the lower bound low becomes mid + 1; if K < K_mid, the candidates are R₁, R₂, ..., R_{mid-1}, so the upper bound high becomes mid - 1. If the desired record does not exist, low > high eventually occurs during execution, violating the condition low ≤ high, and the algorithm terminates.
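To see the effect of the narrowed starting interval, both variants can be instrumented to count probes. This is a sketch under stated assumptions: the names `search_between`, `full_search`, and `range_search` and the `probes` out-parameter are hypothetical, the not-found value is -1 rather than the listing's 0, keys are assumed distinct, and the bound arithmetic is widened to `long long` to avoid overflow.

```c
/* Binary search of v[low..high] for x; *probes counts loop iterations.
 * Returns the 0-based index, or -1 if x is absent. */
static int search_between(int x, const int v[], int low, int high, int *probes)
{
    *probes = 0;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        (*probes)++;
        if (x < v[mid])
            high = mid - 1;
        else if (x > v[mid])
            low = mid + 1;
        else
            return mid;
    }
    return -1;
}

/* Conventional search: the interval is always the whole array. */
int full_search(int x, const int v[], int n, int *probes)
{
    return search_between(x, v, 0, n - 1, probes);
}

/* Interval search: bounds from equations (3) and (4), then the same loop.
 * Assumes n >= 1, ascending order, and distinct integer keys. */
int range_search(int x, const int v[], int n, int *probes)
{
    long long high = (long long)x - v[0];               /* (3) */
    long long low  = (long long)x - v[n - 1] + n - 1;   /* (4) */

    if (high > n - 1) high = n - 1;
    if (low < 0) low = 0;
    if (low > high) {           /* x lies outside [v[0], v[n-1]] */
        *probes = 0;
        return -1;
    }
    return search_between(x, v, (int)low, (int)high, probes);
}
```

On the twelve-key array used in the description's example, key 27 takes 3 probes with `full_search` and 1 with `range_search`. A key that happens to sit at the midpoint of the whole array can still be hit on the first probe of the conventional search, so the saving is a matter of worst and average case rather than of every individual lookup.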

The expressions that determine the low and high values in equations (3) and (4) of the algorithm above are those already described for FIG. 4. In the equation K = (K₁-1)+n, n gives the high value, and in the equation K = (K₁-2)+n+H, n gives the low value; that is, high = K - K₁ + 1 and low = K - K₁ - H + 2. As described for FIG. 4, when n starts from 0 the two equations become K = K₁+n and K = (K₁-1)+n+H, so high = K - K₁ and low = K - K₁ - H + 1.

Also, as described for FIG. 3C, H = K_n - K₁ - n + 2, so low = K - K_n + n - 1. Because the low (lower bound) and high (upper bound) values identify in advance the interval in which the sought key can exist, the search starts within this interval rather than over the whole span, and is therefore faster than the conventional approach, which takes the entire set R as the search interval.
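The substitution that removes H from the 0-based lower index, and so yields the form used in equation (4), can be written out as:

```latex
\begin{aligned}
\text{low} &= K - K_1 - H + 1 \\
           &= K - K_1 - (K_n - K_1 - n + 2) + 1 \\
           &= K - K_n + n - 1 .
\end{aligned}
```

With v[n-1] holding K_n and x holding K, this is the expression x - v[n-1] + n - 1.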

The following example shows how the position interval of the record with key value (K) 35 is found when N is 12 and the records are sorted as in the array below.

Index       1   2   3   4   5   6   7   8   9  10  11  12
Key value  27  28  30  31  32  35  36  37  39  40  41  42

Because the array is numbered from 1, the calculation uses H = K₁₂ - K₁ - 12 + 2, high = K - K₁ + 1, and low = K - K₁ - H + 2. For the record with key value (K) 35 in the array above, H = 42 - 27 - 12 + 2 = 5, high = 35 - 27 + 1 = 9, and low = 35 - 27 - 5 + 2 = 5, so the record lies between the 5th and 9th positions of the array. Binary search is then started within this interval to locate the record with key value 35.
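The worked example can be reproduced with a short C sketch that keeps the text's 1-based positions. The names `interval_1based` and `search_positions` are hypothetical, and position p is stored at v[p - 1]; because positions start at 1, a return value of 0 is an unambiguous "not found" here.

```c
/* 1-based interval for key k in a sorted table of n distinct keys whose
 * first and last keys are k1 and kn:
 *   H    = kn - k1 - n + 2
 *   high = k - k1 + 1
 *   low  = k - k1 - H + 2
 * low and high are clamped to the valid positions 1..n. */
void interval_1based(int k, int k1, int kn, int n, int *h, int *low, int *high)
{
    *h = kn - k1 - n + 2;
    *high = k - k1 + 1;
    *low = k - k1 - *h + 2;
    if (*high > n) *high = n;
    if (*low < 1)  *low = 1;
}

/* Binary search restricted to positions low..high (1-based) of v.
 * Returns the position of k, or 0 if k is not in that interval. */
int search_positions(int k, const int v[], int low, int high)
{
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (k < v[mid - 1])
            high = mid - 1;
        else if (k > v[mid - 1])
            low = mid + 1;
        else
            return mid;
    }
    return 0;
}
```

For the table above, `interval_1based(35, 27, 42, 12, ...)` yields H = 5, low = 5, and high = 9, and `search_positions` then finds key 35 at position 6.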

The present invention can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, hard disks, floppy disks, flash memory, and optical data storage devices, as well as implementations in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

As described above, according to the present invention, when a given key is searched for in a set of integer data, it is first determined that the sought key K lies within a particular range of the set R, and the data is then searched for within the reduced interval. The number of comparisons is therefore smaller than in the conventional binary search method, which shortens the time needed to retrieve the desired data.

Claims

1. A binary search method using interval search, comprising:

(a) sequentially sorting records having integer key values;

(b) receiving the key value of a record to be searched for among the records sorted in step (a) and determining a range of the key value; and

(c) determining an order range of the records located within the range of the key value and searching the order range for the record having the received key value.

2. The method of claim 1, wherein step (a)

comprises sorting the records in ascending or descending order according to the magnitude of the key value.

3. The method of claim 1, wherein step (b) comprises:

(b1) calculating a lower bound and an upper bound of the key value of a record in a set of records consisting of a predetermined number of records; and

(b2) calculating the range of key values that each record other than the first record and the last record in the set can have.

4. The method of claim 3, wherein in step (b1),

the lower bound and the upper bound are obtained from the following equation,

[Equation]

where K₁ is the key value of the first record, K is the key value of the k-th record, K_n is the key value of the n-th record, k is an integer greater than 1, and n is an integer in the range [Equation].

5. The method of claim 3, wherein in step (b2),

the range of the key value is obtained by the following equation,

[Equation]

where K₁ is the key value of the first record, K_n is the key value of the n-th record, and n is an integer in the range [Equation].

6. The method of claim 1, wherein in step (c),

the lower bound and the upper bound of the order range of the records are determined by the following equation,

[Equation]

where K₁ is the key value of the first record, K_n is the key value of the n-th record, and H is the range of key values.

7. A binary search apparatus using interval search, comprising:

a record sorting unit for sequentially sorting records having integer key values;

an interval setting unit for receiving the key value of a record to be searched for among the sorted records and determining a range of the key value; and

a key search unit for determining an order range of the records located within the range of the key value and searching the order range for the record having the received key value.

8. The apparatus of claim 7,

further comprising an input unit for receiving the records or the key value of the record to be searched for.

9. The apparatus of claim 8, wherein the record sorting unit

sorts the records having integer keys received through the input unit in ascending or descending order.

10. The apparatus of claim 7, wherein the interval setting unit comprises:

an upper/lower bound calculation unit for calculating a lower bound and an upper bound of the received key value; and

an interval calculation unit for calculating the range of key values that each record other than the first and last records can have in a set of records consisting of a predetermined number of records.

11. The apparatus of claim 7,

further comprising a storage unit for storing the sorted records or the record having the retrieved key value.

12. A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 6 on a computer.