KR100319761B1 - Frame-partitioned parallel processing method for database retrieval using signature file

Info

Publication number: KR100319761B1
Application number: KR1020000002767A
Authority: KR
Inventors: 김정기
Original assignee: 오길록; 한국전자통신연구원
Priority date: 2000-01-21
Filing date: 2000-01-21
Publication date: 2002-01-05
Also published as: KR20010075870A

Abstract

The present invention relates to a frame-partitioned parallel processing method for a database retrieval system using a signature file, in which a hashing function is used to build an index structure of bit strings called signatures, and that structure is processed in parallel to speed up retrieval of multimedia data. The method comprises: a first step of partitioning signatures using extendible hashing and frame partitioning and delivering them to the processing nodes to build a storage structure in parallel; a second step of, when a search query is given against the constructed storage structure, generating a query signature from the query, comparing the query signature with the storage structure to generate equivalent keys, and delivering an equal number of equivalent keys to each processing node; a third step in which each processing node that receives equivalent keys finds the frames indicated by those keys, determines whether they match the query signature, and passes the result to a neighboring processing node; and a fourth step in which the processing node that receives the result uses the match result to search and evaluate the matching frames, passes its own result to the next processing node, and ends the search when the last frame is reached. The method is used in database retrieval systems and the like.

Description

FRAME-PARTITIONED PARALLEL PROCESSING METHOD FOR DATABASE RETRIEVAL USING SIGNATURE FILE

The present invention relates to a frame-partitioned parallel processing method for a database retrieval system using a signature file, and to a computer-readable recording medium storing a program for implementing the method. In particular, it relates to such a method and recording medium for speeding up the retrieval of queries in a database that stores data in bit string form.

This parallel processing method is intended to speed up retrieval when large-scale multimedia data is stored in a database, by parallelizing hashing, one of the indexing methods used for efficient data management. To this end, the invention applies a hashing technique to the multimedia data to obtain an abbreviated bit string for each record and stores these bit strings in an index structure. The abbreviated bit string is called a record signature. A query to be searched is likewise converted into a query signature of the same form. The multimedia data exists in a dynamic environment in which it is frequently added, deleted, and updated.

The parallel processing is performed on a parallel computer with a shared-nothing (SN) architecture. In the SN architecture, each microprocessor has its own memory and disk, and the processors are interconnected by a high-speed network. One set of a processor, memory, and disk is called a processing node.

Meanwhile, the conventional 'code distribution method and apparatus for parallel processing' provides a method and apparatus for distributing data stored in code form across the processors of a parallel computer. It partitions the codes among the processors in a folding pattern so that the equivalent codes of a query code are spread evenly over the parallel processors, which raises the efficiency of parallel processing.

Here, the equivalent codes are all the bit string codes that match the query, that is, every code that has a 1 in each position where the query code has a 1. For example, when the query code is '0101', the equivalent codes are '0101', '0111', '1101', and '1111'.
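The definition of equivalent codes can be illustrated with a short Python sketch. The function name `equivalent_codes` is hypothetical and not part of the patent; the sketch only assumes that codes are fixed-width bit strings and that a code matches when it has a 1 wherever the query has a 1.

```python
from itertools import product


def equivalent_codes(query: str) -> list[str]:
    """Return every bit string of the same width that has a 1 wherever `query` has a 1."""
    # Positions holding 0 in the query are unconstrained in a matching code.
    free = [i for i, bit in enumerate(query) if bit == "0"]
    codes = []
    for fill in product("01", repeat=len(free)):
        bits = list(query)
        for pos, value in zip(free, fill):
            bits[pos] = value
        codes.append("".join(bits))
    return codes


print(equivalent_codes("0101"))  # ['0101', '0111', '1101', '1111']
```

A query with z zero bits has 2^z equivalent codes, so sparse queries fan out to many codes.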

The conventional 'Hamming filter' method is a parallel processing method that builds a distribution matrix (check matrix) from the Hamming code, which was proposed in 1973 for data transmission, multiplies the matrix by a code, and uses the resulting value as the number of the processor that handles the code. When the distribution matrix has m rows and n columns, the number of processors P is given by [Equation 1] below.

[Equation 1] P = 2^m = n + 1

For example, when the number of processors is 8, a (3 × 7) distribution matrix is obtained.
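The relation in Equation 1 can be checked with a small Python sketch. The patent does not give the contents of the matrix, so this sketch assumes the textbook Hamming parity-check layout (column j holds the binary digits of j) and arithmetic modulo 2; `check_matrix` and `processor_number` are hypothetical names.

```python
def check_matrix(m: int) -> list[list[int]]:
    """Build an m x n matrix with n = 2**m - 1 (assumed textbook Hamming layout)."""
    n = 2**m - 1
    # Row r holds bit r (most significant first) of each column index 1..n.
    return [[(col >> (m - 1 - r)) & 1 for col in range(1, n + 1)] for r in range(m)]


def processor_number(code: list[int], matrix: list[list[int]]) -> int:
    """Multiply the matrix by the code modulo 2 and read the result as an integer."""
    number = 0
    for row in matrix:
        parity = sum(h & c for h, c in zip(row, code)) % 2
        number = (number << 1) | parity
    return number


H = check_matrix(3)
print(len(H), len(H[0]))                             # 3 7
print(processor_number([0, 0, 0, 0, 1, 0, 0], H))    # 5
```

With m = 3 the product takes 2^3 = 8 distinct values, one per processor, which is why the scheme fits only processor counts of the form 2^m.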

However, the Hamming filter has the drawback that it achieves a good distribution only when the number of processors satisfies [Equation 1].

In addition, to process signatures (data in bit string form) in parallel, conventional techniques classified the signatures according to the key values of the hashing technique and delivered them to the processing nodes. That is, signatures were classified so that the equivalent keys derived from a hashing key at search time would be spread as evenly as possible over the processing nodes. Here, an equivalent key has the same meaning as the equivalent code described above. The conventional 'code distribution method and apparatus for parallel processing' used a folding method to distribute the equivalent keys evenly, while the conventional 'Hamming filter' method, like that distribution method, determined the processor number by computing over a portion of each bit string record sized to fit the distribution matrix.

Thus, conventional approaches tried to raise the efficiency of parallel processing by distributing equivalent keys evenly, but because no method distributes equivalent keys evenly in every case, it was difficult to achieve the balanced execution that parallel processing requires.

The present invention was devised to solve the problems described above. Its object is to provide a frame-partitioned parallel processing method for a database retrieval system using a signature file, and a computer-readable recording medium storing a program for implementing the method, in which a hashing function is used to build an index structure of bit strings called signatures that can be processed in parallel to speed up retrieval of multimedia data.

FIG. 1 is an exemplary configuration diagram of a database retrieval system to which the present invention is applied.

FIG. 2 is an explanatory diagram showing the parallel storage structure used in the frame-partitioned parallel processing method for a database retrieval system using a signature file according to the present invention.

FIG. 3 is a flowchart of one embodiment of the frame-partitioned parallel processing method for a database retrieval system using a signature file according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

11: record    12: database

13: record signature    14: parallel search

15: query signature

To achieve the above object, the present invention provides a frame-partitioned parallel processing method applied to a database retrieval system using a signature file, comprising: a first step of partitioning signatures using extendible hashing and frame partitioning and delivering them to the processing nodes to build a storage structure in parallel; a second step of, when a search query is given against the constructed storage structure, generating a query signature from the query, comparing the query signature with the storage structure to generate equivalent keys, and delivering an equal number of equivalent keys to each processing node; a third step in which each processing node that receives equivalent keys finds the frames indicated by those keys, determines whether they match the query signature, and passes the result to a neighboring processing node; and a fourth step in which the processing node that receives the result uses the match result to search and evaluate the matching frames, passes its own result to the next processing node, and ends the search when the last frame is reached.

The present invention also provides a computer-readable recording medium storing a program for causing a database retrieval system having a processor to implement: a first function of partitioning signatures using extendible hashing and frame partitioning and delivering them to the processing nodes to build a storage structure in parallel; a second function of, when a search query is given against the constructed storage structure, generating a query signature from the query, comparing the query signature with the storage structure to generate equivalent keys, and delivering an equal number of equivalent keys to each processing node; a third function in which each processing node that receives equivalent keys finds the frames indicated by those keys, determines whether they match the query signature, and passes the result to a neighboring processing node; and a fourth function in which the processing node that receives the result uses the match result to search and evaluate the matching frames, passes its own result to the next processing node, and ends the search when the last frame is reached.

The above objects, features, and advantages will become clearer from the following detailed description taken together with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is an exemplary configuration diagram of a database retrieval system to which the present invention is applied.

As shown in FIG. 1, records (11) are stored according to the storage structure of the database (12), and insertions and deletions occur frequently. Searching such a multimedia database for a query means retrieving every record that contains the data present in the query. To speed up the search, a record signature (13) is first extracted from each record (11) and stored, a query signature (15) is extracted from the given query, and the query signature (15) is compared with the record signatures (13) to find the record signatures that match. The actual record corresponding to each record signature (13) found in this way is then checked to confirm that it contains the query. Here, a query signature (15) matches a record signature (13) when the record signature (13) contains the bit string of the query signature (15).
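The matching rule (a record signature matches when it contains every 1 bit of the query signature) can be sketched in Python. How signatures are derived from records is not specified in the text, so the hashing scheme below (setting a few hash-selected bits per term and OR-ing them together) is an assumption, as are the names `SIGNATURE_BITS`, `BITS_PER_TERM`, `make_signature`, and `matches`.

```python
import hashlib

SIGNATURE_BITS = 16   # assumed signature width, for illustration only
BITS_PER_TERM = 2     # assumed number of bits each term sets


def term_bits(term: str) -> int:
    """Map one term to a bit pattern using bytes of its SHA-256 digest."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    pattern = 0
    for i in range(BITS_PER_TERM):
        pattern |= 1 << (digest[i] % SIGNATURE_BITS)
    return pattern


def make_signature(terms: list[str]) -> int:
    """OR together the patterns of all terms in a record or a query."""
    signature = 0
    for term in terms:
        signature |= term_bits(term)
    return signature


def matches(record_signature: int, query_signature: int) -> bool:
    """True when the record signature contains every 1 bit of the query signature."""
    return (record_signature & query_signature) == query_signature


record = make_signature(["sunset", "beach", "video"])
query = make_signature(["beach"])
print(matches(record, query))  # True
```

A signature match can be a false positive, because unrelated terms may set the same bits; this is why the matched record itself is checked afterwards.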

To support a dynamic environment in which records are frequently inserted into and deleted from the computer system, the parallel processing method of the present invention uses extendible hashing. However, whereas conventional techniques parallelize on the basis of the hashing-function key, the present invention parallelizes on the basis of signature frames in order to balance execution.

A frame is one of the equal-sized pieces into which a signature is divided. The number of frames per signature is set to an integer multiple of the number of processing nodes, because this balances both the amount of data and the execution load per processing node. The frames are then delivered to the processing nodes and stored there.
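Frame partitioning can be sketched as follows. The text fixes only that frames are equal-sized and that their count is an integer multiple of the node count; assigning consecutive frames to the same node is an assumption of this sketch, and the 16-bit signature and the function names are hypothetical.

```python
def split_into_frames(signature: str, frame_bits: int) -> list[str]:
    """Cut a bit-string signature into equal-sized frames."""
    if len(signature) % frame_bits != 0:
        raise ValueError("signature length must be a multiple of the frame size")
    return [signature[i:i + frame_bits] for i in range(0, len(signature), frame_bits)]


def assign_frames(frames: list[str], node_count: int) -> list[list[str]]:
    """Give every node the same number of frames (consecutive frames per node: assumed)."""
    if len(frames) % node_count != 0:
        raise ValueError("frame count must be an integer multiple of the node count")
    per_node = len(frames) // node_count
    return [frames[n * per_node:(n + 1) * per_node] for n in range(node_count)]


frames = split_into_frames("0010110000001001", frame_bits=4)
print(assign_frames(frames, node_count=2))  # [['0010', '1100'], ['0000', '1001']]
```

Because every signature contributes the same number of frames to every node, each node stores and scans the same amount of data regardless of how the hash keys are distributed.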

FIG. 2 is an explanatory diagram showing the parallel storage structure used in the frame-partitioned parallel processing method for a database retrieval system using a signature file according to the present invention.

As shown in FIG. 2, partitioning the signatures with extendible hashing and frame partitioning and delivering them to the processing nodes for parallel storage produces the storage structure of FIG. 2. The procedure is described in detail below.

First, signatures are partitioned using extendible hashing and frame partitioning and delivered to the processing nodes, building the storage structure in parallel.

Second, when a search query is given against the structure stored in parallel, a query signature is generated from the query, the query signature is compared with the parallel storage structure to generate equivalent keys, and the same number of equivalent keys is delivered to each processing node.

Third, each processing node that receives equivalent keys finds the frames indicated by those keys, determines whether they match the query signature, and passes the match result to the neighboring processing node.

Fourth, the processing node that receives the result uses the received match result to search only the frames that still match, obtains a new result, and passes it on to the next processing node; the search ends when the last frame has been reached.

One of the processing nodes in the computer system acts as the main processing node. When a record is inserted into the database, the main processing node converts the record into a record signature, determines the storage location with the hashing function, splits the record signature into frames for storage at that location, and delivers the frames to the processing nodes. The main processing node may be one of the nodes taking part in the parallel processing or a separate processing node. It holds the hashing keys of the extendible hashing and is responsible for inserting and deleting records and for initiating searches.

Once signatures have been partitioned and delivered to the processing nodes, forming the storage structure of FIG. 2, a search query given against that structure is converted into a query signature, and the equivalent keys are computed from the query signature. The number of equivalent keys is divided by the number of processing nodes so that each processing node receives the same number of equivalent keys. In FIG. 2 the hashing key is '1010', so four equivalent keys are generated: '1010', '1011', '1110', and '1111'. Of these, '1010' is delivered to processing node P1, '1011' to P2, '1110' to P3, and '1111' to P4, and the search begins. Everything up to this point is performed at the main processing node.
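The key distribution performed by the main processing node can be sketched in Python using the FIG. 2 example. The function names are hypothetical, and because the text does not say what happens when the key count is not divisible by the node count, this sketch simply rejects that case.

```python
def equivalent_keys(hash_key: str) -> list[str]:
    """All keys of the same width that have a 1 wherever `hash_key` has a 1."""
    width = len(hash_key)
    required = int(hash_key, 2)
    return [
        format(key, f"0{width}b")
        for key in range(2**width)
        if (key & required) == required
    ]


def deal_keys(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Hand each node the same number of keys, in order."""
    share, remainder = divmod(len(keys), len(nodes))
    if remainder:
        # Behaviour for an uneven split is not described in the source text.
        raise ValueError("key count is not divisible by the node count")
    return {node: keys[i * share:(i + 1) * share] for i, node in enumerate(nodes)}


keys = equivalent_keys("1010")
print(keys)                                      # ['1010', '1011', '1110', '1111']
print(deal_keys(keys, ["P1", "P2", "P3", "P4"]))
```

With the hashing key '1010' the sketch reproduces the assignment in the text: one key each for P1 through P4.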

From the third step described above onward, processing takes place at the individual processing nodes. At processing node P1, the query signature frames '0010 1100' are compared with the record signature frames. For parallel processing, P1 searches the two top-most blocks among the blocks it holds, P2 searches the next two blocks, and Pn searches the last blocks, all at the same time. A block that contains record signature frames is called a frame block.

Frame search follows the search order of the query signature frames. At P1, '1100' contains more 1s than '0010', so searching it first finds non-matching record signature frames sooner. In FIG. 2 no record signature frame matches '1100', so the next query frame, '0010', need not be searched.

The result is passed to P2, and P1 receives the result produced at Pn and searches its lowest block. P2 goes through the same process and finds that both signatures match. The query signature frame '0000' matches every frame, so no frame search is performed for it; this reduces block accesses and speeds up the search. The matched result is passed to P3, and P2 receives the result from P1 and searches its top-most block.
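The two shortcuts described here (check the query frame with the most 1s first, and skip an all-zero query frame) can be sketched in Python. The function names and the sample rows are hypothetical; only the frames '0010' and '1100' come from the text.

```python
def search_order(query_frames: list[str]) -> list[int]:
    """Indices of the query frames to check, densest first; all-zero frames are skipped."""
    candidates = [i for i, frame in enumerate(query_frames) if "1" in frame]
    return sorted(candidates, key=lambda i: query_frames[i].count("1"), reverse=True)


def frame_matches(record_frame: str, query_frame: str) -> bool:
    """A record frame matches when it has a 1 wherever the query frame has a 1."""
    return all(r == "1" for r, q in zip(record_frame, query_frame) if q == "1")


def surviving_rows(rows: dict[str, list[str]], query_frames: list[str]) -> list[str]:
    """Return the ids of rows whose frames match every non-zero query frame."""
    order = search_order(query_frames)
    # all() stops at the first failing frame, so dense frames reject rows early.
    return [
        row_id
        for row_id, frames in rows.items()
        if all(frame_matches(frames[i], query_frames[i]) for i in order)
    ]


print(search_order(["0010", "1100"]))  # [1, 0]
rows = {"r1": ["0010", "1000"], "r2": ["0110", "1101"]}
print(surviving_rows(rows, ["0010", "1100"]))  # ['r2']
```

A frame with more 1s imposes more constraints, so it is the most likely to reject a record frame and end the comparison early.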

The process ends when each processing node returns to the position where it first started, and the matched results are handed to the main processing node, which merges them.
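The hand-off pattern, in which every node works on a different frame group at the same time and then passes its result to its neighbor, can be sketched as a rotation schedule. This sketch assumes one frame group per node and zero-based numbering (node 0 stands for P1); `rotation_schedule` is a hypothetical name.

```python
def rotation_schedule(node_count: int) -> list[list[int]]:
    """schedule[r][i] is the frame group that node i works on in round r."""
    # In round 0 node i starts on its own group i; in each later round it
    # takes over the group its predecessor (node i - 1) has just finished.
    return [
        [(i - r) % node_count for i in range(node_count)]
        for r in range(node_count)
    ]


for round_number, groups in enumerate(rotation_schedule(4)):
    print(round_number, groups)
# 0 [0, 1, 2, 3]
# 1 [3, 0, 1, 2]
# 2 [2, 3, 0, 1]
# 3 [1, 2, 3, 0]
```

After as many rounds as there are nodes, every group has visited every node exactly once, which corresponds to each node returning to the position where it started.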

FIG. 3 is a flowchart of one embodiment of the frame-partitioned parallel processing method for a database retrieval system using a signature file according to the present invention.

As shown in FIG. 3, a query signature is generated from the user query (301), and the equivalent key set (EK) is computed from the query signature (302).

Next, the equivalent keys are divided so that the same number is delivered to each processing node (303). These steps are performed at the main processing node.

Each processing node that receives its share of equivalent keys initializes the search state of the frame group for each equivalent key (304). Here, a frame group is the set of frame blocks indicated by one equivalent key, and the search state holds a processing node number so that it is known at which processing node the search of a frame group is currently running.

Next, the match state of the frames is initialized to '1' (305). The match state indicates, for each signature in the frame group, whether it matches the query signature. The match state therefore needs as many bits as there are record signatures.

The search state flag of the frame group is also initialized to 'False' (306). The flag exists to indicate whether the frame block contains a matching record signature frame. If no frame matches, the search state is changed to '-1', which speeds up the search because the frame group containing that frame block no longer needs to be searched.

Next, it is checked whether any equivalent key remains that has not yet been searched (307).

If no such equivalent key remains, the finally matched signatures are output (308). If one remains, the node iterates over the signatures in the frame block (309) and determines for each whether the query frame matches the record frame (310). The iteration in step (309) covers all of steps (310), (311), and (312).

If the query frame matches the record frame, the search state flag is changed to 'True' (311); if it does not, the match state of that frame is changed to '0' (312). The match state is changed to '0' because a signature is divided into several frames: each signature is assumed to match at first, and if any one part fails to match, the whole signature does not match.

Next, it is checked whether the state flag of the searched frame block is 'True' (313).

If the state flag of the searched frame block is 'True', the search state and the match state are passed to the next processing node (314); if it is not 'True', the search state is set to '-1' and only the search state is passed on (315).

In other words, steps (313) to (315) check the search state for the frame block and hand the match result (match state and search state) to the next processing node. If the search state flag is 'True', a matching frame exists, so both the search state and the match state are handed on. If the flag is 'False', no signature matches and the subsequent blocks need not be searched, so only the search state, with the value '-1', is handed on.

The procedure then repeats from the check for an unsearched equivalent key (307), and once all equivalent keys have been searched, the finally matching signatures are output.
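Steps (304) to (315) can be followed for a single frame group in the sequential Python sketch below. It is a simplification under stated assumptions: each node holds exactly one frame per record signature, frames are integers, nodes are numbered from 0, and the hand-off between nodes is modeled as a loop rather than real message passing. The names `covers` and `search_frame_group` and all sample data are hypothetical.

```python
def covers(record_frame: int, query_frame: int) -> bool:
    """True when the record frame has a 1 wherever the query frame has a 1."""
    return (record_frame & query_frame) == query_frame


def search_frame_group(node_frames, query_frames, start_node=0):
    """Return (search_state, matched signature indices) for one frame group.

    node_frames[n][s] is the frame of record signature s stored at node n;
    query_frames[n] is the query frame that node n compares against.
    """
    node_count = len(node_frames)
    signature_count = len(node_frames[0])
    match_state = [1] * signature_count              # (305) assume every signature matches
    for hop in range(node_count):
        search_state = (start_node + hop) % node_count   # (304) node now searching
        found = False                                # (306) search state flag
        for s in range(signature_count):             # (309) loop over the signatures
            if match_state[s] == 0:
                continue                             # already rejected by an earlier node
            if covers(node_frames[search_state][s], query_frames[search_state]):  # (310)
                found = True                         # (311)
            else:
                match_state[s] = 0                   # (312)
        if not found:                                # (313)
            return -1, []                            # (315) nothing left to search
        # (314) both states travel on to the next node in the loop
    return search_state, [s for s, bit in enumerate(match_state) if bit]


query_frames = [0b1100, 0b0000, 0b0010]
node_frames = [
    [0b1110, 0b0100, 0b1100],   # frames held by node 0
    [0b0001, 0b1111, 0b0000],   # frames held by node 1
    [0b0011, 0b0010, 0b0100],   # frames held by node 2
]
print(search_frame_group(node_frames, query_frames))            # (2, [0])
print(search_frame_group(node_frames, [0b1111, 0, 0b0010]))     # (-1, [])
```

In the first call only signature 0 survives all three nodes; in the second, no frame at node 0 matches, so the search state becomes -1 and the remaining nodes are never visited.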

The present invention described above is not limited by the foregoing embodiment and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the invention.

As described above, by partitioning signatures into frames and storing them for parallel processing, the present invention achieves the balanced execution that conventional methods could not.

In addition, equivalent keys arise from the matching property of signatures, and by distributing these equivalent keys evenly over the parallel processing nodes, the invention minimizes skew in both data and execution and makes effective use of the processing nodes to perform fast retrieval.

Furthermore, because partitioning signatures into frames does not disturb the structure of extendible hashing, the invention operates efficiently even in a dynamic environment in which data is frequently inserted and deleted.

Claims

A frame-partitioned parallel processing method applied to a database retrieval system using a signature file, the method comprising:

a first step of partitioning signatures using extendible hashing and frame partitioning and delivering them to the processing nodes to build a storage structure in parallel;

a second step of, when a search query is given against the constructed storage structure, generating a query signature from the query, comparing the query signature with the storage structure to generate equivalent keys, and delivering an equal number of equivalent keys to each processing node;

a third step in which each processing node that receives equivalent keys finds the frames indicated by those keys, determines whether they match the query signature, and passes the determination result to a neighboring processing node; and

a fourth step in which the processing node that receives the determination result uses the match result to search and evaluate the matching frames, passes its own result to the next processing node, and ends the search when the last frame is reached.

The method of claim 1,

wherein the third step comprises:

a fifth step of initializing, at each processing node, the search state of the frame group for each equivalent key, and initializing the match state of the frames and the search state flag of the frame group;

a sixth step of determining, according to the result of checking for an equivalent key, whether the query frame matches the record frame, repeating for the number of signatures in the frame block; and

a seventh step of passing, according to the determination result, either both the search state and the match state or only the search state to the neighboring processing node.

The method of claim 2,

wherein the frame group

is the set of frame blocks indicated by one equivalent key.

The method of claim 2 or 3,

wherein the match state

indicates, for each signature in the frame group, whether it matches the query signature.

A computer-readable recording medium storing a program for causing a database retrieval system having a processor to implement:

a first function of partitioning signatures using extendible hashing and frame partitioning and delivering them to the processing nodes to build a storage structure in parallel;

a second function of, when a search query is given against the constructed storage structure, generating a query signature from the query, comparing the query signature with the storage structure to generate equivalent keys, and delivering an equal number of equivalent keys to each processing node;

a third function in which each processing node that receives equivalent keys finds the frames indicated by those keys, determines whether they match the query signature, and passes the determination result to a neighboring processing node; and

a fourth function in which the processing node that receives the determination result uses the match result to search and evaluate the matching frames, passes its own result to the next processing node, and ends the search when the last frame is reached.