KR20050005261A

KR20050005261A - Information search management system and method thereof

Info

Publication number: KR20050005261A
Application number: KR1020030044284A
Authority: KR
Inventors: 최윤수; 류범종; 강무영; 서정현; 최성필; 안성수; 진두석; 주원균; 이민호; 김광영; 김진숙; 김현
Original assignee: 한국과학기술정보연구원
Priority date: 2003-07-01
Filing date: 2003-07-01
Publication date: 2005-01-13
Also published as: KR100493399B1

Abstract

PURPOSE: A system and a method for searching and managing database information are provided to construct a low-cost, high-efficiency information system by adding a stable management function to the search function for a database. CONSTITUTION: A list database volume (70) stores a list database. A document database volume (80) stores one or more document databases in distributed form. A dictionary database volume (50) stores one or more dictionary databases in distributed form. A kernel (10) performs data I/O (input/output) between the list and document database volumes and user memory, and manages files, directories, records, and inverted files in those volumes. A storage engine (20) uses the kernel to manage catalogs that store metadata for the databases and to manage documents and indexes. A search engine (30) performs the search for a user's query. An indexer (40) uses the dictionary database to extract index terms from a document input by the user. A data manager (60) creates a database from a schema file written by an administrator, bulk-loads a bundle of raw documents, and performs indexing.

Description

Information search management system and method thereof {INFORMATION SEARCH MANAGEMENT SYSTEM AND METHOD THEREOF}

The present invention relates to an information retrieval management system and method for searching and managing database (DB) information. In particular, it relates to an information retrieval management system and method that add a stable management function to the database search function, making it possible to build a low-cost, high-efficiency information system.

The Internet, once regarded as the exclusive domain of a particular group, became widely used with the advent of the Web and turned into a place where the general public shares information. As users sought the information they need within an exponentially growing flood of information, demand for information retrieval systems increased greatly.

As information retrieval systems increasingly use a database management system (DBMS) as the place where data is kept, systems that support search by combining an inverted-file information retrieval system with the structured data of a DBMS have become mainstream. That is, the DBMS is in charge of managing the data, while the information retrieval system performs indexing and responds to user searches.

However, using a DBMS and an information retrieval system together for an information service increases the cost of building the service, slows development, and makes the overall system heavyweight.

In addition, as the number of Internet users grows exponentially, the amount of information on the Internet is also growing rapidly and information is updated frequently. To serve users who want to find the information they need in this flood of information, an information retrieval system must support large-scale indexing and frequent data updates.

Conventional information retrieval systems have difficulty processing (indexing) large volumes of data, take a long time to respond when there are many users, and consume a large amount of system resources. Because they load data in bulk, collecting updates and processing them all at once, they are also difficult to apply in fields that require frequent data updates. Conventional database systems, which are widely used in many fields, can handle frequent data updates but take a long time to process large quantities of unstructured documents.

Accordingly, the present invention has been made to solve the above problems. An object of the present invention is to provide an information retrieval management system and method that add a stable management function to the database search function, making it possible to build a low-cost, high-efficiency information system.

Another object of the present invention is to provide an information retrieval management system and method that support Unicode, so that archaic Korean, Chinese characters, and multiple languages can be handled at the storage engine level.

Still another object of the present invention is to provide an information retrieval management system and method that support compression of the document and index databases, reducing the space taken by the index database, which is about three times the size of the documents, so that storage space is used efficiently.

Yet another object of the present invention is to provide an information retrieval management system and method that combine the advantages of existing information retrieval systems and database systems to enable storage and fast search of large volumes of data, handling of many concurrent users, and frequent data updates.

FIG. 1 is a block diagram of the overall configuration of the information retrieval management system according to the present invention.

FIG. 2 is a conceptual diagram of the database loading process of the information retrieval management system according to the present invention.

FIG. 3 is a conceptual diagram of the communication among the processes that make up the information retrieval management system according to the present invention.

FIGS. 4A to 4F are conceptual diagrams of the client-side search service of the information retrieval management system according to the present invention:

FIG. 4A is a conceptual diagram showing how a client obtains information about a database;

FIG. 4B is a conceptual diagram showing how a client obtains the section list of a database;

FIG. 4C is a conceptual diagram showing how a client requests a search;

FIG. 4D is a conceptual diagram showing how a client requests a similar-document search;

FIG. 4E is a conceptual diagram showing how a client requests a search result list; and

FIG. 4F is a conceptual diagram showing how a client requests the original text of a document.

FIGS. 5A and 5B are conceptual diagrams of the online document management service of the information retrieval management system according to the present invention:

FIG. 5A is a conceptual diagram showing how a single document is inserted or changed; and

FIG. 5B is a conceptual diagram showing how a single document is deleted.

FIG. 6 is a diagram showing the title section of a document indexed according to two indexing methods.

<Description of reference numerals for the main parts of the drawings>

10: kernel
11: file and directory manager
12: record manager
13: inverted file manager
14: transaction manager
15: input/output manager
20: storage engine
21: catalog manager
22: document manager
23: index manager
30: search engine
40: indexer
50: dictionary database
60: data manager
70: list database volume
80: document database volume
101: database schema file
102: raw document file
103: stopword list
110: loader
210: job scheduler
220/1 to 220/n, 330/1 to 330/n: first to nth fires
230: set manager
232: set
240: data manager
250: list database
260: first to nth databases
310: client
320: job scheduler
340: set manager
350: data manager

To achieve the above object, an information retrieval management system according to the present invention comprises:

a list database volume that stores a list database;

a document database volume that stores one or more document databases in distributed form;

a dictionary database volume that stores one or more dictionary databases in distributed form;

a kernel that performs data input/output between the list and document database volumes and user memory, and manages the files and directories, records, and inverted files in the list and document database volumes;

a storage engine that uses the kernel to manage the catalog, which holds meta-information about the databases, and to manage documents and indexes;

a search engine that performs the search for a user's query;

an indexer that uses the dictionary database to extract index terms from a document input by the user; and

a data manager that receives a schema file written by an administrator, creates a database, bulk-loads a bundle of raw documents, and performs indexing.

The kernel further manages the pages and buffers used to access the list and document database volumes, and performs logging and locking for that purpose.

The kernel comprises: a file and directory manager that maps the physical identifier of the disk location where a record is stored to a logical identifier to make access easier, and issues logical identifiers for multiple volumes; a record manager that internally manages both objects that fit in one page and long data items that exceed one page, and supports sequential access to the records in a file from the beginning as well as update operations for inserting, modifying, and deleting records; an inverted file manager that compresses keys to save storage space and provides insertion, modification, deletion, and search; a transaction manager that provides transaction begin, end, and abort, a save point function, and recording of transaction information in a log file; and an input/output manager that manages data input/output between the list and document database volumes and user memory.

The transaction manager uses checkpoints to reduce recovery time and, if a designated unit of work fails partway, rolls all of its operations back to the start of the transaction so that data integrity is maintained.

The input/output manager comprises: a page manager that allocates, deletes, and maintains pages in the list and document database volumes; a buffer manager that maps disk pages to memory pages; and a lock manager that is used when different requests access the same object.

The storage engine comprises: a catalog manager that manages information about the structure of the database to be built; a document manager that converts an original document into the internal document structure and inserts it, and performs deletion or modification of existing documents; and an index manager that organizes the index information extracted by the indexer into a structure suited to search.

The information about the structure of the database includes the document structure, the indexing method, primary key information, whether compression is used, and stopwords.

To achieve the above object, another information retrieval management system according to the present invention comprises:

a database schema file that describes the directories and volumes of the list and document databases, the database groups, the section definitions, the indexing method of each section, the structure of the raw documents, and how raw documents are loaded into the database;

a raw document file made up of the raw documents to be loaded into the document database;

a stopword list to be loaded into the list database; and

a loader that creates the document database and loads documents according to the information received from the database schema file, and receives the commands for creating the document database and loading documents from an administrator.

Because the structure of the raw documents is described in the database schema file, documents with heterogeneous structures can be loaded into a single database.

To achieve the above object, still another information retrieval management system according to the present invention comprises:

a job scheduler that distributes connection requests received from clients according to the state of the first to nth fires, that, for online document management, requests document management from the data manager and sends the result to the client, and that, when a database changes, notifies the data manager, the first to nth fires, and the set manager of the change;

first to nth fires that perform the work for a service request received from the job scheduler, send the result to the client, and ask the set manager to store search results;

a set manager that performs the work for a service request received from the first to nth fires and sends the result to the first to nth fires; and

a data manager that performs the work for a service request received from the job scheduler and sends the result to the job scheduler.
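The work distribution performed by the job scheduler can be illustrated with a minimal Python sketch. The source only says that requests are distributed "according to the state" of the fires; the least-busy policy, the `Fire` and `JobScheduler` classes, and the `active_jobs` counter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fire:
    """Hypothetical stand-in for one fire (search worker) process."""
    name: str
    active_jobs: int = 0


class JobScheduler:
    """Hands each client request to the least busy fire (assumed policy)."""

    def __init__(self, fires):
        self.fires = list(fires)

    def dispatch(self, request):
        # "State" is reduced here to the number of jobs a fire is running.
        fire = min(self.fires, key=lambda f: f.active_jobs)
        fire.active_jobs += 1
        return fire.name, request

    def finish(self, fire_name):
        # Called when a fire reports that it has completed a job.
        for fire in self.fires:
            if fire.name == fire_name:
                fire.active_jobs -= 1
```

With two idle fires, successive requests alternate between them until one of them finishes a job and becomes the least busy again.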

To achieve the above object, an information retrieval management method according to the present invention,

for an information retrieval management system that includes a job scheduler, first to nth fires, a set manager, and a data manager, comprises:

a first step in which a client requests database information from the first to nth fires through the job scheduler and receives it;

a second step in which the client requests the section list of a database from the first to nth fires through the job scheduler and receives it;

a third step in which the client requests a search from the first to nth fires through the job scheduler and receives the search result;

a fourth step in which the client requests a similar-document search from the first to nth fires through the job scheduler and receives the search result;

a fifth step in which the client requests a search result list from the first to nth fires through the job scheduler and receives it; and

a sixth step in which the client requests the original text of a document from the first to nth fires through the job scheduler and receives it.

In the first step, the first to nth fires look up the database information in response to the client's request received through the job scheduler and send it to the client.

In the second step, the first to nth fires look up the section list of the database in response to the client's request received through the job scheduler and send it to the client.

In the third step, the client sends search information, including the query, the list of databases to be searched, and the sections, to the first to nth fires through the job scheduler.

The third step comprises: a step in which the first to nth fires perform a search using the search information received from the client and send the search result to the set manager; a step in which the set manager stores the search result received from the first to nth fires and then sends a result set number and the number of documents to the first to nth fires; and a step in which the first to nth fires send the result set number and the number of documents received from the set manager to the client.

In the fourth step, the client sends similar-document search information, including the document number to search by, the list of databases to be searched, the sections, and the search method, to the first to nth fires through the job scheduler.

The fourth step comprises: a step in which the first to nth fires perform a search using the similar-document search information received from the client and send the search result to the set manager; a step in which the set manager stores the search result received from the first to nth fires and then sends a result set number and the number of documents to the first to nth fires; and a step in which the first to nth fires send the result set number and the number of documents received from the set manager to the client.

In the fifth step, the client sends search result list information, including the result set number received at search time, the sections to display, and the number of search result entries to receive, to the first to nth fires through the job scheduler.

The fifth step comprises: a step in which the first to nth fires send the search result list information received from the client to the set manager; a step in which the set manager sends to the first to nth fires as many documents as the number of entries specified in the search result list information; and a step in which the first to nth fires use the documents received from the set manager to look up the section information requested by the client and send it to the client together with the documents.

In the sixth step, the client sends a document number to the first to nth fires through the job scheduler.

In the sixth step, the first to nth fires send the content of the document corresponding to the document number received from the client to the client.
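The interaction between a fire and the set manager in the third and fifth steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `SetManager` class, the `fire_search` helper, the dictionary used as an index, and the `offset` parameter are hypothetical and are not taken from the source.

```python
class SetManager:
    """Keeps search results so that a client can page through them later."""

    def __init__(self):
        self._sets = {}
        self._next_id = 1

    def store(self, doc_ids):
        # Third step: save the result, return (result set number, document count).
        stored = list(doc_ids)
        set_id = self._next_id
        self._next_id += 1
        self._sets[set_id] = stored
        return set_id, len(stored)

    def fetch(self, set_id, count, offset=0):
        # Fifth step: hand back as many entries as the client asked for.
        return self._sets[set_id][offset:offset + count]


def fire_search(index, query_term, set_manager):
    """Fire side of the third step: search, then store the result set."""
    doc_ids = sorted(index.get(query_term, ()))
    return set_manager.store(doc_ids)
```

Returning only a set number and a count keeps the search response small; the client then asks for the entries it actually wants to display by passing the set number back.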

To achieve the above object, another information retrieval management method according to the present invention includes a method of inserting or changing a single document, comprising:

a step in which a client sends a document, together with information on the database in which it is to be inserted or changed, to the job scheduler;

a step in which the job scheduler sends the information sent by the client to the data manager;

a step in which the data manager stores or changes the document and then sends the result to the job scheduler;

a step in which the job scheduler, if the insertion or change succeeded, instructs the set manager to update the result sets;

a step in which the set manager sends the result of executing the instruction to the job scheduler;

a step in which the job scheduler sends the final result to the first to nth fires; and

a step in which the first to nth fires reopen the database and send the result to the client.

The information retrieval management method further includes a method of deleting a single document, comprising: a step in which the client sends the ID of the document to be deleted to the job scheduler; a step in which the job scheduler sends the ID of the document to be deleted, received from the client, to the data manager; a step in which the data manager deletes the document and then sends the result to the job scheduler; a step in which the job scheduler, if the deletion succeeded, instructs the set manager to update the result sets; a step in which the set manager sends the result of executing the instruction to the job scheduler; a step in which the job scheduler sends the final result to the first to nth fires; and a step in which the first to nth fires reopen the database and send the result to the client.
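The message flow for online insertion, change, and deletion described above can be traced with a small Python sketch. All class and method names (`upsert`, `refresh`, `reopen_database`, and so on) are hypothetical, the processes are modelled as plain in-memory objects rather than separate processes, and the final reply to the client is reduced to a boolean return value.

```python
class DataManager:
    def __init__(self):
        self.documents = {}

    def upsert(self, doc_id, text):
        self.documents[doc_id] = text
        return True

    def delete(self, doc_id):
        return self.documents.pop(doc_id, None) is not None


class SetManager:
    def __init__(self):
        self.generation = 0

    def refresh(self):
        # Stands in for "update the result sets" after a database change.
        self.generation += 1
        return True


class Fire:
    def __init__(self):
        self.reopen_count = 0

    def reopen_database(self):
        self.reopen_count += 1


class JobScheduler:
    def __init__(self, data_manager, set_manager, fires):
        self.data_manager = data_manager
        self.set_manager = set_manager
        self.fires = fires

    def _after_change(self, ok):
        if not ok:
            return False              # nothing changed, nothing to propagate
        ok = self.set_manager.refresh()
        for fire in self.fires:       # every fire reopens the database
            fire.reopen_database()
        return ok

    def insert_or_update(self, doc_id, text):
        return self._after_change(self.data_manager.upsert(doc_id, text))

    def delete(self, doc_id):
        return self._after_change(self.data_manager.delete(doc_id))
```

In this sketch a failed change stops at the job scheduler, so the set manager and the fires are only touched when the database has really changed.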

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of the overall configuration of the information retrieval management system according to the present invention.

As shown in FIG. 1, the information retrieval management system broadly comprises a kernel (10), a storage engine (20), a search engine (30), an indexer (40), a dictionary database (DB) (50), a data manager (60), a list database volume (70), and a document database volume (80).

The kernel (10) is the lowest layer of the information retrieval management system and is responsible for data input/output between the physical disk and user memory. It broadly comprises a file and directory manager (11), a record manager (12), an inverted file manager (13), a transaction manager (14), and an input/output manager (15).

Here, the file and directory manager (11) handles files, the unit used to organize logically identical sets of records or inverted files in physical disk space; a file is allocated in units of extents, each of which is a bundle of pages. The manager maps the physical identifier of the disk location where a record is stored to a logical identifier to make access easier, and can issue logical identifiers that are unique across multiple volumes.

The record manager (12) internally manages both objects that fit in one page and long data items that exceed one page, and supports sequential access to the records in a file from the beginning as well as update operations for inserting, modifying, and deleting records.

The inverted file manager (13) is implemented with a B-tree. The key part can be an integer, a real number, or a variable-length string, and the element part stores a user-defined structure. Keys can be compressed to save storage space, and functions such as insertion, modification, deletion, and search are provided.
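The key/element organization of the inverted file can be sketched in Python as follows. Two things here are assumptions rather than the patent's design: a sorted key list maintained with `bisect` stands in for the B-tree, and front coding (storing only the part of each key that differs from the previous key) is shown as one possible key compression scheme, since the source does not say which scheme is used. The element layout (`doc`, `tf`) is likewise hypothetical.

```python
import bisect


class InvertedFile:
    """Sorted key list standing in for the B-tree (an assumption of this sketch)."""

    def __init__(self):
        self._keys = []       # kept sorted, like the keys of a B-tree
        self._elements = {}   # key -> list of user-defined elements

    def insert(self, key, element):
        if key not in self._elements:
            bisect.insort(self._keys, key)
            self._elements[key] = []
        self._elements[key].append(element)

    def delete(self, key, element):
        elements = self._elements.get(key)
        if elements is None or element not in elements:
            return False
        elements.remove(element)
        if not elements:      # drop the key once its last element is gone
            del self._elements[key]
            self._keys.remove(key)
        return True

    def search(self, key):
        return list(self._elements.get(key, []))

    def keys(self):
        return list(self._keys)


def front_code(sorted_keys):
    """Front coding: (length of prefix shared with previous key, remaining suffix)."""
    encoded, previous = [], ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(key), len(previous))
        while shared < limit and key[shared] == previous[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded
```

Front coding works well on sorted string keys because neighbouring index terms tend to share long prefixes.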

The transaction manager (14) provides a set of functions for reliably carrying out a unit of work designated by the user. That is, it provides transaction begin, end, and abort, a save point function, and recording of transaction information in a log file, and it uses checkpoints to reduce recovery time. If a designated unit of work fails partway, all of its operations are rolled back to the start of the transaction so that data integrity is maintained.
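The begin, save point, and abort behaviour described above can be illustrated with a minimal undo-log sketch in Python. It operates on an in-memory dictionary, keeps the "log file" as a list, and omits checkpoints and crash recovery entirely; the class and method names are hypothetical.

```python
_MISSING = object()  # marks "key did not exist before this write"


class TransactionManager:
    """Undo-log transactions over an in-memory dict (illustrative only)."""

    def __init__(self, store):
        self.store = store
        self.log = []        # stands in for the log file
        self._undo = None

    def begin(self):
        self._undo = []
        self.log.append("BEGIN")

    def write(self, key, value):
        # Remember the old value so the write can be undone later.
        self._undo.append((key, self.store.get(key, _MISSING)))
        self.store[key] = value
        self.log.append(f"WRITE {key}")

    def savepoint(self):
        return len(self._undo)

    def rollback_to(self, savepoint=0):
        # Undo writes in reverse order until the save point is reached.
        while len(self._undo) > savepoint:
            key, old = self._undo.pop()
            if old is _MISSING:
                self.store.pop(key, None)
            else:
                self.store[key] = old
        self.log.append(f"ROLLBACK {savepoint}")

    def commit(self):
        self._undo = None
        self.log.append("COMMIT")

    def abort(self):
        # A failed unit of work returns to the start of the transaction.
        self.rollback_to(0)
        self._undo = None
        self.log.append("ABORT")
```

Rolling back to a save point undoes only the later writes, while `abort` restores the state at the start of the transaction.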

상기 입출력 관리기(15)는 물리적인 디스크에 페이지를 할당, 삭제, 유지보수를 수행하는 페이지관리기와, 디스크의 페이지를 메모리의 페이지로 매핑을 담당하는 버퍼관리기와, 서로 다른 요구 사항들이 동일한 객체를 접근할 때 사용하는 잠금관리기 등으로 구성되며, 저장시스템의 버퍼는 공유메모리로 구현되어, 동시에 여러개의 프로세스가 접근할 수 있도록 한다.The input / output manager 15 includes a page manager for allocating, deleting, and maintaining a page on a physical disk, a buffer manager for mapping a page of a disk to a page of a memory, and an object having different requirements. It consists of a lock manager used for access, and the buffer of the storage system is implemented as shared memory so that multiple processes can access it at the same time.

다음, 상기 저장엔진(20)은 상기 커널(10)을 하부구조로 이용하여 설계되었으며, 카탈로그 관리기(21), 문서 관리기(22), 색인 관리기(23)를 포함하여 구성한다.Next, the storage engine 20 is designed using the kernel 10 as a substructure, and includes a catalog manager 21, a document manager 22, and an index manager 23.

여기서, 상기 카탈로그 관리기(21)는 문서 구조, 색인 방법, 기본키 정보, 압축여부, 불용어 등 구축하고자 하는 데이터베이스의 구조에 대한 정보를 관리한다. 그리고, 상기 문서 관리기(22)는 원본 문서를 내부 문서구조로 변환하여 삽입, 기존 문서의 삭제 또는 수정에 대한 연산을 수행하고, 상기 색인 관리기(23)는 상기 색인기(40)를 이용하여 추출된 색인 정보를 검색에 적합한 구조로 구성한다.Here, the catalog manager 21 manages information about a structure of a database to be constructed, such as a document structure, an indexing method, basic key information, compression, and stopwords. The document manager 22 converts the original document into an internal document structure to perform operations for insertion, deletion or modification of an existing document, and the index manager 23 is extracted using the indexer 40. Organize index information into a structure suitable for searching.

상기 저장엔진(20)은 레코드(객체)의 크기에 제한이 없으며, 안정적인 온라인 삽입, 삭제를 위해 트랜잭션 처리를 통한 회복기능을 제공한다. 또한 XML, SGML과 같은 구조문서 및 멀티미디어 데이터등 다양한 타입의 객체 저장 및 검색을 지원한다.The storage engine 20 has no limitation on the size of a record (object), and provides a recovery function through transaction processing for stable online insertion and deletion. It also supports various types of object storage and retrieval such as structured documents such as XML and SGML and multimedia data.

Next, the search engine 30 supports two broad models: a Boolean model and an extended vector model. The Boolean model extends basic Boolean retrieval and supports Boolean operators (and, or, not), proximity operators (near, within), relational operators (=, <=, <, >=, >), a range operator (range), and a truncation operator. The extended vector model extends the pure vector model, which produces search results by computing similarity values between the user query and documents. It is based on vector operations but also allows all Boolean-model operators except the proximity operators.

Next, the indexer 40 provides a variety of index types so that the user can select a suitable one for each section. Because it is Unicode-based, it can readily process documents containing archaic Korean, Hanja (Chinese characters), and multiple languages. For English it offers a stemming option that indexes the root of each word, and for Hanja it offers an option to convert to Hangul before indexing. Korean text is processed using a substantive-morpheme dictionary, an ending dictionary, a particle dictionary, a compound auxiliary-term dictionary, a formal-morpheme dictionary for stopword handling, and the like.

Next, the data manager 60 uses a schema written by the administrator to load large volumes of bulk data into the database and index it. The input data may be plain text files or files compressed with gzip, and the data can also be compressed when stored to save space. To support Unicode, documents in other encodings are internally converted to UTF-8 before storage. During loading, the primary key is checked to prevent the same document from being loaded twice, and data is laid out in contiguous disk space so that loading and indexing of large document collections run quickly and later access is easy.
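The loading behavior described above (plain or gzip input, conversion to UTF-8, compressed storage, and duplicate rejection by primary key) can be illustrated with a minimal Python sketch. The function names `read_raw_document` and `bulk_load`, the dictionary used as a store, and the `encoding` parameter are assumptions made for illustration; the specification does not define the data manager's actual interface.

```python
import gzip
from pathlib import Path


def read_raw_document(path, encoding="utf-8"):
    """Read a raw file (plain or gzip-compressed) and return its text.

    `encoding` is the source encoding of the file (hypothetical parameter);
    the returned str is re-encoded as UTF-8 when it is stored.
    """
    data = Path(path).read_bytes()
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    return data.decode(encoding)


def bulk_load(records, store):
    """Load (primary_key, text) pairs into `store`, skipping duplicate keys.

    Returns the list of primary keys that were rejected as duplicates.
    """
    skipped = []
    for key, text in records:
        if key in store:
            skipped.append(key)  # same primary key: do not load again
            continue
        # Store as UTF-8, compressed for space efficiency.
        store[key] = gzip.compress(text.encode("utf-8"))
    return skipped
```

A real loader would also place data contiguously on disk and build indexes; this sketch covers only the input handling and the primary-key check.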

Next, the catalog database volume 70 stores the catalog database, and the document database volume 80 stores one or more document databases in distributed form. The document database holds the actual documents to be built into the database together with their indexes, while the catalog database stores and maintains information about each database, such as its document structure, indexing methods, primary key, whether compression is used, and stopwords.

As shown in FIG. 1, the document database is distributed across one or more storage devices (document volumes). This is part of the approach to handling large databases: multi-threaded distributed search is used to improve search performance on large databases.
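As a rough illustration of multi-threaded distributed search over several document volumes, the following Python sketch searches each volume in its own thread and merges the hits. The volume layout (a dict with `id` and `docs`) and the substring match are simplifying assumptions, not the system's actual storage format or matching logic.

```python
from concurrent.futures import ThreadPoolExecutor


def search_volume(volume, term):
    """Return (volume_id, record_no) pairs for records containing `term`."""
    return [(volume["id"], rec_no)
            for rec_no, text in volume["docs"].items() if term in text]


def distributed_search(volumes, term):
    """Search every volume in a separate thread and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(volumes))) as pool:
        per_volume = list(pool.map(search_volume, volumes,
                                   [term] * len(volumes)))
    return sorted(hit for hits in per_volume for hit in hits)
```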

To identify records across multiple databases, the information retrieval management system according to the present invention assigns a logical record identifier (RID) to each record. The RID is assigned automatically by the system so that it is unique across the entire database group, and it consists of a volume identifier and a record number.
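One way to picture an RID built from a volume identifier and a record number is the sketch below. The 32-bit width of the record-number field and the packing into a single integer are purely hypothetical; the specification states only that the RID consists of these two parts.

```python
from typing import NamedTuple

RECORD_BITS = 32  # hypothetical width of the record-number field


class RID(NamedTuple):
    volume_id: int
    record_no: int

    def pack(self) -> int:
        """Combine both parts into one integer unique across volumes."""
        return (self.volume_id << RECORD_BITS) | self.record_no

    @classmethod
    def unpack(cls, value: int) -> "RID":
        """Split a packed integer back into (volume_id, record_no)."""
        return cls(value >> RECORD_BITS, value & ((1 << RECORD_BITS) - 1))
```

Because the volume identifier occupies the high bits, two records with the same record number in different volumes always receive different packed values.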

§ Document structure

The information retrieval management system according to the present invention mainly supports searching unstructured text documents such as monographs, research reports, papers, and newspaper articles. In this system a database is a collection of text document objects, and one document consists of sections such as 'title', 'author name', and 'abstract'. This is analogous to the hierarchy in a relational database, where a table (or relation) consists of records and each record consists of several fields (or attributes). The sections that make up a document in this system fall into the following three categories.

● Basic Section

A basic section is the smallest component of a document. The user can use basic sections as the basic unit for specifying the scope of a document search. For example, 'title', 'author name', 'author affiliation', 'abstract', and 'body' can all be basic sections of a paper, and a user can restrict a query to the two basic sections 'paper title' and 'abstract'. The basic section is also the unit of document indexing: for fast retrieval, the system builds an index file for each basic section. The content of a basic section may be either a character string or a number.

● Virtual Section

To support multiple index types for a basic section, the system allows a new section that physically carries a single meaning to be defined. This kind of section is called a virtual section. A virtual section can also be combined with basic sections to form a union section. For example, for the basic section 'title', virtual sections 'title 2' and 'title 3' can be defined and given different index types, such as partial match (INDEX_BY_MA) and exact match (INDEX_AS_IS).

● Union Section

In the information retrieval management system according to the present invention, several basic sections and virtual sections can be combined into a new section that logically carries a single meaning. Such a section is called a union section. It is a virtual construct that the user perceives as one section but for which no index of its own is created. For example, basic or virtual sections 'main title', 'subtitle', and 'subheading' can be combined into a union section called 'title'.
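The relationship among the three section kinds can be sketched as a small lookup table in Python. The section names, the dictionary layout, and the helper `indexed_sections` are illustrative assumptions; only the index-type identifiers come from the specification. The point shown is that a union section owns no index and is resolved to the basic and virtual sections it combines.

```python
# Hypothetical section catalog for one database.
SECTIONS = {
    "main_title": {"kind": "basic", "index": "INDEX_BY_MA"},
    "subtitle": {"kind": "basic", "index": "INDEX_BY_MA"},
    # Virtual section: a second index type over the same basic section.
    "main_title_exact": {"kind": "virtual", "base": "main_title",
                         "index": "INDEX_AS_IS"},
    # Union section: no index of its own, only a list of members.
    "title": {"kind": "union",
              "members": ["main_title", "subtitle", "main_title_exact"]},
}


def indexed_sections(name, sections=SECTIONS):
    """Expand a section name into the sections that actually own an index."""
    spec = sections[name]
    if spec["kind"] != "union":
        return [name]
    expanded = []
    for member in spec["members"]:
        expanded.extend(indexed_sections(member, sections))
    return expanded
```

A query against `title` would then be run against the indexes of the three member sections and the results merged.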

§ Index types

Indexing is generally known not only to speed up a system's searches but also to strongly affect retrieval effectiveness. Choosing an appropriate indexing method can therefore greatly improve the performance and quality of the system. The system supports several indexing methods, and the database designer applies one of them to each basic section and virtual section of a document. Options are also provided for Hanja handling and stemming. When the HANJA2HANGLE option is TRUE, Hanja in a document is first converted to Hangul and index terms are then extracted according to the indexing method. When HANJA2HANGLE is FALSE, index terms are extracted from the Hanja as-is, without conversion. For English text, stemming is performed only when the STEMMING option is TRUE.

The information retrieval management system according to the present invention provides six basic indexing methods: INDEX_AS_IS, INDEX_BY_TOKEN, INDEX_BY_MA, INDEX_BY_CHAR, INDEX_AS_NUMERIC, and INDEX_AS_IS_MA.

● INDEX_AS_IS

INDEX_AS_IS extracts the entire content or value of a section as a single index term, and thereby supports exact-match search on that section. For a section indexed with INDEX_AS_IS, search is performed only at the level of checking character string equality, although string values can also be compared using the relational operators (<, <=, >, >=, =). The INDEX_AS_IS method is mainly used for basic sections whose values are uniquely assigned to distinguish records, such as a 'control number'.

Ex) Title: "정보검색에 관한 연구" ("A Study on Information Retrieval")

Index term: "정보검색에 관한 연구"

Ex) Title: "情報檢索에 관한 硏究"

Index term: "情報檢索에 관한 硏究"

Ex) Control number: "AN00012"

Index term: "AN00012"

● INDEX_BY_TOKEN

INDEX_BY_TOKEN applies to sections that must support content-based partial-match search, such as text search. Rather than using the entire content or value of the section as the index term, it selects index terms from the word units (eojeol) or words within the section. INDEX_BY_TOKEN is an elementary indexing method that extracts the word units or words of a section, excluding stopwords, as index terms; no post-processing is applied, and each term is used exactly as it appears in the original text. The method is therefore suitable for sections consisting mainly of proper nouns, such as personal names or place names, and for sections that need no special post-processing, such as the keyword list of a paper.

When HANJA2HANGLE is FALSE

Index terms: "정보검색에", "관한", "연구"

Index terms: "情報檢索", "에", "관한", "硏究"

When HANJA2HANGLE is TRUE

Index terms: "정보검색에", "관한", "연구"
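The difference between INDEX_AS_IS and INDEX_BY_TOKEN can be made concrete with a minimal Python sketch. The function names and the whitespace-based tokenization are assumptions for illustration; the specification does not state how the actual indexer delimits word units, and Hanja conversion is omitted here.

```python
def index_as_is(value):
    """INDEX_AS_IS: the whole section value is the single index term."""
    return [value]


def index_by_token(value, stopwords=frozenset()):
    """INDEX_BY_TOKEN: each whitespace-separated word unit that is not a
    stopword becomes an index term, unchanged from the source text."""
    return [tok for tok in value.split() if tok not in stopwords]
```

With the title used in the examples, `index_as_is` yields one term for the full string, while `index_by_token` yields the three word units "정보검색에", "관한", and "연구".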

● INDEX_BY_MA

INDEX_BY_MA goes one step beyond INDEX_BY_TOKEN: it uses a Korean morphological analyzer to index Korean text and, optionally, a stemmer for English text. Morphological analysis is performed on each word unit (eojeol) of the Korean text to separate minimal morphemes such as nouns, particles, suffixes, verbs, and adjectives, and simple nouns that represent the content of the section are then extracted as index terms. For English, stems are extracted by handling regular plurals, verb tense inflections, and the like. This method therefore produces a higher-quality index than the plain word-unit indexing of INDEX_BY_TOKEN, and INDEX_BY_MA is well suited to basic sections such as 'paper title' and 'abstract'.

When HANJA2HANGLE is FALSE

Index terms: "정보", "검색", "정보검색", "연구"

Index terms: "情報檢索", "硏究"

Ex) Title: "Information Systems"

Index terms: "informat", "system"

When HANJA2HANGLE is TRUE

● INDEX_BY_CHAR

INDEX_BY_CHAR generates index terms from a basic section by extracting one syllable at a time for English and two syllables at a time for Korean. This method is mainly used for sections with Korean content such as 'personal name'. When the spacing of a name varies freely, as in '홍 길동', '홍길동', and '홍 길 동' (Hong Gil-dong), indexing and storing the name syllable by syllable as '홍', '길', '동' lets a user find the name regardless of how it is spaced.

When HANJA2HANGLE is FALSE

Ex) Name: "홍 길동"

Index terms: "홍", "길", "동"

Ex) Name: "洪吉童"

Index terms: "洪", "吉", "童"

When HANJA2HANGLE is TRUE

Ex) Name: "홍 길동"

Index terms: "홍", "길", "동"

Ex) Name: "洪吉童"

Index terms: "홍", "길", "동"
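The examples above can be reproduced with a short Python sketch that emits one index term per non-space character and optionally converts Hanja to Hangul first. This follows the worked examples (one term per Hangul syllable) and is only an assumption about the behavior; the three-entry conversion table is a toy stand-in for a real Hanja dictionary, and the function name is hypothetical.

```python
# Toy Hanja-to-Hangul table covering only the example name (assumption).
HANJA_TO_HANGUL = {"洪": "홍", "吉": "길", "童": "동"}


def index_by_char(value, hanja2hangle=False):
    """Emit one index term per non-space character, ignoring spacing.

    When `hanja2hangle` is true, characters found in the conversion
    table are replaced by their Hangul reading before indexing.
    """
    terms = []
    for ch in value:
        if ch.isspace():
            continue  # spacing differences must not affect the index
        if hanja2hangle:
            ch = HANJA_TO_HANGUL.get(ch, ch)
        terms.append(ch)
    return terms
```

Because whitespace is dropped, "홍 길동", "홍길동", and "홍 길 동" all produce the same three terms, which is what makes spacing-independent name search possible.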

● INDEX_AS_NUMERIC

INDEX_AS_NUMERIC extracts index terms from a basic section consisting of numbers. A section designated INDEX_AS_NUMERIC is assumed to hold only a single (atomic) value, and relational operations (<, <=, >, >=, =, -) can be applied to it. This indexing method is mainly used for sections that have a fixed format and consist of digits, such as a 'date' in year-month-day form.

Ex) Date: "19961214"

Index term: 19961214
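A minimal Python sketch of numeric indexing and relational matching follows. The in-memory dictionary from record id to numeric term and the names `index_as_numeric` and `numeric_match` are assumptions for illustration; the '-' operator listed in the text is not modeled here.

```python
import operator

# Relational operators named in the text, mapped to Python functions.
RELATIONAL_OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq,
}


def index_as_numeric(value):
    """Convert a digit-only section value into a single numeric term."""
    return int(value)


def numeric_match(index, op, operand):
    """Return record ids whose numeric term satisfies `term op operand`."""
    compare = RELATIONAL_OPS[op]
    return sorted(rid for rid, term in index.items()
                  if compare(term, operand))
```

Storing a date such as "19961214" as an integer lets year-month-day values be compared directly with these operators.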

● INDEX_AS_IS_MA

INDEX_AS_IS_MA combines a slightly modified form of the INDEX_AS_IS method described above with the INDEX_BY_MA method. That is, the entire content of the section is extracted as an index term and, at the same time, morphological analysis is performed on each word unit of the Korean text to separate minimal morphemes such as nouns, particles, suffixes, verbs, and adjectives, after which simple nouns that represent the content of the section are extracted as index terms.

When HANJA2HANGLE is FALSE

Index terms: "정보검색에 관한 연구", "정보", "검색", "정보검색", "연구"

Index terms: "情報檢索에 관한 硏究", "情報檢索", "硏究"

When HANJA2HANGLE is TRUE

Index terms: "정보검색에 관한 연구", "정보", "검색", "정보검색", "연구"

Index terms: "정보검색에 관한 연구", "정보", "검색", "정보검색", "연구"

● DO_NOT_INDEX

Sections designated DO_NOT_INDEX are not indexed. At present, no search path is provided for sections designated in this way.

FIG. 2 is a conceptual diagram of the database loading process in the information retrieval management system according to the present invention.

In FIG. 2, the database schema file 101, the raw document files 102, and the stopword list 103 are files that the database administrator must prepare.

The database schema file 101 describes the database directory and document volumes, the database group, the section definitions, the indexing method of each section, the structure of the raw documents, the method of loading raw documents into the database, and so on.

The raw document files 102 are the raw documents to be loaded into the database. Even documents with heterogeneous structures can be loaded into a single database by way of the raw-document structure described in the database schema file 101.

The stopword list 103 is the list of stopwords to be loaded into the database. It is consulted only at search time, and the words in it are excluded from the search. If a user wants a stopword to be included in a search, the query term must be prefixed with '+'.
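The stopword rule described above can be sketched in Python as a query pre-processing step. The function name and the whitespace-separated query format are assumptions; the actual query parser is not described in the specification.

```python
def effective_query_terms(query, stopwords):
    """Drop stopwords from a query unless the term is prefixed with '+'."""
    terms = []
    for raw in query.split():
        if raw.startswith("+") and len(raw) > 1:
            terms.append(raw[1:])  # forced term: kept even if a stopword
        elif raw not in stopwords:
            terms.append(raw)
    return terms
```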

Once the database schema file 101, the raw document files 102, and the stopword list 103 are ready, the database is loaded using the loader 110. The loader 110 is the program used to load a database, and it obtains all the information needed for loading from the database schema file 101. The loader 110 is a kind of command line interpreter that reads and executes a schema file specified through a command-line option; the administrator can enter commands for creating databases and loading documents into this program. For convenience, the administrator can prepare a file of such commands to feed to the loader 110. In the information retrieval management system of the present invention, this file is called the database schema file (or schema file).

FIG. 3 is a conceptual diagram of the processes that make up the information retrieval management system according to the present invention and of the communication among them.

As shown in FIG. 3, the information retrieval management system consists of five kinds of processes: a job scheduler 210, first to n-th Fire processes 220/1 to 220/n, a set manager 230, a data manager 240, and a client (not shown). The processes exchange data through socket or pipe communication. The functions of these processes are as follows.

The job scheduler 210 accepts connection requests from clients and distributes work according to the state of the first to n-th Fire processes 220/1 to 220/n. For online document management, it asks the data manager 240 to perform the document operation and sends the result to the client. When a database changes, it also notifies the data manager 240, the first to n-th Fire processes 220/1 to 220/n, and the set manager 230 of the change.

The main function of the first to n-th Fire processes 220/1 to 220/n is to perform searches. A Fire process handles a service request passed on by the job scheduler 210 and returns the result to the client. It also asks the set manager 230 to store the search results.

The main function of the set manager 230 is to store and manage search results. It handles service requests from the first to n-th Fire processes 220/1 to 220/n and returns the results to them.

The main function of the data manager 240 is online document management. It handles service requests from the job scheduler 210 and returns the results to the job scheduler 210.

The client sends the user's requests to the job scheduler 210.
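The way the job scheduler hands work to an idle Fire process can be pictured with the following Python sketch. In the described system these are separate processes communicating over sockets or pipes; the in-process classes, the `busy` flag, and the first-idle selection policy here are simplifying assumptions for illustration only.

```python
class Fire:
    """Stand-in for one search worker process."""

    def __init__(self, name):
        self.name = name
        self.busy = False

    def handle(self, request):
        return f"{self.name} handled {request}"


class JobScheduler:
    """Accepts requests and passes each one to the first idle Fire."""

    def __init__(self, fires):
        self.fires = fires

    def dispatch(self, request):
        for fire in self.fires:
            if not fire.busy:
                fire.busy = True
                try:
                    return fire.handle(request)
                finally:
                    fire.busy = False  # worker becomes idle again
        raise RuntimeError("no idle Fire process available")
```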

The client side of the information retrieval management system according to the present invention offers three services: search, online document management, and error handling. Of these, the two that are sent to the server side are the search service and the online document management service.

FIGS. 4a to 4f are conceptual diagrams of the client-side search service of the information retrieval management system according to the present invention.

First, FIG. 4a is a conceptual diagram showing how a client obtains information about a database.

As shown in FIG. 4a, the client 310 requests database information (step S1). The job scheduler 320 forwards the client's request to an idle one of the first to n-th Fire processes 330/1 to 330/n (step S2). That Fire process then looks up the database information requested by the client 310 and sends it to the client 310 (step S3).

FIG. 4b is a conceptual diagram showing how a client obtains the section list of a database.

As shown in FIG. 4b, the client 310 requests the section list of a database (step S11). The job scheduler 320 forwards the request to an idle one of the first to n-th Fire processes 330/1 to 330/n (step S12). That Fire process then looks up the section list of the requested database and sends it to the client 310 (step S13).

FIG. 4c is a conceptual diagram showing how a client requests a search.

As shown in FIG. 4c, the client 310 sends the query, the list of databases to search, the sections, and related information (step S21). The job scheduler 320 forwards this information to an idle one of the first to n-th Fire processes 330/1 to 330/n (step S22). That Fire process performs the search and sends the search result to the set manager 340 (step S23). The set manager 340 stores the received result and returns the result set number and the number of documents to the Fire process (step S24). The Fire process then sends the information received from the set manager 340 to the client 310 (step S25).

FIG. 4d is a conceptual diagram showing how a client requests a similar-document search.

As shown in FIG. 4d, the client 310 sends the number of the document to search by, the list of databases to search, the sections, the search method, and related information (step S31). The job scheduler 320 forwards this information to an idle one of the first to n-th Fire processes 330/1 to 330/n (step S32). That Fire process performs the search and sends the search result to the set manager 340 (step S33). The set manager 340 stores the received result and returns the result set number and the number of documents to the Fire process (step S34). The Fire process then sends the information received from the set manager 340 to the client 310 (step S35).

FIG. 4e is a conceptual diagram showing how a client requests a search result list.

As shown in FIG. 4e, the client 310 sends the result set number obtained from a search, the sections to display, and the number of results to fetch (step S41). The job scheduler 320 forwards this information to an idle one of the first to n-th Fire processes 330/1 to 330/n (step S42). That Fire process passes the request on to the set manager 340 (step S43). The set manager 340 returns the requested number of search results to the Fire process (step S44). The Fire process then sends the results received from the set manager 340 to the client 310 (step S45).
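The search request of FIG. 4c and the result-list request of FIG. 4e can be sketched together in Python: a search stores its hits under a result set number, and a later request fetches a limited number of results by that number. The class and function names, the dictionary-based inverted index, and the sequential numbering of result sets are assumptions for illustration; the job scheduler hop and inter-process transport are omitted.

```python
class SetManager:
    """Stores search results and hands them back by result set number."""

    def __init__(self):
        self._sets = {}
        self._next_no = 1

    def store(self, doc_ids):
        """Steps S23-S24: keep the hits, return (set number, doc count)."""
        set_no = self._next_no
        self._next_no += 1
        self._sets[set_no] = list(doc_ids)
        return set_no, len(self._sets[set_no])

    def fetch(self, set_no, count):
        """Steps S43-S44: return at most `count` results of a stored set."""
        return self._sets[set_no][:count]


def fire_search(index, term, set_manager):
    """Search an inverted index and register the hits with the set manager.

    `index` maps a term to the ids of documents containing it (assumption).
    Returns what the client receives in step S25.
    """
    hits = sorted(index.get(term, ()))
    return set_manager.store(hits)
```

Keeping results on the server side means the client only has to hold a set number and can page through a large result list with repeated fetch requests.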

As shown in FIG. 4f, the client 310 sends a document number to the job scheduler 320 (step S51). The job scheduler 320 forwards the document number received from the client 310 to an idle one of the first to n-th Fire processes 330/1 to 330/n (step S52). That Fire process then sends the content of the corresponding document to the client 310 (step S53).

FIGS. 5a and 5b are conceptual diagrams of the online document management service of the information retrieval management system according to the present invention.

First, FIG. 5a is a conceptual diagram showing how a single document is inserted or changed.

As shown in FIG. 5A, to insert or change a single document, the client (310) transmits the insertion or change information, such as the document and the name of the database in which it is to be inserted or changed, to the job scheduler (320) (step S61).

Next, the job scheduler (320) forwards the insertion or change information received from the client (310) to the data manager (350) (step S62).

Next, the data manager (350) stores or changes the document and transmits the result to the job scheduler (320) (step S63).

Next, if the insertion or change succeeded, the job scheduler (320) instructs the set manager (340) to refresh the result set (step S64).

Next, the set manager (340) transmits the result of executing the command to the job scheduler (320) (step S65).

Next, the job scheduler (320) transmits the final result to the first to n-th Fires (330/1 to 330/n) that are in the idle state (step S66).

Next, those Fires (330/1 to 330/n) reopen the database and transmit the result to the client (310) (step S67).
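Steps S61 to S67 can be pictured as a short chain of calls. The following is only a minimal sketch: the class and method names (`DataManager.upsert`, `SetManager.refresh`, `Fire.reopen_and_reply`, `JobScheduler.insert_or_update`) are hypothetical, the components are in-memory stand-ins rather than separate processes, and it assumes that every idle Fire is notified in step S66, which the description leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class DataManager:
    """Hypothetical stand-in for the data manager (350)."""
    store: dict = field(default_factory=dict)

    def upsert(self, db_name, doc_id, text):
        # S62/S63: store or change the document, then report the outcome.
        self.store.setdefault(db_name, {})[doc_id] = text
        return True


@dataclass
class SetManager:
    """Hypothetical stand-in for the set manager (340)."""
    refresh_count: int = 0

    def refresh(self):
        # S64/S65: refresh the cached result sets and report the outcome.
        self.refresh_count += 1
        return True


@dataclass
class Fire:
    """Hypothetical stand-in for one Fire (330/k), a search worker."""
    idle: bool = True
    reopen_count: int = 0

    def reopen_and_reply(self, result):
        # S67: reopen the database so the change becomes visible, then reply.
        self.reopen_count += 1
        return result


@dataclass
class JobScheduler:
    data_manager: DataManager
    set_manager: SetManager
    fires: list

    def insert_or_update(self, db_name, doc_id, text):
        ok = self.data_manager.upsert(db_name, doc_id, text)  # S61 to S63
        if ok:
            ok = self.set_manager.refresh()                    # S64, S65
        result = "OK" if ok else "FAILED"
        # S66: assumption, every idle Fire is notified; busy ones are skipped.
        replies = [f.reopen_and_reply(result) for f in self.fires if f.idle]
        return replies[0] if replies else result
```

The point of the design is that search workers never write: they only reopen the database after the data manager and set manager have finished, so searching can continue while a document is being changed.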

As shown in FIG. 5B, to delete a single document, the client (310) transmits the ID of the document to be deleted to the job scheduler (320) (step S71).

Next, the job scheduler (320) forwards the ID of the document to be deleted, received from the client (310), to the data manager (350) (step S72).

Next, the data manager (350) deletes the document and transmits the result to the job scheduler (320) (step S73).

Next, if the deletion succeeded, the job scheduler (320) instructs the set manager (340) to refresh the result set (step S74).

Next, the set manager (340) transmits the result of executing the command to the job scheduler (320) (step S75).

Next, the job scheduler (320) transmits the final result to the first to n-th Fires (330/1 to 330/n) that are in the idle state (step S76).

Next, those Fires (330/1 to 330/n) reopen the database and transmit the result to the client (310) (step S77).

The information retrieval management system according to the present invention basically supports a Boolean model and an extended vector model. The Boolean model is an extension of the basic Boolean model and supports Boolean operators {AND (&), OR (｜), NOT (!)}, proximity operators {NEAR (/N), WITHIN (/W)}, relational operators (=, <=, <, >=, >), the range operator (∼), and the truncation operator. The vector model is an extension of the pure vector model: it is based on vector operations but also accepts the operators of the Boolean model other than the proximity operators.

The following describes the basic way to write queries in the two basic models provided by the system, Boolean and extended vector.

Because the database search engine of the system uses a search target identifier, called a section, as the basic unit of search, a query starts from search terms, each composed of a search target identifier, a search scope indicator (section specifier, relational operator, or range operator), and words. These search terms can be combined using one or more Boolean or proximity operators and parentheses (note, however, that vector search has no dedicated search operator).

The basic format of a query is as follows.

<query> = (<query>)
          <query> <Boolean operator> <search term>
          <search term> <proximity operator> <search term>
          <search term>

<search term> = <section list> : <search word list>
                <section list> : (<search word list>)
                <section list> <relational operator> <search word>
                <section list> <search word> <range operator> <search word>

<section list> = <section list> , <section>
                 <section>

<search word> = "<word>"
                +<word>
                <word>*

Here, several sections can be specified by separating them with ','. A search word list is a string of search words separated by spaces.

● How to specify the section to search (how to compose a search term)

In the information retrieval management system of the present invention, a section corresponds to a database field, and specifying one is similar to specifying a field condition in SQL. The way a particular section is specified, however, reflects the data type implied by the indexing method used for that section. Table 1 shows how to compose a search term for each index type. The system provides six indexing methods in total. The section specifier (:) designates a string-type search word and can be used with the five string index types. The relational operators (=, <, >, <=, >=) apply to the numeric type and to some string types and, as Table 1 shows, can be used with two index types. Finally, the range operator has the same scope of application as the relational operators.

Table 1. Search term composition by index type

Specifier: section specifier (:)
  Data type: string
  Index types: INDEX_BY_MA, INDEX_BY_CHAR, INDEX_AS_IS, INDEX_BY_TOKEN, INDEX_AS_IS_MA
  Search term examples: SECTION: WORD; SECTION: WORD_LIST; SECTION: "WORD"; SECTION: "WORD_LIST"; SECTION: +WORD

Specifier: relational operators
  Data type: string, index type: INDEX_AS_IS
  Data type: numeric, index type: INDEX_AS_NUMERIC
  Search term examples: SECTION = WORD; SECTION < WORD; SECTION > WORD; SECTION <= WORD; SECTION >= WORD

Specifier: range operator (∼)
  Data type: string, index type: INDEX_AS_IS
  Data type: numeric, index type: INDEX_AS_NUMERIC
  Search term example: SECTION: WORD∼WORD

The following simple examples show how the section to search is specified. Assume that TITLE is indexed as a string type and DATE as a numeric type. The search words 유전자, 정보, 가정용, 컴퓨터, and 시스템 mean "gene", "information", "home-use", "computer", and "system".

TITLE : 유전자

TITLE : 유전자 정보

TITLE : (가정용 컴퓨터 시스템)

DATE > 1990

DATE : 1990 ∼ 2000

As these examples show, correct search results are obtained only when the search term is composed with exact knowledge of the data type implied by the index type. If the type is not taken into account properly, an error code and message are output.

However, for compatibility with queries of the existing information retrieval management system, one form not listed in Table 1 is also provided: the section specifier (:) may be used with a numeric type. That is, the DATE section of the example above can be queried as in (1), which has the same meaning as (2).

DATE : 1990 ..................(1)

DATE = 1990 ..................(2)
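The rules of Table 1, together with the compatibility rule for ':' on numeric sections, amount to a small validation step. The sketch below is an illustration only: the function name `normalize_specifier` and the use of `ValueError` for the error case are assumptions, and the range operator is written as ASCII `~` instead of the `∼` used in the text.

```python
# Index types taken from Table 1.
STRING_TYPES = {
    "INDEX_BY_MA", "INDEX_BY_CHAR", "INDEX_AS_IS",
    "INDEX_BY_TOKEN", "INDEX_AS_IS_MA",
}
COMPARABLE_TYPES = {"INDEX_AS_IS", "INDEX_AS_NUMERIC"}
RELATIONAL = {"=", "<", ">", "<=", ">="}
RANGE = "~"  # ASCII stand-in for the range operator


def normalize_specifier(index_type, specifier):
    """Return the specifier to evaluate, or raise if the type does not allow it."""
    if specifier == ":":
        if index_type in STRING_TYPES:
            return ":"
        if index_type == "INDEX_AS_NUMERIC":
            # Compatibility rule: "DATE : 1990" means "DATE = 1990".
            return "="
    elif specifier in RELATIONAL or specifier == RANGE:
        if index_type in COMPARABLE_TYPES:
            return specifier
    raise ValueError(f"{specifier!r} cannot be used with {index_type}")
```

For instance, `normalize_specifier("INDEX_AS_NUMERIC", ":")` yields `"="`, while `normalize_specifier("INDEX_BY_TOKEN", "<")` raises, which corresponds to the error code and message mentioned in the text.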

In the information retrieval management system of the present invention, a search word is assigned a single section by default. To search two or more sections, enter the section names in order, separated by commas (,). Query (3) assigns the two sections TITLE and CONTENT to the search words "정보 검색" ("information retrieval"), and it has the same meaning as (4).

TITLE, CONTENT : 정보 검색 ...................(3)

TITLE : 정보 검색｜CONTENT : 정보 검색 .......(4)

In this way, the section specifier ':' can be used with every indexed section, regardless of its index type.
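The equivalence between (3) and (4) is a plain rewrite of a section list into an OR of single-section terms. A minimal sketch, assuming a hypothetical helper `expand_sections` and writing the OR operator as ASCII `|`:

```python
def expand_sections(query):
    """Rewrite 'S1, S2 : words' as 'S1 : words | S2 : words'."""
    sections, _, words = query.partition(":")
    names = [name.strip() for name in sections.split(",") if name.strip()]
    return " | ".join(f"{name} : {words.strip()}" for name in names)


print(expand_sections("TITLE, CONTENT : 정보 검색"))
```

A query with a single section passes through unchanged apart from whitespace, so the same helper can be applied to every search term.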

● Scope affected by a specified search section

In principle, a section must be specified for every search word; otherwise an error occurs. Omitting the section is partially allowed, according to the meaning of the query context, only where a search word simply inherits the section specified for the preceding search word; examples follow. The omission is allowed only when the section specifier (:) is used. When a relational or range operation is used, the section must be stated explicitly.

∥ Allowed cases

In Table 2 the user wants to find 'car audio system' in TITLE. Strictly, the query should be written as shown under "Meaning of the original query" in Table 2, but for convenience the forms listed under "Allowed queries" are accepted as well. Parentheses may be used freely in the allowed forms, and 'car', 'audio', and 'system' are all regarded as having the section TITLE.

Table 2. Allowed queries and the meaning of the original query

Allowed queries:
  TITLE:car & audio & system
  TITLE:(car & audio & system)
  TITLE:(car) & audio & system
  TITLE:((car & audio) & system)
Meaning of the original query:
  TITLE:car & TITLE:audio & TITLE:system

¶ Disallowed cases

The form in (5) is also allowed, because 'system' inherits the preceding section, TITLE, by default. In (6), however, 'car' is treated as an error because no preceding section exists.

(TITLE:car & audio)&system ................(5)

car & TITLE:audio & system ................(6)
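Section inheritance can be sketched as a single left-to-right pass that remembers the most recent section. This is a simplified illustration, not the system's parser: the function name `resolve_sections` is hypothetical, only the '&' operator is handled, and parentheses are discarded, which is harmless here because every operator in the examples is '&'.

```python
import re


def resolve_sections(query):
    """Attach a section to every search word, inheriting from the left."""
    current = None
    resolved = []
    # Grouping is ignored in this sketch: parentheses are simply removed.
    for term in re.sub(r"[()]", "", query).split("&"):
        section, sep, word = term.partition(":")
        if sep:
            current, word = section.strip(), word.strip()
        else:
            word = term.strip()
            if current is None:
                raise ValueError(f"no section specified for {word!r}")
        resolved.append((current, word))
    return resolved
```

All four allowed forms of Table 2 and query (5) resolve to the same three (TITLE, word) pairs, while query (6) raises because 'car' has no preceding section.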

Omitting the section is not allowed with relational and range operators. Table 3 gives examples of such cases.

Table 3. Examples in which omitting the section is not allowed

Use of relational operators:
  DATE > 1990 & 2000
  DATE > (1990 & 2000)
  DATE = 1990 & 2000
  DATE = (1990 & 2000)
Use of a range operator:
  1990∼2000

● Operator precedence

When several kinds of operators are mixed in a query, operator precedence determines the order in which they are evaluated. Table 4 shows the precedence of the operators provided by the search engine of the information retrieval management system of the present invention.

Table 4. Operator precedence (listed from highest to lowest)

Stopword specifier: +
String specifier: " "
Parentheses: ( )
Section specifier: : ,
Relational operators: = > < >= <=
Range operator: ∼
Proximity operators: /W /N
Boolean operators: AND(&) OR(｜) NOT(!)

● Stopword processing

When the database (DB) administrator loads data with the data loader of the system, stopwords can be specified for each section. The stopwords specified at that time are used during search: when the user enters search words, the searcher compares each (section, search word) pair entered by the user with the stopwords and excludes from the search those registered as stopwords. Excluding stopwords in this way is used because it generally improves full-text search performance.

For users who want a stopword to be included in the search, however, a special specifier ('+') is provided. Prefixing a search word with '+' includes that word in the search even if it is a stopword.

For example, if '시스템' ("system") is a stopword, (7) and (8) return different results.

TITLE: 자동차 오디오 시스템 ...............(7)

TITLE: 자동차 오디오 +시스템 ...............(8)

Query (7) finds documents whose TITLE contains '자동차' ("car") and '오디오' ("audio"), ignoring '시스템', whereas (8) finds documents that contain all three search words.

The stopword specifier must be attached directly in front of a search word, with no space between them. Attaching it to an operator, as in the following, is not allowed.

TITLE: 정보 +& 검색

It may, however, be combined with the string specifier described below.

TITLE: +"정보 검색"
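The stopword rule can be sketched as a filter over (section, search word) pairs. The function name `effective_terms` and the representation of the stopword list as a set of pairs are assumptions made for this illustration; quoted strings and operators are not handled.

```python
def effective_terms(section, words, stopwords):
    """Drop stopwords for this section unless a word is prefixed with '+'."""
    kept = []
    for word in words:
        if word.startswith("+"):
            kept.append(word[1:])          # forced inclusion, even if a stopword
        elif (section, word) not in stopwords:
            kept.append(word)
    return kept


stopwords = {("TITLE", "시스템")}
print(effective_terms("TITLE", ["자동차", "오디오", "시스템"], stopwords))   # query (7)
print(effective_terms("TITLE", ["자동차", "오디오", "+시스템"], stopwords))  # query (8)
```

Because the comparison uses the pair rather than the word alone, the same word can be a stopword in one section and an ordinary search word in another.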

● Use of the string specifier

The string specifier ("") makes the search engine treat its whole content as a single search word, regardless of any spaces or special characters inside it. Whatever the index type of the section, the content of the string specifier is regarded as one index term. It is useful mainly with INDEX_AS_IS.

FIG. 6 shows the TITLE section of a document indexed by two indexing methods. With INDEX_AS_IS, spaces and special characters are not taken into account, so a single index term is generated. With INDEX_BY_TOKEN, delimiters are used, so three index terms are generated.

In this case, if the TITLE section was indexed with a method such as INDEX_AS_IS, the query must be built with the string specifier, as in (9), to obtain correct results, because a list of words written without the string specifier, as in (10), generates several index terms. The title used here, "AN20020808 공지 사항", reads "AN20020808 announcement".

TITLE: "AN20020808 공지 사항" ................(9)

TITLE: AN20020808 공지 사항 ................(10)

Query (9) finds the correct document, whose TITLE is identical to the content of "" including its spaces and special characters, whereas (10) produces three search words during query processing, so the user does not get the expected result.
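The mismatch between (9) and (10) comes from comparing query-side terms with index-side terms. The sketch below assumes that INDEX_BY_TOKEN splits on whitespace only, which the text does not specify, and uses the hypothetical helpers `index_terms` and `query_terms`.

```python
def index_terms(text, index_type):
    """Index-side terms for a section value under two of the index types."""
    if index_type == "INDEX_AS_IS":
        return [text]                # the whole value is one index term
    if index_type == "INDEX_BY_TOKEN":
        return text.split()          # assumption: whitespace is the delimiter
    raise ValueError(f"index type not covered by this sketch: {index_type}")


def query_terms(words):
    """Query-side terms: a quoted string is one term, otherwise split on spaces."""
    words = words.strip()
    if len(words) >= 2 and words[0] == '"' and words[-1] == '"':
        return [words[1:-1]]
    return words.split()


title = "AN20020808 공지 사항"
print(query_terms('"AN20020808 공지 사항"') == index_terms(title, "INDEX_AS_IS"))
print(query_terms("AN20020808 공지 사항") == index_terms(title, "INDEX_AS_IS"))
```

Only the quoted form produces the single term that an INDEX_AS_IS section stores; the unquoted form produces three terms, none of which equals the stored value.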

● Boolean search

Boolean search is the most basic search model used in search systems. Table 5 lists the operators supported in Boolean search. Operators must be written in upper case.

Table 5. Operators supported in Boolean search

Boolean operators: & (AND), ｜ (OR), ! (NOT)
Relational operators: =, <=, <, >=, >
Range operator: ∼
Proximity operators: N (NEAR), W (WITHIN)
Truncation operator: *

● Boolean operation search

All Boolean operators used in the information retrieval management system are binary operators. The following are examples of Boolean operations, in which 정보 means "information" and 검색 means "retrieval".

ABS:(정보 & 검색) ...............(11)

ABS:(정보 ｜검색) ...............(12)

TI:(정보 ! 검색) ...............(13)

ABS:! 검색 ...............(14)

Query (11) retrieves documents whose ABS section contains both "정보" and "검색". Query (12) retrieves documents whose ABS section contains at least one of "정보" and "검색". Query (13) retrieves, among the documents whose TI section contains "정보", those that do not contain "검색". Query (14) is not supported by the system: the '!' operator has the meaning "AND NOT" and, like the other Boolean operators, requires two operands.
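Queries (11) to (13) correspond to set operations on the document lists of the two words. The sketch below uses Python sets of document numbers as a stand-in for posting lists; the function name `evaluate` and the sample document numbers are invented for illustration, and OR is written as ASCII `|`.

```python
def evaluate(op, left, right):
    """Apply a binary Boolean operator to two sets of document numbers."""
    if op == "&":
        return left & right      # AND: documents containing both words
    if op == "|":
        return left | right      # OR: documents containing at least one word
    if op == "!":
        return left - right      # AND NOT: left operand minus right operand
    raise ValueError(f"unknown operator: {op}")


# Invented posting lists for one section.
postings = {"정보": {1, 2, 3}, "검색": {2, 3, 4}}
print(evaluate("&", postings["정보"], postings["검색"]))
print(evaluate("!", postings["정보"], postings["검색"]))
```

Because '!' is implemented as a set difference, it needs a left operand, which is consistent with query (14) being unsupported.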

● Relational operation search

DATE >= 20020101 ......................(15)

Query (15) retrieves all documents whose DATE section contains a date on or after January 1, 2002.

● Proximity operation search

ABS: (정보 /W2 검색) ....................(16)

ABS: (정보 /N2 검색) ....................(17)

Query (16) retrieves documents whose ABS section contains both "정보" and "검색" in that order, with a distance of 2 or less between the two words. Query (17) retrieves documents whose ABS section contains both "정보" and "검색" with a distance of 2 or less between them, regardless of their order.
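The difference between /W and /N can be shown with word positions. This sketch assumes that "distance" means the difference between the positions of the two words in the section, which the text does not define precisely; `positions`, `within`, and `near` are hypothetical names.

```python
def positions(tokens, word):
    """Positions at which a word occurs in a tokenized section."""
    return [i for i, token in enumerate(tokens) if token == word]


def within(pos_a, pos_b, n):
    """/Wn: b occurs after a, at most n positions later (order matters)."""
    return any(0 < b - a <= n for a in pos_a for b in pos_b)


def near(pos_a, pos_b, n):
    """/Nn: a and b occur at most n positions apart, in either order."""
    return any(0 < abs(b - a) <= n for a in pos_a for b in pos_b)


tokens = ["검색", "시스템", "정보"]          # "검색" comes before "정보"
a, b = positions(tokens, "정보"), positions(tokens, "검색")
print(within(a, b, 2), near(a, b, 2))
```

For this token list the WITHIN test fails because the words appear in the opposite order, while the NEAR test succeeds.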

● Truncation operation search

The truncation operator supported by the information retrieval management system is '*', and it can be used for right truncation only.

ABS: 정보* .......................(18)

Query (18) retrieves documents whose ABS section contains any word that begins with "정보", such as "정보", "정보검색", "정보시스템", or "정보처리".
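Right truncation is a prefix lookup, and on a sorted list of index terms it reduces to one binary search followed by a short scan. The sketch assumes a sorted in-memory vocabulary; the function name `right_truncate` and the sample terms are illustrative only.

```python
from bisect import bisect_left


def right_truncate(vocabulary, prefix):
    """Return all terms of a sorted vocabulary that start with the prefix."""
    start = bisect_left(vocabulary, prefix)   # first term >= prefix
    matches = []
    for term in vocabulary[start:]:
        if not term.startswith(prefix):
            break                              # sorted order: no later match
        matches.append(term)
    return matches


vocabulary = sorted(["검색", "정보", "정보검색", "정보시스템", "정보처리", "정부"])
print(right_truncate(vocabulary, "정보"))
```

Left truncation would not benefit from this ordering, which may be one reason a system restricts '*' to the right end of a word.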

● String expansion search

String expansion takes two forms, compound noun expansion and multiple string expansion. Both are used to improve recall and precision.

First, compound noun expansion decomposes a compound noun into its constituent nouns and then combines them with an operator, which improves recall and precision. For example, suppose the user enters the query "TI: 정보검색" and TI is indexed with INDEX_AS_MA. The search system passes '정보검색' ("information retrieval", written as one word) to the indexer and obtains the two constituent nouns '정보' ("information") and '검색' ("retrieval"). Applying the WITHIN operator between constituent nouns at the same level produces query (20).

TI: 정보검색 .................(19)

→TI:(정보 /W1 검색) ........(20)

The compound noun in query (19) is expanded into query (20). Query (19) therefore retrieves documents whose TI section contains "정보" and "검색" adjacent to each other in that order.

Other forms of query that are handled in a similar way include the following.

Words joined by a word delimiter other than a space: red-eared

Second, multiple string expansion applies an operator between the search words when several search words separated by spaces are entered. The default operator is '&', but the user is allowed to change it.

TI: 정보 검색 연구 ...............(21)

→TI:(정보 & 검색 & 연구) ........(22)

Query (21) is processed internally by the search system in the form of (22). Here 연구 means "research".
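Compound noun expansion can be sketched as a decomposition step followed by joining the parts with /W1. In the system the decomposition comes from the indexer and its dictionary database; here a toy dictionary stands in for it, and the hyphen split for "red-eared" is an assumption about what "handled in a similar way" means. The names `TOY_DICTIONARY`, `decompose`, and `expand_compound` are hypothetical.

```python
# Toy stand-in for the indexer's morphological analysis.
TOY_DICTIONARY = {"정보검색": ["정보", "검색"]}


def decompose(word):
    """Split a word into constituent nouns (dictionary first, then hyphens)."""
    if word in TOY_DICTIONARY:
        return TOY_DICTIONARY[word]
    return [part for part in word.split("-") if part]


def expand_compound(section, word):
    """Join constituent nouns with /W1 so they must be adjacent and in order."""
    parts = decompose(word)
    if len(parts) < 2:
        return f"{section}: {word}"
    return f"{section}:(" + " /W1 ".join(parts) + ")"


print(expand_compound("TI", "정보검색"))
print(expand_compound("TI", "red-eared"))
```

The first call prints the form shown in (20); a word that cannot be decomposed is returned unchanged.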
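The rewrite from (21) to (22) inserts the default operator between space-separated search words. In this sketch the hypothetical `default_op` parameter models the user's ability to change the operator; how that setting is actually exposed is not described in the text.

```python
def expand_words(section, words, default_op="&"):
    """Insert the default operator between space-separated search words."""
    terms = words.split()
    if len(terms) < 2:
        return f"{section}: {words.strip()}"
    return f"{section}:(" + f" {default_op} ".join(terms) + ")"


print(expand_words("TI", "정보 검색 연구"))
print(expand_words("TI", "정보 검색 연구", default_op="|"))
```

With the default operator every word must match; switching to an OR-style operator trades precision for recall.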

TI: 정보 검색 연구 .................(23)

→TI:((정보 & 검색) & 연구) ........(24)

Query (23) mixes a compound noun with several search words and is processed as in (24).

● Extended vector search

Unlike the commonly known extended vector model, extended vector search in the information retrieval management system is a model that supports, in addition to basic vector search, weight calculation using the WITHIN/NEAR proximity operators. Vector operations are applied to the parts of a query given in vector notation, and Boolean operators are applied as they are to the parts where Boolean operations are used, so the method takes advantage of both vector and Boolean search. Table 6 lists the operators supported in the extended vector model.

Table 6. Operators supported in the extended vector model

Boolean operators: & (AND), ｜ (OR), ! (NOT)
Relational operators: =, <=, <, >=, >
Range operator: ∼
Proximity operators: N (NEAR), W (WITHIN)
Truncation operator: *

● Basic vector search

TI: 정보 검색 시스템 ..............................(25)

Query (25) retrieves documents ranked by applying the vector weights for "정보 검색 시스템" ("information retrieval system") in the TI section.

● Boolean operation search

TI: 정보 검색 & ABS: 검색 시스템 ...................(26)

TI: 정보 검색 ｜ ABS: 검색 처리 ...................(27)

Query (26) retrieves the intersection of two sets: the documents ranked by applying the vector weights for "정보 검색" in the TI section, and the documents ranked by applying the vector weights for "검색 시스템" ("retrieval system") in the ABS section. Query (27) retrieves the result of applying '｜' to the set of documents ranked by the vector weights for "정보 검색" in the TI section and the set ranked by the vector weights for "검색 처리" ("retrieval processing") in the ABS section.
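Queries (26) and (27) combine two ranked sets with a Boolean operator. The text does not say how the scores of a document found in both sets are merged, so the sketch below simply adds them; that choice, the function name `combine`, and the sample scores are all assumptions. OR is written as ASCII `|`.

```python
def combine(op, left, right):
    """Combine two {doc_id: score} maps and return (doc_id, score) by rank."""
    if op == "&":
        ids = left.keys() & right.keys()     # intersection of the two sets
    elif op == "|":
        ids = left.keys() | right.keys()     # union of the two sets
    else:
        raise ValueError(f"unsupported operator: {op}")
    # Assumption: scores are summed; a missing score counts as 0.
    scored = {doc: left.get(doc, 0.0) + right.get(doc, 0.0) for doc in ids}
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


ti_scores = {1: 0.5, 2: 0.25}     # invented vector weights for the TI part
abs_scores = {2: 0.5, 3: 0.125}   # invented vector weights for the ABS part
print(combine("&", ti_scores, abs_scores))
print(combine("|", ti_scores, abs_scores))
```

The Boolean operator decides which documents survive, while the vector weights decide their order, which is the combination of the two models that the section describes.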

● Relational operation search

DATE >= 20020101 .....................(28)

Query (28) retrieves all documents whose DATE section contains a date on or after January 1, 2002.

● Truncation operation search

ABS: 정보* ........................(29)

Query (29) retrieves documents whose ABS section contains any word that begins with "정보", such as "정보", "정보검색", "정보시스템", or "정보처리".

● Compound noun expansion search

TI: 정보검색 ..................(30)

→TI: 정보 /W 검색 ............(31)

The compound noun in query (30) is expanded into query (31). Query (31) retrieves documents ranked by applying both the vector weights and the distance weights for "정보 검색" in the TI section.

The present invention is not limited to the embodiments described above. Those skilled in the art can make various modifications and changes, and these are to be regarded as falling within the spirit and scope of the present invention as defined in the appended claims.

As described above, the information retrieval management system and method according to the present invention add a stable management function to the database search function, making it possible to build a low-cost, high-efficiency information system.

In addition, by supporting Unicode, the system can support archaic Korean, Chinese characters, and multiple languages at the storage engine level.

In addition, by supporting compression of the document and index databases, the system reduces the space needed for the index database, which is about three times the size of the documents, so storage space is used efficiently.

In addition, developing standardization and management techniques for building term dictionaries in specialized fields for the indexer, and continuously expanding those dictionaries, establishes the language resource infrastructure on which Korean information processing technology is based. A systematic approach to Korean information processing technology can prevent the colonization of language information processing technology and lay a research foundation for domestic researchers.

In addition, because the functions of the whole system are divided among several processors, the load on the system is minimized and stable online transaction processing and data management are ensured. In existing information retrieval systems, data cannot be added, changed, or deleted while the search service is running. In the system of the present invention, the work is divided among four processors: one processor is dedicated to document management, and a separate processor detects whether data has changed and propagates the change to the process in charge of the search service. Data can therefore be changed online while the search service continues. The system is thus highly useful in Internet service fields such as specialized portals and corporate portals, in digital libraries, paper retrieval systems, electronic document management systems (EDMS) and groupware, and, further, in knowledge management systems (KMS).

Claims

An index database volume that stores an index database;

A document database volume in which one or more document databases are distributed and stored;

A dictionary database volume in which one or more dictionary databases are distributed and stored;

A kernel that performs data input/output between the index and document database volumes and user memory, and manages files, directories, records, and inverted files in the index and document database volumes;

A storage engine that uses the kernel to manage catalogs storing meta information about databases and to manage documents and indexes;

A search engine that performs a search for a user's query;

An indexer that uses the dictionary database to extract index terms for a document input by the user; and

An information retrieval management system comprising a data manager that receives a schema file created by an administrator, creates a database, bulk loads a bundle of raw documents, and performs indexing.

The system of claim 1, wherein the kernel

manages pages and buffers for accessing the index and document database volumes, and performs logging and locking functions for that access.

The system of claim 1, wherein the kernel comprises:

A file and directory manager that facilitates access by mapping the physical identifier of the disk location where a record is stored to a logical identifier, and that issues logical identifiers across multiple volumes;

A record manager that internally manages objects that fit on a single page as well as long data items that span more than one page, and that supports sequential access to the records of a file from the beginning and update operations for inserting, modifying, and deleting records;

An inverted file manager that compresses keys to use storage space efficiently and performs insertion, modification, deletion, and retrieval;

A transaction manager that provides transaction begin, end, abort, and savepoint functions, and records transaction information in a log file; and

An input/output manager that manages data input/output between the index and document database volumes and user memory.
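
The key compression performed by the inverted file manager can be illustrated with front coding, in which each key in a sorted list is stored as the length of the prefix it shares with the previous key plus the remaining suffix. The claim does not say which compression scheme is used, so front coding and the function names `front_encode` and `front_decode` are assumptions made for this sketch.

```python
def front_encode(sorted_keys):
    """Compress sorted keys into (shared_prefix_length, suffix) pairs.

    Assumption: front coding stands in for the unspecified key compression.
    """
    encoded, prev = [], ""
    for key in sorted_keys:
        n = 0
        # Count how many leading characters this key shares with the previous one.
        while n < min(len(prev), len(key)) and prev[n] == key[n]:
            n += 1
        encoded.append((n, key[n:]))
        prev = key
    return encoded


def front_decode(encoded):
    """Rebuild the original keys from the (shared_prefix_length, suffix) pairs."""
    keys, prev = [], ""
    for n, suffix in encoded:
        key = prev[:n] + suffix
        keys.append(key)
        prev = key
    return keys


keys = ["index", "indexer", "indexing", "information"]
packed = front_encode(keys)
print(packed)
print(front_decode(packed) == keys)
```

Because index terms in an inverted file are kept in sorted order, neighbouring keys tend to share long prefixes, which is why a scheme of this kind saves space.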

The system of claim 3, wherein

The transaction manager uses checkpoints to reduce recovery time and, if a job fails partway through, rolls all of its work back to the starting point of the transaction to maintain data integrity.
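
The rollback and savepoint behaviour described for the transaction manager can be sketched with an in-memory undo log. This is a minimal illustration only: the class `Transaction`, the helper `run_atomically`, and the dictionary used as the data store are hypothetical, and checkpointing and the on-disk log file are left out.

```python
class Transaction:
    """Toy in-memory transaction; every name here is hypothetical."""

    def __init__(self, store):
        self.store = store        # dict standing in for the database
        self.undo_log = []        # entries of (key, key_existed, old_value)
        self.savepoints = {}      # savepoint name -> undo_log length at that point

    def put(self, key, value):
        # Remember the previous state before overwriting so it can be undone.
        self.undo_log.append((key, key in self.store, self.store.get(key)))
        self.store[key] = value

    def savepoint(self, name):
        self.savepoints[name] = len(self.undo_log)

    def rollback(self, to=None):
        # Undo back to the named savepoint, or to the start of the transaction.
        target = self.savepoints[to] if to is not None else 0
        while len(self.undo_log) > target:
            key, existed, old = self.undo_log.pop()
            if existed:
                self.store[key] = old
            else:
                self.store.pop(key, None)

    def commit(self):
        self.undo_log.clear()
        self.savepoints.clear()


def run_atomically(store, job):
    """Run job(tx); if it raises, return the store to its starting state."""
    tx = Transaction(store)
    try:
        job(tx)
    except Exception:
        tx.rollback()
        return False
    tx.commit()
    return True
```

If a job writes two records and then fails, `run_atomically` undoes both writes, so the store looks exactly as it did when the transaction began.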

The system of claim 3, wherein the input/output manager comprises:

A page manager that allocates, deletes, and maintains pages in the index and document database volumes;

A buffer manager that maps disk pages to memory pages; and

A lock manager that handles different requests to access the same object.
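
The buffer manager's mapping of disk pages to memory pages can be sketched as a small page cache. The claim does not name a replacement policy, so least-recently-used eviction is an assumption here, as are the class name `BufferManager` and the `read_page` callable that stands in for the disk. Writing modified pages back to disk is omitted.

```python
from collections import OrderedDict


class BufferManager:
    """Maps disk page ids to in-memory pages (LRU eviction is an assumption)."""

    def __init__(self, read_page, capacity):
        self.read_page = read_page    # hypothetical callable: page_id -> bytes
        self.capacity = capacity      # number of memory frames available
        self.frames = OrderedDict()   # page_id -> page, least recently used first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict the least recently used page
        self.disk_reads += 1
        self.frames[page_id] = self.read_page(page_id)
        return self.frames[page_id]
```

Repeated requests for a page that is already buffered are served from memory, so only misses cost a disk read.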

The system of claim 1, wherein the storage engine comprises:

A catalog manager that manages information about the structure of the database to be built;

A document manager that converts an original document into an internal document structure and performs operations for inserting, deleting, or modifying existing documents; and

An index manager that organizes the index information extracted by the indexer into a structure suitable for searching.

The system of claim 1,

Wherein the information about the structure of the database includes a document structure, an indexing method, primary key information, compression, and stopwords.

An index database volume that stores the index database;

A database schema file describing the directories and volumes of the index and document databases, database groups, section definitions, section indexing methods, the structure of the raw documents, and the method of loading source documents into a database;

A raw document file consisting of raw documents to be loaded into the document database;

A stopword list to be loaded into the index database; and

A loader that creates the document database and loads documents according to the information in the database schema file, and that receives commands from the administrator for creating the document database and loading documents.

The system of claim 1,

Wherein documents having heterogeneous structures can be loaded into a single database by means of the source document structure described in the database schema file.
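
Loading heterogeneous documents into one database through a schema can be sketched as follows. The actual format of the database schema file is not given in the claims, so the dictionary `SCHEMA`, its keys, the field names, and the function `load` are all hypothetical. The idea shown is only that the schema records, for each source structure, which raw field feeds which section.

```python
# Hypothetical schema: two source structures mapped onto the same sections.
SCHEMA = {
    "sections": ["title", "body"],
    "sources": {
        "paper": {"title": "TI", "body": "AB"},
        "report": {"title": "name", "body": "text"},
    },
}


def load(raw_docs, schema):
    """Convert (source_type, raw_document) pairs into uniform section records."""
    database = []
    for source_type, raw in raw_docs:
        mapping = schema["sources"][source_type]
        # Pull each section from whichever raw field the schema names for it.
        record = {sec: raw.get(mapping[sec], "") for sec in schema["sections"]}
        database.append(record)
    return database


docs = [
    ("paper", {"TI": "Inverted files", "AB": "Key compression."}),
    ("report", {"name": "Annual report", "text": "Search statistics."}),
]
for record in load(docs, SCHEMA):
    print(record)
```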

A job scheduler that distributes jobs received from the client according to the status of the first to nth fires, requests document management from the data manager during online document management and transmits the result to the client, and notifies the first to nth fires and the set manager that the database has been changed;

First to nth fires that perform jobs for service requests received from the job scheduler, transmit the results to the client, and request the set manager to save search results;

A set manager that performs jobs for service requests received from the first to nth fires and transmits the results to the first to nth fires; and

A data manager that performs jobs for service requests received from the job scheduler and transmits the results to the job scheduler.
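
The job scheduler's two roles, distributing jobs according to the status of the fires and notifying them of database changes, can be sketched as below. The claim does not define what "status" means, so choosing the fire with the fewest active jobs is an assumed policy, and the classes `Fire` and `JobScheduler` with their fields are hypothetical stand-ins.

```python
class Fire:
    """Stand-in for one search process (a "fire"); fields are hypothetical."""

    def __init__(self, name):
        self.name = name
        self.active_jobs = 0
        self.db_version = 0

    def reopen(self, version):
        # Pick up the changed database.
        self.db_version = version


class JobScheduler:
    def __init__(self, fires):
        self.fires = fires

    def dispatch(self):
        # Assumed policy: hand the job to the fire with the fewest active jobs.
        fire = min(self.fires, key=lambda f: f.active_jobs)
        fire.active_jobs += 1
        return fire

    def finish(self, fire):
        fire.active_jobs -= 1

    def notify_changed(self, version):
        # Tell every fire that the database has been changed.
        for fire in self.fires:
            fire.reopen(version)
```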

An information retrieval management method of an information retrieval management system comprising the job scheduler, first to nth fires, set manager, and data manager of claim 10, the method comprising:

A first step in which a client requests and receives database information from the first to nth fires through the job scheduler;

A second step in which the client requests and receives a section list of a database from the first to nth fires through the job scheduler;

A third step in which the client requests a search from the first to nth fires through the job scheduler and receives a search result;

A fourth step in which the client requests a similar document search from the first to nth fires through the job scheduler and receives a search result;

A fifth step in which the client requests and receives a search result list from the first to nth fires through the job scheduler; and

A sixth step in which the client requests and receives the original text of a document from the first to nth fires through the job scheduler.

The method of claim 11, wherein, in the first step, the first to nth fires

Retrieve the database information at the request of the client received through the job scheduler and transmit it to the client.

The method of claim 11, wherein, in the second step, the first to nth fires

Retrieve the section list of a database at the request of the client received through the job scheduler and transmit the retrieved section list to the client.

The method of claim 11, wherein, in the third step, the client

Transmits search information including a query, a list of databases to be searched, and a section to the first to nth fires through the job scheduler.

The method of claim 14, wherein the third step comprises:

Performing, in the first to nth fires, a search based on the search information received from the client, and then transmitting the search result to the set manager;

Storing, in the set manager, the search result received from the first to nth fires, and transmitting a result set number and the number of documents to the first to nth fires; and

Transmitting, from the first to nth fires to the client, the result set number and the number of documents received from the set manager.
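
The set manager's part in the search step can be sketched as a store of result sets keyed by a result set number. The class `SetManager`, its method names, and the `start` offset are hypothetical; the claims state only that the set manager stores a search result and returns a result set number and a document count, and that it later returns a requested number of documents from a stored set.

```python
import itertools


class SetManager:
    """Keeps search results so later requests can refer to them by number."""

    def __init__(self):
        self._next_no = itertools.count(1)
        self._sets = {}     # result set number -> list of document numbers

    def store(self, doc_numbers):
        # Save a search result and hand back (result set number, document count).
        docs = list(doc_numbers)
        set_no = next(self._next_no)
        self._sets[set_no] = docs
        return set_no, len(docs)

    def fetch(self, set_no, count, start=0):
        # Return up to `count` document numbers from a stored result set.
        return self._sets[set_no][start:start + count]


sm = SetManager()
set_no, n_docs = sm.store([7, 3, 9])
print(set_no, n_docs)
print(sm.fetch(set_no, 2))
```

Keeping the full result on the set manager means the client needs to hold only the result set number and can page through the list on demand.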

The method of claim 11, wherein, in the fourth step, the client

Transmits similar document search information including a search document number, a list of databases to be searched, a section, and a search method to the first to nth fires through the job scheduler.

The method of claim 16, wherein the fourth step,

Transmitting a search result to the set manager after performing a search based on the similar document search information received from the client in the first to nth fires;

The method of claim 11, wherein, in the fifth step, the client

Transmits, to the first to nth fires through the job scheduler, search result list information including the result set number received at search time, the section to be displayed, and the number of search result list entries.

The method of claim 18, wherein the fifth step comprises:

Transmitting, from the first to nth fires to the set manager, the search result list information received from the client;

Transmitting, from the set manager to the first to nth fires, as many documents as the number of search result list entries specified in the search result list information received from the first to nth fires; and

Searching, in the first to nth fires, the documents received from the set manager for the section information requested by the client, and transmitting the search result to the client.

The method of claim 11, wherein, in the sixth step, the client

Transmits the document number to the first to nth fires through the job scheduler.

The method of claim 20, wherein, in the sixth step,

The first to nth fires transmit the contents of the document corresponding to the document number received from the client to the client.

A method of inserting and modifying a document, comprising:

Transmitting, by the client, a document to be inserted or changed and database information to the job scheduler;

Transmitting, by the job scheduler, the information received from the client to the data manager;

Storing or changing the document in the data manager, and then transmitting a result to the job scheduler;

Instructing, by the job scheduler, the set manager to update the result set if the insertion or change succeeded;

Transmitting, by the set manager, a command execution result to the job scheduler;

Sending, by the job scheduler, a final result to the first to nth fires; and

Reopening the database in the first to nth fires and transmitting a result to the client.
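
The insert and modify flow can be sketched with in-memory stand-ins for the data manager, the set manager, and the fires. All class and function names below are hypothetical. Two points are assumptions rather than statements from the claims: "updating the result set" is modelled as dropping stored result sets, and "reopening the database" is modelled as each fire taking a fresh copy of the documents.

```python
class DataManager:
    def __init__(self):
        self.documents = {}     # document number -> text

    def upsert(self, doc_no, text):
        # Store a new document or change an existing one.
        self.documents[doc_no] = text
        return True


class ResultSets:
    def __init__(self):
        self.sets = {}          # result set number -> document numbers

    def update(self):
        # Assumption: stale result sets are simply dropped.
        self.sets.clear()
        return True


class SearchFire:
    def __init__(self, data_manager):
        self.data_manager = data_manager
        self.snapshot = {}
        self.reopen()

    def reopen(self):
        # Re-read the database so that later searches see the change.
        self.snapshot = dict(self.data_manager.documents)

    def search(self, term):
        return sorted(no for no, text in self.snapshot.items() if term in text)


def insert_or_update(doc_no, text, data_manager, result_sets, fires):
    """Job scheduler side of the flow; returns the result sent to the client."""
    if not data_manager.upsert(doc_no, text):
        return "failed"
    result_sets.update()        # set manager refreshes its result sets
    for fire in fires:
        fire.reopen()           # every fire reopens the changed database
    return "ok"
```

Until `reopen` is called, each fire keeps answering searches from its existing view, which illustrates how document changes can proceed online without stopping the search service.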

The method of claim 22, wherein the information retrieval management method deletes a single document by:

Transmitting, from the client to the job scheduler, the ID of the document to be deleted;

Transmitting, by the job scheduler, the ID of the document to be deleted received from the client to the data manager;

Deleting the document in the data manager, and then transmitting a result to the job scheduler;

Instructing, by the job scheduler, the set manager to update the result set if the deletion succeeded;

Sending, by the job scheduler, a final result to the first to nth fires; and

Reopening the database in the first to nth fires and transmitting a result to the client.