KR101744017B1

KR101744017B1 - Method and apparatus for indexing data for real time search

Info

Publication number: KR101744017B1
Application number: KR1020160077358A
Authority: KR
Inventors: 송상욱
Original assignee: 주식회사 지앤클라우드
Priority date: 2016-03-11
Filing date: 2016-06-21
Publication date: 2017-06-07

Abstract

A data indexing method and apparatus for real-time search are provided. The data indexing method, performed on a computing device for real-time search, includes: recording documents held in memory to a log file; selecting a certain amount of documents that include at least part of the information read from the log file; generating at least one temporary segment for the documents; exposing the at least one temporary segment to searches by a search engine; and, if a document contained in the at least one temporary segment is being merged while the at least one temporary segment is exposed, generating a deletion request file that stores the identifier of the corresponding deletion candidate document.

Description

The present invention relates to real-time search technology, and more particularly, to a data indexing method and apparatus for real-time search.

An index is a data structure that speeds up operations on tables in a database. An index is usually created from a single column or from several columns of a table. Used well, an index structure enables both efficient ordered access to records and fast lookups during data retrieval.

The disk space needed to store an index is generally smaller than the space needed to store the table, because the table holds the other detail fields whereas the index holds only the key fields. In a relational database, an index can be a copy of part of a table.

An index is also used to enforce a uniqueness constraint on data. Because a unique index prevents duplicate entries from being registered, uniqueness is guaranteed in the indexed table.

With recent advances in electrical, electronic, and communication technologies, enormous amounts of data are being generated on the Internet and over wired and wireless networks. In particular, companies that provide social network services or portal services on the Internet face the practical problem of processing large structured or unstructured data sets that exceed what existing database management tools can collect, store, manage, and analyze (hereinafter, "big data"), or of extracting value from big data and analyzing the results.

For social network service (SNS) data, news articles, news comments, search keywords, and the like, the newer the data, the more important it is, so a way to reflect the latest data in a database efficiently and quickly is required. Moreover, data tied to topical issues, such as SNS data, news articles, news comments, and search keywords, often surges in volume, and a technique that can index the accumulating data quickly and reliably under such surges is required.

Korean Patent No. 10-0835706 (May 30, 2008)

To solve the problems described above, an object of the present invention is to provide a data indexing method and apparatus for real-time search that can quickly process data that conventional indexing techniques cannot readily handle.

Another object of the present invention is to provide a data indexing method and apparatus that enable effective real-time search of data for which the most recent entries carry the highest importance.

To achieve the above object, one aspect of the present invention provides a data indexing method for real-time search performed on a computing device, the method comprising: recording documents held in memory to a log file; selecting a certain amount of documents that include at least part of the information read from the log file; generating at least one temporary segment for the documents; exposing the at least one temporary segment to searches by a search engine; and, if a document contained in the at least one temporary segment is being merged while the at least one temporary segment is exposed, generating a deletion request file that stores the identifier of the corresponding deletion candidate document.

Here, the file name of the deletion request file may include a number determined as a time-based fixed value.

Here, a base reference count may be set for the deletion request file. The base reference count may be decremented by a fixed amount each time the deletion request file is referenced by a merge.

Here, the data indexing method may further comprise, after the generating step, deleting the documents that correspond to a deletion request file whose base reference count has fallen to or below a lower limit as a result of references during merging.
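The reference-counting behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `DeleteRequest`, the helper `apply_due_requests`, the `.delete` suffix, the default count of 2, and the use of a nanosecond timestamp as the "time-based fixed value" are all hypothetical choices made for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class DeleteRequest:
    """In-memory stand-in for one deletion request file (hypothetical layout)."""
    doc_ids: list
    ref_count: int = 2                                   # assumed base reference count
    stamp: int = field(default_factory=time.time_ns)    # assumed time-based number

    @property
    def file_name(self) -> str:
        # The ".delete" suffix is illustrative only; the source does not name one.
        return f"{self.stamp}.delete"

    def reference(self, step: int = 1) -> None:
        """Decrement the count each time a merge consults this request."""
        self.ref_count -= step

    def is_due(self, lower_limit: int = 0) -> bool:
        """True once the count has reached or dropped below the lower limit."""
        return self.ref_count <= lower_limit


def apply_due_requests(live_docs: set, requests: list, lower_limit: int = 0) -> list:
    """Delete the documents of due requests; return the requests still pending."""
    pending = []
    for req in requests:
        if req.is_due(lower_limit):
            live_docs.difference_update(req.doc_ids)
        else:
            pending.append(req)
    return pending
```

In this sketch a request is only acted on after every merge that needed it has called `reference()`, which is one way to avoid deleting a document while a merge still depends on the request.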

Here, the data indexing method may further comprise, after the exposing step, checking the number of live documents at the time of exposure. The range of live document counts may be classified into logarithmic ranges.

Here, the data indexing method may further comprise, after the checking step, processing merge indexing concurrently for each of the plurality of merge target segments classified into the logarithmic ranges.

Here, the logarithmic ranges may include classification ranges in which the number of documents contained in each segment is 0, 100, 1K, 10K, 100K, and 1M.
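One way to read the logarithmic ranges is as lower bounds of half-open buckets, so that a segment with 250 live documents falls in the "100" range. The sketch below assumes that reading; the names `RANGE_BOUNDS`, `doc_level`, and `group_by_level` are hypothetical and do not come from the source.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower bounds taken from the text: 0, 100, 1K, 10K, 100K, 1M.
# Treating them as bucket lower bounds is an assumption of this sketch.
RANGE_BOUNDS = [0, 100, 1_000, 10_000, 100_000, 1_000_000]


def doc_level(live_docs: int) -> int:
    """Return the index of the range whose lower bound is <= live_docs."""
    if live_docs < 0:
        raise ValueError("live document count cannot be negative")
    return bisect_right(RANGE_BOUNDS, live_docs) - 1


def group_by_level(segments: dict) -> dict:
    """Map level -> segment names, given {segment name: live document count}."""
    groups = defaultdict(list)
    for name, count in segments.items():
        groups[doc_level(count)].append(name)
    return dict(groups)
```

Segments that land in the same level are of comparable size, which is what makes it reasonable to merge each level as its own job and to run those jobs side by side.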

Here, in the recording step, a document logger may generate the log file by appending documents to a file at every flush period and replacing the file at every rolling period.

Here, the document logger may be connected to two memories or memory areas that are swapped at each flush in which the log file is read. The log file is inserted into a file queue, the file queue contains log file state objects, and each log file state object may include a file object and a flag indicating whether the file is closed.
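The document logger described above can be sketched with two swappable buffers and a queue of file state records. This is a simplified illustration under stated assumptions: `DocumentLogger`, `LogFileState`, and the `indexlog.N` naming with a sequence suffix are hypothetical, a path string stands in for the file object, and `flush()` and `roll()` are called by hand here rather than by the periodic timers the text implies.

```python
import os
from collections import deque
from dataclasses import dataclass


@dataclass
class LogFileState:
    path: str             # stands in for the "file object" named in the text
    closed: bool = False  # whether the logger has finished writing this file


class DocumentLogger:
    def __init__(self, directory: str):
        self.directory = directory
        self.active = []           # buffer that receives new documents
        self.standby = []          # buffer swapped in at flush time
        self.file_queue = deque()  # log files in creation order
        self.seq = 0
        self._open_next_file()

    def _open_next_file(self) -> None:
        path = os.path.join(self.directory, f"indexlog.{self.seq}")
        self.seq += 1
        self.file_queue.append(LogFileState(path))

    def add(self, doc: str) -> None:
        self.active.append(doc)

    def flush(self) -> None:
        """Swap the buffers, then append the filled one to the current log file."""
        self.active, self.standby = self.standby, self.active
        current = self.file_queue[-1]
        with open(current.path, "a", encoding="utf-8") as fh:
            for doc in self.standby:
                fh.write(doc + "\n")
        self.standby.clear()

    def roll(self) -> None:
        """Mark the current file closed and start writing to a new one."""
        self.file_queue[-1].closed = True
        self._open_next_file()
```

Swapping the buffers before writing means new documents can keep arriving in the empty buffer while the filled one is written out, and the closed flag tells a reader which files in the queue are safe to consume in full.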

Here, the data indexing method may further comprise, before the recording step, obtaining at the document logger a signal or information about an index request job.

Here, the selecting step may extract a predetermined amount or number of log files according to the indexing capacity or performance of the computing device.

To achieve the above object, another aspect of the present invention provides a data indexing apparatus comprising a processor, a memory connected to the processor, and a program stored in the memory, wherein the processor performs the functions of a document logger and a dynamic index scheduler by means of the program. The document logger records documents held in the memory to a log file. The dynamic index scheduler selects a certain amount of documents that include at least part of the information read from the log file, generates at least one temporary segment for the documents, exposes the at least one temporary segment to searches by a search engine, and, when a deletion candidate document contained in the at least one temporary segment is being merged, generates a deletion request file that stores the identifier of the deletion candidate document.

Here, the processor may further perform the function of a merging index scheduler by means of the program, and the merging index scheduler may merge a plurality of index segments dynamically indexed by the dynamic index scheduler. The processor may also assign a temporary ID to an index segment that is dynamically indexed during a merge indexing job of the merging index scheduler, and may generate a deletion request file whose file name includes the temporary ID.

Here, when exposing to search a second index segment generated by the merging index scheduler, the processor may use the deletion request file to delete documents contained in a first index segment that is already exposed to search.

To achieve the above object, yet another aspect of the present invention provides a method of operating an index structure, being a data processing method performed by an indexer that includes an index scheduler, the method comprising: reading, by the index scheduler, a log (LOG) file recorded in memory; selecting a certain amount of documents based on the information read from the LOG file; generating a temporary segment for the documents; exposing the temporary segment to search; and, if a merge involving the temporary segment is in progress or a merged document exists, storing the identifier of the document in a deletion request file.
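One pass of the index scheduler can be sketched as below. The sketch rests on an interpretation that the source does not spell out: a document becomes a deletion candidate when a copy of it also sits in a segment that is currently being merged. The names `Index`, `index_once`, and `merging_doc_ids` are hypothetical, and plain lists and dictionaries stand in for log files and segments.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    exposed: list = field(default_factory=list)          # segments visible to search
    delete_requests: list = field(default_factory=list)  # one id list per request file
    merging_doc_ids: set = field(default_factory=set)    # ids inside segments under merge


def index_once(index: Index, log_lines: list, batch_size: int = 3) -> list:
    """Select a batch, build and expose a temporary segment, record deletions.

    Returns the log lines left over for the next pass.
    """
    batch, rest = log_lines[:batch_size], log_lines[batch_size:]
    if not batch:
        return rest
    # Build the temporary segment and expose it to search straight away.
    index.exposed.append({"docs": list(batch), "temporary": True})
    # Assumed rule: ids that are also being merged become deletion candidates.
    candidates = [doc for doc in batch if doc in index.merging_doc_ids]
    if candidates:
        index.delete_requests.append(candidates)
    return rest
```

The point of the sketch is the ordering: the new segment is searchable before any merge finishes, and the deletion request records what the merge output must later drop.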

To achieve the above object, yet another aspect of the present invention provides a method of operating an index structure, being a data processing method performed by an indexer that includes a merging scheduler, the method comprising: periodically checking, by the merging scheduler, a segment list in the index data; selecting the segments to be merged from the segment list; grouping the segments to be merged by document level; merging the segments for each document level; generating a temporary segment per document level for the merged segments; reading, during the merging step, the segment list of a deletion request file; and, if a segment belonging to that segment list exists, deleting that segment from the temporary segment.
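The grouping and merging steps above can be sketched as follows. This is a simplification under stated assumptions: the deletion request is modeled as a set of document ids rather than a segment list, each segment already carries its document level, and a level is merged only when it holds two or more segments. The function name `merge_by_level` is hypothetical.

```python
from collections import defaultdict


def merge_by_level(segments: dict, delete_ids: set) -> dict:
    """Merge segments level by level.

    segments maps a segment name to (document level, list of document ids).
    Returns {level: merged temporary segment} for levels with 2+ segments.
    """
    by_level = defaultdict(list)
    for name in sorted(segments):          # deterministic merge order
        level, _docs = segments[name]
        by_level[level].append(name)

    merged = {}
    for level, names in by_level.items():
        if len(names) < 2:
            continue                       # a lone segment needs no merge
        docs = [doc for name in names for doc in segments[name][1]]
        # Apply the deletion request while building the temporary segment.
        merged[level] = [doc for doc in docs if doc not in delete_ids]
    return merged
```

Because each level produces its own temporary segment, the per-level merges are independent of one another and could be dispatched concurrently.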

To achieve the above object, yet another aspect of the present invention provides an index structure comprising: an index request processing unit that records index-requested documents held in a storage unit to a log (LOG) file; an index scheduler that reads the LOG file and performs dynamic indexing; and a merging scheduler that merges segments selected from a segment list in the indexed data according to a merge policy, wherein the merged segments are generated as a temporary segment and exposed to search.

Here, the index structure may correspond to an indexer, or to an index, index node, index server, or index service apparatus that performs an index processing method or an indexing operation method.

When the index structure for real-time search and its operating method according to the embodiments of the present invention described above are used, indexing of a given amount of big data can be completed markedly faster than with the prior art, for example within a few seconds.

In addition, the new index structure allows documents such as data to be searched quickly, and efficient management of the index structure allows new data to be indexed and merged into the existing data index, which contributes to a fast and accurate search service.

In addition, documents for which newer data carries higher importance, such as social network service (SNS) data, news articles, news comments, and search keywords, can be reflected in the database efficiently and quickly.

In addition, even when data tied to topical issues, such as SNS data, news articles, news comments, and search keywords, surges in volume, the data can be indexed quickly and reliably.

FIG. 1 is a schematic flowchart of a data indexing method (hereinafter simply the "indexing method") according to one embodiment of the present invention.
FIG. 2 is a flowchart of an indexing method according to another embodiment of the present invention.
FIG. 3 is a flowchart of an index request procedure that can be employed in the indexing method according to the present embodiment.
FIG. 4 is a flowchart of an indexing procedure that can be employed in the indexing method according to the present embodiment.
FIG. 5 is a flowchart of a merging procedure that can be employed in the indexing method according to the present embodiment.
FIG. 6 is a diagram for explaining the operating principle of real-time indexing in the indexing method according to the present embodiment.
FIG. 7 is a diagram for explaining the operating principle of the document logger of FIG. 6.
FIG. 8 is a diagram for explaining the operating principle of the index scheduler of FIG. 6.
FIG. 9 is a diagram for explaining the operating principle of the dynamic index module of FIG. 6.
FIG. 10A is a diagram for explaining the life cycle of the index segment of FIG. 6.
FIG. 10B is a diagram for explaining another embodiment of the life cycle of the index segment of FIG. 6.
FIG. 11 is a diagram illustrating a timeline of the life cycle of the index segment of FIG. 6.
FIG. 12 is a block diagram of a data indexing apparatus according to another embodiment of the present invention.
FIG. 13 is a block diagram of a data indexing apparatus according to yet another embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to particular embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. Like reference numerals are used for like elements throughout the description of the drawings.

Terms such as first, second, A, and B may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be termed a second element, and similarly a second element may be termed a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to that other element, but it should be understood that intervening elements may be present. In contrast, when an element is said to be "directly connected" or "directly coupled" to another element, it should be understood that no intervening elements are present.

The terminology used in this specification is used only to describe particular embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" and "have" are intended to specify the presence of the stated features, numbers, steps, operations, elements, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Also, unless it would cause confusion, when a subscript of a character itself has a further subscript, that further subscript may be shown in the same size or form as the subscript for convenience of notation.

Unless otherwise defined in this specification, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless explicitly so defined in this specification.

Some of the key terms used in this embodiment are defined as follows.

With regard to the term "real-time search," this embodiment provides a method by which an indexed document becomes searchable within a few seconds.

The term "node" may refer to an engine daemon process running within a server. When only one engine runs on a server, "node" may also refer to the server itself. For example, a search node may be a search server, and an index node may be an index server.

The term "segment" refers to a collection of documents that forms the basic unit of index data, and may refer to an actual file directory. Each time indexing runs, the number of segments may grow by one. In this embodiment, a single search may query all segments exposed to the current search environment and combine the results from all of them into one final result (another segment).
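The search behavior described for segments, querying every exposed segment and combining the hits, can be sketched as below. This is a toy model under stated assumptions: each segment is a dictionary from document id to text, matching is a plain substring test, and the function name `search` is hypothetical.

```python
def search(exposed_segments: list, term: str) -> list:
    """Query every exposed segment and combine the hits into one sorted list."""
    hits = set()
    for segment in exposed_segments:
        # Each segment is modeled as {document id: document text}.
        hits.update(doc_id for doc_id, text in segment.items() if term in text)
    return sorted(hits)
```

The cost of a query in this model grows with the number of exposed segments, which is the motivation the text gives for periodic merging.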

The term "merging" refers to the process of combining several segments into one. Without merging, the number of segments grows too large, searches slow down, and disk space may run short. This embodiment therefore reduces the number of files through periodic merging. Merging works on a principle broadly similar to disk defragmentation on a computer, but its specific operation differs, as described in detail below.

The term "dynamic indexing" refers to an indexing scheme that indexes only the newly updated documents and appends the result to the existing index. With dynamic indexing, the latest data can be indexed regularly while growth in the number of indexed segments is held down, so that documents can be searched quickly and accurately.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic flowchart of a data indexing method (hereinafter simply the "indexing method") according to one embodiment of the present invention.

Referring to FIG. 1, the indexing method according to this embodiment processes an index of data for real-time search: index-requested documents are first accumulated in files, and the documents are then read from the files so that indexing completes within a few seconds. The indexing method may be implemented by a computing device, or by an indexing apparatus that includes a computing device, capable of searching the indexed documents and managing the index files appropriately.

That is, the indexing method according to this embodiment may receive an index request for given data and execute an index request job within the indexing apparatus in response (S11). In particular, when the index request job runs, the documents to be indexed are extracted by pulling a fixed size or fixed number of log files from the log files prepared for the full set of incoming documents, so that the subsequent indexing and merging jobs can be performed effectively.

Next, the index-requested data or documents may be dynamically indexed (S12). Dynamic indexing is an indexing scheme that indexes only the newly updated documents and appends the result to the existing index. Running this dynamic indexing job effectively removes redundant segments and keeps the number of segments exposed to real-time search under control.

Next, a plurality of segments may be merged during or after the indexing job or dynamic indexing job (S13). Running this merging job, or merge indexing job, removes redundant segments even more effectively.

In this embodiment the indexing method is described as executing the index request job, the indexing job (or dynamic indexing job), and the merging job (or merge indexing job) sequentially in that order, but it is not limited to this, and depending on the implementation the merge indexing job may be omitted.

FIG. 2 is a flowchart of an indexing method according to another embodiment of the present invention.

Referring to FIG. 2, the indexing method according to this embodiment may process a plurality of dynamic indexes simultaneously for the index-requested data through a plurality of dynamic indexing processes. That is, a first dynamic indexing (S12a) and a second dynamic indexing (S12b) may be performed asynchronously or independently, and depending on the implementation may be processed concurrently.

Also, depending on the implementation, the indexing method according to this embodiment may perform the dynamic indexing (S12a and/or S12b) and the merge indexing (S13) asynchronously, independently, or in parallel with each other.
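Running several dynamic indexing jobs side by side, as S12a and S12b suggest, can be sketched with a thread pool. This is an illustration under stated assumptions, not the method of the source: `build_segment` is a toy inverted-index builder, `index_concurrently` is a hypothetical name, and threads stand in for whatever processes the actual indexer uses.

```python
from concurrent.futures import ThreadPoolExecutor


def build_segment(batch: list) -> dict:
    """Toy dynamic index: an inverted index from term to sorted document ids."""
    inverted = {}
    for doc_id, text in batch:
        for term in set(text.split()):
            inverted.setdefault(term, []).append(doc_id)
    return {term: sorted(ids) for term, ids in inverted.items()}


def index_concurrently(batches: list, workers: int = 2) -> list:
    """Build one segment per batch; jobs run independently of one another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input batches.
        return list(pool.map(build_segment, batches))
```

Each job works on its own batch and produces its own segment, so no locking is needed between jobs in this model; coordination only becomes necessary when segments are exposed or merged.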

For example, from the point of view of a segment, each segment may undergo only the index request (see S11 in FIG. 1), only the dynamic indexing (S12a and/or S12b), or only the merge indexing (S13). Depending on the implementation, a combination of two or more of the index request, dynamic indexing (S12a or S12b), and merge indexing (S13) procedures may also be performed.

When only the index request is performed, the indexed data may be the result of primary processing by an existing indexing method. Such indexed data may be stored in a storage system such as a memory, and then processed in a separate procedure by a dynamic index scheduler or a merging index scheduler and output as indexed data. The dynamic index scheduler may be referred to simply as the index scheduler, and the merging index scheduler may be referred to as the merging scheduler.

도 3은 본 실시예에 따른 인덱싱 방법에 채용할 수 있는 색인 요청 절차에 대한 순서도이다.3 is a flowchart of an index request procedure that can be employed in the indexing method according to the present embodiment.

도 3을 참조하면, 본 실시예에 따른 인덱싱 방법은 색인 요청을 위해 먼저 색인서버의 API(application programming interface) 리스너(listener) 등을 통해 문서 색인 요청을 수신할 수 있다(S111). 문서 색인 요청에 대한 최초 신호는 사용자 단말이나 관리자 단말로부터 색인서버로 전달될 수 있다.Referring to FIG. 3, the indexing method according to the present embodiment may receive a document indexing request through an application programming interface (API) listener of the index server, for example, at step S111. The initial signal for the document index request may be passed from the user terminal or the administrator terminal to the index server.

API 리스너로부터의 문서 색인 요청에 따라 즉, 색인서버의 작업 풀(job pool) 내에 색인요청 작업이 생성됨에 따라 색인서버 혹은 색인서버에 대응하는 인덱서(indexer)가 색인 문서를 메모리에 기록하거나 유지할 수 있다(S112). 색인서버나 인덱서는 동적 색인 모듈을 포함할 수 있다.An index server corresponding to an index server or an index server can write or maintain an index document in memory as an index request job is created within a job pool of the index server, that is, according to a document index request from an API listener (S112). An index server or indexer may include a dynamic index module.

메모리에 색인 문서를 유지하는 것은 색인 작업 대상임을 지정하거나 색인 작업 대상을 저장하는 메모리의 특정 위치의 저장 공간에 색인 문서를 기록하도록 구현될 수 있다. 메모리는 저장 시스템이나 저장부의 적어도 일부로 구현될 수 있다.Maintaining an index document in memory may be implemented to specify that it is an indexing target or to write an index document to a storage location at a particular location in memory that stores the indexing destination. The memory may be implemented as at least a portion of a storage system or storage.

That is, the holding step (S112) may be implemented so that documents requested for indexing are first accumulated in files having a predefined structure or format, and a fixed amount of documents is then read based on a fixed size or a fixed number of log files for the accumulated files. Here, reading the files may be performed by the document logger, which may be included in the index request processing unit.

Next, the index request processing unit may periodically write the documents in memory to a log (LOG) file (S113). The index request processing unit may be included in the index server or in the dynamic index module. The LOG file may have the file name "indexlog" and may be stored in the form of a value corresponding to a document in a segment, for example 2321845350978868.

FIG. 4 is a flowchart of an indexing procedure that can be employed in the indexing method according to the present embodiment.

Referring to FIG. 4, in the indexing method according to the present embodiment, the index scheduler may first read the LOG file (S41) and then select an appropriate amount of documents (S42).

Selecting an appropriate amount of documents may be implemented by reading an appropriate amount of log files. That is, the index scheduler may be implemented to read a number or size of log files that is set to match the performance or capacity of the indexing apparatus, or preset according to the characteristics of the data being processed (big data), and to select the documents corresponding to each log file. For big data, the size or number of log files read at one time is preferably 20 megabytes (MB) or 10,000.

Next, the index scheduler may index the selected documents. In this embodiment, the term 'index' may be used interchangeably with 'dynamic index'. The indexed documents may be classified by type, level, and so on, and generated as temporary segments in fixed amounts (S43).

Once the temporary segments are generated, basic indexing can be regarded as complete. In this embodiment, however, the following dynamic indexing steps may additionally be performed for improved dynamic indexing.

The index scheduler exposes the temporary segment to search (S44). Here, exposing to search means storing the temporary segment in a memory or storage system so that it can be searched by a device that is connected to the indexing apparatus and searches the database in response to external queries.

Next, the index scheduler may determine whether any of the documents in the temporary segment exposed to search is currently being merged (S45).

If a document being merged exists, the index scheduler may store the identifier of that document (document ID) in a deletion request file (S46). The deletion request file may be named 'DELETE.REQ'. The deletion request file refers to a file that is to be deleted later, for reasons such as duplication, once a predetermined condition is satisfied.

If, on the other hand, none of the documents in the temporary segment exposed to search is being merged, the index scheduler may expose the segment to search again after a certain time, or after a certain time and with certain conditions (exposure range, level, and so on) changed. This process is a variation of step S44 and may serve to confirm that the documents in the temporary segment are not involved in merging or any other job.

Further, if the determination in step S45 is that none of the documents in the temporary segment is being merged, or that none of them is involved in another job, the index scheduler may complete dynamic indexing for the segment (S47).
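Steps S44 to S46 can be illustrated with a minimal in-memory sketch. The names TempSegment and finish_dynamic_index, and the use of a dictionary to stand in for the DELETE.REQ file, are assumptions made only for illustration and do not appear in the embodiment.

```python
from dataclasses import dataclass


@dataclass
class TempSegment:
    """Hypothetical stand-in for a temporary segment."""
    doc_ids: list
    exposed: bool = False


def finish_dynamic_index(segment, merging_doc_ids, delete_requests):
    """Expose a temporary segment (S44) and record documents that are being merged (S45, S46)."""
    segment.exposed = True  # S44: the segment becomes searchable
    # S45: find documents of this segment that a merging job is currently using
    in_merge = [d for d in segment.doc_ids if d in merging_doc_ids]
    if in_merge:
        # S46: store the document IDs in the deletion request file (a dict entry here)
        delete_requests.setdefault("DELETE.REQ", []).extend(in_merge)
    return in_merge
```

When no document is being merged, the function returns an empty list and the caller can treat dynamic indexing of the segment as complete (S47).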

FIG. 5 is a flowchart of a merging procedure that can be employed in the indexing method according to the present embodiment.

Referring to FIG. 5, in the indexing method according to the present embodiment, the merging scheduler may first periodically check the segment list in the index data. The merging scheduler may then select the segments to merge based on the checked segment list (S51).

The segment list may include information that distinguishes segments to be merged from other segments. In this embodiment, the term 'merging' may be used interchangeably with 'merging index'.

If, as a result of the checking and selecting step (S51), there is no segment to merge (No in S52), the merging scheduler may wait for the next cycle of the merging index job.

If, on the other hand, there are segments to merge (Yes in S52), the merging scheduler may convert, store, or generate the segments to be merged as temporary segments grouped according to a classification policy (S53). The classification policy may include criteria for classifying documents, such as by document level or by document type.

As a result of the merging job, temporary segments bundled by group may be generated. Once they are generated, the basic merging index job can be regarded as complete. In this embodiment, however, the following additional steps may be performed to support real-time search of the data.

The merging scheduler exposes the temporary segments to search (S54). The temporary segments exposed to search may be stored, arranged, or listed in chronological order together with the temporary segments generated by dynamic indexing.

Next, the merging scheduler may determine whether a deletion request file generated during a dynamic index job or a merging index job exists (S55). If so, the merging scheduler may read the deletion request file and delete the temporary segment for the corresponding document ID, or delete the document corresponding to the temporary segment (S56). If not, the merging scheduler may skip deleting the specific temporary segment.

Next, the merging scheduler may determine whether the deletion request file (for example, DELETE.REQ) is no longer in use elsewhere (by another job, for example) (S57). If so, the merging scheduler may read the deletion request file and delete the temporary segment corresponding to the document ID in the file, or delete the document corresponding to the temporary segment (S58). If the determination in step S57 is No, the merging scheduler may skip deleting the specific temporary segment.

The merging index job may be completed according to the procedure described above (S59).

Although the two steps S55 and S57 have been described as being performed sequentially in this embodiment, the present invention is not limited to that configuration. Each step (S55 or S57) may be performed independently or in parallel, and depending on the implementation the steps may be performed in the reverse order (that is, S57 followed by S55).

According to the embodiment described above, by repeatedly performing the dynamic index job and the merging index job, big data arriving in real time can be indexed quickly while producing a small amount of index data, which enables effective real-time search over big data and similar data.

FIG. 6 is a diagram for explaining the operating principle of real-time indexing in the indexing method according to the present embodiment.

Referring to FIG. 6, the indexing method according to the present embodiment may be implemented by an index server including a dynamic index module 60 and an API listener 61. The index server may create and process the jobs required for real-time indexing in a job pool 62 held in a buffer or memory. The dynamic index module 60 may include a document logger 63, a dynamic index scheduler 65, and a merging index scheduler 66.

The operation of the index server is described in more detail below.

When an index request (/service/index) is received (S61), the API listener 61 may forward the index request to the preset job pool 62 (S62).

When the index request is forwarded, the job pool 62 creates an index document request job (IndexDocumentRequestJob) and provides an environment in which the index server can execute it. That is, the index request job created in the job pool 62 may be passed to the document logger 63 of the dynamic index module (S63). Here, the document logger 63 may be a time-based rolling document logger (TimeBaseRollingDocumentLogger).

The job pool 62 may also be referred to as a memory pool. The job pool 62 may include a logical area of main memory or storage that is reserved for processing one job or a group of jobs. In the system of the index server, all main storage may be divided into logical allocations called memory pools. The system including the index server may, by default, manage the transfer of data and programs to the job pool 62.

That is, the memory pool from which a user job obtains memory is substantially the same pool that limits the job's activity level. The activity level of a memory pool may be the number of threads that can be active in the pool at the same time. System jobs, which obtain memory from the base pool but use the machine pool activity level, may be an exception. Likewise, a subsystem monitor obtains memory from the first subsystem description pool but may use the machine activity level. Such subsystem monitors can always run regardless of the activity level setting.

The API listener 61 described above may basically perform a document insert action (InsertDocumentsAction), a document update action (UpdateDocumentsAction), a document delete action (DeleteDocumentsAction), or a combination of these.

Next, the document logger 63 may generate an index log (indexlog, 71) for the documents or segments in the file (S64). The index log 71 may include log records in numeric form, such as 2321845350978868.

In addition, the document logger 63 may run at every flush period to generate index log files of a fixed number or size, and may insert these files into a file queue 64 of a predetermined format.

Next, when the file queue 64 containing the index log files is delivered from the document logger 63, the dynamic index scheduler (IndexFireScheduleWorker, 65) of the dynamic index module 60 may create a dynamic index job in the job pool 62 (S66) and generate a segment through the dynamic index job (S67). The generated segment may be stored as a temporary segment in a specific area of memory and exposed to search as an index segment.

Meanwhile, the merging index scheduler 66 of the dynamic index module 60 may periodically check the index segments (S68). The merging index scheduler 66 may then merge multiple segments within the index segments through a first merging index job (Job-1), a second merging index job (Job-2), and so on, created in the job pool 62 per fixed number of segments, per fixed job, or per fixed time unit (S69). The merged segments may be stored in a specific area of memory and exposed to search as another specific segment among the index segments (S69a).

Although the dynamic index module 60 has been described in this embodiment as including the document logger 63, the dynamic index scheduler 65, and the merging index scheduler 66, the index server of the present invention is not limited to that configuration, and the merging index scheduler 66 may be implemented as a separate module outside the index server. The document logger 63 may likewise be implemented as a separate document request processing module outside the index server.

FIG. 7 is a diagram for explaining the operating principle of the document logger of FIG. 6.

Referring to FIG. 7, the document logger according to the present embodiment may be implemented by a software module or program stored in memory and a processor that executes it. The document logger basically accesses the memory 71 or memory area that stores input documents, reads the memory data, converts and/or splits the documents it has read into index log files 75, and inserts them into the file queue 64.

To describe the operation of the document logger more specifically, the document logger first accumulates the input documents, that is, the documents requested for indexing, in a file in the memory 71. The memory 71 may include a first memory 71a and a second memory 71b. The first memory 71a and the second memory 71b may of course correspond to a first memory area and a second memory area within a single memory.

Next, the document logger may accumulate the input documents stored in the memory 71 as memory data using double buffering, swapping buffers at each flush. The document logger may also be configured to perform a flush check task 72 (S72). The flush check task checks, at every flush period, the documents that are stored alternately in the multiple memories or memory areas.

In one embodiment, the document logger may append documents to the file at every flush period (S73). For example, if the flush period is set to 1 second, the side performing the indexing (the dynamic index scheduler, for example) can receive updated documents every second.

In another embodiment, the document logger may replace the file at every rolling period (S74). For example, when the rolling period is set to a default of 30 seconds or 1 minute, a single file in the document logger can be kept from growing excessively large, compared with a setting of less than 30 seconds.
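The double buffering, flush, and rolling behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name DoubleBufferedLogger is hypothetical, the two lists stand in for the first and second memory areas 71a and 71b, the "file" is an in-memory list, and the timers that would call flush() and roll() at each period are omitted.

```python
class DoubleBufferedLogger:
    """Hypothetical sketch of a document logger with two alternating buffers."""

    def __init__(self):
        self._buffers = ([], [])   # stand-ins for memory areas 71a and 71b
        self._active = 0           # index of the buffer currently receiving documents
        self.current_file = []     # stand-in for the open index log file
        self.closed_files = []     # files already rolled and ready for the file queue

    def log(self, doc):
        """Accumulate an index-requested document in the active buffer."""
        self._buffers[self._active].append(doc)

    def flush(self):
        """Swap buffers, then append the filled buffer to the current file (S73)."""
        filled = self._buffers[self._active]
        self._active = 1 - self._active   # new documents now go to the other buffer
        self.current_file.extend(filled)
        count = len(filled)
        filled.clear()
        return count

    def roll(self):
        """Close the current file and start a new one (S74)."""
        if self.current_file:
            self.closed_files.append(self.current_file)
            self.current_file = []
```

Swapping the active buffer before writing lets new documents keep arriving while the filled buffer is drained, which is the usual motivation for double buffering.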

The index log (indexlog) files 75 generated by the file logger store the index log records corresponding to each document and may be given predetermined file names. For example, nanoTime(), an API method that returns a time value, may be used to generate a number fixed in time (such as 2327473223121315) as the file name.

The files 75 in which the index log is recorded may be delivered to the dynamic index scheduler through a file queue. The file queue 64 is a queue that writes the given content to a file on an in or put operation, and reads the content from the file and returns it on an out or get operation. The file queue 64 may contain log file status (LogFileStatus) objects. A log file status object may include a file object and information on whether the file is closed (isClosed).
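A file-backed queue of this kind might look like the sketch below. Only the names LogFileStatus and isClosed come from the description; the FileQueue class, its method signatures, the newline-separated file layout, and the removal of the file after reading are assumptions chosen for illustration. Python's time.monotonic_ns() is used as a rough analogue of nanoTime() for the numeric file name.

```python
import os
import tempfile
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class LogFileStatus:
    """File reference plus a closed flag, mirroring the description."""
    path: str
    is_closed: bool = False


class FileQueue:
    """Hypothetical queue whose put writes a file and whose get reads it back."""

    def __init__(self, directory):
        self._dir = directory
        self._items = deque()

    def put(self, lines):
        """Write the documents to a new numerically named file and enqueue it."""
        stamp = time.monotonic_ns()
        path = os.path.join(self._dir, str(stamp))
        while os.path.exists(path):   # avoid a clash on coarse clocks
            stamp += 1
            path = os.path.join(self._dir, str(stamp))
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))
        self._items.append(LogFileStatus(path, is_closed=True))
        return path

    def get(self):
        """Read the oldest file, return its documents, and remove the file."""
        status = self._items.popleft()
        with open(status.path, encoding="utf-8") as f:
            text = f.read()
        os.remove(status.path)
        return text.split("\n") if text else []
```

A short usage example: create the queue over a temporary directory with tempfile.TemporaryDirectory(), call put(["doc1", "doc2"]), and a later get() returns the same two documents.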

FIG. 8 is a diagram for explaining the operating principle of the index scheduler of FIG. 6, that is, the dynamic index scheduler.

Referring to FIG. 8, the dynamic index scheduler according to the present embodiment reads documents from the files 75 delivered through the file queue 64 (S81 and S82) and completes the index jobs within a few seconds (S84a, S84b, S84c). It then exposes the indexed documents to the search nodes so that they can be searched (S85a, S85b, S85c).

The operation of the dynamic index scheduler is described in more detail below. When a file is received from the document logger through the file queue 64, the dynamic index scheduler may delete the corresponding physical file 75 once the queued file has been closed and reading of its documents has been completed (S81).

Next, the dynamic index scheduler extracts the documents to index from the documents it has read (S82). The extraction may be implemented so that at most 20 MB or 10,000 documents are pulled and indexed at a time. These extraction conditions may be stored in advance as a first setting (dynamic.max_log_count), a second setting (dynamic.max_log_size_MB), and so on. Under these conditions, a large amount of data is split into chunks of a fixed size (20 MB) or a fixed count (about 10,000 documents) and indexed, so that indexing of the entire data can be completed within a few seconds.
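The extraction condition can be illustrated with a short sketch. The function name take_batch is hypothetical, the parameters are named after the dynamic.max_log_count and dynamic.max_log_size_MB settings, and both the UTF-8 size measurement and the reading of 1 MB as 1024 × 1024 bytes are assumptions.

```python
def take_batch(docs, max_log_count=10_000, max_log_size_mb=20):
    """Split docs into (batch, rest) so that the batch respects both limits."""
    limit_bytes = max_log_size_mb * 1024 * 1024
    batch, size = [], 0
    for doc in docs:
        doc_bytes = len(doc.encode("utf-8"))
        # The first document is always accepted so that a single oversized
        # document cannot stall the scheduler.
        if batch and (len(batch) >= max_log_count or size + doc_bytes > limit_bytes):
            break
        batch.append(doc)
        size += doc_bytes
    return batch, docs[len(batch):]
```

Calling take_batch repeatedly on the returned rest yields successive chunks, each of which can be handed to a separate dynamic index job.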

Next, the dynamic index scheduler may distribute the extracted index documents (the documents requested for indexing) to multiple dynamic index jobs (S84a, S84b, S84c) (S83). Each of the dynamic index jobs (S84a, S84b, S84c) performs its dynamic indexing and may pass the dynamically indexed segment to its search node (S85a, S85b, S85c). The search nodes may add the dynamically indexed segment to the index segments as a new segment (newer) (S86). The dynamically indexed segment may be included in the searchable data 82 as a newly added segment (newer) among the index segments 83 exposed to search.

Here, if a segment of a search node is included in the segment list in the deletion request file, that is, if the segment is part of a job being merged, the dynamic index scheduler may generate a separate deletion request file (delete.req) 84 for that segment (S87). The deletion request file 84 may be stored in memory or passed to an external destination for later deletion of the file.

The name of the deletion request file 84 may include a number in front of delete.req and may be associated with a reference count that sets a predetermined number of merging references. Examples of the number added to the file name of the deletion request file 84 (file name number) and the reference count are shown in [Table 1] below.

File name number    Reference count
1111111111          2
2222222222          3
3333333333          3

The deletion request file 84 may be set up so that the reference count of merging jobs is tracked and the corresponding document is deleted when the count reaches 0. With such deletion request files 84, many files may be generated when merging jobs become numerous and long-running, but the system can be configured so that, once merging ends, all deletion request files 84 generated during that merging job are deleted.

The deletion request file 84 is created so that, when a newly indexed document is an update of an existing document, the old document can be found in the existing segments and removed. The deletion request file 84 may also serve as a mark for deletion (deletion marking). That is, according to this embodiment, because a document being merged cannot be deleted, the data is marked so that it can be processed after merging finishes.
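The reference counting of deletion request files can be sketched as below. This is one possible reading, not the embodiment's implementation: the function name apply_delete_request, the dictionary that stands in for the files on disk, and the "<number>.delete.req" name layout are all assumptions.

```python
def apply_delete_request(store, name, segment_doc_ids):
    """Apply one deletion request to a merged segment and release one reference.

    store maps a hypothetical file name such as "1111111111.delete.req" to
    {"doc_ids": set of marked document IDs, "refs": remaining reference count}.
    """
    entry = store[name]
    # Drop the marked documents from the segment produced by this merging job.
    remaining = [d for d in segment_doc_ids if d not in entry["doc_ids"]]
    entry["refs"] -= 1
    if entry["refs"] == 0:
        del store[name]   # no merging job refers to the file any more
    return remaining
```

With a reference count of 2, the request survives the first merging job that applies it and disappears after the second, so the marked documents are removed from every merged segment that could still contain them.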

FIG. 9 is a diagram for explaining the operating principle of the dynamic index module of FIG. 6.

Referring to FIG. 9, the dynamic index module according to the present embodiment is configured to check the number of live documents in the segments 83, whose number keeps growing as dynamic indexing continues to generate them, and to run merging indexing periodically to reduce the number of segments.

This dynamic index module may be implemented through the cooperation of a document count check module and the merging index scheduler, in addition to the dynamic index scheduler. The document count check module and the merging index scheduler may each be implemented by a program (or software module) stored in memory and a processor that executes it. The document count check module may be included as a function of the merging index scheduler.

More specifically, the document count check module may check the number of currently exposed documents included in the index segments (S91). Currently exposed documents may be referred to as live documents.

The live document count may be checked according to preset merging rules. That is, the document count check module may classify the live document counts into logarithmic ranges. The logarithmic ranges, such as 0, 100, 1K, 10K, 100K, and 1M, may cover 1 to 6 in units of log base 10 (log₁₀). Each document count range may contain one or more segments.

Example results of checking the live document count are shown in [Table 2] below.

Live document count range    Number of segments
0                            2
100                          6
1K                           4
10K                          2
100K                         1
1M                           1
5M                           1
5M+                          1

Using these logarithmic ranges, the segments can be classified by the number of documents they contain, in preparation for running multiple merging jobs in parallel in the subsequent merging index job (S92).
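Classifying segments into these ranges can be sketched with a sorted list of bounds. The range labels follow the ones listed for the live document count; treating each label as an inclusive upper bound, and the function names range_label and group_segments, are assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict

# Assumed inclusive upper bounds for each range; anything above the last bound is "5M+".
BOUNDS = [0, 100, 1_000, 10_000, 100_000, 1_000_000, 5_000_000]
LABELS = ["0", "100", "1K", "10K", "100K", "1M", "5M", "5M+"]


def range_label(live_docs):
    """Return the label of the range that a live document count falls into."""
    return LABELS[bisect_left(BOUNDS, live_docs)]


def group_segments(live_counts):
    """Group segment names by range; live_counts maps segment name to live document count."""
    groups = defaultdict(list)
    for name, count in live_counts.items():
        groups[range_label(count)].append(name)
    return dict(groups)
```

Each resulting group holds segments of similar size, so one merging job per group can be created and the jobs run in parallel.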

Once the live document counts are confirmed, the merging index scheduler may create one or more merging jobs, classified by document count, in the job pool according to a preset merging policy or merging rules, and run merging indexing (S93).

The merged documents may be prepared as multiple temporary segments 83a, 83b and exposed to search. Some of the merged documents may then be deleted according to an updated package list or the document or segment list in the deletion request file 84 (S96). When deletions are applied using a deletion request file, its reference count is decremented by one each time, and a file whose reference count reaches 0 can be deleted.

Temporary segments that are not deleted may be registered as new segments (Newer) in the index segments 83 included in the data 82 exposed to search (S94, S95).

The merging rules described above may be set so that segments in which a specific field value is 0 are always merged, segments are merged once four or more have accumulated, segments with 5 million or more documents are excluded from merging, segments in which deleted documents account for 40% or more of the total are merged even if four have not accumulated, or segments not selected for merging are left as they are.
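Most of these rules can be expressed as a small selection function over one group of similarly sized segments. The sketch below is an interpretation under stated assumptions: the Segment class and select_for_merge are hypothetical names, the 5 million limit is applied to live plus deleted documents, and the rule about a specific field value of 0 is left out because the text does not identify the field.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical summary of a segment's document counts."""
    name: str
    live_docs: int
    deleted_docs: int = 0


def select_for_merge(group, min_group=4, max_docs=5_000_000, deleted_ratio=0.4):
    """Return the names of the segments in one range group that should be merged."""
    # Segments at or above the document limit are excluded from merging.
    eligible = [s for s in group if s.live_docs + s.deleted_docs < max_docs]
    # Four or more accumulated segments are merged together.
    if len(eligible) >= min_group:
        return [s.name for s in eligible]
    # Otherwise only segments dominated by deleted documents are merged.
    picked = []
    for s in eligible:
        total = s.live_docs + s.deleted_docs
        if total > 0 and s.deleted_docs / total >= deleted_ratio:
            picked.append(s.name)
    return picked
```

Segments that are not returned are simply left as they are until a later merging cycle.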

According to this embodiment, by executing the merging index separately for each document-count range, segments of similar size can be merged with each other, which allows the input and output capacity of the apparatus to be used most efficiently.

FIGS. 10A and 10B are diagrams for explaining the life cycle of the index segment of FIG. 6, that is, the dynamic index segment.

Referring to FIG. 10A, in the segment life cycle according to this embodiment, the number of segments in the index segment and the number of documents in each segment may change as a result of dynamic indexing, merging indexing, or a combination thereof.

For example, suppose that the index segment (83) of the data exposed to search (first data, 82) contains a plurality of segments (a0, a1, a2, a3). A segment (102) prepared by dynamic indexing may be added as a new index segment (a4) (S101). In that case, the index server may delete, from the other segments (a0, a1, a2, a3) in the index segment of the data (82a) exposed to search, any documents that duplicate documents in the newly dynamically indexed segment (a4) (S102). In FIG. 10, the darkly shaded portion of each segment may indicate the portion in which documents have been deleted.
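Steps S101 and S102 can be sketched as follows. This is a minimal illustration under stated assumptions: the searchable index is modeled as a dictionary from segment name to a dictionary of document ID to content, and duplicates are removed outright, whereas a real index would more likely mark them as deleted. `add_segment` and `drop_empty` are hypothetical names.

```python
def add_segment(index: dict[str, dict[str, str]], name: str,
                new_docs: dict[str, str]) -> None:
    """Expose a new segment and delete superseded copies from older segments."""
    for docs in index.values():
        # The intersection is a new set, so deleting while iterating is safe.
        for doc_id in new_docs.keys() & docs.keys():
            del docs[doc_id]
    index[name] = dict(new_docs)


def drop_empty(index: dict[str, dict[str, str]]) -> list[str]:
    """Remove segments whose documents were all superseded; return their names."""
    empty = [name for name, docs in index.items() if not docs]
    for name in empty:
        del index[name]
    return empty
```

A segment that ends up empty corresponds to a segment whose documents all overlap with newer segments, which can then be removed as a whole.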

Referring to FIG. 10B, suppose that the index segment of the data exposed to search (first data, 82b) contains a plurality of segments (a0, a1, a2, a3, a4, a5). When another dynamic indexing job is performed concurrently, the index server may include the dynamically indexed segment (104) in the first data (82b) as a new segment and delete the documents in the other segments (a0, a1, a2, a3, a4, a5) that are identical to documents in the dynamically indexed segment (104, a6). In this embodiment, every document in the first segment (a0) may overlap with the dynamically indexed segment (a6).

In addition, a merging index job may be performed concurrently with the dynamic indexing job that generates the dynamically indexed segment (a6) above (S105). That is, in parallel with the execution of a plurality of dynamic indexing jobs, the merging index scheduler may perform a merging index job on the plurality of segments (a0, a1, a2, a3, a4, a5) in the index segment.

The merge-indexed segment (106) may be included in the second data (82c) as a new segment (S106). At this time, the merging index scheduler may apply the deletion document (84a) created during merging to the merge-indexed segment (106) to delete the corresponding documents (S107a).

In addition, if the merge-indexed segment (106) contains documents belonging to a job that is still being merged, the merging index scheduler may create a document for them (a deletion request file) for later deletion (S107b).

In addition, if there are a plurality of other dynamic indexing jobs, the merging index scheduler may perform each dynamic indexing job on the plurality of segments (a0, a1, a2, a3, a4, a5, a6) and then include the merge-indexed segment (106) in the second data (82c) while adding the dynamically indexed segments (108a, 108b). The second data (82c) may refer to the data exposed to search after the index segments of the first data (82b) have increased, decreased, or remained unchanged over time.

The dynamically indexed segments (108a, 108b) above may be included in the second data (82c) as new index segments (a7, a8), either in the order listed or in the order in which they were generated (S108a, S108b).

The two segments (a0, a1) whose documents all overlap with documents in dynamically indexed segments may then be deleted.

FIG. 11 is a diagram illustrating a timeline of the life cycle of the index segment of FIG. 6.

Referring to FIG. 11, the data indexing method according to this embodiment can, for a plurality of indexing jobs and sequential merging jobs, effectively expose the segment (index segment) generated by each job to search, that is, to the data in a searchable state.

The first index segment (a0) generated by the first indexing job (index 1), the second index segment (a1) generated by the second indexing job (index 2), and the third index segment (a2) generated by the third indexing job (index 3) may be exposed to search in order over time.

Next, the index segments (a0, a1, a2) may be merge-indexed by the first merging job (merging 1) to generate the sixth index segment (a5).

Meanwhile, during the first merging job (merging 1), the fourth index segment (a3) may be generated by the fourth indexing job (index 4). At this time, because the first merging job (merging 1) on the earlier index segments is in progress, the index server may assign a temporary ID to the file name of the fourth index segment (a3) and also create a deletion request file that includes the temporary ID.

Similarly to the fourth indexing job (index 4) above, the fifth index segment (a4) may be generated by the fifth indexing job (index 5) during the first merging job (merging 1). At this time, because the first merging job (merging 1) on some of the earlier index segments is in progress, the index server may also create a deletion request file whose file name includes the temporary ID of the fifth index segment (a4).

Meanwhile, after the first merging job (merging 1) is completed, the second merging job (merging 2) may be performed on the sixth index segment (a5) generated by the first merging job (merging 1) and on the fourth and fifth index segments (a3, a4) that were dynamically indexed during the first merging job (merging 1). The second merging job (merging 2) may generate the eighth index segment (a7).

Of course, the seventh index segment (a6) may be generated by the sixth indexing job (index 6) during the second merging job (merging 2). In that case, the index server may assign a temporary ID to the seventh index segment (a6) and create a deletion request file that includes the temporary ID. The deletion request file may be given a file name such as "2330665693309899.delete.req".

Because segment IDs are assigned sequentially when indexing finishes, a segment whose indexing started earlier may still receive a later ID.

If a segment is being merged at the time indexing finishes, the indexing job may create its own <temporary ID>.delete.req file (deletion request file) so that the segment produced when the merging job later finishes can refer to it.
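The ID assignment and deletion request creation at the end of indexing can be sketched as follows. This is a toy model under stated assumptions, not the patent's implementation: the temporary ID is passed in as an integer (the claims suggest it is derived from a time value), the request is kept in memory instead of being written to disk, and `IndexServer` and `finish_index` are hypothetical names.

```python
from itertools import count


class IndexServer:
    """Toy model of segment ID assignment at the end of indexing."""

    def __init__(self) -> None:
        self._sequence = count()   # sequential IDs, handed out when indexing ends
        self.merging = False       # True while a merging job is in progress
        self.delete_requests: dict[str, list[str]] = {}  # file name -> document IDs

    def finish_index(self, temp_id: int, doc_ids: list[str]) -> str:
        """Called when an indexing job ends; returns the final segment ID."""
        if self.merging:
            # The merged segment does not exist yet, so leave a request that
            # it can apply once the merging job has finished.
            self.delete_requests[f"{temp_id}.delete.req"] = list(doc_ids)
        return f"a{next(self._sequence)}"
```

Because the sequential ID is taken only inside `finish_index`, a job that started first but finishes later receives the later ID.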

As described above, according to this embodiment, duplicate documents can be deleted effectively by using deletion request files while a combination of a plurality of indexing jobs and sequential merging jobs is carried out efficiently.

FIG. 12 is a block diagram of a data indexing apparatus according to another embodiment of the present invention. FIG. 13 is a block diagram of a data indexing apparatus according to yet another embodiment of the present invention.

Referring to FIG. 12, a computing device that implements the data indexing method according to this embodiment (hereinafter referred to simply as an indexer or indexing device) may include a control unit (1210) and a memory (1220). The control unit (1210) may further include a communication unit (1230) or a communication interface, and may be connected to a database (1240). At least part of the database (1240) may be included in the indexer.

The control unit (1210) may include an index request processing unit (1211), an index document extracting unit (1214), a dynamic indexing unit (1216), and a merging indexing unit (1218).

The index request processing unit (1211) may be implemented as an API listener or the like, and may receive an index request and forward the received request to the job pool of the indexer. The job pool may create an index request job when an index request is received.

The index document extracting unit (1214) may include a means for starting an indexing job according to the index request job created in the job pool, or a component that performs a function corresponding to such a means. The index document extracting unit (1214), referred to here as the index start unit, may include a document logger connected to the job pool.

The dynamic indexing unit (1216) may correspond to a dynamic indexing module or a dynamic index scheduler. The dynamic indexing unit (1216) may extract index documents based on the index log stored in the memory (1220) by the index document extracting unit (1214), and perform a dynamic indexing job.

The merging indexing unit (1218) may be implemented as a merging index scheduler included in the dynamic indexing module, or as a separate merging indexing module. The merging indexing unit (1218) may check the number of live documents and perform a merging index job.

The index request processing unit (1211), index document extracting unit (1214), dynamic indexing unit (1216), and merging indexing unit (1218) described above are programs loaded into the control unit (1210), and each may be stored in the memory (1220) as a software module (1224).

The control unit (1210) described above may be implemented as a logic controller or a microprocessor. The control unit (1210) may include one or more cores and a cache memory. When the control unit (1210) has a multi-core structure, multi-core may refer to two or more independent cores integrated into a single package consisting of a single integrated circuit.

As an example, as shown in FIG. 13, when the indexing device of this embodiment is implemented through the cooperation of a first computing device (120A) and a second computing device (120B), a first processor of the first computing device (120A) may perform part of the indexing method and a second processor of the second computing device (120B) may perform the remainder. In that case, the deletion request file (84c) generated by the first processor and the deletion request file (84d) generated by the second processor could end up with the same file name. Therefore, in this embodiment, each of the plurality of processors may create its deletion request file with a unique identifier or number appended to the end of the <temporary ID>. For example, 0 may be appended to the end of the temporary ID of the deletion request file (84c) generated by the first processor, and 2 may be appended to the end of the temporary ID of the deletion request file (84d) generated by the second processor.
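The per-processor naming scheme can be sketched as follows. This is a minimal illustration under one added assumption: the suffix is restricted to a single digit, so that two names built from the same or different temporary IDs cannot collide across processors by simple concatenation. `delete_request_name` is a hypothetical name.

```python
def delete_request_name(temp_id: int, processor_suffix: int) -> str:
    """Build a deletion request file name that is unique per processor.

    The suffix is limited to one digit (an assumption of this sketch) so that
    the last character of the ID part always identifies the processor.
    """
    if temp_id < 0:
        raise ValueError("temp_id must be non-negative")
    if not 0 <= processor_suffix <= 9:
        raise ValueError("processor_suffix must be a single digit")
    return f"{temp_id}{processor_suffix}.delete.req"
```

With the example values from the text, the same temporary ID yields different names on the first processor (suffix 0) and the second processor (suffix 2).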

In addition, when the control unit (1210) has a single-core structure, the single core may include a central processing unit (CPU). The CPU may be implemented as a system on chip (SoC) in which a micro control unit (MCU) and peripheral devices (integrated circuits for external expansion devices) are arranged together, but is not limited thereto. Here, a core may include registers that store instructions to be processed, an arithmetic logic unit (ALU) responsible for comparison, decision, and arithmetic operations, an internal control unit that internally controls the CPU to interpret and execute instructions, an internal bus, and the like.

In addition, the control unit (1210) may include, but is not limited to, one or more data processors, image processors, codecs (CODEC), or a combination thereof. The control unit (1210) may include at least one electronic control unit (ECU) mounted in a vehicle.

In addition, the control unit (1210) may include a peripheral interface and a memory interface. In that case, the peripheral interface handles the connection, and the management of the connection, between the control unit (1210) and the input/output system and other peripheral devices (such as the communication unit), and the memory interface may handle the connection, and the management of the connection, between the control unit (1210) and the memory (1220).

The control unit (1210) described above may execute various software programs to perform the data input, data processing, and data output needed to carry out the indexing method. In addition, the control unit (1210) may execute a specific software module (instruction set) stored in the memory (1220) to perform the various specific functions corresponding to that module. That is, the control unit (1210) may play the central role in performing the indexing method on the computing device by means of the modules (1224) stored in the memory (1220).

The memory (1220) described above may be referred to as a memory system, and may include high-speed random access memory and/or non-volatile memory such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory. The memory (1220) may also store software, programs, instruction sets, or a combination thereof.

Software components may include an operating system module, a communication module, a graphics module, a user interface module, an MPEG (Moving Picture Experts Group) module, a camera module, one or more application modules, and the like. A module (1224) is a set of instructions and may be expressed as an instruction set or a program.

The operating system includes, for example, an embedded operating system such as MS WINDOWS, LINUX, Darwin, RTXC, UNIX, OS X, iOS, Mac OS, VxWorks, Google OS, Android, Bada (Samsung OS), or Plan 9, and may include various components that control the system operation of the indexer. The operating system described above may also have, but is not limited to, a function of handling communication between various hardware (devices) and software components (modules).

The communication unit (1230) may support one or more communication protocols so that the indexer can connect to a search server or other devices over a network. The communication unit (1230) may include one or more wireless communication subsystems. A wireless communication subsystem may include a radio frequency receiver and transceiver and/or an optical (for example, infrared) receiver or transceiver.

The network may include, for example, GSM (Global System for Mobile Communication), EDGE (Enhanced Data GSM Environment), CDMA (Code Division Multiple Access), W-CDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), LTE-A (LTE-Advanced), OFDMA (Orthogonal Frequency Division Multiple Access), WiMax, Wi-Fi (Wireless Fidelity), Bluetooth, and the like.

Meanwhile, in this embodiment, the modules (1224) stored in the memory (1220) of the indexer may be, but are not limited to, functional blocks or modules installed in a computer device. The modules (1224) described above may be stored on a computer-readable medium (recording medium) in the form of software that implements the series of functions they perform (the indexing method), or may be transmitted to a remote location in carrier form, so as to operate on various computing devices.

Here, the computer-readable medium may include a plurality of computing devices or cloud systems connected via a network, and at least one of the plurality of computing devices or cloud systems may store, in its memory system, a program, source code, or the like for performing the indexing method of this embodiment.

That is, the computer-readable medium may be implemented so as to contain program instructions, data files, data structures, and the like, alone or in combination. The programs recorded on the computer-readable medium may be ones specially designed and configured for the present invention, or ones known and available to those skilled in computer software.

In addition, the computer-readable medium may include hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Program instructions may include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as at least one software module in order to perform the indexing method of this embodiment, and vice versa.

In addition, the indexing method of this embodiment may be employed in a cloud-computing-based search engine system. A search engine may refer to a means for searching data on a network such as the Internet, or a component that performs a function corresponding to such a means. For example, the search engine system may be a system that supports IaaS (Infrastructure as a Service), PaaS (Platform as a Service), or SaaS (Software as a Service). Here, IaaS refers to providing, as a service, infrastructure such as server resources, communication protocols (IP, etc.), networks, storage, and power for operating servers so that it can be used in a virtual environment. PaaS refers to providing a platform on which services can be developed, together with APIs for developing applications that use that environment. SaaS refers to providing applications that run in a cloud environment as a service.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

1. A data indexing method for real-time search, performed in a computing device having a processor, the method comprising:
recording, by an index request processing unit of the processor, a document in a memory to a log file;
selecting, by an index scheduler of the processor, a certain quantity of documents including at least part of the information read from the log file;
generating, by the index scheduler, at least one temporary segment for the documents;
exposing, by the index scheduler, the at least one temporary segment to search by a search engine; and
generating, by the index scheduler, a deletion request file storing the identifier of a deletion candidate document when a document included in the at least one temporary segment is being merged while the at least one temporary segment is exposed,
wherein the processor further performs the function of a merging index scheduler by means of a program stored in the memory,
the merging index scheduler merges a plurality of index segments dynamically indexed by a dynamic index scheduler of the processor, and
the processor assigns a temporary ID to an index segment that is dynamically indexed during a merging index job of the merging index scheduler, and the deletion request file includes the temporary ID in its file name.

The method according to claim 1,
wherein the file name of the deletion request file includes a number determined by a temporal fixed value.

The method of claim 2,
wherein a reference count is set for the deletion request file, and the reference count is decremented by a predetermined number each time the deletion request file is referenced in the merging.

The method of claim 3,
further comprising, after the generating step, deleting the document corresponding to a deletion request file whose reference count has fallen to or below a lower limit as a result of being referenced in the merging.

5. The method according to any one of claims 1 to 4,
further comprising, after the exposing step, checking the number of live documents exposed in the exposing step,
wherein the range of the number of live documents is classified into logarithmic ranges.

The method of claim 5,
further comprising, after the checking step, simultaneously processing a merging index for each of a plurality of merging target segments classified into the logarithmic ranges.

The method of claim 6,
wherein the logarithmic ranges include classification ranges in which the number of documents included in each segment is 0, 100, 1K, 10K, 100K, or 1M.

The method according to claim 1,
wherein the recording step comprises appending documents to a file each time a document logger flushes, and replacing the file at every rolling period, to generate the log file.

The method of claim 8,
wherein the document logger is connected to two memories or memory areas that are swapped each time a flush for reading the log file is performed.

The method of claim 8,
wherein the log file is inserted into a file queue, the file queue includes a log file status object, and the log file status object includes a file object and a closed status.

The method of claim 8,
further comprising, before the recording step, obtaining, at the document logger, a signal or information for an index request job.

The method according to claim 1,
wherein the selecting step extracts a predetermined quantity or number of log files according to the indexing capacity or performance of the computing device.

A data indexing device comprising:
a processor, a memory coupled to the processor, and a program stored in the memory,
wherein the processor performs the functions of a document logger and a dynamic index scheduler by means of the program,
the document logger records a document in the memory to a log file,
the dynamic index scheduler selects a certain quantity of documents including at least part of the information read from the log file, generates at least one temporary segment for the documents, exposes the at least one temporary segment to search by a search engine, and, when a deletion candidate document included in the at least one temporary segment is being merged, generates a deletion request file storing the identifier of the deletion candidate document,
the processor further performs the function of a merging index scheduler by means of the program stored in the memory,
the merging index scheduler merges a plurality of index segments dynamically indexed by the dynamic index scheduler of the processor, and
the processor assigns a temporary ID to an index segment that is dynamically indexed during a merging index job of the merging index scheduler, and the deletion request file includes the temporary ID in its file name.

(Deleted)

14. The device of claim 13,
wherein, when exposing a second index segment generated by the merging index scheduler to search, the processor uses the deletion request file to delete documents included in a first index segment that has already been exposed to search.