KR101082024B1

KR101082024B1 - Device for index managing of evidence image in digital forensic system and method therefor

Info

Publication number: KR101082024B1
Application number: KR1020090021690A
Authority: KR
Inventors: 조수형; 홍도원
Original assignee: 한국전자통신연구원
Priority date: 2008-12-08
Filing date: 2009-03-13
Publication date: 2011-11-10
Also published as: KR20100066263A

Abstract

The invention relates to a technique for managing indexes of evidence images in a digital forensic system. The index management apparatus includes: an indexing unit comprising a restoration unit that restores an evidence image into digital data having a file system structure, a document number dictionary management unit that assigns a unique document number to each file belonging to the restored digital data and manages the numbers as a document number dictionary, and an index performing unit that extracts index terms from the files of the restored digital data and, for each extracted index term, designates the document number assigned to the file containing that term; an index storage unit that stores the index information in which document numbers are designated per index term by the indexing unit; and an index term search unit that consults the index database to identify the document numbers designated for an index term to be searched, and then consults the document number dictionary with the identified document numbers to locate the corresponding files.

Description

Device for index managing of evidence image in digital forensic system and method therefor

The present invention relates to a digital forensic system, and in particular to an indexing technique for fast retrieval of digital evidence.

This work was derived from research conducted as part of the IT New Growth Engine Technology Development Program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement (정보통신연구진흥원). [Project management number: 2007-S-019-02, Project title: Development of Digital Forensic System for Information Transparency]

A digital forensic system collects, preserves, analyzes, and reports evidence from digital data stored on computers or other digital devices. It typically comprises duplication and image recognition of the storage medium, preservation of the digital evidence, search and analysis of the digital evidence, and recovery of the digital evidence. Because original digital evidence that has been damaged or altered cannot be used as legal evidence, an exact copy is normally made and used without damaging the original. This copy is usually called an evidence image, and once created, an evidence image cannot be changed.

Of the forensic steps of collecting, preserving, analyzing, and reporting digital evidence, two take the most time: collecting the evidence by imaging it, and the analysis step of searching the evidence image for evidence. To reduce the time spent on analysis, the evidence image is indexed. Conventional indexing, however, manages the file information for each index term as a file name including its directory, so every indexing step incurs the overhead of processing file names. For example, suppose the word "컴퓨터" ("computer") is indexed and is found in the files 'A.doc', 'C.hwp', and 'E.ppt'. The word is then indexed as 'C:/abc/def/A.doc', 'C:/abc/deg/op/C.hwp', 'E:/kkk/ssl/esd/abd/ssc/E.hwp'. Handling file names in this way inevitably creates overhead.

Furthermore, to index a large file, the conventional approach splits the file into smaller pieces, gives each piece a temporary file name, and builds the index from the pieces. A later search of the index database then returns results for the individual pieces, and these must be post-processed into results for a single file; only then can a file search for an index term work correctly.

Giving temporary file names to the pieces can also collide with existing file names. For example, if 'A.doc' in the directory 'C:/abc/def/' is a large file that is split into three pieces named 'A-1.doc', 'A-2.doc', and 'A-3.doc', those temporary names may duplicate the names of files that already exist.

An object of the present invention is to provide an index management apparatus and method that eliminate the file name processing overhead of the conventional indexing process.

Another object is to provide an index management apparatus and method that require no post-processing.

A further object is to provide an index management apparatus and method that avoid the file name collisions caused by splitting a large file into smaller pieces and assigning temporary file names for indexing.

To achieve these objects, an index management apparatus for an evidence image copied identically from the original for digital forensic investigation includes: an indexing unit comprising a restoration unit that restores the evidence image into digital data having a file system structure, a document number dictionary management unit that assigns a unique document number to each file belonging to the restored digital data and manages the numbers as a document number dictionary, and an index performing unit that extracts index terms from the files of the restored digital data and, for each extracted index term, designates the document number assigned to the file containing that term; an index storage unit that stores the index information in which document numbers are designated per index term by the indexing unit; and an index term search unit that consults the index database to identify the document numbers designated for an index term to be searched, and then consults the document number dictionary with the identified document numbers to locate the corresponding files.

One feature is that the document number dictionary management unit assigns a document number to each file name, including its directory, in the digital data and manages the numbers as the document number dictionary.

The indexing unit includes an index term extraction unit that extracts index terms from each file belonging to the digital data, and an index information generation unit that merges the extracted index terms with the document numbers assigned to the files in which they were found, producing index information in which document numbers are designated per index term.

When the file to be indexed is larger than the size that can be indexed at one time, the index term extraction unit splits the large file into multiple parts and extracts index terms from each part. The index information generation unit then collects all index terms extracted from the parts and generates index information associated with the document number assigned to the large file.

An index management method for achieving the above objects includes: restoring an evidence image, copied identically from the original in the course of collecting digital forensic evidence, into digital data having a file system structure; assigning a unique document number to each file included in the restored digital data and managing the numbers as a document number dictionary; extracting index terms file by file from the files included in the restored digital data; generating index information that associates each extracted index term with the document number of its file and managing it as an index database; consulting the index database to identify the document number designated for an index term to be searched; and consulting the document number dictionary with the identified document number to search for the index term in the corresponding file.

Assigning document numbers sequentially to the file names (including directories) in the evidence image, managing them in a document number dictionary, and handling files by number during indexing reduces the file name processing overhead at each indexing step. In addition, when a large file is split for indexing and updating, no temporary file names are assigned to the parts, so there is no collision with existing file names and no post-processing of search results for temporary file names is needed.

The foregoing and additional aspects of the invention will become clearer from the preferred embodiments described with reference to the accompanying drawings. The invention is described below in detail through these embodiments so that those skilled in the art can readily understand and reproduce it.

FIG. 1 is a block diagram of an index management apparatus according to an embodiment of the present invention.

The evidence image 100 to be indexed by the index management apparatus is forensic data: an identical copy of all the original digital data stored on a storage medium such as a hard disk. The index management apparatus 200 performs indexing and index search on the evidence image 100. It includes a graphical user interface (GUI) 210, an index control unit 220, an indexing unit 230, an index database 240, and a search unit 250. The GUI 210 is a graphical input/output interface through which the user can order the creation, update, deletion, and so on of an index. The index control unit 220 controls index creation, update, deletion, and cancellation, and also controls index search. In one embodiment, the index control unit 220 controls indexing or index term search operations according to user commands received through the GUI 210. The index control unit 220, the indexing unit 230, and the search unit 250 can all be implemented as program code.

The indexing unit 230 creates, updates, deletes, and backs up the index for the evidence image 100. According to a characteristic aspect of the invention, the indexing unit 230 extracts index terms from the evidence image 100 and indexes them by designating, for each extracted index term, the document number assigned to the file containing that term. The index information produced by the indexing unit 230 is called an inverted file: a file or data structure that indicates which documents correspond to each index term. The inverted file is stored in the index database 240. The search unit 250 uses the inverted file stored in the index database 240 to identify at least one document number designated for the index term being searched, finds the file having that document number, and searches for the index term within that file.

FIG. 2 is a detailed view of part of FIG. 1.

The indexing unit 230 includes a restoration unit 231, a document number dictionary management unit 232, and an index performing unit 233. The restoration unit 231 restores the evidence image 100 into digital data 100-1 having a file system structure, because indexing is possible only after conversion to a file system structure.

The document number dictionary management unit 232 sequentially assigns a unique document number to each file of the digital data 100-1 restored by the restoration unit 231. In one embodiment, it assigns the document numbers sequentially to file names including their directories, as illustrated in FIG. 3. With this scheme, knowing only a file's document number is enough to determine where the file is stored (from the directory information) and which file it is (from the file name). The document numbers assigned to the files are stored and managed as the document number dictionary in the document number dictionary database 260.
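As an illustration only, the sequential numbering described above could be sketched in Python as follows. The class name `DocNumberDictionary`, its methods, the in-memory dictionaries, and the choice to start numbering at 1 are all assumptions made for this sketch; the patent specifies none of them. The sample paths are taken from the background example.

```python
from typing import Dict, Iterable


class DocNumberDictionary:
    """Hypothetical in-memory stand-in for the document number dictionary (260)."""

    def __init__(self) -> None:
        self._path_to_num: Dict[str, int] = {}
        self._num_to_path: Dict[int, str] = {}

    def assign(self, full_path: str) -> int:
        """Return the number for full_path, assigning the next one if it is new."""
        if full_path in self._path_to_num:
            return self._path_to_num[full_path]
        num = len(self._path_to_num) + 1  # assumption: numbering starts at 1
        self._path_to_num[full_path] = num
        self._num_to_path[num] = full_path
        return num

    def assign_all(self, paths: Iterable[str]) -> None:
        """Number every path in the order given."""
        for path in paths:
            self.assign(path)

    def path_of(self, num: int) -> str:
        """Resolve a document number back to directory plus file name."""
        return self._num_to_path[num]


docs = DocNumberDictionary()
docs.assign_all(["C:/abc/def/A.doc", "C:/abc/deg/op/C.hwp"])
```

Keeping both directions of the mapping means the indexing side can turn a path into a number once, and the search side can turn a number back into a path, so the index itself never has to store path strings.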

The index performing unit 233 extracts index terms from the files belonging to the digital data 100-1 restored by the restoration unit 231. It then generates an inverted file by designating, for each extracted index term, the document number assigned to the file containing that term, and stores and manages the inverted file in the index database 240.

The search unit 250 consults the index database 240 to identify the document numbers for the index term being searched. With the identified document numbers, it looks up the file names, including directories, in the document number dictionary stored in the document number dictionary database 260. Using those file names, the search unit 250 finds the corresponding files in the digital data restored by the restoration unit 231 and searches for the index term within them.

FIG. 4 illustrates a detailed configuration of the index performing unit 233 of FIG. 2.

The index performing unit 233 includes an index term extraction unit 233-1, an index term sorting unit 233-2, and an index information generation unit 233-3. The index term extraction unit 233-1 extracts index terms from the files belonging to the digital data 100-1 of the file system structure. Index terms may, for example, be limited to the nouns that make up the sentences in a document. The index term extraction unit 233-1 extracts all index terms file by file, and a frequency calculation unit 233-1a included in it calculates the frequency of each index term per file. The frequency calculation unit 233-1a is optional. The index term sorting unit 233-2 sorts, for each file, the extracted index terms and their frequencies by index term, for example in Korean alphabetical (가나다) order or Latin alphabetical order. An example is shown in FIG. 5. The index term sorting unit 233-2 is also optional and may be omitted. Although not shown in FIG. 5, a field for recording the frequency calculated by the frequency calculation unit 233-1a may be added to each record. Table information such as that of FIG. 5 need not be discarded after a single use; it may be stored and managed separately. If the index performing unit 233 has no index term sorting unit 233-2, the table information of FIG. 5 is stored and managed separately without being sorted by index term.
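The per-file extraction, frequency counting, and sorting can be sketched as below. This is a minimal illustration under stated assumptions: `extract_terms` is a hypothetical stand-in that simply takes lowercase word tokens, whereas the text suggests restricting index terms to nouns (which would need a morphological analyzer not shown here), and the record layout `(term, doc_num, frequency)` is an assumed shape for the FIG. 5 table.

```python
import re
from collections import Counter
from typing import List, Tuple


def extract_terms(text: str) -> List[str]:
    """Stand-in tokenizer: lowercase word tokens (noun filtering omitted)."""
    return re.findall(r"\w+", text.lower())


def per_file_table(doc_num: int, text: str) -> List[Tuple[str, int, int]]:
    """Return (term, doc_num, frequency) records for one file, sorted by term."""
    counts = Counter(extract_terms(text))
    return [(term, doc_num, freq) for term, freq in sorted(counts.items())]


table = per_file_table(1, "computer disk computer image")
```

Here `table` holds one record per distinct term of document 1, ordered by term, with the frequency carried along in the same record.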

The index information generation unit 233-3 generates the inverted file by merging the index terms and frequencies sorted per file by the index term sorting unit 233-2 with the document numbers. If no index term sorting unit 233-2 is provided, it merges the index terms extracted per file by the index term extraction unit 233-1 with the document numbers of those files. An example inverted file table is shown in FIG. 6. Preferably, as illustrated in FIG. 6, the index information generation unit 233-3 orders the records sequentially by index term in Korean alphabetical (가나다) order. In that case, if the index term sorting unit 233-2 has already sorted the index terms file by file, the index information generation unit 233-3 can generate the inverted file faster.
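One way to see why pre-sorted per-file tables speed up inverted file generation is a k-way merge, sketched below. The use of `heapq.merge` and `itertools.groupby` is this sketch's choice, not something the patent prescribes, and the `(term, doc_num, frequency)` record layout is assumed. Because every input table is already sorted by term, the merge streams records in term order without a global sort.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple

Record = Tuple[str, int, int]  # (index term, document number, frequency)


def build_inverted_file(tables: List[List[Record]]) -> Dict[str, List[Tuple[int, int]]]:
    """Merge term-sorted per-file tables into term -> [(doc_num, freq), ...]."""
    stream = heapq.merge(*tables)  # requires each table to be sorted already
    return {
        term: [(doc_num, freq) for _, doc_num, freq in group]
        for term, group in groupby(stream, key=itemgetter(0))
    }


inverted = build_inverted_file([
    [("computer", 1, 2), ("disk", 1, 1)],
    [("computer", 2, 1), ("image", 2, 3)],
])
```

The resulting dictionary is keyed in term order, and each term lists only document numbers and frequencies, never path strings.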

In one embodiment, the index information generation unit 233-3 first generates an inverted file for the first file and then updates the inverted file stored in the index database 240 for each subsequent file. In another embodiment, it generates the inverted file in units of file groups and updates the stored inverted file for each subsequent group. In yet another embodiment, it generates the inverted file for all files to be indexed at once.
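The three embodiments differ only in how many files are folded into the inverted file per update. A minimal sketch, assuming a simplified inverted file that maps each term to a list of document numbers and a hypothetical `batch_size` parameter: a batch size of 1 corresponds to file-by-file updating, a larger value to group-wise updating, and the full file count to building everything at once.

```python
from typing import Dict, List, Tuple

Table = List[Tuple[str, int]]  # one file's (index term, document number) records


def update(inverted: Dict[str, List[int]], batch: List[Table]) -> None:
    """Fold one batch of per-file tables into the stored inverted file."""
    for table in batch:
        for term, doc_num in table:
            postings = inverted.setdefault(term, [])
            if doc_num not in postings:
                postings.append(doc_num)


def index_in_batches(tables: List[Table], batch_size: int) -> Dict[str, List[int]]:
    """Build the inverted file by updating it batch_size files at a time."""
    inverted: Dict[str, List[int]] = {}
    for start in range(0, len(tables), batch_size):
        update(inverted, tables[start:start + batch_size])
    return inverted


tables = [[("computer", 1)], [("computer", 2), ("disk", 2)], [("disk", 3)]]
one_by_one = index_in_batches(tables, 1)
grouped = index_in_batches(tables, 2)
all_at_once = index_in_batches(tables, len(tables))
```

All three calls yield the same inverted file; the batch size only trades off memory per pass against the number of updates written to the index database.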

A file may be too large to index in one pass. For example, if at most 20 MB can be indexed at a time, a 50 MB file must be split before indexing. The indexing unit 230 therefore splits the file into three parts in units of 20 MB, indexes the first part, then indexes the second part and updates the index, and finally indexes the remaining part and updates the index again. Because the contents of files in an evidence image never change, the only case in which the index is updated for a file of the same name is when the file is large and is indexed in parts. When a large file is split and indexed part by part in this way, the results for all parts are combined and handled under a single document number.

For reference, in FIG. 4 the index term extraction unit 233-1 may determine whether the file to be indexed is a large file and, if so, split it before extracting index terms. The index information generation unit 233-3 handles all parts split from one large file under the same document number. Even so, the index data of one part does not overwrite the index data of another part with the same document number; it is added to it. That is, the index information generation unit 233-3 treats the multiple parts of one large file as having the same document number and updates the inverted file by adding the index data of the second and third parts to the index data of the first part.
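The split-and-accumulate behaviour can be sketched as follows. To keep the example small, the file is modelled as a list of tokens and the per-pass limit is counted in tokens (50 tokens with a limit of 20, mirroring the 50 MB and 20 MB figures); this modelling, the function names, and the nested-dictionary inverted file are assumptions of the sketch. It also sidesteps terms that straddle a split boundary, which the text does not discuss.

```python
from typing import Dict, List

Postings = Dict[str, Dict[int, int]]  # index term -> {document number: frequency}


def split_chunks(tokens: List[str], size: int) -> List[List[str]]:
    """Split a token list into consecutive parts of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def index_chunk(inverted: Postings, doc_num: int, chunk: List[str]) -> None:
    """Add one part's terms under doc_num, accumulating instead of overwriting."""
    for term in chunk:
        by_doc = inverted.setdefault(term, {})
        by_doc[doc_num] = by_doc.get(doc_num, 0) + 1


tokens = ["computer"] * 30 + ["disk"] * 20  # stands in for the 50 MB file
chunks = split_chunks(tokens, 20)           # stands in for the 20 MB limit
inverted: Postings = {}
for chunk in chunks:
    index_chunk(inverted, 7, chunk)         # the same document number every time
```

No temporary file names appear anywhere: the three parts exist only as slices, and every update lands on document number 7, so a later search needs no post-processing to reunite them.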

According to an additional aspect of the invention, the index information generation unit 233-3 can delete part of the index information in the inverted file. For example, when the user requests deletion of the index information for certain directories or specific files, the index information generation unit 233-3 deletes from the inverted file all index information associated with the document numbers to be deleted and updates the inverted file. In one embodiment, it refers to table information such as that of FIG. 5, excludes the table information of the files to be deleted, merges the table information of the remaining files into a newly generated inverted file, and overwrites the index database 240 with it.
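The rebuild-based deletion of this embodiment can be sketched as below, assuming the stored per-file tables are kept as a dictionary from document number to `(term, frequency)` records. The function name `rebuild_without` and the data layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

PerFileTables = Dict[int, List[Tuple[str, int]]]  # doc_num -> [(term, frequency)]


def rebuild_without(tables: PerFileTables,
                    deleted: Set[int]) -> Dict[str, List[Tuple[int, int]]]:
    """Regenerate the inverted file from every per-file table except the deleted ones."""
    inverted: Dict[str, List[Tuple[int, int]]] = {}
    for doc_num in sorted(tables):
        if doc_num in deleted:
            continue  # skip the table of a file whose index information is removed
        for term, freq in tables[doc_num]:
            inverted.setdefault(term, []).append((doc_num, freq))
    return {term: inverted[term] for term in sorted(inverted)}


tables = {
    1: [("computer", 2), ("disk", 1)],
    2: [("computer", 1)],
    3: [("image", 4)],
}
rebuilt = rebuild_without(tables, {2})
```

The value of `rebuilt` would then replace the inverted file in the index database. This is why the text notes that the per-file tables are worth keeping after first use.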

FIG. 7 is a flowchart of an index generation/update method according to an embodiment of the present invention.

When index information for an evidence image is generated or updated, the indexing unit 230 restores the evidence image into digital data of a file system structure (step S700). It assigns a unique document number to each file belonging to that digital data and manages the numbers as a document number dictionary (step S710). The indexing unit 230 then extracts index terms file by file and merges all of them into an inverted file made up of records that contain a document number field, an index term field, and optionally a frequency field (step S720).

As a concrete example of step S720, the indexing unit 230 indexes file by file, first checking whether each file is a large file. If it is not, the indexing unit 230 extracts index terms from the file and may additionally calculate the frequency of each extracted term. The indexing unit 230 sorts the index terms of each file in order. If frequencies have been calculated, they are sorted along with their index terms.

The indexing unit 230 generates an inverted file in which the index terms sorted per file are associated with the document number assigned to each file. It may generate the inverted file for all files to be indexed at once, generate it for a group of a certain number of files and then update it group by group, or generate it for the first file to be indexed and then keep updating it.

If the file to be indexed is a large file, the indexing unit 230 splits it into multiple parts, each small enough to index in one pass, and indexes each part. The parts are treated as having the same document number, and the index information produced for each part is collected under that single document number.

FIG. 8 is a flowchart of an index term search method according to an embodiment of the present invention.

When an index term search command is issued, the search unit 250 identifies the document numbers for the index term being searched (step S800), using the inverted file stored in the index database 240. The search unit 250 then identifies the files for the identified document numbers (step S810), using the document number dictionary stored in the document number dictionary database 260. The search unit 250 locates the identified files and searches for the index term within them (step S820). The search results are then output for display through the GUI 210 (step S830).
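Steps S800 to S830 can be sketched as a single lookup chain. This is a minimal illustration: the `search` function, the plain-dictionary inverted file and document number dictionary, and the `read_file` callback (here backed by an in-memory dictionary instead of the restored file system) are all assumptions of the sketch, and reporting character offsets is just one possible form of the in-file search result.

```python
import re
from typing import Callable, Dict, List, Tuple


def search(term: str,
           inverted: Dict[str, List[int]],
           doc_dict: Dict[int, str],
           read_file: Callable[[str], str]) -> List[Tuple[str, List[int]]]:
    """Return (path, match offsets) for each file indexed under `term`."""
    hits = []
    for doc_num in inverted.get(term, []):   # S800: term -> document numbers
        path = doc_dict[doc_num]             # S810: document number -> path
        text = read_file(path)               # S820: open the file and search it
        offsets = [m.start() for m in re.finditer(re.escape(term), text)]
        hits.append((path, offsets))
    return hits                              # S830: the caller displays the hits


# Hypothetical restored files, held in memory for the demonstration.
files = {
    "C:/abc/def/A.doc": "a computer and a computer",
    "C:/abc/deg/op/C.hwp": "no match here",
}
doc_dict = {1: "C:/abc/def/A.doc", 2: "C:/abc/deg/op/C.hwp"}
inverted = {"computer": [1]}
results = search("computer", inverted, doc_dict, files.__getitem__)
```

Only the file listed in the inverted file is opened; the second file is never read, which is the point of consulting the index before touching the file system.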

Although not shown as a flowchart, when a command to delete part of the index information is issued, the indexing unit 230 reconstructs the inverted file so that all index information associated with the document numbers to be deleted is removed. The deletion method is as described above.

The invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the invention pertains will understand that it can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered illustrative rather than limiting. The scope of the invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as included in the invention.

FIG. 1 is a block diagram of an index management apparatus according to an embodiment of the present invention.

FIG. 2 illustrates a detailed configuration of part of FIG. 1.

FIG. 3 illustrates an example document number dictionary.

FIG. 4 illustrates a detailed configuration of the index performing unit of FIG. 2.

FIG. 5 illustrates an example of index term sorting.

FIG. 6 illustrates an example inverted file.

FIG. 7 is a flowchart of an index generation/update method according to an embodiment of the present invention.

FIG. 8 is a flowchart of an index term search method according to an embodiment of the present invention.

<Explanation of reference numerals for the main parts of the drawings>

210: GUI 220: index control unit

230: index unit 231: restoration unit

232: document number dictionary management unit 233: index performing unit

233-1: index word extraction unit 233-2: index word sorting unit

233-3: index information generation unit 240: index database

250: search unit 260: document number dictionary database

Claims

Claim 1: deleted.

An index management apparatus for an evidence image copied identically to the original for digital forensic investigation, the apparatus comprising:

An index unit including a restoration unit for restoring the evidence image into digital data of a file system structure, a document number dictionary management unit for assigning a unique document number to each of the files belonging to the restored digital data and managing the document numbers as a document number dictionary, and an index performing unit for extracting index words from the files of the restored digital data and specifying, for each extracted index word, the document number assigned to the file to which the index word belongs;

An index storage unit for storing index information in which a document number is designated for each index word by the index unit; and

An index word search unit for identifying the document number assigned to an index word to be searched by referring to the index database, and searching for the corresponding file by referring to the document number dictionary with the identified document number,

Wherein the document number dictionary management unit assigns a document number to each file name, including its directory, belonging to the digital data, and manages the document number dictionary.

The apparatus of claim 2,

Wherein the index unit includes an index information generation unit that combines all of the extracted index words with the document numbers assigned to the files from which the index words were extracted, and generates index information specifying a document number for each index word.

The apparatus of claim 3,

Wherein the extraction unit includes a frequency calculation unit that calculates, for each file, the frequency of each extracted index word, and

The index information generation unit generates an inverted file that additionally reflects the index word frequency for each index word and document number.

The apparatus of claim 3,

Wherein the index unit further includes an index word sorting unit that sorts the index words extracted from each file according to a sorting criterion, and

The index information generation unit generates the index information after the sorting by the index word sorting unit.

The apparatus of claim 3,

Wherein, when the size of a file to be indexed is larger than the size that can be indexed at one time, the index word extraction unit divides the large file into a plurality of split files and extracts index words from each split file, and

The index information generation unit collects all index words extracted from the split files to generate index information corresponding to the document number assigned to the large file, the index being generated by indexing the first split file and updated by indexing the remaining split files.

The apparatus of claim 3,

Wherein, when there is a command to delete index information associated with at least one document number, the index information generation unit reconstructs the index information stored in the index database so that the index information related to the designated document number is deleted.

An index management method comprising: restoring an evidence image, copied identically to the original in the course of collecting digital forensic evidence, into digital data of a file system structure;

Assigning a unique document number to each file included in the restored digital data of the file system structure and managing the document numbers as a document number dictionary;

Extracting index words, file by file, from the files included in the restored digital data of the file system structure;

Generating index information that associates the extracted index words with the document numbers of the files, and managing the index information in an index database;

Identifying the document number assigned to an index word to be searched by referring to the index database; and

Searching for the index word contained in the corresponding file by referring to the document number dictionary with the identified document number,

Wherein the managing of the document number dictionary includes assigning a document number to each file name, including its directory, belonging to the digital data.

The method of claim 8,

Wherein the extracting of the index words further includes checking whether the file to be indexed is a large file whose size exceeds the size that can be indexed at one time, splitting the large file into a plurality of split files, and extracting index words from each split file, and

The managing of the index information in the index database includes collecting all index words extracted from the split files to generate index information corresponding to the document number assigned to the large file, the index being generated by indexing the first split file and updated by indexing the remaining split files.

The method of claim 8, further comprising:

Reconstructing the index information stored in the index database so that the index information related to at least one document number is deleted when there is a command to delete the index information related to the at least one document number.