KR20140000369A

KR20140000369A - Forensic analysis method and system for document files

Info

Publication number: KR20140000369A
Application number: KR1020120067113A
Authority: KR
Inventors: 이상진; 박정흠; 박민수; 정수봉; 김상현; 홍일영
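Original assignee: 고려대학교 산학협력단 (Korea University Research and Business Foundation); 대한민국(관리부서 대검찰청) (Republic of Korea, administered by the Supreme Prosecutors' Office)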
Priority date: 2012-06-22
Filing date: 2012-06-22
Publication date: 2014-01-03
Also published as: KR101374239B1

Abstract

The present invention relates to a forensic analysis method and system for document files. The method comprises: a file receiving step of receiving a file to be forensically analyzed; a file area determination step of determining whether the received file contains a normal area or an unallocated area; a file search step of searching the normal or unallocated area for a compound document file; a verification step of verifying the compound document file; and a data recovery step of recovering data from the unallocated area of the compound document file. The method and system can recover data from a damaged compound document file, as well as data left in the unallocated area of a compound document file. [Reference numerals] (AA) Start; (BB) Receive a file for forensic analysis; (CC) Mounted storage device; (DD) File/directory; (EE) Dump file of an unallocated area; (FF) End; (S121) File type?; (S122) Analyze the file system to separate normal and unallocated areas; (S123) Treat the received file as an unallocated area; (S124) Treat the received file as a normal area; (S130) Search the normal or unallocated area for compound document files; (S140) Verify the compound document files found; (S150) Recover data from the unallocated area of the compound document file

Description

Forensic analysis method and system for document files

The present invention relates to a forensic analysis method and system for document files and, more particularly, to a forensic analysis method and system that recover data residing in damaged portions of a document file.

With industrial development, a vast number of documents are created every day and sent to users around the world over networks such as the Internet.

These documents are produced by a variety of document-authoring programs. Among them, document files in the Compound File Binary Format developed by Microsoft organize data in a hierarchy resembling a file system inside a single file. Because this allows data to be managed systematically, the format is widely used today, when complex and diverse data must be stored.

When a document file in the compound file binary format (hereinafter, a compound document file) is used in a crime and becomes important evidence or a clue to an incident, digital forensics is performed to analyze it.

Digital forensics requires collecting and analyzing compound document files and, in case part of a compound document file has been deleted, also recovering the data that remains in the unused area of the file, that is, its unallocated area.

However, compound document files are difficult to analyze, and data in their unallocated areas is particularly difficult to recover, so such files have rarely been able to serve as decisive evidence in criminal cases.

Prior art related to forensic analysis methods and systems for document files is as follows.

Prior art 1, Korean Patent Registration No. 0932537 (December 9, 2009), relates to a forensic evidence analysis system and method using an image filter. It creates a copy of the digital evidence, confirms that the original and the copy are identical, and archives the original. It then classifies the image files in the copy into predefined categories using an image filtering model generated by a learning model, and produces an evidence analysis report. This allows all image files on a hard disk to be analyzed quickly, shortening evidence analysis time.

Prior art 2, Korean Patent Registration No. 0882864 (February 3, 2009), relates to a high-speed retrieval system and method for large volumes of data in a digital forensic system. In a digital forensic system for analyzing digital evidence, it reconstructs the file system from a large disk image, rearranges clusters by file, converts files containing text information in the disk image into text files, and then uses a pattern matching board to search quickly and accurately for specific keywords or expressions by bitwise search.

To solve the above problems of the prior art, the present invention provides a forensic analysis method and system for document files that can recover and analyze data in the unallocated area of a compound document file.

A forensic analysis method for document files according to an embodiment of the present invention comprises: a file receiving step of receiving a file to be forensically analyzed; a file area determination step of determining whether the received file contains a normal area or an unallocated area; a file search step of searching the normal area or unallocated area of the file for a compound document file; a verification step of verifying the compound document file; and a data recovery step of recovering data in the unallocated area of the compound document file.

Preferably, the file receiving step receives at least one of: a storage device mounted on the medium under investigation, a dump file of an unallocated area, a file, and a directory.

Preferably, the file area determination step performs at least one of: a first determination process that, when a storage device mounted on the medium under investigation is received, analyzes the file system of the mounted storage device and divides it into a normal area, where data is stored, and an unallocated area, where no data is stored; a second determination process that, when a dump file of an unallocated area is received, treats the received file as an unallocated area; and a third determination process that, when a file or directory is received, treats the file or directory as a normal area.

Preferably, the file search step performs one of: a normal area file search process that searches for compound document files in the normal area; and an unallocated area file search process that searches for compound document files in the unallocated area.

Preferably, the normal area file search process includes: a signature search process that checks whether the received file contains the signature identifying a compound document; a directory object check process that checks whether the compound document file in which the signature was found contains application-specific directory objects; and a header verification process that verifies the header of the compound document file to determine the state of the document.

Preferably, the unallocated area file search process includes: a signature search process that analyzes the header structure of the received data to check whether the compound document signature is present; a compression check process that determines, from the position at which the signature was found, whether the compound document file is compressed; a sector size check process that analyzes the header structure of the compound document file to determine the sector size it uses; a sector count check process that determines the number of sectors assigned to the file allocation table; an extended file allocation table check process that determines whether an extended file allocation table, which holds the information used to manage the sectors in which the file allocation table is stored, is in use; a file allocation table acquisition process that, when the extended file allocation table is in use, interprets it based on its start sector information and obtains the complete file allocation table of each compound document file; and a file size calculation process that calculates the size of the compound document file from the file allocation table.

Preferably, the verification step includes: an access verification process that verifies whether the application-specific directory objects contained in the compound document file can be accessed; and a normality determination process that verifies the data structure of the compound document file to determine whether the document is intact or damaged.

Preferably, the verification step further includes: a directory object information extraction process that, when the compound document file is determined to be intact, collects the binary directory objects in the file and extracts at least one of the name, creation time, and modification time of each stream or storage; a data extraction process that extracts the body text and metadata from the compound document file; a file name determination process that determines the name of the compound document file based on the extracted body text and metadata; and an unallocated size calculation process that calculates the size of the unallocated area in the compound document file.
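As an illustration of the directory object information extraction process, the sketch below decodes a single directory object. It assumes the 128-byte directory entry layout of Microsoft's published MS-CFB specification (UTF-16LE name, object type at 0x42, creation and modification FILETIMEs at 0x64 and 0x6C, starting sector and stream size at 0x74 and 0x78); the function names are hypothetical and are not taken from the patent.

```python
import struct
from datetime import datetime, timedelta, timezone

DIR_ENTRY_SIZE = 128  # MS-CFB directory entries are 128 bytes each
_FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ft: int):
    """Convert a Windows FILETIME (100-ns ticks since 1601) to a datetime; 0 means unset."""
    if ft == 0:
        return None
    return _FILETIME_EPOCH + timedelta(microseconds=ft // 10)


def parse_directory_entry(raw: bytes) -> dict:
    """Extract name, object type, timestamps, start sector and size from one entry."""
    if len(raw) != DIR_ENTRY_SIZE:
        raise ValueError("directory entry must be 128 bytes")
    # Name length is in bytes and includes the UTF-16 NUL terminator (max 64).
    name_len = min(struct.unpack_from("<H", raw, 0x40)[0], 64)
    name = raw[:max(name_len - 2, 0)].decode("utf-16-le", errors="replace")
    obj_type = raw[0x42]  # 1 = storage, 2 = stream, 5 = root storage
    created, modified = struct.unpack_from("<QQ", raw, 0x64)
    start_sector, stream_size = struct.unpack_from("<IQ", raw, 0x74)
    return {
        "name": name,
        "type": obj_type,
        "created": filetime_to_datetime(created),
        "modified": filetime_to_datetime(modified),
        "start_sector": start_sector,
        "size": stream_size,
    }
```

Timestamps of 0 are returned as `None` because storages and streams often leave them unset.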

Preferably, the data recovery step includes: a directory object recovery process that recovers binary directory objects from damaged compound document files, or from the portions of intact compound document files determined to be unallocated; a metadata recovery process that recovers the metadata of the compound document file; a compression check process that checks whether the body stream of the compound document file is stored compressed; a text extraction process that, after the compression check, extracts text from the body data of the compound document file using a text extraction algorithm; a remaining data extraction process that extracts the data other than text from the compound document file; and a file recovery process that recovers the compound document file by combining the extracted text and remaining data.

Preferably, when the body stream of the compound document file is stored compressed, the compression check process includes: an uncompressed sector filtering process that filters out the uncompressed sectors among the unallocated sectors of the compound document file; a first sector search process that uses the compression algorithm to find the first of the compressed sectors; a compressed sector combining process that joins the first compressed sector found to the remaining compressed sectors; and a body data acquisition process that decompresses the combined compressed sectors to obtain the body data.
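The first sector search and combining processes can be sketched as follows. The patent does not name the compression algorithm here; this sketch assumes raw DEFLATE (which, to our understanding, is what HWP 5.0 documents use for compressed streams) and assumes uncompressed sectors have already been filtered out. A sector counts as a candidate first sector if raw-DEFLATE decoding of it raises no error, which is a heuristic rather than proof. Function names are hypothetical.

```python
import zlib
from typing import List, Optional


def starts_deflate_stream(sector: bytes) -> bool:
    """Heuristic: a sector may begin a raw DEFLATE stream if decoding it raises no error."""
    d = zlib.decompressobj(-15)  # wbits=-15: raw DEFLATE, no zlib header
    try:
        d.decompress(sector)
    except zlib.error:
        return False
    return True


def recover_compressed_body(sectors: List[bytes]) -> Optional[bytes]:
    """Find the first sector that starts a valid stream, join it with the
    following sectors, and decompress the joined data."""
    for i, sector in enumerate(sectors):
        if not starts_deflate_stream(sector):
            continue
        d = zlib.decompressobj(-15)
        try:
            # Bytes after the end of the DEFLATE stream go to d.unused_data;
            # a truncated stream yields whatever prefix could be decoded.
            return d.decompress(b"".join(sectors[i:]))
        except zlib.error:
            continue  # false positive: keep scanning
    return None
```

Trying each sector in order mirrors the "search for the first compressed sector" step; a real tool would also need to order the remaining sectors correctly, which this sketch takes as given.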

The forensic analysis method and system for document files of the present invention can readily recover data from a damaged compound document file, as well as data that was stored in the unallocated area of a compound document file.

In addition, by enabling forensic analysis of compound document files and recovery of damaged ones, the method and system can help resolve criminal cases more quickly.

FIG. 1 is a diagram showing the internal structure of a compound document file.
FIG. 2 is a flowchart of a forensic analysis method for document files according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the detailed process of the compound document file search step of the present invention.
FIG. 4 is a table showing the internal structure of the header of a compound document file.
FIG. 5 is a table showing the internal structure of the file allocation table.
FIG. 6 is a table showing an example of a file allocation table.
FIG. 7 is a flowchart showing the detailed process of the compound document file verification step of the present invention.
FIG. 8 is a table showing the internal structure of a directory object.
FIG. 9 is a diagram showing an example of the slack area of a stream.
FIG. 10 is a diagram showing the unallocated area and slack area in a compound document file.
FIG. 11 is a flowchart showing the detailed process of the step of recovering data in the unallocated area of a compound document file.
FIG. 12 is a table showing the internal structure of a property stream.
FIG. 13 is a diagram showing the recovery process for compressed body data.

Hereinafter, the present invention will be described in detail with reference to preferred embodiments and the accompanying drawings so that those of ordinary skill in the art can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein.

Before describing the present invention, the internal structure of a compound document file is examined in detail.

FIG. 1 is a diagram showing the internal structure of a compound document file.

As shown in FIG. 1, the compound document file used in the present invention consists of a hierarchy of storages and streams, and includes metadata used to manage those storages and streams.

A storage holds no actual data itself; it plays the role of a directory in a file system. Storages comprise the root storage and the sub-storages beneath it.

Streams reside beneath storages and are classified by size into standard streams and small streams: a stream of 4096 bytes or more is a standard stream, and a stream smaller than 4096 bytes is a small stream. Standard streams are located through the file allocation table (FAT), and small streams through the mini file allocation table (Mini FAT). Streams are further classified by content, for example into text streams and data streams.
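A minimal sketch of the size rule above. The 4096-byte cutoff is taken from the text (MS-CFB also records it in the header's Mini Stream Cutoff Size field); the function name is illustrative only.

```python
MINI_STREAM_CUTOFF = 4096  # bytes, as stated in the text


def stream_kind(size: int, cutoff: int = MINI_STREAM_CUTOFF) -> str:
    """Return which allocation table locates a stream of the given size."""
    if size < 0:
        raise ValueError("size must be non-negative")
    # Streams at or above the cutoff use regular sectors chained through the FAT;
    # smaller streams use mini sectors chained through the Mini FAT.
    return "standard (FAT)" if size >= cutoff else "small (Mini FAT)"
```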

The metadata describes and manages the data entered by the user and is generated automatically by the application. It comprises the header sector, the file allocation table, the extended file allocation table, and the directory objects. The header sector occupies the first 512 bytes of the compound document file and contains the information needed to access storages and streams. The file allocation table stores the sector chains; the extended file allocation table records which sectors hold the file allocation table; and each directory object stores information about a storage or stream.

Table 1 below compares the file systems used by operating systems (FAT, NTFS) with the compound document file of the present invention.

| Category | File system | Compound document file |
|---|---|---|
| Metadata | Boot sector | Header sector |
| Storage method | Files stored by cluster allocation | Streams stored by sector allocation |
| Data unit | Cluster | Sector, mini sector |
| Data type | Directory, file | Storage, stream |
| Data type information | MFT | Directory object |
| Allocation management | File allocation table | File allocation table |
| Containment | Tree structure | Tree structure |
| Top-level container | Root directory | Root storage |
| Sub-container | Directory | Storage |

The forensic analysis method for compound document files is described in more detail below with reference to FIG. 2.

FIG. 2 is a flowchart of a forensic analysis method for document files according to an embodiment of the present invention.

As shown in FIG. 2, in the forensic analysis method for document files of the present invention, a file receiving unit first receives the file to be forensically analyzed (S110). Various kinds of input can be received, so the type of the received input is checked first (S121). For example, the input may be a storage device (e.g., a hard disk, RAM, memory card, or removable storage device) mounted on the medium under investigation (e.g., a desktop or laptop computer), a dump file of an unallocated area containing no allocated data, or at least one ordinary document file or directory.

When the file receiving unit receives a storage device mounted on the medium under investigation, a control unit analyzes the file system on the storage device and separates it into the normal area, where data is stored, and the unallocated area, where no data is stored (S122).

When the file receiving unit receives a dump file of an unallocated area, the control unit treats the received file as an unallocated area (S123).

When the file receiving unit receives an ordinary document file or directory, the control unit treats it as a normal area, in which data is stored (S124).
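Steps S121 to S124 amount to a dispatch on the input type. The sketch below is hypothetical: the enum and its member names are illustrative, and the S122 branch only signals that the file system must still be analyzed to split the device into normal and unallocated areas.

```python
from enum import Enum


class InputKind(Enum):
    MOUNTED_DEVICE = "mounted storage device"
    UNALLOCATED_DUMP = "dump file of an unallocated area"
    FILE_OR_DIRECTORY = "file or directory"


def classify_input(kind: InputKind) -> str:
    """Map an input type (S121) to the area handling of steps S122-S124."""
    if kind is InputKind.MOUNTED_DEVICE:
        return "analyze file system"  # S122: split into normal/unallocated areas
    if kind is InputKind.UNALLOCATED_DUMP:
        return "unallocated"          # S123
    if kind is InputKind.FILE_OR_DIRECTORY:
        return "normal"               # S124
    raise ValueError(f"unsupported input: {kind!r}")
```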

In this way, the control unit first identifies the type of input received for forensic analysis and classifies it as a normal area or an unallocated area, and then checks whether a compound document file in binary format exists in that normal or unallocated area (S130).

The compound document file search step is described in more detail below with reference to FIG. 3.

FIG. 3 is a flowchart showing the detailed process of the compound document file search step of the present invention.

As shown in FIG. 3, for each received file that was classified as a normal area or an unallocated area in step S120, it is checked whether a compound document file exists in the normal area or in the unallocated area (S131).

The detailed search process differs depending on whether the compound document file resides in the normal area or the unallocated area.

First, to search the normal area for compound document files, the control unit checks the data of every file in the normal area for the signature identifying a binary compound document (S132).

If the signature is found in data in the normal area of the received file, that data is then checked for directory objects unique to particular applications, such as Microsoft Office or Hancom Hangul (S133).
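Steps S132 and S133 can be sketched as two checks: the 8-byte compound document signature at the start of the data, and the presence of a stream name tied to a known application. The marker names below are illustrative assumptions (well-known stream names of the legacy Office binary formats, and what we understand to be the header stream of the Hangul 5.0 format), not a list given in the patent; directory parsing is assumed to happen elsewhere.

```python
from typing import Iterable, Optional

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

# Illustrative, non-exhaustive markers (assumptions, not from the patent).
APP_MARKERS = {
    "WordDocument": "Microsoft Word (.doc)",
    "Workbook": "Microsoft Excel (.xls)",
    "PowerPoint Document": "Microsoft PowerPoint (.ppt)",
    "FileHeader": "Hancom Hangul (.hwp)",
}


def has_cfb_signature(data: bytes) -> bool:
    """S132: does the data start with the compound document signature?"""
    return data[:8] == CFB_SIGNATURE


def identify_application(entry_names: Iterable[str]) -> Optional[str]:
    """S133: return the application whose marker stream is present, if any."""
    for name in entry_names:
        if name in APP_MARKERS:
            return APP_MARKERS[name]
    return None
```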

If the control unit confirms that the received file contains both the compound document signature and an application-specific directory object, it concludes that a compound document file exists in the normal area of the received file and verifies its header (S134).

If verification of the header succeeds (S135), the compound document file is determined to be intact (S136).

If verification of the header fails, the control unit determines that a compound document file exists in the normal area of the received file but is damaged (S137).
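The patent does not list which header fields are checked in S134. One plausible set of checks, based on values fixed by the MS-CFB specification, is sketched below: the signature, the byte order mark 0xFFFE, a sector shift consistent with the major version (9 for version 3, 12 for version 4), a mini sector shift of 6, and a mini stream cutoff of 4096. The function names are hypothetical.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def verify_header(header: bytes) -> bool:
    """Return True if the fixed-value header fields look consistent."""
    if len(header) < 512 or header[:8] != CFB_SIGNATURE:
        return False
    # 0x1A major version, 0x1C byte order, 0x1E sector shift, 0x20 mini sector shift
    major, byte_order, sector_shift, mini_shift = struct.unpack_from("<HHHH", header, 0x1A)
    mini_cutoff = struct.unpack_from("<I", header, 0x38)[0]
    if byte_order != 0xFFFE or mini_shift != 6 or mini_cutoff != 4096:
        return False
    return (major, sector_shift) in {(3, 9), (4, 12)}


def header_state(header: bytes) -> str:
    """S136/S137: classify the compound document file as intact or damaged."""
    return "normal" if verify_header(header) else "damaged"
```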

If, instead, the data in the normal area of the received file contains no compound document signature (S132) or no application-specific directory object (S133), the control unit determines that no compound document file exists in the normal area of the current file. It then moves on to the next file received by the file receiving unit (S138), determines its normal or unallocated area according to its type as described for step S120, and repeats the per-area compound document file search described for step S130.

Next, the case in which a compound document file exists in the unallocated area of the received file, rather than in its normal area, is described.

First, the unallocated area of the received file is checked for the compound document signature at sector- or cluster-size boundaries (S139a).
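Step S139a can be sketched as a scan that compares the signature only at offsets that are multiples of the sector or cluster size, since a file's data normally begins on such a boundary. The default unit of 512 bytes is an assumption; a cluster size (for example 4096) can be passed instead. The function name is hypothetical.

```python
from typing import List

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def find_aligned_signatures(area: bytes, unit: int = 512) -> List[int]:
    """Return offsets, aligned to `unit`, at which the signature appears."""
    if unit <= 0:
        raise ValueError("unit must be positive")
    sig_len = len(CFB_SIGNATURE)
    return [off for off in range(0, len(area) - sig_len + 1, unit)
            if area[off:off + sig_len] == CFB_SIGNATURE]
```

Signatures at unaligned offsets are deliberately ignored, which keeps the scan fast and avoids matching signature bytes embedded inside other data.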

If the signature is found at a sector or cluster boundary in the unallocated area, it is determined that a compound document file exists there, and the control unit checks whether its data is compressed (S139b). For example, if the signature (0xD0CF11E0A1B11AE1) is found at offset 3 of the compound document file, the file may be NTFS-compressed data; NTFS decompression is therefore performed to obtain the data of the original, uncompressed compound document file.
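The offset-3 heuristic follows from the layout of NTFS (LZNT1) compressed chunks described in Microsoft's MS-XCA specification: a 2-byte chunk header, then a flag byte, after which literal bytes follow. If the first eight tokens are literals, the flag byte is 0x00 and the signature appears at offset 3. The sketch below only detects this case; decompression itself is not shown, and the function name is hypothetical.

```python
from typing import Optional

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def candidate_kind(area: bytes, off: int) -> Optional[str]:
    """Classify an aligned offset as a plain compound document, a possibly
    NTFS (LZNT1) compressed one, or neither."""
    if area[off:off + 8] == CFB_SIGNATURE:
        return "plain"
    if area[off + 3:off + 11] == CFB_SIGNATURE:
        # LZNT1 chunk header: bit 15 = compressed, bits 12-14 = signature 0b011.
        header = int.from_bytes(area[off:off + 2], "little")
        is_compressed = bool(header & 0x8000)
        has_chunk_sig = ((header >> 12) & 0x7) == 3
        # Flag byte 0x00: the next eight tokens are literal bytes.
        if is_compressed and has_chunk_sig and area[off + 2] == 0:
            return "ntfs_compressed"
    return None
```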

After the data of the compound document file in the unallocated area has been obtained, the size of the compound document file is calculated as follows.

FIG. 4 is a table showing the internal structure of the header of a compound document file.

Next, as shown in FIG. 4, the sector size is read from the header of the compound document file (S139c), followed by the number of FAT sectors (S139d). For example, if Sector Shift is 9, the sector size is 2^9 = 512 bytes, and the value given for Number of FAT Sectors in the header structure is 0x2C.
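Steps S139c and S139d read fixed header fields. The sketch below assumes the header layout of the MS-CFB specification (Sector Shift at 0x1E, Number of FAT Sectors at 0x2C, First DIFAT Sector Location at 0x44, Number of DIFAT Sectors at 0x48, and the first 109 DIFAT entries from 0x4C to the end of the 512-byte header); the function name and dictionary keys are illustrative.

```python
import struct


def read_header_fields(header: bytes) -> dict:
    """Read the header fields used in steps S139c to S139f."""
    if len(header) < 512:
        raise ValueError("header must be at least 512 bytes")
    sector_shift, mini_sector_shift = struct.unpack_from("<HH", header, 0x1E)
    # Nine consecutive 32-bit fields from 0x28 to 0x4B.
    (_num_dir_sectors, num_fat_sectors, first_dir_sector, _transaction,
     _mini_cutoff, _first_minifat, _num_minifat,
     first_difat_sector, num_difat_sectors) = struct.unpack_from("<9I", header, 0x28)
    return {
        "sector_size": 1 << sector_shift,  # e.g. shift 9 -> 512 bytes
        "mini_sector_size": 1 << mini_sector_shift,
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
        "first_difat_sector": first_difat_sector,
        "num_difat_sectors": num_difat_sectors,
        "header_difat": list(struct.unpack_from("<109I", header, 0x4C)),
    }
```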

Next, to obtain the file allocation table of each compound document file, the control unit first checks whether an extended file allocation table is used (S139e). When a compound document file is larger than about 7 MB, an extended file allocation table (a separate table that records where the file allocation table is stored) may be used. Specifically, if Number of DIFAT Sectors in the header structure shown in FIG. 4 is greater than 0, the file can be determined to use an extended file allocation table.
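The figure of about 7 MB follows from the header itself: it holds 109 FAT sector locations, and each 512-byte FAT sector describes 512 / 4 = 128 sectors, so without an extended file allocation table at most 109 × 128 × 512 = 7,143,424 bytes of sectors can be addressed. A small sketch of this arithmetic (function names hypothetical):

```python
HEADER_DIFAT_ENTRIES = 109  # FAT sector locations stored directly in the header


def max_size_without_difat(sector_size: int) -> int:
    """Approximate addressable bytes (excluding the header) without a DIFAT."""
    entries_per_fat_sector = sector_size // 4  # each FAT entry is a 32-bit sector number
    return HEADER_DIFAT_ENTRIES * entries_per_fat_sector * sector_size


def uses_difat(num_difat_sectors: int) -> bool:
    """S139e: the extended file allocation table is in use if the count is non-zero."""
    return num_difat_sectors > 0
```

For 4096-byte sectors the same calculation gives 457,179,136 bytes, so version 4 files need a DIFAT far less often.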

If it is in use, the extended file allocation table is parsed starting from its start-sector information (the First DIFAT Sector Location field). This yields the complete file allocation table for each received file (S139f).
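A minimal sketch of assembling the complete FAT, following the [MS-CFB] rules:

- The header DIFAT lists the first 109 FAT sectors.
- Each DIFAT sector holds further FAT sector locations.
- The last entry of each DIFAT sector points to the next DIFAT sector.

The helper names and parameters are hypothetical. The sketch also assumes the file is complete; a truncated carved file would make `struct.unpack` fail on a short sector.

```python
import struct

FREESECT = 0xFFFFFFFF  # unused entry marker


def read_sector(data: bytes, index: int, sector_size: int) -> bytes:
    """Sector n starts right after the header, at (n + 1) * sector_size."""
    start = (index + 1) * sector_size
    return data[start:start + sector_size]


def load_fat(data: bytes, sector_size: int, header_difat: list[int],
             first_difat: int, num_difat: int, num_fat: int) -> list[int]:
    """Assemble the complete FAT from the header DIFAT plus any DIFAT sectors."""
    per_sector = sector_size // 4
    fat_locations = [s for s in header_difat if s != FREESECT]
    next_difat = first_difat
    for _ in range(num_difat):
        entries = struct.unpack(f"<{per_sector}I",
                                read_sector(data, next_difat, sector_size))
        # The last entry of a DIFAT sector points to the next DIFAT sector.
        fat_locations += [s for s in entries[:-1] if s != FREESECT]
        next_difat = entries[-1]
    fat: list[int] = []
    for location in fat_locations[:num_fat]:
        fat += struct.unpack(f"<{per_sector}I",
                             read_sector(data, location, sector_size))
    return fat
```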

The size of the compound document file is then calculated from the file allocation table obtained for each file received by the file receiving unit (S139g).

FIG. 5 is a table showing the internal structure of the file allocation table.

As shown in FIG. 5, the file allocation table obtained in step S139f can be used to build the chain of sectors in which a stream's data is stored. For example, suppose the file allocation table is configured as shown in FIG. 6 and a stream starts at index 9. Its sector chain is then 9 → 10 → 11 → 12 → 13 → 14 → -2, where -2 (0xFFFFFFFE) marks the end of the chain. A file allocation table index gives a sector's position within the part of the file that follows the header. The sector's position in the file is therefore obtained by adding 1 to the index and multiplying the result by the sector size determined in step S139c.

Position in file = (index + 1) × sector size
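A minimal sketch of chain traversal and the offset formula. It assumes the FAT has already been read as unsigned 32-bit integers, so the -2 terminator appears as 0xFFFFFFFE. The loop detection and range check are additions for robustness against damaged files; they are not steps described in the text.

```python
ENDOFCHAIN = 0xFFFFFFFE  # the "-2" terminator when read as unsigned 32-bit


def sector_chain(fat: list[int], start: int) -> list[int]:
    """Follow FAT links from `start` until the end-of-chain marker."""
    chain: list[int] = []
    seen: set[int] = set()
    current = start
    while current != ENDOFCHAIN:
        # Loops or out-of-range links are common in damaged or carved files.
        if current in seen or current >= len(fat):
            raise ValueError(f"broken sector chain at index {current:#x}")
        seen.add(current)
        chain.append(current)
        current = fat[current]
    return chain


def sector_offset(index: int, sector_size: int) -> int:
    """Position in file = (index + 1) x sector size."""
    return (index + 1) * sector_size
```

For the chain starting at index 9 with 512-byte sectors, the first sector is at file offset 5,120 and the last (index 14) at 7,680.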

Next, the control unit finds the maximum index present in the file allocation table and adds 2 to it. It multiplies the result by the sector size determined in step S139c to obtain the size of the compound document file. Two is added so that two sectors are counted: the header sector, which precedes the sector at index 0, and the last sector itself, whose index is the maximum value.

File size = (maximum index in the file allocation table + 2) × sector size
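The size formula as a short sketch. Interpreting "maximum index" as the highest index whose FAT entry is not free (0xFFFFFFFF) is an assumption. FAT sectors are normally padded with free entries, so the raw length of the table would overstate the size.

```python
FREESECT = 0xFFFFFFFF  # FAT entry of an unused sector


def compound_file_size(fat: list[int], sector_size: int) -> int:
    """File size = (highest FAT index in use + 2) x sector size.

    Assumption: "maximum index" means the highest index whose FAT entry is
    not FREESECT, since FAT sectors are normally padded with free entries.
    """
    in_use = [i for i, value in enumerate(fat) if value != FREESECT]
    if not in_use:
        return sector_size  # header only
    return (max(in_use) + 2) * sector_size
```

Take a FAT whose highest used index is 3, with 512-byte sectors. The header plus sectors 0 to 3 gives (3 + 2) × 512 = 2,560 bytes, which is what the function returns.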

The compound document file found in this way is then verified (S140).

The verification step for the compound document file is described in more detail below with reference to FIG. 7.

FIG. 7 is a flowchart showing the detailed procedure of the compound document file verification step of the present invention.

As shown in FIG. 7, the control unit verifies whether the application-specific directory objects contained in the compound document file can be accessed (S141). Examples are the directory objects of Microsoft Office or of Hancom's Hangul word processor.
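One plausible way to implement the S141 check is to look for stream or storage names that each application always creates. The marker names below are well-known names in those formats:

- WordDocument, Workbook and PowerPoint Document for Office 97-2003
- FileHeader and BodyText for HWP 5.x

The patent does not list which objects it checks, so this table is an assumption.

```python
# Hypothetical marker table: directory entry names that identify the creator.
APP_MARKERS = {
    "Microsoft Word 97-2003": {"WordDocument"},
    "Microsoft Excel 97-2003": {"Workbook"},
    "Microsoft PowerPoint 97-2003": {"PowerPoint Document"},
    "Hancom Hangul (HWP 5.x)": {"FileHeader", "BodyText"},
}


def identify_applications(entry_names: list[str]) -> list[str]:
    """Return every application whose marker objects are all present."""
    names = set(entry_names)
    return [app for app, markers in APP_MARKERS.items() if markers <= names]
```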

For a compound document file whose application-specific directory objects are accessible, the data structures it contains are verified. This determines whether the document is intact or damaged (S142).

If the document is intact (S143), the binary directory objects in the compound document file are collected. Using the internal structure of the directory object shown in FIG. 8, at least one of the following is extracted: the stream or storage name, the creation time, and the modification time (S144).
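The fields extracted in S144 sit at fixed offsets in each 128-byte directory entry. The minimal parser below uses the entry layout from [MS-CFB]:

- UTF-16LE name at 0x00
- name length at 0x40
- object type at 0x42
- creation and modification FILETIMEs at 0x64 and 0x6C
- start sector at 0x74
- stream size at 0x78

The function and key names are illustrative.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(value: int):
    """Convert a Windows FILETIME (100 ns ticks since 1601) to UTC; 0 means unset."""
    return None if value == 0 else FILETIME_EPOCH + timedelta(microseconds=value // 10)


def parse_directory_entry(raw: bytes) -> dict:
    name_len, obj_type = struct.unpack_from("<HB", raw, 0x40)
    # name_len counts bytes including the 2-byte UTF-16 terminator.
    name = raw[:max(name_len - 2, 0)].decode("utf-16-le", errors="replace")
    created, modified = struct.unpack_from("<QQ", raw, 0x64)
    # Version 3 (512-byte sector) files define only the low 32 bits of size.
    start_sector, size = struct.unpack_from("<IQ", raw, 0x74)
    return {
        "name": name,
        "type": obj_type,  # 1 = storage, 2 = stream, 5 = root storage
        "created": filetime_to_datetime(created),
        "modified": filetime_to_datetime(modified),
        "start_sector": start_sector,
        "size": size,
    }
```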

The body text and metadata are also extracted from the compound document file (S145). The body data is stored in an application-specific format, such as that of Microsoft Office or Hancom's Hangul. Interpreting that format yields meaningful data such as text and images. The metadata describes or manages the data entered by the user; it is generated automatically by the application and can be obtained from property streams. A property stream is a special stream in a compound document file that stores attribute information about the document. For example, Hangul files from the 2000s onward contain a property stream named 0x0005HwpSummaryInformation. Microsoft Office 97-2003 files contain 0x0005SummaryInformation and 0x0005DocumentSummaryInformation; the 0x0005 prefix denotes the control character U+0005 that begins each stream name. These property streams store information such as the document's title, subject, author, last saved by, creation date, and modification date.
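A sketch of reading a few SummaryInformation properties, based on the property set layout in Microsoft's [MS-OLEPS] specification:

- a 48-byte stream header, whose first section offset is at byte 44
- a section header: size, property count, then (PID, offset) pairs

Property IDs 2, 3, 4, 8, 12 and 13 are the standard title, subject, author, last-saved-by, creation-time and last-save-time properties. Only string and FILETIME values are handled. The default code page is an assumption; in practice it should be taken from property 1 (for example cp949 for Korean documents).

```python
import struct
from datetime import datetime, timedelta, timezone

SUMMARY_PIDS = {2: "title", 3: "subject", 4: "author",
                8: "last_saved_by", 12: "created", 13: "modified"}
VT_LPSTR, VT_LPWSTR, VT_FILETIME = 0x001E, 0x001F, 0x0040


def parse_summary_information(stream: bytes, encoding: str = "cp1252") -> dict:
    """Read selected properties from the first section of a property set stream."""
    (byte_order,) = struct.unpack_from("<H", stream, 0)
    if byte_order != 0xFFFE:
        raise ValueError("not a property set stream")
    (section,) = struct.unpack_from("<I", stream, 44)  # offset of first section
    _size, count = struct.unpack_from("<II", stream, section)
    result = {}
    for i in range(count):
        pid, rel = struct.unpack_from("<II", stream, section + 8 + 8 * i)
        if pid not in SUMMARY_PIDS:
            continue
        pos = section + rel  # property offsets are relative to the section
        (vtype,) = struct.unpack_from("<H", stream, pos)
        if vtype == VT_LPSTR:
            (n,) = struct.unpack_from("<I", stream, pos + 4)  # bytes incl. NUL
            raw = stream[pos + 8:pos + 8 + n].split(b"\x00", 1)[0]
            value = raw.decode(encoding, errors="replace")
        elif vtype == VT_LPWSTR:
            (n,) = struct.unpack_from("<I", stream, pos + 4)  # characters incl. NUL
            value = stream[pos + 8:pos + 8 + 2 * n].decode(
                "utf-16-le", errors="replace").rstrip("\x00")
        elif vtype == VT_FILETIME:
            (ticks,) = struct.unpack_from("<Q", stream, pos + 4)
            value = (datetime(1601, 1, 1, tzinfo=timezone.utc)
                     + timedelta(microseconds=ticks // 10))
        else:
            continue  # other property types are not handled in this sketch
        result[SUMMARY_PIDS[pid]] = value
    return result
```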

The control unit can then determine a name for the compound document file from the extracted body data and metadata (S146). The name is chosen from one of three sources: the beginning of the body text, the title or subject recorded in the metadata, or the location of the sector or cluster in which the data was found.
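A minimal sketch of S146. The order of preference is an assumption, since the text only lists the three options: metadata title or subject first, then the first line of body text, then the offset where the data was found. The file-name sanitizing step and the `offset_0x...` format are also illustrative choices.

```python
import re

INVALID_FILENAME_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def choose_document_name(body_text: str, metadata: dict, found_at: int,
                         max_len: int = 40) -> str:
    """Pick a recovered file's name: metadata title/subject, then the first
    line of body text, then the offset where the data was found.

    The order of preference is an assumption; the text only lists the options.
    """
    candidates = [metadata.get("title"), metadata.get("subject")]
    lines = body_text.strip().splitlines()
    candidates.append(lines[0] if lines else None)
    for candidate in candidates:
        if candidate and candidate.strip():
            cleaned = INVALID_FILENAME_CHARS.sub("_", candidate.strip())
            return cleaned[:max_len]
    return f"offset_{found_at:#x}"
```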

Next, the size of the unused area inside the compound document file, that is, the unallocated area, is calculated (S147). A compound document file stores data in fixed-size units, so it can contain areas in which no data is stored, namely unallocated areas and slack areas. These areas are described in detail below with reference to FIGS. 9 and 10.

FIG. 9 is a diagram showing an example of the slack area of a stream.

As shown in FIG. 9, suppose stream A is 4,246 bytes long and data is stored in 512-byte sectors. Nine sectors (4,608 bytes) must then be allocated to hold the stream. Because data occupies only the first 4,246 bytes, a slack area of 4,608 - 4,246 = 362 bytes remains.
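The FIG. 9 numbers can be checked with a ceiling division. This assumes the stream lives in regular sectors. Under [MS-CFB], streams smaller than the 4,096-byte mini stream cutoff are stored in 64-byte mini sectors instead, but stream A (4,246 bytes) is above that cutoff.

```python
def stream_slack(stream_size: int, sector_size: int = 512) -> tuple[int, int]:
    """Return (sectors allocated, slack bytes in the last sector)."""
    sectors = -(-stream_size // sector_size)  # ceiling division
    return sectors, sectors * sector_size - stream_size
```

`stream_slack(4246)` returns `(9, 362)`, matching the figure.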

The unallocated and slack areas that can exist inside a compound document file are as follows.

FIG. 10 is a diagram showing the unallocated and slack areas in a compound document file.

As shown in FIG. 10, sectors whose file allocation table entry is 0xFFFFFFFF (-1) belong to the unallocated area. The final portion of a directory object or stream belongs to the slack area.
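Both kinds of region in FIG. 10 can be located from the FAT and the stream chains. In this sketch, the chain is the list of sector indices a stream occupies, and the helper names are hypothetical. Skipping free entries that fall past the end of the file is an added assumption: FAT sectors are padded with free entries that do not correspond to real sectors.

```python
from typing import Optional

FREESECT = 0xFFFFFFFF  # FAT value 0xFFFFFFFF (-1): sector not in use


def unallocated_sector_offsets(fat: list[int], sector_size: int,
                               file_size: int) -> list[int]:
    """File offsets of sectors marked free in the FAT.

    FAT sectors are padded with free entries beyond the end of the file,
    so offsets at or past file_size are skipped.
    """
    offsets = []
    for index, value in enumerate(fat):
        offset = (index + 1) * sector_size
        if value == FREESECT and offset < file_size:
            offsets.append(offset)
    return offsets


def stream_slack_range(chain: list[int], stream_size: int,
                       sector_size: int) -> Optional[tuple[int, int]]:
    """(file offset, length) of the slack after a stream's last byte, if any."""
    if not chain:
        return None
    used_in_last = stream_size - (len(chain) - 1) * sector_size
    if used_in_last >= sector_size:
        return None  # stream ends exactly on a sector boundary
    last_sector_start = (chain[-1] + 1) * sector_size
    return last_sector_start + used_in_last, sector_size - used_in_last
```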

A compound document file can contain many such unallocated or slack areas (both hereinafter referred to as unallocated areas). Depending on how each application program behaves, traces of earlier editing may also remain in them.

Accordingly, if data exists in the unallocated area of the compound document file, that data is recovered (S150).

The step of recovering data from the unallocated area of the compound document file is described in detail below with reference to FIG. 11.

FIG. 11 is a flowchart showing the detailed procedure of the step of recovering data from the unallocated area of a compound document file according to the present invention.

As shown in FIG. 11, the control unit first checks whether the compound document file is damaged or contains data in its unallocated area (S151).

If the control unit determines that the compound document file is damaged or contains data in its unallocated area, it recovers the binary directory objects (S152). For example, the name, creation time, and modification time of each stream or storage in the compound document file can be extracted and analyzed to recover the directory objects.
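For S152, one way to find directory objects in damaged or unallocated data is to test every 128-byte-aligned slot for a plausible entry. The validity rules below come from the [MS-CFB] entry layout:

- an even name length of at most 64 bytes, including a UTF-16 terminator
- object type 1, 2 or 5
- color flag 0 or 1

Using these rules as a carving filter is an assumption, not the patent's stated procedure.

```python
import struct

VALID_OBJECT_TYPES = {1, 2, 5}  # storage, stream, root storage


def looks_like_directory_entry(raw: bytes) -> bool:
    """Heuristic check that a 128-byte block is a compound file directory entry."""
    if len(raw) != 128:
        return False
    name_len, obj_type, color = struct.unpack_from("<HBB", raw, 0x40)
    if obj_type not in VALID_OBJECT_TYPES or color not in (0, 1):
        return False
    if name_len < 4 or name_len > 64 or name_len % 2:
        return False  # at least one character plus the 2-byte terminator
    if raw[name_len - 2:name_len] != b"\x00\x00":
        return False
    try:
        name = raw[:name_len - 2].decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return "\x00" not in name


def carve_directory_entries(data: bytes) -> list[int]:
    """Offsets of plausible directory entries at 128-byte boundaries."""
    return [off for off in range(0, len(data) - 127, 128)
            if looks_like_directory_entry(data[off:off + 128])]
```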

Next, the metadata of the compound document file is recovered (S153); that is, the property streams of the compound document file are recovered.

FIG. 12 is a table showing the internal structure of a property stream.

As shown in FIG. 12, the start sector of a property stream can be located using fields such as the byte order (0xFFFE), the CLSID, and the FMTID. After locating the start sector, the control unit interprets the internal structure of the property stream and recovers each property.
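A sketch of locating property stream starts by the byte order mark and the first FMTID. The two GUIDs are the standard [MS-OLEPS] format IDs for SummaryInformation and DocumentSummaryInformation. The FMTID used by 0x0005HwpSummaryInformation is not given in the text, so it is not included and would have to be added.

The scan step of 64 bytes is also an assumption. Property streams are usually smaller than the 4,096-byte mini stream cutoff, so they are typically stored in 64-byte mini sectors rather than full sectors.

```python
import uuid

# Well-known format IDs ([MS-OLEPS]); GUIDs are stored little-endian on disk.
KNOWN_FMTIDS = {
    uuid.UUID("F29F85E0-4FF9-1068-AB91-08002B27B3D9").bytes_le: "SummaryInformation",
    uuid.UUID("D5CDD502-2E9C-101B-9397-08002B2CF9AE").bytes_le: "DocumentSummaryInformation",
}


def find_property_stream_starts(data: bytes, step: int = 64) -> list[tuple[int, str]]:
    """Offsets where a property set stream appears to begin.

    Checks the 0xFFFE byte order mark (stored as FE FF) at offset 0 and the
    first FMTID at offset 28. A 64-byte step also covers mini-stream sectors.
    """
    hits = []
    for off in range(0, len(data) - 47, step):
        if data[off:off + 2] != b"\xfe\xff":
            continue
        kind = KNOWN_FMTIDS.get(bytes(data[off + 28:off + 44]))
        if kind:
            hits.append((off, kind))
    return hits
```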

Body data may be stored in compressed form. Before the body data of the compound document file is recovered, the control unit therefore checks whether the body stream is compressed (S154).

If the body stream is compressed, uncompressed sectors are filtered out of the unallocated sectors of the compound document file (S155). This filtering excludes three kinds of data from body-data recovery:

- sectors consisting only of 0x00 or 0xFF bytes
- sectors consisting only of characters such as ASCII or Unicode text
- data whose low randomness shows that it was not properly compressed
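A sketch of the S155 filter. Byte entropy is used here as the measure of "randomness". That choice and the 7.0 bits-per-byte threshold are assumptions; the patent does not say how randomness is measured. The text test accepts printable ASCII or printable UTF-16LE, which covers Korean text.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Bits per byte, from 0.0 (constant) up to 8.0 (uniform)."""
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())


def looks_like_text(block: bytes) -> bool:
    """True for printable ASCII or printable UTF-16LE (e.g. Hangul) text."""
    if all(32 <= b < 127 or b in (9, 10, 13) for b in block):
        return True
    try:
        text = block.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return all(ch.isprintable() or ch in "\r\n\t" for ch in text)


def is_compressed_candidate(sector: bytes, min_entropy: float = 7.0) -> bool:
    """Keep a sector only if it is not fill, not text, and looks random."""
    if not sector or sector.count(0x00) == len(sector) or sector.count(0xFF) == len(sector):
        return False
    if looks_like_text(sector):
        return False
    return shannon_entropy(sector) >= min_entropy
```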

Next, the first of the compressed sectors of the compound document file is located using a compression algorithm commonly used by compression programs (S156). The first compressed sector found is joined with the remaining compressed sectors of the compound document file (S157). The joined sectors are then decompressed to obtain the body data (S158).

FIG. 13 is a diagram showing the recovery process for compressed body data.

As shown in FIG. 13, a compound document file may contain several compressed body streams. In that case, the decompressed body data can be obtained by joining the compressed sectors and decompressing them.
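A minimal sketch of S156 to S158. It assumes the body stream uses raw deflate (zlib without a header), which is how HWP 5.x compresses its BodyText sections; other formats may differ. The procedure works in two stages:

- A sector counts as a possible first compressed sector if a fresh raw-deflate decoder accepts it and produces output.
- The sectors that follow are fed to the same decoder in physical order, until the stream ends or fails.

Both the in-order assumption and the first-sector test are heuristics, not the patent's stated algorithm.

```python
import zlib


def find_deflate_starts(sectors: list[bytes]) -> list[int]:
    """Indices of sectors that decode cleanly as the start of a raw deflate stream."""
    starts = []
    for i, sector in enumerate(sectors):
        decoder = zlib.decompressobj(-15)  # negative wbits: raw deflate, no header
        try:
            if decoder.decompress(sector):
                starts.append(i)
        except zlib.error:
            continue  # mid-stream or non-deflate data usually fails here
    return starts


def inflate_from(sectors: list[bytes], start: int) -> bytes:
    """Join sectors from `start` onward and decompress until end or error."""
    decoder = zlib.decompressobj(-15)
    out = bytearray()
    for sector in sectors[start:]:
        try:
            out += decoder.decompress(sector)
        except zlib.error:
            break  # chain broken: keep what was recovered so far
        if decoder.eof:
            break  # trailing padding ends up in decoder.unused_data
    return bytes(out)
```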

Text is then extracted, using a text extraction algorithm, from the body data of the compound document file (S159). The body data is either the decompressed body data or an uncompressed body stream. Next, the remaining data other than the text extracted in step S159 is extracted (S159a).

Finally, the text extracted in step S159 and the remaining data extracted in step S159a are combined to recover the compound document file (S159b).

The forensic analysis method for document files may also be stored on a computer-readable recording medium on which a program for executing it on a computer is recorded. A computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, DVD±ROM, DVD-RAM, magnetic tape, floppy disks, hard disks, and optical data storage devices. The computer-readable recording medium may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner.

Preferred embodiments of the present invention have been described above, but the invention is not limited to them. Various modifications can be made within the scope of its technical idea, and such modifications naturally fall within the scope of the appended claims.

S110: Receive a file for forensic analysis
S120: Determine whether the file contains a normal area or an unallocated area
S130: Search for a compound document file in the normal or unallocated area of the file
S140: Verify the compound document file
S150: Recover data in the unallocated area of the compound document file

Claims

1. A forensic analysis method for a document file, comprising:
a file receiving step of receiving a file on which forensic analysis is to be performed;
a file area determination step of checking whether a normal area or an unallocated area exists in the received file;
a file search step of searching for a compound document file existing in the normal area or the unallocated area of the file;
a verification step of verifying the compound document file; and
a data recovery step of recovering data in the unallocated area of the compound document file.

2. The method of claim 1, wherein
the file receiving step receives at least one of a storage device mounted from a medium under examination, a dump file existing in an unallocated area, a file, and a directory.

3. The method of claim 2, wherein
the file area determination step performs at least one of:
a first determination process of, when a storage device mounted from the medium under examination is received, analyzing the file system of the received storage device and dividing it into a normal area in which data is stored and an unallocated area in which no data is stored;
a second determination process of determining that the received file is an unallocated area when a dump file existing in an unallocated area is received; and
a third determination process of determining that a received file or directory is a normal area.

4. The method of claim 1, wherein
the file search step performs one of:
a normal-area file search process of searching for a compound document file existing in the normal area; and
an unallocated-area file search process of searching for a compound document file existing in the unallocated area.

5. The method of claim 4, wherein
the normal-area file search process comprises:
a signature search process of searching the received file for a signature identifying a compound document;
a directory object check process of checking whether the compound document file in which the signature was found contains application-specific directory objects; and
a header verification process of verifying the document state of the compound document file by verifying its header.

6. The method of claim 4, wherein
the unallocated-area file search process comprises:
a signature search process of searching for a signature identifying a compound document by analyzing the header structure of the received file;
a compression check process of checking whether the compound document file is compressed, based on the position at which the signature was found;
a sector size check process of analyzing the header structure of the compound document file to determine the sector size it uses;
a sector count check process of checking the number of sectors allocated to the file allocation table;
an extended file allocation table check process of checking whether an extended file allocation table, which records the sectors in which the file allocation table is stored, is used;
a file allocation table acquisition process of, when the extended file allocation table is used, interpreting it based on its start sector information and acquiring the file allocation table for the entire compound document file; and
a file size calculation process of calculating the size of the compound document file based on the file allocation table.

7. The method of claim 1, wherein
the verification step comprises:
an access verification process of verifying whether application-specific directory objects contained in the compound document file are accessible; and
a state determination process of determining whether the document is intact or damaged by verifying the data structures of the compound document file.

8. The method of claim 7, wherein
the verification step further comprises:
a directory object extraction process of, when the compound document file is determined to be intact, extracting the binary directory object information in the compound document file to obtain at least one of a stream or storage name, a creation time, and a modification time;
a data extraction process of extracting text and metadata from the compound document file;
a file name determination process of determining a name for the compound document file based on the extracted text and metadata; and
an unallocated size calculation process of calculating the size of the unallocated area in the compound document file.

9. The method of claim 1, wherein
the data recovery step comprises:
a directory object recovery process of recovering binary directory objects for a compound document file that is damaged, or for a compound document file verified as intact that is determined to contain an unallocated area;
a metadata recovery process of recovering the metadata of the compound document file;
a compression check process of checking whether the body stream of the compound document file is stored compressed;
a text extraction process of extracting text, using a text extraction algorithm, from the body data of the compound document file once the compression check is complete;
a remaining data extraction process of extracting the data other than text from the compound document file; and
a file recovery process of recovering the compound document file by combining the extracted text and the remaining data.

10. The method of claim 9, wherein
the compression check process comprises:
an uncompressed sector filtering process of filtering out uncompressed sectors among the unallocated sectors of the compound document file when its body stream is stored compressed;
a first sector search process of searching the compressed sectors of the compound document file for the first compressed sector using a compression algorithm;
a compressed sector combining process of combining the first compressed sector found with the remaining compressed sectors of the compound document file; and
a body data acquisition process of acquiring the body data by decompressing the combined compressed sectors.

11. A forensic analysis system for a document file according to any one of claims 1 to 10.