KR20190040755A

KR20190040755A - Method for detecting malware using file image and apparatus using the same

Info

Publication number: KR20190040755A
Application number: KR1020170130810A
Authority: KR
Inventors: 최선오
Original assignee: 한국전자통신연구원
Priority date: 2017-10-11
Filing date: 2017-10-11
Publication date: 2019-04-19

Abstract

Disclosed are a method for detecting malicious code using comparison of a file image and an apparatus therefor. According to an embodiment of the present invention, the method for detecting malicious code comprises the steps of: extracting image data from each of a normal file, which does not include malicious code, and a malicious file, which includes malicious code; performing deep learning based on the image data to generate an image-based malicious code detection model; and detecting malicious code by comparing image data corresponding to a target file, which is to be checked for malicious code infection, with the malicious code detection model.

Description

METHOD FOR DETECTING MALWARE USING FILE IMAGE AND APPARATUS USING THE SAME

The present invention relates to malicious code detection technology, and more particularly, to a technique for detecting malicious code by converting malicious code into image data and training a deep learning model on that data.

In general, malicious code is detected by analyzing it statically or dynamically, registering a specific pattern of the malicious code, and then having a malicious code inspection engine determine whether that pattern is present. However, this approach is limited in detecting zero-day malicious code for which no up-to-date pattern has been registered. In addition, extracting patterns from files and comparing them takes a long time, which makes it difficult to detect malicious code in real time.

Korean Patent Application Publication No. 10-2017-0087007, published July 27, 2017 (Title: Electronic Apparatus for Malicious Code Analysis and Method Thereof)

An object of the present invention is to provide a new malicious code detection method that does not rely on extracting and comparing patterns.

Another object of the present invention is to provide a method for acquiring data for malicious code detection deep learning more quickly than by extracting API system calls.

A further object of the present invention is to provide a malicious code detection technique that can also cope with zero-day malicious code whose pattern is not yet known.

To achieve the above objects, a malicious code detection method according to the present invention includes: extracting image data from each of a normal file that does not contain malicious code and a malicious file that contains malicious code; generating an image-based malicious code detection model for detecting malicious code by performing deep learning based on the image data; and detecting malicious code by comparing image data corresponding to a target file, which is to be checked for malicious code infection, with the malicious code detection model.

Here, the generating step may perform the deep learning using a deep learning model composed of a plurality of convolutional neural network (CNN) layers and a plurality of fully connected layers.

Here, the image data may correspond to a file image extracted by applying a downsampling technique to the normal file or the malicious file.

Here, each of the plurality of CNN layers may include a pooling layer.

Here, the image data may correspond to a grayscale image of a predetermined size.

Here, the malicious code detection method may further include updating the malicious code detection model based on the image data corresponding to the target file and the malicious code detection result for the target file.

In addition, a malicious code detection apparatus according to an embodiment of the present invention includes: a processor that extracts image data from each of a normal file that does not contain malicious code and a malicious file that contains malicious code, generates an image-based malicious code detection model for detecting malicious code by performing deep learning based on the image data, and detects malicious code by comparing image data corresponding to a target file, which is to be checked for malicious code infection, with the malicious code detection model; and a memory that stores the malicious code detection model and the image data used to generate the malicious code detection model.

Here, the processor may perform the deep learning using a deep learning model composed of a plurality of convolutional neural network (CNN) layers and a plurality of fully connected layers.

Here, the processor may update the malicious code detection model based on the image data corresponding to the target file and the malicious code detection result for the target file.

According to the present invention, a new malicious code detection method that does not rely on extracting and comparing patterns can be provided.

In addition, the present invention can provide a method for acquiring data for malicious code detection deep learning more quickly than by extracting API system calls.

In addition, the present invention can provide a malicious code detection technique that can also cope with zero-day malicious code whose pattern is not yet known.

FIG. 1 is a diagram illustrating a malicious code detection system using a file image according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a malicious code detection method according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of image data extracted from a normal file according to the present invention.
FIG. 4 is a diagram illustrating an example of image data extracted from a malicious file according to the present invention.
FIG. 5 is a diagram illustrating an example of a deep learning model according to the present invention.
FIG. 6 is a block diagram illustrating a malicious code detection apparatus according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a computer system according to an embodiment of the present invention.

The present invention will now be described in detail with reference to the accompanying drawings. Repeated descriptions, as well as detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments of the present invention are provided to explain the present invention more completely to those having ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a malicious code detection system using a file image according to an embodiment of the present invention.

Referring to FIG. 1, the malicious code detection system using a file image according to an embodiment of the present invention may detect malicious code by converting files infected with malicious code into image files and training a deep learning model on them.

The system according to an embodiment of the present invention may be broadly divided into two parts.

The first part is extracting image data (Feature Extraction) from normal files (Normal File) and malicious files (Malware).

For example, the malicious code detection apparatus according to an embodiment of the present invention may be used to extract image data from each of a normal file that does not contain malicious code and a malicious file that contains malicious code.

Here, the image data may correspond to a file image extracted by applying a downsampling technique to the normal file or the malicious file.

Here, the image data may correspond to a grayscale image of a predetermined size.

The second part is training a deep learning model using the extracted image data (Deep Learning Model Training) and performing malicious code detection on a target file using the trained deep learning model (Malware Detection).

For example, the malicious code detection apparatus according to an embodiment of the present invention may generate an image-based malicious code detection model for detecting malicious code by performing deep learning based on the image data, and may detect malicious code by comparing image data corresponding to a target file, which is to be checked for malicious code infection, with the malicious code detection model.

Here, the deep learning may be performed using a deep learning model composed of a plurality of convolutional neural network (CNN) layers and a plurality of fully connected layers.

Also, although not shown in FIG. 1, once malicious code detection for the target file is complete, the deep learning model may be updated to reflect the detection result.

For example, the malicious code detection apparatus according to an embodiment of the present invention may update the malicious code detection model based on the image data corresponding to the target file and the malicious code detection result for the target file.

FIG. 2 is a flowchart illustrating a malicious code detection method according to an embodiment of the present invention.

Referring to FIG. 2, the malicious code detection method according to an embodiment of the present invention extracts image data from each of a normal file that does not contain malicious code and a malicious file that contains malicious code (S210). That is, the normal file and the malicious file may each be converted into an image.

Here, the image data may correspond to a file image extracted by applying a downsampling technique to the normal file or the malicious file. For example, an image may be extracted for each file by extracting a plurality of invariant feature points from the normal file or the malicious file and performing downsampling in such a way that the extracted feature points are rendered as an image.

Here, the image data may correspond to a grayscale image of a predetermined size. For example, the image data may correspond to a grayscale image having a size of 32 * 32 pixels.

Here, each pixel in the grayscale image may have a value from 0 to 255.
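The conversion of a file into a fixed-size grayscale image can be sketched as follows. This is a minimal illustration under stated assumptions, not the downsampling procedure of the invention: the text does not specify how feature points are selected, so the sketch simply averages contiguous slices of the raw bytes. The function name `bytes_to_gray_image` and the block-averaging rule are hypothetical.

```python
def bytes_to_gray_image(data: bytes, size: int = 32) -> list[list[int]]:
    """Downsample raw file bytes into a size x size grayscale image (0-255).

    Assumption: block averaging of raw bytes stands in for the unspecified
    downsampling technique described in the text.
    """
    n_pixels = size * size
    if not data:
        # An empty file maps to an all-black image.
        return [[0] * size for _ in range(size)]
    pixels = []
    for i in range(n_pixels):
        # Pixel i covers one contiguous slice of the file; very short files
        # reuse bytes so that every slice holds at least one byte.
        start = i * len(data) // n_pixels
        end = max(start + 1, (i + 1) * len(data) // n_pixels)
        chunk = data[start:end]
        pixels.append(sum(chunk) // len(chunk))  # mean byte value, 0-255
    return [pixels[r * size:(r + 1) * size] for r in range(size)]


# Synthetic stand-in for a file's contents (not a real sample).
image = bytes_to_gray_image(bytes(range(256)) * 16)
print(len(image), len(image[0]), image[0][:4])
```

Because every pixel is the mean of byte values, the result always stays within the 0 to 255 range regardless of file length, and files of any size map to the same 32 * 32 shape.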

Here, the image data extracted from a normal file and the image data extracted from a malicious file may be distinguishable even visually.

For example, referring to FIGS. 3 and 4, the image shown in FIG. 3 may be an image extracted from a normal file according to an embodiment of the present invention, and the image shown in FIG. 4 may correspond to an image extracted from a malicious file according to an embodiment of the present invention.

The reason the present invention uses image data, that is, a file image, to detect malicious code is that a variant of a malicious code has an image similar to that of the existing malicious code.

One of the biggest problems in detecting malicious code with conventional detection methods may be attacks by numerous malicious code variants. That is, because many different variants can arise from a single malicious code, analyzing the patterns of all malicious codes and responding to them is very difficult. However, if images of malicious code extracted from malicious files are used as in the present invention, similar variants can also be detected, so malicious code can be detected more effectively and efficiently.

For example, assuming that image data is extracted from a malicious file infected with malicious code A and used for malicious code detection, variants such as malicious code A_1, malicious code A_2, ..., malicious code A_n may also be detected.
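The intuition that a variant yields an image close to that of the original can be illustrated with a toy comparison. The byte strings below are synthetic, the block-averaging helper `to_pixels` and the distance `mean_abs_diff` are hypothetical names chosen for this sketch, and the example makes no claim about real malware samples.

```python
def to_pixels(data: bytes, n_pixels: int = 1024) -> list[int]:
    """Flattened grayscale image: each pixel is the mean of one slice of bytes.

    Assumes data is non-empty.
    """
    pixels = []
    for i in range(n_pixels):
        start = i * len(data) // n_pixels
        end = max(start + 1, (i + 1) * len(data) // n_pixels)
        chunk = data[start:end]
        pixels.append(sum(chunk) // len(chunk))
    return pixels


def mean_abs_diff(a: list[int], b: list[int]) -> float:
    """Average per-pixel distance between two images of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Synthetic byte strings standing in for files (not real samples).
original = bytes((i * 7) % 256 for i in range(4096))
variant = bytearray(original)
variant[100:110] = b"\xff" * 10               # a small local modification
different = bytes(255 - b for b in original)  # globally different content

d_variant = mean_abs_diff(to_pixels(original), to_pixels(bytes(variant)))
d_different = mean_abs_diff(to_pixels(original), to_pixels(different))
print(d_variant, d_different)
```

In this toy setting the locally modified copy stays much closer to the original image than the globally different byte string does, which is the property an image-based model would rely on to generalize across variants.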

In addition, the malicious code detection method according to an embodiment of the present invention generates an image-based malicious code detection model for detecting malicious code by performing deep learning based on the image data (S220). That is, the malicious code detection model may be trained with the image data extracted from the normal files and the malicious files, respectively.

Here, using image data allows the data corresponding to malicious code to be extracted quickly. For example, when API system calls are used to obtain deep learning data, a file may have to be executed for several minutes in order to extract the API system calls. That is, using API system calls has the drawback that data extraction takes a long time, whereas the image data according to the present invention can be extracted more quickly.

Here, the deep learning may be performed using a deep learning model composed of a plurality of convolutional neural network (CNN) layers and a plurality of fully connected layers.

Here, the CNN layers alternately apply a convolutional layer, a neuron function (rectified linear unit, ReLU), and a pooling layer, and may be followed at the end by fully connected layers.
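The sequence of convolution, ReLU, and pooling inside one CNN layer can be sketched in plain Python. This is an illustrative sketch only: the kernel values, the absence of padding, and the 2 x 2 pooling window are assumptions, and the helper names (`conv2d`, `relu`, `max_pool`, `cnn_layer`) are hypothetical rather than taken from the source.

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2-D cross-correlation, as used in CNN conv layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Rectified linear unit: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling (incomplete windows dropped)."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def cnn_layer(image, kernel):
    """One CNN layer: convolution, then ReLU, then pooling."""
    return max_pool(relu(conv2d(image, kernel)))


image = [[1, 2, 3, 4, 5] for _ in range(5)]
kernel = [[-1, 0], [0, 1]]  # responds to values increasing along the diagonal
print(cnn_layer(image, kernel))
```

A trained model would learn many such kernels per layer; a single fixed kernel is used here only to keep the data flow visible.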

For example, the malicious code detection model proposed in the present invention, that is, the deep learning model, may include three CNN layers and two fully connected layers. In a CNN layer, only a portion of the layer is connected to the next layer, which has the advantage of reflecting the locality of the image. The fully connected layers may serve to enable basic deep learning training.

Here, using convolutional layers can reduce the number of parameters that must be learned in deep learning, thereby saving training time.
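The parameter saving can be made concrete with a small count. The figures below are assumptions chosen for illustration (a 3 x 3 kernel and 16 output feature maps are not stated in the source); only the 32 * 32 single-channel input comes from the text.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel_h: int, kernel_w: int,
                in_channels: int, out_channels: int) -> int:
    """Weights plus biases of a convolutional layer (shared across positions)."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Map a 32 x 32 grayscale image to 16 feature maps of the same spatial size.
dense = dense_params(32 * 32, 32 * 32 * 16)
conv = conv_params(3, 3, 1, 16)
print(dense, conv)
```

Under these assumptions the fully connected mapping needs 16,793,600 parameters, while the convolutional layer needs 160, because the same small kernel is reused at every image position.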

Here, each of the plurality of CNN layers may include a pooling layer. The pooling layer can help make the deep learning model robust to minor changes in the image.
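A small example shows one sense in which pooling tolerates minor changes. It assumes 2 x 2 non-overlapping max pooling, which the source does not specify, and the invariance shown holds only while a shifted value stays inside the same pooling window.

```python
def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


original = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 5],
    [0, 0, 0, 0],
]
# The same two bright pixels, each moved by one position within its window.
shifted = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 0],
]
print(max_pool(original), max_pool(shifted))
```

Both inputs pool to the same 2 x 2 output, so layers after the pooling step see no difference between them.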

In addition, the malicious code detection method according to an embodiment of the present invention detects malicious code by comparing image data corresponding to a target file, which is to be checked for malicious code infection, with the malicious code detection model (S230).

Here, because the malicious code detection model can be generated by training on image data for various kinds of malicious code, it can be used to detect malicious code by comparing patterns against the image data of the target file.

Also, although not shown in FIG. 2, the malicious code detection method according to an embodiment of the present invention may update the malicious code detection model based on the image data corresponding to the target file and the malicious code detection result for the target file.

Accordingly, the malicious code detection model can detect malicious code more effectively and more generally as time passes.
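The train, detect, and update cycle of steps S220 and S230 can be sketched with a deliberately simple stand-in model. The sketch replaces the CNN with a nearest-centroid classifier over flattened pixel lists so that it stays short and runnable; the class `CentroidDetector`, its methods, and the labels are hypothetical and are not the model described in the source.

```python
class CentroidDetector:
    """Stand-in for the detection model: one running mean image per class.

    Assumption: a nearest-centroid rule replaces the CNN purely to keep the
    train / detect / update flow visible.
    """

    def __init__(self, n_pixels: int):
        self.sums = {"normal": [0.0] * n_pixels, "malicious": [0.0] * n_pixels}
        self.counts = {"normal": 0, "malicious": 0}

    def update(self, pixels, label):
        """Fold one labeled image into the model (training or later update)."""
        self.counts[label] += 1
        totals = self.sums[label]
        for i, p in enumerate(pixels):
            totals[i] += p

    def detect(self, pixels):
        """Return the label whose mean image is closest, or None if untrained."""
        best_label, best_dist = None, float("inf")
        for label, totals in self.sums.items():
            n = self.counts[label]
            if n == 0:
                continue
            dist = sum((p - t / n) ** 2 for p, t in zip(pixels, totals))
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label


detector = CentroidDetector(n_pixels=4)
detector.update([10, 10, 10, 10], "normal")            # training data
detector.update([200, 200, 200, 200], "malicious")

target = [180, 190, 210, 205]                          # image of a target file
result = detector.detect(target)
if result is not None:
    detector.update(target, result)                    # update with the result
print(result, detector.counts)
```

The same `update` call serves both initial training and the later update step, which mirrors how the text feeds the target file's image data and detection result back into the model.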

Also, although not shown in FIG. 2, the malicious code detection method according to an embodiment of the present invention may transmit and receive information necessary for malicious code detection. In particular, the method may receive normal files or malicious files from which image data is to be extracted in order to generate the malicious code detection model, or may receive a target file to be checked for malicious code infection.

Also, although not shown in FIG. 2, the malicious code detection method according to an embodiment of the present invention stores the various information generated in the malicious code detection process described above.

Because the malicious code detection method according to an embodiment of the present invention extracts deep learning data faster than extracting API system calls, it can generate and update a malicious code detection model for detecting malicious code in real time.

In addition, unlike existing malicious code detection engines, the malicious code detection method according to an embodiment of the present invention does not separately require malicious code patterns obtained through analysis by an analyst, and therefore has the advantage of being able to respond quickly even to zero-day malicious code whose pattern is not known.

FIG. 5 is a diagram illustrating an example of a deep learning model according to the present invention.

Referring to FIG. 5, the deep learning model according to the present invention may be composed of a plurality of CNN layers, each consisting of a convolutional layer (510, 520, 530) and a pooling layer (511, 521, 531), and a plurality of fully connected layers (540, 550).

Here, a CNN layer can reflect the locality of the image because only a portion of the current layer is connected to the next layer. The CNN layers use the pooling layers (511, 521, 531) together with the convolutional layers (510, 520, 530), and the pooling layers (511, 521, 531) can help make the deep learning model robust to minor changes in the image.

Here, the fully connected layers (540, 550) may enable basic deep learning training.
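The layout of FIG. 5 can be traced as a sequence of tensor shapes. Only the layer counts (three convolution and pooling pairs, two fully connected layers) and the 32 * 32 grayscale input come from the text; the 3 x 3 kernels with padding 1, the 2 x 2 pooling, the channel counts (16, 32, 64), and the fully connected sizes (128, 2) are assumptions made for this sketch.

```python
def conv_out(size: int, kernel: int = 3, padding: int = 1, stride: int = 1) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int = 2) -> int:
    """Spatial output size of non-overlapping pooling along one dimension."""
    return size // window


def trace_shapes(input_size=32, channels=(16, 32, 64), fc_units=(128, 2)):
    """List (layer name, output shape) pairs for the assumed architecture."""
    shapes = [("input", (input_size, input_size, 1))]
    size = input_size
    for idx, ch in enumerate(channels, start=1):
        size = conv_out(size)                  # convolutional layers 510/520/530
        shapes.append((f"conv{idx}", (size, size, ch)))
        size = pool_out(size)                  # pooling layers 511/521/531
        shapes.append((f"pool{idx}", (size, size, ch)))
    shapes.append(("flatten", (size * size * channels[-1],)))
    for idx, units in enumerate(fc_units, start=1):
        shapes.append((f"fc{idx}", (units,)))  # fully connected layers 540/550
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

With these assumed settings the spatial size halves at each pooling step (32, 16, 8, 4), and the final fully connected layer has two outputs, one per class (normal or malicious).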

FIG. 6 is a block diagram illustrating a malicious code detection apparatus according to an embodiment of the present invention.

Referring to FIG. 6, the malicious code detection apparatus according to an embodiment of the present invention includes a communication unit (610), a processor (620), and a memory (630).

The communication unit (610) may transmit and receive information necessary for malicious code detection. In particular, the communication unit (610) according to an embodiment of the present invention may receive normal files or malicious files from which image data is to be extracted in order to generate the malicious code detection model, or may receive a target file to be checked for malicious code infection.

The processor (620) extracts image data from each of a normal file that does not contain malicious code and a malicious file that contains malicious code. That is, the normal file and the malicious file may each be converted into an image.

For example, assuming that image data is extracted from a malicious file infected with malicious code A and used for malicious code detection, variants such as malicious code A_1, malicious code A_2, ..., malicious code A_n may also be detected.

In addition, the processor (620) generates an image-based malicious code detection model for detecting malicious code by performing deep learning based on the image data. That is, the malicious code detection model may be trained with the image data extracted from the normal files and the malicious files, respectively.

Here, each of the plurality of CNN layers may include a pooling layer. The pooling layer can help make the deep learning model robust to minor changes in the image.

In addition, the processor (620) detects malicious code by comparing image data corresponding to a target file, which is to be checked for malicious code infection, with the malicious code detection model.

In addition, the processor (620) updates the malicious code detection model based on the image data corresponding to the target file and the malicious code detection result for the target file.

The memory (630) stores the malicious code detection model and the image data used to generate the malicious code detection model.

In addition, the memory (630) stores the various information generated in the malicious code detection apparatus according to an embodiment of the present invention, as described above.

Depending on the embodiment, the memory (630) may be configured independently of the malicious code detection apparatus and support functions for malicious code detection. In this case, the memory (630) may operate as separate mass storage and may include a control function for performing operations.

Meanwhile, the malicious code detection apparatus may be equipped with a memory and store information within the apparatus. In one implementation, the memory is a computer-readable medium. In one implementation, the memory may be a volatile memory unit, and in another implementation, the memory may be a non-volatile memory unit. In one implementation, the storage device is a computer-readable medium. In various different implementations, the storage device may include, for example, a hard disk device, an optical disk device, or some other mass storage device.

By using such a malicious code detection apparatus, a new malicious code detection method that does not rely on extracting and comparing patterns can be provided.

In addition, a method for acquiring data for malicious code detection deep learning more quickly than by extracting API system calls can be provided.

In addition, a malicious code detection technique that can also cope with zero-day malicious code whose pattern is not yet known can be provided.

FIG. 7 is a diagram illustrating a computer system according to an embodiment of the present invention.

Referring to FIG. 7, an embodiment of the present invention may be implemented in a computer system, such as a computer-readable recording medium. As shown in FIG. 7, the computer system (700) may include one or more processors (710), a memory (730), a user input device (740), a user output device (750), and storage (760), which communicate with one another via a bus (720). In addition, the computer system (700) may further include a network interface (770) connected to a network (780). The processor (710) may be a central processing unit or a semiconductor device that executes processing instructions stored in the memory (730) or the storage (760). The memory (730) and the storage (760) may be various types of volatile or non-volatile storage media. For example, the memory may include a ROM (731) or a RAM (732).

Accordingly, an embodiment of the present invention may be implemented as a computer-implemented method or as a non-transitory computer-readable medium on which computer-executable instructions are recorded. When the computer-readable instructions are executed by a processor, they may perform a method according to at least one aspect of the present invention.

As described above, the method for detecting malicious code using comparison of a file image and the apparatus therefor according to the present invention are not limited to the configurations and methods of the embodiments described above; rather, all or some of the embodiments may be selectively combined so that various modifications can be made.

510, 520, 530: convolutional layer    511, 521, 531: pooling layer
540, 550: fully connected layer    610: communication unit
620: processor    630: memory
700: computer system    710: processor
720: bus    730: memory
731: ROM    732: RAM
740: user input device    750: user output device
760: storage    770: network interface
780: network

Claims

1. A malicious code detection method comprising:
extracting image data from each of a normal file that does not contain malicious code and a malicious file that contains malicious code;
generating an image-based malicious code detection model for detecting malicious code by performing deep learning based on the image data; and
detecting malicious code by comparing image data corresponding to a target file, which is to be checked for malicious code infection, with the malicious code detection model.

2. The method of claim 1,
wherein the generating performs the deep learning using a deep learning model composed of a plurality of convolutional neural network (CNN) layers and a plurality of fully connected layers.

3. The method of claim 1,
wherein the image data corresponds to a file image extracted by applying a downsampling technique to the normal file or the malicious file.

4. The method of claim 2,
wherein each of the plurality of CNN layers includes a pooling layer.

5. The method of claim 3,
wherein the image data corresponds to a grayscale image of a predetermined size.

6. The method of claim 1,
further comprising updating the malicious code detection model based on the image data corresponding to the target file and the malicious code detection result for the target file.

7. A malicious code detection apparatus comprising:
a processor that extracts image data from each of a normal file that does not contain malicious code and a malicious file that contains malicious code, generates an image-based malicious code detection model for detecting malicious code by performing deep learning based on the image data, and detects malicious code by comparing image data corresponding to a target file, which is to be checked for malicious code infection, with the malicious code detection model; and
a memory that stores the malicious code detection model and the image data used to generate the malicious code detection model.

8. The apparatus of claim 7,
wherein the processor performs the deep learning using a deep learning model composed of a plurality of convolutional neural network (CNN) layers and a plurality of fully connected layers.

9. The apparatus of claim 7,
wherein the image data corresponds to a file image extracted by applying a downsampling technique to the normal file or the malicious file.

10. The apparatus of claim 8,
wherein each of the plurality of CNN layers includes a pooling layer.

11. The apparatus of claim 9,
wherein the image data corresponds to a grayscale image of a predetermined size.

12. The apparatus of claim 7,
wherein the processor updates the malicious code detection model based on the image data corresponding to the target file and the malicious code detection result for the target file.