KR101589656B1

KR101589656B1 - System and method for detecting and inquiring metamorphic malignant code based on action

Info

Publication number: KR101589656B1
Application number: KR1020150008756A
Authority: KR
Inventors: 최보민; 강홍구; 황동욱; 이태진; 신영상; 김병익
Original assignee: 한국인터넷진흥원
Priority date: 2015-01-19
Filing date: 2015-01-19
Publication date: 2016-01-28

Abstract

Disclosed are a system and a method for detecting and querying API-based variant malicious code. According to the present invention, the system includes a malicious code analysis system, which executes a suspected malicious executable file to extract the application program interface (API) call information called by the malicious code and detects the malicious behavior of the malicious code using the extracted API call information; and a similarity analysis system, which calculates the API call similarity between at least two malicious codes using a malicious code list in which the malicious codes are collected. Because the API call similarity is obtained using the API call information and/or the malicious code list, variant malicious code in which part of the code has been modified is effectively detected.

Description

The present embodiments relate to a malicious code variant detection and query system and method for detecting and querying API-based variant malicious code.

A security product performance evaluation organization reported that, as of October 2014, more than 100 million new malicious code samples had been discovered.

To respond quickly to this rapid growth in malicious code, research on automating malicious code analysis is being actively pursued.

In line with this trend, a system that automatically analyzes malicious code behavior at the kernel level has recently been proposed.

However, existing malicious code detection systems monitored only basic behavior events such as file, registry, and process events, and therefore could not perform detailed behavior analysis.

Moreover, variant malicious code, in which part of the code of an existing malicious code is altered, is increasing day by day.

Korean Patent Laid-Open Publication No. 2012-0124638, published November 14, 2012, entitled "Behavior-based malicious code detection system and malicious code detection method."

An object of the present embodiments is to provide an API-based malicious code variant detection system and method that detect the malicious behavior of malicious code by extracting the API call information that the malicious code calls.

Another object of the present embodiments is to provide an API-based malicious code variant detection system and method that extract API call similarity using API call information and/or behavior codes and behavior groups, which reveal the form of variant malicious code.

Still another object of the present embodiments is to provide an API-based malicious code variant detection system and method that extract behavior codes and behavior groups, which classify malicious behaviors belonging to the same family and thereby reveal the form of variant malicious code.

Yet another object of the present embodiments is to provide a behavior-based malicious code variant query system and method for querying malicious behaviors and variant malicious code.

According to one embodiment of the present invention, there is provided an API-based malicious code variant detection and query system comprising: a malicious code analysis system that executes a suspected malicious executable file to extract the API (Application Program Interface) call information called by the malicious code, and detects the malicious behavior of the malicious code using the extracted API call information; and a similarity analysis system that calculates the API call similarity between at least two detected malicious codes using a malicious code list in which the malicious codes are collected.

Here, the malicious code analysis system may collect the suspected malicious executable file from a network traffic sensor connected to a network.

The malicious code analysis system may further include a first database that stores the suspected malicious executable file, the first API call information, and the malicious behavior of the malicious code.

The malicious code analysis system may extract the API call information called by the malicious code through API hooking at the user level and the kernel level.

The malicious code analysis system may detect the malicious behavior by applying a preset malicious code rule set to the API call information.

The malicious code analysis system may apply a malicious code rule set that includes hooking filtering.

The malicious code analysis system may detect malicious behavior including virtualized malicious behavior and real-time malicious behavior.

The system may further include a behavior classification system that generates behavior codes by matching the malicious behavior provided by the malicious code analysis system against pre-stored behavior classification rule information, and generates behavior groups by grouping the behavior codes that have similar behavior. In that case, the similarity analysis system may measure the API call similarity between at least two malicious codes within the same behavior group, using a malicious code list that collects the malicious codes detected in that behavior group.

The behavior classification rule information may comprise: behavior information resulting from the calls recorded in the API call information, together with an API number identifying the API included in the behavior rule of that behavior information; a parameter, which is the object referenced in order to perform the behavior; a parameter value, which is the actual value of that object; an associated API, which must be called together with the API in question; and a flag indicating whether the API must be called together with the associated API in order to match the behavior.
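The rule fields listed above can be pictured as a small record plus a matcher. The following is only a minimal sketch under stated assumptions: the class names `ApiCall` and `BehaviorRule`, the numeric API IDs, and the substring comparison of parameter values are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ApiCall:
    api_no: int   # hypothetical numeric ID of the hooked API
    param: str    # object the call refers to, e.g. "registry_key"
    value: str    # actual value of that object


@dataclass(frozen=True)
class BehaviorRule:
    behavior: str                       # behavior information (label)
    api_no: int                         # API number in the behavior rule
    param: str                          # parameter (referenced object)
    value: str                          # expected parameter value (substring, assumed)
    assoc_api_no: Optional[int] = None  # associated API
    assoc_required: bool = False        # flag: associated API must also be called


def matches(rule: BehaviorRule, calls: List[ApiCall]) -> bool:
    """Return True if any call satisfies the rule, honoring the associated-API flag."""
    called = {c.api_no for c in calls}
    for c in calls:
        if c.api_no != rule.api_no or c.param != rule.param:
            continue
        if rule.value.lower() not in c.value.lower():
            continue
        if rule.assoc_required and rule.assoc_api_no not in called:
            continue
        return True
    return False
```

In this sketch a rule with `assoc_required=True` only matches when the associated API number also appears somewhere in the same call log, which is one plausible reading of the flag described above.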

The behavior classification system may extract an API sequence using a malicious code list that collects the malicious codes included in the generated behavior group.

The behavior classification system may generate a bit code (1 or 0) for each behavior code according to whether the extracted API sequence matches a unit behavior in the behavior classification rule information.
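The bit code generation described above might be sketched as follows. This is an illustration only: it assumes that a unit behavior is an ordered list of API names and that a "match" means the list appears in the API sequence in order, with gaps allowed. The behavior codes `B01` to `B03` and their patterns are hypothetical.

```python
from typing import Dict, List


def contains_in_order(sequence: List[str], pattern: List[str]) -> bool:
    """True if pattern is a subsequence of sequence (order kept, gaps allowed)."""
    it = iter(sequence)
    return all(api in it for api in pattern)


def bit_code(api_sequence: List[str], unit_behaviors: Dict[str, List[str]]) -> List[int]:
    """One bit per behavior code: 1 if its unit behavior matches the sequence, else 0."""
    return [1 if contains_in_order(api_sequence, pattern) else 0
            for pattern in unit_behaviors.values()]


sequence = ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateMutexW"]
units = {
    "B01": ["CreateFileW", "WriteFile"],           # hypothetical unit behavior
    "B02": ["RegCreateKeyExW", "RegSetValueExW"],  # hypothetical unit behavior
    "B03": ["CreateMutexW"],                       # hypothetical unit behavior
}
print(bit_code(sequence, units))  # [1, 0, 1]
```

Two samples that yield the same bit vector exhibit the same set of unit behaviors under this simplified matching, which is what makes the vector usable as a grouping key.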

The behavior classification system may include a second database that stores the behavior codes, behavior groups, malicious code list, API sequences, and bit codes.

The system may further include a malicious code query system that queries the information stored in the first and second databases and combines and computes that information in order to identify variant malicious code.

The similarity analysis system may measure the API call similarity by determining the individual API composition, the API call order, and the call frequency for any two or more malicious codes included in the malicious code list.
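The three aspects named above (API composition, call order, and call frequency) can each be scored separately. The sketch below is not the patent's formula: it assumes Jaccard overlap for composition, the standard library's `difflib.SequenceMatcher` ratio for order, and cosine similarity of call counts for frequency, purely to illustrate how the aspects differ.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List


def api_similarity(a: List[str], b: List[str]) -> Dict[str, float]:
    """Score two API call sequences on composition, order, and frequency (all in [0, 1])."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    composition = len(set_a & set_b) / len(union) if union else 1.0

    # Order-sensitive similarity based on matching blocks of the two sequences.
    order = SequenceMatcher(None, a, b).ratio()

    # Cosine similarity between per-API call counts.
    count_a, count_b = Counter(a), Counter(b)
    dot = sum(count_a[k] * count_b[k] for k in count_a)
    norm = (math.sqrt(sum(v * v for v in count_a.values()))
            * math.sqrt(sum(v * v for v in count_b.values())))
    frequency = dot / norm if norm else 0.0

    return {"composition": composition, "order": order, "frequency": frequency}
```

Two samples that call the same APIs in a different order score high on composition and frequency but lower on order, which is the kind of difference a partially modified variant might show.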

The similarity analysis system may use a malicious code list that includes a malicious code hash list.

The similarity analysis system may extract API code sequences by performing API coding on the API call information, using any two or more hashes included in the malicious code hash list.
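API coding can be read as replacing each API name with a compact code so that call logs become comparable sequences. A minimal sketch, assuming the call logs are keyed by sample hash and that codes are simply assigned in order of first appearance; the function names and the hash placeholders `h1` and `h2` are hypothetical.

```python
from typing import Dict, List


def build_code_table(call_logs: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign each distinct API name an integer code in order of first appearance."""
    table: Dict[str, int] = {}
    for calls in call_logs.values():
        for api in calls:
            table.setdefault(api, len(table) + 1)
    return table


def encode(call_logs: Dict[str, List[str]], hashes: List[str]) -> Dict[str, List[int]]:
    """Return the API code sequence for each selected sample hash."""
    table = build_code_table(call_logs)
    return {h: [table[api] for api in call_logs[h]] for h in hashes}


logs = {
    "h1": ["CreateFileW", "WriteFile"],
    "h2": ["WriteFile", "CreateMutexW"],
}
print(encode(logs, ["h1", "h2"]))  # {'h1': [1, 2], 'h2': [2, 3]}
```

Because the table is shared across samples, the same API always maps to the same code, so the resulting sequences can be compared directly.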

The similarity analysis system may measure the API call similarity by applying N-grams to the extracted API code sequences.
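One common way to compare sequences with N-grams is to collect every window of N consecutive codes and measure how many windows two samples share. The sketch below assumes Jaccard overlap of N-gram sets and N = 3; the text here does not specify either choice, so both are assumptions.

```python
from typing import List, Set, Tuple


def ngrams(seq: List[int], n: int) -> Set[Tuple[int, ...]]:
    """All windows of n consecutive API codes in the sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def ngram_similarity(a: List[int], b: List[int], n: int = 3) -> float:
    """Jaccard overlap of the two N-gram sets (0.0 if neither has any N-gram)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


print(ngram_similarity([1, 2, 3, 4, 5], [1, 2, 3, 4, 6]))  # 0.5
```

In the usage line, the two sequences share the 3-grams (1, 2, 3) and (2, 3, 4) out of four distinct 3-grams, so a single changed call at the end lowers the score without driving it to zero.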

The similarity analysis system may include a third database that stores the API call information set, the malicious code list, the API coding, the API code sequences, and the API call similarity.

The system may further include a malicious code query system that queries the information stored in the third database and combines and computes that information in order to identify variant malicious code.

According to another embodiment of the present invention, there is provided an API-based malicious code variant detection and query method comprising: (a) executing a suspected malicious executable file and then extracting, in a malicious code analysis system, the API (Application Program Interface) call information called by the malicious code; (b) detecting, in the malicious code analysis system, the malicious behavior of the malicious code using the extracted API call information; and (c) measuring, in a similarity analysis system, the API call similarity between at least two malicious codes using the API call information and a malicious code list that collects the detected malicious codes.

Here, the API-based malicious code variant detection and query method may further comprise: (d) generating behavior codes in a behavior classification system by matching the detected malicious behavior against pre-stored behavior classification rule information; and (e) generating, in the behavior classification system, a behavior group that groups the malicious codes belonging to the same behavior family using the behavior codes. In that case, step (c) may measure the API call similarity between at least two malicious codes within the same behavior group, using an API call information set that collects the API call information extracted from the behavior group and a malicious code list that collects the detected malicious codes.
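Steps (d) and (e) feed step (c): samples are first grouped by behavior, and only pairs inside one group are compared. The following sketch makes a simplifying assumption that two samples belong to the same behavior family when their sets of behavior codes are identical; the patent's actual grouping criterion is not spelled out in this passage, and the names and codes used are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def group_by_behavior(behavior_codes: Dict[str, List[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group sample hashes whose behavior code sets are identical (assumed criterion)."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for sample_hash, codes in behavior_codes.items():
        groups[frozenset(codes)].append(sample_hash)
    return dict(groups)


def pairs_within_groups(groups: Dict[FrozenSet[str], List[str]]) -> List[Tuple[str, str]]:
    """Candidate pairs for similarity measurement: only samples in the same group."""
    return [pair for members in groups.values() for pair in combinations(members, 2)]


codes = {"h1": ["B01", "B03"], "h2": ["B03", "B01"], "h3": ["B02"]}
groups = group_by_behavior(codes)
print(pairs_within_groups(groups))  # [('h1', 'h2')]
```

Restricting the comparison to pairs within a group keeps the number of similarity computations well below the all-pairs count when the groups are small.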

Step (c) may measure the API call similarity by determining the individual API composition, the API call order, and the call frequency for any two or more malicious codes included in the malicious code list.

As described above, according to the embodiments, the API call similarity is obtained in a Windows environment using the API call information and/or the malicious code list, so that variant malicious code in which part of the code has been modified is effectively detected.

In addition, according to the embodiments, the API call information, behavior codes, and behavior groups are obtained in a Windows environment and the API call similarity is obtained on that basis, so that variant malicious code in which part of the code has been modified is effectively detected.

FIG. 1 is a block diagram schematically showing a malicious code variant detection and query system according to an embodiment of the present invention.
FIG. 2 is a diagram showing in more detail the malicious code analysis system of the malicious code variant detection system according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of the form of analysis target traffic received by the malicious code management server 110 according to an embodiment of the present invention.
FIG. 4 is a diagram showing the malicious code detection system from the viewpoint of the real-time analysis agent according to an embodiment of the present invention.
FIG. 5 is a diagram showing API-based malicious behavior analysis results processed by an existing system and by the system of the present embodiment (virtualized environment).
FIG. 6 is a diagram showing an example of malicious code analysis results processed by an existing system and by the system of the present embodiment.
FIG. 7 is a diagram showing malicious code processing results processed by an existing system and by the system of the present embodiment.
FIG. 8 is a diagram showing in more detail the similarity analysis system and the malicious code query system of the malicious code variant detection system according to an embodiment of the present invention.
FIG. 9 is a diagram showing an example of the form of the malicious code list and API call information generated in the malicious code variant detection system according to an embodiment of the present invention.
FIG. 10 is a diagram showing an example of the N-gram algorithm applied to the similarity analysis system according to an embodiment of the present invention.
FIG. 11 is a diagram showing an example of a query result of the malicious code query system according to an embodiment of the present invention.
FIG. 12 is a diagram showing in more detail the relationship between the behavior classification system and the similarity analysis system of the malicious code variant detection system, together with the malicious code query system, according to an embodiment of the present invention.
FIG. 13 is a diagram showing an example of the form of the behavior codes generated in the behavior classification system according to an embodiment of the present invention.
FIG. 14 is a diagram showing an example of the form of a behavior group generated in the behavior classification system according to an embodiment of the present invention.
FIGS. 15 and 16 are diagrams showing examples of query results of the malicious code query system according to an embodiment of the present invention.
FIG. 17 is a flowchart showing an example of an API-based malicious code variant detection and query method according to an embodiment of the present invention.
FIG. 18 is a flowchart showing in more detail the malicious behavior detection method of the malicious code variant detection and query method according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. In the drawings, like reference numerals denote the same or similar functions throughout the several views.

<First Embodiment>

FIG. 1 is a block diagram schematically showing a malicious code variant detection and query system according to an embodiment of the present invention.

Referring to FIG. 1, the malicious code variant detection and query system 1000 according to an embodiment of the present invention may include: a malicious code analysis system 100 that extracts API call information and detects the malicious behavior of malicious code; a behavior classification system 200 that generates behavior codes and behavior groups based on the detected malicious behavior in order to detect variant malicious code; a similarity analysis system 300 that detects variant malicious code by obtaining the API call similarity between at least two malicious codes from the API call information and/or within a behavior group; and a malicious code query system 400 that queries the malicious behaviors, variant malicious code, and related information described above.

These components may be connected to one another through an external network, for example a wired or wireless communication network, or through an internal network. However, the network connecting the components is not limited to these configurations. Each component is described in more detail below.

<Malicious Behavior Detection Example>

FIG. 2 is a diagram showing in more detail the malicious code analysis system of the malicious code variant detection system according to an embodiment of the present invention.

Referring to FIG. 2, the malicious code analysis system 100 according to an embodiment of the present invention manages malicious behavior analysis as a whole, including API analysis requests, analysis assignment, and the query and storage of analysis results.

To this end, the malicious code analysis system 100 may include a malicious code management server 110 and a virtualization analysis agent 120. First, the malicious code management server 110 collects, from a network traffic sensor 101, the analysis target traffic that is to undergo malicious code analysis.

The network traffic sensor 101 is connected to a network, for example a wired or wireless network, collects traffic containing executable files of application programs executed on systems running in a Windows environment, extracts the analysis target traffic that requires analysis, and transmits it to the malicious code management server 110. An example of analysis target traffic requested for analysis is shown in FIG. 3.

Accordingly, the malicious code management server 110 receives the analysis target traffic from the network traffic sensor 101 and, using a REST API, may store in the database 111 the first suspected malicious executable file of the application program contained in the analysis target traffic, together with various meta information.

The collected executable file of the application program is preferably a PE (Portable Executable) file that can be executed in a Windows environment.

However, the executable file may also be supplied as input rather than collected. That is, the malicious code management server 110 may receive at least one executable file through manual input and store it in the database 111.

The input executable file is likewise preferably a PE (Portable Executable) file that can be executed in a Windows environment. The executable file is, of course, not limited to the PE file format.

Next, the virtualization analysis agent 120 may include at least one virtualization agent module 121, and multiple modules may run concurrently using virtualization technology. Each virtualization agent module 121 may refer to a Windows system running in a virtualized environment.

Once a virtualization agent module 121 is running, it can execute the first suspected malicious executable file received from the malicious code management server 110. As a result of the execution, the first API (Application Program Interface) call information called by the malicious code can be extracted.

More specifically, the virtualization analysis agent 120 executes the first suspected malicious executable file received from the malicious code management server 110 using at least one virtualization agent module 121, and then extracts the first API call information called by the malicious code.

Preferably, the first API call information is extracted by monitoring the API information called by the malicious code through API hooking at the user level and the kernel level. Once the first API call information has been extracted, the malicious behavior of the malicious code can be determined.

That is, user-level and kernel-level malicious behaviors can be identified, such as 'registering at the registry run location', 'copying a file', 'executing a worm process', 'creating a log file in C:\', and 'creating a mutex to prevent duplicate execution'. The extracted first API call information is transmitted to the malicious code management server 110.

Because the first API call information can be extracted at both the user level and the kernel level, malicious code behavior can be analyzed across a wide range of APIs.

In this case, the malicious code management server 110 may store the first API call information received from the virtualization analysis agent 120 in the database 111.

Meanwhile, in order to detect malicious behavior in finer detail using the stored first API call information, the malicious code management server 110 may include a malicious behavior analysis management module 112.

In one embodiment, the malicious behavior analysis management module 112 may detect virtualized malicious behavior, that is, malicious behavior in the virtualized environment, by applying a preset malicious code rule set to the first API call information received from the virtualization analysis agent 120.

The malicious code rule set may include hooking filtering. That is, the malicious code rule set including hooking filtering is applied to the first API call information, and the hooking-filtered first API call information is compared with the predefined malicious code rule set; if they are found to match, the virtualized malicious behavior of the malicious code is detected. The detected virtualized malicious behavior is, of course, stored in the database 111.
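The filter-then-compare flow described above might look like the following. This is a sketch under assumptions: hooking filtering is taken to mean discarding hooked calls that are not on a monitored list, each rule is reduced to an API name plus a substring expected in its argument, and the names `MONITORED` and `RULESET` are hypothetical. The API names are real Windows APIs, but the rules themselves are illustrative and not taken from the patent.

```python
from typing import List, Tuple

Call = Tuple[str, str]  # (API name, argument string)

# Hypothetical list of hooked APIs that are relevant for rule matching.
MONITORED = {"RegSetValueExW", "CopyFileW", "CreateMutexW", "CreateProcessW"}

# Hypothetical rule set: behavior -> (API name, substring required in the argument).
RULESET = {
    "register at registry run location": ("RegSetValueExW", "\\currentversion\\run"),
    "copy file": ("CopyFileW", ""),
    "create mutex to prevent duplicate execution": ("CreateMutexW", ""),
}


def hook_filter(calls: List[Call]) -> List[Call]:
    """Keep only calls to monitored APIs (assumed meaning of hooking filtering)."""
    return [call for call in calls if call[0] in MONITORED]


def detect(calls: List[Call]) -> List[str]:
    """Return the behaviors whose rule matches at least one filtered call."""
    found: List[str] = []
    for api, arg in hook_filter(calls):
        for behavior, (rule_api, needle) in RULESET.items():
            if api == rule_api and needle in arg.lower() and behavior not in found:
                found.append(behavior)
    return found
```

An empty substring in a rule matches any argument, so rules such as "copy file" fire on the API name alone in this sketch.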

However, not all malicious code in the first suspected malicious executable files may be detected in the virtualized environment. To address this, the present embodiment may further include a real-time analysis agent, which is shown in FIG. 4.

FIG. 4 is a diagram showing the malicious code detection system from the viewpoint of the real-time analysis agent according to an embodiment of the present invention.

Referring to FIG. 4, the malicious code detection system 100 according to an embodiment of the present invention may include a malicious code management server 110 and a real-time analysis agent 130. The malicious code management server 110 may include a malicious behavior analysis management module 112.

First, the malicious behavior analysis management module 112 is the module that performs the actual malicious behavior analysis. From the first suspected malicious executable files stored in the database 111, it can extract the second suspected malicious executable files, that is, those for which no virtualized malicious behavior was detected. The extracted second suspected malicious executable files may be transmitted to the real-time analysis agent 130 described below.
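The selection of second suspected malicious executable files amounts to filtering the stored files by their virtualized analysis result. A minimal sketch, assuming files are identified by hash and that detections are held in a dictionary; both the function name and the data layout are hypothetical.

```python
from typing import Dict, List


def select_for_real_time(first_files: List[str],
                         virtual_detections: Dict[str, List[str]]) -> List[str]:
    """Return files with no virtualized malicious behavior recorded (missing or empty)."""
    return [h for h in first_files if not virtual_detections.get(h)]


stored = ["h1", "h2", "h3"]
detections = {"h1": ["copy file"], "h2": []}
print(select_for_real_time(stored, detections))  # ['h2', 'h3']
```

Files that behaved benignly in the virtual machine, possibly because they detected the virtualized environment, are thereby queued for a second run on a real-time agent.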

다음으로, 일 실시예에서, 리얼 타임 분석 에이젼트(130)는 악성 행위 분석 관리 모듈(112)로부터 제공받은 제2 악성 의심 실행 파일을 실행시키는 적어도 하나 이상의 리얼 타임 에이젼트(131)를 포함할 수 있다. Next, in one embodiment, the real-time analysis agent 130 may include at least one real-time agent 131 that executes the second malicious suspicious executable file provided from the malicious behavior analysis management module 112 .

즉, 리얼 타임 에이젼트(131)는 수신된 제2 악성 의심 실행 파일을 가상화 환경을 배제한 리얼 타임 환경에서 실행시킨 후, 악성 코드가 호출하는 제2 API(Application Program Interface) 호출 정보를 추출하게 된다. That is, the real-time agent 131 executes the received second malicious suspicious execution file in a real-time environment excluding the virtual environment, and then extracts second application program interface (API) calling information called by the malicious code.

바람직하게는, 사용자 레벨 및/또는 커널 레벨에서 API 후킹을 통해 악성 코드가 호출하는 API 정보를 모니터링함으로써, 리얼 타임 분석 에이젼트(130)는 제2 API 호출 정보를 추출할 수 있다. 추출된 제2 API 호출 정보는 악성 행위 분석 관리 모듈(112)로 전송될 수 있다.Preferably, the real-time analysis agent 130 can extract the second API call information by monitoring the API information called by the malicious code through API hooking at the user level and / or kernel level. The extracted second API call information may be transmitted to the malicious behavior analysis management module 112.

이에 따라, 악성 행위 분석 관리 모듈(112)은 리얼 타임 에이젼트(131)로부터 수신한 제2 API 호출 정보를 데이터베이스(111)에 저장하고, 저장된 제2 API 호출 정보를 미리 설정된 악성 코드 룰셋을 다시 적용시켜, 리얼 타임 악성 행위를 탐지하게 된다.Accordingly, the malicious behavior analysis management module 112 stores the second API call information received from the real-time agent 131 in the database 111, re-applies the previously stored malicious code rule set to the stored second API call information And detects real-time malicious activity.

이때, 악성 코드 룰셋은 앞서 설명한 바와 같이 동일한 관계로, 그 설명은 생략한다. 탐지된 리얼 타임 악성 행위는 데이테 베이스(111)에 저장됨은 물론이다.At this time, the malicious code ruleset has the same relationship as described above, and a description thereof will be omitted. Of course, the detected real-time malicious activity is stored in the database 111.

이와 같이, 본 실시예에서는 가상화 환경 또는/및 리얼 타임 환경에서 사용자 레벨 및 커널 레벨에 해당하는 모든 API 호출 정보(예: 제1 API 호출 정보와 제2 API 호출 정보)를 추출함으로써, 세부적인 악성 코드의 악성 행위를 탐지할 수 있는 장점을 준다.As described above, in this embodiment, by extracting all the API call information (e.g., the first API call information and the second API call information) corresponding to the user level and the kernel level in the virtualized environment and / or the real-time environment, It gives the advantage of detecting malicious behavior of code.

<Comparative Example>

FIG. 5 shows the results of API-based malicious behavior analysis by an existing system and by the system of this embodiment (virtualized environment), FIG. 6 shows an example of malicious code analysis results from the existing system and the system of this embodiment, and FIG. 7 shows the malicious code processing results from the existing system and the system of this embodiment.

In one embodiment, the experiment of FIG. 5 compared whether malicious code behavior that an existing analysis system failed to detect could be detected by the proposed malicious code detection system 100.

The experiment used a malicious code sample that was actually distributed in 2013. The sample looked up the antivirus processes on a Windows system and forcibly terminated them.

It also performed malicious behavior such as downloading an executable file from the web. The existing analysis system detected the termination of the antivirus process but did not detect the antivirus process lookup.

In contrast, the malicious code detection system 100 proposed in one embodiment identified the antivirus process lookup as well as the detailed malicious behavior performed by the malicious code, as shown in FIG. 5.

The experiment also measured the analysis and detection performance of the existing behavior analysis system and the proposed system 100 on multiple malicious code samples. An example of the analysis results for 110 malicious code samples that were actually distributed is shown in FIG. 6.

As FIG. 6 shows, the system 100 of one embodiment detected behavior that the existing analysis system could not. As a result, as shown in FIG. 7, the proposed system 100 detected 97 of the 110 malicious code samples used in the experiment, a detection rate of about 88%, and additionally detected malicious behavior (e.g., 7) that the existing analysis system had missed.
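
The 88% figure follows directly from the two sample counts quoted above. A minimal arithmetic check, in which the variable names are illustrative and not taken from the patent:

```python
# Detection rate reported for the proposed system: 97 of 110 samples.
detected = 97
total = 110
rate = detected / total
print(f"{rate:.1%}")  # 88.2%
```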

<Variant Malicious Code Detection / Query Example 1>

FIG. 8 is a diagram showing the similarity analysis system and the malicious code inquiry system of a malicious code variant detection system according to an embodiment of the present invention in more detail.

Referring to FIG. 8, the malicious code variant detection system 1000 according to an embodiment of the present invention includes a similarity analysis system 300 and a malicious code inquiry system 400. The similarity analysis system 300 may include a similarity analysis server 310 and a database 320 in order to detect API-based variant malicious code.

First, the similarity analysis server 310 receives API call information from the malicious code analysis system 100 and stores it in the database 320. It also generates a malicious code list that collects the malicious codes detected by the malicious code analysis system 100.

Here, the generated malicious code list refers to the hash list of the malicious codes whose similarity is to be measured. The malicious code list may include time information, process information (PID, PPID), the API name, parameters 1 to 8, and the like. The malicious code list is shown in FIG. 9.
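
One way to picture a single row of such a list is as a small record type. The sketch below is only an illustration: the class name, field names, and sample values are hypothetical, and the actual layout is the one shown in FIG. 9.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ApiCallRecord:
    """One row of the malicious code list (all field names are hypothetical)."""
    sample_hash: str   # hash identifying the malicious code sample
    timestamp: str     # time information
    pid: int           # process ID
    ppid: int          # parent process ID
    api_name: str      # name of the called API
    params: List[str] = field(default_factory=list)  # parameters 1 to 8

    def __post_init__(self) -> None:
        if len(self.params) > 8:
            raise ValueError("at most 8 parameters per record")


# Made-up example values, for illustration only.
rec = ApiCallRecord("abc123", "2015-01-19T00:00:00", 1200, 884,
                    "CreateFileW", ["example.log"])
print(rec.api_name, len(rec.params))  # CreateFileW 1
```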

The similarity analysis server 310 then uses the malicious code hash list in this malicious code list to measure (calculate) the API call similarity between at least two malicious codes.

To measure the API call similarity between two or more malicious codes, the server first receives the at least two hashes for which a similarity measurement was requested and encodes the corresponding API call information to extract API code sequences. Each encoded API code sequence is written to a file, which is stored and managed in an API sequence code folder.
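
The encoding step can be sketched as a mapping from API names to compact codes. The patent does not specify the code assignment, so the scheme below (integer codes assigned in order of first appearance) and the function name `encode_api_calls` are assumptions made purely for illustration.

```python
from typing import Dict, List


def encode_api_calls(api_names: List[str], codebook: Dict[str, int]) -> List[int]:
    """Map API names to integer codes, assigning a new code on first sight."""
    sequence = []
    for name in api_names:
        if name not in codebook:
            codebook[name] = len(codebook) + 1  # assumed numbering scheme
        sequence.append(codebook[name])
    return sequence


codebook: Dict[str, int] = {}
calls = ["RegSetValueExW", "CopyFileW", "CreateProcessW", "CopyFileW"]
print(encode_api_calls(calls, codebook))  # [1, 2, 3, 2]
```

Sharing one codebook across samples keeps the codes comparable, which matters once two sequences are compared with each other.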

The similarity analysis server 310 can then calculate the API call similarity from the extracted API code sequences using an N-gram algorithm. For example, the extracted API code sequences are taken as input and the N-gram algorithm is applied to them to compute the API call similarity.

The N-gram algorithm applied here expresses probabilistically the frequency with which N adjacent syllables (N-grams) appear in a string. It extracts subsequences from the whole sequence; specifically, from the extractable subsequences it extracts those of size N.

For example, as in FIG. 10 (e.g., 3-gram), the size N means N tokens. Taking the 3-gram set of the string "SIGNATURE", for example {"SIGNA", "IGNAT", "GNATU", "NATUR", "ATURE"}, the larger the value of N, the more strongly the call order information is reflected, and the smaller the value of N, the more strongly the call frequency is reflected. When N = 1, the calculation is identical to a similarity calculation over plain call frequencies. In other words, when N is 1, the result is the same as the accumulated frequency count of each character in the sequence.

With the N-gram algorithm applied in this way, the call frequency and the frequency of call orderings can be obtained according to the setting of N, so the API call similarity between the API code sequences of at least two malicious codes can be calculated. The similarity takes a value between 0 and 1.
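
The sliding-window extraction and a similarity score in the range 0 to 1 can be sketched as follows. The patent does not state which similarity formula is used, so the weighted Jaccard measure over N-gram counts here is an assumption, as are the function names. Note that a window width of 5 reproduces the five substrings listed for "SIGNATURE", while a width of 3 yields seven three-character windows.

```python
from collections import Counter
from typing import Sequence


def ngrams(seq: Sequence, n: int) -> list:
    """Return every contiguous window of length n, in order."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_similarity(a: Sequence, b: Sequence, n: int) -> float:
    """Weighted Jaccard similarity over n-gram counts (an assumed metric)."""
    ca, cb = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    union = sum((ca | cb).values())   # per-gram maximum count
    if union == 0:
        return 0.0
    return sum((ca & cb).values()) / union  # per-gram minimum count


print(["".join(g) for g in ngrams("SIGNATURE", 5)])
# ['SIGNA', 'IGNAT', 'GNATU', 'NATUR', 'ATURE']
print(len(ngrams("SIGNATURE", 3)))                      # 7
print(ngram_similarity([1, 2, 3, 2], [1, 2, 3, 4], 2))  # 0.5
print(ngram_similarity([1, 2, 3, 2], [1, 2, 3, 4], 1))  # 0.6
```

With N = 1 only the call counts matter; with N = 2 the order of adjacent calls starts to affect the score, which matches the trade-off described for small and large N.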

Next, in this embodiment, the database 320 stores the API call information, the malicious code list, the API encoding, the API code sequences, the calculated API call similarity, and so on, as described above. Variant malicious code is detected by calculating the API call similarity in this way.

Meanwhile, in this embodiment, the malicious code inquiry system 400 can query the information stored in the database 320, and can combine and compute the information needed to identify variant malicious code. These queries and computations can be carried out through a graphical user interface (GUI).

For example, as in FIG. 11, to look up the similarity between malicious code A and malicious code B, when a query is requested such as "case1. [Query] mal1_id(24), mal2_id(11)" or "case2. [Query] mal1_id[24], mal2_id[67]", the system may output that the requested malicious codes A and B belong to the same similarity group, or that malicious codes A and B belong to different groups and no past similarity calculation result exists. In this way, similarity lookup can be carried out through the graphical user interface (GUI).
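
The two query outcomes can be sketched as a lookup against stored pairwise results. Everything in this sketch is hypothetical: the in-memory table stands in for the database 320, and the stored score of 0.93 is a made-up value, not a result from the patent.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical store of previously computed similarities, keyed by id pair.
stored: Dict[FrozenSet[int], float] = {frozenset({24, 11}): 0.93}


def lookup_similarity(mal1_id: int, mal2_id: int) -> Optional[float]:
    """Return the stored similarity for a pair, or None if never computed."""
    return stored.get(frozenset({mal1_id, mal2_id}))


print(lookup_similarity(24, 11))  # 0.93
print(lookup_similarity(24, 67))  # None
```

Keying on an unordered pair means the lookup gives the same answer whichever of the two ids is entered first.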

<Variant Malicious Code Detection / Query Example 2>

FIG. 12 is a diagram showing in more detail the relationship between the behavior classification system and the similarity analysis system of the malicious code variant detection system according to an embodiment of the present invention, together with the malicious code inquiry system.

Referring to FIG. 12, the malicious code variant detection system 1000 according to an embodiment of the present invention includes a behavior classification system 200, a similarity analysis system 300, and a malicious code inquiry system 400. The behavior classification system 200 may include a behavior classification server 210 and a database 220 in order to detect behavior-based variant malicious code.

First, the behavior classification server 210 receives the malicious behavior of the malicious code and/or the API call information from the malicious code analysis system 100. The behavior classification server 210 then generates behavior codes by matching the received malicious behavior against pre-stored behavior classification rule information.

Here, the behavior classification rule information may include: behavior information resulting from the calls recorded in the API call information; an API number identifying an API included in the behavior rule of that behavior information; a parameter, which is the object referenced in order to perform the behavior; a parameter value, which is the actual value of that object; an associated API that must always be called together with the given API; and a flag indicating whether the API must be called together with the associated API in order to match the behavior.
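
The fields listed above can be read as a rule record plus a matching check. The sketch below is one possible interpretation only: the class and field names are hypothetical, and the exact matching semantics (in particular how the flag and the associated API interact) are assumed rather than taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BehaviorRule:
    behavior: str                    # behavior information
    api_number: int                  # identifies the API in the rule
    parameter: Optional[str]         # object the behavior refers to
    parameter_value: Optional[str]   # actual value of that object
    associated_api: Optional[int]    # API expected alongside api_number
    require_associated: bool         # flag: must both be called to match?


@dataclass
class Call:
    api_number: int
    parameter: Optional[str] = None
    parameter_value: Optional[str] = None


def matches(rule: BehaviorRule, calls: List[Call]) -> bool:
    """True if some call satisfies the rule (assumed semantics)."""
    called = {c.api_number for c in calls}
    if rule.require_associated and rule.associated_api not in called:
        return False
    for c in calls:
        if c.api_number != rule.api_number:
            continue
        if rule.parameter is not None and c.parameter != rule.parameter:
            continue
        if rule.parameter_value is not None and c.parameter_value != rule.parameter_value:
            continue
        return True
    return False


rule = BehaviorRule("example behavior", 1, None, None, 2, True)
print(matches(rule, [Call(1)]))           # False
print(matches(rule, [Call(1), Call(2)]))  # True
```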

Matching is thus performed by checking whether the received malicious behavior falls under a behavior classification rule, and the behavior codes are generated accordingly. An example of behavior codes generated in this way is shown in FIG. 13.

FIG. 13 shows how arbitrary malicious behaviors are sorted into behavior codes according to the behavior classification rules. The behavior codes identify the behavior of each malicious code, and a combination of malicious behaviors can be expressed as a single bit string. For example, each behavior may be represented by a bit that is '1' when the behavior is matched and '0' when it is not.
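
The bit-string representation can be sketched in a few lines. The rule names and their ordering below are hypothetical; the only property taken from the text is one bit per behavior, '1' for a match and '0' otherwise.

```python
from typing import List, Set


def behavior_code(rule_names: List[str], matched: Set[str]) -> str:
    """One bit per rule, in rule order: '1' if matched, else '0'."""
    return "".join("1" if name in matched else "0" for name in rule_names)


# Hypothetical rule names, for illustration only.
rules = ["registry_autorun", "file_copy", "process_kill", "download_exec"]
print(behavior_code(rules, {"file_copy", "download_exec"}))  # 0101
```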

Once the behavior codes have been generated, the behavior classification server 210 can use them to create behavior groups, each of which groups the malicious codes belonging to the same class of behavior.

For example, as shown in FIG. 14, malicious codes that have been assigned behavior codes are grouped on that basis into at least one behavior group, so that malicious codes with similar behavior are classified together.

Malicious codes with the same behavior code are placed in one group and can be classified as malicious codes with mutually similar behavior. The behavior code of one signature identifies one behavior group, and the total number of behavior codes may equal the number of similar-behavior groups.
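
Grouping by identical behavior code amounts to bucketing samples by their bit string. A minimal sketch, with made-up sample identifiers and codes:

```python
from collections import defaultdict
from typing import Dict, List


def group_by_behavior_code(codes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sample ids by identical behavior code."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id, code in codes.items():
        groups[code].append(sample_id)
    return dict(groups)


# Hypothetical samples and behavior codes.
samples = {"mal_a": "0101", "mal_b": "0101", "mal_c": "1000"}
groups = group_by_behavior_code(samples)
print(groups)  # {'0101': ['mal_a', 'mal_b'], '1000': ['mal_c']}
print(len(groups) == len(set(samples.values())))  # True
```

The last line illustrates the stated property that the number of distinct behavior codes equals the number of groups.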

Because behavior codes and behavior groups are generated in this way, behavior-based variant malicious code can be identified easily in this embodiment.

In one embodiment, the behavior classification server 210 can also extract API sequences using a malicious code list that collects the malicious codes included in a generated behavior group. An example of the malicious code list is shown in FIG. 9, described above. For reference, FIG. 9 also shows the API call information described with reference to FIG. 1.

The API sequences extracted in this way can be used to generate a bit code for each behavior code. That is, the behavior classification server 210 can generate the bit code (1 or 0) for each behavior code according to whether the extracted API sequence matches a unit behavior in the behavior classification rule information.
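
How an API sequence is compared with a unit behavior is not spelled out in the text. The sketch below assumes that a unit behavior is a short pattern of API codes and that it matches when the pattern appears in the sequence in order (not necessarily contiguously); the pattern names and codes are hypothetical.

```python
from typing import Dict, List, Sequence


def contains_in_order(api_seq: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if pattern occurs in api_seq as an ordered subsequence."""
    it = iter(api_seq)
    # Each membership test advances the iterator past the matched element.
    return all(code in it for code in pattern)


def bit_codes(api_seq: Sequence[int],
              unit_behaviors: Dict[str, List[int]]) -> Dict[str, int]:
    """Return 1 or 0 per unit behavior, depending on whether it matches."""
    return {name: int(contains_in_order(api_seq, pat))
            for name, pat in unit_behaviors.items()}


units = {"copy_then_run": [2, 3], "run_then_copy_then_reg": [3, 2, 1]}
print(bit_codes([1, 2, 3, 2], units))
# {'copy_then_run': 1, 'run_then_copy_then_reg': 0}
```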

The generated bit codes are very useful later: when querying the malicious code inquiry system 400, described below, a bit code can be entered to look up various information such as variant malicious codes.

Meanwhile, in this embodiment, the database 220 stores the behavior codes, behavior groups, malicious code list, API sequences, bit codes, and so on generated as described above. The stored information can be queried by the malicious code inquiry system 400, described later.

Next, in this embodiment, the similarity analysis system 300 can measure the API call similarity between at least two malicious codes in the same behavior group, using a malicious code list that collects the detected malicious codes in a behavior group generated by the behavior classification server 210 described above.

The malicious code list needed for the API call similarity measurement is the same as the malicious code list described with reference to FIG. 8. The difference is that this embodiment measures the API call similarity between at least two malicious codes within the same behavior group produced by the behavior classification system 200, whereas the embodiment of FIG. 8 measures the API call similarity between at least two malicious codes detected by the malicious code analysis system 100, without the behavior classification system 200.

In this embodiment, the similarity analysis system 300 can likewise calculate the API call similarity from the API code sequences using the N-gram algorithm. This calculation was described fully with reference to FIG. 8, so its description is omitted here. The information calculated, input, and output in this way may be stored in the database 320.

Finally, in this embodiment, the malicious code inquiry system 400 can query the information stored in the database 320 of the similarity analysis system 300 and in the database 220 of the behavior classification system 200, and can combine and compute the information needed to identify variant malicious code. These queries and computations can be carried out through a graphical user interface (GUI).

For example, as in FIG. 15, to look up the list of variant malicious codes belonging to a specific group, when a query such as "[Query] group1_id(28), channel information (TYPE1), variant determination result (TRUE)" is requested through the user interface, MAL2_id 11, 52, ... and so on may be computed and output as the variant group according to the variant determination result.

Likewise, as in FIG. 16, to look up the variants of a specific malicious code A, queries may be requested such as "case1. [Query] mal1_id(24), variant determination result (TRUE)", "case2. [Query] mal1_id[24], variant determination result (TRUE), group type (TYPE1)", and "case3. [Query] mal1_id[24], variant determination result (TRUE), group type (TYPE2)". The system may then output, respectively, the full variant list for mal1_ID[24], only the variant list within the same group, or the variant list belonging to different groups.
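
The three query cases can be sketched as a filter over stored variant verdicts. The tables below are hypothetical stand-ins for the databases 220 and 320 with made-up ids, and reading TYPE1 as "same group" and TYPE2 as "different group" is an inference from the order of the cases, not something the text states outright.

```python
from typing import Dict, List, Optional

# Hypothetical tables: group membership and variant verdicts per sample.
group_of: Dict[int, int] = {24: 28, 11: 28, 52: 28, 67: 31}
variants_of: Dict[int, List[int]] = {24: [11, 52, 67]}


def query_variants(mal_id: int, group_type: Optional[str] = None) -> List[int]:
    """List variants of mal_id; TYPE1 = same group, TYPE2 = different group."""
    found = variants_of.get(mal_id, [])
    if group_type == "TYPE1":
        return [v for v in found if group_of[v] == group_of[mal_id]]
    if group_type == "TYPE2":
        return [v for v in found if group_of[v] != group_of[mal_id]]
    return list(found)


print(query_variants(24))           # [11, 52, 67]
print(query_variants(24, "TYPE1"))  # [11, 52]
print(query_variants(24, "TYPE2"))  # [67]
```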

<Example of Malicious Behavior Detection Method>

FIG. 17 is a flowchart illustrating an API-based malicious code variant detection and inquiry method according to an embodiment of the present invention, and FIG. 18 is a flowchart showing in more detail the malicious behavior detection step of the malicious code variant detection and inquiry method according to an embodiment of the present invention.

Referring to FIG. 17, the behavior-based malicious code variant detection and inquiry method S1000 according to an embodiment of the present invention may include: a step S100 in which the malicious code analysis system 100 extracts API call information and detects the malicious behavior of malicious code; a step S200 in which the behavior classification system 200 generates behavior codes and behavior groups based on the detected malicious behavior and detects variant malicious code; a step S300 in which the similarity analysis system 300 obtains the API call similarity between at least two malicious codes in the API call information and/or in a behavior group and detects variant malicious code; and a step S400 in which the malicious code inquiry system 400 queries the malicious behavior, variant malicious code, and other information described above.

Of these, step S100 according to this embodiment may include, as shown in FIG. 18: step S110 of storing a collected or input suspected-malicious executable file; step S120 of executing the suspected-malicious executable file and extracting, at the user level and the kernel level, the first API call information for the APIs called by the malicious code; step S130 of applying a preset malicious code rule set to the first API call information to detect virtualized malicious behavior; and step S140 of extracting second API call information for the APIs called by the malicious code from a second suspected-malicious executable file, for which no virtualized malicious behavior was detected, and detecting real-time malicious behavior.

First, in step S110, the malicious code management server 110 collects, from the network traffic sensor 101, the analysis target traffic that is to undergo malicious code analysis.

The network traffic sensor 101 is connected to a network, for example a wired or wireless network, and collects traffic that includes the executable files of application programs run on systems operating in a Windows environment. It extracts the analysis target traffic that requires analysis and transmits it to the malicious code management server 110.

Accordingly, in step S110, the analysis target traffic is received from the network traffic sensor 101, and the first suspected-malicious executable file of the application program contained in that traffic, along with various meta information, can be stored in the database 111 of the malicious code management server 110 using a REST API.

The file may, however, be input rather than collected. That is, in step S110, at least one executable file may be input manually and stored in the database 111. The input executable file is preferably a PE (Portable Executable) file that can run in a Windows environment, although it is of course not limited to PE files.

Then, in one embodiment, step S120 may use virtualization technology to run at least one virtualization agent module 121 at the same time. Each virtualization agent module 121 run in this coordinated way may refer to a Windows system running in a virtualized environment.

Once the virtualization agent module 121 is running, the first suspected-malicious executable file received from the malicious code management server 110 can be executed in the virtualization agent module 121. As a result of the execution, the first API (Application Program Interface) call information for the APIs called by the malicious code can be extracted.

Put another way, in step S120 the first suspected-malicious executable file received from the malicious code management server 110 is executed using at least one virtualization agent module 121, and the virtualization analysis server 120 then extracts the first API (Application Program Interface) call information for the APIs called by the malicious code.

Preferably, the virtualization analysis server 120 extracts the first API call information by monitoring, through API hooking at the user level and the kernel level, the APIs that the malicious code calls. Once the first API call information has been extracted, the malicious behavior of the malicious code can be identified.

That is, malicious behavior at the user level and the kernel level can be identified, such as 'register at a registry run location', 'copy file', 'execute worm process', 'create log file at C:W', 'create Mutex to prevent duplicate execution', and 'register at a registry run location'. The extracted first API call information is transmitted from the virtualization analysis server 120 to the malicious code management server 110.

Because the first API call information is extracted at both the user level and the kernel level, malicious code behavior can be analyzed across a wide range of APIs.

In this case, the first API call information received from the virtualization analysis agent 120 may be stored in the database 101 of the virtualization analysis server 120.

Then, to detect malicious behavior in finer detail using the stored first API call information, in step S130 the malicious behavior analysis management module 112 applies the preset malicious code rule set to the first API call information received from the virtualization analysis agent 120 and detects virtualized malicious behavior in the virtualized environment.

The malicious code rule set may include hooking filtering. That is, the malicious code rule set, including the hooking filtering, is applied to the first API call information; the hooking-filtered first API call information is compared with the predefined malicious code rule set; and if they are found to be identical, the malicious behavior analysis management module 112 detects the virtualized malicious behavior of the malicious code. The detected virtualized malicious behavior may be stored in the database 111 of the malicious behavior analysis management module 112.
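
The filter-then-compare flow can be sketched as below. This section does not define hooking filtering or the rule format, so the sketch assumes that filtering means dropping calls that are not of interest and that a rule matches when every API it names appears among the remaining calls. The function name, rule names, and sample data are all hypothetical.

```python
from typing import Dict, List, Set


def detect_behaviors(calls: List[str], ignored: Set[str],
                     rule_set: Dict[str, List[str]]) -> List[str]:
    """Drop ignored calls, then report rules whose APIs all appear."""
    filtered = [c for c in calls if c not in ignored]  # assumed filtering step
    seen = set(filtered)
    return [name for name, apis in rule_set.items() if set(apis) <= seen]


# Illustrative inputs only.
calls = ["GetTickCount", "CopyFileW", "RegSetValueExW", "GetTickCount"]
ignored = {"GetTickCount"}
rule_set = {
    "file_copy": ["CopyFileW"],
    "download_exec": ["URLDownloadToFileW", "CreateProcessW"],
}
print(detect_behaviors(calls, ignored, rule_set))  # ['file_copy']
```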

However, it may not be possible to detect all malicious code from the first suspected-malicious executable files in the virtualized environment. To handle this, in step S130 a second suspected-malicious executable file, for which no virtualized malicious behavior was detected, can be extracted from the first suspected-malicious executable files stored in the database 111. The extracted second suspected-malicious executable file is transmitted to the real analysis server 130, described later.
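
Selecting the second suspected-malicious executable files amounts to picking the samples whose virtualized analysis produced no detection. A minimal sketch, assuming results are keyed by a file hash; the function name and sample data are hypothetical.

```python
from typing import Dict, List


def select_for_real_analysis(virtual_results: Dict[str, List[str]]) -> List[str]:
    """Return hashes of samples with no behavior detected in the virtualized stage."""
    return [h for h, behaviors in virtual_results.items() if not behaviors]


# Illustrative results: hash -> behaviors detected in the virtualized environment.
virtual_results = {"hash1": ["file_copy"], "hash2": [], "hash3": []}
print(select_for_real_analysis(virtual_results))  # ['hash2', 'hash3']
```

Only these samples are re-run in the non-virtualized environment, which keeps the more costly second pass limited to files the first pass could not classify.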

Then, in this embodiment, in step S140 the second suspected-malicious executable file received from the malicious behavior analysis management module 112 is executed through the real-time agent 131 of the real-time analysis agent 130, and the real-time analysis agent 130 extracts the second API (Application Program Interface) call information for the APIs called by the malicious code.

Preferably, the real-time analysis agent 130 extracts the second API call information by monitoring, through API hooking at the user level and/or the kernel level, the APIs that the malicious code calls. The extracted second API call information is transmitted to the malicious behavior analysis management module 112.

Accordingly, in step S140 the second API call information received from the real-time agent 131 is stored in the database 111 of the malicious behavior analysis management module 112, and the malicious behavior analysis management module 112 applies the preset malicious code rule set once more, to the stored second API call information, to detect real-time malicious behavior.

The malicious code rule set is the same as the one described above, so its description is omitted here. The detected real-time malicious behavior is stored in the database 111, so the information stored in the database 111 can be put to use in malicious behavior analysis as needed.

As described above, this embodiment extracts all API call information at the user level and the kernel level in the virtualized environment and/or the real-time environment, which has the advantage that more detailed malicious behavior of malicious code can be detected.

Returning to FIG. 17, in step S200 according to this embodiment, the behavior classification server 210 receives the malicious behavior of the malicious code and/or the API call information from the malicious code analysis system 100. The behavior classification server 210 then generates behavior codes by matching the received malicious behavior against pre-stored behavior classification rule information.

That is, the matching checks whether the malicious behavior of the received malicious code falls under a behavior classification rule, and the behavior codes are generated accordingly. An example of the generated behavior codes is shown in FIG. 13.

Once the behavior codes have been generated, in step S200 the behavior classification server 210 may use them to generate behavior groups, each of which groups the malicious codes belonging to the same behavior family.

Malicious codes that have the same behavior code are placed in one group and can be classified as malicious codes with mutually similar behavior. In addition, the behavior code of one signature is identified with one behavior group, so the total number of behavior codes may equal the number of similar-behavior groups. By generating behavior codes and behavior groups in this way, the present embodiment has the advantage that behavior-based variant malicious code can be detected easily.
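The grouping described above can be sketched as follows. This is a minimal illustration only: the sample hashes, the behavior code strings, and the function name `group_by_behavior_code` are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict


def group_by_behavior_code(behavior_codes):
    """Map each behavior code to the sorted list of sample hashes that share it."""
    groups = defaultdict(list)
    for sample_hash, code in behavior_codes.items():
        groups[code].append(sample_hash)
    return {code: sorted(hashes) for code, hashes in groups.items()}


# Hypothetical sample hashes and behavior codes, for illustration only.
samples = {"hash_a": "B001", "hash_b": "B002", "hash_c": "B001"}
behavior_groups = group_by_behavior_code(samples)
```

In this sketch one behavior code yields exactly one behavior group, so the number of groups equals the number of distinct behavior codes.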

Also, in one embodiment, in step S200 the behavior classification server 210 may extract an API sequence using a malicious code list that collects the malicious codes included in the generated behavior group. An example of the malicious code list is shown in FIG. 9. For reference, FIG. 9 also shows the API call information described with reference to FIG. 17.

The API sequence extracted in this way can be used to generate a bit code for each behavior code. That is, the behavior classification server 210 can generate a bit code (1, 0) for each behavior code according to whether the extracted API sequence matches the unit behaviors in the behavior classification rule information.
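One possible reading of this bit code generation is sketched below. The matching criterion is an assumption: here a unit behavior is treated as matched when its APIs appear in the extracted API sequence in the given order. The unit behavior names, the API lists, and the helper names are hypothetical.

```python
def contains_in_order(api_sequence, required_apis):
    """Return True if required_apis occur in api_sequence in the given order."""
    remaining = iter(api_sequence)
    # Each membership test consumes the iterator up to the match,
    # so later APIs must appear after earlier ones.
    return all(api in remaining for api in required_apis)


def bit_code(api_sequence, unit_behaviors):
    """Return one bit per unit behavior: 1 if it matches the sequence, else 0."""
    return [1 if contains_in_order(api_sequence, apis) else 0
            for apis in unit_behaviors.values()]


# Hypothetical unit behaviors (assumed rule content, not from the embodiment).
unit_behaviors = {
    "file_drop": ["CreateFileW", "WriteFile"],
    "registry_autorun": ["RegOpenKeyExW", "RegSetValueExW"],
    "process_injection": ["OpenProcess", "WriteProcessMemory", "CreateRemoteThread"],
}
sequence = ["CreateFileW", "WriteFile", "CloseHandle",
            "RegOpenKeyExW", "RegSetValueExW"]
bits = bit_code(sequence, unit_behaviors)  # [1, 1, 0]
```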

Thereafter, in this embodiment, step S200 stores the behavior codes, behavior groups, malicious code list, API sequences, bit codes, and so on, generated as described above, in the database 220. The stored information can be queried by the malicious code inquiry system 400, which is described later.

In this embodiment, step S300 receives the API call information from the malicious code analysis system 100 and stores it in the database 320 of the similarity analysis server 310. In addition, the similarity analysis server 310 extracts a malicious code list that collects the malicious codes detected by the malicious code analysis system 100 and/or the malicious codes in the behavior groups of the behavior classification system 200.

Here, the extracted malicious code list refers to the hash list of the malicious codes whose similarity is to be measured. The malicious code list may include time information, process information (PID, PPID), the API name, information on parameters 1 to 8, and so on. Such a malicious code list is shown in FIG. 9.

Accordingly, in step S300 the similarity analysis server 310 measures (calculates) the API call similarity between at least two malicious codes using the malicious code hash list of the malicious code list described above.

To measure the API call similarity between two or more malicious codes, the at least two hashes for which similarity measurement was requested are first received, and the API call information for those hashes is encoded to extract API code sequences. Each encoded API code sequence is written to a file and is stored and managed in an API sequence code folder.
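The encoding step can be sketched as follows. The record layout, the integer code assignment, and the name `encode_api_calls` are assumptions made for illustration; the embodiment does not state how API names are mapped to codes.

```python
def encode_api_calls(api_calls, code_table):
    """Sort call records by time and translate each API name to an integer code."""
    ordered = sorted(api_calls, key=lambda call: call["time"])
    sequence = []
    for call in ordered:
        # Assign the next free integer code the first time an API name is seen.
        code = code_table.setdefault(call["api"], len(code_table) + 1)
        sequence.append(code)
    return sequence


# Hypothetical call records with time, PID, and API name fields.
calls = [
    {"time": 2, "pid": 100, "api": "WriteFile"},
    {"time": 1, "pid": 100, "api": "CreateFileW"},
    {"time": 3, "pid": 100, "api": "WriteFile"},
]
table = {}
seq = encode_api_calls(calls, table)  # [1, 2, 2]
```

Sharing one `code_table` across all samples keeps the codes comparable, which matters when the resulting sequences are later compared with each other.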

Accordingly, in step S300 the similarity analysis server 310 can calculate the API call similarity using the extracted API code sequences together with an N-gram algorithm. For example, the extracted API code sequences are taken as input and applied to the N-gram algorithm to calculate the API call similarity.
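A minimal sketch of an N-gram comparison over two API code sequences is given below. The embodiment names only "an N-gram algorithm"; the specific measure used here (overlap of N-gram counts divided by their union, a weighted Jaccard index) and the default N of 3 are assumptions chosen so that API composition, call order, and frequency all affect the score.

```python
from collections import Counter


def ngrams(sequence, n):
    """Count every run of n consecutive API codes in the sequence."""
    return Counter(tuple(sequence[i:i + n])
                   for i in range(len(sequence) - n + 1))


def ngram_similarity(seq_a, seq_b, n=3):
    """Return shared N-gram count divided by total N-gram count (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(seq_a, n), ngrams(seq_b, n)
    shared = sum((grams_a & grams_b).values())  # per-gram minimum counts
    total = sum((grams_a | grams_b).values())   # per-gram maximum counts
    return shared / total if total else 0.0


score = ngram_similarity([1, 2, 3, 4, 5], [1, 2, 3, 4, 6])  # 0.5
```

Two of the four distinct 3-grams are shared in this example, so the score is 0.5; identical sequences score 1.0, and sequences shorter than N score 0.0.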

In this embodiment, step S300 stores the API call information, the malicious code list, the API encoding, the API code sequences, the calculated API call similarity, and so on, described above, in the database 320. Calculating the API call similarity in this way allows variant malicious code to be detected.

Meanwhile, in this embodiment, step S400 queries the information stored in the databases (111, 220, 320) of the malicious code analysis system 100, the behavior classification system 200, and the similarity analysis system 300 described above, and can combine and compute the information needed to identify variant malicious code. This querying and computation can be carried out through a graphical user interface (GUI).

For example, as in FIG. 11, to query the similarity between malicious code A and malicious code B, a query may be requested in the form "case1. [Query] mal1_id(24), mal2_id(11)" or "case2. [Query] mal1_id(24), mal2_id(67)". In response, the system may output that the requested malicious codes A and B belong to the same similarity group, or that malicious codes A and B belong to different groups and no past similarity calculation result exists.

Also, as in FIG. 15, to query the list of variant malicious codes belonging to a specific group, a query such as "[Query] group1_id(28), channel information(TYPE1), variant determination result(TRUE)" may be requested through the user interface, in which case MAL2_id 11, 52, ... and so on may be computed and output as the variant group according to the variant determination result.
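The two kinds of query above can be sketched with in-memory lookups. Everything here is hypothetical: the dictionaries stand in for the stored database contents, the ids simply reuse the example values from the text, the similarity value is made up, and the channel information filter is left out for brevity.

```python
def query_pair(mal1_id, mal2_id, group_of, similarity):
    """Report whether two samples share a group and any stored similarity."""
    group_a = group_of.get(mal1_id)
    same_group = group_a is not None and group_a == group_of.get(mal2_id)
    # None means no past similarity calculation result exists for this pair.
    score = similarity.get(frozenset((mal1_id, mal2_id)))
    return {"same_group": same_group, "similarity": score}


def query_variants(group_id, group_of, is_variant):
    """List the ids in a group whose variant determination result is True."""
    return sorted(mal_id for mal_id, group in group_of.items()
                  if group == group_id and is_variant.get(mal_id, False))


# Hypothetical stored data: sample id -> group id, pair -> similarity, id -> variant flag.
group_of = {24: 28, 11: 28, 52: 28, 67: 31}
similarity = {frozenset((24, 11)): 0.92}
is_variant = {11: True, 52: True, 24: False}

case1 = query_pair(24, 11, group_of, similarity)
case2 = query_pair(24, 67, group_of, similarity)
variants = query_variants(28, group_of, is_variant)
```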

In this way, the present embodiment has the advantage that variant malicious codes and the malicious behaviors of malicious codes can be queried and computed under various conditions by the malicious code inquiry system 400.

Embodiments of the present invention have been described above with reference to the accompanying drawings, but those of ordinary skill in the art to which the present invention pertains will understand that the invention can be practiced in other specific forms without changing its technical idea or essential features. The embodiments described above are therefore illustrative in all respects and not restrictive.

100: malicious code analysis system 101: network traffic sensor
110: malicious code management server 111: database
112: malicious behavior analysis management module 120: virtualization analysis agent
121: virtualization agent module 130: real-time analysis agent
131: real-time agent 200: behavior classification system
210: behavior classification server 220: database
300: similarity analysis system 310: similarity analysis server
320: database 400: malicious code inquiry system
1000: malicious code variant detection and inquiry system

Claims

1. An API-based malicious code variant detection and inquiry system, comprising:
a malicious code analysis system that executes a suspected-malicious executable file to extract application program interface (API) call information called by a malicious code, and detects malicious behavior of the malicious code using the extracted API call information;
a similarity analysis system that calculates an API call similarity between at least two malicious codes using a malicious code list in which the detected malicious codes are collected; and
a behavior classification system that generates behavior codes by matching the malicious behavior provided from the malicious code analysis system against pre-stored behavior classification rule information, and generates behavior groups in which malicious codes having similar behavior are grouped using the behavior codes,

wherein the similarity analysis system measures the API call similarity between at least two malicious codes in the same behavior group using an API call information set, which collects the API call information extracted from the behavior group, and the malicious code list in which the detected malicious codes are collected, and

wherein the similarity analysis system measures the API call similarity by identifying the individual API composition, the API call order, and the frequency between any at least two malicious codes included in the malicious code list.

2. The system according to claim 1,
wherein the malicious code analysis system collects the suspected-malicious executable file from a network traffic sensor connected to a network.

3. The system according to claim 1,
wherein the malicious code analysis system comprises a first database that stores the suspected-malicious executable file, first API call information, and the malicious behavior of the malicious code.

4. The system according to claim 1,
wherein the malicious code analysis system extracts the API call information called by the malicious code through API hooking at a user level and a kernel level.

5. The system according to claim 4,
wherein the malicious code analysis system detects the malicious behavior by applying a preset malicious code ruleset to the API call information.

6. The system according to claim 5,
wherein the malicious code analysis system applies the malicious code ruleset, which includes hooking filtering.

7. The system according to claim 1,
wherein the malicious code analysis system detects the malicious behavior, which includes virtualization malicious behavior and real-time malicious behavior.

8. (Deleted)

9. The system according to claim 1,
wherein the behavior classification rule information includes:
an API number identifying the API included in a behavior rule;
a parameter, which is the object referenced in order to perform the behavior;
a parameter value, which is the actual value of the object;
an associated API that is called together when the corresponding API is called; and
a flag identifying whether the associated API must be called together for the behavior to match.

10. The system according to claim 9,
wherein the behavior classification system extracts an API sequence using a malicious code list that collects the malicious codes included in the generated behavior group.

11. The system according to claim 10,
wherein the behavior classification system generates a bit code (1, 0) for each of the behavior codes according to whether the extracted API sequence matches the unit behaviors in the behavior classification rule information.

12. The system according to claim 11,
wherein the behavior classification system comprises a second database that stores the behavior codes, the behavior groups, the malicious code list, the API sequence, and the bit codes.

13. The system according to claim 3 or 12,
further comprising a malicious code inquiry system that queries the information stored in the first and second databases and combines and computes the information in order to identify variant malicious code.

14. (Deleted)

15. The system according to claim 1,
wherein the similarity analysis system uses the malicious code list, which includes a malicious code hash list.

16. The system according to claim 15,
wherein the similarity analysis system encodes the API call information using any two or more hashes included in the malicious code hash list to extract API code sequences.

17. The system according to claim 16,
wherein the similarity analysis system measures the API call similarity using an N-gram together with the extracted API code sequences.

18. The system according to claim 17,
wherein the similarity analysis system comprises a third database that stores the API call information set, the malicious code list, the API encoding, the API code sequences, and the API call similarity.

19. The system according to claim 18,
further comprising a malicious code inquiry system that queries the information stored in the third database and combines and computes the information in order to identify variant malicious code.

20. An API-based malicious code variant detection and inquiry method, comprising:
(a) extracting, in a malicious code analysis system, application program interface (API) call information called by a malicious code after executing a suspected-malicious executable file;
(b) detecting, in the malicious code analysis system, malicious behavior of the malicious code using the extracted API call information;
(c) measuring, in a similarity analysis system, an API call similarity between at least two malicious codes using the API call information and a malicious code list in which the detected malicious codes are collected;
(d) generating, in a behavior classification system, behavior codes by matching the detected malicious behavior against pre-stored behavior classification rule information; and
(e) generating, in the behavior classification system, behavior groups in which malicious codes belonging to the same behavior family are grouped using the behavior codes,
wherein step (c) measures the API call similarity between at least two malicious codes in the same behavior group using an API call information set, which collects the API call information extracted from the behavior group, and the malicious code list in which the detected malicious codes are collected, and
wherein step (c) measures the API call similarity by identifying the API composition, the API call order, and the frequency between any at least two malicious codes included in the malicious code list.

(Deleted)

21. The method of claim 20,
wherein step (c) uses the malicious code list, which includes a malicious code hash list.

24. The method of claim 23,
wherein step (c) encodes the API call information using any two or more hashes included in the malicious code hash list to extract API code sequences.

25. The method of claim 24,
wherein step (c) measures the API call similarity using an N-gram together with the extracted API code sequences.