KR101537088B1

KR101537088B1 - System and method for detecting malicious code based on API calling flow

Info

Publication number: KR101537088B1
Application number: KR1020140116394A
Authority: KR
Inventors: 조래현; 이동희; 한인희
Original assignee: 인포섹(주)
Priority date: 2014-09-02
Filing date: 2014-09-02
Publication date: 2015-07-15

Abstract

The present invention relates to a system and a method for detecting malicious code based on API calling flow, in which, when code under analysis is run inside a sandbox (a protected area within a program), behavior data derived from its sequence of API calls is analyzed to detect malicious code. To this end, a system for detecting malicious code based on API calling flow according to one embodiment of the present invention comprises a behavior data collecting unit, a matching unit, a weight summation unit, and a malicious code determination unit.

Description

SYSTEM AND METHOD FOR DETECTING MALICIOUS CODE BASED ON API CALLING FLOW

The present invention relates to a system and method for detecting malicious code based on API call flow, and more particularly to a technique that, when code under analysis is run inside a sandbox (a protected area within a program), detects malicious code by analyzing behavior data that follows the sequence of API calls.

Malicious code is software designed to intrude into and install itself on a computer system without the user's knowledge, damage the system or network, and obtain information illegally. Conventional detection of malicious code or viruses has been performed mainly on a file basis.

That is, conventional detection requires extracting characteristics such as patterns or hash values from every known malicious code file and storing them in a malicious code database. The characteristics of every file on the system are then extracted and compared with the data stored in the database, and a file is judged to be malicious code when the two match.

This conventional technique has the advantage that malicious code can be detected quickly and accurately when the characteristics of the malicious code file are already held. However, when those characteristics are not held, that is, for unknown malicious code, detection is impossible. Even for known malicious code, a variant is difficult to detect although it causes the same harmful behavior.
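The file-based approach and its weakness against variants can be illustrated with a short Python sketch. It is only an illustration of the general idea: the signature set, the sample byte strings, and the function names are hypothetical, and SHA-256 is chosen here merely as one possible hash.

```python
import hashlib
from typing import Set

# Hypothetical signature database: digests of known malicious files.
KNOWN_BAD_DIGESTS: Set[str] = {
    hashlib.sha256(b"example malicious payload").hexdigest(),
}


def file_digest(data: bytes) -> str:
    """Extract the file characteristic (here, a SHA-256 digest)."""
    return hashlib.sha256(data).hexdigest()


def is_known_malware(data: bytes, signatures: Set[str] = KNOWN_BAD_DIGESTS) -> bool:
    """Judge a file malicious only if its digest is already in the database."""
    return file_digest(data) in signatures
```

A file identical to the stored sample is detected, whereas a variant that differs by even one byte produces a different digest and is missed, which is the limitation the paragraph above describes.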

In addition, the conventional technique must inspect every file on the system individually, which lengthens detection time. In particular, for malicious code such as bots, for which more than 4,000 variants appear per day, samples of every variant file must be kept and the file characteristics used for detection must be extracted from each sample one by one, which lowers memory efficiency and detection efficiency.

Meanwhile, Korean Patent No. 10-1324691, "Mobile malicious behavior application detection system and method," discloses a technique for detecting applications that can cause malicious behavior on a user terminal. It includes a malicious behavior pattern generation unit that generates a malicious behavior pattern by patterning the API (Application Programming Interface) list and API call order of mobile malicious applications, and a malicious behavior analysis unit that analyzes, based on the malicious behavior pattern, whether an application under analysis behaves maliciously.

The prior art has the advantage of strengthening detection of mobile malicious applications by automatically collecting and analyzing arbitrary applications distributed in smartphone application markets and checking whether each is a mobile malicious application. However, the prior art generates a malicious behavior pattern by patterning the API call order and then judges whether code is malicious according to whether the API call order pattern used by the application under analysis matches the generated pattern. It also decides whether a malicious behavior pattern is detected based on how frequently specific APIs occur. As a result, it has a high false positive rate, flagging as malicious any benign code whose API call order resembles that of malicious code.

Therefore, a more efficient technique is needed that lowers the false positive rate and raises the true positive rate in malicious code detection.

Korean Patent No. 10-1324691 (registered October 28, 2013)

The present invention has been made to solve the above problems of the prior art, and an object of the present invention is to provide a system and method for detecting malicious code based on API call flow.

Another object of the present invention is to provide a malicious code detection technique that lowers the false positive rate and raises the true positive rate.

A further object of the present invention is to provide a malicious code detection technique capable of efficiently detecting new or variant malicious code.

To achieve these objects, a system for detecting malicious code based on API call flow according to an embodiment of the present invention includes: a behavior data collection unit that collects behavior data of code under analysis according to the sequence of calls to APIs (Application Programming Interfaces) made by the code while it runs in a protected area within a program; a matching unit that sequentially matches the collected behavior data against a tree-structured behavior pattern in which each node corresponds to a specific behavior; a weight summation unit that sums the weights assigned to the nodes of the behavior pattern that were sequentially matched with the collected behavior data; and a malicious code determination unit that determines the code under analysis to be malicious code when the summed weight is equal to or greater than a predetermined sum value.

Here, the matching unit may match the behavior data sequentially against the next nodes connected to the current node in the tree-structured behavior pattern, according to whether the behavior in the collected behavior data matches the current node. The weight summation unit may sum the weights in the behavior pattern matched with the behavior data, based on a tree-structured behavior pattern in which nodes corresponding to behaviors regarded as those of malicious code are assigned positive (+) weights and nodes corresponding to behaviors regarded as those of benign code are assigned negative (-) weights. The behavior data collection unit may collect, in chronological order, the behavior data generated or recorded for each process while the code under analysis runs.
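The matching, weight summation, and threshold decision described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the `PatternNode` class, the function names, the API names, the weight values, and the rule that a behavior with no matching child node is skipped are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class PatternNode:
    """One node of a tree-structured behavior pattern (hypothetical layout)."""
    behavior: str
    weight: int = 0
    children: Dict[str, "PatternNode"] = field(default_factory=dict)

    def add_child(self, behavior: str, weight: int) -> "PatternNode":
        child = PatternNode(behavior, weight)
        self.children[behavior] = child
        return child


def sum_matched_weights(root: PatternNode, behaviors: Iterable[str]) -> int:
    """Walk the tree in call order and sum the weights of the matched nodes."""
    total = 0
    current = root
    for behavior in behaviors:
        nxt = current.children.get(behavior)
        if nxt is None:
            continue  # assumption: a behavior with no matching child is skipped
        total += nxt.weight
        current = nxt
    return total


def is_malicious(root: PatternNode, behaviors: Iterable[str], threshold: int) -> bool:
    """Decide 'malicious' when the summed weight reaches the threshold."""
    return sum_matched_weights(root, behaviors) >= threshold


# Illustrative pattern: positive weights for behaviors treated as malicious,
# a negative weight for a behavior treated as benign (values are made up).
root = PatternNode("<start>")
open_service = root.add_child("OpenService", 2)
open_service.add_child("ChangeServiceConfig", 5)
open_service.add_child("QueryServiceStatus", -3)
```

With a threshold of 6, the flow `OpenService`, `ChangeServiceConfig` sums to 7 and is flagged, while `OpenService`, `QueryServiceStatus` sums to -1 and is not.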

A system for detecting malicious code based on API call flow according to another embodiment of the present invention includes: a behavior data collection unit that identifies the APIs (Application Programming Interfaces) called by code under analysis while it runs in a protected area within a program and collects behavior data of the code according to the identified APIs; a matching unit that identifies and matches, among the next nodes connected to the current node in a tree-structured behavior pattern in which each node corresponds to a specific behavior, the node corresponding to the behavior that matches the collected behavior data of the code under analysis; a weight summation unit that accumulates and sums the weights assigned to the matched nodes; and a malicious code determination unit that determines the code under analysis to be malicious code when the summed weight is equal to or greater than a predetermined sum value.

Here, the weight summation unit may accumulate and sum the weights of the matched nodes, where nodes corresponding to behaviors regarded as those of malicious code are assigned positive (+) weights and nodes corresponding to behaviors regarded as those of benign code are assigned negative (-) weights. The behavior data collection unit may collect, in chronological order, the behavior data generated or recorded for each process while the code under analysis runs.

A behavior pattern generation system for malicious code detection according to an embodiment of the present invention includes: a behavior data collection unit that collects behavior data according to the call flow of the APIs (Application Programming Interfaces) called by a plurality of codes under analysis when they run in a program; an extraction unit that extracts an API called in common from the collected behavior data of each of the plurality of codes under analysis; a first node generation unit that generates a first node corresponding to the extracted common API; a second node generation unit that generates at least one second node corresponding to the API called by each of the plurality of codes under analysis after the extracted common API; and a behavior pattern generation unit that connects the first node and the at least one second node in a tree structure to generate a tree-structured behavior pattern for malicious code detection. The system may further include a weighting unit that assigns positive (+) weights to nodes corresponding to behaviors regarded as those of malicious code and negative (-) weights to nodes corresponding to behaviors regarded as those of benign code.
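One possible reading of this pattern generation procedure is sketched below in Python. Several details are assumptions that the text does not specify: the common API is taken to be the earliest call of the first sample that appears in every sample, nodes are plain dictionaries, and each child's weight is accumulated as +1 per malicious sample and -1 per benign sample. The function names and sample API names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def first_common_api(sequences: List[List[str]]) -> Optional[str]:
    """Return the earliest API of the first sequence that occurs in all sequences."""
    if not sequences:
        return None
    for api in sequences[0]:
        if all(api in seq for seq in sequences[1:]):
            return api
    return None


def build_pattern(samples: List[Tuple[List[str], bool]]) -> Optional[Dict[str, Any]]:
    """Build a two-level tree: common API (first node) and follow-up APIs (second nodes).

    Each sample is (api_call_sequence, is_malicious).
    """
    common = first_common_api([seq for seq, _ in samples])
    if common is None:
        return None
    root: Dict[str, Any] = {"api": common, "weight": 0, "children": {}}
    for seq, malicious in samples:
        idx = seq.index(common)
        if idx + 1 >= len(seq):
            continue  # this sample calls nothing after the common API
        nxt = seq[idx + 1]
        child = root["children"].setdefault(
            nxt, {"api": nxt, "weight": 0, "children": {}}
        )
        # Assumed weighting rule: +1 for a malicious sample, -1 for a benign one.
        child["weight"] += 1 if malicious else -1
    return root
```

For example, two malicious samples that call `ChangeServiceConfig` right after `OpenService` and one benign sample that calls `QueryServiceStatus` would yield a root node for `OpenService` with children weighted +2 and -1 respectively.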

Meanwhile, a method for detecting malicious code based on API call flow according to an embodiment of the present invention includes: running code under analysis in a protected area within a program; collecting behavior data of the code under analysis according to the sequence of calls to APIs (Application Programming Interfaces) made by the code while it runs in the protected area; sequentially matching the collected behavior data against a tree-structured behavior pattern in which each node corresponds to a specific behavior; summing the weights assigned to the nodes of the behavior pattern that were sequentially matched with the collected behavior data; and determining the code under analysis to be malicious code when the summed weight is equal to or greater than a predetermined sum value.

Here, the matching step may match the behavior data sequentially against the next nodes connected to the current node in the tree-structured behavior pattern, according to whether the behavior in the collected behavior data matches the current node. The summing step may sum the weights in the behavior pattern matched with the behavior data, based on a tree-structured behavior pattern in which nodes corresponding to behaviors regarded as those of malicious code are assigned positive (+) weights and nodes corresponding to behaviors regarded as those of benign code are assigned negative (-) weights. The collecting step may collect, in chronological order, the behavior data generated or recorded for each process while the code under analysis runs.

A method for detecting malicious code based on API call flow according to another embodiment of the present invention includes: running code under analysis in a protected area within a program; identifying the APIs (Application Programming Interfaces) called by the code under analysis while it runs in the protected area, and collecting behavior data of the code according to the identified APIs; identifying and matching, among the next nodes connected to the current node in a tree-structured behavior pattern in which each node corresponds to a specific behavior, the node corresponding to the behavior that matches the collected behavior data of the code under analysis; accumulating and summing the weights assigned to the matched nodes; and determining the code under analysis to be malicious code when the summed weight is equal to or greater than a predetermined sum value.

Here, the summing step may accumulate and sum the weights of the matched nodes, where nodes corresponding to behaviors regarded as those of malicious code are assigned positive (+) weights and nodes corresponding to behaviors regarded as those of benign code are assigned negative (-) weights. The collecting step may collect, in chronological order, the behavior data generated or recorded for each process while the code under analysis runs.

A behavior pattern generation method for malicious code detection according to an embodiment of the present invention includes: collecting behavior data according to the call flow of the APIs (Application Programming Interfaces) called by a plurality of codes under analysis when they run in a program; extracting an API called in common from the collected behavior data of each of the plurality of codes under analysis; generating a first node corresponding to the extracted common API; generating at least one second node corresponding to the API called by each of the plurality of codes under analysis after the extracted common API; and connecting the first node and the at least one second node in a tree structure to generate a tree-structured behavior pattern for malicious code detection. The method may further include assigning positive (+) weights to nodes corresponding to behaviors regarded as those of malicious code and negative (-) weights to nodes corresponding to behaviors regarded as those of benign code.

The present invention has the effect of efficiently detecting new or variant malicious code.

The present invention observes the execution flow of the code under analysis through API hooking inside a protected area (sandbox) within a program, and can therefore provide a malicious code detection technique with improved security.

Because the pattern used for malicious code detection is a tree-structured behavior pattern, the present invention can improve memory efficiency and detection efficiency.

The present invention detects malicious code using a tree-structured behavior pattern in which nodes corresponding to behaviors regarded as those of malicious code are assigned positive (+) weights and nodes corresponding to behaviors regarded as those of benign code are assigned negative (-) weights, and can therefore lower the false positive rate and raise the true positive rate.

Moreover, when conventional techniques run the code under analysis in a sandbox or similar virtual protected area and then check the sequence of APIs called to decide whether the code is malicious, the API flow of the code under analysis must be compared one by one with the API flow of each individual malicious code. The memory requirement, CPU computation, and time needed for analysis are therefore used inefficiently. In addition, the comparison can be made only after the API flow called by the code under analysis has been stored through to the end, which makes it difficult to detect malicious code quickly in real time.

By contrast, the present invention stores the API call patterns (behavior patterns) of malicious code and benign code efficiently in a tree structure and detects malicious code by sequentially matching the API flow called by the code under analysis against the tree-structured behavior pattern. This greatly shortens analysis time, reduces the memory requirement and the amount of CPU computation, and makes it easy to detect whether code is malicious or benign without necessarily monitoring the API flow of the code under analysis through to the end.

Further, because conventional techniques compare the API flow of the code under analysis mainly against the API flows called by malicious code, even some benign code whose API flow resembles that of malicious code is judged malicious, which raises the false positive rate. The present invention instead analyzes the API call flow of the code under analysis using a tree-structured behavior pattern, comparing it against the behavior patterns of malicious code and of benign code together in a single pass. This greatly raises the rate at which benign code is correctly recognized as benign and greatly lowers the false positive rate.

FIG. 1 is a diagram showing a schematic configuration of a system for detecting malicious code based on API call flow according to an embodiment of the present invention.
FIG. 2 is a diagram showing a schematic configuration of a behavior pattern generation system for malicious code detection according to an embodiment of the present invention.
FIG. 3 is a diagram showing the concept of a conventional sandbox.
FIG. 4 is a diagram showing a pattern with a conventional list structure.
FIG. 5 is a diagram showing a tree-structured behavior pattern according to an embodiment of the present invention.
FIG. 6 is a diagram showing a behavior pattern generated according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of detecting malicious code that tampers with system services according to an embodiment of the present invention.
FIG. 8 is a diagram showing an example of matching the behavior pattern of malicious code that tampers with system services according to an embodiment of the present invention.
FIG. 9 is a flowchart of a method for detecting malicious code based on API call flow according to an embodiment of the present invention.
FIG. 10 is a flowchart of a behavior pattern generation method for malicious code detection according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted when they could obscure the gist of the invention. Specific numerical values given in describing the embodiments are merely examples.

FIG. 1 is a diagram showing a schematic configuration of a system for detecting malicious code based on API call flow according to an embodiment of the present invention.

First, the components of the malicious code detection system 100 of the present invention are briefly described with reference to FIG. 1.

Referring to FIG. 1, the API call flow-based malicious code detection system 100 of the present invention includes a behavior data collection unit 110, a matching unit 120, a weight summation unit 130, and a malicious code determination unit 140.

The behavior data collection unit 110 collects behavior data of the code under analysis according to the sequence of calls to APIs (Application Programming Interfaces) made by the code while it runs in a protected area within a program.

When collecting behavior data, the behavior data collection unit 110 may collect the API call flow (behavior) in sequential order. The matching unit 120 can then use the behavior data collected in that order to match sequentially against the tree-structured behavior pattern.

According to another embodiment of the present invention, instead of collecting the API call flow as a complete sequence as described above, the behavior data collection unit 110 may identify each API of the code under analysis one by one as it is called in chronological order and collect the corresponding behavior data for each identified API. The matching unit 120 may then match against the tree-structured behavior pattern while following each API called by the code under analysis. This is described in detail later.
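The per-call variant described here can be sketched as a small stateful matcher that is fed one API call at a time and reports as soon as the accumulated weight reaches the threshold, so the full call flow never needs to be stored. The class name, the dictionary-based node layout, the skip-on-mismatch rule, and the API names and weights in the usage note are assumptions made for this illustration.

```python
from typing import Any, Dict


class StreamingMatcher:
    """Follow a tree-structured pattern one API call at a time (sketch)."""

    def __init__(self, pattern_root: Dict[str, Any], threshold: int) -> None:
        self.current = pattern_root   # node reached so far
        self.threshold = threshold    # predetermined sum value
        self.total = 0                # accumulated weight

    def feed(self, api: str) -> bool:
        """Consume one API call; return True once the threshold is reached."""
        child = self.current.get("children", {}).get(api)
        if child is not None:
            self.total += child["weight"]
            self.current = child
        # Assumption: a call with no matching child leaves the state unchanged.
        return self.total >= self.threshold
```

With a pattern in which `OpenService` carries weight 2 and its child `ChangeServiceConfig` carries weight 5, a matcher with threshold 6 returns `False` after `OpenService`, stays `False` for an unrelated call, and returns `True` as soon as `ChangeServiceConfig` arrives.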

In the behavior data collection unit 110, the protected area within the program means a sandbox. A sandbox is security software that runs a program inside a protected area in order to block sources of harm arising from external factors. That is, a sandbox does not execute a file received from outside directly but executes it inside the protected area, which makes it possible to block harmful elements such as malicious code.

FIG. 3 is a diagram showing the concept of a conventional sandbox.

Referring to FIG. 3, for example, when a web page is accessed in Internet Explorer and the page is downloaded, resources are written to the hard disk. In general, the process of writing resources to the hard disk may be as shown in the "Hard disk (no sandbox)" illustration. If the newly downloaded content contains malicious code, it may maliciously damage part of the OS or the data of other applications.

The concept of the sandbox emerged as a solution to this problem and may be as shown in the "Hard disk (with sandbox)" illustration. A sandbox designates a specific area of the hard disk as the sandbox and allows resources to be used and accessed only within that area. That is, a sandbox confines resource use to the designated area so that important areas such as the OS are not affected. More specifically, a program received from outside is confined in a protected area called a JVM (Java Virtual Machine) and then run, which prevents the program from running out of control and blocks intrusion by malicious viruses.
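The idea of confining resource access to a designated area can be illustrated with a simple path check in Python. This is only a conceptual sketch: a real sandbox enforces isolation at the operating system or virtual machine level, and the function name and the example paths are hypothetical.

```python
import posixpath


def is_inside_sandbox(sandbox_root: str, requested: str) -> bool:
    """Return True if the requested path stays inside the sandbox area.

    sandbox_root is assumed to be an absolute POSIX-style path.
    """
    root = posixpath.normpath(sandbox_root)
    # Relative paths are resolved under the sandbox root; ".." is collapsed.
    target = posixpath.normpath(posixpath.join(root, requested))
    return posixpath.commonpath([root, target]) == root
```

Under this check, a request for `tmp/file.txt` relative to `/sandbox` is allowed, whereas `../etc/passwd` or an absolute path such as `/etc/passwd` falls outside the designated area and is refused.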

In addition, the behavior data collection unit 110 may use API hooking inside the protected area (sandbox) within the program to collect API call information of the code under analysis while that code runs.

Here, hooking refers to a command, method, technique, or act of altering or intercepting function calls, messages, events, and the like that occur between software components in computer programs such as operating systems and application software. The code that handles such intercepted function calls, events, or messages is called a hook.

Accordingly, with API hooking, the user's hook code can be executed before or after an API is called, and the parameters passed to the API or the return value of the API function can be inspected or manipulated. The API call itself can also be cancelled, or the execution flow can be redirected to user code. By using the API call flow collected through API hooking, the malicious code detection system 100 according to the present invention can therefore detect malicious code more efficiently.
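As a language-level analogy for the hooking described above, the following Python sketch wraps a function so that hook code runs around the call and records the parameters and return value. It does not show how OS-level API hooking is actually performed inside a sandbox; the `hook` decorator, the `call_log` list, and the stand-in `open_service` function are all hypothetical.

```python
import functools
from typing import Any, Callable, List, Tuple

# Recorded calls as (API name, positional arguments, return value).
call_log: List[Tuple[str, tuple, Any]] = []


def hook(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so that hook code runs before and after the original call."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Before the call: the parameters are visible here and could be
        # inspected, altered, or used to cancel the call.
        result = func(*args, **kwargs)
        # After the call: the return value is visible and is logged.
        call_log.append((func.__name__, args, result))
        return result
    return wrapper


def open_service(name: str) -> int:
    """Stand-in for a real API; returns a fake handle."""
    return len(name)


# Install the hook by replacing the original function with the wrapper.
open_service = hook(open_service)
```

After the hook is installed, every call to `open_service` still returns its normal result, but the call name, arguments, and return value are appended to `call_log` in call order, which is the kind of record the behavior data collection unit relies on.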

In addition, the behavior data collection unit 110 may collect, in chronological order, the behavior data (that is, information on API call logging) generated or recorded for each process while the code under analysis runs. Here, logging means creating a record of various information during operation in order to record and preserve the operating state of a system, investigate user habits, analyze system operation, and so on. The behavior data collection unit 110 may collect behavior data in chronological order from the log in which the API call flow information is recorded.
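Collecting logged API calls per process in chronological order might look like the following Python sketch. The `ApiCall` record with its `timestamp`, `pid`, and `api` fields is an assumed log format chosen for illustration, not one given in the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class ApiCall(NamedTuple):
    """One logged API call (assumed record layout)."""
    timestamp: float
    pid: int
    api: str


def collect_by_process(records: Iterable[ApiCall]) -> Dict[int, List[str]]:
    """Group logged calls by process ID, each group in chronological order."""
    per_process: Dict[int, List[str]] = defaultdict(list)
    for record in sorted(records, key=lambda r: r.timestamp):
        per_process[record.pid].append(record.api)
    return dict(per_process)
```

Each per-process list can then be passed, in order, to whatever component matches behavior data against the tree-structured pattern.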

The matching unit 120 sequentially matches the behavior data collected by the behavior data collection unit 110 against a tree-structured behavior pattern in which each node corresponds to a specific behavior.

In addition, depending on whether the behavior in the behavior data collected by the behavior data collection unit 110 matches the current node of the tree-structured behavior pattern, the matching unit 120 may go on to match sequentially against the next nodes connected to the current node.
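A minimal sketch of such a tree and of a single matching step follows. The class `PatternNode`, the function `match_next`, and the behavior labels are assumptions made for illustration; the source does not specify a concrete data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PatternNode:
    """One node of the behavior pattern tree (hypothetical layout)."""
    behavior: str          # label of the behavior (API call) this node stands for
    weight: int = 0        # malicious index: plus for malicious, minus for benign
    children: Dict[str, "PatternNode"] = field(default_factory=dict)

    def add_child(self, child: "PatternNode") -> "PatternNode":
        """Attach `child` as a next node and return it."""
        self.children[child.behavior] = child
        return child


def match_next(current: PatternNode, behavior: str) -> Optional[PatternNode]:
    """Return the next node connected to `current` that matches `behavior`, if any."""
    return current.children.get(behavior)


root = PatternNode("create_file", 20)
svc = root.add_child(PatternNode("open_service_manager", 10))
root.add_child(PatternNode("verify_signature", -20))

hit = match_next(root, "open_service_manager")   # matches the child `svc`
miss = match_next(root, "delete_file")           # no such child, so None
```

Keying the children by behavior label means that each collected behavior is compared only against the nodes reachable from the current node, not against every stored pattern.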

The tree-structured behavior pattern used by the malicious code detection system 100 according to the present invention is now described in more detail. To detect malicious code, the present invention uses behavior-based (API call flow-based) patterns organized in a tree structure.

That is, whereas patterns with a list structure have conventionally been used to detect malicious code, the pattern structure according to the present invention is a tree structure. This is described in more detail with reference to FIGS. 4 and 5.

FIG. 4 is a diagram showing a pattern with a conventional list structure.

Referring to FIG. 4, in the prior art the patterns used to detect malicious code are organized as a list. When the analysis target code is executed in a sandbox or similar virtual protected area and the flow of the series of API calls is checked to decide whether the code is malicious, the API flow of the analysis target code must be compared one by one with the API flow of each individual malicious code. The memory required for the analysis, the CPU computation, and the analysis time are therefore used inefficiently. Moreover, the API flow of the analysis target code must be stored to the end before it can be compared with the API flow of each malicious code, which makes it difficult to detect malicious code quickly in real time.

In addition, a very large number of different behaviors can follow a given API call, and with the conventional list structure the pattern package must be searched many times to detect them. Combined with the resulting system performance problems, this lowers detection efficiency. Furthermore, most conventional detection techniques are structured so that a match with a stored pattern can only raise the likelihood that the code is malicious, which makes false positives hard to handle. As a result, benign code whose API call order resembles that of malicious code is often detected as malicious, giving a high false-positive rate. The malicious code detection technique based on tree-structured behavior patterns according to the present invention can solve these problems.

FIG. 5 is a diagram showing a tree-structured behavior pattern according to an embodiment of the present invention.

Referring to FIG. 5, the pattern structure for malicious code detection according to the present invention takes the form of a tree-structured behavior pattern. The present invention stores the API call patterns (behavior patterns) of both malicious code and benign code efficiently in a tree structure and detects malicious code by sequentially matching the flow of APIs called by the analysis target code against that tree. This greatly shortens the analysis time, reduces the memory requirement and the amount of CPU computation, and makes it possible to tell whether the code is malicious or benign without necessarily monitoring its API flow to the end.

Furthermore, because the prior art compares the API flow of the analysis target code mainly against the API flows of malicious code, even benign code whose API flow resembles that of malicious code tends to be judged malicious, which raises the false-positive rate. In contrast, the present invention analyzes the API call flow of the analysis target code using the tree-structured behavior pattern, so that malicious and benign behavior patterns are compared together in a single sequence. This greatly increases the rate at which benign code is correctly recognized as benign and greatly lowers the false-positive rate. The characteristics of the tree-structured behavior pattern according to an embodiment of the present invention are described in more detail later.

The weight summation unit 130 sums the weights (malicious indices) assigned to the nodes of the behavior pattern that were sequentially matched with the behavior data collected by the behavior data collection unit 110.

In addition, based on the tree-structured behavior pattern in which a plus (+) weight is assigned to nodes corresponding to behaviors regarded as malicious code behavior and a minus (-) weight is assigned to nodes corresponding to behaviors regarded as benign code behavior, the weight summation unit 130 may sum the weights of the nodes in the behavior pattern that are matched with the behavior data collected by the behavior data collection unit 110. This is described in more detail later.

The malicious code determination unit 140 determines that the analysis target code is malicious code when the weight summed by the weight summation unit 130 is equal to or greater than a predetermined value.

Hereinafter, the process by which the tree-structured behavior pattern according to an embodiment of the present invention is generated, and the characteristics of that behavior pattern, are described in detail.

First, the components for generating the tree-structured behavior pattern according to the present invention are described with reference to FIG. 2.

FIG. 2 is a diagram showing the schematic configuration of a behavior pattern generation system for malicious code detection according to an embodiment of the present invention.

Referring to FIG. 2, the API call flow-based malicious code detection system 100 of the present invention may include a behavior pattern generation system 200 for malicious code detection. The behavior pattern generation system 200 includes a behavior data collection unit 210, an extraction unit 220, a first node generation unit 230, a second node generation unit 240, a behavior pattern generation unit 250, and a weight assignment unit 260.

The behavior data collection unit 210 collects behavior data according to the call flow of the APIs (Application Programming Interfaces) that a plurality of analysis target codes call when they run.

That is, when a plurality of analysis target codes, comprising a plurality of malicious codes and a plurality of benign codes, are run, the behavior data collection unit 210 collects all of the behavior data according to the call flow of the APIs called by each of the analysis target codes.

The extraction unit 220 extracts the API that is called in common across the behavior data of the plurality of analysis target codes collected by the behavior data collection unit 210.

The first node generation unit 230 generates a first node corresponding to the common API extracted by the extraction unit 220.

The second node generation unit 240 generates at least one second node corresponding to the API that each of the plurality of analysis target codes calls next after the common API extracted by the extraction unit 220.

The behavior pattern generation unit 250 connects the first node generated by the first node generation unit 230 and the at least one second node generated by the second node generation unit 240 in a tree structure, thereby generating a tree-structured behavior pattern for malicious code detection. The behavior pattern generated by the behavior pattern generation unit 250 can be represented as a tree structure such as that shown in FIG. 5.
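The generation steps above (extract the common API, create a first node for it, create second nodes for the APIs that follow it) can be roughly sketched as below. The sketch assumes that "common" means present in every collected sequence and that only the first occurrence of the common API in each sequence is considered; the source does not state either point. The function name `build_pattern` and the labels A to D are hypothetical.

```python
from typing import Dict, List


def build_pattern(sequences: List[List[str]]) -> Dict[str, List[str]]:
    """Map each common API (first node) to the APIs that follow it (second nodes)."""
    # Assumption: an API is "common" if it appears in every collected sequence.
    common = set(sequences[0]).intersection(*sequences[1:])
    tree: Dict[str, List[str]] = {}
    for api in sorted(common):
        followers: List[str] = []
        for seq in sequences:
            i = seq.index(api)  # first occurrence only (assumption)
            if i + 1 < len(seq) and seq[i + 1] not in followers:
                followers.append(seq[i + 1])
        tree[api] = followers
    return tree


# Three hypothetical call sequences collected from different samples.
samples = [
    ["A", "B", "C"],
    ["A", "D"],
    ["A", "B", "D"],
]
pattern = build_pattern(samples)
```

Here `pattern` is `{"A": ["B", "D"]}`: one first node for the shared call A and two second nodes for the calls that follow it. Weights would be attached afterwards or at node creation, as the text describes for the weight assignment unit.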

The weight assignment unit 260 assigns a plus (+) weight to nodes corresponding to behaviors regarded as malicious code behavior and a minus (-) weight to nodes corresponding to behaviors regarded as benign code behavior. Here, the weight represents a malicious index.

The weight assignment unit 260 may assign the weights after the behavior pattern generation unit 250 has generated the tree-structured behavior pattern, or may assign them when the first node and the second node are generated.

More specifically, each node in the tree-structured behavior pattern generated according to an embodiment of the present invention can be classified into one of four types: a plus pattern node, a minus pattern node, an independent pattern node, or an analysis pattern node.

The plus pattern node has upper and lower patterns and, when matched, is regarded as a malicious behavior or as a preparatory step for a malicious behavior. A plus pattern node corresponds to a behavior regarded as malicious code behavior and is assigned a plus (+) weight.

For example, if existing malicious code analysis data shows that a specific behavior (API call) is found in common in 7 out of 10 malicious codes, that behavior is highly likely to indicate malicious code, and the node corresponding to it may be assigned a weight of +80 on a scale where 100 points means certainly malicious. When the behavior data of the analysis target code then matches a plus pattern node with a weight of 80 during detection, the malicious code detection system 100 may report to the user that the likelihood of the analysis target code being malicious is 80%. Alternatively, if the malicious code determination unit 140 is configured to judge code as malicious when the summed weight (malicious index) is 78 or more, the analysis target code, having a malicious index of 80, meets the criterion and may be judged to be malicious code and reported to the user as such.

The minus pattern node has upper and lower patterns and, when matched, makes it more likely that the code is a normal program or marks it as definitely a normal file. That is, if the behavior data of the analysis target code matches a minus pattern node during detection, the likelihood that the analysis target code is malicious is lowered by the value of the weight assigned to that minus pattern node.

A minus pattern node corresponds to a behavior regarded as benign code behavior and is assigned a minus (-) weight.

By detecting malicious code with a tree-structured behavior pattern, the present invention can compare malicious and benign behavior patterns together in a single sequence. Conventionally, malicious code was judged by list-structured pattern matching, so even benign code whose API flow resembled that of malicious code was often judged malicious, giving a high false-positive rate. By using minus pattern nodes in the tree-structured behavior pattern, the present invention can greatly lower the false-positive rate.

The cases in which a minus pattern node according to the present invention nullifies or reduces the sum of the malicious indices assigned to its upper nodes are as follows. First, the sum of the malicious indices of the upper nodes may be nullified when a definite signature shows that the earlier suspicious behavior was in fact normal operation. For example, suppose the analysis target code sequentially accesses the V3-related registry (malicious index +10), checks whether it is being debugged (malicious index +20), checks whether it is running in a virtual environment (malicious index +5), and downloads an executable file (malicious index +20). The summed malicious index for these behaviors is 55, so the analysis target code is likely to be malicious. However, if after the download it is detected that the downloaded executable file contains an Ahnlab certificate, the analysis target code is definitely benign rather than malicious, so a minus pattern node that nullifies the sum of the malicious indices of the upper nodes may be formed for the certificate-detection behavior.

Second, the sum of the malicious indices of the upper nodes may be reduced when the earlier behavior makes the code suspicious but no clearly malicious behavior follows, so that the code should be analyzed manually. For example, suppose the analysis target code sequentially acquires a service manager handle (malicious index +10), acquires the handle of a specific system service (malicious index +10), and stops the service (malicious index +50). The summed malicious index is 70, so the analysis target code is quite likely to be malicious. However, if after stopping the service the code restarts it without tampering with any registry entry or file related to service execution (malicious index -20), no clearly malicious behavior follows the restart, so a minus pattern node may be formed for the restart behavior.
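The two cases above, nullification and subtraction, can be traced with a small running total. The weights are the ones given in the text; the representation of a nullifying node as a weight of `None`, and the names `Step` and `running_index`, are assumptions made only for this sketch.

```python
from typing import List, Optional, Tuple

# (behavior label, weight); a weight of None marks a nullifying minus pattern node.
Step = Tuple[str, Optional[int]]


def running_index(steps: List[Step]) -> int:
    """Accumulate malicious indices, resetting to 0 at a nullifying node."""
    total = 0
    for _label, weight in steps:
        if weight is None:
            total = 0          # a definite benign signature clears earlier suspicion
        else:
            total += weight    # plus weights add, minus weights subtract
    return total


downloader: List[Step] = [
    ("access V3-related registry", 10),
    ("check for debugging", 20),
    ("check for virtual environment", 5),
    ("download executable file", 20),
]
signed = downloader + [("Ahnlab certificate detected", None)]

service: List[Step] = [
    ("acquire service manager handle", 10),
    ("acquire system service handle", 10),
    ("stop service", 50),
    ("restart service without tampering", -20),
]

before = running_index(downloader)       # 55
after_nullify = running_index(signed)    # 0
after_subtract = running_index(service)  # 50
```

The first sequence reaches 55 and drops to 0 once the certificate is detected, while the second reaches 70 and is reduced to 50 by the restart node.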

The independent pattern node is the topmost node of the tree, which has no parent node. That is, the independent pattern node is the root node, and it may have no leaf nodes.

The analysis pattern node is similar to the independent pattern node but has no malicious index to add or subtract. It is a pattern node used to present information during manual analysis or report generation.

FIG. 6 is a diagram showing a behavior pattern generated according to an embodiment of the present invention.

Referring to FIG. 6, each node in the tree-structured behavior pattern generated according to an embodiment of the present invention is formed to correspond to a specific behavior. Node P0 is an independent pattern node with no parent node, and nodes P1, P3, and P7 are minus pattern nodes to which minus weights are assigned. The remaining nodes P2, P4 to P6, and P8 to P12 are plus pattern nodes to which plus weights are assigned.

Here, the topmost pattern, which has no upper pattern, is denoted as the Root node (node P0); the lowest patterns, which have no lower pattern, are denoted as Leaf nodes (nodes P1, P5, P9, P10, and P12); and patterns that have both upper and lower patterns are denoted as Node patterns (nodes P2, P3, P4, P6, P7, P8, and P11).

For example, suppose the malicious code criterion stored in the malicious code determination unit 140 is a malicious index sum of 80 or more, and the behavior data collection unit 110 has collected behavior data A, B, and C according to the series of API calls made by the analysis target code. The matching unit 120 sequentially matches the collected behavior data A, B, and C against the tree-structured behavior pattern. If behavior data A is matched with node P0, behavior data B with node P3, and behavior data C with node P6, the weight summation unit 130 sums the weights (malicious indices) assigned to those nodes. Node P0 is +20, node P3 is -20, and node P6 is +45, so the summed weight (malicious index) of the analysis target code is 45. Since this does not meet the criterion of a malicious index of 80 or more, the malicious code determination unit 140 determines that the analysis target code is benign code.
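The worked example above can be reproduced with a small transition table. Only the three nodes and weights stated in the text are included; the table layout `(parent, behavior) -> (node, weight)`, the function `score`, and the decision to stop at the first unmatched behavior are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (parent node name or None for the root level, behavior) -> (node name, weight)
Edges = Dict[Tuple[Optional[str], str], Tuple[str, int]]

EDGES: Edges = {
    (None, "A"): ("P0", 20),
    ("P0", "B"): ("P3", -20),
    ("P3", "C"): ("P6", 45),
}

THRESHOLD = 80  # criterion stated in the example


def score(behaviors: List[str], edges: Edges) -> Tuple[int, List[str]]:
    """Match behaviors in order against the tree and sum the matched weights."""
    current: Optional[str] = None
    total = 0
    path: List[str] = []
    for behavior in behaviors:
        nxt = edges.get((current, behavior))
        if nxt is None:
            break  # assumption: stop at the first behavior with no matching node
        current, weight = nxt
        total += weight
        path.append(current)
    return total, path


total, path = score(["A", "B", "C"], EDGES)
is_malicious = total >= THRESHOLD
```

The matched path is P0, P3, P6 with a total of 45, which is below 80, so `is_malicious` is `False`, in line with the benign verdict in the text.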

The above example describes the case in which the behavior data collection unit 110 collects the API call flow (behavior) as a complete sequence, the matching unit 120 sequentially matches all of the collected behavior data against the tree-structured behavior pattern, and the weights of all matched nodes are then summed to judge whether the code is malicious.

According to another embodiment of the present invention, the behavior data collection unit 110 does not collect the API call flow as a complete sequence as above. Instead, it identifies, one by one, each API called by the analysis target code in chronological order and collects the behavior data for each identified API, and the matching unit 120 matches against the tree-structured behavior pattern while following each API call of the analysis target code as it occurs.

For example, suppose the malicious code criterion stored in the malicious code determination unit 140 is a malicious index sum of 80 or more, and the analysis target code calls APIs in the order A, B, C, D, E, F. The behavior data collection unit 110 first collects behavior data A for the first API call, and the matching unit 120 identifies and matches the node corresponding to the behavior that matches behavior data A. Assume that the node matched with behavior data A is node P1 (+20).

Next, the behavior data collection unit 110 collects behavior data B for the API called after A, and the matching unit 120 identifies and matches the node corresponding to the behavior that matches behavior data B. Assume that the node matched with behavior data B is node P2 (+30). The weight summation unit 130 cumulatively sums the weights of the nodes identified and matched by the matching unit 120, so the summed weight at this point is P1 (+20) + P2 (+30) = 50.

Next, the behavior data collection unit 110 collects behavior data C for the API called after B, and the matching unit 120 identifies and matches the node corresponding to the behavior that matches behavior data C. Assume that the node matched with behavior data C is node P3 (+40). The weight summation unit 130 adds the weight of P3 (+40) to the previously accumulated 50, so the sum of the weights of P1 to P3 becomes 90. Since the summed weight of 90 is equal to or greater than the predetermined malicious code criterion (80), the malicious code determination unit 140 determines that the analysis target code is malicious code.

That is, the behavior data collection unit 110 may keep collecting behavior data until all of the APIs called by the analysis target code have been matched against the tree-structured behavior pattern, and the matching unit 120 may likewise keep matching the behavior data of the analysis target code against the tree-structured behavior pattern until then. However, once the weight sum computed by the weight summation unit 130 reaches or exceeds the predetermined malicious code criterion, the collection of behavior data and the matching may be stopped.
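The call-by-call variant with early stopping can be sketched as follows, using the weights assumed in the example (P1 +20, P2 +30, P3 +40) and a threshold of 80. The nested-dictionary tree, the function `detect_streaming`, the weight of 10 on the D node, and the choice to stop when a call has no matching node are all assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical nested tree: behavior -> (weight, subtree of next behaviors).
Tree = Dict[str, Tuple[int, "Tree"]]

TREE: Tree = {"A": (20, {"B": (30, {"C": (40, {"D": (10, {})})})})}


def detect_streaming(calls: Iterable[str], tree: Tree,
                     threshold: int) -> Tuple[bool, int, int]:
    """Consume calls one at a time; return (is_malicious, total, calls_consumed)."""
    total = 0
    consumed = 0
    level = tree
    for behavior in calls:
        consumed += 1
        if behavior not in level:
            break  # assumption: stop when a call has no matching next node
        weight, level = level[behavior]
        total += weight
        if total >= threshold:
            return True, total, consumed  # early stop: criterion already met
    return False, total, consumed


calls = iter("ABCDEF")
verdict, total, consumed = detect_streaming(calls, TREE, threshold=80)
remaining = list(calls)  # calls that never had to be collected or matched
```

With these values the verdict is reached after three calls with a total of 90, and `remaining` still holds D, E, and F, which illustrates why the call flow does not have to be monitored to the end.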

Accordingly, the present invention can compare malicious and benign behavior patterns together in a single sequence, which greatly increases the rate at which benign code is correctly recognized as benign and greatly lowers the false-positive rate.

FIG. 7 is a diagram showing an example of detecting malicious code that tampers with a system service, according to an embodiment of the present invention.

Referring to FIG. 7, when malicious code that tampers with a system service runs in the sandbox, it may exhibit the following five-stage pattern.

First, the first API called by the malicious code that tampers with a system service may create an executable file under the Windows system directory. The matching unit 120 may match this behavior with the Root P0 node (malicious index +20) in the behavior pattern for malicious code detection. The second API called may acquire the request handle of the Windows service manager, and the matching unit 120 may match this behavior with the Node P2 node (malicious index +10) in the behavior pattern. The third API called may acquire the handle of an existing system service and stop that service, and the matching unit 120 may match this behavior with the Node P4 node (malicious index +20). The fourth API called may tamper with the registry data related to the system service so as to disguise the previously created executable file as a system service, and the matching unit 120 may match this behavior with the Node P8 node (malicious index +40). The fifth API called may restart the tampered service, and this behavior may be matched with the Leaf P12 node (malicious index +10). An example of matching each of these behaviors against the tree-structured behavior pattern may be as shown in FIG. 8.

FIG. 8 is a diagram showing an example of matching the behavior pattern of malicious code that tampers with a system service, according to an embodiment of the present invention.

Thus, referring to FIG. 7, the malicious code that tampers with a system service is matched with nodes P0 (+20), P2 (+10), P4 (+20), P8 (+40), and P12 (+10) in the tree-structured behavior pattern, and its malicious index (weight) is 100. Accordingly, if the criterion preset in the malicious code determination unit 140 is a summed weight of 100 or more, the malicious code determination unit 140 determines that the code tampering with the system service is malicious code.

The following briefly describes the role of each component in the case where, when the malicious code detection system 100 detects malicious code, the behavior data collection unit 110 does not collect the API call flow as a complete sequence but instead identifies, one by one, each API called by the analysis target code in chronological order and collects the behavior data for each identified API, and the matching unit 120 matches against the tree-structured behavior pattern while following each API call of the analysis target code. An example of this case has been described above and may be referred to.

That is, the API call flow-based malicious code detection system 100 according to another embodiment of the present invention includes a behavior data collection unit 110, a matching unit 120, a weight summation unit 130, and a malicious code determination unit 140.

The behavior data collection unit 110 identifies the API (Application Programming Interface) called by the analysis target code while the analysis target code runs within a protected area on the program, and collects behavior data of the analysis target code according to the identified API.

In addition, the behavior data collection unit 110 may collect, in chronological order, the behavior data generated or recorded for each process while the analysis target code runs.

The matching unit 120 identifies and matches, among the next nodes connected to the current node in the tree-structured behavior pattern in which each node corresponds to a specific behavior, the node corresponding to the behavior that matches the collected behavior data of the analysis target code.

The weight summation unit 130 accumulates and sums the weights (malicious indices) assigned to the nodes matched by the matching unit 120.

In addition, the weight summation unit 130 may accumulate and sum the weights of the matched nodes, where a plus (+) weight is assigned to a node corresponding to a behavior regarded as that of malicious code and a minus (-) weight is assigned to a node corresponding to a behavior regarded as that of benign code.
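
The per-call matching and accumulation described for this embodiment might be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `Node` and `StepMatcher`, the placeholder root node, the behavior labels, and the weights are all assumptions, as is the rule that a call matching no next node is simply ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the tree-structured behavior pattern."""
    behavior: str   # behavior (API) this node corresponds to
    weight: int     # plus = malicious-looking, minus = benign-looking
    children: list["Node"] = field(default_factory=list)


class StepMatcher:
    """Follows the analysis target code one identified API at a time."""

    def __init__(self, root: Node) -> None:
        self.current = root   # assumption: a weightless placeholder root
        self.total = 0        # accumulated weight (malicious index)

    def feed(self, behavior: str) -> int:
        # Only the next nodes connected to the current node are candidates.
        for child in self.current.children:
            if child.behavior == behavior:
                self.current = child
                self.total += child.weight
                break
        # Assumption: a call that matches no next node leaves the state unchanged.
        return self.total


# Illustrative pattern; the labels and weights are made up for this sketch.
root = Node("<root>", 0, [
    Node("open_service_manager", 20, [
        Node("change_service_config", 40),
        Node("query_service_status", -10),
    ]),
])

matcher = StepMatcher(root)
for api in ["open_service_manager", "read_file", "change_service_config"]:
    score = matcher.feed(api)

print(score)  # 60
```

Keeping the current node and the running total inside the matcher means the collection side can hand over one identified API at a time instead of a finished sequence.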

Hereinafter, an API call flow-based malicious code detection method according to an embodiment of the present invention is described briefly, relying on the details given above.

FIG. 9 is a flowchart of an API call flow-based malicious code detection method according to an embodiment of the present invention.

Referring to FIG. 9, in the API call flow-based malicious code detection method according to an embodiment of the present invention, the behavior data collection unit 110 first runs the analysis target code within a protected area on the program (S910).

Here, the protected area on the program means a sandbox. A sandbox is security software that runs a program inside a protected area and blocks sources that could have an adverse effect from external factors. In other words, a sandbox does not execute a file received from outside directly but runs it inside the protected area, which makes it possible to block harmful elements such as malicious code. This has been described in detail above with reference to FIG. 3, and reference is made to that description.

Next, the behavior data collection unit 110 collects behavior data of the analysis target code according to the sequential call flow of the API (Application Programming Interface) called by the analysis target code while the analysis target code runs within the protected area (S920).

At this time, when collecting behavior data, the behavior data collection unit 110 may collect the API call flow (behavior) as an ordered sequence, and the matching unit 120 can use the behavior data collected in that order to match sequentially against the tree-structured behavior pattern.

According to another embodiment of the present invention, the behavior data collection unit 110 does not collect the API call flow as an ordered sequence as above, but may instead identify, one by one, each API that the analysis target code calls in chronological order and collect behavior data according to each identified API, and the matching unit 120 may follow each API called by the analysis target code and match it against the tree-structured behavior pattern.

That is, the behavior data collection unit 110 may identify the API (Application Programming Interface) called by the analysis target code while the analysis target code runs within the protected area on the program, and may collect behavior data of the analysis target code according to the identified API. This has been described in detail above and is omitted below.

In addition, the behavior data collection unit 110 may collect API call information of the analysis target code through API hooking while the analysis target code runs within the protected area (sandbox) on the program, and may collect, in chronological order, the behavior data generated or recorded for each process while the analysis target code runs.
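
The idea of recording each call through a hook and then ordering the records per process can be illustrated with a small in-process sketch. This is only an analogy: it wraps ordinary Python functions rather than hooking operating-system APIs inside a sandbox, and the `hook` decorator, the `records` list, and the stand-in functions `create_file` and `write_file` are all hypothetical.

```python
import functools
import os
import time

# One (process id, timestamp, API name) record per intercepted call.
records: list[tuple[int, float, str]] = []


def hook(func):
    """Wrap `func` so each call is recorded before the original runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        records.append((os.getpid(), time.monotonic(), func.__name__))
        return func(*args, **kwargs)
    return wrapper


# Hypothetical stand-ins for APIs the analysis target code might call.
@hook
def create_file(path):
    return f"handle:{path}"


@hook
def write_file(handle, data):
    return len(data)


handle = create_file("example.txt")
write_file(handle, b"abc")

# Group the records by process and keep them in chronological order.
# sorted() is stable, so calls with equal timestamps keep their call order.
by_process: dict[int, list[str]] = {}
for pid, _, name in sorted(records, key=lambda r: (r[0], r[1])):
    by_process.setdefault(pid, []).append(name)

print(by_process)
```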

Next, the matching unit 120 sequentially matches the collected behavior data against the tree-structured behavior pattern in which each node corresponds to a specific behavior (S930).

At this time, the matching unit 120 may match sequentially against the next nodes connected to the current node in the tree-structured behavior pattern, according to whether the behavior in the behavior data collected by the behavior data collection unit 110 matches the current node.

According to another embodiment of the present invention, the matching unit 120 may identify and match, among the next nodes connected to the current node in the tree-structured behavior pattern in which each node corresponds to a specific behavior, the node corresponding to the behavior that matches the collected behavior data of the analysis target code.

Next, the weight summation unit 130 sums the weights assigned to each node in the behavior pattern that was sequentially matched with the behavior data collected by the behavior data collection unit 110 (S940).

In addition, the weight summation unit 130 may sum the weights in the behavior pattern matched with the behavior data collected by the behavior data collection unit 110, based on the tree-structured behavior pattern in which a plus (+) weight is assigned to a node corresponding to a behavior regarded as that of malicious code and a minus (-) weight is assigned to a node corresponding to a behavior regarded as that of benign code.

In addition, when the behavior data collection unit 110 identifies the API (Application Programming Interface) called by the analysis target code while the analysis target code runs within the protected area on the program and collects behavior data of the analysis target code according to the identified API, the weight summation unit 130 may accumulate and sum the weights assigned to the matched nodes.

Next, the malicious code determination unit 140 determines that the analysis target code is malicious code when the summed weight is greater than or equal to a predetermined sum value (S950).
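
Steps S930 to S950 can be pictured as one pass over an already collected, ordered trace. The sketch below assumes a nested-dictionary representation of the behavior pattern; the behavior names, the weights, the threshold, and the rule that unmatched behaviors are skipped are all invented for illustration and do not come from the specification.

```python
# Tree-structured behavior pattern: behavior -> (weight, next nodes).
# Every name and number here is an assumption made for this sketch.
PATTERN = {
    "open_service_manager": (20, {
        "change_service_config": (40, {
            "start_service": (40, {}),
        }),
        "query_service_status": (-30, {}),
    }),
}
THRESHOLD = 100


def detect(trace, pattern=PATTERN, threshold=THRESHOLD):
    """Return (summed_weight, is_malicious) for an ordered list of behaviors."""
    next_nodes = pattern  # candidates connected to the current node
    total = 0
    for behavior in trace:                    # S930: match in call order
        if behavior in next_nodes:
            weight, children = next_nodes[behavior]
            total += weight                   # S940: sum matched node weights
            next_nodes = children
        # Assumption: a behavior with no matching next node is skipped.
    return total, total >= threshold          # S950: compare with preset value


print(detect(["open_service_manager", "change_service_config", "start_service"]))
# (100, True)
print(detect(["open_service_manager", "query_service_status"]))
# (-10, False)
```

Because plus and minus weights sit in the same tree, a trace that drifts into a benign branch lowers its own score during the same pass that would raise it for a malicious branch.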

The present invention makes it possible to compare and analyze the behavior patterns of malicious code and of benign code together in a single sequence, and has the effect of greatly raising the rate at which benign code is correctly recognized as benign and greatly lowering the false-positive rate.

FIG. 10 is a flowchart of a method of generating a behavior pattern for malicious code detection according to an embodiment of the present invention.

Referring to FIG. 10, the method of generating a behavior pattern for malicious code detection according to an embodiment of the present invention may be included in the API call flow-based malicious code detection method described above. First, behavior data is collected according to the call flow of the APIs (Application Programming Interfaces) that a plurality of analysis target codes call when running on a program (S1010).

At this time, when a plurality of analysis target codes, including a plurality of malicious codes and a plurality of benign codes, run on the program, the behavior data collection unit 210 collects all of the behavior data according to the call flow of the APIs called by each of the analysis target codes.

Next, the extraction unit 220 extracts an API that is called in common from the collected behavior data of each of the plurality of analysis target codes (S1020). The first node generation unit 230 then generates a first node corresponding to the extracted common API (S1030), and the second node generation unit 240 generates at least one second node corresponding to the API called by each of the plurality of analysis target codes after the extracted common API (S1040).

Next, the behavior pattern generation unit 250 connects the first node generated by the first node generation unit 230 and the at least one second node generated by the second node generation unit 240 in a tree structure, thereby generating a tree-structured behavior pattern for malicious code detection (S1050).

At this time, the behavior pattern generated by the behavior pattern generation unit 250 can be represented as a tree structure as in FIG. 5, and the nodes of the tree-structured behavior pattern generated according to the present invention can be divided into four types: plus pattern nodes, minus pattern nodes, independent pattern nodes, and analysis pattern nodes. This has been described in detail above and is omitted below.

Next, the weight assignment unit 260 assigns a plus (+) weight to a node corresponding to a behavior regarded as that of malicious code, and assigns a minus (-) weight to a node corresponding to a behavior regarded as that of benign code (S1060).

Here, the weight represents a malicious index. The weight assignment unit 260 may assign the weights after the behavior pattern generation unit 250 has generated the tree-structured behavior pattern, or may assign them when the first node and the second nodes are generated.
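
Steps S1020 to S1060 might be sketched as follows for a single level of the tree. The specification does not state, in this passage, how a common API is chosen when several exist or how weight values are computed, so both rules below (take the common API that appears earliest in the first trace; add 10 per malicious sample and subtract 10 per benign sample) are assumptions, as are the function name `build_pattern` and the trace contents.

```python
def build_pattern(traces):
    """traces: list of (label, [api, ...]); label is "malicious" or "benign"."""
    # S1020: APIs that appear in every collected trace.
    common = set(traces[0][1])
    for _, calls in traces[1:]:
        common &= set(calls)
    # Assumption: use the common API that occurs earliest in the first trace.
    first_api = next((api for api in traces[0][1] if api in common), None)
    if first_api is None:
        raise ValueError("no API is common to all traces")

    # S1030: first node for the common API.
    root = {"api": first_api, "weight": 0, "children": []}

    # S1040: second nodes for the API each code calls right after the common API.
    second_nodes = {}
    for label, calls in traces:
        i = calls.index(first_api)
        if i + 1 >= len(calls):
            continue  # this code called nothing after the common API
        nxt = calls[i + 1]
        node = second_nodes.setdefault(nxt, {"api": nxt, "weight": 0, "children": []})
        # S1060 (assumed values): plus for malicious samples, minus for benign ones.
        node["weight"] += 10 if label == "malicious" else -10

    # S1050: connect the first node and the second nodes in a tree structure.
    root["children"] = list(second_nodes.values())
    return root


traces = [
    ("malicious", ["load_library", "open_service_manager", "change_service_config"]),
    ("malicious", ["open_service_manager", "change_service_config"]),
    ("benign", ["open_service_manager", "query_service_status"]),
]
tree = build_pattern(traces)
print(tree["api"], [(c["api"], c["weight"]) for c in tree["children"]])
```

In this sketch the weights are assigned while the second nodes are created, which corresponds to the second of the two timings mentioned above; assigning them in a separate pass over the finished tree would be equally consistent with the text.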

The API call flow-based malicious code detection method according to an embodiment of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

As described above, the present invention has been explained with reference to specific matters such as concrete components and to limited embodiments and drawings, but these are provided only to aid a more general understanding of the invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be confined to the described embodiments, and not only the claims set out below but also everything equivalent to, or equivalently modified from, those claims falls within the scope of the spirit of the present invention.

100: API call flow-based malicious code detection system
110: Behavior data collection unit 120: Matching unit
130: Weight summation unit 140: Malicious code determination unit

Claims

An API call flow-based malicious code detection method comprising:
running analysis target code within a protected area on a program;
collecting behavior data of the analysis target code according to a sequential call flow of an API (Application Programming Interface) called by the analysis target code while the analysis target code runs within the protected area;
sequentially matching the collected behavior data against a tree-structured behavior pattern in which each node corresponds to a specific behavior;
summing the weights assigned to each node in the behavior pattern sequentially matched with the collected behavior data; and
determining the analysis target code to be malicious code when the summed weight is greater than or equal to a predetermined sum value.

The API call flow-based malicious code detection method according to claim 1,
wherein the matching step matches sequentially against the next nodes connected to a current node in the tree-structured behavior pattern, according to whether the behavior in the collected behavior data matches the current node.

The API call flow-based malicious code detection method according to claim 1,
wherein the summing step sums the weights in the behavior pattern matched with the collected behavior data, based on the tree-structured behavior pattern in which a plus (+) weight is assigned to a node corresponding to a behavior regarded as that of malicious code and a minus (-) weight is assigned to a node corresponding to a behavior regarded as that of benign code.

The API call flow-based malicious code detection method according to claim 1,
wherein the collecting step collects, in chronological order, the behavior data generated or recorded for each process while the analysis target code runs.

An API call flow-based malicious code detection method comprising:
running analysis target code within a protected area on a program;
identifying an API (Application Programming Interface) called by the analysis target code while the analysis target code runs within the protected area, and collecting behavior data of the analysis target code according to the identified API;
identifying and matching, among the next nodes connected to a current node in a tree-structured behavior pattern in which each node corresponds to a specific behavior, the node corresponding to the behavior that matches the collected behavior data of the analysis target code;
accumulating and summing the weights assigned to the matched nodes; and
determining the analysis target code to be malicious code when the summed weight is greater than or equal to a predetermined sum value.

The API call flow-based malicious code detection method according to claim 5,
wherein the summing step accumulates and sums the weights of the matched nodes, a plus (+) weight being assigned to a node corresponding to a behavior regarded as that of malicious code and a minus (-) weight being assigned to a node corresponding to a behavior regarded as that of benign code.

The API call flow-based malicious code detection method according to claim 5,
wherein the collecting step collects, in chronological order, the behavior data generated or recorded for each process while the analysis target code runs.

A method of generating a behavior pattern for malicious code detection, the method comprising:
collecting behavior data according to a call flow of an API (Application Programming Interface) called by a plurality of analysis target codes when running on a program;
extracting an API that is called in common from the collected behavior data of each of the plurality of analysis target codes;
generating a first node corresponding to the extracted common API;
generating at least one second node corresponding to an API called by each of the plurality of analysis target codes after the extracted common API; and
generating a tree-structured behavior pattern for malicious code detection by connecting the first node and the at least one second node in a tree structure.

The method of generating a behavior pattern for malicious code detection according to claim 8,
wherein a plus (+) weight is assigned to a node corresponding to a behavior regarded as that of malicious code, and a minus (-) weight is assigned to a node corresponding to a behavior regarded as that of benign code.

A computer-readable recording medium having recorded therein a program for executing the method according to any one of claims 1 to 9.

An API call flow-based malicious code detection system comprising:
a behavior data collection unit that collects behavior data of analysis target code according to a sequential call flow of an API (Application Programming Interface) called by the analysis target code while the analysis target code runs within a protected area on a program;
a matching unit that sequentially matches the collected behavior data against a tree-structured behavior pattern in which each node corresponds to a specific behavior;
a weight summation unit that sums the weights assigned to each node in the behavior pattern sequentially matched with the collected behavior data; and
a malicious code determination unit that determines the analysis target code to be malicious code when the summed weight is greater than or equal to a predetermined sum value.

The API call flow-based malicious code detection system according to claim 11,
wherein the matching unit matches sequentially against the next nodes connected to a current node in the tree-structured behavior pattern, according to whether the behavior in the collected behavior data matches the current node.

The API call flow-based malicious code detection system according to claim 11,
wherein the weight summation unit sums the weights in the behavior pattern matched with the collected behavior data, based on the tree-structured behavior pattern in which a plus (+) weight is assigned to a node corresponding to a behavior regarded as that of malicious code and a minus (-) weight is assigned to a node corresponding to a behavior regarded as that of benign code.

The API call flow-based malicious code detection system according to claim 11,
wherein the behavior data collection unit collects, in chronological order, the behavior data generated or recorded for each process while the analysis target code runs.

An API call flow-based malicious code detection system comprising:
a behavior data collection unit that identifies an API (Application Programming Interface) called by analysis target code while the analysis target code runs within a protected area on a program, and collects behavior data of the analysis target code according to the identified API;
a matching unit that identifies and matches, among the next nodes connected to a current node in a tree-structured behavior pattern in which each node corresponds to a specific behavior, the node corresponding to the behavior that matches the collected behavior data of the analysis target code;
a weight summation unit that accumulates and sums the weights assigned to the matched nodes; and
a malicious code determination unit that determines the analysis target code to be malicious code when the summed weight is greater than or equal to a predetermined sum value.

The API call flow-based malicious code detection system according to claim 15,
wherein the weight summation unit accumulates and sums the weights of the matched nodes, a plus (+) weight being assigned to a node corresponding to a behavior regarded as that of malicious code and a minus (-) weight being assigned to a node corresponding to a behavior regarded as that of benign code.

The API call flow-based malicious code detection system according to claim 15,
wherein the behavior data collection unit collects, in chronological order, the behavior data generated or recorded for each process while the analysis target code runs.

A behavior pattern generation system for malicious code detection, the system comprising:
a behavior data collection unit that collects behavior data according to a call flow of an API (Application Programming Interface) called by a plurality of analysis target codes when running on a program;
an extraction unit that extracts an API that is called in common from the collected behavior data of each of the plurality of analysis target codes;
a first node generation unit that generates a first node corresponding to the extracted common API;
a second node generation unit that generates at least one second node corresponding to an API called by each of the plurality of analysis target codes after the extracted common API; and
a behavior pattern generation unit that generates a tree-structured behavior pattern for malicious code detection by connecting the first node and the at least one second node in a tree structure.

The behavior pattern generation system for malicious code detection according to claim 18,
wherein a plus (+) weight is assigned to a node corresponding to a behavior regarded as that of malicious code, and a minus (-) weight is assigned to a node corresponding to a behavior regarded as that of benign code.