KR20220038643A

KR20220038643A - Apparatus and method for malware lineage inference system with generating phylogeny

Info

Publication number: KR20220038643A
Application number: KR1020220032974A
Authority: KR
Inventors: 임정호; 유찬곤; 김남규; 김동주; 김철호; 백종현; 안지혜; 이건호
Original assignee: Agency for Defense Development (국방과학연구소)
Priority date: 2020-02-25
Filing date: 2022-03-16
Publication date: 2022-03-29
Also published as: KR102382017B1; KR20210108154A

Abstract

The present disclosure relates to an apparatus and method for analyzing the evolutionary relationships of malware. According to one embodiment, the method comprises: calculating a first complexity for each of a plurality of malware binaries; using the calculated first complexity to select the source (root) binary, which was generated first; and inferring the evolution order of the malware binaries other than the source binary. The inference is based on the calculated first complexity and on the degree of distance between the malware binaries.

Description

Apparatus and method for analyzing evolutionary relationship of malicious code {APPARATUS AND METHOD FOR MALWARE LINEAGE INFERENCE SYSTEM WITH GENERATING PHYLOGENY}

The present disclosure provides an apparatus and method for analyzing the evolutionary relationships of malicious code.

Malware created for malicious purposes causes serious damage to systems and their users. Various malware detection methods are being researched. However, malware authors keep reinforcing their code's functionality to produce new malware that evades detection algorithms. The number of newly emerging malware samples is growing exponentially, which places an ever-increasing burden on analysts.

However, most newly emerging malware is not entirely different from existing malware. In most cases it is a version of existing malware with functions modified or added. The existing and new malware cannot be called identical, or even very similar. Still, version updates of malware can be viewed from the perspective of biological evolution. If a malware evolutionary relationship diagram is built on that basis, new malware can be compared against previously analyzed malware, and the analysis can be performed accurately and quickly.

Conventionally, algorithms have been developed that automatically infer the evolutionary relationships of malware from program size and complexity, based on the laws of software evolution. However, these algorithms may produce inaccurate results when inferring the root and when handling binaries built in “Release” mode. In a Release build, a large amount of actual source code does not necessarily translate into a large binary. A system has also been proposed that builds an artificial-intelligence model from Creation Time information and then infers evolutionary relationships. Its limitation is that accurate inference is difficult for malware whose creation time information is missing or unreliable.

A previous study proposed the following pipeline:
- classify malware families using existing tools;
- execute the samples dynamically to obtain execution logs;
- unpack packed malware so that its evolutionary relationships can also be analyzed.

Another approach did not use the binary itself as input. Instead, it treated each program's functions as the unit of comparison and judged two programs more similar the more functions they call in common. It then inferred evolutionary relationships from this function similarity and generated a graph. These approaches have two limitations. An unpacker may alter the code of the original binary. In addition, evolutionary relationships cannot be inferred correctly if functions are not properly identified.

Inference of software evolutionary relationships is studied to determine the order in which programs were developed, given a set of programs whose version information is unknown. These relationships provide very useful information for classifying malware and tracking software vulnerabilities. In particular, new variants appear in real time every day, and analyzing them requires understanding how existing malware is transformed. Moreover, analysts are limited in how much of the newly discovered malware in real cyber environments they can analyze manually.

Accordingly, there is a growing need to automate the analysis of malware evolutionary relationships so that new malware can be responded to immediately.

Patent documents: KR 10-1880796; KR 10-1512462

An object of the present disclosure is to provide an apparatus and method for analyzing the evolutionary relationships of malware. Another object is to provide a recording medium storing a program for executing the method on a computer. The technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may be inferred from the following embodiments.

As a technical means for solving the above problems, a first aspect of the present disclosure provides a method for analyzing malware evolutionary relationships. The method comprises:
- calculating a first complexity of each of a plurality of malware binaries;
- selecting, using the calculated first complexity, a source binary that was generated first; and
- inferring the evolution order of the malware binaries other than the source binary, based on the calculated first complexity and the degree of distance between the malware binaries.

In addition, selecting the source binary may comprise selecting the malware binary with the lowest first complexity from among the malware binaries classified into the same family.

In addition, calculating the first complexity may comprise calculating the complexity using dynamic analysis and static analysis:
- the dynamic analysis extracts the API sequence count of the calls made by each malware binary;
- the static analysis extracts a second complexity of each malware binary, determined as the sum of the number of nodes and the number of edges.

In addition, the first complexity may be calculated according to Equation 1 below.

[Equation 1]

$C_1 = \alpha s + \beta d$

(where C_1 is the first complexity, α and β are arbitrary values corresponding to weights, s is the second complexity, and d is the API sequence count)

In addition, the weights may depend on the protection applied to a malware binary:
- if the binary is packed, the weight applied to the second complexity s may be smaller than the weight applied to the API sequence count d;
- if the binary employs anti-debugging, the weight applied to d may be smaller than the weight applied to s;
- if the binary is neither packed nor anti-debugged, the two weights may be equal.

In addition, the degree of distance may be calculated according to Equation 2 below.

[Equation 2]

$D(m_i, m_j) = 1 - S(m_i, m_j)$

(where m_i and m_j each denote an arbitrary malware binary, D(m_i, m_j) denotes the degree of distance between m_i and m_j, and S(m_i, m_j) denotes the similarity of m_i and m_j)

In addition, the similarity may be calculated using the API sequence count.

In addition, the similarity may be calculated using at least one of the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, and Hirschberg's algorithm.

In addition, inferring the evolution order may comprise the following steps. Let V be the set of malware binaries whose evolution order has been identified, and let U be the set of malware binaries whose evolution order has not yet been identified.
- Identify the source binary as ROOT.
- Sort U in ascending order of the first complexity.
- Select the binary with the lowest first complexity in U as m_x.
- Select, as m_p, the malware binary in V that satisfies Equation 3 below.

[Equation 3]

(where m_i and m_j each denote an arbitrary malware binary, and D(m_i, m_j) denotes the degree of distance between m_i and m_j)

- Identify the selected m_p as the parent of m_x.
- For each malware binary m_l in V that has no children, calculate D(m_l, m_x) with respect to the identified m_x.
- Calculate the corresponding degree of distance for m_p.
- If D(m_l, m_x) is less than the degree of distance calculated for m_p, identify m_l as the parent of m_x.
- Repeat these steps until V includes all of the plurality of malware binaries, thereby identifying the evolution order.

In addition, inferring the evolution order may further comprise deriving a graph from the identified evolution order.

A second aspect of the present disclosure may provide a recording medium storing a program for executing the method according to the first aspect on a computer.

A third aspect of the present disclosure provides an apparatus for analyzing malware evolutionary relationships. The apparatus comprises a memory and a processor. The processor:
- calculates a first complexity of each of a plurality of malware binaries;
- uses the calculated first complexity to select the source binary, which was generated first; and
- infers the evolution order of the malware binaries other than the source binary, based on the calculated first complexity and the degree of distance between the malware binaries.

According to the present disclosure, given a large number of malware samples:
- the first complexity of each malware binary can be calculated;
- the source binary can be selected based on those first complexities;
- the evolution order of the malware binaries can be inferred from their distances and first complexities, and derived as a graph.

Accordingly, the evolution order of malware binaries can be identified effectively and presented to the user, and new malware can be analyzed accurately and quickly.

FIG. 1 is a flowchart of a method of analyzing malware evolutionary relationships according to an embodiment.
FIG. 2 is a detailed flowchart of steps 110 and 120 shown in FIG. 1.
FIG. 3 is a detailed flowchart of step 130 shown in FIG. 1.
FIGS. 4 and 5 are diagrams illustrating an example, according to an embodiment, of inferring the evolution order of malware binaries and deriving a graph from the inferred order.
FIG. 6 is a block diagram of a malware evolutionary relationship analysis apparatus according to an embodiment.

The terms used in the present embodiments have been selected, as far as possible, from general terms in wide current use, in consideration of their functions in the present embodiments. They may, however, vary with the intention of those skilled in the art, precedent, the emergence of new technologies, and the like. In certain cases the applicant has selected terms arbitrarily, and their meanings are described in detail in the relevant parts. Therefore, the terms used in the present embodiments should be defined by their meanings and the overall content of the present embodiments, not simply by their names.

Because the present embodiments may be modified in various ways and take various forms, some embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present embodiments to a specific disclosed form. It should be understood to include all modifications, equivalents, and substitutes within the spirit and scope of the present embodiments. The terms used herein serve only to describe the embodiments and are not intended to limit them.

Unless otherwise defined, terms used in the present embodiments have the same meanings as commonly understood by those of ordinary skill in the art to which the present embodiments belong. Terms defined in commonly used dictionaries should be interpreted consistently with their meaning in the context of the related art. Unless explicitly defined in the present embodiments, they should not be interpreted in an idealized or overly formal sense.

Some embodiments of the present disclosure may be represented by functional block configurations and various processing steps. Some or all of these functional blocks may be implemented by any number of hardware and/or software components that perform specific functions. For example, the functional blocks may be implemented:
- by one or more microprocessors;
- by circuit configurations for a given function;
- in various programming or scripting languages;
- as algorithms running on one or more processors.

The present disclosure may also employ conventional techniques for electronic environment configuration, signal processing, data processing, and the like. Terms such as “mechanism”, “element”, “means”, and “configuration” are used broadly and are not limited to mechanical or physical components. Terms such as “unit” and “module” in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of the two.

In addition, the connecting lines or members between components shown in the drawings merely illustrate functional connections and/or physical or circuit connections. In an actual device, the connections between components may take various replaceable or additional functional, physical, or circuit forms.

Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart of a method of analyzing malware evolutionary relationships according to an embodiment. Referring to FIG. 1, in step 110, the malware evolutionary relationship analysis apparatus may calculate a first complexity for each of a plurality of malware binaries.

Malware may include variants in addition to existing malware. Variants can be created by applying the techniques below.

Malware variants can be generated by obfuscation techniques, including:
- dead code insertion;
- register reassignment;
- subroutine reordering;
- instruction substitution;
- code transposition;
- code integration.

A malware binary is the machine code of 0s and 1s into which malware written for malicious purposes is converted so that a computer can execute it. This is the meaning generally understood by those of ordinary skill in the art to which the present disclosure belongs.

The first complexity refers to the complexity of a malware binary and may indicate the capability or degree of sophistication of the malware.

The plurality of malware binaries may be given to the system as input and may include, but is not limited to, malware binaries classified into the same family.

Programs produced by antivirus vendors, software companies, and others group input malware binaries together when they were created by the same author or have similar or identical functions. Malware binaries classified into the same family are the binaries placed in the same group in this way.

The first complexity of a malware binary can be calculated using dynamic analysis and static analysis. Dynamic analysis executes the target malware file and analyzes it. Static analysis analyzes the file without executing it.

Based on the information extracted through dynamic and static analysis, the apparatus may calculate the first complexity according to Equation 1 below.

[Equation 1]

$C_1 = \alpha s + \beta d$

In Equation 1, C_1 is the first complexity. α and β are arbitrary values corresponding to weights and can be set heuristically. s corresponds to information extracted by static analysis and may denote the second complexity. d corresponds to information extracted by dynamic analysis and may denote the API sequence count.

The second complexity is the complexity of the call graph that a static analyzer generates for a program. The call graph represents the call relationships among the unit programs that make up the program. The second complexity may be expressed as the number of nodes plus the number of edges in the call graph.
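The node-plus-edge count can be sketched as follows, assuming the static analyzer's call graph is available as a list of (caller, callee) pairs. Collapsing repeated pairs into a single edge and the optional `functions` argument are illustrative assumptions. `second_complexity` is a hypothetical name.

```python
from typing import Iterable, Tuple


def second_complexity(call_edges: Iterable[Tuple[str, str]],
                      functions: Iterable[str] = ()) -> int:
    """Number of nodes plus number of edges of a call graph.

    call_edges: (caller, callee) pairs; repeated pairs count once
    (assumption: one edge per caller/callee pair).
    functions: optional extra functions, so functions that neither call
    nor are called still count as nodes.
    """
    edges = set(call_edges)
    nodes = set(functions)
    for caller, callee in edges:
        nodes.add(caller)
        nodes.add(callee)
    return len(nodes) + len(edges)


edges = [("main", "init"), ("main", "connect"), ("connect", "send"), ("main", "init")]
print(second_complexity(edges))              # 4 nodes + 3 edges = 7
print(second_complexity(edges, ["unused"]))  # 5 nodes + 3 edges = 8
```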

A program running on Windows calls functions as it executes. The order of the system functions it calls is its API sequence. The API sequence count may mean the number of API calls that make up that sequence.

In step 120, the apparatus may select the source binary (ROOT) using the first complexity calculated in step 110.

The source binary is the earliest malware binary generated within the programs' evolutionary relationship, and the malware binary with the lowest first complexity can be selected as the source binary. For example, the apparatus may calculate the first complexity of each malware binary classified into the same family and select the binary with the lowest value. Steps 110 and 120 are described in detail below with reference to FIG. 2.
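Root selection then reduces to a per-family minimum. Below is a minimal sketch, assuming each sample is already labelled with a family and a first complexity. Breaking ties by name and the helper name `select_roots` are assumptions, since tie handling is not described here.

```python
from typing import Dict, Iterable, Tuple


def select_roots(samples: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Pick, per family, the binary with the lowest first complexity.

    samples: (name, family, first_complexity) triples. Ties are broken by
    name (an assumption; tie handling is not specified).
    """
    best: Dict[str, Tuple[float, str]] = {}
    for name, family, c1 in samples:
        key = (c1, name)
        if family not in best or key < best[family]:
            best[family] = key
    return {family: key[1] for family, key in best.items()}


samples = [("v3", "fam_a", 510.0), ("v1", "fam_a", 320.0),
           ("v2", "fam_a", 415.5), ("x1", "fam_b", 90.0)]
print(select_roots(samples))  # {'fam_a': 'v1', 'fam_b': 'x1'}
```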

In step 130, the apparatus may infer the evolution order of the malware binaries other than the source binary, based on the calculated first complexity and the degree of distance between the malware binaries. The degree of distance may be calculated according to Equation 2 below.

[Equation 2]

$D(m_i, m_j) = 1 - S(m_i, m_j)$

Here, m_i and m_j each denote an arbitrary malware binary, D(m_i, m_j) denotes the degree of distance between m_i and m_j, and S(m_i, m_j) denotes their similarity. The similarity may be the Jaccard similarity, in which case the degree of distance is 1 minus the Jaccard similarity.

The apparatus may calculate the similarity using the API sequence count. Suppose, for example, that the first and second malware binaries each have an API sequence count of 100, and 50 API calls are common to both. Then the intersection contains 50 API calls and the union contains 100 + 100 - 50 = 150, so the similarity is

$S(m_1, m_2) = \frac{|m_1 \cap m_2|}{|m_1 \cup m_2|} = \frac{50}{150} = \frac{1}{3} \approx 0.33$

and the corresponding degree of distance is 1 - 1/3 = 2/3.
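Below is the same arithmetic in Python. The sketch models each binary's API calls as a set of names. This is an assumption: a set collapses repeated calls, so it counts distinct APIs rather than sequence length. The API names are placeholders chosen only to reproduce the 100/100/50 example.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """len(a & b) / len(a | b); two empty sets count as identical (assumption)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def distance(a: set, b: set) -> float:
    """Degree of distance D = 1 - S, with S the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Worked example: 100 APIs each, 50 of them shared (placeholder names).
apis_1 = {f"api_{i}" for i in range(100)}       # api_0 .. api_99
apis_2 = {f"api_{i}" for i in range(50, 150)}   # api_50 .. api_149
print(len(apis_1 & apis_2), len(apis_1 | apis_2))  # 50 150
print(round(jaccard_similarity(apis_1, apis_2), 4),
      round(distance(apis_1, apis_2), 4))          # 0.3333 0.6667
```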

The apparatus may also calculate the similarity using at least one of the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, and Hirschberg's algorithm.

The three algorithms differ as follows:
- The Needleman-Wunsch algorithm computes a global similarity score over the entire length of the sequences being compared. It is appropriate when the two sequences are of similar length and every element matters.
- The Smith-Waterman algorithm identifies similar regions within the sequences. Instead of scoring the entire sequences, it compares segments of all possible lengths and measures their similarity.
- Hirschberg's algorithm finds an optimal alignment between two sequences. It yields the same global alignment as Needleman-Wunsch but uses memory linear, rather than quadratic, in the sequence lengths.
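To make the global/local distinction concrete, here is a textbook sketch of both scoring recurrences applied to API-name sequences. The scoring values (match, mismatch, gap) are illustrative assumptions, not parameters taken from this disclosure. Only scores are returned, not the alignments themselves.

```python
from typing import Sequence


def needleman_wunsch(a: Sequence[str], b: Sequence[str],
                     match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score over the full length of both sequences."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # leading gaps in b
    for j in range(1, m + 1):
        score[0][j] = j * gap          # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


def smith_waterman(a: Sequence[str], b: Sequence[str],
                   match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score; cells never drop below zero."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


longer = ["Sleep", "NtCreateFile", "NtWriteFile", "NtClose", "ExitProcess"]
core = ["NtCreateFile", "NtWriteFile", "NtClose"]
print(needleman_wunsch(longer, core))  # 1: three matches, two end gaps penalised
print(smith_waterman(longer, core))    # 6: the shared block scores 3 * 2
```

The global score is pulled down by the extra calls at both ends, while the local score only reflects the best-matching shared block.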

In one embodiment, the apparatus compares the API sequences that the malware binaries call during dynamic execution in order to count the commonly called APIs. For this comparison, at least one of the following may be used:
- the Needleman-Wunsch algorithm, for global alignment over entire sequences;
- the Smith-Waterman algorithm, for local alignment over parts of sequences;
- Hirschberg's algorithm, for space-complexity optimization.

However, the present disclosure is not necessarily limited thereto.
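One option for counting commonly called APIs is to score a global alignment with match = 1, mismatch = 0 and gap = 0. This makes the score equal to the length of the longest common subsequence. Keeping only two DP rows gives the linear-space scoring pass that Hirschberg's algorithm builds on. The full algorithm additionally recovers the alignment by divide and conquer, which is not needed just to count matches.

The sketch then turns the count into a similarity via common / (n1 + n2 - common). This mirrors the 100/100/50 worked example, but how the count and the similarity are combined is an assumption. The function names are hypothetical.

```python
from typing import Sequence


def common_api_count(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two API sequences.

    Equivalent to a Needleman-Wunsch score with match=1, mismatch=0, gap=0,
    computed with two rows so memory is O(min(len(a), len(b))).
    """
    if len(b) > len(a):
        a, b = b, a                     # keep the shorter sequence as the row
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def alignment_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """common / (len(a) + len(b) - common), mirroring the count-based example."""
    common = common_api_count(a, b)
    union = len(a) + len(b) - common
    return common / union if union else 1.0


seq_1 = ["LoadLibraryA", "GetProcAddress", "CreateFileW", "WriteFile", "CloseHandle"]
seq_2 = ["LoadLibraryA", "GetProcAddress", "RegOpenKeyExW", "WriteFile",
         "CloseHandle", "ExitProcess"]
print(common_api_count(seq_1, seq_2))                # 4
print(round(alignment_similarity(seq_1, seq_2), 3))  # 0.571 (= 4 / 7)
```

Unlike the set-based Jaccard computation, this count respects call order and repeated calls.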

Based on the first complexity and the degree of distance, the apparatus can infer the evolution order of the malware binaries other than the source binary. Step 130 is described in detail below with reference to FIG. 3.

FIG. 2 is a detailed flowchart of steps 110 and 120 shown in FIG. 1. Since FIG. 2 corresponds to steps 110 and 120 of FIG. 1, redundant descriptions are omitted.

Referring to FIG. 2, in step 210, the apparatus performs dynamic analysis and static analysis on each malware binary. The malware binaries may be those classified into the same family, but are not limited thereto.

In one embodiment, Cuckoo Sandbox may be used for dynamic analysis of a malware binary. The binary is executed in the sandbox environment, and the resulting output is collected and used as information. Cuckoo Sandbox provides this information as a JSON (JavaScript Object Notation) file that contains the API sequence information.

Cuckoo Sandbox is an automated dynamic malware analysis system used to examine suspicious files in an isolated environment. JSON is a lightweight data-interchange format, used between web and computer programs, that represents data objects as attribute-value pairs. Cuckoo Sandbox can generate its results in JSON format.
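The following is a minimal Python sketch of extracting the API sequence from such a report. It assumes the Cuckoo 2.x `report.json` layout, in which `behavior.processes` is a list of processes and each process has a `calls` list whose entries carry an `api` field. The helper names are hypothetical. The sketch concatenates processes in report order as a simplification; the source does not prescribe this.

```python
import json
from typing import Any, Dict, List


def extract_api_sequence(report: Dict[str, Any]) -> List[str]:
    """Return API names in the order they appear in a Cuckoo-style report.

    Assumes the layout behavior -> processes -> [{"calls": [{"api": ...}]}];
    calls from all processes are concatenated in report order.
    """
    sequence: List[str] = []
    for process in report.get("behavior", {}).get("processes", []):
        for call in process.get("calls", []):
            api = call.get("api")
            if api:  # skip entries without an API name
                sequence.append(api)
    return sequence


def load_api_sequence(path: str) -> List[str]:
    """Read a report.json file from disk and extract its API sequence."""
    with open(path, encoding="utf-8") as fh:
        return extract_api_sequence(json.load(fh))
```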

In addition, static analysis of each malware binary yields call graph information. A node in the call graph represents a unit program, and an edge represents a call between unit programs.

In step 220, the apparatus may extract the number of API sequences and the second complexity.

The number of API sequences can be extracted from the API sequence information obtained through dynamic analysis of the malware binary. The second complexity can be extracted from the call graph information obtained through static analysis of the malware binary.

The number of API sequences may be the number of APIs that make up the sequence, and the second complexity may be the call graph complexity.
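The two inputs can be computed as sketched below in Python. Claim 3 states that the second complexity is the sum of the number of nodes and the number of edges in the call graph. The adjacency-dictionary representation and the function names are assumptions. `api_sequence_count` simply takes the length of the sequence, which is one reading of "the number of APIs that make up the sequence".

```python
from typing import Dict, List, Set


def call_graph_complexity(call_graph: Dict[str, Set[str]]) -> int:
    """Second complexity s: number of nodes plus number of edges.

    call_graph maps each unit program (node) to the set of unit programs it
    calls; callees that never appear as keys are still counted as nodes.
    """
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes |= callees
    edges = sum(len(callees) for callees in call_graph.values())
    return len(nodes) + edges


def api_sequence_count(api_sequence: List[str]) -> int:
    """Number of APIs d in the dynamically observed call sequence."""
    return len(api_sequence)
```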

In step 230, the first complexity of each malware binary may be calculated from the extracted number of API sequences and the second complexity.

The apparatus may calculate the first complexity according to Equation 1, described above with reference to FIG. 1.

In Equation 1, the weights applied to the second complexity s and to the number of API sequences d have arbitrary values. They may be set heuristically based on the characteristics of the malware binary.

A malware binary may be packed or protected by anti-debugging techniques.

- If the binary is packed, dynamic analysis is more accurate than static analysis. The weight applied to d can therefore be raised relative to the weight applied to s.
- If the binary uses anti-debugging, static analysis is more accurate than dynamic analysis. The weight applied to d can therefore be lowered relative to the weight applied to s.
- If the binary is neither packed nor anti-debugging, the two weights may be equal.

Packing refers to compressing and obfuscating a malware program so that it cannot be analyzed directly. Anti-debugging refers to software techniques that prevent reverse engineering or debugging; malware uses them to make itself harder to detect and remove.

In one embodiment, when calculating the first complexity of a malware binary, the apparatus may use a separate predetermined pair of weights for each of three cases:

- a binary that is neither packed nor anti-debugging;
- a binary that is packed but not anti-debugging;
- a binary that is not packed but is anti-debugging.

The weights are not limited thereto.

In step 240, the apparatus may select the malware binary with the lowest first complexity as the source binary (ROOT).
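The Python sketch below combines steps 230 and 240. It makes several assumptions:

- Equation 1 is assumed to be a weighted sum, w_s·s + w_d·d.
- The weight values 0.3, 0.5, and 0.7 are hypothetical. They only satisfy the orderings stated in claim 1.
- The source does not cover a binary that is both packed and anti-debugging, so the sketch rejects that case.
- `BinaryFeatures`, `choose_weights`, `first_complexity`, and `select_root` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BinaryFeatures:
    name: str
    s: int                # second complexity (call-graph nodes + edges)
    d: int                # number of APIs in the dynamic API sequence
    packed: bool
    anti_debugging: bool


def choose_weights(b: BinaryFeatures) -> Tuple[float, float]:
    """Return hypothetical (w_s, w_d) that follow the orderings in the text."""
    if b.packed and b.anti_debugging:
        # The source does not specify this combination.
        raise ValueError(f"{b.name}: packed and anti-debugging is unspecified")
    if b.packed:             # dynamic analysis more reliable: favour d
        return 0.3, 0.7
    if b.anti_debugging:     # static analysis more reliable: favour s
        return 0.7, 0.3
    return 0.5, 0.5          # neither: equal weights


def first_complexity(b: BinaryFeatures) -> float:
    """Assumed weighted-sum form of Equation 1: w_s * s + w_d * d."""
    w_s, w_d = choose_weights(b)
    return w_s * b.s + w_d * b.d


def select_root(family: List[BinaryFeatures]) -> str:
    """Step 240: the binary with the lowest first complexity is the root."""
    return min(family, key=first_complexity).name
```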

FIG. 3 is a detailed flowchart of step 130 shown in FIG. 1. Because FIG. 3 corresponds to step 130 of FIG. 1, redundant description is omitted.

Referring to FIG. 3, in step 310, the apparatus may identify the source binary among the malware binaries classified into the same family as P₁.

The source binary is the earliest-generated malware binary. The apparatus selects the source binary and, starting from it, infers the evolution order of the remaining malware binaries.

In step 320, the apparatus may sort the malware binaries in the unidentified set in ascending order of first complexity, starting from the lowest.

Here, the identified set is the set of malware binaries, among the plurality of malware binaries, whose evolution order has been identified. The unidentified set is the set of malware binaries whose evolution order has not yet been identified.

In one embodiment, the identified set and the unidentified set may be subsets of the malware binaries classified into the same family.

In step 330, the apparatus may select the element of the unidentified set with the lowest first complexity as P_j.

In step 340, the apparatus may select, as P_i, the element of the identified set with the smallest distance to P_j.

Specifically, the apparatus calculates the distance between P_j, taken from the unidentified set, and every malware binary in the identified set. It compares these distances and selects as P_i the binary whose distance to P_j is smallest.

In step 350, the apparatus may identify P_j as a child of P_i.

"Parent" and "child" express order or hierarchy in the evolutionary relationship. Of two related malware binaries, the one that comes earlier in the evolution order is called the parent, and the one that comes later is called the child.

The apparatus may identify P_j as a child of P_i and P_i as the parent of P_j. P_i then precedes P_j in the evolution order.

In step 360, the apparatus considers each malware binary P_k that has no children in the identified set. Let P_i be the parent selected for P_j in step 340. The apparatus may identify P_k as a parent of P_j if the distance between P_k and P_j is smaller than the distance between P_k and P_i.

Specifically, the apparatus calculates these two distances:

- If the distance between P_k and P_j is smaller, P_k is identified as a parent of P_j, and P_j is identified as a child of P_k.
- If it is larger, P_k is not identified as a parent of P_j.
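The selection rules of steps 330 to 360 can be written compactly as follows. The notation is assumed for this summary and is not taken from the source:

- $C(P)$ is the first complexity.
- $D(\cdot,\cdot)$ is the distance.
- $\mathcal{I}$ is the identified set, and $\mathcal{U}$ is the unidentified set.

The step-360 condition follows the FIG. 5 discussion, which compares the distance from P_k to P_j with the distance from P_k to P_i.

```latex
\begin{aligned}
P_j &= \operatorname*{arg\,min}_{P \in \mathcal{U}} C(P)
    && \text{(step 330)}\\
P_i &= \operatorname*{arg\,min}_{P \in \mathcal{I}} D(P, P_j)
    && \text{(steps 340 and 350: } P_j \text{ becomes a child of } P_i\text{)}\\
D(P_k, P_j) &< D(P_k, P_i) \;\Rightarrow\; P_k \text{ is also a parent of } P_j
    && \text{(step 360, } P_k \in \mathcal{I} \text{ has no children)}
\end{aligned}
```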

In one embodiment, malware binaries may branch into separate lines during evolution, and separate lines may also merge into one according to certain rules. The apparatus can model both branching and merging. As a result, parents and children need not correspond one-to-one, and a child binary may have multiple parent binaries.

Referring to FIG. 3, in step 370, the apparatus may repeat the process until all malware binaries are included in the identified set.

In one embodiment, the malware binaries classified into the same family have only one source binary. Step 310 may therefore be excluded from the repetition. Steps 320 through 370 are repeated until every malware binary in the unidentified set has its evolution order identified and has been moved into the identified set. Once all malware binaries are in the identified set, their evolution order has been inferred.
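The loop below is a Python sketch of steps 310 to 370 combined. It assumes that the first complexities are already computed and that the distance is available as a symmetric function. The names `infer_lineage`, `complexity`, and `dist` are hypothetical. Following the FIG. 5 example, the step-360 test compares the distance from P_k to P_j with the distance from P_k to P_i. The result maps each binary to its set of parents, so merges (binaries with several parents) can be represented.

```python
from typing import Callable, Dict, List, Set


def infer_lineage(names: List[str],
                  complexity: Dict[str, float],
                  dist: Callable[[str, str], float]) -> Dict[str, Set[str]]:
    """Sketch of steps 310-370; returns a map from each binary to its parents."""
    # Step 310: the lowest-complexity binary is the source binary P1.
    root = min(names, key=lambda n: complexity[n])
    identified: List[str] = [root]
    # Step 320: sort the unidentified binaries by ascending first complexity.
    unidentified = sorted((n for n in names if n != root),
                          key=lambda n: complexity[n])
    parents: Dict[str, Set[str]] = {root: set()}
    children: Dict[str, Set[str]] = {root: set()}

    while unidentified:
        # Step 330: the lowest-complexity unidentified binary becomes P_j.
        p_j = unidentified.pop(0)
        # Step 340: the identified binary nearest to P_j becomes P_i.
        p_i = min(identified, key=lambda p: dist(p, p_j))
        # Step 350: P_j is a child of P_i.
        parents[p_j] = {p_i}
        children[p_j] = set()
        children[p_i].add(p_j)
        # Step 360: a childless identified P_k that is closer to P_j than
        # to P_i becomes an additional parent of P_j (merge).
        for p_k in identified:
            if p_k != p_i and not children[p_k] and dist(p_k, p_j) < dist(p_k, p_i):
                parents[p_j].add(p_k)
                children[p_k].add(p_j)
        # Step 370: move P_j into the identified set and repeat.
        identified.append(p_j)
    return parents
```

With distances chosen to match the assumptions of FIGS. 4 and 5, the sketch produces the following:

- P₃ becomes a child of P₁.
- P₄ becomes a child of P₂.
- P₃ is then added as a second parent of P₄.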

In addition, the apparatus may derive a graph according to the evolution order of the malware binaries.

The graph may include, but is not limited to, a figure, table, chart, or diagram that visually represents the evolution order of the malware binaries.

The apparatus may derive the graph after inferring the evolution order, or it may derive the graph while inferring the evolution order.

FIGS. 4 and 5 are diagrams illustrating an example, according to an embodiment, of inferring the evolution order of malware binaries and deriving a graph from the inferred order.

Referring to FIG. 4, P₁, P₂, P₃, and so on denote individual malware binaries, and P_M indicates that M malware binaries are classified into the same family.

The identified set (410) is the set of malware binaries, among the plurality of malware binaries, whose evolution order has been identified. The unidentified set (420) is the set of malware binaries whose evolution order has not been identified.

P₁ is identified as the source binary and is included in the identified set (410). It is assumed that the first complexity increases in the order P₁, P₂, P₃, P₄, …, P_M.

In one embodiment, P₁ is identified and the remaining binaries are sorted by first complexity in ascending order (P₂, P₃, P₄, …, P_M). Next, P₂, the malware binary with the lowest first complexity in the unidentified set (420), is selected. Because only P₁ is in the identified set (410), P₂ is identified as a child of P₁.

Once the evolution order of P₁ and P₂ has been identified, P₃ is selected, since it now has the lowest first complexity in the unidentified set (420). Because P₁ and P₂ are in the identified set (410), the apparatus calculates the distance between P₁ and P₃ and the distance between P₂ and P₃. Assuming the former is smaller, P₃ is identified as a child of P₁.

Once the evolution order of P₁, P₂, and P₃ has been identified, P₄, the malware binary with the lowest first complexity in the unidentified set (420), is selected. Because P₁, P₂, and P₃ are in the identified set (410), the apparatus calculates the distances from P₄ to P₁, P₂, and P₃. Assuming the distance between P₂ and P₄ is the smallest of the three, P₄ is identified as a child of P₂.

The apparatus may represent the evolution order of the malware binaries as a graph. The graph may be a directed acyclic graph, but is not limited thereto.

In one embodiment, P₁, P₂, P₃, and so on denote individual malware binaries, and each malware binary may be drawn as a circle. Each parent-child relationship is drawn as an arrow connecting the two binaries.
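Every parent is identified before its child, so each parent-to-child arrow points forward in the evolution order and the graph stays acyclic. The Python sketch below writes such a structure in Graphviz DOT syntax, drawing nodes as circles and adding one arrow per parent-child relation. It assumes the lineage is held as a map from each binary to the set of its parents. This representation and the name `lineage_to_dot` are hypothetical.

```python
from typing import Dict, Set


def lineage_to_dot(parents: Dict[str, Set[str]]) -> str:
    """Render a child -> parents map as a Graphviz DOT digraph.

    Each binary is drawn as a circle, and each parent -> child relation
    becomes one arrow; nodes and edges are emitted in sorted order.
    """
    lines = ["digraph lineage {", "  node [shape=circle];"]
    for child in sorted(parents):
        lines.append(f'  "{child}";')
    for child in sorted(parents):
        for parent in sorted(parents[child]):
            lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)
```

You can render the output with the Graphviz command-line tool, for example `dot -Tpng lineage.dot -o lineage.png`.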

For example, the source binary P₁ is drawn as a circle, and the remaining binaries are sorted by first complexity in ascending order (P₂, P₃, P₄, …, P_M). The graph is then built as follows:

- P₂ has the lowest first complexity in the unidentified set (420), so it is selected and drawn as a circle. Because only P₁ is in the identified set (410), P₂ is connected to P₁ by an arrow as its child.
- P₃ is drawn as a circle, and its distances to P₁ and to P₂ are calculated. Assuming the distance to P₁ is smaller, P₃ is connected to P₁ by an arrow as its child.
- P₄ is drawn as a circle, and its distances to P₁, P₂, and P₃ are calculated. Assuming the distance to P₂ is the smallest, P₄ is connected to P₂ by an arrow as its child, which completes the graph.

P₄ in the unidentified set (420) and P₄ in the identified set (410) are the same malware binary. The dotted line connected to P₄ shows P₄ being selected from the unidentified set (420), having its evolution order identified, and being moved into the identified set (410). This dotted line may be omitted when the graph is derived.

Referring to FIG. 5, P₁, P₂, P₃, and so on denote individual malware binaries, and P_M indicates that M malware binaries are classified into the same family.

The identified set is the set of malware binaries whose evolution order has been identified. The unidentified set is the set of malware binaries whose evolution order has not been identified.

P₁ is identified as the source binary and is included in the identified set. It is assumed that the first complexity increases in the order P₁, P₂, P₃, P₄, …, P_M. Because FIG. 5 continues from FIG. 4, redundant description is omitted.

In one embodiment, if P₄ is less related to P₃ than P₂ is, the distance between P₄ and P₃ should be greater than the distance between P₂ and P₃. If, however, the distance between P₄ and P₃ is smaller than the distance between P₂ and P₃, then P₄ is closely related to P₃, and the apparatus can identify P₃ as a parent of P₄.

For example, assume that P₁, P₂, P₃, and P₄ have been identified as in FIG. 4. The binaries correspond to the roles in FIG. 3 as follows:

- P₃ has no children in the identified set and may correspond to P_k of FIG. 3.
- P₄ was identified as a child of P₂ and may correspond to P_j of FIG. 3.
- P₂ was identified as the parent of P₄ and corresponds to the parent of P_j in FIG. 3.

The apparatus calculates the distance between P₃ and P₄ and the distance between P₂ and P₃. Assuming the former is smaller, P₃ can be identified as a parent of P₄.

The apparatus may represent the evolution order of the malware binaries as a graph.

For example, assume that P₁, P₂, P₃, and P₄ are drawn in the graph as in FIG. 4. The apparatus calculates the distance between P₃ and P₄ and the distance between P₂ and P₃. Assuming the former is smaller, P₃ is connected to P₄ by an arrow as a parent of P₄, which yields the graph.

FIG. 6 is a block diagram of a malware evolution relationship analysis apparatus according to an embodiment.

Referring to FIG. 6, the malware evolution relationship analysis apparatus 600 may include a memory 610 and a processor 620. FIG. 6 shows only the components related to the embodiment. Those skilled in the art will therefore understand that other general-purpose components may be included in addition to those shown.

The memory 610 is hardware that stores the various data processed in the apparatus 600. For example, the memory 610 may store the calculated first complexity of each malware binary. The memory 610 may also store the malware binaries classified into the same family, the calculated distances between malware binaries, and the like.

The memory 610 may include any of the following:

- random access memory (RAM), such as dynamic random access memory (DRAM) or static random access memory (SRAM);
- read-only memory (ROM);
- electrically erasable programmable read-only memory (EEPROM);
- CD-ROM, Blu-ray, or other optical disk storage;
- a hard disk drive (HDD);
- a solid state drive (SSD);
- flash memory.

The processor 620 performs the overall functions for analyzing the malware evolution relationship described above with reference to FIGS. 1 to 5.

In one embodiment, the processor 620 may calculate the first complexity of each of the plurality of malware binaries. The processor 620 may also use the calculated first complexity to select the earliest-generated source binary. Finally, it may infer the evolution order of the malware binaries other than the source binary, based on the calculated first complexity and the distances between the malware binaries.

The present embodiments may also be implemented as a recording medium that stores a program, such as a program module executed by a computer. A computer-readable medium may be any available medium that a computer can access. It includes volatile and nonvolatile media as well as removable and non-removable media. Computer-readable media may include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically carry computer-readable instructions, data structures, program modules, or other data in a modulated data signal or other transport mechanism, and include any information delivery media.

In this specification, a "unit" may be a hardware component, such as a processor or circuit, and/or a software component executed by a hardware component such as a processor.

The foregoing description is illustrative. Those of ordinary skill in the art to which this specification pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The description of the above embodiments is merely illustrative, and those skilled in the art will understand that various modifications and equivalent embodiments are possible. Accordingly, the true scope of protection of the invention should be defined by the appended claims. All differences within the scope of equivalents of the claims should be construed as falling within the scope of protection defined by the claims.

Claims

1. A method of analyzing an evolutionary relationship of malicious code through a processor of a malicious code evolution relationship analysis apparatus, the method comprising:
calculating, through the processor, a first complexity of each of a plurality of malicious code binaries;
selecting, through the processor, a first-generated source binary using the calculated first complexity; and
inferring, through the processor, an evolution order of the plurality of malicious code binaries other than the source binary, based on the calculated first complexity and the distance between the plurality of malicious code binaries;
wherein the first complexity is calculated according to Equation 1 below,
[Equation 1]

(where the weights applied to s and d each have an arbitrary value, s is the second complexity, and d is the number of API sequences),
wherein, if any malicious code binary among the plurality of malicious code binaries is packed, the weight applied to s is smaller than the weight applied to d;
if that malicious code binary is anti-debugging, the weight applied to d is smaller than the weight applied to s; and
if that malicious code binary is neither packed nor anti-debugging, the two weights are equal.

2. The method of claim 1, wherein selecting the source binary comprises:
selecting, as the source binary, the malicious code binary having the lowest first complexity among the plurality of malicious code binaries classified into the same family.

3. The method of claim 1, wherein calculating the first complexity comprises:
calculating the first complexity using dynamic analysis and static analysis;
wherein the dynamic analysis extracts the number of API sequences called by each of the plurality of malicious code binaries, and
the static analysis extracts the second complexity of each of the plurality of malicious code binaries, the extracted second complexity being determined by the sum of the number of nodes and the number of edges.

4. The method of claim 1, wherein the distance is calculated according to Equation 2 below.
[Equation 2]

(where P_i and P_j each denote an arbitrary malicious code binary, and Equation 2 expresses the distance between P_i and P_j in terms of the similarity of P_i and P_j)

5. The method of claim 4, wherein the similarity is calculated using the number of API sequences.

6. The method of claim 5, wherein the similarity is determined using at least one of the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, and Hirschberg's algorithm.

7. The method of claim 1, wherein inferring the evolution order comprises:
when the source binary is denoted P₁, sorting the malicious code binaries of the set of binaries whose evolution order has not been identified (the unidentified set) in ascending order according to the first complexity;
selecting the malicious code binary having the lowest first complexity in the unidentified set as P_j;
selecting, as P_i, the malicious code binary of the set of binaries whose evolution order has been identified (the identified set) that satisfies Equation 3 below;
[Equation 3]
$$P_i = \operatorname*{arg\,min}_{P_i \in \text{identified set}} D(P_i, P_j)$$
(where P_i and P_j each denote an arbitrary malicious code binary, and D(P_i, P_j) is the distance between P_i and P_j);
identifying the selected P_j as a child of P_i;
calculating the distance between P_j and a malicious code binary P_k that has no children in the identified set;
calculating the distance between P_i and P_k;
identifying P_k as a parent of P_j if the distance between P_k and P_j is smaller than the distance between P_i and P_k; and
repeatedly identifying the evolution order until all of the plurality of malicious code binaries are included in the identified set.

8. The method of claim 7, wherein inferring the evolution order further comprises:
deriving a graph according to the identified evolution order.

9. A recording medium on which is recorded a program for executing, on a computer, the method according to any one of claims 1 to 8.

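10. An apparatus for analyzing an evolutionary relationship of malicious code, the apparatus comprising:
a memory; and
a processor,
wherein the processor is configured to:
calculate a first complexity of each of a plurality of malicious code binaries;
select a first-generated source binary using the calculated first complexity; and
infer an evolution order of the plurality of malicious code binaries other than the source binary, based on the calculated first complexity and the distance between the plurality of malicious code binaries;
wherein the first complexity is calculated according to Equation 1 below,
[Equation 1]

(where the weights applied to s and d each have an arbitrary value, s is the second complexity, and d is the number of API sequences),
wherein, if any malicious code binary among the plurality of malicious code binaries is packed, the weight applied to s is smaller than the weight applied to d;
if that malicious code binary is anti-debugging, the weight applied to d is smaller than the weight applied to s; and
if that malicious code binary is neither packed nor anti-debugging, the two weights are equal.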