KR102195906B1

KR102195906B1 - Apparatus and Method for program analysis dynamically

Info

Publication number: KR102195906B1
Application number: KR1020180169833A
Authority: KR
Inventors: 조은선; 목성균; 전현구
Original assignee: 충남대학교산학협력단
Priority date: 2017-12-26
Filing date: 2018-12-26
Publication date: 2020-12-29
Also published as: KR20190078545A

Abstract

A dynamic program analysis apparatus and method are disclosed.
A dynamic program analysis apparatus according to an embodiment of the present invention includes: a dynamic analysis unit that dynamically analyzes a binary program to obtain a call trace; a static analysis unit that performs static taint analysis of the binary program based on the call trace to generate a dependency graph; and a dynamic reverse analysis unit that performs dynamic reverse analysis of the binary program based on the dependency graph and outputs a result indicating whether the analyzed point is related to the input of the binary program.

Description

Apparatus and Method for program analysis dynamically

The present invention relates to a dynamic program analysis apparatus and method, and more particularly to a dynamic program analysis method and apparatus that perform dynamic reverse analysis after slicing the program through static analysis.

Binary program analysis is very important for understanding the structure and execution flow of a program without relying on source code. However, as with any other analysis, accurate program analysis takes time in proportion to the size of the program, while simple analysis methods that take less time either have a very narrow scope of application or are more likely to produce incorrect results. Research on analyzing binary programs efficiently is therefore needed.

Dynamic program analysis is widely used in security fields such as vulnerability analysis and malware analysis. Among such techniques, dynamic reverse analysis is used to determine whether a specific point in a program is related to the input of the program.

Conventional dynamic reverse analysis tools include ARM-Analyzer and VDT (Visual Data Tracer). However, because these methods must execute the program, the analysis takes a long time, and when analysis takes a long time, vulnerabilities may not be patched and malware may not be handled in time.

In addition, dynamic reverse analysis of a binary program is performed on an extracted instruction trace. The larger the trace, the more time the analysis requires. Real commercial programs contain a large amount of code, so their traces are large and their analysis takes a long time. When program analysis takes a long time, the response to malware is delayed and vulnerability patches are slowed.

Accordingly, there is a need for a technique that can perform dynamic analysis while reducing the size of the program execution instruction trace.

Korean Registered Patent Publication No. 10-1482073 (2006.04.24.)

The problem to be solved by the present invention is to provide a dynamic program analysis apparatus and method that can perform dynamic analysis while reducing the size of the program execution instruction trace.

The problems to be solved by the present invention are not limited to the problem(s) mentioned above, and other problem(s) not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the above problem, a dynamic program analysis apparatus according to an embodiment of the present invention includes: a dynamic analysis unit that dynamically analyzes a binary program to obtain a call trace; a static analysis unit that performs static taint analysis of the binary program based on the call trace to generate a dependency graph; and a dynamic reverse analysis unit that performs dynamic reverse analysis of the binary program based on the dependency graph and outputs a result indicating whether the analyzed point is related to the input of the binary program.

Preferably, the static analysis unit may perform taint analysis through execution of the binary program and statically analyze all instructions affected by each of the instructions that caused a crash in the binary program, thereby analyzing the risk of that crash.

Preferably, for each of the instructions that caused the plurality of crashes, the static analysis unit may identify the points that the instruction reaches, find among those points the ones where the instruction is actually used, and generate a dependency graph from the result. It may then identify, in the dependency graph, instructions capable of transferring control of the program and analyze the exploitability of the instruction that caused the crash.

Preferably, the static analysis unit may include an intraprocedural analysis module that takes as input the address of the point where tainted data is used and performs intraprocedural analysis starting from the function containing that point, and an interprocedural analysis module that performs interprocedural analysis by checking whether a system call exists inside the function at the current analysis point to determine whether there is a relation to the input of the program.

Preferably, the intraprocedural analysis module may perform reverse taint analysis by adding the instruction at the crash point to the set of tainted instructions, starting the analysis from it, and repeatedly finding the instructions that influenced (are used by) that instruction.

Preferably, the interprocedural analysis module may check whether a system call exists inside the function at the current analysis point to determine whether there is a relation to the input of the program. If the check finds a relation, it may use the call trace to find every function that called the function in question (the caller functions), locate the call site of the callee function in each caller function, find every point that supplies an input to the callee function, and perform reverse taint analysis on those points.

Preferably, the dynamic reverse analysis unit may prune the dependency graph based on register values and memory information and perform dynamic reverse analysis on the pruned dependency graph.

To solve the above problem, a dynamic program analysis method according to an embodiment of the present invention includes: dynamically analyzing a binary program to obtain a call trace; performing static taint analysis of the binary program based on the call trace to generate a dependency graph; and performing dynamic reverse analysis of the binary program based on the dependency graph and outputting a result indicating whether the analyzed point is related to the input of the binary program.

Preferably, generating the dependency graph may include: taking as input the address of the point where tainted data is used and performing intraprocedural analysis starting from the function containing that point; and performing interprocedural analysis by checking whether a system call exists inside the function at the current analysis point to determine whether there is a relation to the input of the program.

According to the present invention, the speed of dynamic analysis can be improved by performing dynamic analysis in a way that reduces the size of the program execution instruction trace.

In addition, because the instructions to be analyzed are extracted through static analysis before the program is dynamically analyzed, fewer instructions need to be dynamically analyzed and dynamic analysis can be performed more quickly. Reducing the dynamic analysis time in this way makes it possible to respond to vulnerabilities and malware faster.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a diagram illustrating a dynamic program analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the dynamic reverse analysis unit of FIG. 1 in more detail.
FIG. 3 is an exemplary diagram for explaining reaching-definition analysis according to an embodiment of the present invention.
FIG. 4 is an algorithm for explaining intraprocedural analysis according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram for explaining interprocedural analysis according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram for explaining interprocedural analysis according to an embodiment of the present invention.
FIG. 7 is an exemplary diagram for explaining a dependency graph used for static analysis and dynamic analysis according to an embodiment of the present invention.
FIGS. 8 and 9 are flowcharts illustrating a dynamic program analysis method according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments, and specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to a specific embodiment, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the drawings, similar reference numerals are used for similar elements.

Terms such as first, second, A, and B may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of them.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a dynamic program analysis apparatus according to an embodiment of the present invention, FIG. 2 is a block diagram showing the dynamic reverse analysis unit of FIG. 1 in more detail, FIG. 3 is an exemplary diagram for explaining reaching-definition analysis, FIG. 4 is an algorithm for explaining intraprocedural analysis, FIGS. 5 and 6 are exemplary diagrams for explaining interprocedural analysis, and FIG. 7 is an exemplary diagram for explaining a dependency graph used for static analysis and dynamic analysis, each according to an embodiment of the present invention.

Referring to FIG. 1, a dynamic program analysis apparatus 100 according to an embodiment of the present invention includes a dynamic analysis unit 110, a static analysis unit 120, and a dynamic reverse analysis unit 130.

The dynamic analysis unit 110 dynamically analyzes a binary program to obtain a call trace. Dynamic analysis analyzes a program by executing it, and here may be a dynamic analysis that explores the execution paths of the binary code. The call trace refers to a dynamic call graph of functions, which is an abstraction used for profiling the program during its execution. Profiling is needed to understand program behavior, to detect errors in programs and their sources, and to analyze program performance. Obtaining an accurate dynamic call graph that traces every function call in the program requires instrumentation of all functions in the program, and such instrumentation may be used.
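As a minimal sketch of how a call trace can be folded into a dynamic call graph, the following assumes a hypothetical trace format of `("call", name)` and `("ret", name)` events; the event format and the function names are illustrative assumptions and are not taken from the specification.

```python
from collections import defaultdict

def build_dynamic_call_graph(trace):
    """Fold ("call", name) / ("ret", name) events into caller -> callees edges.

    The event format is an assumption made for this sketch only.
    """
    graph = defaultdict(set)
    stack = []  # functions currently active, innermost last
    for kind, name in trace:
        if kind == "call":
            if stack:
                graph[stack[-1]].add(name)  # edge: active function -> callee
            stack.append(name)
        elif kind == "ret" and stack:
            stack.pop()
    return {caller: sorted(callees) for caller, callees in graph.items()}

# Hypothetical trace that ends at the crash, so the last calls never return.
trace = [
    ("call", "main"),
    ("call", "helper"), ("ret", "helper"),
    ("call", "func3"),
    ("call", "func4"),
    ("call", "crashfunc"),
]
call_graph = build_dynamic_call_graph(trace)
```

Only functions that actually ran appear in `call_graph`, which is the property the later static analysis relies on.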

The static analysis unit 120 performs static analysis on the taint source of the binary program, based on the call trace obtained by the dynamic analysis unit 110, to generate a dependency graph. Here, the taint source may be an instruction that caused a crash in the binary program.

A binary program is composed of functions, and each function is composed of instructions, so only the instructions needed for analysis should be extracted before the program is dynamically analyzed. In other words, dynamic reverse analysis of a binary program is performed on an extracted instruction trace, and the larger the trace, the more time the analysis requires. Real commercial programs contain a large amount of code, so their traces are large and their analysis takes a long time. When program analysis takes a long time, the response to malware may be delayed and vulnerability patches may be slowed.

Accordingly, the static analysis unit 120 extracts only the instructions needed for analysis through static analysis and generates a dependency graph. That is, for each instruction that caused a crash, the static analysis unit 120 identifies the points the instruction reaches (reaching definitions), finds among those points the ones where the instruction is actually used (def-use chaining), and generates the dependency graph from the result. For this, the static analysis unit 120 may use reaching-definition analysis, a static analysis technique. The purpose of reaching-definition analysis is to determine how far an instruction that defines a variable can have an effect, that is, how far the definition reaches. The static analysis unit 120 therefore uses reaching-definition analysis to extract, for each instruction, which definitions can reach it, and then uses this result to perform def-use chaining analysis.

Taint analysis, meanwhile, is used to learn how particular data propagates. Ordinarily the flow of data is analyzed along a single execution path. Static taint analysis, however, must produce a result that holds for every execution path. That is, when branches such as loops or conditionals occur, the analysis must be valid for all paths. The CFG of FIG. 3 is a program written in C that branches on an if statement. If variable a at instruction 1 is tainted data, a propagates to variable r at instruction 6. At instruction 7, by contrast, variable r is not affected by a. Static analysis cannot know whether the conditional will branch to 6 or to 7, so to be safe r must be treated as tainted at instruction 8. Reaching-definition analysis is well suited to this because it yields results that hold for every path. Its results are expressed as the In and Out sets of each basic block. As in the example, reaching-definition analysis concludes that the definitions at 6 and 7 both reach 8. It is therefore an appropriate method for static taint analysis, which must be conservative.
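The branch example above can be sketched as an instruction-level reaching-definition pass. Since FIG. 3 is not reproduced here, the CFG below is an assumed reconstruction: instruction 1 defines `a`, a hypothetical instruction 5 is the branch, instructions 6 and 7 each define `r`, and instruction 8 is the join. Only the roles of instructions 1, 6, 7, and 8 come from the text.

```python
def reaching_definitions(defs, succs):
    """Iterate IN/OUT sets to a fixed point.

    defs:  {instruction id: variable it defines, or None}
    succs: {instruction id: [successor ids]}
    Returns {instruction id: set of definition ids that may reach it}.
    """
    preds = {i: [] for i in defs}
    for i, targets in succs.items():
        for t in targets:
            preds[t].append(i)
    reach_in = {i: set() for i in defs}
    reach_out = {i: set() for i in defs}
    changed = True
    while changed:
        changed = False
        for i in defs:
            new_in = set()
            for p in preds[i]:
                new_in |= reach_out[p]  # union over all paths (conservative)
            if defs[i] is None:
                new_out = set(new_in)
            else:
                # A new definition kills earlier definitions of the same variable.
                new_out = {d for d in new_in if defs[d] != defs[i]} | {i}
            if new_in != reach_in[i] or new_out != reach_out[i]:
                reach_in[i], reach_out[i] = new_in, new_out
                changed = True
    return reach_in

# Assumed CFG: 1 -> 5 -> {6, 7} -> 8
defs = {1: "a", 5: None, 6: "r", 7: "r", 8: None}
succs = {1: [5], 5: [6, 7], 6: [8], 7: [8]}
reach = reaching_definitions(defs, succs)
r_defs_at_8 = {d for d in reach[8] if defs[d] == "r"}
```

Here `r_defs_at_8` contains both 6 and 7, so a conservative taint analysis has to treat `r` as possibly tainted at instruction 8, because the definition at 6 derives from `a`.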

The same holds for reverse analysis. When tracing variable r, the r at 4 is affected by variable a or b. As in forward analysis, both a and b must be traced rather than only one of them. Such an analysis can over-taint, but it must be conservative because vulnerabilities must not be missed. The static analysis unit 120 then performs use-def chain analysis based on the reaching definitions. That is, the static analysis unit 120 may perform def-use chaining analysis using the result of the reaching-definition analysis. For example, starting from the input data (the taint source, i.e. the crash point), def-use chaining adds an edge to the graph whenever a definition that can reach some instruction i1 is actually used at i1. As a result, the relations among all instructions affected by the crash are analyzed and the def-use graph is completed. The nodes (vertices) of the def-use graph may be the instructions affected by the crash point.
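A minimal sketch of the backward walk over use-def edges, assuming the reaching-definition sets are already available as a plain dictionary. The five-instruction program (1: `a = ...`, 2: `b = ...`, 6: `r = a`, 7: `r = b`, 8: crash using `r`) is a hypothetical example, not the program of the figures.

```python
def backward_dependency_graph(crash, uses, defines, reach_in):
    """Build use -> def edges by walking backwards from the crash point.

    uses:     {id: set of variables read by the instruction}
    defines:  {id: variable written by the instruction, or None}
    reach_in: {id: set of definition ids that may reach the instruction}
    """
    edges = {}
    worklist = [crash]
    while worklist:
        i = worklist.pop()
        if i in edges:
            continue  # already expanded
        # Keep only reaching definitions whose variable is actually used at i.
        edges[i] = {d for d in reach_in[i] if defines[d] in uses[i]}
        worklist.extend(edges[i])
    return edges

defines = {1: "a", 2: "b", 6: "r", 7: "r", 8: None}
uses = {1: set(), 2: set(), 6: {"a"}, 7: {"b"}, 8: {"r"}}
reach_in = {1: set(), 2: {1}, 6: {1, 2}, 7: {1, 2}, 8: {1, 2, 6, 7}}
graph = backward_dependency_graph(8, uses, defines, reach_in)
```

Both `a` (via 6) and `b` (via 7) end up in the graph, which mirrors the requirement that the reverse analysis trace both rather than pick one.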

Once the dependency graph has been generated, the static analysis unit 120 identifies in it the instructions capable of transferring control of the program and analyzes the exploitability of the instruction that caused the crash.

As described above, the static analysis unit 120 performs static analysis on the call trace obtained by the dynamic analysis unit 110 and outputs a result indicating whether there is a relation to the input of the binary program.

That is, upon receiving the address of the vulnerable point and the dynamic call trace leading up to that address, the static analysis unit 120 takes the address of the point where tainted data is used and performs intraprocedural analysis starting from the function containing that point. Here, intraprocedural analysis may mean reverse taint analysis based on reaching definitions and use-def chaining. Reverse taint analysis analyzes, in the reverse direction of program execution, up to which program point the tainted data has influence. When the intraprocedural analysis is complete, the static analysis unit 120 performs interprocedural analysis. That is, it checks whether a system call such as read() exists inside the function at the current analysis point, to determine whether there is a relation to the input of the program. If there is, the analysis ends; if not, it finds the function (caller) that called the function currently being analyzed and performs interprocedural analysis. The interprocedural analysis is based on the call trace obtained through actual execution. The call trace narrows the scope of analysis by ensuring that only functions that were actually executed are analyzed when examining points that may cause a vulnerability. If the data was influenced by a function argument, the caller of that function is found and intraprocedural analysis is performed on the caller. When that intraprocedural analysis finishes, the analysis of the relation to program input and the interprocedural analysis are repeated in the same way. If the data is unrelated to the function's arguments, the analysis ends. If the data affects the input of the function, reverse taint analysis is performed on the function that calls it. If the data is affected by the result of a system call such as read(), it is determined to be related to the input.

The static analysis unit 120 includes an intraprocedural analysis module 122 and an interprocedural analysis module 124.

The intraprocedural analysis module 122 takes as input the address of the point where tainted data is used and performs intraprocedural analysis starting from the function containing that point. Here, the intraprocedural analysis is a reverse taint analysis that adds the instruction at the crash point to the set of tainted instructions, starts the analysis from it, and repeatedly finds the instructions that influenced (are used by) that instruction.

The interprocedural analysis module 124 performs interprocedural analysis by checking whether a system call exists inside the function at the current analysis point to determine whether there is a relation to the input of the program. That is, the interprocedural analysis module 124 checks whether the result of the reverse taint analysis is related to the input of the function. If it is, the module uses the call trace to find every function that called the function in question (the caller functions), locates the call site of the callee function in each caller function, finds every point that supplies an input to the callee function, and performs reverse taint analysis on those points.

Specifically, static analysis has less information available than dynamic analysis, so its results are less precise. In particular, the CFG contains no information about indirect jumps, and call sensitivity further reduces precision. The interprocedural analysis module obtains indirect jump information from the actual call trace and makes it possible to analyze only the paths related to the actual vulnerability. For interprocedural analysis, the module uses the call trace obtained by the dynamic analysis unit. Using the call trace provides the indirect jump information and restricts the analysis to the execution path on which the vulnerability actually occurred.

Function-unit analysis (interprocedural analysis) is necessary to find the original source of tainted data. A program consists of many functions. System calls related to input are likely to occur early in the program, whereas the data that the developer wants to observe may be located anywhere in the program. Therefore, to find the original source of tainted data, function-unit analysis is not optional but essential.

Function-unit analysis follows the algorithm shown in FIG. 6. First, inverse taint analysis is performed within the function in which the crash occurred. It is then checked whether the result of the inverse taint analysis is related to the input of the function. If it is, all functions that called this function (caller functions) are found based on the call graph, as in step 4 of the algorithm. The call graph here may be a dynamically obtained call graph. In each caller function, the call site of the callee function is located, every point that supplies an input to the callee function is found, and inverse taint analysis is performed on those points. After the inverse taint analysis, the analysis proceeds according to the cases defined in FIG. 6. If the result is not related to the input of the function, the analysis stops.

FIG. 3 shows an example with a static call graph in which a crash occurs in the function 'crashfunc'. The function 'crashfunc' is called from 'func4' and 'func5'. Suppose that the dynamically obtained call graph shows that only 'func4' was executed, so that only 'func4' is analyzed. Within 'func4', the arguments of 'crashfunc' are affected by the 'eax' and 'ecx' registers. These registers are in turn affected by the values stored at the addresses 'ebp-4' and 'ebp-8'. Because these addresses hold the arguments of 'func4', the same analysis is then applied to 'func3', which called 'func4'.
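The caller walk in this example can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the representation of the call trace as (caller, callee) pairs, the 'main' entry, and the `depends_on_args` callback that stands in for the intra-function inverse taint analysis are assumptions introduced for the illustration.

```python
from collections import deque


def callers_from_trace(call_trace, callee):
    """Return the functions that actually called `callee` in the recorded run."""
    return {caller for caller, target in call_trace if target == callee}


def interprocedural_backward(crash_func, call_trace, depends_on_args):
    """Walk from the crash function up through its dynamically observed callers.

    depends_on_args(func) is a hypothetical stand-in for the intra-function
    inverse taint analysis: it returns True when the tainted data inside
    `func` traces back to the arguments of `func`.
    """
    visited = []
    seen = {crash_func}
    worklist = deque([crash_func])
    while worklist:
        func = worklist.popleft()
        visited.append(func)
        if not depends_on_args(func):
            continue  # not related to the function input: stop on this path
        for caller in sorted(callers_from_trace(call_trace, func)):
            if caller not in seen:
                seen.add(caller)
                worklist.append(caller)
    return visited


# Assumed trace for the example: func5 never ran, so it never appears.
trace = [("main", "func3"), ("func3", "func4"), ("func4", "crashfunc")]
related = {"crashfunc", "func4"}  # functions whose taint reaches their arguments
print(interprocedural_backward("crashfunc", trace, lambda f: f in related))
# prints ['crashfunc', 'func4', 'func3']
```

Because the callers are taken from the trace rather than from the static call graph, 'func5' is never visited even though it is a static caller of 'crashfunc'.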

Function-unit analysis must also take calls to external library functions into account. Static analysis lacks information about external libraries, so it cannot know how tainted data propagates inside such a function. Calls to external library functions must therefore be analyzed conservatively: when an external function is called, all arguments of that function are treated as tainted.
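The conservative rule for external calls can be illustrated with a short Python sketch. The call representation, the `internal_functions` set, and the use of `memcpy` as the external function are assumptions chosen for the illustration; the embodiment does not specify them.

```python
def taint_call_arguments(call, tainted, internal_functions):
    """Apply the conservative rule for a single call site.

    call: (callee_name, [argument locations]) -- hypothetical representation.
    tainted: set of locations already considered tainted.
    internal_functions: names of functions whose bodies can be analyzed.
    """
    callee, args = call
    if callee not in internal_functions:
        # External library function: its body is unknown to the static
        # analysis, so every argument is treated as tainted.
        return set(tainted) | set(args)
    # Internal function: left to the intra-function analysis.
    return set(tainted)


print(sorted(taint_call_arguments(("memcpy", ["eax", "ecx"]), {"ebx"}, {"func4"})))
# prints ['eax', 'ebx', 'ecx']
```

This rule may over-approximate the tainted set, which is the price of staying safe when the library code is not available.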

As described above, the static analysis unit 120 extracts the instructions needed for analysis through static analysis before the program is dynamically analyzed. This reduces the number of instructions to be analyzed dynamically, so the dynamic analysis can be performed faster.

The dynamic reverse analysis unit 130 performs dynamic reverse analysis of the binary program based on the dependency graph generated by the static analysis unit 120, and outputs a result indicating whether there is a relation to the input of the binary program.

After the static analysis is completed, the dynamic analysis apparatus 100 performs dynamic analysis once more based on the result of the static analysis. This is because static analysis has less information to work with than dynamic analysis. For example, information such as the memory addressed by a register and system calls is limited, and this can make the static analysis inaccurate. Dynamic analysis is therefore performed again after the static analysis. However, if the static analysis determines that there is no vulnerability, the dynamic analysis is not performed.

The dynamic reverse analysis unit 130 performs its analysis based on the dependency graph obtained from the static analysis. This dependency graph may be a graph produced by pruning the statically generated dependency graph as shown in FIG. 7. FIG. 7(a) is the Use-Def graph (dependency graph) obtained from static analysis; the edges shown in red are not actual Use-Def relationships but are nevertheless generated by the static analysis. FIG. 7(b) is the graph obtained by dynamic analysis based on (a); the dynamic analysis removes incorrect Use-Def relationships such as those shown in red. Static analysis does not know exact values such as register values and memory contents, so for the safety of the analysis it sometimes defines a Use-Def relationship where none actually exists. Dynamic analysis knows the exact register values and memory information, so such relationships can be pruned.
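The pruning step can be sketched in Python as follows. The sketch assumes a hypothetical trace format in which each executed instruction is recorded with the concrete locations it read and wrote; the addresses and location names are invented for the illustration and do not come from FIG. 7.

```python
def observed_use_def(trace):
    """Collect the Use-Def edges that actually occurred during execution.

    trace: list of (instr_addr, reads, writes), where reads and writes are
    concrete locations (register names or resolved memory addresses).
    """
    last_writer = {}
    edges = set()
    for addr, reads, writes in trace:
        for loc in reads:
            if loc in last_writer:
                edges.add((addr, last_writer[loc]))  # (use, definition)
        for loc in writes:
            last_writer[loc] = addr
    return edges


def prune(static_edges, trace):
    """Keep only the static Use-Def edges confirmed by the dynamic trace."""
    return set(static_edges) & observed_use_def(trace)


# The load at 0x30 goes through a pointer, so static analysis cannot tell
# whether the store at 0x10 or the store at 0x20 defined the value it reads.
static_edges = {(0x30, 0x10), (0x30, 0x20)}
trace = [
    (0x10, [], ["mem:0x1000"]),
    (0x20, [], ["mem:0x2000"]),
    (0x30, ["mem:0x1000"], ["eax"]),
]
print(prune(static_edges, trace) == {(0x30, 0x10)})  # prints True
```

With concrete addresses available, the edge to the store at 0x20 is shown to be spurious and is removed, which corresponds to dropping the red edges in FIG. 7(a).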

In this way, the analysis mitigates both the inaccuracy of static analysis and the long analysis time of dynamic analysis.

Meanwhile, dynamic program analysis is widely used in the security field, for example in vulnerability analysis and malware analysis. Within it, dynamic reverse analysis is used to determine whether a specific point in a program is related to the input of the program. However, dynamic analysis requires the program to be executed and therefore takes a long time. If the analysis takes too long, vulnerabilities may not be patched and malware may not be handled in time.

Accordingly, the program dynamic analysis apparatus 100 shortens the time required for dynamic reverse analysis, which allows a faster response to vulnerabilities and malicious code.

FIGS. 8 and 9 are flowcharts illustrating a program dynamic analysis method according to an embodiment of the present invention.

Referring to FIGS. 8 and 9, the dynamic analysis apparatus dynamically analyzes a binary program to obtain a call trace (S810).

After step S810, the dynamic analysis apparatus generates a dependency graph by performing static analysis on the taint of the binary program based on the call trace (S820). In doing so, for each of the instructions that caused the crash, the dynamic analysis apparatus identifies the points that the instruction reaches, finds among those points the ones where the instruction is actually used, and generates the result as a dependency graph.
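One conventional way to obtain such a graph is a reaching-definitions analysis followed by a filter that keeps only the definitions a reached instruction actually uses. The Python sketch below illustrates that idea under assumed data structures (`instrs` maps an address to the locations it defines and uses, `succs` gives control-flow successors); it is not taken from the embodiment, which does not give the algorithm in this detail.

```python
def reaching_definitions(instrs, succs):
    """Compute, for each instruction, the definitions that reach its entry.

    instrs: {addr: (defined locations, used locations)}
    succs:  {addr: [successor addrs]}
    Returns {addr: set of (location, defining addr)}.
    """
    preds = {a: [] for a in instrs}
    for a, targets in succs.items():
        for t in targets:
            preds[t].append(a)
    in_sets = {a: set() for a in instrs}
    out_sets = {a: set() for a in instrs}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for a in sorted(instrs):
            defs, _ = instrs[a]
            new_in = set().union(*(out_sets[p] for p in preds[a]))
            # A new definition of a location kills the older ones.
            new_out = {(loc, d) for loc, d in new_in if loc not in defs}
            new_out |= {(loc, a) for loc in defs}
            if new_in != in_sets[a] or new_out != out_sets[a]:
                in_sets[a], out_sets[a] = new_in, new_out
                changed = True
    return in_sets


def use_def_edges(instrs, succs):
    """Keep only the reaching definitions that the instruction really uses."""
    reach = reaching_definitions(instrs, succs)
    return {
        (a, d)
        for a, (_, uses) in instrs.items()
        for loc, d in reach[a]
        if loc in uses
    }


# Diamond-shaped flow: eax is defined at 1 and, on one branch only, at 3.
instrs = {
    1: ({"eax"}, set()),
    2: (set(), set()),
    3: ({"eax"}, set()),
    4: ({"ebx"}, {"eax"}),
}
succs = {1: [2], 2: [3, 4], 3: [4], 4: []}
print(use_def_edges(instrs, succs) == {(4, 1), (4, 3)})  # prints True
```

Both definitions of 'eax' reach instruction 4 statically, so both edges are kept; deciding which one was actually taken is left to the later dynamic pruning.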

After step S820, the dynamic analysis apparatus inserts the instruction at the point where the crash occurred into the set of tainted instructions, takes as input the address of the point where the tainted data is written, and performs intra-function analysis (intraprocedural analysis) starting from the function containing that point (S830). In other words, the dynamic analysis apparatus performs inverse taint analysis within the function in which the crash occurred.
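The inverse taint analysis of step S830 can be pictured as a backward closure over the Use-Def edges, starting from the crash instruction. The following Python sketch assumes the dependency graph is available as a mapping from an instruction address to the addresses of the definitions it depends on; the addresses are invented for the illustration.

```python
def backward_taint(crash_addr, use_def):
    """Return every instruction that influences the crash instruction.

    use_def: {use_addr: set of def_addrs it depends on} -- assumed format
    for the dependency graph produced in step S820.
    """
    tainted = {crash_addr}  # the crash instruction is tainted from the start
    worklist = [crash_addr]
    while worklist:
        addr = worklist.pop()
        for definition in use_def.get(addr, ()):
            if definition not in tainted:
                tainted.add(definition)
                worklist.append(definition)
    return tainted


use_def = {0x40: {0x30}, 0x30: {0x10}, 0x20: {0x10}}
print(sorted(hex(a) for a in backward_taint(0x40, use_def)))
# prints ['0x10', '0x30', '0x40']
```

The instruction at 0x20 depends on 0x10 but does not influence the crash at 0x40, so it stays outside the tainted set.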

After step S830, the dynamic analysis apparatus determines whether the result of the inverse taint analysis is related to the input of the program (S840). That is, the dynamic analysis apparatus checks whether a system call such as read() exists inside the function containing the current analysis point, and thereby determines whether there is a relation to the input of the program.

If step S840 determines that there is a relation, the dynamic analysis apparatus analyzes the risk of the crash (S850). That is, if the crash is related to the input of the program, the crash is determined to be exploitable. When the crash is judged to be exploitable in this way, the dynamic analysis apparatus performs dynamic reverse analysis of the binary program based on the dependency graph and outputs a result indicating whether there is a relation to the input of the binary program.

If step S840 determines that there is no relation, the dynamic analysis apparatus performs inter-function analysis (S860) and then performs step S840 again. That is, if there is no relation to the program input, the function that called the function currently being analyzed (the caller) is found and inter-function analysis (interprocedural analysis) is performed. The inter-function analysis is based on the call trace obtained from actual execution. When points that may cause a vulnerability are analyzed, the call trace restricts the analysis to functions that were actually executed, which narrows the scope of the analysis. If the data is affected by a function argument, the function that called that function is found and intra-function analysis is performed on it. Once the intra-function analysis is finished, the check for a relation to the program input and the inter-function analysis are repeated in the same way. If the data is not related to the arguments of the function, the analysis ends. If the data is affected by the result of a system call such as read(), it is determined to be related to the input.
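The decision loop formed by steps S840 to S860 can be summarized in a short Python sketch. The per-function `summaries`, the single-caller `callers` mapping, and the verdict strings are assumptions introduced for the illustration; read() is the only input system call named in the description, and the placement of read() in 'func3' below is hypothetical.

```python
INPUT_SYSCALLS = {"read"}  # the description names read() only as an example


def classify_crash(crash_func, callers, summaries):
    """Repeat the S840 check while moving up the dynamic call chain (S860).

    callers:   {callee: caller} taken from the call trace (one caller per
               callee is assumed here to keep the sketch short).
    summaries: {func: (system calls feeding the tainted data,
               whether the tainted data depends on the function arguments)},
               a hypothetical summary of the intra-function analysis.
    """
    func = crash_func
    seen = set()
    while func is not None and func not in seen:
        seen.add(func)
        syscalls, depends_on_args = summaries[func]
        if syscalls & INPUT_SYSCALLS:
            return "input-related"      # S850: treat the crash as exploitable
        if not depends_on_args:
            return "not input-related"  # nothing further to follow
        func = callers.get(func)        # S860: continue in the caller
    return "not input-related"


callers = {"crashfunc": "func4", "func4": "func3"}
summaries = {
    "crashfunc": (set(), True),
    "func4": (set(), True),
    "func3": ({"read"}, False),
}
print(classify_crash("crashfunc", callers, summaries))  # prints input-related
```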

The present invention has been described above with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered from an illustrative rather than a limiting point of view. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100: dynamic analysis apparatus
110: dynamic analysis unit
120: static analysis unit
130: dynamic reverse analysis unit

Claims

A dynamic analysis unit that dynamically analyzes the binary program to obtain a call trace;
A static analysis unit for generating a dependency graph including a plurality of nodes by performing a static analysis on a taint of the binary program based on the call trace; And
A dynamic reverse analysis unit that performs pruning to remove a specific node from the dependency graph based on register values and memory information, performs dynamic reverse analysis of the binary program based on the pruned dependency graph, and outputs a result indicating whether there is a relation to an input of the binary program,
Including,
The specific node is a node corresponding to a relationship between a specific instruction and a definition that, among a plurality of definitions that can reach the specific instruction, is not used by the specific instruction.

The program dynamic analysis apparatus of claim 1,
wherein the static analysis unit
performs taint analysis through execution of the binary program and statically analyzes all instructions affected by each of the instructions that caused a crash of the binary program, thereby analyzing the risk of the crash.

The program dynamic analysis apparatus of claim 2,
wherein the static analysis unit
identifies, for each of the instructions that caused the crash, the points that the instruction reaches, finds among those points the point where the instruction is actually used, generates the result as a dependency graph, and, after identifying an instruction capable of transferring control of the program, analyzes the attack possibility of the instruction that caused the crash.

The program dynamic analysis apparatus of claim 1,
wherein the static analysis unit comprises:
an intra-function analysis module that receives as input an address of a point where tainted data is written and performs intraprocedural analysis starting from the function containing that point; and
a function-unit analysis module that checks whether a system call exists inside the function of a current analysis point and performs interprocedural analysis to determine whether there is a relation to an input of the program.

The program dynamic analysis apparatus of claim 4,
wherein the intra-function analysis module
performs inverse taint analysis in which the analysis starts after the instruction at the point where the crash occurred is inserted into the set of tainted instructions, and a process of finding a use instruction that affects the instruction is repeated.

The program dynamic analysis apparatus of claim 4,
wherein the function-unit analysis module
checks whether a system call exists inside the function of the current analysis point to determine whether there is a relation to the input of the program and, when such a relation is found, finds all caller functions that called the function based on the call trace, finds the call site of the callee function in each caller function, finds all points that supply an input to the callee function, and performs inverse taint analysis on those points.

delete

Dynamically analyzing a binary program by a dynamic analysis unit to obtain a call trace;
Generating a dependency graph by performing static analysis on the taint of the binary program based on the call trace by a static analysis unit; And
Performing, by a dynamic reverse analysis unit, pruning to remove a specific node from the dependency graph based on register values and memory information, performing dynamic reverse analysis of the binary program based on the pruned dependency graph, and outputting a result indicating whether there is a relation to an input of the binary program
Including,
The specific node is a node corresponding to a relationship between a specific instruction and a definition that, among a plurality of definitions that can reach the specific instruction, is not used by the specific instruction.

The program dynamic analysis method of claim 8,
wherein the step of generating the dependency graph by the static analysis unit comprises:
receiving as input, by an intra-function analysis module of the static analysis unit, an address of a point where tainted data is written, and performing intraprocedural analysis starting from the function containing that point; and
performing, by a function-unit analysis module of the static analysis unit, interprocedural analysis that checks whether a system call exists inside the function of a current analysis point to determine whether there is a relation to an input of the program.