KR102267618B1

KR102267618B1 - Apparatus and Method for crash risk classification of program

Info

Publication number: KR102267618B1
Application number: KR1020180169834A
Authority: KR
Inventors: 조은선; 목성균; 전현구
Original assignee: 충남대학교산학협력단
Priority date: 2017-12-26
Filing date: 2018-12-26
Publication date: 2021-06-21
Also published as: KR20190078546A

Abstract

The present invention relates to an apparatus and method for analyzing the crash risk of a program. The crash risk analysis apparatus includes a risk analyzer that performs taint analysis by executing the binary under analysis and statically analyzes every instruction affected by each instruction that caused a crash in the binary, in order to determine the risk of that crash. For each crash-causing instruction, the risk analyzer identifies its reaching points and availability, identifies on that basis an instruction capable of transferring control of the program, analyzes exploitability when the identified instruction is a store-family or branch-family instruction, and classifies the crash into a risk level according to the result.

Description

Apparatus and Method for crash risk classification of program

The present invention relates to an apparatus and method for analyzing the crash risk of a program, and more particularly to an apparatus and method that classify crash risk more precisely through staged static analysis.

Binary program analysis is important for understanding a program's structure and execution flow without relying on source code. As with any other analysis, however, precise program analysis takes time in proportion to the size of the program, while simple, fast analysis methods either apply only to a very narrow range of cases or are more likely to produce incorrect results. Research on analyzing binary programs efficiently is therefore needed.

Binary static analysis is used in vulnerability analysis, malware analysis, plagiarism detection, and similar tasks. Crashes may be encountered in the course of such analysis. A crash is an exceptional situation in which a program terminates because an exception raised during execution is not handled. Because it is an execution flow the developer did not intend, it can be abused, for example to execute malicious code, and must therefore be fixed.

Analyzing crashes manually is very time-consuming and laborious, so the use of automated tools is unavoidable.

Accordingly, there is a need for technology that can analyze the crashes of a binary program automatically.

Korean Registered Patent Publication No. 10-1482073 (2006.04.24.)

The problem to be solved by the present invention is to provide an apparatus and method for analyzing the crash risk of a program that can automatically analyze, stage by stage, the crash risk of a binary file.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned here will be clearly understood by those skilled in the art from the description below.

To solve the above problem, a crash risk analysis apparatus according to one embodiment of the present invention includes a risk analyzer that performs taint analysis by executing the binary under analysis and statically analyzes every instruction affected by each instruction that caused a crash in the binary, in order to determine the risk of that crash. For each crash-causing instruction, the risk analyzer identifies its reaching points and availability, identifies on that basis an instruction capable of transferring control of the program, analyzes exploitability when the identified instruction is a store-family or branch-family instruction, and classifies the crash into a risk level according to the result.

Preferably, the reaching points are identified using the following equation, with In obtained by using set union as the meet operator.

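[Equation]

$$\mathrm{OUT}[B] = f_B(\mathrm{IN}[B]), \qquad f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B), \qquad \mathrm{IN}[B] = \bigcup_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P]$$

Here, $B$ is a basic block, $\mathrm{OUT}[B]$ is the set out, $f_B$ is the transfer function, meaning $f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B)$, and $\mathrm{IN}[B]$ is the set in. $P$ ranges over the parent (predecessor) basic blocks of $B$.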

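Preferably, the availability is identified using the following equation, with In obtained by using set intersection (∩) as the meet operator.

[Equation]

$$\mathrm{IN}[B] = \bigcap_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P], \qquad \mathrm{OUT}[B] = \mathrm{gen}_B \cup (\mathrm{IN}[B] - \mathrm{kill}_B)$$

Here, $B$ is a basic block and $P$ ranges over the parent (predecessor) basic blocks of $B$.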

Preferably, the risk levels may include SE (Strongly Exploitable), E (Exploitable), PE (Probably Exploitable), and NE (Not Exploitable).

Preferably, when the instruction capable of transferring control of the program is a branch-family instruction and the target address of that branch instruction is influenced by the instruction that caused the crash, the crash risk analyzer determines that the crash is exploitable. If the branch instruction was identified on the basis of the reaching points, the risk is classified as E (Exploitable); if it was identified on the basis of availability, the risk is classified as SE (Strongly Exploitable).

Preferably, when the instruction capable of transferring control of the program is a store-family instruction and either the register holding the data to be stored or the memory location to be written is influenced by the instruction that caused the crash, the risk analyzer determines that the crash is exploitable. If the store instruction was identified on the basis of the reaching points, the risk is classified as E (Exploitable) or PE (Probably Exploitable); if it was identified on the basis of availability, the risk is classified as SE (Strongly Exploitable).

Preferably, when the instruction is a store-family instruction identified on the basis of the reaching points, the risk analyzer classifies the risk as PE (Probably Exploitable) if only the register holding the data to be stored, or only the memory location to be written, is influenced by the instruction that caused the crash, and as E (Exploitable) if both the register holding the data and the memory location are influenced by it.

Preferably, when the instruction is a store-family instruction identified on the basis of availability and both the register holding the data to be stored and the memory location to be written are influenced by the instruction that caused the crash, the risk analyzer classifies the risk as SE (Strongly Exploitable).
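The classification rules in the preceding paragraphs can be summarized as a small decision function. The sketch below is an illustration only: the `Candidate` record, its field names, and `classify` are hypothetical, and the handling of a store found by the availability analysis with only one tainted operand is an assumption, since the text does not state that case separately.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    SE = "Strongly Exploitable"
    E = "Exploitable"
    PE = "Probably Exploitable"
    NE = "Not Exploitable"


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one instruction influenced by the crash."""
    kind: str                     # "branch", "store", or anything else
    via_available: bool           # True if found by the availability analysis
    target_tainted: bool = False  # branch: target address is influenced
    data_tainted: bool = False    # store: register holding the data is influenced
    addr_tainted: bool = False    # store: destination memory location is influenced


def classify(c: Candidate) -> Risk:
    if c.kind == "branch":
        if not c.target_tainted:
            return Risk.NE
        # Availability holds on every path, reaching only on some path.
        return Risk.SE if c.via_available else Risk.E
    if c.kind == "store":
        if c.data_tainted and c.addr_tainted:
            return Risk.SE if c.via_available else Risk.E
        if c.data_tainted or c.addr_tainted:
            # Assumption: with only one operand influenced, the
            # reaching-based rule (PE) is applied in either analysis.
            return Risk.PE
        return Risk.NE
    return Risk.NE  # instructions that cannot transfer control


print(classify(Candidate("store", False, data_tainted=True)).value)
```

Keeping the rules in one function makes the ordering explicit: the instruction family is checked first, then which operands are influenced, and only then which analysis found the instruction.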

To solve the above problem, an apparatus according to another embodiment of the present invention includes: a static analysis unit that performs taint analysis by executing the binary under analysis, statically analyzes every instruction affected by each instruction that caused a crash in the binary, and identifies the reaching points and availability of each crash-causing instruction; an exploitable point analysis unit that identifies, on the basis of the identified reaching points and availability, an instruction capable of transferring control of the program; and a risk analysis unit that analyzes exploitability when the identified instruction is a store-family or branch-family instruction and classifies the crash into a risk level according to the result.

Preferably, the static analysis unit may include a reaching-definition (Reaching Definition) analysis module that identifies the reaching points of the instructions that caused the crashes, an available-definition (Available Definition) analysis module that identifies the availability of those instructions, and a memory available-definition analysis module that treats memory like a register and kills it.

Preferably, the reaching-definition analysis module identifies the reaching points using the following equation, with In obtained by using set union as the meet operator.

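[Equation]

$$\mathrm{OUT}[B] = f_B(\mathrm{IN}[B]), \qquad f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B), \qquad \mathrm{IN}[B] = \bigcup_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P]$$

Here, $B$ is a basic block, $\mathrm{OUT}[B]$ is the set out, $f_B$ is the transfer function, meaning $f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B)$, and $\mathrm{IN}[B]$ is the set in. $P$ ranges over the parent (predecessor) basic blocks of $B$.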

Preferably, the available-definition analysis module identifies the availability using the following equation, with In obtained by using set intersection (∩) as the meet operator.

[Equation]

$$\mathrm{IN}[B] = \bigcap_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P], \qquad \mathrm{OUT}[B] = \mathrm{gen}_B \cup (\mathrm{IN}[B] - \mathrm{kill}_B)$$

Preferably, the risk analysis unit may classify the risk level as at least one of SE (Strongly Exploitable), E (Exploitable), PE (Probably Exploitable), and NE (Not Exploitable).

To solve the above problem, a method by which a crash risk analysis apparatus analyzes crash risk according to yet another embodiment of the present invention includes: performing taint analysis by executing the binary under analysis and identifying the reaching points and availability of each instruction that caused a crash in the binary; identifying, on the basis of the reaching points and availability, an instruction capable of transferring control of the program; and analyzing exploitability when the identified instruction is a store-family or branch-family instruction and classifying the crash into a risk level according to the result.

According to the present invention, static analysis is performed on the binary file and the crash information, and the risk of each crash is determined automatically and reported to the user. Crashes that could cause relatively serious problems can thus be fixed first, crashes that are not exploitable can be excluded from the scope of repair, and the exploitability information itself can help with the fix.

In addition, according to the present invention, classifying crash risk into five categories lets the user know the risk of a crash more precisely.

In addition, according to the present invention, subjecting a crash to three stages of static analysis allows its risk to be reported to the user in several levels. The user can therefore know the crash risk with greater certainty and, by learning the risk of each vulnerability in the program, decide the order in which to patch.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned here will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a schematic block diagram of a crash risk analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a risk analyzer according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating the detailed configuration of the static analysis unit shown in FIG. 2.
FIG. 4 is an exemplary diagram for explaining Reaching Definition and Available Definition according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the characteristics of Reaching Definition, Available Definition, and Memory Available Definition according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining the criteria for risk classification according to an embodiment of the present invention.
FIG. 7 is a schematic flowchart of a crash risk analysis method according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a method of determining the risk level according to an embodiment of the present invention.

The present invention can be modified in various ways and can have various embodiments, so specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to particular embodiments, and it should be understood to cover all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the invention. Like reference numerals are used for like elements throughout the drawings.

Terms such as first, second, A, and B may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be called a second element, and similarly a second element may be called a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of them.

When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to that element, but intervening elements may also be present. In contrast, when an element is said to be "directly connected" or "directly coupled" to another element, it should be understood that no intervening element is present.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, elements, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless explicitly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram of a crash risk analysis apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the crash risk analysis apparatus 100 according to an embodiment of the present invention includes a disassembler 110, an intermediate language converter 120, a crash generator 130, and a crash risk analyzer 140.

The disassembler 110 converts input data into machine-language instructions. Specifically, the disassembler 110 disassembles the binary program subject to crash risk analysis (for example, an ARM binary program) into machine-language instructions and outputs the control flow information (Control Flow Graph) obtained from those instructions.

The intermediate language converter (Abstract Interpretation Framework) 120 converts the machine-language instructions output by the disassembler 110 into an intermediate language, for example REIL (Reverse Engineering Intermediate Language). The conversion reduces an excessively large number of instructions and thereby simplifies static analysis. For example, about 300 distinct ARM instructions can be reduced to the 17 instructions of REIL.

The crash generator 130 deliberately triggers crashes. To this end, it embeds a crash-inducing program prepared in advance (for example, a fuzzer) and uses that program to generate a large number of crashes. This turns a common attack pattern around: attackers typically use a crash-inducing program such as a fuzzer to produce many crashes in a target program, analyze the crashes one by one by hand, and, where a crash is exploitable, write attack code tailored to that vulnerability. The purpose here is to analyze automatically the exploitability (that is, the risk) of each of the deliberately generated crashes.
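The crash-generation step can be pictured with a toy mutation fuzzer. The sketch below is only an illustration under assumptions: `fuzz`, `parse`, and the single-byte mutation strategy are hypothetical names and choices rather than the patent's crash generator, and a Python exception stands in for a native crash of the target binary.

```python
import random


def fuzz(target, seed_input: bytes, rounds: int, rng: random.Random):
    """Mutate one byte per round and record every input that makes target raise."""
    crashes = []
    for _ in range(rounds):
        data = bytearray(seed_input)
        pos = rng.randrange(len(data))   # assumes seed_input is non-empty
        data[pos] = rng.randrange(256)   # overwrite with a random byte value
        try:
            target(bytes(data))
        except Exception as exc:         # stand-in for a native crash
            crashes.append((bytes(data), type(exc).__name__))
    return crashes


def parse(data: bytes) -> int:
    """Hypothetical target that fails when the first byte is zero."""
    return 100 // data[0]


if __name__ == "__main__":
    for crashing_input, kind in fuzz(parse, b"\x01abc", 2000, random.Random(0)):
        print(kind, crashing_input)
```

Each recorded input corresponds to one crash that a later stage would have to rank, which is the manual effort the risk analyzer is meant to replace.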

The crash risk analyzer 140 analyzes the risk of each crash on the basis of the instructions converted into the intermediate language by the intermediate language converter 120 and the crash information produced by the crash generator 130. That is, among the converted instructions, it statically analyzes every instruction affected by each instruction that caused a crash and determines the risk of that crash. To do so, for each crash-causing instruction the risk analyzer 140 identifies the reaching points (Reaching Definition) of the instruction, finds the points among them where the instruction's result is actually used (Def-Use Chaining), and derives the result as a use-point graph. Likewise, for each crash-causing instruction the risk analyzer 140 identifies the availability (Available Definition) of the instruction, finds the points among the available points where the instruction's result is actually used (Def-Use Chaining), and derives the result as a use-point graph.
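The def-use chaining step can be sketched on straight-line code. This is a simplified illustration, not the patent's implementation: the `(dst, srcs)` tuple encoding, `def_use_edges`, and `influenced` are hypothetical, and a full analysis would take each use's definitions from the CFG-wide dataflow result rather than from the last writer within a single block.

```python
from collections import defaultdict


def def_use_edges(instrs):
    """Link each definition to the later instructions that read it.

    instrs is a list of (dst, srcs) pairs; dst is None for instructions
    that define nothing (for example a store or a branch).
    """
    last_def = {}                      # register -> index of its latest definition
    edges = defaultdict(list)          # defining index -> list of using indices
    for i, (dst, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_def:
                edges[last_def[reg]].append(i)
        if dst is not None:
            last_def[dst] = i
    return dict(edges)


def influenced(edges, start):
    """Return every instruction index transitively reachable from start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for use in edges.get(node, []):
            if use not in seen:
                seen.add(use)
                stack.append(use)
    return seen


program = [
    ("t0", ["r0"]),        # 0: crash-causing instruction defines t0
    ("t1", ["t0"]),        # 1: uses t0
    ("t2", ["r1"]),        # 2: unrelated definition
    (None, ["t1", "t2"]),  # 3: store of t1 to the address in t2
]
graph = def_use_edges(program)
print(sorted(influenced(graph, 0)))
```

The resulting edge map plays the role of the use-point graph: instructions reachable from the crash-causing definition are the candidates examined for exploitability.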

The risk analyzer 140 then identifies, on the basis of the reaching points and availability (that is, from the use-point graphs), an instruction capable of transferring control of the program and analyzes the exploitability of that instruction. When the instruction that caused the crash is a store-family or branch-family instruction, the risk analyzer 140 analyzes exploitability and classifies the crash into a risk level according to the result. The risk levels are SE (Strongly Exploitable), E (Exploitable), PE (Probably Exploitable), and NE (Not Exploitable).

The risk analyzer 140 is described in detail with reference to FIG. 2.

FIG. 2 is a block diagram illustrating a risk analyzer according to an embodiment of the present invention; FIG. 3 is a block diagram illustrating the detailed configuration of the static analysis unit shown in FIG. 2; FIG. 4 is an exemplary diagram for explaining Reaching Definition and Available Definition; FIG. 5 is a diagram for explaining the characteristics of Reaching Definition, Available Definition, and Memory Available Definition; and FIG. 6 is a diagram for explaining the criteria for risk classification.

Referring to FIG. 2, the risk analyzer according to an embodiment of the present invention includes a static analysis unit 142, an exploitable point analysis unit 144, and a risk analysis unit 146.

The static analysis unit 142 performs taint analysis by executing the binary under analysis and statically analyzes every instruction affected by each instruction that caused a crash in the binary. In doing so, it applies three stages of static analysis to the binary program: Reaching Definition, Available Definition, and Memory Available Definition.

Accordingly, as shown in FIG. 3, the static analysis unit 142 includes a reaching-definition (Reaching Definition) analysis module 142a, an available-definition (Available Definition) analysis module 142b, and a memory available-definition (Memory Available Definition) analysis module 142c.

The reaching-definition analysis module 142a identifies the reaching points of the instructions that caused the crashes.

The goal of Reaching Definition analysis is to determine how far an instruction that defines a variable can exert influence, that is, how far the definition reaches. The domain of the analysis is the set of instructions that update a register, and the analysis requires two sets, gen and kill. The set gen is the collection of instructions in a basic block that newly define a register. The set kill is the collection of all other instructions that define a register defined in that basic block; if several instructions in the same basic block define the same register, the later instruction in execution order is the one used. The set in is the collection of register-updating instructions that reach the point where execution of the basic block begins without being affected by other instructions, and the set out is the collection of updating instructions that reach the point where all instructions of the basic block have executed. To obtain in and out, the set out of the parent basic blocks of each basic block is initialized to the empty set (∅), and Equation 1 below is applied repeatedly.

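[Equation 1]

$$\mathrm{OUT}[B] = f_B(\mathrm{IN}[B]), \qquad f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B), \qquad \mathrm{IN}[B] = \bigcup_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P]$$

Here, $B$ is a basic block, $\mathrm{OUT}[B]$ is the set out, $f_B$ is the transfer function, meaning $f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B)$, and $\mathrm{IN}[B]$ is the set in. $P$ ranges over the parent (predecessor) basic blocks of $B$.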

As Equation 1 shows, Reaching Definition analysis is an analysis algorithm built on a transfer function and a system of equations. Here kill is a constant set fixed for each instruction, so the kill set of every instruction must be computed before the analysis of Equation 1 begins. When kill is computed for a binary, however, memory needs special consideration: a binary accesses memory either indirectly through a register or directly through a constant value.
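The iterative use of the transfer function can be sketched as a fixed-point loop over basic blocks. This is a minimal illustration under assumptions: the dictionary-based CFG encoding, the definition labels `d0` and `d1`, and the function name `reaching_definitions` are hypothetical, and gen and kill are given per block and precomputed, as the text requires.

```python
def reaching_definitions(cfg, gen, kill):
    """Iterate OUT[B] = gen[B] | (IN[B] - kill[B]) with union as the meet.

    cfg maps each block to its successor list; every block must be a key.
    """
    preds = {b: [] for b in cfg}
    for block, succs in cfg.items():
        for s in succs:
            preds[s].append(block)

    out = {b: set() for b in cfg}      # OUT initialised to the empty set
    in_ = {b: set() for b in cfg}
    changed = True
    while changed:                     # repeat until nothing changes
        changed = False
        for b in cfg:
            in_[b] = set().union(*(out[p] for p in preds[b]))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out


# Hypothetical loop: "entry" defines r0 as d0, "loop" redefines r0 as d1.
cfg = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
gen = {"entry": {"d0"}, "loop": {"d1"}, "exit": set()}
kill = {"entry": {"d1"}, "loop": {"d0"}, "exit": set()}
rd_in, rd_out = reaching_definitions(cfg, gen, kill)
print(sorted(rd_in["loop"]), sorted(rd_in["exit"]))
```

Because union only ever adds definitions and the set of definitions is finite, the loop reaches a fixed point after a bounded number of passes.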

The availability-def analysis module 142b identifies the availability of the instructions that caused the crashes.

Available Definition analysis is similar to Reaching-Definition analysis, except that, to obtain the sets in and out, the set out of each basic block's parent basic blocks is initialized to the universal set (U_universe), and Equation 2 below is applied repeatedly.

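[Equation 2]

\[
\mathrm{OUT}[B] = f_B(\mathrm{IN}[B]), \qquad
f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B), \qquad
\mathrm{IN}[B] = \bigcap_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P]
\]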

Available Definition analysis obtains In by using intersection (∩) as the meet operator, instead of the union used in Reaching-Definition analysis.

With this Available Definition analysis, the overall result makes it possible to confirm whether tainted data is definitely propagated regardless of the program path.
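A minimal Python sketch of the Available Definition iteration is given below, assuming the gen and kill sets have already been computed. The function name `available_definitions`, the block and definition labels, and the treatment of the entry block (nothing is available on entry) are assumptions made for illustration; the last of these is the usual boundary condition in data-flow analysis and is not stated in the text above.

```python
def available_definitions(gen, kill, preds, entry):
    universe = set().union(*gen.values())
    out = {b: set(universe) for b in gen}    # OUT starts as the universal set
    in_ = {b: set() for b in gen}
    changed = True
    while changed:
        changed = False
        for b in gen:
            parents = preds.get(b, [])
            if b == entry or not parents:
                in_[b] = set()               # assumption: nothing available on entry
            else:
                # meet operator: intersection over the parent blocks
                in_[b] = set.intersection(*(out[p] for p in parents))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b], changed = new_out, True
    return in_, out

# Hypothetical example: B1 -> B2 -> B3 and B1 -> B3.
# d1 and d3 define the same register; d2 defines another one.
gen = {"B1": {"d1", "d2"}, "B2": {"d3"}, "B3": set()}
kill = {"B1": {"d3"}, "B2": {"d1"}, "B3": set()}
preds = {"B2": ["B1"], "B3": ["B1", "B2"]}
avail_in, avail_out = available_definitions(gen, kill, preds, entry="B1")
print(sorted(avail_in["B3"]))                # ['d2']
```

Only d2 is available at B3 here: d1 is overwritten on the path through B2, and d3 is absent on the direct path from B1, so neither survives the intersection.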

Reaching-Definition analysis and Available Definition analysis are described below with reference to the example of FIG. 4.

Converting the code of FIG. 4(a) into a control flow graph (CFG) may yield (b). Performing Reaching-Definition analysis on (b) may yield (c). Referring to (c), variable b is given a new value, null, in basic block 3, but at the Exit basic block the union of the out sets of basic blocks 2 and 3 enters as In. That is, the definition of b made in basic block 1 is still regarded as valid at the Exit basic block.

Accordingly, Reaching-Definition analysis may report as reachable a definition that is in fact invalid on some program paths.

Performing Available Definition analysis on (b) may yield (d). In contrast to Reaching-Definition analysis, In at the Exit basic block is obtained as the intersection of the out sets of basic blocks 2 and 3. Because only data propagation that is explicit regardless of the path is considered, a warning one level higher can be issued.
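The contrast can be made concrete with a small hand calculation. The following is a simplified sketch, not a reproduction of FIG. 4: it assumes that the only definitions of b are d_1 (in basic block 1) and d_3 (the assignment of null in basic block 3), that basic block 2 does not define b, and that the parents of Exit are basic blocks 2 and 3. The labels d_1 and d_3 are introduced here for illustration.

```latex
\[
\begin{aligned}
\mathrm{OUT}[1] &= \{d_1\}, \qquad \mathrm{OUT}[2] = \{d_1\}, \qquad \mathrm{OUT}[3] = \{d_3\}, \\
\text{Reaching-Definition:}\quad \mathrm{IN}[\mathrm{Exit}] &= \mathrm{OUT}[2] \cup \mathrm{OUT}[3] = \{d_1, d_3\}, \\
\text{Available Definition:}\quad \mathrm{IN}[\mathrm{Exit}] &= \mathrm{OUT}[2] \cap \mathrm{OUT}[3] = \emptyset .
\end{aligned}
\]
```

Under these assumptions, d_1 reaches Exit but is not available there, because it does not survive the path through basic block 3.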

The memory availability-def analysis module 142c treats memory like a register and kills it. Because the reaching-def analysis module 142a and the availability-def analysis module 142b do not kill memory, an analysis that does kill memory is needed, and the memory availability-def analysis module 142c provides it.

Available Definition analysis kills only registers. Even when two registers point to the same memory location, they do not kill each other because they are different registers, and a register left un-killed in this way may later make the analysis inaccurate. Memory Available Definition analysis, by contrast, also kills any other register that points to the same memory when a register points to memory. For example, if two registers point to one memory location and a new value is written to that location through one of them, the other register is killed as well.
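The difference between the two kill rules can be sketched as below. This is an illustrative simplification, not the module's actual logic: the `points_to` map (register to the address it holds), the register names, the addresses, and the function name `killed_registers` are all hypothetical, and how such aliasing information is obtained is outside the sketch.

```python
def killed_registers(written_reg, points_to, memory_aware):
    """Registers whose earlier definitions are killed by a write made through written_reg."""
    killed = {written_reg}                   # Available Definition: the register itself only
    if memory_aware and written_reg in points_to:
        target = points_to[written_reg]
        # Memory Available Definition: also kill every register aliasing the same memory
        killed |= {reg for reg, addr in points_to.items() if addr == target}
    return killed

# Hypothetical state: r1 and r2 point to the same memory location, r3 does not.
points_to = {"r1": 0x1000, "r2": 0x1000, "r3": 0x2000}
plain = killed_registers("r1", points_to, memory_aware=False)
aware = killed_registers("r1", points_to, memory_aware=True)
print(sorted(plain), sorted(aware))          # ['r1'] ['r1', 'r2']
```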

The respective characteristics of the Reaching-Definition, Available Definition, and Memory Available Definition analyses described above may be as shown in FIG. 5.

Referring again to FIG. 2, the exploitable point analysis unit 144 identifies, based on the arrival points and availability identified by the static analysis unit 142, an instruction capable of transferring control of the program. Specifically, the exploitable point analysis unit 144 finds, among the arrival points and availabilities, the points at which the corresponding instruction is actually used, and derives the result as a use-point graph. The exploitable point analysis unit 144 then identifies, from the use-point graph, an instruction capable of transferring control of the program.
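One way to picture the use-point graph is sketched below. The data layout is an assumption made for illustration: `reach_in` maps an instruction to the definitions that reach it, `def_reg` maps a defining instruction to the register it writes, `uses` maps an instruction to the registers it reads, and `kinds` marks store-type and branch-type instructions. None of these names comes from the source.

```python
from collections import deque

def use_point_graph(reach_in, def_reg, uses):
    """Edge d -> i when definition d reaches instruction i and i reads d's register."""
    graph = {}
    for instr, regs in uses.items():
        for d in reach_in.get(instr, set()):
            if def_reg[d] in regs:
                graph.setdefault(d, set()).add(instr)
    return graph

def control_transfer_uses(graph, crash_point, kinds):
    """Walk the graph from the crash point and collect store/branch instructions."""
    seen, queue, hits = {crash_point}, deque([crash_point]), []
    while queue:
        node = queue.popleft()
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
                if kinds.get(nxt) in ("store", "branch"):
                    hits.append(nxt)
    return hits

# Hypothetical instructions: i1 (crash point) writes r0, i2 computes r1 from r0,
# i3 stores r1 through r2, i4 branches through r3 (unrelated to the crash point).
def_reg = {"i1": "r0", "i2": "r1"}
uses = {"i2": {"r0"}, "i3": {"r1", "r2"}, "i4": {"r3"}}
reach_in = {"i2": {"i1"}, "i3": {"i1", "i2"}, "i4": {"i1", "i2"}}
kinds = {"i3": "store", "i4": "branch"}

graph = use_point_graph(reach_in, def_reg, uses)
print(control_transfer_uses(graph, "i1", kinds))   # ['i3']
```

Passing the in sets of the Available Definition analysis instead of those of the Reaching-Definition analysis would yield the use points that hold on every path.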

The risk analysis unit 146 analyzes the possibility of attack when the instruction identified by the exploitable point analysis unit 144 is a store-type instruction or a branch-type instruction, and classifies the crash into a risk level based on the result of the analysis. Here, the risk analysis unit 146 classifies the risk level as SE (Strongly Exploitable), E (Exploitable), PE (Probably Exploitable), or NE (Not Exploitable) according to the criteria shown in FIG. 6. A branch-type instruction can transfer control because it changes the value of the register that points to the instruction to be executed and thereby changes the execution flow. If the target address to which control is to be transferred is affected by the crash point, the crash can be regarded as attackable. Here, the crash point may mean the instruction that caused the crash. A store-type instruction cannot directly change the value of the register that points to the instruction to be executed, but it can transfer control by changing, for example, the return address of a function or the address of a function in the IAT (Import Address Table) or EAT (Export Address Table). Instructions with such attack potential may be as shown in Table 1 below.

[Table 1]

Specifically, when the instruction capable of transferring control of the program is a branch-type instruction and the target address of the branch-type instruction is affected by the instruction that caused the crash, the risk analysis unit 146 determines that the crash is attackable. In this case, the risk analysis unit 146 classifies the risk as E (Exploitable) when the instruction is a branch-type instruction identified based on the arrival points, and as SE (Strongly Exploitable) when it is a branch-type instruction identified based on availability.

In addition, when the instruction capable of transferring control of the program is a store-type instruction and the register holding the data to be stored or the memory location to be stored to is affected by the instruction that caused the crash, the risk analysis unit 146 determines that the crash is attackable. In this case, the risk analysis unit 146 classifies the risk as E (Exploitable) or PE (Probably Exploitable) when the instruction is a store-type instruction identified based on the arrival points, and as SE (Strongly Exploitable) when it is a store-type instruction identified based on availability. That is, for a store-type instruction identified based on the arrival points, the risk analysis unit 146 classifies the risk as PE (Probably Exploitable) when only the register holding the data to be stored, or only the memory location to be stored to, is affected by the instruction that caused the crash, and as E (Exploitable) when both the register holding the data to be stored and the memory location to be stored to are affected by the instruction that caused the crash.

In addition, for a store-type instruction identified based on availability, the risk analysis unit 146 classifies the risk as SE (Strongly Exploitable) when both the register holding the data to be stored and the memory location to be stored to are affected by the instruction that caused the crash.

FIG. 7 is a schematic process flowchart of a crash risk analysis method according to an embodiment of the present invention.

Referring to FIG. 7, the disassembler 110 disassembles a binary program and converts it into machine language (S710). In particular, the disassembler 110 disassembles the binary program that is the target of crash risk analysis (for example, an ARM binary program) and converts it into machine language.

When step S710 has been performed, the disassembler 110 obtains control flow information from the machine language (S720). The control flow information may be used as reference material in the subsequent risk analysis step (S750) to derive how far each instruction reaches or at which point it is actually used.

When step S720 has been performed, the intermediate language converter 120 converts the machine language into an intermediate language (S730). For example, the intermediate language converter 120 converts the machine language into an intermediate language called REIL (Reverse Engineering Intermediate Language). Converting the machine language into an intermediate language in this way reduces an excessively large number of instructions and thereby simplifies the static analysis. For example, when about 300 ARM instructions are input, converting them into REIL can reduce them to 17.

When step S730 has been performed, the crash generator 130 deliberately generates crashes (S740). To this end, the crash generator 130 embeds a crash-inducing program prepared in advance (for example, a fuzzer) and uses it to generate a plurality of crashes. This turns a common attack pattern around: attackers typically use a crash-inducing program such as a fuzzer to generate many crashes in a target program, analyze the crashes by hand one at a time, and, where an attack appears possible, write exploit code tailored to the vulnerability. The crashes are generated deliberately so that the attack potential (that is, the risk) of each of them can be analyzed automatically.

When step S740 has been performed, the crash risk analyzer 140 analyzes the risk of each crash based on the instructions converted into the intermediate language in step S730 and the crash information generated in step S740 (S750). That is, among the instructions converted into the intermediate language, the crash risk analyzer 140 statically analyzes all instructions affected by each of the instructions that caused the crashes, and thereby analyzes the risk of the corresponding crash.

How the risk analyzer analyzes the risk of a crash is described in detail with reference to FIG. 8.

FIG. 8 is a flowchart illustrating a method of determining the risk level according to an embodiment of the present invention.

Referring to FIG. 8, the risk analyzer identifies arrival points and availability for each of the instructions that caused the crashes (S802), and identifies, based on the arrival points and availability, an instruction capable of transferring control of the program (S804).

When step S804 has been performed, the risk analyzer determines whether the identified instruction is a store-type instruction (S806).

If it is determined in step S806 that the instruction is a store-type instruction, the risk analyzer determines whether both the register holding the data to be stored and the memory location to be stored to are affected by the instruction that caused the crash (S808).

If it is determined in step S808 that both the register holding the data to be stored and the memory location to be stored to are affected, the risk analyzer determines whether the store-type instruction is an instruction identified based on the arrival points (S810).

If it is determined in step S810 that the store-type instruction was identified based on the arrival points, the risk analyzer classifies the risk as E (Exploitable) (S812).

If it is determined in step S810 that the store-type instruction was not identified based on the arrival points, the risk analyzer determines whether it was identified based on availability (S814).

If it is determined in step S814 that the store-type instruction was identified based on availability, the risk analyzer classifies the risk as SE (Strongly Exploitable) (S816).

If it is determined in step S808 that the register holding the data to be stored and the memory location to be stored to are not both affected, the risk analyzer determines whether either one of them is affected (S818).

If it is determined in step S818 that either the register holding the data to be stored or the memory location to be stored to is affected, the risk analyzer classifies the risk as PE (Probably Exploitable) (S820); if neither is affected, it classifies the risk as NE (Not Exploitable) (S822).

If it is determined in step S806 that the instruction is not a store-type instruction, the risk analyzer determines whether it is a branch-type instruction (S824).

If it is determined in step S824 that the instruction is a branch-type instruction, the risk analyzer determines whether the target address of the branch-type instruction is affected by the instruction that caused the crash (S826).

If it is determined in step S826 that the target address of the branch-type instruction is affected, the risk analyzer determines whether the branch-type instruction is an instruction identified based on the arrival points (S828).

If it is determined in step S828 that the branch-type instruction was identified based on the arrival points, the risk analyzer classifies the risk as E (Exploitable) (S830).

If it is determined in step S828 that the branch-type instruction was not identified based on the arrival points, the risk analyzer determines whether it was identified based on availability (S832).

If it is determined in step S832 that the instruction was identified based on availability, the risk analyzer classifies the risk as SE (Strongly Exploitable) (S834).
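The decision flow of steps S806 to S834 can be condensed into a short Python sketch. The function name `classify`, its parameter names, and the string values `"reaching"` and `"available"` are illustrative assumptions. The text does not state the outcome for a branch-type instruction whose target address is unaffected, or for an instruction that is neither store-type nor branch-type; the sketch assumes NE in both cases.

```python
from enum import Enum

class Risk(Enum):
    SE = "Strongly Exploitable"
    E = "Exploitable"
    PE = "Probably Exploitable"
    NE = "Not Exploitable"

def classify(kind, basis, data_tainted=False, addr_tainted=False, target_tainted=False):
    """kind: 'store' or 'branch'; basis: 'reaching' (arrival points) or 'available'."""
    by_basis = {"reaching": Risk.E, "available": Risk.SE}
    if kind == "store":                              # S806
        if data_tainted and addr_tainted:            # S808
            return by_basis.get(basis, Risk.NE)      # S810 to S816
        if data_tainted or addr_tainted:             # S818
            return Risk.PE                           # S820
        return Risk.NE                               # S822
    if kind == "branch" and target_tainted:          # S824, S826
        return by_basis.get(basis, Risk.NE)          # S828 to S834
    return Risk.NE                                   # assumption: not covered by the text

print(classify("store", "available", data_tainted=True, addr_tainted=True).name)  # SE
print(classify("store", "reaching", data_tainted=True).name)                      # PE
print(classify("branch", "reaching", target_tainted=True).name)                   # E
```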

The present invention has been described above with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is indicated by the claims rather than by the foregoing description, and all differences within a scope equivalent to the claims should be construed as being included in the present invention.

100: crash risk analysis apparatus
110: disassembler
120: intermediate language converter
130: crash generator
140: risk analyzer
142: static analysis unit
144: exploitable point analysis unit
146: risk analysis unit

Claims

A crash risk analysis apparatus comprising a risk analyzer that performs taint analysis through execution of a binary to be analyzed and statically analyzes all instructions affected by each of the instructions that caused a crash in the binary, thereby analyzing the risk of the corresponding crash,
wherein the risk analyzer,
for each of the instructions that caused the crashes, identifies arrival points and availability of the instructions, identifies, based on the arrival points and availability, an instruction capable of transferring control of the program, analyzes the possibility of attack when the identified instruction is a store-type instruction or a branch-type instruction, and classifies the risk into a risk level based on the analysis result,
wherein the risk level
includes SE (Strongly Exploitable), E (Exploitable), PE (Probably Exploitable), and NE (Not Exploitable), and
wherein the risk analyzer,
when the instruction capable of transferring control of the program is a store-type instruction and the register holding the data to be stored or the memory location to be stored to of the store-type instruction is affected by the instruction that caused the crash, determines that the crash is attackable, and
classifies the risk as E (Exploitable) or PE (Probably Exploitable) when the instruction is a store-type instruction identified based on the arrival points, and as SE (Strongly Exploitable) when the instruction is a store-type instruction identified based on availability.

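The crash risk analysis apparatus of claim 1,
wherein the arrival points are identified using the following equation, with In obtained by using union as the meet operator.
[Equation]

\[
\mathrm{OUT}[B] = f_B(\mathrm{IN}[B]), \qquad
f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B), \qquad
\mathrm{IN}[B] = \bigcup_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P]
\]

where B is a basic block, \(\mathrm{OUT}[B]\) is the set out, \(f_B\) is the transfer function, which means \(\mathrm{gen}_B \cup (x - \mathrm{kill}_B)\), and \(\mathrm{IN}[B]\) means the set in.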

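The crash risk analysis apparatus of claim 1,
wherein the availability is identified using the following equation, with In obtained by using intersection (∩) as the meet operator.
[Equation]

\[
\mathrm{OUT}[B] = f_B(\mathrm{IN}[B]), \qquad
f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B), \qquad
\mathrm{IN}[B] = \bigcap_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P]
\]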

(Deleted)

The crash risk analysis apparatus of claim 1,
wherein the risk analyzer,
when the instruction capable of transferring control of the program is a branch-type instruction and the target address of the branch-type instruction is affected by the instruction that caused the crash, determines that the crash is attackable, and
classifies the risk as E (Exploitable) when the instruction is a branch-type instruction identified based on the arrival points, and as SE (Strongly Exploitable) when the instruction is a branch-type instruction identified based on availability.

(Deleted)

The crash risk analysis apparatus of claim 1,
wherein the risk analyzer,
when the instruction is a store-type instruction identified based on the arrival points,
classifies the risk as PE (Probably Exploitable) when only the register holding the data to be stored of the store-type instruction is affected by the instruction that caused the crash, or when only the memory location to be stored to of the store-type instruction is affected by the instruction that caused the crash, and
classifies the risk as E (Exploitable) when both the register holding the data to be stored and the memory location to be stored to of the store-type instruction are affected by the instruction that caused the crash.

(Deleted)

A crash risk analysis apparatus comprising:
a static analysis unit that performs taint analysis through execution of a binary to be analyzed, statically analyzes all instructions affected by each of the instructions that caused a crash in the binary, and identifies arrival points and availability of the instructions for each of the instructions that caused the crashes;
an exploitable point analysis unit that identifies, based on the identified arrival points and availability, an instruction capable of transferring control of the program; and
a risk analysis unit that, when the identified instruction is a store-type instruction or a branch-type instruction, analyzes the possibility of attack and classifies the risk into a risk level based on the analysis result,
wherein the risk level
includes SE (Strongly Exploitable), E (Exploitable), PE (Probably Exploitable), and NE (Not Exploitable), and
wherein the risk analysis unit,
when the instruction capable of transferring control of the program is a store-type instruction and the register holding the data to be stored or the memory location to be stored to of the store-type instruction is affected by the instruction that caused the crash, determines that the crash is attackable, and
classifies the risk as E (Exploitable) or PE (Probably Exploitable) when the instruction is a store-type instruction identified based on the arrival points, and as SE (Strongly Exploitable) when the instruction is a store-type instruction identified based on availability.

The crash risk analysis apparatus of claim 9, wherein the static analysis unit comprises:
a reaching-def analysis module that identifies the arrival points of the instructions that caused the crashes;
an availability-def analysis module that identifies the availability of the instructions that caused the crashes; and
a memory availability-def analysis module that treats memory like a register and kills it.

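The crash risk analysis apparatus of claim 10,
wherein the reaching-def analysis module
identifies the arrival points using the following equation, with In obtained by using union as the meet operator.
[Equation]

\[
\mathrm{OUT}[B] = f_B(\mathrm{IN}[B]), \qquad
f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B), \qquad
\mathrm{IN}[B] = \bigcup_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P]
\]

where B is a basic block, \(\mathrm{OUT}[B]\) is the set out, \(f_B\) is the transfer function, which means \(\mathrm{gen}_B \cup (x - \mathrm{kill}_B)\), and \(\mathrm{IN}[B]\) means the set in.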

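The crash risk analysis apparatus of claim 10,
wherein the availability-def analysis module
identifies the availability using the following equation, with In obtained by using intersection (∩) as the meet operator.
[Equation]

\[
\mathrm{OUT}[B] = f_B(\mathrm{IN}[B]), \qquad
f_B(x) = \mathrm{gen}_B \cup (x - \mathrm{kill}_B), \qquad
\mathrm{IN}[B] = \bigcap_{P \in \mathrm{pred}(B)} \mathrm{OUT}[P]
\]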

The crash risk analysis apparatus of claim 10,
wherein the risk analysis unit
classifies the risk level as at least one of SE (Strongly Exploitable), E (Exploitable), PE (Probably Exploitable), and NE (Not Exploitable).

A method by which a crash risk analysis apparatus analyzes crash risk, the method comprising:
performing taint analysis through execution of a binary to be analyzed, and identifying arrival points and availability for each of the instructions that caused crashes in the binary;
identifying, based on the arrival points and availability, an instruction capable of transferring control of the program; and
when the identified instruction is a store-type instruction or a branch-type instruction, analyzing the possibility of attack and classifying the risk into a risk level based on the analysis result,
wherein the risk level
includes SE (Strongly Exploitable), E (Exploitable), PE (Probably Exploitable), and NE (Not Exploitable), and
wherein the classifying into the risk level comprises:
when the instruction capable of transferring control of the program is a store-type instruction and the register holding the data to be stored or the memory location to be stored to of the store-type instruction is affected by the instruction that caused the crash, determining that the crash is attackable, and
classifying the risk as E (Exploitable) or PE (Probably Exploitable) when the instruction is a store-type instruction identified based on the arrival points, and as SE (Strongly Exploitable) when the instruction is a store-type instruction identified based on availability.