KR102430337B1

KR102430337B1 - Source Code Reconstruction and Optimization Method and Device Therefor

Info

Publication number: KR102430337B1
Application number: KR1020200120537A
Authority: KR
Inventors: 이영비; 석재혁; 이재융; 이동훈; 김현숙; 신임섭
Original assignee: 국방과학연구소; 고려대학교 산학협력단
Priority date: 2020-09-18
Filing date: 2020-09-18
Publication date: 2022-08-08
Also published as: KR20220037721A

Abstract

The present invention provides a source code reconstruction and optimization method comprising: identifying a source code structure to which an obfuscation technique has been applied; reconstructing the code blocks loaded inside the identified source code structure according to their execution order; and performing source code optimization according to the control flow of the reconstructed code blocks. It also provides a source code reconstruction and optimization apparatus for carrying out the method.

Description

Source Code Reconstruction and Optimization Method and Device Therefor

The present invention relates to a source code reconstruction and optimization method and an apparatus therefor. More specifically, it relates to a method and apparatus that, when a control flow obfuscation technique has been applied at the source code level, perform reconstruction and optimization before the source code is compiled.

Obfuscation is a technique that transforms data structures, control flow, and the like while preserving the functionality of the software, in order to protect the software's intellectual property. Obfuscation may be applied at the source code level, before compilation, or at the binary level, after compilation. During compilation the compiler applies optimization techniques to the source code, and these optimizations remove the obfuscated portions. Obfuscation applied at the source code level can therefore lose much of its protective effect, and applying obfuscation at the binary level, after compilation, is generally preferred. Accordingly, software vendors protecting their own software have applied obfuscation at the binary level before distribution, unless the source code itself had to be provided. This lack of preference for source-level obfuscation is one of the reasons why source-level obfuscation techniques have not been studied extensively.

Obfuscation techniques can be classified into four categories: layout obfuscation (renaming variables and functions, removing comments, etc.), data obfuscation (transforming variable values, transforming data structures, etc.), control flow obfuscation (transforming control flow, inserting unnecessary code, etc.), and preventive obfuscation (debugger detection, virtual machine detection, etc.). Conventional commercial obfuscation tools at the source code level mainly renamed variables and functions, removed comments from the source code, or transformed variable values. In other words, because source-level obfuscation is in most cases used when the source code itself is to be distributed, layout obfuscation and data obfuscation were mainly applied. At the binary level, layout, data, control flow, and preventive obfuscation can all be applied, but at the source code level some obfuscation techniques were difficult to design and implement because detailed address information and assembly code are not available.

However, as described above, layout obfuscation and data obfuscation are types of obfuscation that are entirely optimized away, and thus neutralized, when the compiler compiles the code. Because layout and data obfuscation were the techniques mainly applied at the source code level, optimization has been used as the technique for deobfuscating source-level obfuscation.

Meanwhile, recent research has designed and implemented control flow obfuscation techniques that can be applied at the source code level, and these are being incorporated into commercial obfuscation tools. Unlike layout and data obfuscation, control flow obfuscation is not undone by the optimizer inside the compiler. Control flow obfuscation strongly transforms the internal structure of the source code, so the transformed portion must first be reconstructed before optimization can be expected to improve readability. Reconstruction techniques are therefore being studied as a response to source-level control flow obfuscation.

The problem to be solved by the present embodiment is to provide a source code reconstruction and optimization method and apparatus that, for source code to which a control flow obfuscation technique has been applied, first reconstruct the control flow at the source code level and then optimize it, thereby providing a very high optimization rate and readability even when no compiler optimization option is applied during compilation.

The technical problems to be achieved by the present embodiment are not limited to those described above, and other technical problems may be inferred from the following embodiments.

A source code reconstruction and optimization method according to an embodiment may include: identifying a source code structure to which a control flow obfuscation technique has been applied; reconstructing the code blocks loaded inside the identified source code structure according to their execution order; and performing source code optimization according to the control flow of the reconstructed code blocks.

A source code reconstruction and optimization apparatus according to an embodiment may identify, through an extraction unit, a source code structure to which a control flow obfuscation technique has been applied; reconstruct, through a reconstruction unit, the code blocks loaded inside the identified source code structure according to their execution order; and perform, through an optimization unit, source code optimization according to the control flow of the reconstructed code blocks.

In a non-transitory computer-readable recording medium storing a program for executing, on a computer, a source code reconstruction and optimization method according to an embodiment, the method may include: identifying a source code structure to which a control flow obfuscation technique has been applied; reconstructing the code blocks loaded inside the identified source code structure according to their execution order; and performing source code optimization according to the control flow of the reconstructed code blocks.

According to the present disclosure, the source code reconstruction and optimization method applies a form of dynamic analysis that a compiler optimizer cannot apply, so a more accurate reconstruction of the control flow is possible.

In addition, because reconstructing the control flow exposes additional optimizable patterns, the source code reconstruction and optimization method of the present disclosure can provide a very high optimization rate and readability even when no compiler optimization option is applied during compilation.

In addition, because the source code reconstruction and optimization method of the present disclosure is a source-to-source transformation, its output is reasonably familiar to an ordinary code analyst and is more readable than the result of applying a compiler's optimization techniques.

The effects of the invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description of the claims.

FIG. 1 is a flowchart of a source code reconstruction and optimization method according to an embodiment.
FIG. 2A shows the original source code before the control flow flattening technique is applied.
FIG. 2B shows the control flow graph before the control flow flattening technique is applied.
FIG. 2C shows the source code execution order before the control flow flattening technique is applied.
FIG. 3A shows the source code to which the control flow flattening technique has been applied.
FIG. 3B shows the control flow graph to which the control flow flattening technique has been applied.
FIG. 3C shows the source code execution order to which the control flow flattening technique has been applied.
FIG. 4 is a structural diagram of a source code reconstruction and optimization method according to an embodiment.
FIG. 5 is a flowchart of the algorithm of a source code reconstruction and optimization method according to an embodiment.
FIG. 6 is a conceptual diagram illustrating control flow reconstruction according to an embodiment.
FIG. 7 is a block diagram of a source code reconstruction and optimization apparatus according to an embodiment.

The terms used in the embodiments have been selected, as far as possible, from general terms that are currently in wide use, taking into account their function in the present disclosure; however, they may vary depending on the intention of those skilled in the art, precedent, the emergence of new technologies, and the like. In certain cases a term has been arbitrarily selected by the applicant, in which case its meaning is described in detail in the corresponding part of the description. The terms used in the present disclosure should therefore be defined based on their meaning and on the overall content of the present disclosure, not merely on their names.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as "...unit" and "...module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

The expression "at least one of a, b, and c" used throughout the specification may encompass 'a alone', 'b alone', 'c alone', 'a and b', 'a and c', 'b and c', or 'all of a, b, and c'.

A "terminal" as referred to below may be implemented as a computer or a portable terminal capable of connecting to a server or another terminal over a network. Here, the computer includes, for example, a notebook, desktop, or laptop equipped with a web browser, and the portable terminal is, for example, a wireless communication device that guarantees portability and mobility, and may include any kind of handheld wireless communication device, such as a communication-based terminal using IMT (International Mobile Telecommunication), CDMA (Code Division Multiple Access), W-CDMA (W-Code Division Multiple Access), or LTE (Long Term Evolution), a smartphone, or a tablet PC.

Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily practice them. The present disclosure may, however, be implemented in many different forms and is not limited to the embodiments described herein.

FIG. 1 is a flowchart of a source code reconstruction and optimization method according to an embodiment.

In step S101, the source code structure to which the control flow flattening technique has been applied may be identified in the obfuscated source code. The obfuscated source code here is source code to which a control flow obfuscation technique has been applied, and the control flow obfuscation technique may be control-flow flattening. This technique transforms the control flow of software, which is normally a vertical relationship, into a horizontal one; it is described in more detail with reference to FIGS. 2 and 3. According to an embodiment, identifying the source code structure to which control flow flattening has been applied may mean analyzing the inside of the obfuscated source code and extracting the location of the flattened structure that is to be reconstructed and optimized.

In step S102, the code blocks loaded inside the source code structure identified in S101 may be reconstructed according to their execution order. According to an embodiment, reconstructing the code blocks may include locating the label in each loaded code block, inserting directly below the label an interface that prints the code block index, and executing the code so that the code block indexes are printed. The code block indexes are then printed, in execution order, only for those loaded code blocks that are actually executed. According to an embodiment, the code blocks corresponding to the printed code block indexes may be arranged in execution order, with the label and the block-transfer statement removed from each block. The dispatcher block may then be removed from the arranged code blocks to complete the control flow reconstruction.
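The rearrangement described for step S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the execution trace (the printed code block indexes) has already been collected, and the block table, the `Label_k` / `goto` spelling, and the function name `rebuild` are all hypothetical.

```python
import re

# Hypothetical statement shapes for this sketch only.
LABEL = re.compile(r"^\s*\w+:\s*$")                 # e.g. "Label_1:"
TRANSFER = re.compile(r"^\s*goto\s+\w+\s*;\s*$")    # e.g. "goto Switch;"

def rebuild(blocks, trace):
    """Lay out the executed blocks in trace order, dropping each block's
    label and its block-transfer statement."""
    straight_line = []
    for index in trace:
        for stmt in blocks[index]:
            if LABEL.match(stmt) or TRANSFER.match(stmt):
                continue
            straight_line.append(stmt.strip())
    return straight_line

# Hypothetical flattened blocks; block 4 is never executed.
BLOCKS = {
    1: ["Label_1:", 'puts("Hello");', "goto Switch;"],
    2: ["Label_2:", 'puts("World");', "goto Switch;"],
    3: ["Label_3:", "return 0;"],
    4: ["Label_4:", 'puts("unused");', "goto Switch;"],
}
TRACE = [1, 2, 3]  # indexes printed by the instrumented run, in order

RESULT = rebuild(BLOCKS, TRACE)
```

Note that a trace records one concrete execution; this sketch does not attempt to recover loops or branches that depend on input.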

In step S103, source code optimization may be performed according to the control flow of the code blocks reconstructed in S102. According to an embodiment, the step of removing and optimizing source code in the reconstructed code blocks may be repeated until no source code remains to be removed or optimized.
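The "repeat until nothing is left to optimize" loop of step S103 amounts to iterating to a fixed point. The sketch below is an assumption-laden illustration: the single rewrite rule (dropping an adjacent `x = x + n;` / `x = x - n;` pair) and the names `one_pass` and `optimize` are hypothetical, and a real optimizer would apply many more rules.

```python
import re

# Hypothetical statement shapes: "x = x + n;" and "x = x - n;".
INC = re.compile(r"^(\w+)\s*=\s*\1\s*\+\s*(\d+)\s*;$")
DEC = re.compile(r"^(\w+)\s*=\s*\1\s*-\s*(\d+)\s*;$")

def one_pass(stmts):
    """Drop each adjacent increment/decrement pair that cancels out."""
    out, i = [], 0
    while i < len(stmts):
        if i + 1 < len(stmts):
            inc, dec = INC.match(stmts[i]), DEC.match(stmts[i + 1])
            if inc and dec and inc.groups() == dec.groups():
                i += 2          # same variable, same amount: skip both
                continue
        out.append(stmts[i])
        i += 1
    return out

def optimize(stmts):
    """Repeat the pass until it no longer changes anything."""
    rounds = 0
    while True:
        reduced = one_pass(stmts)
        if reduced == stmts:
            return stmts, rounds
        stmts, rounds = reduced, rounds + 1

CODE = ["a = a + 1;", "b = b + 2;", "b = b - 2;", "a = a - 1;", 'puts("Hello");']
OPTIMIZED, ROUNDS = optimize(CODE)
```

Removing the inner pair makes the outer pair adjacent, which is why a single pass is not enough and the loop runs until the output stops changing.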

According to an embodiment, the optimized source code may then be analyzed. The analysis shows how readable the source code is and how much optimization has been achieved relative to the obfuscated source code. The fully optimized source code may then be compiled with a compiler and distributed.

Meanwhile, few commercial obfuscation tools are applicable at the source code level, and fewer still can apply control flow obfuscation. Control flow flattening is almost the only commercialized control flow obfuscation technique that can be applied at the source code level. Most source-level commercial obfuscation tools that support control flow obfuscation therefore offer control flow flattening as an option. Control flow flattening is described below with reference to FIGS. 2 and 3.

First, FIG. 2A shows the original source code before the control flow flattening technique is applied, FIG. 2B shows the control flow graph before the technique is applied, and FIG. 2C shows the source code execution order before the technique is applied.

Before control flow flattening is applied, the control flow is generally a vertical relationship. The original source code of FIG. 2A is a relatively simple example that prints "Hello" and "World". Referring to FIG. 2B, source code with this vertical structure has a control flow consisting of a single node. When the source code is compiled and run, it therefore executes sequentially inside one code block consisting of "Hello", "World", and Exit (that is, leaving the code block).

Control flow flattening is a control flow obfuscation technique applied at the source code level; when it is applied to source code, the control flow is transformed into a horizontal relationship. In this regard, FIG. 3A shows the source code to which control flow flattening has been applied, FIG. 3B shows the corresponding control flow graph, and FIG. 3C shows the corresponding source code execution order.

First, the components of source code to which control flow flattening has been applied may include a dispatcher, a dispatcher variable, code blocks, code block indexes, and labels.

The dispatcher is the selector that chooses which code block is to be executed. For example, the Switch-Case statement in FIG. 3A may correspond to the dispatcher. The dispatcher variable is the variable used to select the code block to be executed. For example, SwVar in FIG. 3A may correspond to the dispatcher variable.

A code block is a block produced by dividing the code into functional units, and can be regarded as a concept similar to a basic block. For example, in FIG. 3C the span from Label_#k to the Goto Switch statement may be referred to as the k-th code block. A code block index is the number (or name) of a code block. For example, it corresponds to #k in FIG. 3C, where k is 1, 2, or 3 in this embodiment. A label may be used as a jump target address within a code block. For example, the statement Goto Label_#2 may indicate a jump to the code block whose index is 2. The label corresponds to Label_#k in FIG. 3C.

Returning to FIG. 3 on this basis, the control flow to which control flow flattening has been applied is transformed into a horizontal relationship made up of several code blocks. FIG. 3A shows the original source code of FIG. 2A after control flow flattening has been applied: the code that prints "Hello" and "World" has been split into two code blocks, switch case 0 and switch case 1. That is, execution begins at the dispatcher, and the code block to be executed is determined by the dispatcher variable. When execution of that code block completes, the dispatcher variable is changed to another value, which amounts to computing the index of the code block to be executed next.
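The dispatcher loop described above can be imitated in a few lines. This is a sketch under stated assumptions: the figures are not reproduced in this text, so the exact block layout is guessed, the exit index 2 is hypothetical, and Python's `if`/`elif` chain stands in for the Switch-Case dispatcher. Only the variable name follows the SwVar of FIG. 3A.

```python
def original():
    """Vertical control flow: one block, executed top to bottom."""
    out = []
    out.append("Hello")
    out.append("World")
    return out

def flattened():
    """Horizontal control flow: a dispatcher picks the next block."""
    out = []
    sw_var = 0                  # dispatcher variable, initial value 0
    running = True
    while running:              # loop that re-enters the dispatcher
        if sw_var == 0:         # code block for case 0
            out.append("Hello")
            sw_var = 1          # compute the index of the next block
        elif sw_var == 1:       # code block for case 1
            out.append("World")
            sw_var = 2          # hypothetical index of the exit block
        else:                   # exit block: leave the while loop
            running = False
    return out
```

Both functions produce the same output; the flattened version simply hides the ordering inside updates to the dispatcher variable.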

Referring to FIG. 3B, source code with this horizontal structure has a control flow made up of a plurality of nodes. Around the Switch-Case block (301), which is the dispatcher, the three selectable code blocks are laid out side by side: a block (302) for printing "World", a block (303) for printing "Hello", and a block (304) for leaving the while statement. Referring to FIG. 3C, the blocks other than the dispatcher (302, 303, 304) can be seen to be arranged horizontally. This means that, from source code transformed into a horizontal structure, the execution order of the code blocks cannot be determined without actually compiling and running it. The present example is relatively simple source code, but source code in real use has a far more complex structure, so for source code to which control flow flattening has been applied it is difficult to determine which code blocks are actually executed, and in what order, without running it. In other words, the execution order of the code blocks in flattened source code is difficult to determine statically.

In FIG. 3A the code block index has the same value as the dispatcher variable (both take only the values 0 and 1), but commercial obfuscation tools with a high level of obfuscating transformation are built so that the correlation between the code block index and the dispatcher variable cannot be determined. Likewise, in the example of FIG. 3 the initial value of the dispatcher variable (int SwVar = 0) is easy to obtain and its update (SwVar = 1) is not complicated, so the execution order of the code blocks is easy to determine even statically; commercial obfuscation tools, however, are built so that the initial value and the update computation are also difficult to determine statically.

Furthermore, high-level control flow flattening inserts many code blocks and code block indexes that are never actually used, in order to complicate analysis further. For example, if no dispatcher variable in any code block ever takes the value of a particular code block index, that index is never printed and its code block is never executed. There may also be code blocks that leave the functionality unchanged even when they are executed, such as successive blocks whose computations are a=a+1; and a=a-1;. For the analyst, this means that even determining whether a particular code block is actually executed requires analyzing the dispatcher variable first. Consequently, analyzing source code to which high-level control flow flattening has been applied must be preceded by extracting the execution order of the code blocks through dynamic analysis. A compiler optimizer cannot perform this step, so existing compiler optimizers alone can hardly improve the readability of source code to which control flow obfuscation has been applied.
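The role of dynamic analysis here can be illustrated with a toy model. Everything in it is hypothetical: the block indexes, the arithmetic used to compute the next index, and the names `run_flattened` and `BLOCKS` are assumptions chosen only to show that running the code yields both the execution order and the set of blocks that are never selected.

```python
def run_flattened(blocks, start):
    """Run the dispatcher loop and record which block indexes execute."""
    trace, state = [], start
    while state is not None:
        trace.append(state)
        state = blocks[state]()   # each block returns the next index
    return trace

# Hypothetical flattened program: indexes are unrelated to execution order,
# and the next index is computed rather than written as a plain constant.
BLOCKS = {
    7: lambda: 3,                  # first block, then go to block 3
    3: lambda: (3 * 5 + 4) % 10,   # obscured computation of index 9
    9: lambda: None,               # exit block
    5: lambda: 7,                  # inserted block that is never selected
}

TRACE = run_flattened(BLOCKS, start=7)
NEVER_EXECUTED = set(BLOCKS) - set(TRACE)
```

In this toy run the trace gives the real order of the blocks, and the leftover index identifies the inserted block without any reasoning about how the dispatcher variable is computed.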

The present invention therefore proposes reconstructing the structure at the source code level, before optimization, in order to improve the readability of source code to which control flow obfuscation has been applied.

FIG. 4 is a structural diagram of a source code reconstruction and optimization method according to an embodiment, and FIG. 5 is a flowchart of the algorithm of a source code reconstruction and optimization method according to an embodiment.

Referring to FIGS. 4 and 5, obfuscated source code may be converted into optimized source code by the source code reconstruction and optimization apparatus (400) according to the present invention. First, in step S501, the inside of the obfuscated source code may be read and analyzed through the Obfuscated Source Code Parsing Module (410). Then, in step S502, the location of the flattened structure to which control flow flattening has been applied may be identified and extracted from the source code through the Regular Expression Module (420). For example, a developer may insert a marker into the source code to indicate which function the obfuscation technique should be applied to. In that case the function name is changed by the insertion of the marker, and the Switch-Case statement located inside that function generally corresponds to the flattened structure.
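A regular-expression search of the kind attributed to module 420 might look like the sketch below. The pattern, the sample C text, and the function name `find_flattened_structures` are assumptions for illustration; real obfuscator output varies and would likely need tool-specific patterns (for instance, keyed on the marker-renamed function).

```python
import re

# Assumed shape of a flattened structure: a loop whose body starts with a
# switch on a single variable (the dispatcher variable).
FLATTENED = re.compile(
    r"while\s*\([^)]*\)\s*\{\s*switch\s*\(\s*(\w+)\s*\)"
)

def find_flattened_structures(source):
    """Return (offset, dispatcher_variable) for each candidate structure."""
    return [(m.start(), m.group(1)) for m in FLATTENED.finditer(source)]

# Hypothetical flattened C source used only to exercise the pattern.
SAMPLE = """
int main(void) {
    int SwVar = 0;
    while (1) {
        switch (SwVar) {
        case 0: puts("Hello"); SwVar = 1; break;
        case 1: puts("World"); return 0;
        }
    }
}
"""

HITS = find_flattened_structures(SAMPLE)
```

Each hit gives the character offset where the structure begins and the name of the candidate dispatcher variable, which later steps can use to bound the region to reconstruct.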

Because the source code reconstruction algorithm is performed once, before the optimization process, to identify and reconstruct the flattened structure, it is run after checking the algorithm's round. The round is initialized to 0, so for source code that has not gone through reconstruction the round has not been incremented and remains 0. In step S503, it may be checked whether the round is 0; that is, step S503 determines whether the target source code structure has already been reconstructed.

If the result of step S503 is "Yes", that is, if the source code has never been reconstructed, then in step S504 the label extraction module 430 may insert an interface for outputting a label into each code block loaded in the flattened structure and execute those code blocks. Here, the interface refers to an API (Application Programming Interface). The label extraction module 430 is a module capable of extracting the code blocks loaded in the flattened structure in execution order: it applies dynamic analysis, locating each label inside the flattened structure and inserting below the label an interface for outputting the code block index. The regular expression module 420 may be used together with it to locate the labels.
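A minimal sketch of the insertion in step S504, assuming the flattened structure is available as a list of text lines in the pseudo-syntax used by the description (`Label_#n:`, `Goto Switch`, `Print(...)`). The function name `insert_index_printers` and the four-space indentation are illustrative choices, not part of the patent.

```python
import re

# Matches a label line such as "Label_#2:" and captures indentation and index.
LABEL = re.compile(r"^(\s*)Label_#(\d+):\s*$")

def insert_index_printers(lines):
    """Insert a Print("#n") line directly below every Label_#n: line."""
    out = []
    for line in lines:
        out.append(line)
        m = LABEL.match(line)
        if m:
            indent, index = m.groups()
            out.append(f'{indent}    Print("#{index}")')
    return out

instrumented = insert_index_printers([
    "Label_#2:",
    '    Print("Hello")',
    "    Goto Switch",
])
# instrumented[1] is now '    Print("#2")'
```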

According to an embodiment, the API for outputting the code block index may be a Print statement such as Print("#3"). If the interface for outputting a label is inserted into the code blocks and the result is compiled and executed, only the code block indexes of the code blocks that are actually executed are output, in the order in which the blocks execute. For example, suppose that in the example of FIG. 3C there are also code blocks with code block indexes 4 and 5, and that these blocks are never actually executed. In this case, when the flattened structure with the API inserted is executed, #2 Hello #1 World #3 is output in execution order, and #4 and #5, the indexes of the unexecuted code blocks, are not output.
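The effect of running the instrumented structure can be emulated with a small dispatcher loop. The sketch below assumes the dispatcher variable starts at 2 (consistent with the output quoted above) and models each block as a pair of printed text and next index; the table `BLOCKS`, the "Bogus" contents of blocks 4 and 5, and the function `run_flattened` are hypothetical.

```python
def run_flattened(blocks, start):
    """Emulate the dispatcher loop and collect everything the program prints."""
    output = []
    state = start
    while state is not None:
        text, next_state = blocks[state]
        output.append(f"#{state}")   # the inserted index printer runs first
        if text is not None:
            output.append(text)      # the block's original output
        state = next_state           # the block sets the dispatcher variable
    return output

BLOCKS = {
    1: ("World", 3),
    2: ("Hello", 1),
    3: (None, None),   # Exit block: prints nothing, ends the loop
    4: ("Bogus", 5),   # never selected by any dispatcher value
    5: ("Bogus", 4),   # never selected by any dispatcher value
}

trace = run_flattened(BLOCKS, start=2)
# Keep only the index markers to obtain the execution order.
executed = [int(item[1:]) for item in trace if item.startswith("#")]
```

Blocks 4 and 5 refer only to each other, so no amount of inspecting a single block reveals that they are dead, whereas the recorded trace makes it immediate.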

After the code block indexes of the actually executed code blocks among the loaded code blocks have been output in execution order, the source code reconstruction process of steps S505 to S508 follows. First, in step S505, the control-flow reconstruction module 440 may reconstruct the branch statements. In other words, the code blocks corresponding to the code block indexes output in step S504 may be arranged in execution order.

Then, in step S506, the labels of the code blocks arranged in execution order may be analyzed and the dispatcher removed. Specifically, each code block may be placed after removing its label, which is the first line of the block, and its code block move statement (for example, a Goto statement), which is the last line. For example, in the example of FIG. 3C, code block 302 has Label_#1: and its Goto Switch statement removed so that only "World" remains; code block 303 has Label_#2: and its Goto Switch statement removed so that only "Hello" remains; and code block 304 has Label_#3: removed so that only the Exit statement remains. The blocks are then arranged in their execution order: code block 303, code block 302, code block 304. More precisely, they are arranged in the order dispatcher 301, code block 303, dispatcher 301, code block 302, dispatcher 301, code block 304.

Thereafter, the dispatcher (or dispatcher block) may be removed from the arranged code blocks. As described above with reference to FIG. 3, the dispatcher is a selection block for selecting which of the code blocks loaded in the flattened structure to execute, and may be, for example, a SwitchCase statement. Once all dispatcher blocks have been removed, the code block rearrangement is complete (step S507).
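Steps S505 to S507 can be sketched as below, assuming each block is stored as a list of lines keyed by its index and that the execution order [2, 1, 3] has already been obtained from the trace. The names `strip_block`, `rebuild`, and `FLAT`, and the content of block 4, are illustrative. Because the dispatcher is not part of any block, simply concatenating the stripped blocks leaves no dispatcher in the output.

```python
def strip_block(block_lines):
    """Drop the leading label line and a trailing Goto line, if present."""
    body = list(block_lines)
    if body and body[0].strip().startswith("Label_#"):
        body = body[1:]
    if body and body[-1].strip().startswith("Goto"):
        body = body[:-1]
    return body

def rebuild(blocks, executed):
    """Concatenate the stripped blocks in the recorded execution order."""
    result = []
    for index in executed:
        result.extend(strip_block(blocks[index]))
    return result

FLAT = {
    1: ["Label_#1:", 'Print("World")', "Goto Switch"],
    2: ["Label_#2:", 'Print("Hello")', "Goto Switch"],
    3: ["Label_#3:", "Exit"],
    4: ["Label_#4:", 'Print("unused")', "Goto Switch"],  # never executed
}

rebuilt = rebuild(FLAT, [2, 1, 3])
```

In this sketch the unexecuted block 4 disappears simply because its index never occurs in the execution order.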

After the code blocks have been rearranged, the algorithm's round is incremented by 1 in step S508. This ensures that, on returning to step S503, the result is "No", indicating that reconstruction of the source code is complete.

Next, the reconstructed source code goes through the source code optimization stage of steps S509 to S512. According to an embodiment, the source code elimination module 450 and the source code optimization module 460 may be implemented so that optimization techniques that existing compiler optimizers apply to an intermediate language can also be applied at the source code level.

First, the variable DeletedCode, which counts deleted code, and the variable OptimizedCode, which counts optimized code, are initialized to 0 (S509). Then, in step S510, the source code elimination module 450 may identify and delete deletable patterns in the reconstructed source code. According to an embodiment, a deletable pattern is source code whose deletion does not change the functionality of the code block when it is compiled. For example, if a code block contains the two lines 'A = 3; A = 8;', variable A is set to 3 and then set to 8, so deleting 'A = 3;' and running the source code still leaves A set to 8. In this case 'A = 3;' is a deletable pattern. Conversely, if a code block contains the three lines 'A = 3; B = A + 6; A = 8;', variable A is assigned 3 and then used to compute B. Here, deleting 'A = 3;' and running the source code could yield a value of B different from 9, the result when 'A = 3;' is present, which changes the functionality of the code block. In this case 'A = 3;' is not deletable code.
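The 'A = 3; A = 8;' example corresponds to what compilers call dead store elimination. The patent does not name a specific algorithm, so the backward scan below is a swapped-in standard technique, restricted to straight-line assignments of the form `X = expr;` and conservatively treating every variable as still needed after the last statement. The names `ASSIGN`, `NAME`, and `eliminate_dead_stores` are hypothetical.

```python
import re

ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*;\s*$")
NAME = re.compile(r"[A-Za-z_]\w*")

def eliminate_dead_stores(statements):
    """Remove assignments whose value is overwritten before it is ever read."""
    overwritten = set()   # variables reassigned later with no read in between
    kept = []
    for stmt in reversed(statements):
        m = ASSIGN.match(stmt)
        if m is None:
            kept.append(stmt)     # not a plain assignment: keep it untouched
            overwritten.clear()   # and make no assumptions across it
            continue
        target, expr = m.groups()
        if target in overwritten:
            continue              # dead store: a later write hides this one
        kept.append(stmt)
        overwritten.add(target)
        # Variables read by this statement are needed by everything earlier.
        overwritten -= set(NAME.findall(expr))
    kept.reverse()
    return kept

two_lines = eliminate_dead_stores(["A = 3;", "A = 8;"])
three_lines = eliminate_dead_stores(["A = 3;", "B = A + 6;", "A = 8;"])
```

In the first case only 'A = 8;' survives; in the second all three statements are kept because B reads A before A is overwritten.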

Next, in step S511, the source code optimization module 460 may identify and optimize optimizable patterns in the reconstructed source code. As this elimination and optimization are performed, in step S512 DeletedCode is increased by the number of deleted patterns and OptimizedCode by the number of optimized patterns, and the round is also incremented by 1 because one pass of source code optimization has been performed.

According to an embodiment, the source code optimization stage may repeat the removal and optimization of the source code of the reconstructed code blocks until no source code remains to be removed or optimized. In step S513, it may be determined whether the sum of DeletedCode and OptimizedCode is 0, in order to check whether any source code was removed or optimized in the latest pass. If the result of step S513 is "No", either source code remains to be removed or optimized, or it has not yet been confirmed that none remains, so the process returns to step S509 and repeats the optimization. Conversely, if the result is "Yes", the source code has been removed and optimized until nothing further remains, and the process moves on to the next step.
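The loop over steps S509 to S513 is a fixed-point iteration. The sketch below uses two toy passes as stand-ins for the elimination and optimization modules: one deletes self-assignments such as `a = a;`, the other rewrites `+ 0;` to `;`. Both passes, the function names, and the local `rounds` counter (which here starts at 0 rather than continuing from the reconstruction round) are assumptions made for illustration.

```python
import re

SELF_ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*\1\s*;\s*$")
ADD_ZERO = re.compile(r"\s*\+\s*0\s*;")

def delete_pass(code):
    """Toy elimination pass: drop statements of the form 'x = x;'."""
    kept = [s for s in code if not SELF_ASSIGN.match(s)]
    return kept, len(code) - len(kept)

def optimize_pass(code):
    """Toy optimization pass: simplify '... + 0;' to '...;'."""
    out, changed = [], 0
    for stmt in code:
        new_stmt, n = ADD_ZERO.subn(";", stmt)
        out.append(new_stmt)
        changed += n
    return out, changed

def optimize_until_stable(code):
    rounds = 0
    while True:
        deleted_code = optimized_code = 0        # S509: reset both counters
        code, n = delete_pass(code)              # S510: delete patterns
        deleted_code += n
        code, n = optimize_pass(code)            # S511: optimize patterns
        optimized_code += n
        rounds += 1                              # S512: one pass completed
        if deleted_code + optimized_code == 0:   # S513: nothing changed
            return code, rounds

final_code, rounds = optimize_until_stable(["a = a + 0;", "b = 1;"])
```

Here the first pass turns `a = a + 0;` into `a = a;`, the second pass deletes it, and a third pass confirms nothing is left to change, which is why a single pass is not enough.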

When source code optimization is complete, the optimized source code generation module 470 may write out the optimized source code in step S514. The user can then analyze the source code that has gone through this reconstruction and optimization process, and compile and run it.

Because the source code reconstruction and optimization method according to an embodiment of the present invention applies a form of dynamic analysis that a compiler optimizer cannot, it can reconstruct control flow more accurately. FIG. 6 is a conceptual diagram illustrating control flow reconstruction according to an embodiment. FIG. 6 shows that the control flow of source code to which a control flow flattening technique has been applied can be accurately reconstructed through dynamic analysis.

According to an embodiment, the program's execution order may be ABAABC, but from the block diagram 610 of the source code to which the control flow flattening technique has been applied, that order cannot be determined unless the program is compiled. Therefore, through steps S504 to S508 of FIG. 5, the code blocks may be arranged in execution order, as in the block diagram 620 of the source code to which the reconstruction technique has been applied.

Specifically, the labels may be located, an API that outputs the code block index inserted below each label, and the result executed. Only the indexes of the code blocks that are actually executed are then output, in execution order, and the code blocks can be ordered according to the output indexes. The labels and Goto statements are then removed from within the code blocks and the dispatcher block is removed, which rearranges the code blocks. After this control flow reconstruction process, the code blocks of the flattened source code are arranged in the order ABAABC, the actual program execution order (620).
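One consequence of the ABAABC order is that a block executed several times appears several times in the reconstructed code. A minimal sketch, with hypothetical block bodies and a hypothetical helper `linearize`:

```python
# Hypothetical block bodies; the patent only names the blocks A, B and C.
BLOCKS = {
    "A": ["x = x + 1;"],
    "B": ["y = y * 2;"],
    "C": ["return;"],
}
TRACE = "ABAABC"   # execution order recovered from the printed indexes

def linearize(blocks, trace):
    """Lay the block bodies out once per execution, in trace order."""
    return [stmt for name in trace for stmt in blocks[name]]

order = linearize(BLOCKS, TRACE)
```

Block A is emitted three times and block B twice, so the output is straight-line code with no dispatcher left.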

Applying source-code-level optimization after first reconstructing the control flow at the source code level can provide a higher optimization rate and better readability than when control flow reconstruction is performed first but no compiler optimization option is applied at all. It also provides better readability than when control flow reconstruction is not performed and compiler optimization options are applied during compilation. In other words, when source code to which a control flow obfuscation technique has been applied is compiled with another compiler (for example, Visual C++ Compiler, Intel C++ Compiler, GNU Compiler Collection (GCC), or Clang), performing control flow reconstruction and optimization at the source code level beforehand can yield clearly better results than applying the compiler's maximum optimization options.

Meanwhile, this effect of the present invention implies that performing control flow reconstruction according to an embodiment produces additional optimizable patterns. Accordingly, more optimization takes place in source code that is optimized after control flow reconstruction than in source code that is optimized without it.

Because the source code reconstruction and optimization method according to an embodiment of the present invention is also a source-to-source transformation, its output can be more readable than the output obtained by applying a compiler's optimization techniques. That is, compiler optimization is applied at the intermediate representation (IR) level rather than to the source code, so its output is itself intermediate code or a binary, which may be unfamiliar to a typical analyst. In contrast, the output of the method according to an embodiment of the present invention is optimized source code, which is more familiar to a typical analyst, and in this respect the method offers an advantage over existing optimization techniques.

FIG. 7 is a block diagram of a source code reconstruction and optimization apparatus according to an embodiment. The steps in flowcharts such as those of FIGS. 1 and 5 may be performed by the source code reconstruction and optimization apparatus 700 of the present invention. The source code reconstruction and optimization apparatus 700 according to an embodiment may include an extraction unit 710, a reconstruction unit 720, and an optimization unit 730.

The extraction unit 710 may identify, in the obfuscated source code, the source code structure to which the control flow flattening technique has been applied. In more detail, the extraction unit 710 may read and analyze the contents of the obfuscated source code, and identify and extract the location of the flattened structure within the source code. The extraction unit 710 according to an embodiment may include the obfuscated source code parsing module 410 and the regular expression module 420.

The reconstruction unit 720 may reconstruct the code blocks loaded in the identified source code structure according to their execution order. In more detail, the reconstruction unit 720 may apply dynamic analysis to locate the labels inside the flattened structure and insert below each label an interface for outputting the code block index. The reconstruction unit 720 may also execute the source code and arrange the code blocks corresponding to the output code block indexes in execution order. In doing so, the labels and code block move statements are removed from within the code blocks and the dispatcher is removed, so that the code blocks are arranged in execution order. The reconstruction unit 720 according to an embodiment may include the label extraction module 430 and the control flow reconstruction module 440.

The optimization unit 730 may perform source code optimization according to the control flow of the reconstructed code blocks. In more detail, the optimization unit 730 may identify deletable or optimizable patterns in the reconstructed source code and delete or optimize them. This may be repeated until no source code remains to be removed or optimized. The optimization unit 730 according to an embodiment may include the source code elimination module 450 and the source code optimization module 460.

For convenience, FIG. 7 shows only the extraction unit 710, the reconstruction unit 720, and the optimization unit 730, but this is merely an example, and the source code reconstruction and optimization apparatus 700 according to an embodiment may further include other components.

The apparatus according to the above-described embodiments may include a processor, a memory that stores and executes program data, permanent storage such as a disk drive, a communication port for communicating with external devices, and user interface devices such as a touch panel, keys, and buttons. Methods implemented as software modules or algorithms may be stored on a computer-readable recording medium as computer-readable code or program instructions executable on the processor. Here, computer-readable recording media include magnetic storage media (for example, read-only memory (ROM), random-access memory (RAM), floppy disks, and hard disks) and optically readable media (for example, CD-ROMs and DVDs (Digital Versatile Discs)). The computer-readable recording medium may be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner. The medium is readable by a computer, may be stored in a memory, and may be executed by a processor.

The present embodiment may be described in terms of functional block configurations and various processing steps. Such functional blocks may be implemented by any number of hardware and/or software configurations that perform specific functions. For example, an embodiment may employ integrated circuit configurations such as memory, processing, logic, and look-up tables, which can carry out various functions under the control of one or more microprocessors or other control devices. Just as the components may be implemented by software programming or software elements, the present embodiment may be implemented in a programming or scripting language such as C, C++, Java, or assembler, with the various algorithms implemented as combinations of data structures, processes, routines, or other programming constructs. Functional aspects may be implemented as algorithms executed on one or more processors. The present embodiment may also employ conventional techniques for electronic environment configuration, signal processing, and/or data processing. Terms such as "mechanism", "element", "means", and "configuration" are used broadly and are not limited to mechanical and physical configurations. These terms may include the meaning of a series of software routines in conjunction with a processor or the like.

The above-described embodiments are merely examples, and other embodiments may be implemented within the scope of the claims that follow.

Claims

1. A source code reconstruction and optimization method, comprising:
checking a source code structure to which an obfuscation technique for flattening control flow has been applied;
reconstructing the code blocks loaded in the identified source code structure according to execution order, based on code block indexes; and
performing source code optimization according to the control flow of the reconstructed code blocks.

2. The method of claim 1,
wherein the reconstructing step comprises:
identifying the position of a label in each of the loaded code blocks;
inserting an interface for outputting the code block index below the label; and
outputting the code block index by executing the code block into which the interface has been inserted.

3. The method of claim 2,
wherein, in the outputting step, the code block indexes are output in execution order for the code blocks that are actually executed among the loaded code blocks.

4. The method of claim 3,
further comprising arranging the code blocks corresponding to the output code block indexes in the execution order.

5. The method of claim 4,
wherein each arranged code block is placed with its label and its code block move statement deleted from within the block.

6. The method of claim 4,
further comprising removing a dispatcher block from the arranged code blocks,
wherein the dispatcher block is a selection block for selecting which of the loaded code blocks to execute.

7. The method of claim 1,
wherein performing the source code optimization comprises:
repeating removal and optimization of the source code of the reconstructed code blocks until no source code remains to be removed or optimized.

8. The method of claim 1,
further comprising analyzing and compiling the source code on which the optimization has been performed.

9. A source code reconstruction and optimization apparatus that:
checks, through an extraction unit, a source code structure to which an obfuscation technique for flattening control flow has been applied;
reconstructs, through a reconstruction unit, the code blocks loaded in the identified source code structure according to execution order, based on code block indexes; and
performs, through an optimization unit, source code optimization according to the control flow of the reconstructed code blocks.

10. A non-transitory computer-readable recording medium on which is recorded a program for executing a source code reconstruction and optimization method on a computer,
the source code reconstruction and optimization method comprising:
checking a source code structure to which an obfuscation technique for flattening control flow has been applied;
reconstructing the code blocks loaded in the identified source code structure according to execution order, based on code block indexes; and
performing source code optimization according to the control flow of the reconstructed code blocks.