CN109543368B - Cross-platform source code virtualization protection method based on intermediate language interpreter - Google Patents

Info

Publication number
CN109543368B
CN109543368B CN201811410412.7A CN201811410412A CN109543368B CN 109543368 B CN109543368 B CN 109543368B CN 201811410412 A CN201811410412 A CN 201811410412A CN 109543368 B CN109543368 B CN 109543368B
Authority
CN
China
Prior art keywords
file
bitcode
interpreter
llvm
bytecode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811410412.7A
Other languages
Chinese (zh)
Other versions
CN109543368A (en)
Inventor
韩宇超
姚伟涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Zhexin Information Technology Co ltd
Original Assignee
Hangzhou Zhexin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/14 Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F21/53 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/433 Dependency analysis; Data or control flow analysis
    • G06F8/434 Pointers; Aliasing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/447 Target code generation

Abstract

The invention discloses a cross-platform source code virtualization protection method based on an intermediate language interpreter, comprising the following steps: converting the source code into an intermediate language to generate an original intermediate file; obfuscating the original intermediate language to generate an obfuscated intermediate file; modifying a bitcode generation rule of the LLVM according to a bitcode file format table of the LLVM, and converting the obfuscated intermediate file into a bitcode file in a random format; modifying the original bytecode interpreter according to the converted bytecode file in the random format to obtain a corresponding bytecode interpreter; and outputting the modified bytecode interpreter according to a target compiler structure, embedding the bytecode corresponding to the modified bytecode interpreter into the data segment, completing compilation and generating a protected executable file. The beneficial effects of the invention are: obfuscation is combined with virtual machine protection, which strengthens the protection of the source code; and by relying on the characteristics of LLVM, the virtual machine is independent of source language and platform and is therefore broadly applicable.

Description

Cross-platform source code virtualization protection method based on intermediate language interpreter
Technical Field
The invention relates to the technical field of software security, in particular to a cross-platform source code virtualization protection method based on an intermediate language interpreter.
Background
With the rapid development of the software ecosystem, protecting the security and integrity of software products has become an increasingly important topic, and code obfuscation and virtual machine protection are widely deployed in production environments as mature software protection strategies. Binary analysis techniques such as symbolic execution, however, are maturing as well. Obfuscators typified by OLLVM generally rely on basic block splitting, instruction substitution, control flow flattening, bogus control flow, and string obfuscation. Obfuscations built on branch jumps, namely bogus control flow and control flow flattening, can be defeated by symbolic execution, which determines the true execution logic of the code; mature decompilation engines are available on the market, and binary analysis frameworks such as angr and Miasm each include a relatively mature symbolic execution engine. Moreover, obfuscated code can still be correctly disassembled, so the original logic can be recovered by analyzing the assembly code. Obfuscation therefore carries a risk of being cracked, and obfuscation alone cannot provide the protection strength that the current software ecosystem requires.
In addition, existing implementations of virtual machine protection still have significant limitations. For an interpreter that works at the assembly level, traditional virtual machine protection usually has to implement a different virtual instruction mapping for each CPU architecture, and it suffers from poor execution efficiency and difficult debugging. For an interpreter that works on a high-level language, the structure of the source code must be adjusted to fit the interpreter, which damages the original structure of the source code and carries some risk of source code leakage; such an interpreter is also tied to one language and must be redesigned whenever it is adapted to a new language.
Disclosure of Invention
In order to solve the above problems, an object of the present invention is to provide a cross-platform source code virtualization protection method based on an intermediate language interpreter, which includes the following steps:
step 101: converting the source code into an intermediate language to generate an original intermediate file;
step 102: obfuscating the original intermediate language to generate an obfuscated intermediate file;
step 103: modifying a bitcode generation rule of the LLVM according to a bitcode file format table of the LLVM, and converting the obfuscated intermediate file into a bitcode file in a random format;
step 104: modifying the original bytecode interpreter according to the converted bytecode file in the random format to obtain a corresponding bytecode interpreter;
step 105: outputting the modified bytecode interpreter according to a target compiler structure, embedding the bytecode corresponding to the modified bytecode interpreter into the data segment, completing compilation and generating a protected executable file.
As a further improvement of the present invention, in step 103, modifying the bitcode generation rule of the LLVM means adding a bitcode file format table to the original implementation and scrambling the LLVM bitcode file format according to that table.
As a further improvement of the present invention, in step 103, the bitcode file format is specified by a plurality of mapping relationships.
As a further improvement of the present invention, in step 103, the bitcode file format is specified by using a value function to establish a mapping relationship between the function declaration table and the parameter table.
As a further improvement of the invention, the bitcode file format is specified by adding a new virtual instruction and by the mapping relation between the new virtual instruction and the original instruction.
The beneficial effects of the invention are as follows: the source code is converted into bytecode in a specific format, a matching interpreter is generated, and both the bytecode and the interpreter are embedded in the output binary file, giving a code protection scheme that is independent of source language and architecture; only the interpreter in the generated binary file can be executed directly, while the code logic is hidden in the data segment, which improves the protection strength of the source code.
Drawings
Fig. 1 is a flowchart of a cross-platform source code virtualization protection method based on an intermediate language interpreter according to an embodiment of the present invention;
Fig. 2 is an interpreter execution logic diagram of a cross-platform source code virtualization protection method based on an intermediate language interpreter according to an embodiment of the present invention;
Fig. 3 is a diagram of the LLVM compiler framework.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments and with reference to the attached drawings.
As shown in Fig. 1, a cross-platform source code virtualization protection method based on an intermediate language interpreter according to an embodiment of the present invention includes the following steps:
step 101: converting the source code into an intermediate language to generate an original intermediate file;
step 102: obfuscating the original intermediate language to generate an obfuscated intermediate file;
step 103: modifying a bitcode generation rule of the LLVM according to a bitcode file format table of the LLVM, and converting the obfuscated intermediate file into a bitcode file in a random format;
step 104: modifying the original bytecode interpreter according to the converted random format bytecode file to obtain a bytecode interpreter corresponding to the converted random format bytecode file;
step 105: outputting the modified bytecode interpreter according to a target compiler structure, embedding the bytecode corresponding to the modified bytecode interpreter into the data segment, completing compilation and generating a protected executable file.
The invention depends on LLVM; Fig. 3 shows the LLVM compiler framework. The method is divided into two main processes: bitcode (bytecode) generation and bitcode (bytecode) interpretation. The bytecode is produced from the LLVM intermediate representation (IR), so the source code must first be converted into the intermediate language. This conversion supports every language that already has an LLVM front end, including C/C++, Objective-C, Fortran and Haskell; any other language can be supported by designing an LLVM front end for it.
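The two processes can be pictured with the stock LLVM command-line tools. The sketch below only builds the command lines and does not run them; the file names are placeholders, and the unmodified `clang`, `llvm-as` and `lli` shown here stand in for the modified bitcode writer and interpreter that the method actually uses.

```python
import shlex


def build_pipeline_commands(source, ir_file="program.ll", bitcode_file="program.bc"):
    """Return command lines for a stock LLVM toolchain; nothing is executed here."""
    return [
        # Front end: source code -> textual LLVM IR (the "original intermediate file").
        ["clang", "-S", "-emit-llvm", source, "-o", ir_file],
        # Assembler: textual IR -> bitcode file.
        ["llvm-as", ir_file, "-o", bitcode_file],
        # Execution: run the bitcode with LLVM's lli tool.
        ["lli", bitcode_file],
    ]


for command in build_pipeline_commands("hello.c"):
    print(shlex.join(command))
```

In the method described here, the obfuscation passes would run between the first and second commands, and the last two tools would be replaced by their format-randomized counterparts.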
After the conversion is completed, an additional Pass is added on the basis of the OLLVM to perform obfuscation processing on the intermediate language, and the obfuscation processing method specifically includes the following steps:
(1) Modify the condition used by the bogus control flow in OLLVM. The original implementation uses a constant expression (y < 10 || x * (x + 1) % 2 == 0), which makes it easy to trace back to the real instructions. Instead, a random factor is generated to produce a random number; if the random number falls within a given range, the condition cond is set to true, otherwise to false. An expression table is then looked up with the random number to generate a new conditional expression, whose truth value depends on whether the random number falls within the range.
(2) Add a new optimization pass that turns direct branch jumps into indirect jumps. First, the branch instructions in the target function are traversed, all jump addresses are stored in a newly added global address table, and the mapping between instructions and addresses is recorded. For an unconditional branch, the target is looked up in the table directly from the instruction; for a conditional branch, whether the condition is met determines whether a table lookup is performed for the jump. Because the jump targets depend on actual runtime conditions, the assembly code cannot be decompiled directly into a high-level language, which effectively defeats decompiler plug-ins.
(3) Add back-reference obfuscation. For each jump in the target function, for example func_a calling func_b, the call is rewritten so that func_a calls func_b1, func_b1 calls func_b2, and so on until func_b is reached; each of these functions is marked as a horizontal node. Vertical reference obfuscation is then added: several fake_stub functions that have no effect when actually called are inserted, and these call functions in the horizontal chain, which confuses any attempt to trace the call chain backwards. In addition, different junk instructions are randomly added to each node. To keep the vertical nodes from being removed by compiler optimization, a global table is maintained that stores the addresses of the vertical nodes and is referenced from a binary entry point, such as init_array in C.
(4) Modify the encryption and decryption logic of the Armariris string obfuscation. The original implementation decrypts at the binary entry point, so an analyst only needs to debug dynamically up to the point where decryption finishes to see every string in its pre-obfuscation state. Instead, the method finds each reference point of each string, inserts a new local variable at the beginning of the function containing the reference point, stores the obfuscated string content in it, and performs the decryption there.
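The point-of-use decryption in method (4) can be illustrated with a small Python sketch. It is only an analogy for what an LLVM pass would do at the IR level: the single-byte XOR cipher, the key and the function name are all assumptions chosen for brevity, not the Armariris scheme.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


KEY = 0x5A  # hypothetical key, fixed here only to keep the sketch deterministic

# Only the encrypted form is stored globally (the analogue of the data segment).
ENCRYPTED_MESSAGE = xor_bytes(b"license ok", KEY)


def check_license() -> str:
    # A function-local copy is decrypted on entry to the function that uses the
    # string, so no single program point exposes every plaintext string at once.
    local_copy = xor_bytes(ENCRYPTED_MESSAGE, KEY)
    return local_copy.decode("ascii")


print(check_license())
```

The design point is that the plaintext only exists inside the function that references it, rather than being restored for the whole program at start-up.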
The obfuscation process may use any one of the four methods described above.
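As an illustration of method (1), the sketch below derives a condition from a random number and then picks a matching expression from a table. The two expression tables, the hit range and the function name are hypothetical; the only property relied on is that each table entry is always true, or always false, for every integer x.

```python
import random

# Hypothetical expression tables. Each predicate has a fixed truth value for
# every integer x, but that value is not obvious from a single glance.
ALWAYS_TRUE = [
    lambda x: (x * (x + 1)) % 2 == 0,   # a product of consecutive integers is even
    lambda x: (x * x) % 4 in (0, 1),    # a square is 0 or 1 modulo 4
]
ALWAYS_FALSE = [
    lambda x: (x * x) % 4 == 2,
    lambda x: (x * (x + 1)) % 2 == 1,
]


def make_predicate(rng, hit_range=range(0, 128), limit=256):
    """Draw a random number, set cond by range membership, pick an expression."""
    r = rng.randrange(limit)
    cond = r in hit_range                    # true only if the number hits the range
    table = ALWAYS_TRUE if cond else ALWAYS_FALSE
    return cond, table[r % len(table)]       # the expression is selected by r as well


cond, predicate = make_predicate(random.Random(2018))
print(cond, predicate(41))
```

Because both the truth value and the concrete expression vary from build to build, an analyst cannot pattern-match one fixed predicate as is possible with the constant expression in the original implementation.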
Because the virtual machine protection system is built on the LLVM intermediate language, the virtual machine is independent of source language and platform and is broadly applicable; combining obfuscation with virtual machine protection increases the strength of the code protection.
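Of the obfuscation methods listed earlier, method (2), which routes branches through a global address table, can be pictured with a toy block dispatcher. This is a sketch under stated assumptions: the block functions, the table layout and the slot numbers are all hypothetical, and here every branch, conditional or not, resolves its target through the table.

```python
def loop_check(state):
    # Conditional branch: the runtime condition selects which table slot is read.
    taken = state["i"] < state["n"]
    return BRANCH_SLOTS["loop_check"][0 if taken else 1]


def loop_body(state):
    state["total"] += state["i"]
    state["i"] += 1
    # Unconditional branch: the slot is looked up directly from the instruction.
    return BRANCH_SLOTS["loop_body"][0]


def exit_block(state):
    return None  # no successor, execution stops


# Global "address" table plus the instruction-to-slot mapping (both hypothetical).
ADDRESS_TABLE = [loop_body, exit_block, loop_check]
BRANCH_SLOTS = {"loop_check": (0, 1), "loop_body": (2,)}


def run(n):
    """Sum the integers below n using only table-driven jumps between blocks."""
    state = {"n": n, "i": 0, "total": 0}
    block = loop_check
    while block is not None:
        slot = block(state)
        block = None if slot is None else ADDRESS_TABLE[slot]
    return state["total"]


print(run(5))
```

No block names its successor directly, so the control flow graph cannot be read off the branch instructions alone; it has to be reconstructed from the table contents at run time.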
After obfuscation is finished, the bytecode generation rule of the LLVM is modified and the obfuscated intermediate file is converted into bytecode in a random format; then the implementation of the LLVM interpreter (lli) is adjusted according to the bitcode file format table; finally, the modified lli is output according to a target compiler structure, and the corresponding bitcode file is embedded in its data segment to complete compilation.
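One common way to place a binary payload in a data section is to generate a C source file that defines it as an initialized global array and compile that file into the interpreter. The sketch below shows this approach; it is an assumption about how the embedding could be done, not the mechanism stated in the patent, and the symbol names and payload bytes are placeholders.

```python
def emit_c_array(payload: bytes, symbol: str = "embedded_bitcode") -> str:
    """Render a non-empty payload as C source defining an initialized global array."""
    rows = []
    for offset in range(0, len(payload), 12):
        chunk = payload[offset:offset + 12]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        f"unsigned char {symbol}[] = {{\n{body}\n}};\n"
        f"unsigned long {symbol}_len = {len(payload)}UL;\n"
    )


# Placeholder bytes standing in for a randomized-format bitcode file.
print(emit_c_array(bytes([0x10, 0x20, 0x30, 0x40, 0x50])))
```

The interpreter would then read the array and its length at start-up instead of opening a file, so the protected program ships as a single executable.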
Further, in step 103, modifying the bitcode generation rule of the LLVM means adding a bitcode file format table to the original implementation and scrambling the LLVM bitcode file format according to that table.
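A format table of this kind can be thought of as a per-build permutation of the numeric codes that the writer emits, which the matching reader inverts. The sketch below is a minimal model of that idea; the record codes, the seed and the function names are hypothetical and do not correspond to real LLVM bitcode record codes or APIs.

```python
import random


def make_format_table(codes, seed):
    """Build a table that maps each original code to a shuffled code."""
    shuffled = list(codes)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(codes, shuffled))


def write_records(records, table):
    """Writer side: emit each record under its scrambled code."""
    return [(table[code], operands) for code, operands in records]


def read_records(stream, table):
    """Reader side: invert the table to recover the original codes."""
    inverse = {scrambled: code for code, scrambled in table.items()}
    return [(inverse[code], operands) for code, operands in stream]


table = make_format_table(list(range(1, 9)), seed=42)
records = [(1, [0]), (3, [1, 2]), (8, [])]
print(read_records(write_records(records, table), table) == records)
```

A standard reader that does not know the table would misinterpret every record, which is why the interpreter has to be modified in step 104 to match the format chosen in step 103.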
Further, in step 103, the bytecode file format is specified by a plurality of mapping relationships, and the file format can be extended according to the following rules:
(1) The bitcode file format is specified by using a value function to establish a mapping relationship between the function declaration table and the parameter table;
the function declarations and the parameter table are shuffled, and their correspondence is re-established through the mapping.
(2) The bitcode file format adds a new virtual instruction;
the contents of the instruction list are modified and new virtual instructions are added, for example a new vload instruction corresponding to the load instruction. Each new virtual instruction maps uniquely to an original instruction, the bytecode of the new vload instruction is generated from a random factor, and the vload instruction is restored to the intermediate representation of the real load instruction before interpretation. The modified interpreter execution logic is shown in Fig. 2, where the virtual instruction refers to a new virtual instruction such as vload.
(3) Blocks containing symbols and constants are lightly encrypted by simple means so that less information is exposed; likewise, these blocks must be decrypted before interpretation.
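Rule (2) above can be sketched as a toy virtual machine in which each original instruction receives a unique random byte code and the interpreter restores the real instruction before executing it. The three-instruction set, the accumulator model and all names are assumptions made for illustration and are far simpler than LLVM IR.

```python
import random

ORIGINAL_OPS = ["load", "add", "store"]


def make_virtual_isa(rng):
    """Give each original instruction a unique random byte (load -> vload code)."""
    codes = rng.sample(range(256), len(ORIGINAL_OPS))
    return dict(zip(ORIGINAL_OPS, codes))


def encode(program, isa):
    """Replace every original opcode with its virtual opcode."""
    return [(isa[op], arg) for op, arg in program]


def interpret(bytecode, isa, memory):
    """Restore each virtual opcode to the real instruction, then execute it."""
    restore = {code: op for op, code in isa.items()}
    acc = 0
    for code, arg in bytecode:
        op = restore[code]
        if op == "load":
            acc = memory[arg]
        elif op == "add":
            acc += memory[arg]
        elif op == "store":
            memory[arg] = acc
    return memory


isa = make_virtual_isa(random.Random(2018))
program = [("load", "a"), ("add", "b"), ("store", "c")]
print(interpret(encode(program, isa), isa, {"a": 2, "b": 3, "c": 0})["c"])
```

Since the opcode assignment comes from a random factor, two protected builds of the same program carry different bytecode, and each needs its own matching interpreter.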
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (4)

1. A cross-platform source code virtualization protection method based on an intermediate language interpreter is characterized by comprising the following steps:
step 101: converting the source code into an intermediate language to generate an original intermediate file;
step 102: performing obfuscation processing on an original intermediate language to generate an obfuscated intermediate file, where the performing obfuscation processing on the original intermediate language includes: modifying the condition judgment of the forged control flow in the LLVM to perform obfuscation processing;
step 103: modifying a bitcode generation rule of the LLVM according to a bitcode file format table of the LLVM, and converting the obfuscated intermediate file into a bitcode file in a random format, wherein the modification of the bitcode generation rule of the LLVM according to the bitcode file format table of the LLVM is realized by adding a bitcode file format table on the original basis and obfuscating the LLVM bitcode file format according to the bitcode file format table;
step 104: modifying the original bytecode interpreter according to the converted bytecode file in the random format to obtain a corresponding bytecode interpreter;
step 105: outputting the modified bytecode interpreter according to a target compiler structure, embedding the bytecode corresponding to the modified bytecode interpreter into the data segment, completing compilation and generating a protected executable file.
2. The method for cross-platform source code virtualization protection based on the intermediate language interpreter according to claim 1, characterized in that: in step 103, the bitcode file format is specified by a plurality of mapping relationships.
3. The method for cross-platform source code virtualization protection based on the intermediate language interpreter as claimed in claim 2, wherein: in step 103, the bitcode file format is specified by establishing a mapping relationship between the function declaration table and the parameter table by using a value function.
4. The method for cross-platform source code virtualization protection based on the intermediate language interpreter as claimed in claim 2, wherein: in step 103, the bitcode file format is specified by adding a new virtual instruction and by the mapping relationship between the new virtual instruction and the original instruction.
CN201811410412.7A 2018-11-23 2018-11-23 Cross-platform source code virtualization protection method based on intermediate language interpreter Active CN109543368B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811410412.7A CN109543368B (en) 2018-11-23 2018-11-23 Cross-platform source code virtualization protection method based on intermediate language interpreter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811410412.7A CN109543368B (en) 2018-11-23 2018-11-23 Cross-platform source code virtualization protection method based on intermediate language interpreter

Publications (2)

Publication Number Publication Date
CN109543368A CN109543368A (en) 2019-03-29
CN109543368B CN109543368B (en) 2021-08-17

Family

ID=65849541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811410412.7A Active CN109543368B (en) 2018-11-23 2018-11-23 Cross-platform source code virtualization protection method based on intermediate language interpreter

Country Status (1)

Country Link
CN (1) CN109543368B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245468B (en) * 2019-05-24 2023-06-16 创新先进技术有限公司 Compiling method, compiling device, compiling equipment and compiling system
CN110929234B (en) * 2019-11-28 2023-03-24 施羊梦燊 Python program encryption protection system and method based on code virtualization
CN112328258A (en) * 2020-11-26 2021-02-05 中国科学院计算机网络信息中心 Method and system for converting programming language
CN114327789A (en) * 2022-01-07 2022-04-12 北京大学 Virtual machine protection method and system based on LLVM

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960141A (en) * 2016-01-08 2017-07-18 阿里巴巴集团控股有限公司 Coding, coding/decoding method and the device of virtual machine instructions, virtual machine protection system
CN107545174A (en) * 2017-08-22 2018-01-05 武汉大学 A kind of system and method for resisting controlling stream abduction based on LLVM

Also Published As

Publication number Publication date
CN109543368A (en) 2019-03-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant