CN109543368B - Cross-platform source code virtualization protection method based on intermediate language interpreter
- Publication number
- CN109543368B (application CN201811410412.7A)
- Authority
- CN
- China
- Prior art keywords
- file
- bitcode
- interpreter
- llvm
- bytecode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/12—Protecting executable software
- G06F21/14—Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/52—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
- G06F21/53—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/433—Dependency analysis; Data or control flow analysis
- G06F8/434—Pointers; Aliasing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/447—Target code generation
Abstract
The invention discloses a cross-platform source code virtualization protection method based on an intermediate language interpreter, comprising the following steps: converting the source code into an intermediate language to generate an original intermediate file; obfuscating the original intermediate language to generate an obfuscated intermediate file; modifying a bitcode generation rule of the LLVM according to a bitcode file format table of the LLVM, and converting the obfuscated intermediate file into a bitcode file in a random format; modifying the original bytecode interpreter according to the converted random-format bitcode file to obtain a corresponding bytecode interpreter; and outputting the modified bytecode interpreter according to a target compiler structure, embedding the bytecode corresponding to the modified bytecode interpreter into the data segment, completing compilation, and generating a protected executable file. The beneficial effects of the invention are that combining obfuscation with virtual machine protection increases the degree of source code protection, and that, by building on LLVM, the virtual machine is independent of language and platform and is therefore broadly applicable.
Description
Technical Field
The invention relates to the technical field of software security, in particular to a cross-platform source code virtualization protection method based on an intermediate language interpreter.
Background
With the rapid development of the software ecosystem, protecting the security and integrity of software products has become an increasingly important topic, and code obfuscation and virtual machine protection are widely used in production environments as mature software protection strategies. Code obfuscation as represented by OLLVM generally employs techniques such as basic block splitting, instruction substitution, control flow flattening, bogus control flow (forged branches), and string obfuscation. At the same time, binary analysis techniques such as symbolic execution are maturing: binary analysis frameworks such as angr and Miasm include relatively mature symbolic execution engines, and very mature decompilation engines are available on the market. For obfuscation that relies on branch jumps, namely forged branches and control flow flattening, symbolic execution can be used to determine the true execution logic. Moreover, obfuscated code can still be correctly disassembled, so the original code logic can be recovered by analyzing the assembly code. Obfuscation therefore carries a risk of being cracked, and obfuscation alone cannot provide the protection strength that today's software ecosystem requires.
In addition, existing implementations of virtual machine protection still have significant limitations. For assembly-level interpreters, traditional virtual machine protection usually has to implement a different virtual instruction mapping for each CPU architecture, and it suffers from problems such as low execution efficiency and difficult debugging. For high-level-language interpreters, the structure of the source code has to be adjusted to implement the interpreter, which damages the original structure of the source code and carries some risk of source code leakage; such an interpreter is also language-specific and has to be redesigned whenever it is adapted to a new language.
Disclosure of Invention
In order to solve the above problems, an object of the present invention is to provide a cross-platform source code virtualization protection method based on an intermediate language interpreter, which includes the following steps:
step 101: converting the source code into an intermediate language to generate an original intermediate file;
step 102: obfuscating the original intermediate language to generate an obfuscated intermediate file;
step 103: modifying a bitcode generation rule of the LLVM according to a bitcode file format table of the LLVM, and converting the obfuscated intermediate file into a bitcode file in a random format;
step 104: modifying the original bytecode interpreter according to the converted random-format bitcode file to obtain a corresponding bytecode interpreter;
step 105: outputting the modified bytecode interpreter according to a target compiler structure, embedding the bytecode corresponding to the modified bytecode interpreter into the data segment, completing compilation and generating a protected executable file.
As a further improvement of the present invention, in step 103, modifying the bitcode generation rule of the LLVM means adding a bitcode file format table to the original rules and scrambling the LLVM bitcode file format according to that table.
As a further improvement of the present invention, in step 103, the bitcode file format is specified by a plurality of mapping relationships.
As a further improvement of the present invention, in step 103, the bitcode file format is specified by using a value function to establish a mapping relationship between the function declaration table and the parameter table.
As a further improvement of the present invention, the bitcode file format is specified by adding a new virtual instruction together with the mapping relationship between the new virtual instruction and the original instruction.
The beneficial effects of the invention are as follows: the source code is converted into bytecode in a specific format, a matching interpreter is generated, and both the bytecode and its interpreter are embedded in the output binary file, which yields a code protection scheme that is general across languages and architectures; only the interpreter in the generated binary file can be executed directly, while the code logic is hidden in the data segment, which improves the protection strength of the source code.
Drawings
FIG. 1 is a flowchart of a cross-platform source code virtualization protection method based on an intermediate language interpreter according to an embodiment of the present invention;
FIG. 2 is an interpreter execution logic diagram of a cross-platform source code virtualization protection method based on an intermediate language interpreter according to an embodiment of the present invention;
FIG. 3 is a diagram of the LLVM compiler framework.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments and with reference to the attached drawings.
As shown in FIG. 1, a cross-platform source code virtualization protection method based on an intermediate language interpreter according to an embodiment of the present invention includes the following steps:
step 101: converting the source code into an intermediate language to generate an original intermediate file;
step 102: obfuscating the original intermediate language to generate an obfuscated intermediate file;
step 103: modifying a bitcode generation rule of the LLVM according to a bitcode file format table of the LLVM, and converting the obfuscated intermediate file into a bitcode file in a random format;
step 104: modifying the original bytecode interpreter according to the converted random-format bitcode file to obtain a bytecode interpreter corresponding to that file;
step 105: outputting the modified bytecode interpreter according to a target compiler structure, embedding the bytecode corresponding to the modified bytecode interpreter into the data segment, completing compilation and generating a protected executable file.
The invention depends on LLVM; FIG. 3 is a framework diagram of the LLVM compiler. The method is mainly divided into two processes: bitcode (bytecode) generation, and bitcode (bytecode) interpretation and execution. The bytecode is converted from the LLVM intermediate representation (IR), so the source code must first be converted into the intermediate language. This conversion supports the languages that already have an LLVM front end, including C/C++, Objective-C, Fortran, and Haskell; for any other language, the conversion only requires designing an LLVM front end for it.
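The two processes described above can be sketched with the stock LLVM command-line tools. This is a minimal sketch, assuming unmodified `clang`, `llvm-as`, and `lli` are installed; the function name `build_commands`, the `build` directory, and the file names are hypothetical, and the method described here would substitute its modified bitcode writer and interpreter for the stock tools. The function only assembles the command lines and does not run them.

```python
from pathlib import PurePosixPath


def build_commands(source, workdir="build"):
    """Return argv lists for IR generation, bitcode generation, and interpretation.

    Only stock LLVM tools are named here; nothing is executed.
    """
    stem = PurePosixPath(source).stem
    ir_file = f"{workdir}/{stem}.ll"  # textual LLVM IR
    bc_file = f"{workdir}/{stem}.bc"  # LLVM bitcode
    return [
        # Front end: source code -> intermediate language (IR).
        ["clang", "-S", "-emit-llvm", source, "-o", ir_file],
        # Bitcode generation: textual IR -> bitcode file.
        ["llvm-as", ir_file, "-o", bc_file],
        # Interpretation: execute the bitcode with the LLVM interpreter.
        ["lli", bc_file],
    ]


if __name__ == "__main__":
    for argv in build_commands("hello.c"):
        print(" ".join(argv))
```

For a language other than C, only the first command changes, since everything after the front end operates on LLVM IR.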
After the conversion is completed, an additional Pass is added on top of OLLVM to obfuscate the intermediate language. The obfuscation specifically includes the following methods:
(1) Modify the conditional test of the bogus (forged) control flow in OLLVM. The original implementation uses a constant expression (y < 10 || x * (x + 1) % 2 == 0), which makes it easy to trace back to the real instructions. In this method, a random factor is introduced instead: a random number is generated, and if it falls within a given range the condition cond is set to true, otherwise to false; an expression table is then looked up by the random number to generate a new conditional expression, whose truth value depends on whether the random number fell within the range.
(2) Add a new optimizer that turns direct branch jumps into indirect jumps. First, the branch instructions in the target function are traversed, all jump addresses are stored in a newly added global address table, and the mapping between instructions and addresses is saved. For an unconditional branch, the table is looked up directly by instruction; for a conditional branch, whether a table lookup is needed for the jump is decided by whether the condition is met. Because the jump target depends on actual run-time conditions, the assembly code cannot be decompiled directly into a high-level language, which effectively defeats decompiler plug-ins.
(3) Add obfuscation of backtracked references. For each jump in the target function, for example func_a calling func_b, the call is changed so that func_a calls func_b1, func_b1 calls func_b2, and so on until func_b is reached, and each of these functions is marked as a horizontal node. Vertical reference obfuscation is also added: several fake_stub functions that have no effect when actually called are added, and these call the functions in the horizontal chain, which confuses any attempt to trace the call chain upward. In addition, different junk instructions are randomly added to each node. To keep the vertical nodes from being removed by compiler optimization, a global table that stores the addresses of the vertical nodes is maintained and referenced from a binary entry point, such as init_array in C.
(4) Modify the encryption and decryption logic of the Armariris string obfuscation. The original implementation decrypts at the binary entry point, so every string can be returned to its pre-obfuscation state simply by debugging dynamically to the point where decryption has finished. In this method, the reference points of each string are located, and a new local variable is inserted at the beginning of the function containing each reference point to hold the obfuscated string content and perform the decryption.
The obfuscation processing may use any one of the four methods described above.
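Method (1) can be illustrated with a small model of the random-factor predicate. The following is only a sketch under stated assumptions: the contents of the expression table, the range bounds, and the names `ALWAYS_TRUE`, `ALWAYS_FALSE`, and `make_predicate` are hypothetical, and a real Pass would emit the chosen expression as LLVM IR rather than evaluate it in Python.

```python
import random

# Hypothetical expression table. Every entry in ALWAYS_TRUE holds for all
# integers x, and every entry in ALWAYS_FALSE fails for all integers x.
ALWAYS_TRUE = [
    ("x * (x + 1) % 2 == 0", lambda x: x * (x + 1) % 2 == 0),
    ("(x * x) % 4 != 2", lambda x: (x * x) % 4 != 2),
    ("(x | 1) % 2 == 1", lambda x: (x | 1) % 2 == 1),
]
ALWAYS_FALSE = [
    ("x * (x + 1) % 2 == 1", lambda x: x * (x + 1) % 2 == 1),
    ("(x * x) % 4 == 3", lambda x: (x * x) % 4 == 3),
    ("(x & 1) > 1", lambda x: (x & 1) > 1),
]


def make_predicate(rng, low=0, high=50, span=100):
    """Pick an opaque predicate whose constant truth value is set by a random number.

    Returns (cond, text, check): cond is True when the random number falls in
    [low, high); text is the chosen expression; check evaluates it for a given x.
    """
    r = rng.randrange(span)
    cond = low <= r < high
    table = ALWAYS_TRUE if cond else ALWAYS_FALSE
    text, check = table[r % len(table)]
    return cond, text, check


if __name__ == "__main__":
    cond, text, check = make_predicate(random.Random(2018))
    print(cond, text, check(7))
```

Because both the truth value and the expression vary from build to build, an analyst cannot recognize the bogus branch by matching one fixed constant expression.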
Because the virtual machine protection system is implemented on the LLVM intermediate language, the virtual machine is independent of language and platform and is broadly applicable; combining obfuscation with virtual machine protection further increases the code protection strength.
After obfuscation is finished, the bitcode generation rule of the LLVM is modified, and the obfuscated intermediate file is converted into bytecode in a random format; the implementation of the LLVM interpreter (lli) is then adjusted according to the bitcode file format table; finally, the modified lli is output according to the target compiler structure, with the corresponding bitcode file embedded in its data segment, which completes the compilation.
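One common way to embed a file in the data segment is to turn it into a C array that is compiled together with the interpreter. The sketch below shows that approach; it is an assumption about how the embedding could be done, not the implementation described here, and the function name `emit_c_array` and the symbol name `protected_bitcode` are hypothetical. An initialized, non-const global array is typically placed in the data segment by C compilers.

```python
def emit_c_array(blob, symbol="protected_bitcode", per_line=12):
    """Render a bytes object as C source defining an initialized global array."""
    if not blob:
        raise ValueError("cannot embed an empty bitcode file")
    rows = []
    for start in range(0, len(blob), per_line):
        chunk = blob[start:start + per_line]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        f"unsigned char {symbol}[] = {{\n{body}\n}};\n"
        f"unsigned long {symbol}_len = {len(blob)}UL;\n"
    )


if __name__ == "__main__":
    # Four sample bytes standing in for the start of an embedded file.
    print(emit_c_array(b"BC\xc0\xde"))
```

The interpreter would then read its input from the array and length symbols instead of from a file on disk.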
Further, in step 103, modifying the bitcode generation rule of the LLVM means adding a bitcode file format table to the original rules and scrambling the LLVM bitcode file format according to that table.
Further, in step 103, the bitcode file format is specified by a plurality of mapping relationships, and the file format can be extended according to the following rules:
(1) The bitcode file format is specified by establishing, through a value function, a mapping relationship between the function declaration table and the parameter table;
the function declarations and the parameter table are scrambled, and the correspondence between them is re-established through the mapping.
(2) The bitcode file format adds new virtual instructions;
the content of the instruction list is modified and new virtual instructions are added. For example, the load instruction corresponds to a new vload instruction; each new virtual instruction has a unique mapping to its original instruction, the bytecode of the new vload instruction is generated from a random factor, and the vload instruction is restored to the intermediate representation of the real load instruction before interpretation and execution. The modified interpreter execution logic is shown in FIG. 2, where the virtual instruction refers to a new virtual instruction such as vload.
(3) Blocks containing symbols and constants are lightly encrypted by simple means, so that less information is exposed; likewise, these blocks need to be decrypted before interpretation and execution.
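Rule (2) can be modeled as a randomly generated, one-to-one table between virtual instructions and byte codes. The following is a minimal sketch, assuming a one-byte encoding per instruction; the instruction list, the seed-based random factor, and the names `build_format_table`, `encode`, and `restore` are hypothetical and do not reflect LLVM's real bitcode encoding.

```python
import random

# Illustrative instruction names only; these are not LLVM's real opcode numbers.
ORIGINAL_INSTRUCTIONS = ["add", "br", "load", "ret", "store"]


def build_format_table(seed):
    """Map each virtual instruction (vload, vstore, ...) to a unique random byte."""
    rng = random.Random(seed)
    codes = rng.sample(range(256), len(ORIGINAL_INSTRUCTIONS))
    return {"v" + name: code for name, code in zip(ORIGINAL_INSTRUCTIONS, codes)}


def encode(instructions, table):
    """Writer side: emit the random byte code of each instruction's virtual form."""
    return bytes(table["v" + name] for name in instructions)


def restore(blob, table):
    """Interpreter side: map byte codes back to the real instructions before execution."""
    inverse = {code: vname[1:] for vname, code in table.items()}
    return [inverse[b] for b in blob]


if __name__ == "__main__":
    table = build_format_table(seed=7)
    program = ["load", "add", "store", "ret"]
    blob = encode(program, table)
    print(restore(blob, table) == program)
```

The same table has to be available to both the modified bitcode writer and the modified interpreter, which is why the interpreter is rebuilt for each format.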
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (4)
1. A cross-platform source code virtualization protection method based on an intermediate language interpreter is characterized by comprising the following steps:
step 101: converting the source code into an intermediate language to generate an original intermediate file;
step 102: performing obfuscation processing on an original intermediate language to generate an obfuscated intermediate file, where the performing obfuscation processing on the original intermediate language includes: modifying the condition judgment of the forged control flow in the LLVM to perform the obfuscation processing;
step 103: modifying a bitcode generation rule of the LLVM according to a bitcode file format table of the LLVM, and converting the obfuscated intermediate file into a bitcode file in a random format, wherein the modification of the bitcode generation rule of the LLVM according to the bitcode file format table of the LLVM is realized by adding a bitcode file format table on the original basis and obfuscating the LLVM bitcode file format according to the bitcode file format table;
step 104: modifying the original bytecode interpreter according to the converted bitcode file in the random format to obtain a corresponding bytecode interpreter;
step 105: outputting the modified bytecode interpreter according to a target compiler structure, embedding the bytecode corresponding to the modified bytecode interpreter into the data segment, completing compilation and generating a protected executable file.
2. The method for cross-platform source code virtualization protection based on the intermediate language interpreter as claimed in claim 1, wherein: in step 103, the bitcode file format is specified by a plurality of mapping relationships.
3. The method for cross-platform source code virtualization protection based on the intermediate language interpreter as claimed in claim 2, wherein: in step 103, the bitcode file format is specified by establishing a mapping relationship between the function declaration table and the parameter table by using a value function.
4. The method for cross-platform source code virtualization protection based on the intermediate language interpreter as claimed in claim 2, wherein: in step 103, the bitcode file format is specified by adding a new virtual instruction and by the mapping relationship between the new virtual instruction and the original instruction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811410412.7A CN109543368B (en) | 2018-11-23 | 2018-11-23 | Cross-platform source code virtualization protection method based on intermediate language interpreter |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109543368A (en) | 2019-03-29 |
CN109543368B (en) | 2021-08-17 |
Family
ID=65849541
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201811410412.7A | Active | CN109543368B (en) | 2018-11-23 | 2018-11-23 | Cross-platform source code virtualization protection method based on intermediate language interpreter |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109543368B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245468B (en) * | 2019-05-24 | 2023-06-16 | 创新先进技术有限公司 | Compiling method, compiling device, compiling equipment and compiling system |
CN110929234B (en) * | 2019-11-28 | 2023-03-24 | 施羊梦燊 | Python program encryption protection system and method based on code virtualization |
CN112328258A (en) * | 2020-11-26 | 2021-02-05 | 中国科学院计算机网络信息中心 | Method and system for converting programming language |
CN114327789A (en) * | 2022-01-07 | 2022-04-12 | 北京大学 | Virtual machine protection method and system based on LLVM |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106960141A (en) * | 2016-01-08 | 2017-07-18 | 阿里巴巴集团控股有限公司 | Coding, coding/decoding method and the device of virtual machine instructions, virtual machine protection system |
CN107545174A (en) * | 2017-08-22 | 2018-01-05 | 武汉大学 | A kind of system and method for resisting controlling stream abduction based on LLVM |
Also Published As
Publication number | Publication date |
---|---|
CN109543368A (en) | 2019-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109543368B (en) | Cross-platform source code virtualization protection method based on intermediate language interpreter | |
CN108345773B (en) | Code protection method and device based on virtual machine, electronic equipment and storage medium | |
CN108614960B (en) | JavaScript virtualization protection method based on front-end byte code technology | |
CN106462677B (en) | Method and device for protecting software project | |
CN103544414B (en) | A kind of degree of depth Code obfuscation method of android system application | |
US8589897B2 (en) | System and method for branch extraction obfuscation | |
CN104091100A (en) | Software protection method based on intermediate result compiling | |
US8874928B2 (en) | System and method for obfuscating constants in a computer program | |
JP6429785B2 (en) | Compiler without operators | |
CN106682460B (en) | It is a kind of based on the Code obfuscation method converted twice | |
US20130125243A1 (en) | Method for preventing software reverse engineering, unauthorized modification, and runtime data interception | |
CN107908933A (en) | A kind of character string encryption method based on intermediate language | |
CN108710787B (en) | Code obfuscation method and apparatus, computing device, computer storage medium | |
KR101861341B1 (en) | Deobfuscation apparatus of application code and method of deobfuscating application code using the same | |
CN109948308A (en) | Code security guard method, device, electronic equipment and computer readable storage medium | |
CN110569628B (en) | Code obfuscation method and device, computer device and storage medium | |
CN105930694A (en) | Flexible Instruction Sets For Obfuscated Virtual Machines | |
CN107632832B (en) | Dalvik byte code oriented control flow confusion method | |
CN106960140A (en) | Virtual machine instructions obscure method and device, virtual machine protection system | |
KR101356676B1 (en) | Translating expressions in a computing environment | |
CN110245467A (en) | Android application program guard method based on Dex2C and LLVM | |
CN103514027B (en) | Method for enhancing usability of software protection | |
CN103678961A (en) | Code dynamic generating method | |
EP2937803B1 (en) | Control flow flattening for code obfuscation where the next block calculation needs run-time information | |
CN108932407B (en) | Program safety protection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||