CN112115427B - Code confusion method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112115427B
Authority
CN
China
Prior art keywords
code block
code
address
instruction
code blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010819524.9A
Other languages
Chinese (zh)
Other versions
CN112115427A (en
Inventor
兰丽
蒲志明
夏冰
于大鹏
高迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202010819524.9A
Publication of CN112115427A
Application granted
Publication of CN112115427B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/14 Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of the invention provide a code obfuscation method, a device, electronic equipment and a storage medium. The method comprises: determining basic code blocks in a function according to the control flow of the function in a target program; dividing the basic code blocks to obtain sub-code blocks; and converting the target address of an unconditional jump instruction in a code block into an address that is determined when the target program runs, wherein the code blocks comprise the sub-code blocks, or the sub-code blocks together with basic code blocks that have not been divided. Because the target address of the unconditional jump instruction is only determined at run time, the direct jump relationship between the code block containing the unconditional jump instruction and the code block it jumps to is cut off, which increases the difficulty of reverse analysis.

Description

Code confusion method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer security technologies, and in particular to a code obfuscation method, a code obfuscation device, an electronic device, and a storage medium.
Background
With the development of information science and technology, software systems bring convenience to users, but their security is also under serious threat. Through reverse-engineering means such as decompilation, disassembly and dynamic debugging, an attacker can easily obtain users' private information, core algorithms, key business processes and even the source code contained in the software. This causes significant losses to the protection of an enterprise's software intellectual property.
To effectively resist reverse analysis of software by attackers, software developers have proposed protection technologies such as software encryption, code obfuscation, software watermarking and tamper resistance. Code obfuscation is one of the key technologies for guaranteeing software security: without changing the semantics of the original program, it converts the source code and internal structural logic of a program into a form that is more difficult to analyze and modify, which greatly increases the attacker's cost of reverse analysis.
Control flow obfuscation is a mature and key technique within code obfuscation. It changes or complicates the control flow of the original program to hide the program's real execution logic, which makes it harder for a cracker to analyze and reconstruct the control flow and thereby protects the source code. The control flow obfuscation techniques that are currently studied most include:
(1) Opaque predicates
Opaque predicates that are always true, always false, or sometimes true and sometimes false, and whose values are difficult to infer from the expression itself, are added to the basic blocks of a control flow graph. This obscures the real execution flow of the basic blocks and makes the control flow more complex.
(2) Control flow flattening
The easily recognizable conditional and loop structures in a function's control flow graph are destroyed, and the readable code flow is reorganized into a switch-case style execution flow that is difficult to understand.
(3) Inserting bogus control flows
Using opaque predicates, redundant control flows are inserted into the original control flow, which increases its complexity and makes it harder for an attacker to reconstruct the original control flow.
Programs obfuscated with the above control flow obfuscation methods have obvious characteristics that are easily noticed by an attacker, who can then break the obfuscation with existing reverse-engineering techniques and restore the code's control flow. For example, the commonly used opaque predicates are a limited set of relatively complex mathematical expressions, which can be collected, catalogued and filtered out directly during reverse analysis. As another example, a program that has undergone control flow flattening has an obvious switch-case structure, and the execution order of the code blocks can be recorded during dynamic debugging to reconstruct the control flow.
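For reference, an opaque predicate of the kind described in item (1) can be sketched as follows. This is only an illustration under stated assumptions: the function names `opaquely_true` and `guarded` are hypothetical, and the particular expression (the product of two consecutive integers is always even) is a textbook example rather than one taken from this patent.

```python
def opaquely_true(x: int) -> bool:
    # x * (x + 1) is the product of two consecutive integers, so it is
    # always even; the predicate is therefore true for every integer x.
    return (x * (x + 1)) % 2 == 0


def guarded(x: int) -> str:
    # The branch looks data-dependent, but only one side can ever run.
    if opaquely_true(x):
        return "real block"    # always taken
    return "bogus block"       # never taken; present only to mislead analysis
```

Because such expressions come from a small, well-known family, they can be recognized by pattern, which is the weakness noted above.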
Disclosure of Invention
Embodiments of the invention provide a code obfuscation method, a device, electronic equipment and a storage medium, to overcome the defect of prior-art code obfuscation methods that the obfuscated program has obvious characteristics and is therefore easily discovered and cracked by an attacker.
An embodiment of a first aspect of the present invention provides a code obfuscation method, including:
determining basic code blocks in a function according to the control flow of the function in a target program;
dividing the basic code blocks to obtain sub-code blocks;
converting the target address of an unconditional jump instruction in a code block into an address determined when the target program runs; wherein the code blocks comprise the sub-code blocks, or the sub-code blocks together with basic code blocks that have not been divided.
In the above technical solution, after the step of dividing the basic code blocks to obtain sub-code blocks, the method further includes:
shuffling the arrangement order of the code blocks of each function in the target program within the function to which they belong.
In the above technical solution, converting the target address of the unconditional jump instruction in the code block into an address determined when the target program runs specifically includes:
inserting an address calculation code block between a first code block that has an unconditional jump instruction and a second code block that is the jump target of that instruction, and changing the jump target of the unconditional jump instruction to the address calculation code block; wherein the address calculation code block is configured to dynamically calculate the address of the second code block at run time.
In the above technical solution, inserting the address calculation code block between the first code block and the second code block specifically includes:
inserting an address calculation code block between the first code block and the second code block;
calculating the address offset between the address calculation code block and the second code block according to the address information of the address calculation code block and the address information of the second code block;
modifying the jump address of the unconditional jump instruction in the address calculation code block into an address calculation formula, wherein the address calculation formula comprises: the address information of the address calculation code block itself, and the address offset between the address calculation code block and the second code block.
In the above technical solution, the method further includes:
replacing direct call instructions to system functions in the target program with indirect call instructions to the system functions.
In the above technical solution, replacing the direct call instruction to the system function in the target program with the indirect call instruction to the system function specifically includes:
generating an indirect call instruction according to the real address of the system function and the dynamic link library, wherein the indirect call instruction obtains the real address of the system function to be executed by resolving the function address in the dynamic link library;
and replacing the direct call instruction to the system function in the target program with the indirect call instruction to the system function.
In the above technical solution, dividing the basic code blocks according to a second preset rule to obtain sub-code blocks specifically includes:
judging whether a basic code block satisfies the second preset rule, and when it does, dividing the basic code block to obtain a first division result;
judging whether a jump instruction exists at the tail of the first division result, and when none exists, adding a jump instruction at the tail of the first division result to obtain a sub-code block; the added jump instruction jumps to the instruction that, in the basic code block, immediately follows the last instruction of the first division result.
An embodiment of a second aspect of the present invention provides a code obfuscation apparatus, including:
a basic code block determining module, configured to determine basic code blocks in a function according to the control flow of the function in a target program;
a sub-code block generation module, configured to divide the basic code blocks to obtain sub-code blocks;
an instruction conversion module, configured to convert the target address of an unconditional jump instruction in a code block into an address determined when the target program runs; wherein the code blocks comprise the sub-code blocks, or the sub-code blocks together with basic code blocks that have not been divided.
An embodiment of a third aspect of the present invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the code obfuscation method according to the embodiment of the first aspect of the present invention when executing the program.
An embodiment of a fourth aspect of the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the code obfuscation method according to the embodiment of the first aspect of the present invention.
With the code obfuscation method, device, electronic equipment and storage medium provided by the embodiments of the invention, the target address of the unconditional jump instruction in a code block is converted into an address determined when the target program runs. This cuts off the direct jump relationship between the code block containing the unconditional jump instruction and the code block it jumps to, and increases the difficulty of reverse analysis.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a code obfuscation method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of inserting an address calculation code block between a code block with an unconditional jump instruction and the code block to be jumped to;
FIG. 3 is the control flow graph of a function as viewed with the reverse-engineering tool IDA before the function is obfuscated;
FIG. 4 is the control flow graph of the function of FIG. 3 as viewed with the reverse-engineering tool IDA after the function has been obfuscated;
FIG. 5 is a schematic diagram of a code obfuscation apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
FIG. 1 is a flowchart of a code obfuscation method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step 101, determining basic code blocks in a function according to the control flow of the function in the target program.
In the embodiment of the invention, the target program is the program to be obfuscated with the code obfuscation method provided by the embodiment. Specifically, the functions in the target program are the objects of the obfuscation operation, i.e., the code within the functions contained in the target program is what needs to be obfuscated.
Those skilled in the art will appreciate that, in the computer field, a function is a fixed program segment that can independently perform a specified task. The basic constituent unit of a function is the instruction, and executing an instruction performs a particular step. Basic code blocks lie between functions and instructions in granularity: a basic code block includes a plurality of instructions, and a function may include one or more basic code blocks.
The execution logic inside the target program is called the control flow. The control flow reflects the calling relationships among functions and the execution flow of the instructions within each function. As an alternative implementation, the control flow of a function of the target program may be represented by a control flow graph.
In this embodiment the control flow is obtained by prior analysis. In other embodiments, the target program can be analyzed with the tools and libraries provided by LLVM to obtain the control flow of its functions: LLVM compiles the source code of the target program into an intermediate code file, the intermediate code file is analyzed, and the functions in it are traversed to obtain their control flow.
As an alternative implementation, the tools and libraries provided by LLVM can be used to analyze the control flow of a function and to replace all switch-case structures in the function with if-else structures, which yields the basic code blocks of the function's original control flow. Because switch-case structures are easily exploited by an attacker for reverse analysis of the target program, all switch-case structures in the function need to be replaced with if-else structures.
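The idea of replacing a switch-case structure with an if-else chain can be sketched in a few lines. This is a simplified model, not LLVM code: the names `lower_switch`, `run_if_else_chain`, `cases` and `chain` are hypothetical, and a switch is modeled as a table from case constants to branch labels.

```python
from typing import Dict, List, Tuple


def lower_switch(cases: Dict[int, str]) -> List[Tuple[int, str]]:
    # Fix an order for the comparisons; each entry stands for one
    # "if value == const: goto label" test in the resulting chain.
    return sorted(cases.items())


def run_if_else_chain(chain: List[Tuple[int, str]], default: str, value: int) -> str:
    # Evaluate the tests one after another, as an if / else-if chain would.
    for const, label in chain:
        if value == const:
            return label
    return default  # no test matched: the final "else" branch


cases = {3: "case_three", 1: "case_one", 7: "case_seven"}
chain = lower_switch(cases)
```

After lowering, every test ends in an ordinary conditional jump, so the function contains no jump-table dispatch and can be divided into basic code blocks in the usual way.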
The function is then further divided according to the control flow of the target program to obtain basic code blocks. In doing so, the basic code blocks can be separated out of the function according to the control flow of the function in the target program and a first preset rule.
The first preset rule determines the start instruction of a basic code block. An instruction is a start instruction if it is any one of the following three items:
a) the entry instruction of the function;
b) the target instruction of a jump instruction;
c) the instruction immediately following a jump instruction, where that instruction is not itself the target of a jump instruction.
After the start instructions are determined, all instructions from one start instruction up to, but not including, the next start instruction in the same function form one basic code block.
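The first preset rule can be sketched as follows. This is a minimal model under stated assumptions: instructions are held in a list, a jump records the index of its target, and the names `Instr`, `find_leaders`, `split_basic_blocks` and `code` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    jump_target: Optional[int] = None  # index of the target instruction, if this is a jump


def find_leaders(instrs: List[Instr]) -> List[int]:
    leaders = set()
    if instrs:
        leaders.add(0)                       # rule a): entry instruction of the function
    for i, ins in enumerate(instrs):
        if ins.jump_target is not None:
            leaders.add(ins.jump_target)     # rule b): target of a jump instruction
            if i + 1 < len(instrs):
                leaders.add(i + 1)           # rule c): instruction following a jump
    return sorted(leaders)


def split_basic_blocks(instrs: List[Instr]) -> List[List[Instr]]:
    # Each block runs from one start instruction up to, but not including, the next.
    bounds = find_leaders(instrs) + [len(instrs)]
    return [instrs[s:e] for s, e in zip(bounds, bounds[1:])]


code = [
    Instr("x = 0"),                           # 0
    Instr("if x > 9 goto 5", jump_target=5),  # 1
    Instr("x = x + 1"),                       # 2
    Instr("goto 1", jump_target=1),           # 3
    Instr("nop"),                             # 4
    Instr("return x"),                        # 5
]
```

Using a set means an instruction that qualifies under both rule b) and rule c) is counted only once.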
It should be noted that, if a function includes a switch-case structure, then once that structure has been replaced, the basic code blocks obtained by further dividing the function no longer contain any switch-case structure.
Step 102, dividing the basic code blocks to obtain sub-code blocks.
To further increase the difficulty of reverse analysis, in the embodiment of the invention the basic code blocks can be further divided to obtain sub-code blocks.
A sub-code block is a portion of a basic code block and includes a plurality of instructions. Before a basic code block is divided, it is first determined whether it satisfies a second preset rule, which describes the condition for dividing a basic code block. For example, one implementation of the second preset rule is a preset number of instructions, such as 5. In that case, determining whether a basic code block satisfies the second preset rule means checking whether its number of instructions reaches 5: if so, the basic code block satisfies the rule and can be divided further; if it has fewer than 5 instructions, it cannot be divided. The content of the second preset rule is not limited to a preset number of instructions and can be determined according to actual needs.
The second preset rule can be used not only to determine whether a basic code block can be divided but also to guide the division. When a basic code block is divided, it can be divided according to the second preset rule (for example, a preset number of instructions) to obtain sub-code blocks. For example, if every 5 instructions are preset to form one sub-code block, then each run of 5 consecutive instructions in the basic code block is split off as a unit to obtain the corresponding sub-code block. If, after some division, fewer than 5 instructions remain, the remaining instructions form a sub-code block on their own.
When basic code blocks are generated, jump instructions serve as the dividing boundaries, so the logical relationships between the resulting basic code blocks are not broken. When sub-code blocks are generated, however, most sub-code blocks (all except a few, such as the one at the tail of the basic code block) do not end with an instruction, such as a jump instruction, that expresses an explicit logical relationship to other sub-code blocks. Therefore, when a basic code block is divided into sub-code blocks, a jump instruction needs to be added at the tail of each such sub-code block to specify which sub-code block is executed next.
For example, the code of one basic code block in a certain function is as follows:
label %3:
%4 = alloca i8*, align 8
%5 = alloca i32*, align 8
%6 = alloca i32*, align 4
%7 = alloca i32*, align 4
%8 = alloca i32*, align 4
store i8* %0, i8** %4, align 8
store i32* %1, i32** %5, align 8
store i32* %2, i32** %6, align 4
store i32* %0, i32** %7, align 4
store i32* %0, i32** %8, align 4
br label %9
The basic code block is divided according to a certain rule, and the following sub-code blocks can be obtained:
Sub-code block 1
label %3:
%4 = alloca i8*, align 8
%5 = alloca i32*, align 8
%6 = alloca i32*, align 4
br label %.split
Sub-code block 2
label %.split:
%7 = alloca i32*, align 4
%8 = alloca i32*, align 4
br label %.split.split
Sub-code block 3
label %.split.split:
store i8* %0, i8** %4, align 8
br label %.split.split.split
Sub-code block 4
label %.split.split.split:
store i32* %1, i32** %5, align 8
store i32* %2, i32** %6, align 4
store i32* %0, i32** %7, align 4
store i32* %0, i32** %8, align 4
br label %9
As the example shows, a jump instruction has been added at the tail of each of sub-code blocks 1 to 3, which were newly divided out of the basic code block; each added jump instruction transfers control to the next sub-code block.
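The division step can be sketched as follows for the case where the second preset rule is a fixed instruction count. This is a simplified model: instructions are plain strings, and the names `MIN_INSTRS`, `CHUNK`, `is_jump` and `split_block`, the value 5, and the `.splitN` label scheme are all assumptions made for illustration. Unlike the example above, which uses uneven pieces, this sketch always cuts fixed-size pieces.

```python
from typing import List, Tuple

MIN_INSTRS = 5  # assumed threshold: blocks with fewer instructions are left undivided
CHUNK = 5       # assumed number of instructions per sub-code block


def is_jump(instr: str) -> bool:
    return instr.startswith("br ")


def split_block(label: str, instrs: List[str], chunk: int = CHUNK) -> List[Tuple[str, List[str]]]:
    if len(instrs) < MIN_INSTRS:
        return [(label, list(instrs))]        # does not satisfy the rule: keep as is
    pieces = [instrs[i:i + chunk] for i in range(0, len(instrs), chunk)]
    labels = [label] + [f"{label}.split{i}" for i in range(1, len(pieces))]
    result = []
    for i, piece in enumerate(pieces):
        body = list(piece)
        # A piece that is not the last one and has no jump at its tail
        # gets a jump to the sub-code block that follows it.
        if i + 1 < len(pieces) and not is_jump(body[-1]):
            body.append(f"br label %{labels[i + 1]}")
        result.append((labels[i], body))
    return result


block = [f"op{i}" for i in range(7)] + ["br label %bb9"]
subs = split_block("bb3", block)
```

Here the eight-instruction block becomes two sub-code blocks: the first ends with an added jump to `bb3.split1`, and the second keeps the original jump to `bb9`.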
Step 103, converting the target address of an unconditional jump instruction in a code block into an address determined when the target program runs.
After the previous step, each function in the target program contains sub-code blocks. In some cases, certain basic code blocks in a function do not satisfy the division condition, so the function also contains basic code blocks that have not been divided. The sub-code blocks and the undivided basic code blocks (if any) are collectively referred to as code blocks.
Converting the target address of the unconditional jump instruction in the code block into an address determined when the target program runs specifically includes:
inserting an address calculation code block between a first code block that has an unconditional jump instruction and a second code block that is the jump target of that instruction, and changing the jump target of the unconditional jump instruction to the address calculation code block; the address calculation code block is configured to dynamically calculate the address of the second code block at run time.
Jump instructions include conditional jump instructions and unconditional jump instructions. An unconditional jump instruction, also known as a direct jump instruction, performs the jump without evaluating any condition. Because jump instructions clearly reveal the execution order of the target program, they need to be hidden in order to increase the difficulty of reverse analysis.
Specifically, an address calculation code block is inserted between the first code block, which has the unconditional jump instruction, and the second code block, which is its jump target. At the same time, the target address of the unconditional jump instruction in the first code block is changed from the second code block to the address calculation code block, which cuts off the direct control flow relationship between the first and second code blocks. The address calculation code block contains an unconditional jump instruction whose jump address is an address calculation formula comprising the address information of the address calculation code block itself and the address offset between the address calculation code block and the second code block. This address offset can be calculated in advance from the address information of the address calculation code block and of the second code block.
When the target program runs, the concrete value of the address of the address calculation code block becomes known. Combining it with the address offset between the address calculation code block and the second code block yields the address of the second code block, and the jump to the second code block is then performed.
This operation cuts off the direct jump relationship between the first code block and the second code block, so that the address of the second code block can only be calculated dynamically when the target program runs, which increases the difficulty of reverse analysis.
For example, FIG. 2 is a schematic diagram of inserting an address calculation code block between a code block with an unconditional jump instruction and the code block to be jumped to. In the embodiment shown in FIG. 2, assume the target program contains code block A and code block B; the leftmost part of FIG. 2 depicts their original execution flow, in which code block A jumps directly to code block B. Code block A contains the pseudo jump instruction Jump B (in practice this may be any of various instructions such as Jump, mov, bx, jne and the like). According to the embodiment of the present invention, a code block T is first added between code block A and code block B. Code block T contains only a single instruction that jumps to code block B, and the instruction Jump B in code block A is changed to Jump T, which jumps to code block T. The middle part of FIG. 2 depicts code block T added between code block A and code block B. Next, the address offset between code block B and code block T is calculated and denoted offset. The content of code block T is then modified: the address of T is obtained and the direct jump is changed into an indirect jump, i.e., the instruction Jump B in code block T is replaced with Jump T+offset. The direct jump relationship between code block A and code block B is thereby cut off, and the address of code block B can only be calculated dynamically at run time. The rightmost part of FIG. 2 depicts the address content of the modified code block T.
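The arithmetic behind Jump T+offset can be sketched with a small simulation. Everything here is an assumption made for illustration: the positions in `layout`, the name `build_address_block`, and the modeling of "run time" as an arbitrary load base added to every position.

```python
from typing import Callable, Dict, Tuple

# Hypothetical positions of the three code blocks inside the program image.
layout: Dict[str, int] = {"A": 0x100, "T": 0x140, "B": 0x200}


def build_address_block(layout: Dict[str, int], t: str, target: str) -> Tuple[int, Callable[[int], int]]:
    # The offset is fixed when the program is obfuscated.
    offset = layout[target] - layout[t]

    def address_block(runtime_addr_of_t: int) -> int:
        # Models "Jump T+offset": T's own address is only known at run time.
        return runtime_addr_of_t + offset

    return offset, address_block


offset, jump_via_t = build_address_block(layout, "T", "B")


def run_once(load_base: int) -> int:
    addr_t = load_base + layout["T"]  # code block A executes "Jump T"
    return jump_via_t(addr_t)         # code block T executes "Jump T+offset"
```

Whatever load base is chosen, `run_once` returns the run-time address of B, although neither A nor T contains that address as a literal.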
With the code obfuscation method provided by the embodiment of the invention, the target address of the unconditional jump instruction in a code block is converted into an address determined when the target program runs. This cuts off the direct jump relationship between the code block containing the unconditional jump instruction and the code block it jumps to, and increases the difficulty of reverse analysis.
Based on any of the above embodiments, in an embodiment of the present invention, after step 102 the method further includes:
shuffling the order of the code blocks within each function of the target program.
The code blocks inside each function are laid out in a linear order. To increase the difficulty of reverse analysis, in this embodiment the order of the code blocks inside the function is shuffled.
Specifically, in the embodiment of the present invention, a number can be assigned to each code block in the function, and a random array of code block numbers can be generated. For example, if a function has N code blocks, they are numbered from 1 to N in order. After numbering, a random array of size N is generated that contains the values 1 to N without duplicates; the values in the array are not sorted by magnitude but arranged in random order.
Once the random array is available, the code blocks are rearranged in the order given by the array, which destroys the original layout of the function. For example, suppose the numbers in a random array of size 10 are ordered 1, 3, 8, 5, 7, 4, 6, 2, 9, 10. When the code blocks are reordered, the code block numbered 1 is placed first in the function, the code block numbered 3 second, the code block numbered 8 third, and so on, until finally the code block numbered 10 is placed tenth.
It should be noted that, in the embodiment of the present invention, shuffling the order of the code blocks inside a function changes only the static layout order of the code blocks in the function. Because each code block contains a jump instruction, the dynamic execution logic of the whole function is unchanged, and the function still executes normally.
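The shuffling step, and the fact that it leaves execution order intact, can be sketched as follows. This is a simplified model: each code block is reduced to a label plus the label its tail jump targets, and the names `shuffle_layout`, `execute`, `blocks` and `shuffled` are hypothetical. Starting execution from a named entry label, wherever that block ends up, is a simplification of this sketch.

```python
import random
from typing import List, Optional, Tuple

Block = Tuple[str, Optional[str]]  # (label, label of the block jumped to, or None at the end)


def shuffle_layout(blocks: List[Block], seed: Optional[int] = None) -> List[Block]:
    rng = random.Random(seed)
    order = list(range(1, len(blocks) + 1))  # code block numbers 1..N, no duplicates
    rng.shuffle(order)                       # the "random array"
    return [blocks[n - 1] for n in order]    # place blocks in the array's order


def execute(layout: List[Block], entry: str) -> List[str]:
    # Follow the jump at the tail of each block, ignoring static position.
    next_of = {label: nxt for label, nxt in layout}
    trace, cur = [], entry
    while cur is not None:
        trace.append(cur)
        cur = next_of[cur]
    return trace


blocks: List[Block] = [("b1", "b2"), ("b2", "b3"), ("b3", "b4"), ("b4", None)]
shuffled = shuffle_layout(blocks, seed=7)
```

Because `execute` follows only the jump targets, the trace for the shuffled layout is identical to the trace for the original one.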
The target program comprises a plurality of functions, and the order of the code blocks in each function can be shuffled as described above.
With the code obfuscation method provided by the embodiment of the invention, basic code blocks are obtained from the functions contained in the target program and divided into sub-code blocks, and the undivided basic code blocks and the sub-code blocks in each function are then arranged in shuffled order. This changes the content layout of each function in the target program and increases the difficulty of reverse analysis.
Based on any of the foregoing embodiments, in an embodiment of the present invention, the method further includes:
and replacing the direct call instruction to the system function in the target function with the indirect call instruction to the system function.
In the embodiment of the invention, a system function refers to a library function specified by the C language standard, such as a function in glibc. Calls to system functions tend to expose the locations of certain critical code, so it is desirable to hide the system functions used in the target program.
In a specific implementation, the direct call instruction to the system function in the target program is replaced by an indirect call instruction to the system function, and the indirect call instruction obtains the real address of the system function to be executed by resolving the function address in the dynamic link library. For example, under Linux, the address of the system function to be executed is resolved by means of the dlsym function, so that the system function is brought in; under Windows, the address of the system function to be executed is obtained by means of the GetProcAddress function, so that the system function is brought in.
For example, in one embodiment, the functions in the target program are first traversed, and each system function appearing in a function call is identified and recorded as sysFunc; next, the system function name is encrypted to obtain the encrypted system function name encStr; then, all system function call instructions in the target program are converted into indirect call instructions, which specifically comprises: loading the system function to be called with dlsym, the name of the system function being decrypted at run time, and recording the resulting function as indirectCall; finally, indirectCall replaces the original system function in the target function.
The relevant code for this embodiment is as follows:
Before the system function is replaced:
sysFunc(…);
After the system function is replaced:
encStr = encode("sysFunc");
indirectCall = dlsym((void*)0, decode(encStr));
indirectCall(…);
After this operation, the calls to system functions in the target program are hidden, so that an attacker cannot find out the system functions in the target program through reverse analysis.
In the above-described embodiment, as a preferred implementation, the system function name is encrypted at the same time as the direct call instruction to the system function is replaced with the indirect call instruction to the system function. Doing so further strengthens the obfuscation of the system functions. In other embodiments, the operation of encrypting the system function name may be omitted according to actual needs.
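The replacement described in this embodiment can be sketched as compilable C for a POSIX system. Several details are assumptions made for illustration and do not come from the patent: `strlen` stands in for sysFunc, a single-byte XOR with the hypothetical key `NAME_KEY` stands in for the unspecified encode/decode routines, and the handle is obtained with `dlopen(NULL, RTLD_LAZY)` rather than the literal `(void*)0` used above. On older glibc versions the program may need to be linked with `-ldl`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

#define NAME_KEY 0x5A /* hypothetical single-byte XOR key (assumption) */

/* XOR is its own inverse, so one routine both encodes and decodes. */
void xor_name(char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= NAME_KEY;
}

typedef size_t (*strlen_fn)(const char *);

/* Indirect replacement for a direct call to strlen: the function name is
 * stored encoded, decoded at run time, and resolved through dlsym.
 * Returns (size_t)-1 if the symbol cannot be resolved. */
size_t indirect_strlen(const char *text)
{
    /* "strlen" with every character XORed with NAME_KEY at compile time. */
    char name[] = {
        's' ^ NAME_KEY, 't' ^ NAME_KEY, 'r' ^ NAME_KEY,
        'l' ^ NAME_KEY, 'e' ^ NAME_KEY, 'n' ^ NAME_KEY, '\0'
    };
    xor_name(name, sizeof name - 1); /* decode in place */

    /* A NULL file name yields a handle for the running program and the
     * libraries it was loaded with, including the C library. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return (size_t)-1;

    strlen_fn fn = NULL;
    *(void **)(&fn) = dlsym(handle, name); /* POSIX-style pointer conversion */

    size_t result = (fn != NULL) ? fn(text) : (size_t)-1;
    dlclose(handle);
    return result;
}
```

A call such as `indirect_strlen("hello")` is expected to behave like `strlen("hello")` on a dynamically linked program, while the call site no longer references `strlen` directly. A real obfuscator would presumably use a stronger name encoding than a fixed XOR key.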
According to the code obfuscation method provided by the embodiment of the invention, the calls to system functions in the target program are hidden, so that an attacker cannot find out the system functions in the target program through reverse analysis. Static analysis is thereby effectively hindered, and copying of or tampering with the software is delayed.
In order to illustrate the technical effects of the code confusion method provided by the embodiment of the invention, control flow diagrams of the target program before and after code confusion are compared through the attached drawings.
FIG. 3 shows the control flow graph of a function before obfuscation, as viewed in the reverse-engineering tool IDA. From this control flow graph, an attacker can clearly see the layout and execution flow of the program. After the code obfuscation method provided by the embodiment of the invention is applied, the control flow of the function as viewed in IDA is as shown in FIG. 4: the number of code blocks has increased and many code blocks appear unrelated to one another, so an attacker can hardly read the code or directly analyze the execution flow of the program.
Fig. 5 is a schematic diagram of a code confusion apparatus according to an embodiment of the present invention, and as shown in fig. 5, the code confusion apparatus according to an embodiment of the present invention includes:
A basic code block determining module 501, configured to determine a basic code block in a function according to a control flow trend of the function in a target program;
A sub-code block generating module 502, configured to divide the basic code block to obtain a sub-code block;
an instruction conversion module 503, configured to convert a target address of an unconditional jump instruction in a code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks or sub-code blocks and basic code blocks which are not segmented.
The code obfuscation device provided by the embodiment of the invention converts the target address of the unconditional jump instruction in a code block into an address determined when the target program runs, thereby cutting off the direct jump relation between the code block containing the unconditional jump instruction and the code block that is the jump target, which increases the difficulty of reverse analysis.
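The idea of a jump target that is only determined at run time can be modeled in portable C. The sketch below is a simplification under stated assumptions, not the instruction-level transformation the patent applies: slots of a hypothetical `layout` table stand in for code addresses, function calls stand in for jumps, and all names (`first_block`, `address_calc_block`, `second_block`, `decoy_block`, `TARGET_OFFSET`) are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*BlockFn)(int);

int first_block(int x);
int address_calc_block(int x);
int decoy_block(int x);
int second_block(int x);

/* Hypothetical static layout: a slot index plays the role of an address. */
enum { SLOT_FIRST, SLOT_ADDR_CALC, SLOT_DECOY, SLOT_SECOND, SLOT_COUNT };

static const BlockFn layout[SLOT_COUNT] = {
    first_block, address_calc_block, decoy_block, second_block
};

/* Offset between the address calculation block and the second block,
 * fixed when the obfuscation is applied. */
static const ptrdiff_t TARGET_OFFSET = SLOT_SECOND - SLOT_ADDR_CALC;

/* The block that is the real jump target. */
int second_block(int x) { return x * 2; }

/* An unrelated block that merely sits between the two in the layout. */
int decoy_block(int x) { return -x; }

/* Computes the target as "own address + offset" at run time. The volatile
 * read keeps the compiler from folding the sum into a constant target. */
int address_calc_block(int x)
{
    volatile ptrdiff_t self = SLOT_ADDR_CALC;
    ptrdiff_t target = self + TARGET_OFFSET;
    assert(target >= 0 && target < SLOT_COUNT);
    return layout[target](x);
}

/* Originally transferred control straight to second_block; now it only
 * names the address calculation block. */
int first_block(int x) { return address_calc_block(x + 1); }
```

In this model, `first_block` no longer mentions `second_block` at all, so the edge between the two does not appear as a direct reference. For example, `first_block(3)` reaches `second_block` with the value 4 and returns 8.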
Fig. 6 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may call logic instructions in the memory 630 to perform the following method: determining basic code blocks in a function according to the control flow trend of the function in a target program; dividing the basic code blocks to obtain sub code blocks; converting a target address of an unconditional jump instruction in a code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks, or comprise sub-code blocks and basic code blocks which are not segmented.
It should be noted that, in this embodiment, the electronic device may be a server, a PC, or other devices in the specific implementation, so long as the structure of the electronic device includes a processor 610, a communication interface 620, a memory 630, and a communication bus 640 as shown in fig. 6, where the processor 610, the communication interface 620, and the memory 630 complete communication with each other through the communication bus 640, and the processor 610 may call logic instructions in the memory 630 to execute the above method. The embodiment does not limit a specific implementation form of the electronic device.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Further, embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above-described method embodiments, for example comprising: determining basic code blocks in a function according to the control flow trend of the function in a target program; dividing the basic code blocks to obtain sub code blocks; converting a target address of an unconditional jump instruction in a code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks or sub-code blocks and basic code blocks which are not segmented.
In another aspect, embodiments of the present invention also provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method provided in the above embodiments, for example, including: determining basic code blocks in a function according to the control flow trend of the function in a target program; dividing the basic code blocks to obtain sub code blocks; converting a target address of an unconditional jump instruction in a code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks, or comprise sub-code blocks and basic code blocks which are not segmented.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A method of code obfuscation, comprising:
Determining basic code blocks in a function according to the control flow trend of the function in a target program;
dividing the basic code blocks to obtain sub code blocks;
converting a target address of an unconditional jump instruction in a code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks or sub-code blocks and basic code blocks which are not segmented;
after the step of dividing the basic code block to obtain the sub-code blocks, the method further includes:
shuffling the arrangement order of the code blocks of each function in the target program within the function to which they belong;
the method for converting the target address of the unconditional jump instruction in the code block into the address determined when the target program runs specifically comprises the following steps:
Inserting an address calculation code block between a first code block with an unconditional jump instruction and a second code block to be jumped according to the unconditional jump instruction, and changing a jump target of the unconditional jump instruction to the address calculation code block; wherein the address calculation code block is used for dynamically calculating the address of the second code block in the running process;
The inserting an address calculation code block between a first code block with an unconditional jump instruction and a second code block to be jumped according to the unconditional jump instruction specifically comprises:
Inserting an address calculation code block between the first code block and the second code block;
According to the address information of the address calculation code block and the address information of the second code block, calculating the address offset between the address calculation code block and the second code block;
modifying the address to be jumped to by the unconditional jump instruction in the address calculation code block into an address calculation formula, wherein the address calculation formula comprises: the address information of the address calculation code block itself, and the address offset between the address calculation code block and the second code block.
2. The code obfuscation method of claim 1, further comprising:
And replacing the direct call instruction to the system function in the target program with the indirect call instruction to the system function.
3. The code obfuscation method according to claim 2, wherein the replacing the direct call instruction to the system function in the target program with the indirect call instruction to the system function specifically includes:
Generating an indirect calling instruction according to the real address of the system function and the dynamic link library, wherein the indirect calling instruction obtains the real address of the system function to be executed by analyzing the form of the function address in the dynamic link library;
and replacing the direct call instruction to the system function in the target program with the indirect call instruction to the system function.
4. The code obfuscation method according to claim 1, wherein the dividing the basic code blocks according to a second preset rule to obtain sub-code blocks specifically includes:
judging whether the basic code block meets a preset rule, and dividing the basic code block to obtain a first division result when the basic code block meets the preset rule;
Judging whether a jump instruction exists at the tail of the first segmentation result, and adding the jump instruction at the tail of the first segmentation result when the jump instruction does not exist, so as to obtain a sub-code block; the jump instruction is used for jumping to the next instruction of the last instruction of the first segmentation result in the basic code block.
5. A method of code obfuscation, comprising:
Determining basic code blocks in a function according to the control flow trend of the function in a target program;
dividing the basic code blocks to obtain sub code blocks;
converting a target address of an unconditional jump instruction in a code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks or sub-code blocks and basic code blocks which are not segmented;
Replacing a direct call instruction to a system function in the target program with an indirect call instruction to the system function;
The replacing the direct call instruction to the system function in the target program with the indirect call instruction to the system function specifically comprises the following steps:
Generating an indirect calling instruction according to the real address of the system function and the dynamic link library, wherein the indirect calling instruction obtains the real address of the system function to be executed by analyzing the form of the function address in the dynamic link library;
and replacing the direct call instruction to the system function in the target program with the indirect call instruction to the system function.
6. A method of code obfuscation, comprising:
Determining basic code blocks in a function according to the control flow trend of the function in a target program;
dividing the basic code blocks to obtain sub code blocks;
converting a target address of an unconditional jump instruction in a code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks or sub-code blocks and basic code blocks which are not segmented;
The dividing the basic code block according to a second preset rule to obtain a sub-code block specifically includes:
judging whether the basic code block meets a preset rule, and dividing the basic code block to obtain a first division result when the basic code block meets the preset rule;
Judging whether a jump instruction exists at the tail of the first segmentation result, and adding the jump instruction at the tail of the first segmentation result when the jump instruction does not exist, so as to obtain a sub-code block; the jump instruction is used for jumping to the next instruction of the last instruction of the first segmentation result in the basic code block.
7. A code obfuscation apparatus, comprising:
the basic code block determining module is used for determining basic code blocks in the functions according to the control flow trend of the functions in the target program;
The sub-code block generation module is used for dividing the basic code blocks to obtain sub-code blocks;
The instruction conversion module is used for converting the target address of the unconditional jump instruction in the code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks or sub-code blocks and basic code blocks which are not segmented; the module is also used for shuffling the arrangement order of the code blocks of each function in the target program within the function to which they belong;
The instruction conversion module is specifically configured to insert an address calculation code block between a first code block having an unconditional jump instruction and a second code block to be jumped according to the unconditional jump instruction, and change a jump target of the unconditional jump instruction to the address calculation code block; wherein the address calculation code block is used for dynamically calculating the address of the second code block in the running process;
The instruction conversion module is also specifically configured to insert an address calculation code block between the first code block and the second code block;
According to the address information of the address calculation code block and the address information of the second code block, calculating the address offset between the address calculation code block and the second code block;
modifying the address to be jumped to by the unconditional jump instruction in the address calculation code block into an address calculation formula, wherein the address calculation formula comprises: the address information of the address calculation code block itself, and the address offset between the address calculation code block and the second code block.
8. A code obfuscation apparatus, comprising:
the basic code block determining module is used for determining basic code blocks in the functions according to the control flow trend of the functions in the target program;
The sub-code block generation module is used for dividing the basic code blocks to obtain sub-code blocks;
The instruction conversion module is used for converting the target address of the unconditional jump instruction in the code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks or sub-code blocks and basic code blocks which are not segmented; replacing a direct call instruction to a system function in the target program with an indirect call instruction to the system function;
the instruction conversion module is specifically used for generating an indirect call instruction according to the real address of the system function and the dynamic link library, and the indirect call instruction obtains the real address of the system function to be executed by analyzing the form of the function address in the dynamic link library; and replacing the direct call instruction to the system function in the target program with the indirect call instruction to the system function.
9. A code obfuscation apparatus, comprising:
the basic code block determining module is used for determining basic code blocks in the functions according to the control flow trend of the functions in the target program;
The sub-code block generation module is used for dividing the basic code blocks to obtain sub-code blocks;
the instruction conversion module is used for converting the target address of the unconditional jump instruction in the code block into an address determined when the target program runs; wherein the code blocks comprise sub-code blocks or sub-code blocks and basic code blocks which are not segmented;
The subcode block generation module is specifically configured to determine whether the basic code block meets a preset rule, and segment the basic code block when the basic code block meets the preset rule to obtain a first segmentation result; judging whether a jump instruction exists at the tail of the first segmentation result, and adding the jump instruction at the tail of the first segmentation result when the jump instruction does not exist, so as to obtain a sub-code block; the jump instruction is used for jumping to the next instruction of the last instruction of the first segmentation result in the basic code block.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the code obfuscation method according to any of claims 1-6 when the program is executed by the processor.
11. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the code obfuscation method according to any of claims 1 to 6.
CN202010819524.9A 2020-08-14 2020-08-14 Code confusion method, device, electronic equipment and storage medium Active CN112115427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010819524.9A CN112115427B (en) 2020-08-14 2020-08-14 Code confusion method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112115427A CN112115427A (en) 2020-12-22
CN112115427B true CN112115427B (en) 2024-05-31

Family

ID=73804123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010819524.9A Active CN112115427B (en) 2020-08-14 2020-08-14 Code confusion method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112115427B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113836545A (en) * 2021-08-20 2021-12-24 咪咕音乐有限公司 Code encryption method, device, equipment and storage medium
CN113569269B (en) * 2021-09-23 2022-12-27 苏州浪潮智能科技有限公司 Encryption method, device, equipment and readable medium for code obfuscation
CN114662063B (en) * 2022-04-22 2024-06-25 苏州浪潮智能科技有限公司 Method, device and medium for confusing codes
CN116956245A (en) * 2023-09-19 2023-10-27 安徽大学 Software watermark realization method and system based on control flow flattening confusion
CN117313047B (en) * 2023-11-28 2024-03-15 深圳润世华软件和信息技术服务有限公司 Source code confusion method, confusion reversal method, corresponding device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103547993A (en) * 2011-03-25 2014-01-29 索夫特机械公司 Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
CN108537012A (en) * 2018-02-12 2018-09-14 北京梆梆安全科技有限公司 Source code based on variable and code execution sequence obscures method and device
CN110688120A (en) * 2018-07-06 2020-01-14 武汉斗鱼网络科技有限公司 Method for jumping to designated module and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8918768B2 (en) * 2012-12-06 2014-12-23 Apple Inc. Methods and apparatus for correlation protected processing of data operations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research progress on code clone detection (代码克隆检测研究进展); 陈秋远; 李善平; 鄢萌; 夏鑫; Journal of Software (软件学报), No. 04; full text *

Also Published As

Publication number Publication date
CN112115427A (en) 2020-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant