CN117891463A - Code obfuscation method and device, electronic equipment and readable storage medium

Publication number
CN117891463A
Authority
CN
China
Legal status
Pending
Application number
CN202311616927.3A
Other languages
Chinese (zh)
Inventor
何岩峰
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd

Abstract

The embodiment of the invention provides a code obfuscation method, a code obfuscation device, electronic equipment and a readable storage medium, and relates to the technical field of computers. The method comprises the following steps: converting source code into an original syntax tree; acquiring node objects of the node type to be rewritten from the original nodes of the original syntax tree, wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, character string constant node type and local variable node type; rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree; and outputting the target syntax tree as obfuscated code. The embodiment of the invention processes each node object in a targeted manner according to its node type to be rewritten. Compared with obfuscation schemes in the related art that only replace names, this increases the difficulty of decompiling the code and thereby improves code security.

Description

Code obfuscation method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a code obfuscation method, a code obfuscation device, an electronic device, and a readable storage medium.
Background
Code obfuscation refers to transforming the logic of code so that it resists decompilation. Web page code, client code and the like generally contain a large amount of sensitive logic, such as logic for obtaining object information and logic for reporting object behavior. Obfuscating such code makes it difficult to decompile.
In the related art, the code obfuscation process includes: converting the source code into an abstract syntax tree (AST), which is a tree representation of the abstract syntax structure of the source code in which each node represents a structure in the source code (each node is a JSON object; JSON is a syntax for storing and exchanging text information); and then replacing the names in the AST nodes that contain variable names and function names, thereby completing the obfuscation.
However, once the dictionary used for name replacement is stolen, this scheme loses its obfuscating effect, and the risk of decompilation remains.
Disclosure of Invention
The invention provides a code obfuscation method, a code obfuscation device, electronic equipment and a readable storage medium, which are used to solve the technical problem that a decompilation risk remains in the prior art.
In a first aspect, the present invention provides a code obfuscation method, the method including:
converting the source code into an original syntax tree;
acquiring node objects of the node type to be rewritten from each original node of the original syntax tree; wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, character string constant node type and local variable node type;
rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree;
and outputting the target syntax tree as obfuscated code.
In a second aspect, the present invention provides a code obfuscation apparatus, the apparatus comprising:
a code conversion unit for converting the source code into an original syntax tree;
a node determining unit for acquiring node objects of the node type to be rewritten from each original node of the original syntax tree; wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, character string constant node type and local variable node type;
a node rewriting unit for rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree;
and a code output unit for outputting the target syntax tree as obfuscated code.
In a third aspect, the present invention provides an electronic device comprising: a processor, a memory and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the code obfuscation method described above when executing the program.
In a fourth aspect, the present invention provides a readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the above-described code obfuscation method.
In an embodiment of the present invention, the method includes: converting the source code into an original syntax tree; acquiring node objects of the node type to be rewritten from each original node of the original syntax tree, wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, character string constant node type and local variable node type; rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree; and outputting the target syntax tree as obfuscated code. In the embodiment of the invention, the node objects of the node type to be rewritten in the original syntax tree (AST) are obtained and processed in a targeted manner according to that node type. In addition, the technical scheme of the invention increases the difficulty of decompiling the code and thereby improves code security, which provides sustained technical support for network security: the invention can protect the interactive service logic between the client or web page and the server that is implemented by the source code, and prevent dynamic page scripts from being stolen, cracked and exploited.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of a code obfuscation method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system architecture for implementing a code obfuscation method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps of another code obfuscation method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a code obfuscation apparatus provided by an embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a code obfuscation method according to an embodiment of the present invention. The code obfuscation method shown in fig. 1 may be applied in the field of computers, and as shown in fig. 1, the method may include:
Step 101, converting the source code into an original syntax tree.
In an embodiment of the invention, the source code is readable text written in a particular programming language; its purpose is to set out exact rules and specifications for a computer in a form that can be converted into machine language. Source code is thus the basis of programs and websites. Every piece of software has source code and executes according to the programming in it; the usual format is a text file. The ultimate purpose of source code is to have its readable text translated into binary instructions that the computer can execute, a process known as compilation and performed by a compiler.
The source code in the invention may implement a client, a web page or any other required form, which is not limited by the embodiment of the invention. After the source code is obtained, it is converted into the original syntax tree as follows: lexical analysis and syntax analysis are performed on the source code through a tool chain (e.g., babel-parser) to obtain the original syntax tree, that is, the AST corresponding to the source code. The AST is a data structure in the form of a top-down tree; each layer consists of one or more nodes, and each node has a type attribute (e.g., FunctionDeclaration, BlockStatement, VariableDeclaration) that represents the node type.
It should be noted that Babel is used to convert source code into backward-compatible JavaScript syntax, so that the code can run in current and older versions of browsers or other environments.
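As an illustration of the tree structure and the type attribute described above, the following is a minimal sketch rather than the output of any particular parser. The AST is written by hand in an ESTree/Babel-like shape for the one-line program `var answer = 42;`, and the helper name `collectTypes` is an assumption introduced for this example.

```javascript
// Hand-written, simplified AST for the program: var answer = 42;
// A real tool chain would produce a richer tree from the source text.
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "answer" },
          init: { type: "NumericLiteral", value: 42 },
        },
      ],
    },
  ],
};

// Depth-first walk that records the `type` attribute of every node.
function collectTypes(node, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectTypes(child, out));
  } else if (node && typeof node === "object") {
    if (typeof node.type === "string") out.push(node.type);
    Object.values(node).forEach((child) => collectTypes(child, out));
  }
  return out;
}

const types = collectTypes(ast);
console.log(types.join(" > "));
```

Each node is a plain object, so later steps can select nodes by reading `type` and rewrite them in place.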
Further, optionally, after step 101, it may further include:
Optional step 11, traversing the original syntax tree and removing preset type nodes from it to obtain a first syntax tree.
In the embodiment of the present invention, the preset type nodes are nodes that need to be excluded, that is, nodes that do not need to be obfuscated. They may include, but are not limited to, preset variable names, preset function bodies, preset conditional expressions and preset core logic, and may be set flexibly by a technician.
Optional step 12, collecting each string literal from the first syntax tree to obtain an obfuscation dictionary corresponding to the source code.
In the embodiment of the invention, the first syntax tree is traversed and each string literal is collected. A literal is a notation for expressing a fixed value in source code; most programming languages have literal representations of basic values such as integers, floating point numbers and character strings. The collected string literals form the obfuscation dictionary corresponding to the source code.
In the embodiment of the invention, the obfuscation dictionary generated from the literals can be used both in the obfuscation process and in interpreting the obfuscated code. The obfuscation dictionary may be implemented as a character set or as a syntax tree, and may be used in place of the original syntax tree for the subsequent obfuscation operations.
Therefore, by implementing optional steps 11 to 12, nodes that do not need to be obfuscated are removed from the original syntax tree and the obfuscation dictionary used for subsequent obfuscation is obtained. This reduces the number of nodes that must be traversed when the obfuscation dictionary is later traversed for the different types of obfuscation, and improves obfuscation efficiency.
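Optional steps 11 and 12 can be sketched as follows. This is a simplified illustration under stated assumptions: the node shape (a `name` field directly on each declaration), the exclusion list `presetNames` and the helper `collectStrings` are hypothetical and are not taken from the invention.

```javascript
// Simplified AST: four string assignments; `apiKey` is on the exclusion list.
const ast = {
  type: "Program",
  body: [
    { type: "VariableDeclaration", name: "apiKey", init: { type: "StringLiteral", value: "secret" } },
    { type: "VariableDeclaration", name: "label", init: { type: "StringLiteral", value: "hello" } },
    { type: "VariableDeclaration", name: "title", init: { type: "StringLiteral", value: "world" } },
    { type: "VariableDeclaration", name: "again", init: { type: "StringLiteral", value: "hello" } },
  ],
};

// Assumption: the technician excludes nodes by variable name.
const presetNames = new Set(["apiKey"]);

// Optional step 11: drop the preset nodes to obtain the first syntax tree.
const firstTree = {
  ...ast,
  body: ast.body.filter((node) => !presetNames.has(node.name)),
};

// Optional step 12: collect each distinct string literal into a dictionary.
function collectStrings(node, dict = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectStrings(child, dict));
  } else if (node && typeof node === "object") {
    if (node.type === "StringLiteral" && !dict.includes(node.value)) {
      dict.push(node.value);
    }
    Object.values(node).forEach((child) => collectStrings(child, dict));
  }
  return dict;
}

const dictionary = collectStrings(firstTree);
console.log(dictionary);
```

Because the excluded node is filtered out before collection, its literal never enters the dictionary, and later passes have fewer nodes to visit.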
Further, optionally, after step 101, it may further include:
Optional step 21, determining, from the original nodes, the specific nodes to be renamed.
In the embodiment of the present invention, since each original node has a node type, a hook function (the enter hook) may be called first, and the specific nodes that need to be renamed (e.g., Identifier, FunctionDeclaration) are determined based on the node type of each original node. There may be one or more specific nodes.
The specific nodes may be designated nodes of some node types or may be all nodes, which is not limited in the embodiment of the present invention.
Optional step 22, renaming the specific node according to a specific function method.
In the embodiment of the invention, the specific function method may correspond one-to-one to the node type of the specific node, or may be a unified function method for all specific nodes. Specifically, renaming a specific node according to a specific function method includes: generating a random name, and changing the node object (i.e., the node name) of the specific node to the generated random name through the set/replace method of the path. It should be noted that when there are multiple specific nodes, the random names generated for different specific nodes may differ.
It can be seen that implementing optional steps 21 to 22 first renames the specific nodes among the original nodes, which achieves preliminary obfuscation, makes the code harder to crack and protects the source code.
Further, optionally, after step 22, it may further include:
Optional step 23, if the specific node includes child nodes, performing recursive subtree renaming on the specific node.
In the embodiment of the invention, if the node object of the specific node involves a binding within a scope, the specific node is determined to contain child nodes. A name used in a piece of program code is not valid and available everywhere; a scope limits the range of code in which the name is available, which improves the locality of program logic, enhances program reliability and avoids name conflicts. The child nodes of the specific node are then traversed recursively so that they are processed uniformly. After the full traversal of the original syntax tree is completed, random renaming of the global names of the AST nodes is preliminarily achieved.
It can be seen that implementing optional steps 21 to 23 renames the specific node and recursively renames its subtree, so that the original syntax tree is renamed accurately and problems such as scope errors are avoided during renaming.
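The renaming in optional steps 21 to 23 can be sketched as below. This is an assumption-laden illustration: it edits a hand-written AST directly instead of using a path's set/replace method, the helpers `randomName` and `rename` are hypothetical, and a single name map is used for the whole tree, which ignores shadowing between nested scopes.

```javascript
// Simplified AST for: function add(a, b) { return a + b; }
const ast = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "add" },
  params: [
    { type: "Identifier", name: "a" },
    { type: "Identifier", name: "b" },
  ],
  body: {
    type: "ReturnStatement",
    argument: {
      type: "BinaryExpression",
      operator: "+",
      left: { type: "Identifier", name: "a" },
      right: { type: "Identifier", name: "b" },
    },
  },
};

// Hypothetical random-name generator; retries on a collision or empty suffix.
const used = new Set();
function randomName() {
  let name;
  do {
    name = "_" + Math.random().toString(36).slice(2, 8);
  } while (used.has(name) || name.length < 2);
  used.add(name);
  return name;
}

// Recursively rename identifiers. One shared map keeps the declaration and
// every use of the same original name consistent.
function rename(node, mapping = new Map()) {
  if (Array.isArray(node)) {
    node.forEach((child) => rename(child, mapping));
  } else if (node && typeof node === "object") {
    if (node.type === "Identifier") {
      if (!mapping.has(node.name)) mapping.set(node.name, randomName());
      node.name = mapping.get(node.name);
    }
    Object.values(node).forEach((child) => rename(child, mapping));
  }
  return mapping;
}

const mapping = rename(ast);
console.log([...mapping.entries()]);
```

A scope-aware implementation would keep one map per scope so that two unrelated variables with the same name receive different random names.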
Step 102, acquiring node objects of the node type to be rewritten from each original node of the original syntax tree; wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, character string constant node type, and local variable node type.
In the embodiment of the invention, the node type to be rewritten is expressed as a node attribute, and the node type to be rewritten of the current node in the original syntax tree can be determined by reading that attribute. As defined above, the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, character string constant node type, and local variable node type.
Therefore, the following can be obtained directly from the original syntax tree by reading the node attribute: the node object corresponding to the control flow node type (e.g., IfStatement, WhileStatement), the node object corresponding to the expression node type (e.g., BinaryExpression, LogicalExpression), the node object corresponding to the statement node type (e.g., VariableDeclaration, ExpressionStatement), the node object set corresponding to the specific code block node set type, the node object corresponding to the variable node type to be hidden (e.g., Identifier), the node object corresponding to the node type to be inlined (e.g., while, switch), the node object corresponding to the character string constant node type (e.g., Literal), and the node object corresponding to the local variable node type (e.g., VariableDeclarator). A node object refers to the specific content stored in the node.
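Selecting node objects by reading the type attribute can be sketched as follows. The category table `CATEGORIES` covers only four of the eight node types to be rewritten, and its mapping to concrete type names, like the helper `selectNodes`, is an assumption made for this example.

```javascript
// Assumed mapping from rewrite category to ESTree/Babel-style type names.
const CATEGORIES = {
  controlFlow: ["IfStatement", "WhileStatement"],
  expression: ["BinaryExpression", "LogicalExpression"],
  statement: ["VariableDeclaration", "ExpressionStatement"],
  stringConstant: ["StringLiteral"],
};

// Walk the tree once and place each matching node in its category bucket.
function selectNodes(root) {
  const buckets = {};
  for (const key of Object.keys(CATEGORIES)) buckets[key] = [];
  (function visit(node) {
    if (Array.isArray(node)) return node.forEach(visit);
    if (!node || typeof node !== "object") return;
    for (const [key, types] of Object.entries(CATEGORIES)) {
      if (types.includes(node.type)) buckets[key].push(node);
    }
    Object.values(node).forEach(visit);
  })(root);
  return buckets;
}

// Simplified AST for: if (x > 1) { var s = "big"; }
const ast = {
  type: "Program",
  body: [
    {
      type: "IfStatement",
      test: {
        type: "BinaryExpression",
        operator: ">",
        left: { type: "Identifier", name: "x" },
        right: { type: "NumericLiteral", value: 1 },
      },
      consequent: {
        type: "BlockStatement",
        body: [
          {
            type: "VariableDeclaration",
            kind: "var",
            declarations: [
              {
                type: "VariableDeclarator",
                id: { type: "Identifier", name: "s" },
                init: { type: "StringLiteral", value: "big" },
              },
            ],
          },
        ],
      },
    },
  ],
};

const buckets = selectNodes(ast);
console.log(Object.entries(buckets).map(([k, v]) => `${k}: ${v.length}`));
```

Each bucket then feeds the rewriting rule for its category in step 103.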
Step 103, rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree.
In the embodiment of the present invention, the target syntax tree refers to a new AST after the original syntax tree is rewritten.
Since there are a plurality of types of nodes to be rewritten, the following first to eighth embodiments are used to describe a node object rewriting method for each of different types of nodes to be rewritten.
The first to eighth embodiments are parallel embodiments, and the specific matters of the first to eighth embodiments are developed sequentially hereinafter.
First, optionally, in step 103, if the node type to be rewritten is a control flow node type, rewriting the corresponding node object according to the node type to be rewritten may include: substep 1031a or substep 1032a.
Sub-step 1031a, inserting a first-type false object into the node object corresponding to the control flow node type; wherein the first-type false object includes a false decision result branch.
In the embodiment of the present invention, a node object of the control flow node type is a node object that indicates how program logic is executed. The node object of the control flow node type (for example, if) is obtained through a hook function, and a first-type false object (for example, a false if branch) is inserted into it, which realizes a false conditional assignment. One or more first-type false objects may be inserted into the node object, which is not limited by the embodiment of the invention.
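The effect of inserting a false if branch can be illustrated at the source level as below. This is a sketch under assumptions, not the listing of the original publication: the functions `grade` and `gradeObfuscated` and the choice of guard are hypothetical.

```javascript
// Original logic.
function grade(score) {
  if (score >= 60) {
    return "pass";
  }
  return "fail";
}

// After inserting a false branch. The guard `score * score < 0` can never
// hold for a number (a square is never negative, and NaN < 0 is false),
// so the inserted branch is dead code that only misleads a reader.
function gradeObfuscated(score) {
  if (score * score < 0) {
    return "n/a"; // never reached
  }
  if (score >= 60) {
    return "pass";
  }
  return "fail";
}

console.log(grade(75), gradeObfuscated(75));
```

The inserted branch changes the shape of the control flow without changing any result.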
Sub-step 1032a, modifying the node object corresponding to the control flow node type into a second-type false object; wherein the second-type false object includes a false loop termination result.
In the embodiment of the invention, the node object of the control flow node type (such as while) may also be obtained through a hook function and modified into a second-type false object (such as a false loop termination condition).
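A false loop termination condition can be illustrated in the same way. The sketch below is hypothetical (the functions `sumTo` and `sumToObfuscated` are invented for this example): the loop test gains an extra conjunct that looks like a second exit condition but is always true.

```javascript
// Original loop.
function sumTo(n) {
  let total = 0;
  let i = 1;
  while (i <= n) {
    total += i;
    i += 1;
  }
  return total;
}

// After rewriting the loop test. `i * i >= 0` is always true here because
// `i` is a finite counter, so the added condition never ends the loop.
function sumToObfuscated(n) {
  let total = 0;
  let i = 1;
  while (i <= n && i * i >= 0) {
    total += i;
    i += 1;
  }
  return total;
}

console.log(sumTo(10), sumToObfuscated(10));
```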
It can be seen that implementing sub-step 1031a or sub-step 1032a rewrites the node object of the control flow node type, which increases the complexity of the obfuscation result and makes the code harder to crack.
Second, optionally, in step 103, if the node type to be rewritten is the expression node type, rewriting the corresponding node object according to the node type to be rewritten may include: substep 1031b to substep 1032b.
Sub-step 1031b, generating, for the node object corresponding to the expression node type, a conversion function that hides the real logic.
In the embodiment of the invention, the node object corresponding to the expression node type is an expression statement. After the node object is obtained through a hook function, a conversion function that hides the real expression statement can be generated for it; the conversion function may take a form such as a function call chain.
Sub-step 1032b, modifying the node object corresponding to the expression node type into the conversion function.
In the embodiment of the invention, modifying the node object into the conversion function prevents the node object from being read directly: the conversion function conceals the real node object and makes the expression harder to read.
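Sub-steps 1031b and 1032b can be illustrated at the source level as follows. The sketch assumes a lookup table of operator functions with opaque keys (`ops`, `k3`, `q7`) and a curried helper `call`; these names are hypothetical and the original publication's own listing is not reproduced here.

```javascript
// Original expression: price * quantity + fee
function total(price, quantity, fee) {
  return price * quantity + fee;
}

// Generated conversion functions: the operators are hidden behind a lookup
// table with opaque keys and reached through a chain of calls.
const ops = {
  k3: (x, y) => x + y,
  q7: (x, y) => x * y,
};
const call = (key) => (x) => (y) => ops[key](x, y);

// The expression node is replaced by the call chain.
function totalObfuscated(price, quantity, fee) {
  return call("k3")(call("q7")(price)(quantity))(fee);
}

console.log(total(3, 4, 5), totalObfuscated(3, 4, 5));
```

A reader of the rewritten function no longer sees `*` or `+` at the point of use and has to resolve each key to recover the arithmetic.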
It can be seen that implementing sub-steps 1031b to 1032b rewrites the node object of the expression node type, which increases the complexity of the obfuscation result and makes the code harder to crack.
Third, optionally, in step 103, if the node type to be rewritten is the statement node type, rewriting the corresponding node object according to the node type to be rewritten may include: sub-step 1031c.
Sub-step 1031c, executing a statement processing operation on the node object corresponding to the statement node type; wherein the statement processing operation includes at least one of: statement rearrangement and statement wrapping.
In the embodiment of the invention, the node object corresponding to the statement node type is a programming statement element. After the node object is obtained through the hook function, the statement processing operation can be executed to make the semantics of the statement harder to analyze; for example, when the node object is an assignment statement, the assignment statement is split into a plurality of sub-statements. In addition to statement rearrangement and statement wrapping, the statement processing operation may include any other operation according to the actual situation, which is not limited by the embodiment of the invention.
It can be seen that implementing sub-step 1031c rewrites the node object of the statement node type, which increases the complexity of the obfuscation result and makes the code harder to crack.
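The statement processing operations can be illustrated at the source level as below. This is a hypothetical sketch (the functions `area` and `areaObfuscated` and the temporaries `t1` and `t2` are invented): one assignment is split into sub-statements, two independent sub-statements are reordered, and the last one is wrapped in an immediately invoked function.

```javascript
// Original: a single assignment statement.
function area(w, h) {
  const result = (w + 2) * (h + 2);
  return result;
}

// After statement processing.
function areaObfuscated(w, h) {
  let t1;
  let t2;
  let result;
  t2 = h + 2; // independent sub-statements, emitted in swapped order
  t1 = w + 2;
  (function () {
    result = t1 * t2; // wrapped sub-statement
  })();
  return result;
}

console.log(area(1, 2), areaObfuscated(1, 2));
```

Reordering is only safe for sub-statements that do not depend on each other, which is why `t1` and `t2` can be swapped but the final multiplication cannot move ahead of them.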
Fourth, optionally, in step 103, if the node type to be rewritten is a specific code block node set type, rewriting the corresponding node object according to the node type to be rewritten may include: substep 1031d to substep 1032d.
Sub-step 1031d, selecting the scopes of the node objects corresponding to at least two specific code block node set types, to obtain at least two scopes.
In an embodiment of the present invention, a node object of a particular code block node set type refers to a function body or multiple statements used to compose a code block. Scope (Scope) refers to the accessible Scope of variables in a program, which defines in which parts a variable can be referenced or modified.
Sub-step 1032d, swapping the positions of the at least two scopes in the parent node.
In the embodiment of the invention, the positions of the at least two scopes in the parent node may be swapped randomly or in a specified manner, provided that the code semantics are not affected.
It can be seen that implementing sub-steps 1031d to 1032d rewrites the node objects of the specific code block node set type, which increases the complexity of the obfuscation result and makes the code harder to crack.
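Swapping code blocks inside a parent node can be sketched as follows. The example is a simplification under assumptions: each child carries its source text in a hypothetical `source` field, and the helper `swapBlocks` only swaps function declarations, whose relative order is not observable in JavaScript because declarations are hoisted.

```javascript
// Parent node with two function declarations followed by a call.
const parent = {
  type: "Program",
  body: [
    { type: "FunctionDeclaration", name: "first", source: "function first() { return 1; }" },
    { type: "FunctionDeclaration", name: "second", source: "function second() { return 2; }" },
    { type: "ExpressionStatement", source: "first() + second();" },
  ],
};

// Swap two children of the parent, but only when both are function
// declarations. Assumption: other statement kinds are left in place because
// reordering them could change the meaning of the program.
function swapBlocks(node, i, j) {
  const a = node.body[i];
  const b = node.body[j];
  if (a.type !== "FunctionDeclaration" || b.type !== "FunctionDeclaration") {
    return false;
  }
  node.body[i] = b;
  node.body[j] = a;
  return true;
}

const swapped = swapBlocks(parent, 0, 1);
const output = parent.body.map((n) => n.source).join("\n");
console.log(output);
```

A fuller implementation would analyze the dependencies between the selected blocks before deciding that a swap leaves the semantics unchanged.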
Fifth, optionally, in step 103, if the node type to be rewritten is the variable node type to be hidden, rewriting the corresponding node object according to the node type to be rewritten may include: substep 1031e.
Substep 1031e, replacing the node object corresponding to the node type of the variable to be hidden with a specified chain access structure according to a replacement rule; the designated chain access structure comprises node objects corresponding to the node types of the variables to be hidden.
In the embodiment of the invention, the node object of the variable node type to be hidden refers to a variable which can be replaced by chained access, the replacement rule is used for limiting the generation mode of the access structure aiming at the node object, and the node object corresponding to the variable node type to be hidden can be replaced by a specified chained access structure according to the replacement rule. The chain access structure is used for enabling external operation to acquire a real variable value through an intermediate variable, and improving the acquisition difficulty of an original variable in a code.
It can be seen that implementing sub-step 1031e rewrites the node object of the variable node type to be hidden, which increases the complexity of the obfuscation result and makes the code harder to crack.
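The chained access structure can be illustrated at the source level as below. The sketch is hypothetical: the functions `discount` and `discountObfuscated` and the intermediate names `_s`, `_a`, `_b`, `_c` and `v` are invented, and the replacement rule that would generate them is assumed rather than specified.

```javascript
// Original: the variable is read directly.
function discount(price) {
  const rate = 0.2;
  return price * (1 - rate);
}

// After replacement: the value sits at the end of a chain of intermediate
// objects, and every read goes through that chain.
function discountObfuscated(price) {
  const _s = { _a: { _b: { _c: () => ({ v: 0.2 }) } } };
  return price * (1 - _s._a._b._c().v);
}

console.log(discount(100), discountObfuscated(100));
```

The chain still contains the original value, as the sub-step requires, but the name `rate` and its direct initialization no longer appear.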
Sixth, optionally, in step 103, if the node type to be rewritten is the node type to be inlined, rewriting the corresponding node object according to the node type to be rewritten may include: sub-step 1031f to sub-step 1033f.
Sub-step 1031f, obtaining the call site node of the function node object corresponding to the node type to be inlined. If the function node object and the call site node do not depend on external variables, sub-step 1032f is performed; if the function node object and the call site node depend on external variables, sub-step 1033f is performed.
In the embodiment of the invention, the function node object corresponding to the node type to be inlined refers to a function that is used in only one place. Whether the function body and the call parameters depend on external variables can then be analyzed.
Sub-step 1032f, inlining the node object at the position of the call site node.
In the embodiment of the invention, when the function node object and the call site node do not depend on external variables, the function body of the node object can be inlined directly at the call site.
Sub-step 1033f, wrapping the function node object in a closure function, and inlining the wrapped function node object at the position of the call site node.
In the embodiment of the invention, when the function node object and the call site node depend on external variables, the function body of the node object can be wrapped in a closure function and then inlined at the position of the call site node.
It can be seen that implementing sub-steps 1031f to 1033f rewrites the node object of the node type to be inlined, which increases the complexity of the obfuscation result and makes the code harder to crack.
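The two inlining cases can be illustrated at the source level as follows. All function and variable names are hypothetical, and the closure form shown for the second case is one possible shape, not necessarily the one the invention generates.

```javascript
// Case 1: `square` is called from one place and uses no external variables.
function square(x) {
  return x * x;
}
function hypotSquared(a, b) {
  return square(a) + b * b;
}
// The function body replaces the call directly.
function hypotSquaredInlined(a, b) {
  return a * a + b * b;
}

// Case 2: `scaled` is called from one place but depends on the external
// variable `scale`.
const scale = 3;
function scaled(x) {
  return x * scale;
}
function compute(v) {
  return scaled(v) + 1;
}
// The body is wrapped in a closure that captures `scale`, and the closure
// is placed at the call site.
function computeInlined(v) {
  return ((s) => (x) => x * s)(scale)(v) + 1;
}

console.log(hypotSquaredInlined(3, 4), computeInlined(5));
```

After inlining, the helper functions can be removed from the output, so their names and boundaries no longer guide a reader through the logic.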
Seventh, optionally, in step 103, if the node type to be rewritten is a character string constant node type, rewriting the corresponding node object according to the node type to be rewritten may include: substep 1031g to substep 1033g.
Sub-step 1031g, generating, based on a word segmentation algorithm and a shuffling algorithm, an obfuscation table for the character string in the node object corresponding to the character string constant node type.
In embodiments of the present invention, the word segmentation algorithm and the shuffling algorithm may be used to generate the elements of the obfuscation table from the character strings in the node objects.
Sub-step 1032g, segmenting and recombining the character string according to the obfuscation table to obtain a reference character string.
In the embodiment of the invention, the character string can be segmented and recombined based on the obfuscation table to obtain a new reference character string that is difficult to crack.
Sub-step 1033g, converting the reference character string into a restorable target character string based on a preset encoding rule.
In the embodiment of the present invention, the preset encoding rule may be configured according to the actual situation, which is not limited by the embodiment of the invention. The preset encoding rule is used to convert the reference character string into a target character string that can be restored.
It can be seen that implementing sub-steps 1031g to 1033g rewrites the node object of the character string constant node type, which increases the complexity of the obfuscation result and makes the code harder to crack.
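Sub-steps 1031g to 1033g can be sketched as below. Every concrete choice is an assumption made for illustration: fixed-size chunks stand in for word segmentation, a Fisher-Yates shuffle produces the table, and the encoding rule writes each UTF-16 code unit as four hexadecimal digits. The helper names are hypothetical.

```javascript
// Split a string into fixed-size chunks (stand-in for word segmentation).
function segment(text, size) {
  const parts = [];
  for (let i = 0; i < text.length; i += size) parts.push(text.slice(i, i + size));
  return parts;
}

// Fisher-Yates shuffle of the chunk positions; the order is the table.
function shuffledOrder(length) {
  const order = Array.from({ length }, (_, i) => i);
  for (let i = length - 1; i > 0; i -= 1) {
    const j = Math.floor(Math.random() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  return order;
}

// Assumed encoding rule: four hex digits per UTF-16 code unit.
const encode = (s) =>
  Array.from({ length: s.length }, (_, i) =>
    s.charCodeAt(i).toString(16).padStart(4, "0")).join("");
const decode = (h) =>
  h.match(/.{4}/g).map((u) => String.fromCharCode(parseInt(u, 16))).join("");

// chunks[k] holds the encoded original chunk number order[k].
function obfuscate(text, size = 3) {
  const parts = segment(text, size);
  const order = shuffledOrder(parts.length);
  return { order, chunks: order.map((i) => encode(parts[i])) };
}

// Put each decoded chunk back at its original position.
function restore({ order, chunks }) {
  const parts = [];
  order.forEach((originalIndex, k) => {
    parts[originalIndex] = decode(chunks[k]);
  });
  return parts.join("");
}

const hidden = obfuscate("report/user-behavior");
console.log(hidden.chunks.join(""), "->", restore(hidden));
```

The obfuscated output carries only the shuffled, encoded chunks and the table, so the literal does not appear in readable form, yet `restore` recovers it exactly at run time.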
Eighth, optionally, in step 103, if the node type to be rewritten is a local variable node type, rewriting the corresponding node object according to the node type to be rewritten may include: substep 1031h to substep 1033h.
Sub-step 1031h, promoting the variable declaration of the node object corresponding to the local variable node type to the outer scope.
In the embodiment of the invention, the node object corresponding to the local variable node type refers to a variable declaration in a function or block scope. For such node objects, the variable declaration may be promoted to the outer scope.
Sub-step 1032h, adding a target variable declaration statement to the outer scope to obtain a target outer scope.
In the embodiment of the invention, a target variable declaration statement is added to the outer scope at the same time; the target variable declaration statement declares the node object corresponding to the local variable node type that is contained in the target outer scope.
Sub-step 1033h, replacing the node object corresponding to the local variable node type with the target outer scope.
In the embodiment of the invention, the node object can further be replaced with the target outer scope, which increases the complexity of variable access and achieves deep obfuscation.
It can be seen that implementing sub-steps 1031h to 1033h rewrites the node object of the local variable node type, which increases the complexity of the obfuscation result and makes the code harder to crack.
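One possible reading of sub-steps 1031h to 1033h is illustrated at the source level below. The sketch is an interpretation under assumptions: the wrapping function expression plays the role of the target outer scope, and the names `countEven`, `countEvenObfuscated` and `_o1` are hypothetical.

```javascript
// Original: `count` is declared inside the function scope.
function countEven(values) {
  let count = 0;
  for (const v of values) {
    if (v % 2 === 0) count += 1;
  }
  return count;
}

// After rewriting: the declaration is promoted to an outer scope, a
// declaration statement is added there, and the function reads and writes
// the variable through that outer scope.
const countEvenObfuscated = (function () {
  let _o1; // declaration added to the outer scope
  return function (values) {
    _o1 = 0; // reset on every call because the variable is now shared
    for (const v of values) {
      if (v % 2 === 0) _o1 += 1;
    }
    return _o1;
  };
})();

console.log(countEven([1, 2, 3, 4]), countEvenObfuscated([1, 2, 3, 4]));
```

Promoting a variable makes it shared between calls, so the rewrite has to keep the per-call initialization; recursive or re-entrant functions would need additional care.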
Step 104: outputting the target syntax tree as obfuscated code.
In an embodiment of the present invention, outputting the target syntax tree as obfuscated code means outputting the target syntax tree as computer-readable binary obfuscated code.
Optionally, in step 104, outputting the target syntax tree as obfuscated code may include substeps 1041 to 1042.
Substep 1041: traversing each target node in the target syntax tree with a syntax tree traverser (i.e., an AST traverser) and constructing the code string corresponding to each target node, until every target node has been traversed, to obtain the obfuscated code corresponding to the target syntax tree.
In the embodiment of the invention, the AST (abstract syntax tree) traverser can perform a depth-first traversal of the target nodes in the target syntax tree. Based on the enter/exit hooks of the generator in the AST traverser, code-construction methods are invoked when entering and leaving each node, so that the code string corresponding to each target node is constructed; the code strings of all target nodes together form the obfuscated code.
Substep 1042: outputting the obfuscated code in the form of a string or a file.
In the embodiment of the invention, the obfuscated code can be output by the generator in the form of a string or a file.
It can be seen that, by implementing substeps 1041 to 1042, the high-complexity obfuscated code can be output as a usable string or file for the user to call.
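The enter/exit hook mechanism of substeps 1041 and 1042 can be sketched as follows. This is a toy generator covering only names, constants and three binary operators; the class `TinyGenerator` and the helper `emit` are hypothetical and are not the generator of this embodiment.

```python
import ast


class TinyGenerator:
    """Depth-first code generator with enter/exit hooks (toy subset)."""

    OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

    def __init__(self):
        self.parts = []

    def enter(self, node):
        # Hook called when the traversal enters a node.
        if isinstance(node, ast.BinOp):
            self.parts.append("(")
        elif isinstance(node, ast.Name):
            self.parts.append(node.id)
        elif isinstance(node, ast.Constant):
            self.parts.append(repr(node.value))

    def exit(self, node):
        # Hook called when the traversal leaves a node.
        if isinstance(node, ast.BinOp):
            self.parts.append(")")

    def walk(self, node):
        self.enter(node)
        if isinstance(node, ast.BinOp):
            self.walk(node.left)
            self.parts.append(f" {self.OPS[type(node.op)]} ")
            self.walk(node.right)
        self.exit(node)

    def generate(self, node):
        self.parts = []
        self.walk(node)
        return "".join(self.parts)


def emit(code, path=None):
    """Return the code as a string, or write it to a file when a path is given."""
    if path is None:
        return code
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(code)
    return path
```

Each node contributes its own fragment on entry and exit, and the concatenation of all fragments is the generated code.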
In summary, the code obfuscation method provided by the embodiment of the invention comprises: converting the source code into an original syntax tree; acquiring node objects of the node type to be rewritten from the original nodes of the original syntax tree, wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, string constant node type and local variable node type; rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree; and outputting the target syntax tree as obfuscated code. In the embodiment of the invention, the node objects of the node type to be rewritten in the original syntax tree (AST) can be acquired and processed in a targeted manner according to the node type to be rewritten. In addition, the technical scheme of the invention makes the code harder to decompile and thereby improves code security, which provides more sustainable technical support for network security; that is, the invention can protect the interactive service logic between the client/web page and the server that is implemented by the source code, and prevent dynamic page scripts from being stolen, cracked and exploited.
Referring to fig. 2, fig. 2 is a schematic diagram of a system architecture for implementing a code obfuscation method according to an embodiment of the present invention. As shown in fig. 2, a system for implementing the code obfuscation method may include: a removal module 210 and an obfuscation engine 220; wherein the obfuscation engine 220 includes, but is not limited to: a node renaming module 221, an expression conversion module 222, a code block replacement module 223, a variable hiding module 224, a control flow rewriting module 225, a statement reconstruction module 226, a call relation reordering module 227 and a function inlining module 228.
After source code including variable names, function names, conditional expressions, core logic and the like is received, the source code may be converted into an AST. The removal module 210 then traverses the original syntax tree, removes nodes of a preset type from the original syntax tree to obtain a first syntax tree, and collects each string literal from the first syntax tree to obtain an obfuscation dictionary corresponding to the source code, where the obfuscation dictionary is implemented as a character set. The code obfuscation steps may then be performed by the obfuscation engine 220.
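A minimal sketch of the removal module follows, using Python's `ast` module. The preset node type is not specified in this description, so the sketch assumes bare string expression statements (docstrings) as the type to remove, and it models the dictionary as a set of string literals; `Remover` and `strip_and_collect` are hypothetical names.

```python
import ast


class Remover(ast.NodeTransformer):
    """Removes nodes of an assumed preset type: bare string statements."""

    def visit_Expr(self, node):
        value = node.value
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            return None  # returning None deletes the statement
        return node


def strip_and_collect(source):
    tree = Remover().visit(ast.parse(source))
    # A body emptied by the removal must stay syntactically valid.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)) and not node.body:
            node.body.append(ast.Pass())
    # Collect every remaining string literal into the dictionary.
    dictionary = {
        n.value for n in ast.walk(tree)
        if isinstance(n, ast.Constant) and isinstance(n.value, str)
    }
    return ast.fix_missing_locations(tree), dictionary
```

The returned tree plays the role of the first syntax tree, and the returned set can be handed to later passes that need to know which literals occur in the source.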
The node renaming module 221 is configured to determine a specific node to be renamed from the original nodes, and rename the specific node according to a specific function method; and if the specific node comprises the child node, carrying out sub-tree recursion renaming on the specific node.
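The renaming behaviour can be sketched in Python as below. The description does not say which function produces the new names, so a counter-based mapping (`_v0`, `_v1`, ...) is assumed; `Renamer` and `rename_locals` are hypothetical names. The sketch renames only parameters and locally assigned names of top-level functions, and it assumes callers do not pass arguments by keyword and that no existing identifier already has the form `_vN`.

```python
import ast


class Renamer(ast.NodeTransformer):
    """Recursively renames every occurrence of the target names in a subtree."""

    def __init__(self, targets):
        # Assumed naming function: sorted targets map to _v0, _v1, ...
        self.mapping = {name: f"_v{i}" for i, name in enumerate(sorted(targets))}

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def rename_locals(source):
    tree = ast.parse(source)
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        targets = {a.arg for a in func.args.args}
        targets |= {
            n.id for n in ast.walk(func)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
        }
        # visit() descends into all child nodes, i.e. subtree recursion.
        Renamer(targets).visit(func)
    return ast.unparse(tree)
```

Because the same mapping is applied to the whole subtree of the function, declarations and uses stay consistent after renaming.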
The expression conversion module 222 is configured to generate a conversion function for hiding actual logic for a node object corresponding to an expression node type; and modifying the node object corresponding to the expression node type into a conversion function.
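One possible reading of the conversion function is an immediately invoked function that wraps the original operation. The sketch below applies this to binary operations with Python's `ast` module; `ExprWrapper` and the parameter names `_l` and `_r` are assumptions, not part of this description.

```python
import ast


class ExprWrapper(ast.NodeTransformer):
    """Rewrites `a <op> b` into `(lambda _l, _r: _l <op> _r)(a, b)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # wrap nested operations first
        wrapper = ast.parse("(lambda _l, _r: None)(None, None)", mode="eval").body
        # The conversion function body performs the original operation.
        wrapper.func.body = ast.BinOp(
            left=ast.Name("_l", ast.Load()),
            op=node.op,
            right=ast.Name("_r", ast.Load()),
        )
        # The original operands become the call arguments, in the same order.
        wrapper.args = [node.left, node.right]
        return ast.copy_location(wrapper, node)
```

Operands are still evaluated left to right, so the rewritten expression produces the same value while the operator is hidden one call level deeper.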
The code block replacement module 223 is configured to select the scopes of at least two node objects corresponding to the specific code block node set type, so as to obtain at least two scopes, and to swap the positions of the at least two scopes in the parent node.
The variable hiding module 224 is configured to replace a node object corresponding to the variable node type to be hidden with a designated chained access structure according to a replacement rule, where the designated chained access structure contains the node object corresponding to the variable node type to be hidden. The variable hiding module 224 is further configured to move the variable declaration of the node object corresponding to the local variable node type to an outer scope, add a target variable declaration statement to the outer scope to obtain a target outer scope, and replace the node object corresponding to the local variable node type with the target outer scope.
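The chained access replacement can be sketched as follows. The replacement rule and the shape of the chain are not specified here, so the sketch assumes a nested dictionary indexed twice; `VariableHider` and the keys `'k'` and `'v'` are hypothetical.

```python
import ast


class VariableHider(ast.NodeTransformer):
    """Replaces reads of selected variables with a chained access structure."""

    def __init__(self, names):
        self.names = set(names)

    def visit_Name(self, node):
        if node.id in self.names and isinstance(node.ctx, ast.Load):
            # Assumed chain: {'k': {'v': <variable>}}['k']['v']
            chain = ast.parse("{'k': {'v': None}}['k']['v']", mode="eval").body
            # The original variable node is embedded inside the structure.
            chain.value.value.values[0].values[0] = node
            return ast.copy_location(chain, node)
        return node
```

The variable is still present in the tree, as required, but it is reached only through a chain of accesses rather than by a direct read.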
The control flow rewriting module 225 is configured to insert a first-type false object into a node object corresponding to the control flow node type, where the first-type false object includes a false judgment result branch; or to modify the node object corresponding to the control flow node type into a second-type false object, where the second-type false object includes a false loop termination result.
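The first-type false object can be illustrated with a branch that is never taken. The sketch below prepends such a branch to every `if` statement; the constant predicate `(7 * 7 + 7) % 2 == 1` and the class name `FakeBranchInserter` are assumptions chosen for illustration, and a production obfuscator would need a predicate that is harder to fold away.

```python
import ast


class FakeBranchInserter(ast.NodeTransformer):
    """Wraps each `if` in a bogus branch whose condition is always false."""

    def visit_If(self, node):
        self.generic_visit(node)
        fake = ast.parse(
            "if (7 * 7 + 7) % 2 == 1:\n"
            "    raise RuntimeError('unreachable')"
        ).body[0]
        # 56 % 2 is 0, so control always falls through to the real branch.
        fake.orelse = [node]
        return ast.copy_location(fake, node)
```

The original condition becomes the `else` arm of the bogus test, so behaviour is unchanged while the control flow graph gains a misleading edge.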
The statement reconstruction module 226 is configured to perform a statement processing operation on a node object corresponding to the statement node type; wherein the statement processing operation includes at least one of: statement rearrangement and statement packaging.
The call relation reordering module 227 is configured to generate an obfuscation table for a string in a node object corresponding to the string constant node type based on a word segmentation algorithm and an order-scrambling algorithm; split and reorganize the string according to the obfuscation table to obtain a reference string; and convert the reference string into a restorable target string based on a preset encoding rule.
The function inlining module 228 is configured to acquire the call point node of a function node object corresponding to the node type to be inlined; if the function node object and the call point node do not depend on external variables, the function node object is inlined at the position of the call point node; if the function node object and the call point node depend on external variables, the function node object is wrapped in a closure function, and the wrapped function node object is inlined at the position of the call point node.
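The two inlining cases can be sketched in Python under strong simplifying assumptions: the function consists of a single `return` expression, takes only positional parameters, and is called with side-effect-free arguments. `Substitute` and `Inliner` are hypothetical names; an immediately invoked lambda stands in for the closure function.

```python
import ast
import copy


class Substitute(ast.NodeTransformer):
    """Replaces parameter names with the argument expressions of a call."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        return copy.deepcopy(self.mapping.get(node.id, node))


class Inliner(ast.NodeTransformer):
    def __init__(self, func):
        self.func = func
        self.params = [a.arg for a in func.args.args]
        self.expr = func.body[0].value  # assumes a single `return <expr>`
        used = {n.id for n in ast.walk(self.expr) if isinstance(n, ast.Name)}
        # No dependence on external variables: only parameters are read.
        self.pure = used <= set(self.params)

    def visit_Call(self, node):
        self.generic_visit(node)
        if not (isinstance(node.func, ast.Name) and node.func.id == self.func.name):
            return node
        if self.pure:
            # Case 1: inline the body directly at the call point.
            mapping = dict(zip(self.params, node.args))
            return Substitute(mapping).visit(copy.deepcopy(self.expr))
        # Case 2: wrap the body in a closure and inline the wrapped call.
        closure = ast.Lambda(args=self.func.args, body=copy.deepcopy(self.expr))
        return ast.Call(func=closure, args=node.args, keywords=[])
```

In the first case the call disappears entirely; in the second the external variable is still resolved through the enclosing scope when the closure runs.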
In summary, the system provided by the embodiment of the invention can acquire the node objects of the node type to be rewritten in the original syntax tree (AST) and process them in a targeted manner according to the node type to be rewritten. Compared with the code obfuscation schemes in the related art that only replace names, this targeted processing reduces the risk of the code being decompiled. Outputting the target syntax tree obtained after rewriting the node objects as obfuscated code improves the reliability and security of the source code, which is equivalent to realizing a code encryption scheme that is harder to crack. In addition, the technical scheme of the invention makes the code harder to decompile and thereby improves code security, which provides more sustainable technical support for network security; that is, the invention can protect the interactive service logic between the client/web page and the server that is implemented by the source code, and prevent dynamic page scripts from being stolen, cracked and exploited.
Referring to fig. 3, fig. 3 is a flowchart illustrating steps of another code obfuscation method according to an embodiment of the present invention. As shown in fig. 3, the method may include steps 310 to 334.
Step 310: the source code is converted to an original syntax tree.
Step 312: traversing the original syntax tree, removing nodes of a preset type from the original syntax tree to obtain a first syntax tree, collecting each string literal from the first syntax tree to obtain an obfuscation dictionary corresponding to the source code, and applying the obfuscation dictionary in the code obfuscation process.
Step 314: determining a specific node to be renamed from the original nodes, and renaming the specific node according to a specific function method; and if the specific node contains child nodes, performing subtree-recursive renaming on the specific node.
Step 316: acquiring node objects of the node type to be rewritten from the original nodes of the original syntax tree; the node types to be rewritten include the control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, string constant node type and local variable node type.
Step 318: inserting a first-type false object into the node object corresponding to the control flow node type, where the first-type false object includes a false judgment result branch; or modifying the node object corresponding to the control flow node type into a second-type false object, where the second-type false object includes a false loop termination result.
Step 320: generating a conversion function for hiding the actual logic for the node object corresponding to the expression node type; and modifying the node object corresponding to the expression node type into the conversion function.
Step 322: performing a statement processing operation on the node object corresponding to the statement node type; wherein the statement processing operation includes at least one of: statement rearrangement and statement packaging.
Step 324: selecting the scopes of at least two node objects corresponding to the specific code block node set type to obtain at least two scopes; and swapping the positions of the at least two scopes in the parent node.
Step 326: replacing the node object corresponding to the variable node type to be hidden with a designated chained access structure according to the replacement rule; the designated chained access structure contains the node object corresponding to the variable node type to be hidden.
Step 328: acquiring the call point node of the function node object corresponding to the node type to be inlined; if the function node object and the call point node do not depend on external variables, inlining the function node object at the position of the call point node; if the function node object and the call point node depend on external variables, wrapping the function node object in a closure function, and inlining the wrapped function node object at the position of the call point node.
Step 330: generating an obfuscation table for the string in the node object corresponding to the string constant node type based on a word segmentation algorithm and an order-scrambling algorithm; splitting and reorganizing the string according to the obfuscation table to obtain a reference string; and converting the reference string into a restorable target string based on a preset encoding rule.
Step 332: moving the variable declaration of the node object corresponding to the local variable node type to an outer scope; adding a target variable declaration statement to the outer scope to obtain a target outer scope; and replacing the node object corresponding to the local variable node type with the target outer scope.
Step 334: traversing each target node in the target syntax tree with a syntax tree traverser and constructing the code string corresponding to each target node, until every target node has been traversed, to obtain the obfuscated code corresponding to the target syntax tree; and outputting the obfuscated code in the form of a string or a file.
It should be noted that, the steps 310 to 334 correspond to the steps and embodiments shown in fig. 1, and for the specific implementation of the steps 310 to 334, reference is made to the steps and embodiments shown in fig. 1, and the description thereof will not be repeated here.
In summary, the code obfuscation method provided by the embodiment of the invention comprises: converting the source code into an original syntax tree; acquiring node objects of the node type to be rewritten from the original nodes of the original syntax tree, wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, string constant node type and local variable node type; rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree; and outputting the target syntax tree as obfuscated code. In the embodiment of the invention, the node objects of the node type to be rewritten in the original syntax tree (AST) can be acquired and processed in a targeted manner according to the node type to be rewritten. In addition, the technical scheme of the invention makes the code harder to decompile and thereby improves code security, which provides more sustainable technical support for network security; that is, the invention can protect the interactive service logic between the client/web page and the server that is implemented by the source code, and prevent dynamic page scripts from being stolen, cracked and exploited.
Fig. 4 is a block diagram of a code obfuscating apparatus according to an embodiment of the present invention, where the code obfuscating apparatus 400 may include:
a transcoding unit 401 for converting the source code into an original syntax tree;
a node determining unit 402, configured to acquire node objects of the node type to be rewritten from the original nodes of the original syntax tree; wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, string constant node type and local variable node type;
a node rewriting unit 403, configured to rewrite the corresponding node object according to the node type to be rewritten, so as to obtain a target syntax tree;
and a code output unit 404, configured to output the target syntax tree as obfuscated code.
Optionally, the apparatus further comprises:
a specific node determining unit, configured to determine a specific node to be renamed from the original nodes;
and the renaming unit is used for renaming the specific node according to the specific function method.
Optionally, wherein:
and the renaming unit is further used for carrying out sub-tree recursion renaming on the specific node if the specific node contains the child nodes.
Optionally, the apparatus further comprises:
a traversing unit, configured to traverse the original syntax tree and remove nodes of a preset type from the original syntax tree to obtain a first syntax tree;
and an obfuscation dictionary determining unit, configured to collect each string literal from the first syntax tree to obtain an obfuscation dictionary corresponding to the source code.
Optionally, if the node type to be rewritten is the control flow node type, the node rewriting unit 403 rewrites the corresponding node object according to the node type to be rewritten, including:
inserting a first-type false object into the node object corresponding to the control flow node type; wherein the first-type false object includes a false judgment result branch;
or modifying the node object corresponding to the control flow node type into a second-type false object; wherein the second-type false object includes a false loop termination result.
Optionally, if the node type to be rewritten is the expression node type, the node rewriting unit 403 rewrites the corresponding node object according to the node type to be rewritten, including:
generating a conversion function for hiding the actual logic for the node object corresponding to the expression node type;
and modifying the node object corresponding to the expression node type into the conversion function.
Optionally, if the node type to be rewritten is the statement node type, the node rewriting unit 403 rewrites the corresponding node object according to the node type to be rewritten, including:
performing a statement processing operation on the node object corresponding to the statement node type; wherein the statement processing operation includes at least one of: statement rearrangement and statement packaging.
Optionally, if the node type to be rewritten is the specific code block node set type, the node rewriting unit 403 rewrites the corresponding node object according to the node type to be rewritten, including:
selecting the scopes of at least two node objects corresponding to the specific code block node set type to obtain at least two scopes;
and swapping the positions of the at least two scopes in the parent node.
Optionally, if the node type to be rewritten is the variable node type to be hidden, the node rewriting unit 403 rewrites the corresponding node object according to the node type to be rewritten, including:
replacing the node object corresponding to the variable node type to be hidden with a designated chained access structure according to the replacement rule; the designated chained access structure contains the node object corresponding to the variable node type to be hidden.
Optionally, if the node type to be rewritten is the node type to be inlined, the node rewriting unit 403 rewrites the corresponding node object according to the node type to be rewritten, including:
acquiring the call point node of the function node object corresponding to the node type to be inlined;
if the function node object and the call point node do not depend on external variables, inlining the function node object at the position of the call point node;
if the function node object and the call point node depend on external variables, wrapping the function node object in a closure function, and inlining the wrapped function node object at the position of the call point node.
Optionally, if the node type to be rewritten is the string constant node type, the node rewriting unit 403 rewrites the corresponding node object according to the node type to be rewritten, including:
generating an obfuscation table for the string in the node object corresponding to the string constant node type based on a word segmentation algorithm and an order-scrambling algorithm;
splitting and reorganizing the string according to the obfuscation table to obtain a reference string;
and converting the reference string into a restorable target string based on a preset encoding rule.
Optionally, if the node type to be rewritten is the local variable node type, the node rewriting unit 403 rewrites the corresponding node object according to the node type to be rewritten, including:
moving the variable declaration of the node object corresponding to the local variable node type to an outer scope;
adding a target variable declaration statement to the outer scope to obtain a target outer scope;
and replacing the node object corresponding to the local variable node type with the target outer scope.
Optionally, the code output unit 404 outputs the target syntax tree as obfuscated code, including:
traversing each target node in the target syntax tree with a syntax tree traverser and constructing the code string corresponding to each target node, until every target node has been traversed, to obtain the obfuscated code corresponding to the target syntax tree;
and outputting the obfuscated code in the form of a string or a file.
In summary, the code obfuscation device provided by the embodiment of the invention is configured for: converting the source code into an original syntax tree; acquiring node objects of the node type to be rewritten from the original nodes of the original syntax tree, wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, string constant node type and local variable node type; rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree; and outputting the target syntax tree as obfuscated code. In the embodiment of the invention, the node objects of the node type to be rewritten in the original syntax tree (AST) can be acquired and processed in a targeted manner according to the node type to be rewritten. In addition, the technical scheme of the invention makes the code harder to decompile and thereby improves code security, which provides more sustainable technical support for network security; that is, the invention can protect the interactive service logic between the client/web page and the server that is implemented by the source code, and prevent dynamic page scripts from being stolen, cracked and exploited.
The present invention also provides an electronic device. Referring to fig. 5, which is a block diagram of the electronic device, the electronic device comprises: a processor 501, a memory 502 and a computer program 5021 stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the code obfuscation method of the previous embodiments.
The invention also provides a readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the code obfuscation method of the previous embodiments.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
It should be noted that, various information and data acquired in the embodiment of the present invention are acquired under the condition that the information/data holder is authorized.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in a code obfuscation device according to the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention may also be implemented as an apparatus or device program for performing part or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (16)

1. A method of code obfuscation, the method comprising:
converting the source code into an original syntax tree;
acquiring node objects of the node type to be rewritten from the original nodes of the original syntax tree; wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be inlined, string constant node type and local variable node type;
rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree;
and outputting the target syntax tree as obfuscated code.
2. The method according to claim 1, wherein the method further comprises:
determining a specific node to be renamed from the original nodes;
renaming the specific node according to a specific function method.
3. The method according to claim 2, wherein the method further comprises:
and if the specific node comprises a child node, carrying out sub-tree recursion renaming on the specific node.
4. The method according to claim 1, wherein the method further comprises:
traversing the original syntax tree, and removing nodes of a preset type from the original syntax tree to obtain a first syntax tree;
and collecting each string literal from the first syntax tree to obtain an obfuscation dictionary corresponding to the source code.
5. The method according to claim 1, wherein if the node type to be rewritten is a control flow node type, the rewriting the corresponding node object according to the node type to be rewritten includes:
inserting a first-type false object into the node object corresponding to the control flow node type; wherein the first-type false object includes a false judgment result branch;
or modifying the node object corresponding to the control flow node type into a second-type false object; wherein the second-type false object includes a false loop termination result.
6. The method according to claim 1, wherein if the node type to be rewritten is an expression node type, the rewriting the corresponding node object according to the node type to be rewritten includes:
generating a conversion function for hiding actual logic for the node object corresponding to the expression node type;
and modifying the node object corresponding to the expression node type into the conversion function.
7. The method according to claim 1, wherein if the node type to be rewritten is a statement node type, the rewriting the corresponding node object according to the node type to be rewritten includes:
performing a statement processing operation on the node object corresponding to the statement node type; wherein the statement processing operation includes at least one of: statement rearrangement and statement packaging.
8. The method according to claim 1, wherein if the node type to be rewritten is a specific code block node set type, the rewriting the corresponding node object according to the node type to be rewritten includes:
selecting the scopes of at least two node objects corresponding to the specific code block node set type to obtain at least two scopes;
and swapping the positions of the at least two scopes within their parent node.
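As a rough illustration of claim 8, the sketch below treats two top-level function definitions as the selected scopes and swaps their positions in the parent module node. Python's `ast` module and the name `swap_function_scopes` are assumptions; the swap is only safe when no module-level statement between the two definitions depends on their order.

```python
import ast


def swap_function_scopes(source):
    """Swap the first two top-level function definitions in the module."""
    tree = ast.parse(source)
    positions = [
        i for i, stmt in enumerate(tree.body) if isinstance(stmt, ast.FunctionDef)
    ]
    if len(positions) >= 2:
        i, j = positions[0], positions[1]
        tree.body[i], tree.body[j] = tree.body[j], tree.body[i]
    return ast.unparse(tree)  # requires Python 3.9+
```

The functions behave the same after the swap, but the layout of the output no longer mirrors the layout of the source.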
9. The method according to claim 1, wherein if the node type to be rewritten is a variable node type to be hidden, the rewriting the corresponding node object according to the node type to be rewritten includes:
replacing, according to a replacement rule, the node object corresponding to the variable node type to be hidden with a designated chained access structure; wherein the designated chained access structure includes the node object corresponding to the variable node type to be hidden.
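A chained access structure that still contains the original variable node, as claim 9 requires, might look like the nested dictionary lookup below. The specific shape `{'a': {'b': x}}['a']['b']`, the use of Python's `ast` module, and the names `chained`, `HideVariables`, and `hide_variables` are all assumptions made for illustration; the patent's replacement rule is not specified in this section.

```python
import ast


def chained(node):
    """Build {'a': {'b': <node>}}['a']['b'], which evaluates to <node>."""
    expr = ast.parse('{"a": {"b": None}}["a"]["b"]', mode="eval").body
    # expr.value.value is the outer dict; its only value is the inner dict.
    expr.value.value.values[0].values[0] = node
    return expr


class HideVariables(ast.NodeTransformer):
    def __init__(self, hidden):
        self.hidden = set(hidden)

    def visit_Name(self, node):
        # Only reads are replaced; assignment targets are left untouched.
        if node.id in self.hidden and isinstance(node.ctx, ast.Load):
            return chained(node)
        return node


def hide_variables(source, hidden):
    tree = HideVariables(hidden).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+
```

For example, hiding `key` in `return key * 2` produces an expression that reads the value through two subscript accesses before multiplying.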
10. The method according to claim 1, wherein if the node type to be rewritten is a node type to be internally converged (that is, to be inlined), the rewriting the corresponding node object according to the node type to be rewritten includes:
acquiring a call point node of a function node object corresponding to the node type to be internally converged;
if the function node object and the call point node do not depend on an external variable, inlining the function node object at the position of the call point node;
and if the function node object and the call point node depend on an external variable, wrapping the function node object in a closure function, and inlining the wrapped function node object at the position of the call point node.
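The two branches of claim 10 (placing the function body directly at the call point, or wrapping it in a closure first when external variables are involved) can be sketched as below. The sketch is deliberately narrow: it assumes Python's `ast` module, a top-level, unannotated function whose body is a single `return <expr>`, positional arguments only, and it checks external names on the function side only. `Substitute` and `inline_calls` are hypothetical names.

```python
import ast
import copy


class Substitute(ast.NodeTransformer):
    """Replace parameter names with the argument expressions of one call."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


def inline_calls(source, func_name):
    tree = ast.parse(source)
    func = next(
        n for n in tree.body
        if isinstance(n, ast.FunctionDef) and n.name == func_name
    )
    params = [a.arg for a in func.args.args]
    ret_expr = func.body[0].value  # assumes the body is one `return <expr>`
    external = {
        n.id for n in ast.walk(ret_expr) if isinstance(n, ast.Name)
    } - set(params)

    class Inline(ast.NodeTransformer):
        def visit_Call(self, node):
            self.generic_visit(node)
            if not (isinstance(node.func, ast.Name) and node.func.id == func_name):
                return node
            if external:
                # Depends on external variables: wrap in a closure, call in place.
                closure = ast.Lambda(
                    args=copy.deepcopy(func.args), body=copy.deepcopy(ret_expr)
                )
                return ast.Call(func=closure, args=node.args, keywords=[])
            # No external variables: substitute arguments into the body.
            mapping = dict(zip(params, node.args))
            return Substitute(mapping).visit(copy.deepcopy(ret_expr))

    tree = Inline().visit(tree)
    tree.body = [n for n in tree.body if n is not func]  # drop the definition
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+
```

Direct substitution evaluates an argument once per use of its parameter, so a real implementation would have to guard against arguments with side effects.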
11. The method according to claim 1, wherein if the node type to be rewritten is a character string constant node type, the rewriting the corresponding node object according to the node type to be rewritten includes:
generating, based on a word segmentation algorithm and a shuffling algorithm, a confusion table for the character string in the node object corresponding to the character string constant node type;
splitting and reorganizing the character string according to the confusion table to obtain a reference character string;
and converting the reference character string into a restorable target character string based on a preset encoding rule.
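The three steps of claim 11 can be illustrated with stand-ins for each unspecified algorithm. In the sketch below, fixed-width chunking stands in for the word segmentation algorithm, a seeded `random.Random.shuffle` for the shuffling algorithm, and JSON plus Base64 for the preset encoding rule; these choices and the names `obfuscate_string` and `restore_string` are assumptions, not the patent's method.

```python
import base64
import json
import random


def obfuscate_string(text, chunk=3, seed=0):
    """Return (target string, confusion table) for one string constant."""
    # Segmentation stand-in: fixed-width pieces.
    pieces = [text[i:i + chunk] for i in range(0, len(text), chunk)]
    # Confusion table: table[k] is the original index of the k-th shuffled piece.
    table = list(range(len(pieces)))
    random.Random(seed).shuffle(table)
    # Reference string: the pieces in shuffled order.
    reference = json.dumps([pieces[i] for i in table])
    # Encoding stand-in: Base64, which is reversible.
    target = base64.b64encode(reference.encode("utf-8")).decode("ascii")
    return target, table


def restore_string(target, table):
    """Invert obfuscate_string using the same confusion table."""
    shuffled = json.loads(base64.b64decode(target).decode("utf-8"))
    pieces = [""] * len(table)
    for position, original_index in enumerate(table):
        pieces[original_index] = shuffled[position]
    return "".join(pieces)
```

The obfuscated program would carry the target string and the table, and call something like `restore_string` at run time to recover the original constant.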
12. The method according to claim 1, wherein if the node type to be rewritten is a local variable node type, the rewriting the corresponding node object according to the node type to be rewritten includes:
moving the variable declaration of the node object corresponding to the local variable node type to an outer scope;
adding a target variable declaration statement to the outer scope to obtain a target outer scope;
and replacing the node object corresponding to the local variable node type with the target outer scope.
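Claim 12 can be approximated in Python by turning a function's local variables into module-level variables. This is a loose analogy, since the patent does not say which language or scoping model it targets: the sketch assumes Python's `ast` module, handles only top-level functions without nested definitions or existing `global` statements, and is not safe for recursive or concurrent calls. The `_h_` prefix and the name `hoist_locals` are hypothetical.

```python
import ast

PREFIX = "_h_"  # hypothetical prefix that keeps hoisted names from colliding


def hoist_locals(source):
    """Move local variables of top-level functions to module scope."""
    tree = ast.parse(source)
    hoisted = []
    for func in [n for n in tree.body if isinstance(n, ast.FunctionDef)]:
        params = {a.arg for a in ast.walk(func.args) if isinstance(a, ast.arg)}
        local_names = sorted(
            {
                n.id for n in ast.walk(func)
                if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
            } - params
        )
        if not local_names:
            continue
        # Rename every use so the hoisted variable cannot shadow a global.
        for n in ast.walk(func):
            if isinstance(n, ast.Name) and n.id in local_names:
                n.id = PREFIX + n.id
        new_names = [PREFIX + name for name in local_names]
        func.body.insert(0, ast.Global(names=new_names))
        hoisted.extend(new_names)
    # Target variable declaration statements added to the outer scope.
    declarations = [
        ast.parse(f"{name} = None").body[0] for name in dict.fromkeys(hoisted)
    ]
    tree.body = declarations + tree.body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+
```

After the rewrite, the function body no longer declares its own variables; their declarations sit in the enclosing scope alongside unrelated code.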
13. The method of claim 1, wherein said outputting the target syntax tree as obfuscated code comprises:
traversing each target node in the target syntax tree with a syntax tree traverser and constructing the code character string corresponding to each target node, until every target node has been traversed, to obtain the obfuscated code corresponding to the target syntax tree;
outputting the obfuscated code in the form of a character string or a file.
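The output step of claim 13 can be sketched with `ast.unparse`, which walks every node of a tree and builds the corresponding source text. Using it as the traverser is an assumption, as is the function name `output_obfuscated_code` and its optional `path` parameter.

```python
import ast
from pathlib import Path


def output_obfuscated_code(tree, path=None):
    """Serialize the target tree; optionally also write it to a file."""
    ast.fix_missing_locations(tree)
    code = ast.unparse(tree)  # visits each node and emits its code string
    if path is not None:
        Path(path).write_text(code, encoding="utf-8")
    return code
```

Returning the string in every case and writing the file only on request covers both output forms named in the claim.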
14. A code obfuscation apparatus, the apparatus comprising:
a code conversion unit for converting the source code into an original syntax tree;
a node determining unit for acquiring node objects of the node types to be rewritten from all original nodes of the original syntax tree; wherein the node type to be rewritten includes at least one of the following: control flow node type, expression node type, statement node type, specific code block node set type, variable node type to be hidden, node type to be internally converged, character string constant node type and local variable node type;
a node rewriting unit for rewriting the corresponding node object according to the node type to be rewritten to obtain a target syntax tree;
and a code output unit for outputting the target syntax tree as obfuscated code.
15. An electronic device, comprising:
a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any one of claims 1-13 when executing the program.
16. A readable storage medium, characterized in that instructions in the readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1-13.
CN202311616927.3A 2023-11-29 2023-11-29 Code confusion method and device, electronic equipment and readable storage medium Pending CN117891463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311616927.3A CN117891463A (en) 2023-11-29 2023-11-29 Code confusion method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN117891463A true CN117891463A (en) 2024-04-16

Family

ID=90645451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311616927.3A Pending CN117891463A (en) 2023-11-29 2023-11-29 Code confusion method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN117891463A (en)

Legal Events

Date Code Title Description
PB01 Publication