CN113515745B - Method and system for Trojan horse detection - Google Patents


Publication number
CN113515745B
Authority
CN
China
Prior art keywords
instruction
function
basic block
code
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110700781.5A
Other languages
Chinese (zh)
Other versions
CN113515745A (en)
Inventor
罗远哲
刘瑞景
李雪茹
罗晓婷
王玲洁
赵利波
罗晓萌
郭振庭
李文静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing China Super Industry Information Security Technology Ltd By Share Ltd
Original Assignee
Beijing China Super Industry Information Security Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing China Super Industry Information Security Technology Ltd By Share Ltd
Priority to CN202110700781.5A
Publication of CN113515745A
Application granted
Publication of CN113515745B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding

Abstract

The invention relates to a method and a system for Trojan horse detection. The method parses the code to be judged into an abstract syntax tree and constructs a control flow graph from the abstract syntax tree. By traversing the control flow graph, it sorts the information in each instruction into a constant set, an external world data set, an external world function set, a cleaning function set or a function set according to the characteristics of the instructions in each basic block. It then obtains a state unit set from the basic blocks according to the cleaning function set, converts the state unit set into a Kripke structure state transition system, generates SMV code from that system, and checks the SMV code with a model checker. In this way it can be judged whether the code to be judged is a Trojan horse, and Trojan horse detection efficiency is improved.

Description

Method and system for Trojan horse detection
Technical Field
The invention relates to the technical field of information security, in particular to a method and a system for Trojan horse detection.
Background
A website Trojan horse (webshell) is a backdoor program based on Web services. Through a website Trojan horse, an attacker obtains management authority over the Web service and thereby penetrates and controls the Web application. Because the characteristics of a website Trojan horse are almost identical to those of an ordinary Web page, it can evade detection by traditional firewalls and antivirus software. Moreover, as various feature obfuscation and hiding techniques for evading detection are applied to website Trojan horses, traditional detection based on signature matching struggles to detect new variants in time.
Disclosure of Invention
The invention aims to provide a method and a system for Trojan horse detection, which improve the Trojan horse detection efficiency.
In order to achieve the purpose, the invention provides the following scheme:
a method for Trojan horse detection, comprising:
analyzing the code to be judged into an abstract syntax tree;
traversing the abstract syntax tree, judging whether each node is contained in a code for changing a control flow, if the node is contained in the code for changing the control flow, rewriting the code for changing the control flow into a goto instruction, and converting the goto instruction into a static single assignment form to obtain an intermediate representation of the static single assignment;
constructing a control flow graph according to the intermediate representation of the static single assignment;
traversing a basic block set formed by the basic blocks other than the first basic block in the control flow graph, deleting the basic blocks in the set that have no predecessor, and updating the predecessor-successor list of each basic block to obtain an updated control flow graph; the predecessor-successor list records the predecessor basic blocks and the successor basic blocks of the basic block;
traversing the updated control flow graph by adopting a depth-first strategy, and respectively forming instructions corresponding to each basic block into a constant set, an external world data set, an external world function set, a cleaning function set or a function set; the constant set comprises an instruction with a constant in the instruction, the function set comprises an instruction with a function in the instruction, the external world function set comprises an instruction with an external input function in the instruction, the cleaning function set comprises an instruction with a function which is not the external input function and has a constant return value, and the external world data set comprises an instruction with a variable in the instruction; the external input function is a function of which the parameters comprise external input quantity;
traversing the updated control flow graph by adopting an iterative approximation algorithm: if the current instruction is an instruction with an external input function, adding the function name in the current instruction to the external world function set and adding the return value of the function to the external world data set; if the function in the current instruction is not an external input function and its return value is a constant, adding the function name in the current instruction to the constant set; if at least one parameter of the function in the current instruction is a variable, adding the return value of the function to the external world data set; if none of the parameters of the function in the current instruction is a variable, adding the return value of the function to the constant set; if the current instruction is an operation instruction and at least one of its operands is a variable, adding the operation result of the current instruction to the external world data set; if the current instruction is an operation instruction and none of its operands is a variable, adding the operation result to the constant set; and exiting the traversal once the number of elements in the constant set, the external world data set, the external world function set and the cleaning function set no longer changes;
splitting the instructions in each basic block in the cleaning function set into state units, each comprising one function call instruction, and combining the resulting state units into a state unit set according to their predecessor-successor relationships; each state unit comprises a label, a function call instruction and a predecessor-successor list;
converting the state unit set into a Kripke structure state transition system; the Kripke structure state transition system is defined as a quadruple M = (S, S0, R, L), wherein S represents the set of labels in the state unit set, S0 represents the label of the first state unit, R represents a set of label pairs (S1, S2) such that the successor list of the state unit corresponding to the first label S1 comprises the state unit corresponding to the second label S2, L represents a function interpretation set, and the function interpretations in L record the function call instruction information of the state units;
generating SMV code according to the Kripke structure state transition system;
inputting the SMV code into a model checker for model checking to obtain the type of the code to be judged; the type of the code to be judged is either a normal file or a Trojan horse.
Optionally, the constructing a control flow graph according to the intermediate representation of the static single assignment specifically includes:
initializing a basic block set and a current basic block;
traversing all the instructions in the intermediate representation of the static single assignment in sequence: if the current instruction is a jump instruction, taking the instruction position to which it jumps as a successor of the current basic block and adding the current basic block to the basic block set; if the current instruction is referenced by another jump instruction (that is, it is a jump target), updating the successor basic block of the last basic block in the basic block set to the current basic block, adding the current instruction to the updated current basic block, and adding the updated current basic block to the basic block set; and if the current instruction is neither a jump instruction nor a jump target, adding the current instruction to the current basic block.
Optionally, traversing the abstract syntax tree, determining whether each node is included in a code that changes a control flow, and if the node is included in the code that changes the control flow, rewriting the code that changes the control flow into a goto instruction, and converting the goto instruction into a static single assignment form to obtain an intermediate representation of the static single assignment, specifically including:
traversing the abstract syntax tree, judging whether each node is contained in the code for changing the control flow, if the node is contained in the code for changing the control flow, rewriting the code for changing the control flow into a goto instruction, and converting the goto instruction into a static single assignment form by adopting an SSA construction algorithm to obtain an intermediate representation of the static single assignment.
Optionally, the inputting the SMV code into a model detector for performing model detection to obtain the type of the code to be determined specifically includes:
and inputting the SMV code into a NuSMV model detector for model detection to obtain the type of the code to be judged.
The invention also discloses a system for Trojan horse detection, which comprises:
the code analysis module is used for analyzing the codes to be judged into an abstract syntax tree;
the static single assignment intermediate representation acquisition module is used for traversing the abstract syntax tree, judging whether each node is contained in a code for changing a control flow, rewriting the code for changing the control flow into a goto instruction if the node is contained in the code for changing the control flow, and converting the goto instruction into a static single assignment form to obtain a static single assignment intermediate representation;
the control flow graph building module is used for building a control flow graph according to the intermediate representation of the static single assignment;
a control flow diagram updating module, configured to traverse a basic block set formed by basic blocks in the control flow diagram except a first basic block, delete a basic block without predecessor in the basic block set, and update a predecessor successor list of each basic block to obtain an updated control flow diagram; the precursor successor list is used for recording a precursor basic block and a successor basic block corresponding to the basic block;
the first traversal module is used for traversing the updated control flow graph by adopting a depth-first strategy and enabling the instructions corresponding to the basic blocks to respectively form a constant set, an external world data set, an external world function set, a cleaning function set or a function set; the constant set comprises an instruction with a constant in the instruction, the function set comprises an instruction with a function in the instruction, the external world function set comprises an instruction with an external input function in the instruction, the cleaning function set comprises an instruction with a function which is not the external input function and has a constant return value, and the external world data set comprises an instruction with a variable in the instruction; the external input function is a function of which the parameters comprise external input quantity;
a second traversal module, configured to traverse the updated control flow graph by using an iterative approximation algorithm, add a function name in a current instruction to the external world function set if the current instruction is an instruction with an external input function, add a return value of a function in the current instruction to the external world data set, add the function name in the current instruction to the constant set if the function in the current instruction is not an external input function and the return value of the function is a constant instruction, add the return value of the function in the current instruction to the external world data set if at least one parameter in the current instruction function is a variable, add the return value of the function in the current instruction to the constant set if none of the parameters in the current instruction function is a variable, add the return value of the function in the current instruction to the constant set if the current instruction is an operation instruction and at least one operation in the operation instruction includes a variable, adding the operation result of the current instruction into the external world data set, if the current instruction is the operation instruction and the operation instruction does not include variables, adding the operation result of the current instruction into the constant set, and exiting traversal until the number of elements in the constant set, the external world data set, the external world function set and the cleaning function set is unchanged;
a state unit set acquisition module, configured to split an instruction in each basic block in the cleaning function set according to a state unit including a function call instruction, and combine the split state units into a state unit set according to a predecessor-successor relationship; each state unit comprises a label, a function call instruction and a precursor successor list;
a Kripke structure state transition system acquisition module, configured to convert the state unit set into a Kripke structure state transition system; the Kripke structure state transition system is defined as a quadruple M = (S, S0, R, L), wherein S represents the set of labels in the state unit set, S0 represents the label of the first state unit, R represents a set of label pairs (S1, S2) such that the successor list of the state unit corresponding to the first label S1 comprises the state unit corresponding to the second label S2, L represents a function interpretation set, and the function interpretations in L record the function call instruction information of the state units;
an SMV code generation module, configured to generate SMV code according to the Kripke structure state transition system;
a Trojan horse judgment module, configured to input the SMV code into a model checker for model checking to obtain the type of the code to be judged; the type of the code to be judged is either a normal file or a Trojan horse.
Optionally, the control flow graph building module specifically includes:
an initialization unit, configured to initialize a basic block set and a current basic block;
an instruction judging unit, configured to traverse all the instructions in the intermediate representation of the static single assignment in sequence: if the current instruction is a jump instruction, take the instruction position to which it jumps as a successor of the current basic block and add the current basic block to the basic block set; if the current instruction is referenced by another jump instruction (that is, it is a jump target), update the successor basic block of the last basic block in the basic block set to the current basic block, add the current instruction to the updated current basic block, and add the updated current basic block to the basic block set; and if the current instruction is neither a jump instruction nor a jump target, add the current instruction to the current basic block.
Optionally, the module for obtaining the intermediate representation of the static single assignment specifically includes:
and the static single assignment intermediate representation acquisition unit is used for traversing the abstract syntax tree, judging whether each node is contained in the code for changing the control flow, rewriting the code for changing the control flow into a goto instruction if the node is contained in the code for changing the control flow, and converting the goto instruction into a static single assignment form by adopting an SSA construction algorithm to obtain the static single assignment intermediate representation.
Optionally, the Trojan horse determination module specifically includes:
and the Trojan judgment unit is used for inputting the SMV code into a NuSMV model detector for model detection to obtain the type of the code to be judged.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The method parses the code to be judged into an abstract syntax tree, constructs a control flow graph from the abstract syntax tree, and traverses the control flow graph, sorting the information in each instruction into a constant set, an external world data set, an external world function set, a cleaning function set or a function set according to the characteristics of the instructions in each basic block. It obtains a state unit set from the basic blocks according to the cleaning function set, converts the state unit set into a Kripke structure state transition system, generates SMV code from that system, and checks the SMV code with a model checker. In this way it can be judged whether the code to be judged is a Trojan horse, and Trojan horse detection efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a schematic flow chart of a method for Trojan horse detection according to the present invention;
FIG. 2 is a schematic diagram of a control flow diagram of the present invention;
FIG. 3 is a simplified flowchart of a method for Trojan horse detection according to the present invention;
FIG. 4 is a schematic structural diagram of a system for Trojan horse detection according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a schematic flow chart of a method for Trojan horse detection. As shown in FIG. 1, the method for Trojan horse detection includes:
Step 101: parsing the code to be judged into an abstract syntax tree. Specifically, the code to be judged is input into PHP-Parser, an open-source parser for the corresponding programming language, which parses out the AST.
Step 102: traversing the abstract syntax tree, judging whether each node is contained in the code for changing the control flow, if the node is contained in the code for changing the control flow, rewriting the code for changing the control flow into a goto instruction, converting the goto instruction into a static single assignment form, and obtaining the intermediate representation of the static single assignment.
As a specific embodiment, the goto instruction is converted into a static single assignment form by adopting an SSA construction algorithm, and an intermediate representation of the static single assignment is obtained.
Step 103: constructing a control flow graph according to the intermediate representation of the static single assignment.
Wherein, step 103 specifically comprises:
initializing a basic block set and a current basic block;
traversing all the instructions in the intermediate representation of the static single assignment in sequence: if the current instruction is a jump instruction, taking the instruction position to which it jumps as a successor of the current basic block and adding the current basic block to the basic block set; if the current instruction is referenced by another jump instruction (that is, it is a jump target), updating the successor basic block of the last basic block in the basic block set to the current basic block, adding the current instruction to the updated current basic block, and adding the updated current basic block to the basic block set; and if the current instruction is neither a jump instruction nor a jump target, adding the current instruction to the current basic block.
Step 104: traversing a basic block set formed by basic blocks except the first basic block in the control flow graph, deleting the basic blocks without predecessors in the basic block set, and updating a predecessor successor list of each basic block to obtain an updated control flow graph; the precursor successor list is used for recording a precursor basic block and a successor basic block corresponding to the basic block;
step 105: traversing the updated control flow graph by adopting a depth-first strategy, and respectively forming the instructions corresponding to the basic blocks into a constant set, an external world data set, an external world function set, a cleaning function set or a function set; the constant set comprises an instruction with a constant in the instruction, the function set comprises an instruction with a function in the instruction, the external world function set comprises an instruction with an external input function in the instruction, the cleaning function set comprises an instruction with a function which is not the external input function and the return value of the function is a constant, and the external world data set comprises an instruction with a variable in the instruction; the external input function is a function with parameters including external input quantity, and the output value of the external input function is indefinite;
step 106: traversing the updated control flow graph by adopting an iterative approximation algorithm, if the current instruction is an instruction with an external input function, adding a function name in the current instruction into an external world function set, adding a return value of the function in the current instruction into an external world data set, if the function in the current instruction is not the external input function and the return value of the function is a constant instruction, adding the function name in the current instruction into a constant set, if at least one parameter in the current instruction function is a variable, adding the return value of the function in the current instruction into the external world data set, if the parameter in the current instruction function is not a variable, adding the return value of the function in the current instruction into the constant set, if the current instruction is an operation instruction and at least one operation in the operation instruction comprises a variable, adding an operation result of the current instruction into the external world data set, if the current instruction is an operation instruction and the operation in the operation instruction does not include variables, adding the operation result of the current instruction into the constant set, and exiting traversal until the number of elements in the constant set, the external world data set, the external world function set and the cleaning function set is unchanged;
step 107: splitting an instruction in each basic block in the cleaning function set according to a state unit comprising a function calling instruction, and combining a plurality of split state units into a state unit set according to a precursor successor relationship; each state unit comprises a label, a function call instruction and a precursor successor list;
step 108: converting the state unit set into a kripke structure state migration system; the kripke structure state migration system is defined as a quadruplet M (S, SO, R, L), wherein S represents a label set in a state unit set, S0 represents a label of a first state unit, R represents { S1, S2}, a successor list of the state unit corresponding to the label S1 comprises the state unit corresponding to the label S2, L represents a function interpretation set, and each function interpretation in L is used for recording function call instruction information of each state unit;
step 109: generating an SMV code according to a kripke structure state migration system;
step 110: inputting the SMV code into a model detector for model detection to obtain the type of the code to be judged; the types of the codes to be judged comprise normal files and Trojan horse types.
Wherein, step 110 specifically includes:
and inputting the SMV code into a NuSMV model detector for model detection to obtain the type of the code to be judged.
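Steps 108 to 110 can be illustrated with a minimal Python sketch that renders a Kripke structure (S, S0, R, L) as NuSMV input text. This is an illustration under stated assumptions, not the patented generator: the function `kripke_to_smv`, the state names `s0` to `s3`, the `calls_<function>` naming scheme and the example CTL property are all hypothetical, and states without successors are given a self-loop so that the transition relation is total. The text does not state which specification is actually checked, so the property is passed in as a parameter.

```python
def kripke_to_smv(states, initial, transitions, labels, spec=None):
    """Render a Kripke structure (S, S0, R, L) as NuSMV input text.

    states      -- list of state labels (S)
    initial     -- label of the first state unit (S0)
    transitions -- list of (src, dst) label pairs (R)
    labels      -- dict mapping a state label to the function it calls (L)
    spec        -- optional CTL formula; a placeholder, not taken from the patent
    """
    succ = {s: [] for s in states}
    for src, dst in transitions:
        succ[src].append(dst)

    lines = [
        "MODULE main",
        "VAR",
        f"  state : {{{', '.join(states)}}};",
        "ASSIGN",
        f"  init(state) := {initial};",
        "  next(state) := case",
    ]
    for s in states:
        if succ[s]:
            lines.append(f"    state = {s} : {{{', '.join(succ[s])}}};")
    # Assumption: states without successors loop on themselves.
    lines += ["    TRUE : state;", "  esac;"]

    funcs = sorted(set(labels.values()))
    if funcs:
        lines.append("DEFINE")
        for func in funcs:
            holders = " | ".join(f"state = {s}" for s in states if labels.get(s) == func)
            lines.append(f"  calls_{func} := {holders};")
    if spec:
        lines.append(f"CTLSPEC {spec}")
    return "\n".join(lines) + "\n"


# Hypothetical state units for the example program of the embodiment.
smv = kripke_to_smv(
    states=["s0", "s1", "s2", "s3"],
    initial="s0",
    transitions=[("s0", "s1"), ("s1", "s2"), ("s1", "s3")],
    labels={"s0": "Expr_Variable", "s1": "Expr_ArrayDimFetch", "s2": "eval", "s3": "echo"},
    spec="AG !calls_eval",  # placeholder property: eval is never reached
)
print(smv)
```

The generated text could be saved to a file and passed to NuSMV; a counterexample to the placeholder property would then indicate a path on which `eval` is called.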
A method for Trojan horse detection according to the present invention is described in an embodiment with reference to fig. 3.
Step 1: First, the PHP Trojan horse program code is input as text into PHP-Parser, the open-source parser for the corresponding programming language, which parses it into an abstract syntax tree (AST).
The PHP Trojan horse program code is:
[Figure: listing of the PHP Trojan horse program code; image not reproduced in this text]
The abstract syntax tree AST is:
array(
    0: Stmt_If(
        cond: Expr_Isset(
            vars: array(
                0: Expr_ArrayDimFetch(
                    var: Expr_Variable(
                        name: _GET
                    )
                    dim: Scalar_String(
                        value: a
                    )
                )
            )
        )
        stmts: array(
            0: Stmt_Expression(
                expr: Expr_ErrorSuppress(
                    expr: Expr_Eval(
                        expr: Expr_ArrayDimFetch(
                            var: Expr_Variable(
                                name: _GET
                            )
                            dim: Scalar_String(
                                value: a
                            )
                        )
                    )
                )
            )
        )
        elseifs: array(
        )
        else: Stmt_Else(
            stmts: array(
                0: Stmt_Echo(
                    exprs: array(
                        0: Scalar_String(
                            value: not found
                        )
                    )
                )
            )
        )
    )
)
Step 2: Traverse the abstract syntax tree and judge whether each node is related to control flow, that is, whether the node's code contains code that changes the control flow, namely whether the node is contained in a branch statement, a loop statement, a break statement or a continue statement. Such nodes are all rewritten with goto jump instructions according to their semantics, and the instructions are then converted into static single assignment form using an SSA construction algorithm to obtain the intermediate representation (IR) of the static single assignment.
Specifically: traverse the abstract syntax tree and judge whether each node is in the instruction set {Stmt_If, Stmt_For}. The first statement in the AST is a Stmt_If statement, so it is rewritten with goto jump instructions; the instructions are then converted into static single assignment form using an SSA construction algorithm, giving the following intermediate representation IR of the static single assignment:
L1:
    v1 = Expr_Variable _GET
    v2 = Expr_ArrayDimFetch v1 "a"
    If v2 goto L2
    param "not found"
    call echo 1
    goto L3
L2:
    param v2
    call eval 1
L3:
    return
Step 3: Initialize the basic block set and the current basic block as empty, then traverse the instructions of the IR in order. If the current instruction is a jump instruction, the position loc that it jumps to becomes a successor basic block for subsequent processing, and the current basic block is added to the basic block set. If the instruction being traversed is referenced by another jump instruction, the basic block must be split there: the last basic block in the basic block set is taken out and its successor is updated to the current basic block, the current instruction is added to the current basic block, and the current basic block is added to the basic block set. Otherwise, the current instruction is simply added to the current basic block. This constructs the control flow graph CFG, which is shown in FIG. 2.
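The block-splitting rule of Step 3 can be sketched in Python on the example IR above. This is a minimal sketch under assumptions, not the patented procedure: the IR is encoded as a list of plain strings, a line ending in a colon is treated as a jump target, `goto`, `If ... goto` and `return` are treated as block-ending instructions, and the names `BasicBlock`, `build_cfg` and the generated block name `B1` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    name: str
    instrs: list = field(default_factory=list)
    succs: list = field(default_factory=list)  # names of successor blocks
    preds: list = field(default_factory=list)  # names of predecessor blocks


def is_jump(line):
    """True for instructions that end a basic block."""
    return line.startswith(("goto ", "If ")) or line == "return"


def build_cfg(ir):
    """Split a linear goto-style IR into basic blocks and link them."""
    blocks, current = [], None
    for line in ir:
        if line.endswith(":"):              # a label is a jump target: split here
            if current is not None:
                blocks.append(current)
            current = BasicBlock(line[:-1])
            continue
        if current is None:                 # code after a jump starts a fresh block
            current = BasicBlock(f"B{len(blocks)}")
        current.instrs.append(line)
        if is_jump(line):                   # a jump closes the current block
            blocks.append(current)
            current = None
    if current is not None:
        blocks.append(current)

    by_name = {b.name: b for b in blocks}
    for i, block in enumerate(blocks):
        last = block.instrs[-1] if block.instrs else ""
        targets = []
        if last.startswith(("goto ", "If ")):
            targets.append(last.split()[-1])        # explicit jump target
        if not (last.startswith("goto ") or last == "return") and i + 1 < len(blocks):
            targets.append(blocks[i + 1].name)      # fall through to the next block
        for target in targets:
            block.succs.append(target)
            by_name[target].preds.append(block.name)
    return blocks


EXAMPLE_IR = [
    "L1:", "v1 = Expr_Variable _GET", 'v2 = Expr_ArrayDimFetch v1 "a"',
    "If v2 goto L2", 'param "not found"', "call echo 1", "goto L3",
    "L2:", "param v2", "call eval 1",
    "L3:", "return",
]
cfg = build_cfg(EXAMPLE_IR)
for b in cfg:
    print(b.name, "->", b.succs)
```

Under these assumptions the example splits into four blocks: the condition block `L1`, the else branch, the `eval` branch `L2`, and the exit block `L3`.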
Step 4: Traverse the basic block set of the control flow graph, find the basic blocks other than the first basic block that have no predecessor, delete them, and update the predecessor and successor lists of the other basic blocks, thereby forming a simplified control flow graph.
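The pruning of Step 4 can be sketched as follows, assuming the control flow graph is held as a dictionary from block name to successor names. The function name `prune_unreachable` and the dead blocks `D1` and `D2` are hypothetical. Repeating the deletion until nothing changes is an assumption added here, so that a block whose only predecessor was itself deleted is also removed.

```python
def prune_unreachable(succs, entry):
    """Delete blocks (other than the entry) that have no predecessor.

    succs -- dict mapping a block name to the list of its successor names
    Returns the pruned successor map and the recomputed predecessor map.
    """
    succs = {block: list(outs) for block, outs in succs.items()}
    while True:
        # Rebuild the predecessor lists from the current successor lists.
        preds = {block: [] for block in succs}
        for block, outs in succs.items():
            for target in outs:
                preds[target].append(block)
        dead = [b for b in succs if b != entry and not preds[b]]
        if not dead:
            return succs, preds
        for block in dead:
            del succs[block]


# The example CFG plus two hypothetical dead blocks D1 -> D2 -> L3.
example = {
    "L1": ["L2", "B1"], "B1": ["L3"], "L2": ["L3"], "L3": [],
    "D1": ["D2"], "D2": ["L3"],
}
pruned_succs, pruned_preds = prune_unreachable(example, entry="L1")
print(sorted(pruned_succs))
```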
Step 5: Initialize the constant set CDS = ∅, the external world data set EDS = {_GET, _POST, _REQUEST, _SERVER}, the external world function set EFS = {socket_recv, socket_connect}, the cleaning function set CFS = ∅ and the function set FS = ∅, and initialize the maximum parameter count mp = 0. Then traverse the simplified control flow graph with a depth-first strategy and add elements to these sets. When a constant (a literal, a character string or a number) appears in an instruction, add it to the CDS. When a function appears in an instruction, add it to the FS and compare its number of parameters with mp; if the number of parameters is larger, update mp to that number. Then judge whether the function is an external input function: if so, add it to the EFS; otherwise, judge whether its return value is a constant and, if so, add it to the CFS. Variables appearing in an instruction that belong to the set {_GET, _POST, _REQUEST, _SERVER, ARGV, ARGC} are added to the EDS.
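The set initialization and depth-first collection of Step 5 can be sketched as below. Several points are assumptions, since the text does not say how they are decided: instructions are encoded as `(dest, func, args)` tuples, a function counts as an external input function only if its name is already in the initial EFS, and whether a function returns a constant is answered by a caller-supplied predicate `returns_constant`. The helper names `is_constant` and `collect_sets` are hypothetical.

```python
def is_constant(token):
    """Treat quoted strings and plain numbers as constants (an assumption)."""
    return token.startswith('"') or token.replace(".", "", 1).isdigit()


def collect_sets(blocks, succs, entry, returns_constant=lambda func: False):
    """Depth-first pass over the CFG that fills CDS, EDS, EFS, CFS, FS and mp."""
    cds, cfs, fs = set(), set(), set()
    eds = {"_GET", "_POST", "_REQUEST", "_SERVER"}
    efs = {"socket_recv", "socket_connect"}
    external_vars = eds | {"ARGV", "ARGC"}
    mp = 0  # largest number of parameters seen so far

    seen, stack = set(), [entry]
    while stack:                                  # iterative depth-first traversal
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for _dest, func, args in blocks[name]:
            fs.add(func)
            mp = max(mp, len(args))
            if func not in efs and returns_constant(func):
                cfs.add(func)
            for arg in args:
                if is_constant(arg):
                    cds.add(arg)
                elif arg in external_vars:
                    eds.add(arg)
        stack.extend(reversed(succs[name]))
    return {"CDS": cds, "EDS": eds, "EFS": efs, "CFS": cfs, "FS": fs, "mp": mp}


# Hypothetical encoding of the example program's basic blocks.
blocks = {
    "L1": [("v1", "Expr_Variable", ["_GET"]),
           ("v2", "Expr_ArrayDimFetch", ["v1", '"a"'])],
    "B1": [(None, "echo", ['"not found"'])],
    "L2": [(None, "eval", ["v2"])],
    "L3": [],
}
succs = {"L1": ["L2", "B1"], "B1": ["L3"], "L2": ["L3"], "L3": []}
sets = collect_sets(blocks, succs, "L1")
print(sets["CDS"], sets["mp"])
```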
Step 6: Traverse the control flow graph from Step 4 using an iterative approximation algorithm and perform abstract interpretation on each instruction. When a function call instruction is encountered: if the function name belongs to the EFS, add the return value to the EDS; if the function name belongs to the CFS, add the return value to the CDS; if at least one parameter of the function belongs to the EDS, add the return value to the EDS; and if no parameter of the function belongs to the EDS, add the return value to the CDS. When an operation instruction is encountered: if at least one operand belongs to the EDS, add the operation result to the EDS; otherwise, add the operation result to the CDS. This process is repeated until the elements of every set no longer change, at which point the loop exits.
In a specific embodiment, abstract interpretation is performed on each instruction as follows:
The first statement is v1 = Expr_Variable _GET. It calls the function Expr_Variable, whose parameter _GET is in the EDS, so v1 is added to the EDS.
The second statement is v2 = Expr_ArrayDimFetch v1 "a". Its two parameters are v1 and the constant "a"; since v1 belongs to the EDS, the return value v2 is added to the EDS, and "a" is added to the CDS.
All statements are processed in this way, and the process is repeated until the elements of every set no longer change and the loop exits. At that point, EDS = {_GET, _POST, _REQUEST, _SERVER, v1, v2} and CDS = {"a", "not found"}.
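The fixed-point iteration of Step 6 can be sketched in Python as below. The instruction format `(result, func, args)` with tagged arguments is hypothetical. The rules are applied in priority order (EFS, then CFS, then parameter sources), and a value that becomes external is removed from the CDS; both choices are assumptions, since the description does not say how conflicting rules are resolved.

```python
def propagate(instrs, eds, cds, efs=frozenset(), cfs=frozenset()):
    """Repeat the abstract interpretation pass until EDS and CDS stop changing."""
    eds, cds = set(eds), set(cds)
    while True:
        before = (set(eds), set(cds))
        for result, func, args in instrs:
            for kind, value in args:
                if kind == "const":
                    cds.add(value)
            if result is None:               # the call produces no value to track
                continue
            tainted = any(kind == "var" and value in eds for kind, value in args)
            if func in efs:
                external = True
            elif func in cfs:
                external = False
            else:
                external = tainted
            if external:
                eds.add(result)
                cds.discard(result)          # assumption: external source wins
            elif result not in eds:
                cds.add(result)
        if (eds, cds) == before:
            return eds, cds

# Hypothetical encoding of the four statements of the embodiment.
instrs = [
    ("v1", "Expr_Variable", [("var", "_GET")]),
    ("v2", "Expr_ArrayDimFetch", [("var", "v1"), ("const", "a")]),
    (None, "echo", [("const", "not found")]),
    (None, "eval", [("var", "v2")]),
]
eds, cds = propagate(instrs, {"_GET", "_POST", "_REQUEST", "_SERVER"}, set())
```

For this input the sketch reproduces the sets stated in the embodiment: v1 and v2 end up in the EDS, and "a" and "not found" end up in the CDS.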
Step 7: Define state units (SU). Each state unit contains a label, a function call instruction, and the predecessor and successor lists of the state unit. The CFG is then split: the instructions in each basic block are divided so that each state unit holds exactly one function call, which yields a number of state units, and all state units are linked together through their predecessor and successor lists to form a state unit set.
In a specific embodiment, the first basic block contains two function calls, so it is split into two state units, labeled s1 and s2. Because s1 and s2 are connected in the figure, s2 is the successor of s1 and s1 is the predecessor of s2.
The second, third, and fourth basic blocks each contain only one function call and do not need to be split.
This gives the state unit set SUS = {s1, s2, s3, s4, s5}:
s1.succs={s2}, s1.preds=∅, f1=Expr_Variable;
s2.succs={s3,s5}, s2.preds={s1}, f2=Expr_ArrayDimFetch;
s3.succs={s4}, s3.preds={s2}, f3=echo;
s4.succs=∅, s4.preds={s3,s5}, f4=return;
s5.succs={s4}, s5.preds={s2}, f5=eval.
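The splitting of basic blocks into state units can be sketched in Python as follows. The block encoding (a list of called function names plus a list of successor block ids) and the names `split_state_units` and `link` are hypothetical, and the sketch assumes every basic block contains at least one function call.

```python
def split_state_units(blocks, order):
    """blocks maps block id -> (called function names, successor block ids)."""
    units, first, last = {}, {}, {}

    def link(a, b):
        units[a]["succs"].append(b)
        units[b]["preds"].append(a)

    count = 0
    for block in order:
        calls, _ = blocks[block]
        prev = None
        for func in calls:                 # one state unit per function call
            count += 1
            label = f"s{count}"
            units[label] = {"f": func, "succs": [], "preds": []}
            if prev is None:
                first[block] = label
            else:
                link(prev, label)          # chain units inside the same block
            prev = label
        last[block] = prev
    for block in order:                    # carry block edges over to the units
        for succ in blocks[block][1]:
            link(last[block], first[succ])
    return units

# Hypothetical encoding of the four basic blocks of the embodiment.
blocks = {
    "B1": (["Expr_Variable", "Expr_ArrayDimFetch"], ["B2", "B4"]),
    "B2": (["echo"], ["B3"]),
    "B3": (["return"], []),
    "B4": (["eval"], ["B3"]),
}
units = split_state_units(blocks, ["B1", "B2", "B3", "B4"])
```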
Step 8: The state unit set from Step 7 is formalized as a Kripke structure state transition system, defined as a quadruple M = (S, S0, R, L). Each state unit corresponds to one index; S is the index set of the state unit set, and S0 is the index of the first state unit. Here S = {s1, s2, s3, s4, s5} and S0 = s1. R is the set of index pairs R = {(si, sj) | the successor list of the SU corresponding to si includes the SU corresponding to sj} = {(s1, s2), (s2, s3), (s2, s5), (s3, s4), (s5, s4)}. L is an interpretation function (function interpretation set), L: S → 2^AP.
L is used to derive the details of a state from its label. The atomic proposition set AP contains a formal representation of the parts of the SU other than the label; for example, the instruction label is used as a constraint predicate such as s=2, and the source set of a parameter (EDS or CDS) at the time of the function call is used as a dependent predicate such as param1=CDS.
L: S → 2^AP = {(s1, {s=1, f=Expr_Variable, param1=EDS}),
(s2, {s=2, f=Expr_ArrayDimFetch, param1=EDS, param2=CDS}),
(s3, {s=3, f=echo, param1=CDS}),
(s4, {s=4, f=return}),
(s5, {s=5, f=eval, param1=EDS})}.
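As an illustration, the quadruple (S, S0, R, L) can be derived from the state unit set with a few lines of Python. The dictionary layout of the state units, including the `params` list of parameter sources, and the encoding of atomic propositions as strings are assumptions of this sketch.

```python
def to_kripke(units):
    """Build (S, S0, R, L) from state units keyed by labels such as 's1'."""
    labels = list(units)
    S = set(labels)
    S0 = labels[0]                                   # label of the first unit
    R = {(s, t) for s, u in units.items() for t in u["succs"]}
    L = {}
    for s, u in units.items():
        props = {f"s={s[1:]}", f"f={u['f']}"}
        props |= {f"param{i}={src}" for i, src in enumerate(u["params"], 1)}
        L[s] = frozenset(props)
    return S, S0, R, L

# State units of the embodiment, with the source set of each parameter.
units = {
    "s1": {"f": "Expr_Variable", "succs": ["s2"], "params": ["EDS"]},
    "s2": {"f": "Expr_ArrayDimFetch", "succs": ["s3", "s5"], "params": ["EDS", "CDS"]},
    "s3": {"f": "echo", "succs": ["s4"], "params": ["CDS"]},
    "s4": {"f": "return", "succs": [], "params": []},
    "s5": {"f": "eval", "succs": ["s4"], "params": ["EDS"]},
}
S, S0, R, L = to_kripke(units)
```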
Step 9: The Kripke state migration system is then compiled into the module part of an SMV (Symbolic Model Verifier) program. The SMV module comprises a VAR part and an ASSIGN part.
The VAR part declares a variable s, of enumeration type, that holds the state S of the Kripke structure and is used for state transitions; a variable func (written f in the code listing), of enumeration type, that indicates which function is called and whose values correspond to the function set FS from Step 5; and variables param1, param2, ..., paramk that represent the function parameters, where k is the maximum parameter number mp calculated in Step 5. Each of these parameter variables is of enumeration type and takes its value from the set {EDS, CDS, NULL}: EDS means that the parameter originates from the external world, CDS means that it originates from a constant, and NULL means that the state has no such parameter.
The ASSIGN part initializes the state with init(s) := s1, uses a case statement to assign the function name to func in each state, uses case statements to assign the sources of param1 through paramk in each state, and uses a case statement on next(s) to give the index of the next state for each state.
The module part of the SMV code is:
MODULE main
VAR
  s : {s1, s2, s3, s4, s5};
  param1 : {EDS, CDS, NULL};
  param2 : {EDS, CDS, NULL};
  f : {eval, Expr_Variable, Expr_ArrayDimFetch, echo, return};
ASSIGN
  init(s) := s1;
  next(s) :=
    case
      s = s1 : s2;
      s = s2 : {s3, s5};
      s = s3 : s4;
      s = s5 : s4;
      s = s4 : s4;
    esac;
  f :=
    case
      s = s1 : Expr_Variable;
      s = s2 : Expr_ArrayDimFetch;
      s = s3 : echo;
      s = s4 : return;
      s = s5 : eval;
    esac;
  param1 :=
    case
      s = s1 : EDS;
      s = s2 : EDS;
      s = s3 : CDS;
      s = s5 : EDS;
      TRUE : NULL;
    esac;
  param2 :=
    case
      s = s2 : CDS;
      TRUE : NULL;
    esac;
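A generator for the module part can be sketched in Python as shown here. The function name `emit_smv` and the dictionary layout of the state units are hypothetical. Letting a state without successors loop on itself mirrors the `s=s4 : s4` line of the listing above.

```python
def emit_smv(units, mp):
    """Render state units as the VAR and ASSIGN parts of an SMV module."""
    labels = list(units)
    funcs = sorted({u["f"] for u in units.values()})
    out = ["MODULE main", "VAR", f"  s : {{{', '.join(labels)}}};"]
    for k in range(1, mp + 1):
        out.append(f"  param{k} : {{EDS, CDS, NULL}};")
    out.append(f"  f : {{{', '.join(funcs)}}};")
    out += ["ASSIGN", f"  init(s) := {labels[0]};", "  next(s) := case"]
    for s, u in units.items():
        succs = u["succs"] or [s]          # a terminal state loops on itself
        rhs = succs[0] if len(succs) == 1 else "{" + ", ".join(succs) + "}"
        out.append(f"    s = {s} : {rhs};")
    out += ["  esac;", "  f := case"]
    for s, u in units.items():
        out.append(f"    s = {s} : {u['f']};")
    out.append("  esac;")
    for k in range(1, mp + 1):
        out.append(f"  param{k} := case")
        for s, u in units.items():
            if len(u["params"]) >= k:
                out.append(f"    s = {s} : {u['params'][k - 1]};")
        out += ["    TRUE : NULL;", "  esac;"]
    return "\n".join(out) + "\n"

units = {
    "s1": {"f": "Expr_Variable", "succs": ["s2"], "params": ["EDS"]},
    "s2": {"f": "Expr_ArrayDimFetch", "succs": ["s3", "s5"], "params": ["EDS", "CDS"]},
    "s3": {"f": "echo", "succs": ["s4"], "params": ["CDS"]},
    "s4": {"f": "return", "succs": [], "params": []},
    "s5": {"f": "eval", "succs": ["s4"], "params": ["EDS"]},
}
smv = emit_smv(units, mp=2)
```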
Step 10: A detection rule list is built in LTL (linear temporal logic) according to expert experience, and each rule is logically negated and compiled into the specification part of the SMV code.
Specifically, a rule is established according to expert experience: if a parameter of the eval function originates from the external world, the program is judged to be malicious. The rule is negated and written in CTL (computation tree logic) as the SPEC (specification) part of the SMV code.
The SPEC part of the code is as follows:
SPEC
!EF (f=eval & param1 = EDS)
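A rule of this kind can be turned into a SPEC clause mechanically. The Python helper below is a hypothetical sketch that covers only rules of the form "function X receives a parameter from source Y"; its name and signature are not taken from this description.

```python
def rule_to_spec(func, param_index=1, source="EDS"):
    """Negate 'some reachable state calls func with the parameter from source'."""
    bad_state = f"f = {func} & param{param_index} = {source}"
    return f"SPEC\n  !EF ({bad_state})"

spec = rule_to_spec("eval")
```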
Step 11: After the complete SMV code is generated, it is input into the open-source model checking tool NuSMV for model checking. NuSMV searches the state space and judges whether the model satisfies the properties defined by the rules. If it does, the file is judged to be a normal file; if not, NuSMV gives a counterexample that violates the property, which identifies the rule that is not satisfied and hence the type of Trojan horse.
For the example above, NuSMV produces the following output:
-- specification !(EF (f = eval & param1 = EDS)) is false
-- as demonstrated by the following execution sequence
Trace Description: CTL Counterexample
Trace Type: Counterexample
-> State: 1.1 <-
s = s1
param1 = EDS
param2 = NULL
f = Expr_Variable
-> State: 1.2 <-
s = s2
param2 = CDS
f = Expr_ArrayDimFetch
-> State: 1.3 <-
s = s5
param2 = NULL
f = eval
The above output shows that the PHP file violates the rule, so it is judged to be a Trojan of the type described by that rule.
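Reading the verdict out of the model checker's text output can be sketched in Python as follows. The sketch assumes only the `-- specification ... is true/false` line format visible in the output above; the function names and the "trojan"/"normal" result strings are hypothetical.

```python
import re

# Matches lines such as "-- specification !(EF (...)) is false".
SPEC_LINE = re.compile(r"^-- specification (.+?)\s+is (true|false)\s*$")

def violated_specs(nusmv_output):
    """Return the specifications that the model checker reported as false."""
    violated = []
    for line in nusmv_output.splitlines():
        match = SPEC_LINE.match(line.strip())
        if match and match.group(2) == "false":
            violated.append(match.group(1))
    return violated

def classify(nusmv_output):
    """A file is flagged when at least one negated rule is reported as false."""
    return "trojan" if violated_specs(nusmv_output) else "normal"

sample = (
    "-- specification !(EF (f = eval & param1 = EDS)) is false\n"
    "-- as demonstrated by the following execution sequence\n"
    "Trace Type: Counterexample\n"
)
verdict = classify(sample)
```

The text passed to `classify` would typically be captured from a run such as `NuSMV model.smv`, where `model.smv` is a placeholder file name.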
FIG. 4 is a schematic structural diagram of a system for Trojan horse detection according to the present invention. As shown in FIG. 4, the system for Trojan horse detection includes:
a code analysis module 201, configured to analyze a code to be determined into an abstract syntax tree;
the static single-assignment intermediate representation acquisition module 202 is configured to traverse the abstract syntax tree, determine whether each node is included in the code that changes the control flow, rewrite the code that changes the control flow into a goto instruction if the node is included in the code that changes the control flow, and convert the goto instruction into a static single-assignment form to obtain an intermediate representation of the static single assignment;
a control flow graph construction module 203, configured to construct a control flow graph according to the intermediate representation of the static single assignment;
a control flow graph updating module 204, configured to traverse a basic block set formed by the basic blocks in the control flow graph other than the first basic block, delete every basic block in the basic block set that has no predecessor, and update the predecessor-successor list of each basic block to obtain an updated control flow graph; the predecessor-successor list is used for recording the predecessor basic blocks and successor basic blocks corresponding to the basic block;
a first traversal module 205, configured to traverse the updated control flow graph by using a depth-first policy, and form, into a constant set, an external world data set, an external world function set, a cleaning function set, or a function set, instructions corresponding to each basic block, respectively; the constant set comprises an instruction with a constant in the instruction, the function set comprises an instruction with a function in the instruction, the external world function set comprises an instruction with an external input function in the instruction, the cleaning function set comprises an instruction with a function which is not the external input function and the return value of the function is a constant, and the external world data set comprises an instruction with a variable in the instruction; the external input function is a function of which the parameters comprise external input quantity;
a second traversal module 206, configured to traverse the updated control flow graph by using an iterative approximation algorithm, add a function name in the current instruction to an external world function set if the current instruction is an instruction with an external input function, add a return value of a function in the current instruction to the external world data set, add the function name in the current instruction to a constant set if the function in the current instruction is not an external input function and the return value of the function is a constant instruction, add the return value of the function in the current instruction to the external world data set if at least one parameter in the current instruction function is a variable, add the return value of the function in the current instruction to the constant set if none of the parameters in the current instruction function is a variable, add the operation result of the current instruction to the external world data set if the current instruction is an operation instruction and at least one operation in the operation instruction includes a variable, if the current instruction is an operation instruction and the operation in the operation instruction does not include variables, adding the operation result of the current instruction into the constant set, and exiting traversal until the number of elements in the constant set, the external world data set, the external world function set and the cleaning function set is unchanged;
a state unit set obtaining module 207, configured to split an instruction in each basic block in the cleaning function set according to a state unit including a function call instruction, and combine the split state units into a state unit set according to a predecessor-successor relationship; each state unit comprises a label, a function call instruction and a precursor successor list;
a Kripke structure state migration system obtaining module 208, configured to convert the state unit set into a Kripke structure state migration system; the Kripke structure state migration system is defined as a quadruple M (S, S0, R, L), wherein S represents the label set of the state unit set, S0 represents the label of the first state unit, R represents the set of label pairs (s1, s2) such that the successor list of the state unit corresponding to the label s1 comprises the state unit corresponding to the label s2, L represents a function interpretation set, and each function interpretation in L is used for recording the function call instruction information of a state unit;
an SMV code generating module 209, configured to generate an SMV code according to the kripke structure state migration system;
the Trojan judgment module 210 is configured to input the SMV code into a model detector for performing model detection, and obtain a type of the code to be judged; the types of the codes to be judged comprise normal files and Trojan horse types.
The control flow graph constructing module 203 specifically includes:
an initialization unit, configured to initialize a basic block set and a current basic block;
and the instruction judging unit is used for sequentially traversing all the instructions in the intermediate representation of the static single assignment; if the current instruction is a jump instruction, taking the instruction position to which the current instruction jumps as the successor of the current basic block and adding the current basic block into the basic block set; if the current instruction is referenced back, updating the successor basic block of the last basic block in the basic block set to the current basic block, adding the current instruction into the updated current basic block, and adding the updated current basic block into the basic block set; and if the current instruction is not a jump instruction and is not referenced back, adding the current instruction into the current basic block.
The static single assignment intermediate representation obtaining module 202 specifically includes:
and the static single assignment intermediate representation acquisition unit is used for traversing the abstract syntax tree, judging whether each node is contained in the code for changing the control flow, rewriting the code for changing the control flow into a goto instruction if the node is contained in the code for changing the control flow, and converting the goto instruction into a static single assignment form by adopting an SSA construction algorithm to obtain the static single assignment intermediate representation.
The Trojan horse determining module 210 specifically includes:
and the Trojan judgment unit is used for inputting the SMV code into the NuSMV model detector for model detection to obtain the type of the code to be judged.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A method for Trojan horse detection, comprising:
analyzing the code to be judged into an abstract syntax tree;
traversing the abstract syntax tree, judging whether each node is contained in a code for changing a control flow, if the node is contained in the code for changing the control flow, rewriting the code for changing the control flow into a goto instruction, and converting the goto instruction into a static single assignment form to obtain an intermediate representation of the static single assignment;
constructing a control flow graph according to the intermediate representation of the static single assignment;
traversing a basic block set formed by the basic blocks in the control flow graph other than the first basic block, deleting every basic block in the basic block set that has no predecessor, and updating the predecessor-successor list of each basic block to obtain an updated control flow graph; the predecessor-successor list is used for recording the predecessor basic blocks and successor basic blocks corresponding to the basic block;
traversing the updated control flow graph by adopting a depth-first strategy, and respectively forming instructions corresponding to each basic block into a constant set, an external world data set, an external world function set, a cleaning function set or a function set; the constant set comprises an instruction with a constant in the instruction, the function set comprises an instruction with a function in the instruction, the external world function set comprises an instruction with an external input function in the instruction, the cleaning function set comprises an instruction with a function which is not the external input function and has a constant return value, and the external world data set comprises an instruction with a variable in the instruction; the external input function is a function of which the parameters comprise external input quantity;
traversing the updated control flow graph by adopting an iterative approximation algorithm, if the current instruction is an instruction with an external input function, adding a function name in the current instruction into the external world function set, adding a return value of a function in the current instruction into the external world data set, if the function in the current instruction is not the external input function and the return value of the function is a constant instruction, adding the function name in the current instruction into the constant set, if at least one parameter in the current instruction function is a variable, adding the return value of the function in the current instruction into the external world data set, if the parameters in the current instruction function are not variables, adding the return value of the function in the current instruction into the constant set, and if the current instruction is an operation instruction and at least one operation in the operation instruction comprises a variable, adding an operation result of the current instruction into the external world data set, if the current instruction is an operation instruction and the operation in the operation instruction does not include variables, adding the operation result of the current instruction into the constant set, and exiting traversal until the number of elements in the constant set, the external world data set, the external world function set and the cleaning function set is unchanged;
splitting an instruction in each basic block in the cleaning function set according to a state unit comprising a function calling instruction, and combining a plurality of split state units into a state unit set according to a precursor successor relationship; each state unit comprises a label, a function call instruction and a precursor successor list;
converting the state unit set into a Kripke structure state migration system; the Kripke structure state migration system is defined as a quadruple M (S, S0, R, L), wherein S represents the label set of the state unit set, S0 represents the label of the first state unit, R represents a set of label pairs in which the successor list of the state unit corresponding to the first label of each pair comprises the state unit corresponding to the second label, L represents a function interpretation set, and the function interpretations in L are used for recording the function call instruction information of the state units;
generating an SMV code according to the kripke structure state migration system;
inputting the SMV code into a model detector for model detection to obtain the type of the code to be judged; the types of the codes to be judged comprise normal files and Trojan horse types.
2. The method for Trojan horse detection according to claim 1, wherein the constructing a control flow graph according to the intermediate representation of the static single assignment specifically comprises:
initializing a basic block set and a current basic block;
and traversing all the instructions in the intermediate representation of the static single assignment in sequence; if the current instruction is a jump instruction, taking the instruction position to which the current instruction jumps as the successor of the current basic block and adding the current basic block into the basic block set; if the current instruction is referenced back, updating the successor basic block of the last basic block in the basic block set to the current basic block, adding the current instruction into the updated current basic block, and adding the updated current basic block into the basic block set; and if the current instruction is not a jump instruction and is not referenced back, adding the current instruction into the current basic block.
3. The method as claimed in claim 1, wherein the traversing the abstract syntax tree determines whether each node is included in the code for changing the control flow, and if the node is included in the code for changing the control flow, the method rewrites the code for changing the control flow into a goto instruction, and converts the goto instruction into a static single assignment form, so as to obtain an intermediate representation of the static single assignment, specifically comprises:
traversing the abstract syntax tree, judging whether each node is contained in the code for changing the control flow, if the node is contained in the code for changing the control flow, rewriting the code for changing the control flow into a goto instruction, and converting the goto instruction into a static single assignment form by adopting an SSA construction algorithm to obtain an intermediate representation of the static single assignment.
4. The method according to claim 1, wherein the inputting the SMV code into a model detector for model detection to obtain the type of the code to be determined specifically comprises:
and inputting the SMV code into a NuSMV model detector for model detection to obtain the type of the code to be judged.
5. A system for Trojan horse detection, comprising:
the code analysis module is used for analyzing the codes to be judged into an abstract syntax tree;
the static single assignment intermediate representation acquisition module is used for traversing the abstract syntax tree, judging whether each node is contained in a code for changing a control flow, rewriting the code for changing the control flow into a goto instruction if the node is contained in the code for changing the control flow, and converting the goto instruction into a static single assignment form to obtain a static single assignment intermediate representation;
the control flow graph building module is used for building a control flow graph according to the intermediate representation of the static single assignment;
a control flow graph updating module, configured to traverse a basic block set formed by the basic blocks in the control flow graph other than the first basic block, delete every basic block in the basic block set that has no predecessor, and update the predecessor-successor list of each basic block to obtain an updated control flow graph; the predecessor-successor list is used for recording the predecessor basic blocks and successor basic blocks corresponding to the basic block;
the first traversal module is used for traversing the updated control flow graph by adopting a depth-first strategy and enabling the instructions corresponding to the basic blocks to respectively form a constant set, an external world data set, an external world function set, a cleaning function set or a function set; the constant set comprises an instruction with a constant in the instruction, the function set comprises an instruction with a function in the instruction, the external world function set comprises an instruction with an external input function in the instruction, the cleaning function set comprises an instruction with a function which is not the external input function and has a constant return value, and the external world data set comprises an instruction with a variable in the instruction; the external input function is a function of which the parameters comprise external input quantity;
a second traversal module, configured to traverse the updated control flow graph by using an iterative approximation algorithm, add a function name in a current instruction to the external world function set if the current instruction is an instruction with an external input function, add a return value of a function in the current instruction to the external world data set, add the function name in the current instruction to the constant set if the function in the current instruction is not an external input function and the return value of the function is a constant instruction, add the return value of the function in the current instruction to the external world data set if at least one parameter in the current instruction function is a variable, add the return value of the function in the current instruction to the constant set if none of the parameters in the current instruction function is a variable, add the return value of the function in the current instruction to the constant set if the current instruction is an operation instruction and at least one operation in the operation instruction includes a variable, adding the operation result of the current instruction into the external world data set, if the current instruction is the operation instruction and the operation instruction does not include variables, adding the operation result of the current instruction into the constant set, and exiting traversal until the number of elements in the constant set, the external world data set, the external world function set and the cleaning function set is unchanged;
a state unit set acquisition module, configured to split an instruction in each basic block in the cleaning function set according to a state unit including a function call instruction, and combine the split state units into a state unit set according to a predecessor-successor relationship; each state unit comprises a label, a function call instruction and a precursor successor list;
a Kripke structure state migration system acquisition module, configured to convert the state unit set into a Kripke structure state migration system; the Kripke structure state migration system is defined as a quadruple M (S, S0, R, L), wherein S represents the label set of the state unit set, S0 represents the label of the first state unit, R represents a set of label pairs in which the successor list of the state unit corresponding to the first label of each pair comprises the state unit corresponding to the second label, L represents a function interpretation set, and the function interpretations in L are used for recording the function call instruction information of the state units;
an SMV code generation module, configured to generate an SMV code according to the kripke structure state migration system;
the Trojan judgment module is used for inputting the SMV code into a model detector for model detection to obtain the type of the code to be judged; the types of the codes to be judged comprise normal files and Trojan horse types.
6. The system for Trojan horse detection according to claim 5, wherein the control flow graph construction module specifically comprises:
an initialization unit, configured to initialize a basic block set and a current basic block;
and the instruction judging unit is used for sequentially traversing all the instructions in the intermediate representation of the static single assignment; if the current instruction is a jump instruction, taking the instruction position to which the current instruction jumps as the successor of the current basic block and adding the current basic block into the basic block set; if the current instruction is referenced back, updating the successor basic block of the last basic block in the basic block set to the current basic block, adding the current instruction into the updated current basic block, and adding the updated current basic block into the basic block set; and if the current instruction is not a jump instruction and is not referenced back, adding the current instruction into the current basic block.
7. The system according to claim 5, wherein the module for obtaining the intermediate representation of the static single assignment specifically comprises:
and the static single assignment intermediate representation acquisition unit is used for traversing the abstract syntax tree, judging whether each node is contained in the code for changing the control flow, rewriting the code for changing the control flow into a goto instruction if the node is contained in the code for changing the control flow, and converting the goto instruction into a static single assignment form by adopting an SSA construction algorithm to obtain the static single assignment intermediate representation.
8. The system for Trojan horse detection according to claim 5, wherein the Trojan horse determination module specifically comprises:
and the Trojan judgment unit is used for inputting the SMV code into a NuSMV model detector for model detection to obtain the type of the code to be judged.
CN202110700781.5A 2021-06-24 2021-06-24 Method and system for Trojan horse detection Active CN113515745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110700781.5A CN113515745B (en) 2021-06-24 2021-06-24 Method and system for Trojan horse detection


Publications (2)

Publication Number Publication Date
CN113515745A CN113515745A (en) 2021-10-19
CN113515745B true CN113515745B (en) 2021-12-21

Family

ID=78065879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110700781.5A Active CN113515745B (en) 2021-06-24 2021-06-24 Method and system for Trojan horse detection

Country Status (1)

Country Link
CN (1) CN113515745B (en)


Also Published As

Publication number Publication date
CN113515745A (en) 2021-10-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant