CN107169358A - Code homology detection method and device based on code fingerprint
- Publication number
- CN107169358A (application CN201710375425.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/425—Lexical analysis
Abstract
The present invention relates to a code homology detection method and device based on code fingerprints. The method comprises: performing dependency analysis on the input code to obtain an original program dependency graph (PDG); performing structure simplification, nesting removal and coloring on the original PDG to obtain a simplified program dependency graph (sPDG); parsing the key syntax information of the code based on the abstract syntax tree; extracting the system call sequences of the code execution paths, obtaining the full-path parameter vector set of the target code, and constructing the code fingerprint; calculating the homology coefficients between the code fingerprint components; and calculating the homology index of two codes S and T from the homology coefficients, and judging the homology relationship between the two codes by the homology index. The invention takes both code semantics and behavior into account on the basis of similarity, uses lightweight features and a simplification mechanism to improve detection efficiency, measures the homology relationship between codes from multiple angles, and improves detection efficiency while maintaining accuracy.
Description
Technical Field
The invention belongs to the technical field of computer software application, and particularly relates to a code homology detection method and a code homology detection device based on code fingerprints.
Background
With the increase of the demands of various internet applications and the increase of the code iteration speed, higher demands are made on the development efficiency and speed of programmers. On a software development pipeline, template-based secondary development and reuse of existing components are common phenomena; meanwhile, in order to solve new requirements, developers usually refer to codes in an open source code warehouse in the internet. This has led to the growing number of homologous codes through different channels, and the spread of hidden defects and errors in the codes. Meanwhile, with the continuous development of computer security technology and the continuous improvement of virus detection technology, the probability that malicious codes such as macro viruses, malicious VBS scripts, malicious JavaScript scripts and the like on the internet are detected is higher and higher, and an attacker needs to bypass detection by means of modifying code content, converting code forms and the like on the basis of original codes, so that the survival capability of the malicious codes is improved. Endogenous homology exists among various versions of the same malicious code, and the homology is an important basis for detecting the same malicious code.
As an important aspect of computer program research, homology detection techniques for software source code currently fall into three main types: text-based, structural-analysis-based, and semantic-based software homology detection.
1) Text-based software homology detection takes the text of the source code as the detected object, for example detecting code similarity from the similarity of the text and of its attributes. One benefit of treating program source code as text is that the analysis is not tied to the programming language of the analyzed object. However, because it ignores the syntactic characteristics of the code, this kind of method generally resists code obfuscation poorly. Even simple obfuscation, such as renaming variables and functions, inserting junk code, or reordering statements without affecting functionality, can greatly degrade the detection result. The technique can therefore only perform simple homology detection at the text level and is quite limited.
2) Software homology detection based on structural analysis analyzes the code structure and expresses it in a comparable intermediate form; token-based, tree-based and graph-based detection methods are common. Compared with text-based detection, this technique detects better and has some resistance to common obfuscation. However, its computational complexity depends on the intermediate representation, and a complex structure brings a large performance overhead during detection.
3) Semantic-based software homology detection extracts features such as control flow, data flow and standard API flow on the basis of static semantic analysis, and depicts program behavior from different angles; alternatively, it compiles and executes the source code and records the program instruction stream and the system call sequence to describe program behavior. The technique essentially describes the semantic and behavioral characteristics of a program and can cope effectively with the challenges that various kinds of code obfuscation pose to homology detection. However, semantic-based methods cannot effectively cover the characteristics of the code itself, and accurate semantic analysis is difficult to carry out.
Disclosure of Invention
To address the above deficiencies in the prior art, the invention provides a code homology detection method and a code homology detection device based on code fingerprints. They address the poor resistance to obfuscation and the low detection efficiency in software source code detection, extract code features accurately, cope effectively with common code obfuscation methods, improve both the efficiency and the accuracy of homology detection, and effectively prevent the spread of malicious code.
According to the design scheme provided by the invention, the code homology detection method based on the code fingerprint comprises the following steps:
step 1, analyzing the dependency relationship of two input codes S and T to obtain an original program dependency graph PDG; carrying out structure simplification, nesting removal and coloring treatment on the original program dependency graph PDG to obtain a simplified program dependency graph sPDG;
step 2, analyzing key syntax information of the code based on the abstract syntax tree;
step 3, extracting a system calling sequence of the code execution path, acquiring a full path parameter vector set of the target code, and constructing a code fingerprint;
step 4, calculating the homology coefficients between the code fingerprint components, wherein the homology coefficients comprise the sPDG isomorphism coefficient $P_{S,T}$, the syntax information coincidence coefficient $C_{S,T}$ and the system call sequence similarity coefficient $A_{S,T}$;
step 5, calculating the homology index of the two codes S and T according to the homology coefficients, and judging the homology relationship between the two codes according to the homology index.
As described above, in step 1, the original program dependency graph PDG is subjected to structure simplification, nesting removal, and coloring processing, and the simplified program dependency graph sPDG is obtained, which includes the following contents:
step 11, simplifying the structure of the original program dependence graph PDG according to a simplification principle;
step 12, removing the nested input nodes and output nodes from the nodes containing the nested relation, and removing the edges of the corresponding dependency relation to the outer layer function call nodes;
and step 13, classifying and coloring the nodes according to the statement types, and acquiring the simplified program dependence graph sPDG.
The simplification principle in step 11 includes: removing vertices that have only one outgoing edge and no incoming edge; removing vertices that have only one incoming edge and no outgoing edge; removing vertices that have exactly one incoming edge and one outgoing edge, and introducing an edge from the input vertex to the output vertex; and removing vertices that have no incoming or outgoing edges.
In the above, the parsing of the key syntax information of the code based on the abstract syntax tree in step 2 includes the following contents:
step 21, recording global variables, local variables and attributes thereof in the designated code domain to form a quadruple, wherein the quadruple comprises a scope of the variables, link attributes, storage types and names;
step 22, analyzing and recording the macro definition and the corresponding content thereof to form a triple, wherein the triple comprises a macro definition identifier, a content type and a name;
and step 23, analyzing a key data structure in the code based on the abstract syntax tree AST, and recording a custom structure body in the code in a sequence form.
As described above, step 3 includes the following steps:
step 31, starting from the entry function, generating the call graph and the post-dominator tree of a function f, extracting the system call sequence k of a single execution path, and recording the set K of system call sequences over all possible execution paths;
step 32, for each function f in the system call sequence k, locating the function domain d in which f resides, parsing all parameters of f in the abstract syntax tree, determining the data source s of each parameter of f through static taint analysis, and combining it with the data type t of the parameter to form the parameter vector $e_f = (d, f, t, s)$ of the function f, thereby obtaining the parameter vector set $E_k$ of the system call sequence k;
step 33, executing step 32 on each sequence k in the system call sequence set K to obtain the full-path parameter vector set $E_K$ of the target code;
step 34, constructing the code fingerprint from the full-path parameter vector set $E_K$ of the target code.
As described above, step 4 includes the following steps:
step 41, for the simplified program dependency graphs sPDG of the target codes, seeking the maximum isomorphic subgraph between the sPDGs through a progressive graph isomorphism solving algorithm, and calculating the isomorphism coefficient $P_{S,T}$ between the sPDGs;
step 42, calculating the coincidence coefficient $C_{S,T}$ through the Jaccard algorithm according to the key syntax information obtained in step 2;
step 43, according to the full-path parameter vector sets $E_K$ of the target codes obtained in step 3, computing the Jaccard similarity between the subsets of $E_K$, and taking the highest value as the system call sequence similarity coefficient $A_{S,T}$.
Preferably, in step 41, the original program dependency graph PDG is represented as a directed graph $G = (V, E)$, where the node set V represents a set of predicate expressions or statements and E represents the data dependencies and control dependencies between them. Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ denote the two simplified program dependency graphs sPDG; the isomorphism coefficient P between the sPDGs is calculated by an evaluation function.
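Preferably, in step 42, the coincidence coefficient of a single kind of syntax information α is the Jaccard coefficient $C_\alpha = |S_\alpha \cap T_\alpha| / |S_\alpha \cup T_\alpha|$, where $S_\alpha$ and $T_\alpha$ are the sequences of syntax information α corresponding to the two codes S and T respectively; the coincidence coefficient $C_{S,T}$ of the key syntax information is then calculated from the $C_\alpha$, where $w_\alpha$ is the weight of syntax information α.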
As described above, in step 5, the homology index Homology(S, T) of the two codes S and T is calculated by a formula over the three homology coefficients, where $w_P$ is the weight of the sPDG isomorphism coefficient, $w_C$ is the weight of the syntax information coincidence coefficient, and $w_A$ is the weight of the system call sequence similarity coefficient.
A code homology detection apparatus based on code fingerprints, comprising: the system comprises a program simplifying module, a grammar analyzing module, a fingerprint constructing module, a homology coefficient acquiring module and a homology judging module;
the program simplification module is used for analyzing the dependency relationship between the two input codes S and T and acquiring an original program dependency graph PDG; carrying out structure simplification, nesting removal and coloring treatment on the original program dependency graph PDG to obtain a simplified program dependency graph sPDG;
the grammar parsing module is used for parsing key grammar information of the codes based on an abstract grammar tree and comprises a variable parsing unit, a macro definition parsing unit and a key data structure parsing unit, wherein the variable parsing unit is used for recording global variables, local variables and corresponding action domains, link attributes and storage types of the global variables and the local variables in a designated code domain, the macro definition parsing unit is used for recording macro definitions and corresponding content types of the macro definitions, and the key data structure parsing unit is used for parsing all classes and structural bodies defined in functions in a target code domain;
the fingerprint construction module is used for extracting a system calling sequence of a code execution path, acquiring a full-path parameter vector set of a target code and constructing a code fingerprint;
the homology coefficient acquisition module is used for calculating the homology coefficients between the code fingerprint components according to the information acquired by the program simplification module, the grammar parsing module and the fingerprint construction module, wherein the homology coefficients comprise the sPDG isomorphism coefficient $P_{S,T}$, the syntax information coincidence coefficient $C_{S,T}$ and the system call sequence similarity coefficient $A_{S,T}$;
the homology judging module is used for calculating the homology index of the two codes S and T according to the homology coefficients obtained by the homology coefficient acquisition module, and judging the homology relationship between the two codes according to the homology index.
The invention has the beneficial effects that:
For judging source code homology, the invention takes both code semantics and behavior into account on the basis of similarity, improves detection efficiency by using lightweight features and a simplification mechanism, and measures the homology relationship between codes from multiple angles. It addresses the following problems in the prior art: a) existing methods cannot cope effectively with code obfuscation that combines several means, such as format changes, renaming, junk code insertion and statement reordering; b) detection methods based on complex structures and algorithms achieve higher accuracy, but the solving process is computationally expensive and detection is slow, so accuracy and efficiency cannot both be achieved. The invention improves detection efficiency while maintaining accuracy.
The invention abstracts the logic and the characteristics of the code through the code fingerprint, integrates the grammatical characteristics and the behavior characteristics of the code while showing the relation between data flow and control flow through a program dependence graph, solves the problem that the existing code homology detection focuses on analyzing the similarity of code text and characteristics and reflects the insufficient capability of the internal logic and deep association between the codes, greatly improves the homology detection efficiency while keeping high accuracy, effectively prevents the spread of malicious codes, provides technical support for the homology detection and judgment of computer program source codes, and has important guiding significance for computer network security technology and virus detection technology.
Description of the drawings:
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic flow chart of the homology analysis method in the example;
FIG. 3 is a schematic flow chart of obtaining a simplified program dependence graph sPDG in the embodiment;
FIG. 4 is a schematic diagram illustrating a process of parsing key syntax information of a code based on an abstract syntax tree in an embodiment;
FIG. 5 is a schematic flowchart of constructing a code fingerprint according to an embodiment;
FIG. 6 is a schematic diagram of a process for calculating the homology coefficients between code fingerprint components according to an embodiment;
FIG. 7 is a schematic view of the apparatus of the present invention;
FIG. 8 is an example input code schematic;
FIG. 9 is a simplified part of the structure of the dependency graph of the original program in the embodiment;
FIG. 10 is a schematic diagram of a nest removal process in an embodiment;
FIG. 11 is a diagram illustrating parsing based on an abstract syntax tree in an embodiment;
FIG. 12 is a diagram illustrating the extraction of the target code system call parameter in the embodiment.
The specific implementation mode is as follows:
in order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.
At present, most of the research on methods for detecting code homology is carried out based on a single type. The coarse-grained feature detection can improve the detection efficiency but can reduce the detection accuracy, and the fine-grained features bring performance bottlenecks with large calculation amount while improving the detection accuracy. How to effectively deal with a complex code confusion means under the condition of efficient detection and accurately abstract code logic and generalize code characteristics is an important content which needs to be researched currently.
In an embodiment, a code homology detection method based on code fingerprints is provided, and is shown in fig. 1, and includes the following steps:
step 1, analyzing the dependency relationship of two input codes S and T to obtain an original program dependency graph PDG; carrying out structure simplification, nesting removal and coloring treatment on the original program dependency graph PDG to obtain a simplified program dependency graph sPDG;
step 2, analyzing key syntax information of the code based on the abstract syntax tree;
step 3, extracting a system calling sequence of the code execution path, acquiring a full path parameter vector set of the target code, and constructing a code fingerprint;
step 4, calculating the homology coefficients between the code fingerprint components, wherein the homology coefficients comprise the sPDG isomorphism coefficient $P_{S,T}$, the syntax information coincidence coefficient $C_{S,T}$ and the system call sequence similarity coefficient $A_{S,T}$;
step 5, calculating the homology index of the two codes S and T according to the homology coefficients, and judging the homology relationship between the two codes according to the homology index.
The embodiment can accurately extract the code features, effectively cope with the influence of common code confusion methods, and greatly improve the detection efficiency while improving the homology detection accuracy.
For the two input code files to be detected, program analysis is performed according to compiler principles to obtain the original program dependency graph PDG of each code as the basis of the code fingerprint. In another embodiment of the present invention, as shown in fig. 3, performing structure simplification, nesting removal and coloring on the original program dependency graph PDG to obtain the simplified program dependency graph sPDG includes the following:
step 11, simplifying the structure of the original program dependence graph PDG according to a simplification principle;
step 12, removing the nested input nodes and output nodes from the nodes containing the nested relation, and removing the edges of the corresponding dependency relation to the outer layer function call nodes;
and step 13, classifying and coloring the nodes according to the statement types, and acquiring the simplified program dependence graph sPDG.
In another embodiment of the present invention, simplifying the structure of the original program dependency graph PDG means applying the following simplification operations to the nodes of the graph according to the simplification principle: removing vertices that have only one outgoing edge and no incoming edge; removing vertices that have only one incoming edge and no outgoing edge; removing vertices that have exactly one incoming edge and one outgoing edge, and introducing an edge from the input vertex to the output vertex; and removing vertices that have no incoming or outgoing edges. The simplification operations are repeated until no node matching the simplification principle remains.
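The simplification rules above can be sketched in Python as a fixed-point loop. This is a minimal illustration, not the patented implementation: the graph representation (a set of vertex ids plus a set of `(src, dst)` edge pairs) and the function name `simplify_pdg` are assumptions made for the example, and the third rule is read as adding an edge from the predecessor to the successor of the removed vertex.

```python
def simplify_pdg(nodes, edges):
    """Apply the simplification rules repeatedly until no vertex qualifies.

    nodes: iterable of sortable, hashable vertex ids (hypothetical representation).
    edges: iterable of (src, dst) pairs.
    Returns the surviving (nodes, edges) as two sets.
    """
    nodes, edges = set(nodes), set(edges)
    changed = True
    while changed:
        changed = False
        for v in sorted(nodes):
            incoming = [e for e in edges if e[1] == v]
            outgoing = [e for e in edges if e[0] == v]
            n_in, n_out = len(incoming), len(outgoing)
            # One outgoing edge and no incoming edge, one incoming edge and
            # no outgoing edge, or no edges at all: the vertex is dropped.
            dangling = (n_in == 0 and n_out <= 1) or (n_in == 1 and n_out == 0)
            # Exactly one incoming and one outgoing edge: drop and bypass.
            pass_through = n_in == 1 and n_out == 1
            if not (dangling or pass_through):
                continue
            edges -= set(incoming) | set(outgoing)
            if pass_through:
                src, dst = incoming[0][0], outgoing[0][1]
                if src != v and dst != v:  # ignore a self-loop on v
                    edges.add((src, dst))
            nodes.remove(v)
            changed = True
            break  # degrees changed, so rescan from the start
    return nodes, edges


# Densely connected core a, b, c, d with a leaf e, a root x and a relay m.
core = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("a", "d")}
extra = {("d", "e"), ("x", "a"), ("b", "m"), ("m", "d")}
kept_nodes, kept_edges = simplify_pdg("abcdemx", core | extra)
print(sorted(kept_nodes))
```

In this example the leaf, the root and the relay vertex are removed, while the densely connected core is left untouched because none of its vertices matches a rule.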
The nodes are classified according to statement type, then the nodes of different types are colored according to different colors, and each type is identified by a coloring number for comparison. Examples of classifications used are as follows: function calls, control statements, declaration statements, arithmetic statements, switch statements, logical expressions, jump statements, and return statements, among others. The details are shown in Table 1.
Type | Node representation information | Color number |
Function call | Calls to functions and system APIs | 1 |
Control statement | if, switch, while, for | 2 |
Declaration statement | Variable declarations or formal parameters | 3 |
Operation statement | Variable operations, increment/decrement operations | 4 |
switch statement | case, default | 5 |
Jump statement | goto, break, continue | 6 |
Conditional statement | <, >, ==, != | 7 |
Return statement | return | 8 |
Others | Others | 0 |
TABLE 1
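The coloring in Table 1 amounts to a lookup from statement type to color number. The sketch below assumes each sPDG node has already been labeled with a statement type; the type keys such as `"function_call"` and the helper names are hypothetical, while the color numbers follow Table 1.

```python
from collections import Counter

# Color numbers from Table 1; the string keys are illustrative labels.
COLOR_BY_TYPE = {
    "function_call": 1,
    "control": 2,
    "declaration": 3,
    "operation": 4,
    "switch": 5,
    "jump": 6,
    "conditional": 7,
    "return": 8,
}


def color_nodes(node_types):
    """Map each node id to its color number; unknown types get color 0."""
    return {node: COLOR_BY_TYPE.get(kind, 0) for node, kind in node_types.items()}


def color_histogram(colors):
    """Count how many nodes carry each color, as a cheap comparison feature."""
    return Counter(colors.values())


colors = color_nodes({"n1": "function_call", "n2": "control", "n3": "label"})
print(colors, color_histogram(colors))
```

Comparing color numbers rather than statement text is what lets two nodes match even when identifiers have been renamed.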
In another embodiment of the present invention, the key Syntax information of the code is parsed based on the Abstract Syntax Tree, which is used as a component of the code fingerprint, as shown in fig. 4, and the following contents are included:
step 21, recording global variables, local variables and attributes thereof in the designated code domain to form a quadruple, wherein the quadruple comprises a scope of the variables, link attributes, storage types and names;
step 22, analyzing and recording the macro definition and the corresponding content thereof to form a triple, wherein the triple comprises a macro definition identifier, a content type and a name;
and step 23, analyzing a key data structure in the code based on the abstract syntax tree AST, and recording a custom structure body in the code in a sequence form.
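The records produced by steps 21 to 23 can be pictured as plain tuples. The sketch below only mirrors the shapes named in the text (a quadruple per variable, a triple per macro, a member sequence per custom structure); all class and field names are assumptions, and AST parsing itself is outside the example, so the records are written by hand.

```python
from typing import NamedTuple, Tuple


class VariableRecord(NamedTuple):
    """Quadruple of step 21: scope, link attribute, storage type, name."""
    scope: str
    linkage: str
    storage: str
    name: str


class MacroRecord(NamedTuple):
    """Triple of step 22: macro definition identifier, content type, name."""
    identifier: str
    content_type: str
    name: str


class StructRecord(NamedTuple):
    """Step 23: a custom structure recorded as an ordered member sequence."""
    name: str
    members: Tuple[str, ...]


def build_syntax_info(variables, macros, structs):
    """Group the records by kind so each kind can be compared separately."""
    return {
        "variables": frozenset(variables),
        "macros": frozenset(macros),
        "structs": frozenset(structs),
    }


info = build_syntax_info(
    [VariableRecord("global", "external", "static", "g_count")],
    [MacroRecord("#define", "integer", "BUF_SIZE")],
    [StructRecord("packet", ("int", "char*", "int"))],
)
print(info["variables"])
```

Keeping each kind of record in its own set makes it straightforward to compute a separate coincidence coefficient per kind later on.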
In another embodiment of the present invention, a system call sequence of a code execution path is extracted, a full path parameter vector set E of a target code is obtained, and a code fingerprint is constructed, as shown in fig. 5, which includes the following contents:
step 31, starting from the entry function, generating the call graph and the post-dominator tree of a function f, extracting the system call sequence k of a single execution path, and recording the set K of system call sequences over all possible execution paths;
step 32, for each function f in the system call sequence k, locating the function domain d in which f resides, parsing all parameters of f in the abstract syntax tree, determining the data source s of each parameter of f through static taint analysis, and combining it with the data type t of the parameter to form the parameter vector $e_f = (d, f, t, s)$ of the function f, thereby obtaining the parameter vector set $E_k$ of the system call sequence k;
step 33, executing step 32 on each sequence k in the system call sequence set K to obtain the full-path parameter vector set $E_K$ of the target code;
step 34, constructing the code fingerprint from the full-path parameter vector set $E_K$ of the target code.
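Steps 31 to 33 can be sketched as follows, under strong simplifying assumptions. Each function body is modeled as a list of alternative paths, each path being an ordered list of callee names; this hypothetical `bodies` mapping stands in for the call graph and post-dominator tree. The data source of each parameter is supplied as input rather than derived by static taint analysis. All names in the block are illustrative.

```python
from itertools import product
from typing import NamedTuple


class ParamVector(NamedTuple):
    """Parameter vector e_f = (d, f, t, s)."""
    domain: str    # d: function domain containing the call
    function: str  # f: the called system function
    dtype: str     # t: data type of the parameter
    source: str    # s: data source of the parameter value


def system_call_sequences(bodies, fn, system_calls, _stack=()):
    """Return the set K of system call sequences reachable from fn."""
    if fn in system_calls:
        return {(fn,)}
    if fn in _stack or fn not in bodies:  # recursion or unknown callee
        return {()}
    result = set()
    for path in bodies[fn]:
        per_call = [
            system_call_sequences(bodies, callee, system_calls, _stack + (fn,))
            for callee in path
        ]
        # Concatenate one sequence per callee, in call order.
        for combo in product(*per_call):
            result.add(tuple(call for seq in combo for call in seq))
    return result


def parameter_vectors(sequence, params):
    """Build E_k from a sequence k and known (d, t, s) facts per system call."""
    return frozenset(
        ParamVector(d, f, t, s) for f in sequence for (d, t, s) in params.get(f, ())
    )


def full_path_vector_set(sequences, params):
    """Build E_K as a mapping from each sequence k to its E_k."""
    return {k: parameter_vectors(k, params) for k in sequences}


bodies = {"main": [["load", "send"], ["load"]], "load": [["open", "read"]]}
params = {
    "open": [("load", "char*", "constant")],
    "read": [("load", "int", "return_value")],
    "send": [("main", "char*", "user_input")],
}
K = system_call_sequences(bodies, "main", {"open", "read", "send"})
E_K = full_path_vector_set(K, params)
print(sorted(K))
```

A real implementation would obtain the paths and parameter sources from program analysis; the sketch only shows how the sequences and vectors fit together once that information is available.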
In the homology determination of code fingerprints, in another embodiment of the present invention, the homology coefficients between code fingerprint components are calculated, as shown in fig. 6, and include the following:
step 41, for the simplified program dependency graphs sPDG of the target codes, seeking the maximum isomorphic subgraph between the sPDGs through a progressive graph isomorphism solving algorithm, and calculating the isomorphism coefficient $P_{S,T}$ between the sPDGs;
step 42, calculating the coincidence coefficient $C_{S,T}$ through the Jaccard algorithm according to the key syntax information obtained in step 2;
step 43, according to the full-path parameter vector sets $E_K$ of the target codes obtained in step 3, computing the Jaccard similarity coefficients between the subsets of $E_K$, and taking the highest value as the system call sequence similarity coefficient $A_{S,T}$.
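Step 43 can be illustrated with a short Python sketch. It assumes that the "subsets" of $E_K$ are the per-path parameter vector sets $E_k$ of each code, and that every subset of S is compared with every subset of T; the function names and the convention that two empty sets have similarity 1.0 are assumptions of the example.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch
    return len(a & b) / len(a | b)


def call_sequence_similarity(subsets_s, subsets_t):
    """Highest Jaccard similarity over all pairs of per-path vector sets."""
    return max(
        (jaccard(x, y) for x in subsets_s for y in subsets_t),
        default=0.0,
    )


# Parameter vectors written as plain (d, f, t, s) tuples.
s_paths = [
    {("load", "open", "char*", "constant"), ("load", "read", "int", "return_value")},
    {("main", "send", "char*", "user_input")},
]
t_paths = [
    {("load", "open", "char*", "constant"), ("load", "read", "int", "user_input")},
]
print(call_sequence_similarity(s_paths, t_paths))
```

Taking the maximum means that a single closely matching execution path is enough to yield a high coefficient, even if the other paths differ.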
In another embodiment, the original program dependency graph PDG of the target code is represented as a directed graph $G = (V, E)$, where the node set V represents a set of predicate expressions or statements and E represents the data dependencies and control dependencies between them. Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ denote the two simplified program dependency graphs sPDG. From the result of the progressive graph isomorphism solving algorithm, an evaluation function is used to calculate the isomorphism coefficient P between the sPDGs, where P = 0 indicates that $G_1$ is a complete subgraph of $G_2$.
For the obtained sequences of key syntax information, in a further embodiment of the invention the coincidence coefficient is calculated by the Jaccard algorithm. The coincidence coefficient of a single kind of syntax information α is the Jaccard coefficient $C_\alpha = |S_\alpha \cap T_\alpha| / |S_\alpha \cup T_\alpha|$, where $S_\alpha$ and $T_\alpha$ are the sequences of syntax information α corresponding to the two codes S and T respectively. The coincidence coefficient $C_{S,T}$ of the key syntax information is then calculated from the $C_\alpha$, where $w_\alpha$ is the weight of syntax information α.
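A hedged sketch of the coincidence coefficient follows. It assumes that $C_{S,T}$ is the weighted sum of the per-kind Jaccard coefficients with weights that sum to 1; the text names the weights $w_\alpha$ but the exact combining formula is not legible in this copy, so the weighted sum is an assumption. The kind names and weight values are purely illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; two empty inputs count as 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def coincidence_coefficient(info_s, info_t, weights):
    """Assumed form: sum over kinds of w_kind * Jaccard(S_kind, T_kind)."""
    return sum(
        w * jaccard(info_s.get(kind, ()), info_t.get(kind, ()))
        for kind, w in weights.items()
    )


# Illustrative weights per kind of syntax information (hypothetical values).
weights = {"variables": 0.5, "macros": 0.3, "structs": 0.2}
info_s = {"variables": {"a", "b"}, "macros": {"M1", "M2"}, "structs": {"s1"}}
info_t = {"variables": {"a", "b"}, "macros": {"M1"}, "structs": {"s2"}}
print(coincidence_coefficient(info_s, info_t, weights))
```

With weights that sum to 1, the result stays in the range 0 to 1, which keeps it comparable with the other coefficients.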
For the two input codes S and T, after the sPDG isomorphism coefficient $P_{S,T}$, the syntax information coincidence coefficient $C_{S,T}$ and the system call sequence similarity coefficient $A_{S,T}$ have been calculated, other embodiments of the present invention calculate the homology index Homology(S, T) of the two codes by a formula over these coefficients, where $w_P$ is the weight of the sPDG isomorphism coefficient, $w_C$ is the weight of the syntax information coincidence coefficient, and $w_A$ is the weight of the system call sequence similarity coefficient. The larger Homology(S, T) is, the more evident the homology relationship between the input samples.
Corresponding to the above method, an embodiment of the present invention further provides a code homology detection apparatus based on a code fingerprint, as shown in fig. 7, including: a program simplifying module 201, a grammar parsing module 202, a fingerprint constructing module 203, a homology coefficient obtaining module 204 and a homology judging module 205;
the program simplification module 201 is used for analyzing the dependency relationship between the two input codes S and T to obtain an original program dependency graph PDG; carrying out structure simplification, nesting removal and coloring treatment on the original program dependency graph PDG to obtain a simplified program dependency graph sPDG;
the syntax parsing module 202 is configured to parse code key syntax information based on an abstract syntax tree, and includes a variable parsing unit, a macro definition parsing unit, and a key data structure parsing unit, where the variable parsing unit is configured to record a global variable, a local variable, and a scope, a link attribute, and a storage type corresponding to the global variable and the local variable in a designated code domain, the macro definition parsing unit is configured to record a macro definition and a content type corresponding to the macro definition, and the key data structure parsing unit is configured to parse structural bodies defined in all classes and functions in a target code domain;
the fingerprint construction module 203 is configured to extract a system call sequence of a code execution path, obtain a full-path parameter vector set of a target code, and construct a code fingerprint;
the homology coefficient obtaining module 204 is used for calculating the homology coefficients between the parts of the code fingerprints according to the information obtained by the program simplifying module, the grammar parsing module and the fingerprint constructing module, wherein the homology coefficients comprise the isomorphism coefficient $P_{S,T}$ of the simplified program dependence graphs sPDG, the syntax information coincidence coefficient $C_{S,T}$ and the system call sequence similarity coefficient $A_{S,T}$;
And the homology judgment module 205 is used for calculating the homology indexes of the two codes S and T according to the homology coefficients obtained by the homology coefficient acquisition module and judging the homology relationship between the two codes according to the homology indexes.
The effectiveness of the invention is further illustrated by a specific example. FIG. 8 shows part of the contents of two input program code files. Program analysis is performed according to programming language compilation principles to obtain the original program dependency graph PDG, which is represented as a directed graph $G = (V, E)$: the node set $V$ represents a set of predicate expressions or statements, and $E$ represents the data dependencies and control dependencies between them. The original PDG serves as the basis of the code fingerprint. The structure of the original PDG is simplified, with part of the simplified result shown schematically in FIG. 9; nesting is then removed from the simplified graph, as shown in FIG. 10; and the graph obtained after structure simplification and nesting removal is colored to obtain the simplified program dependence graph sPDG. An abstract syntax tree of the source code is constructed using LLVM or Clang, and the key syntax information of the code is parsed from the abstract syntax tree to form the code fingerprint; the parsing of the two input files is shown schematically in FIG. 11.
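The structure simplification and coloring can be pictured with a small Python sketch. It applies the four vertex-removal rules listed in claim 3 in a single pass, judging every vertex by its degrees in the input graph. That ordering is an assumption, since the text does not say whether the rules are applied once or repeatedly, and chains of pass-through vertices are not contracted here. The statement kinds, color numbers, and sample graph are hypothetical.

```python
def simplify_pass(nodes, edges):
    """Apply the four removal rules once, using degrees of the input graph."""
    nodes, edges = set(nodes), set(edges)
    preds = {v: {u for (u, w) in edges if w == v} for v in nodes}
    succs = {v: {w for (u, w) in edges if u == v} for v in nodes}
    removed, bridges = set(), set()
    for v in nodes:
        shape = (len(preds[v]), len(succs[v]))
        if shape in {(0, 0), (0, 1), (1, 0)}:
            removed.add(v)           # isolated, or a single edge on one side only
        elif shape == (1, 1):
            removed.add(v)           # pass-through vertex: bypass it
            (u,), (w,) = preds[v], succs[v]
            bridges.add((u, w))
    kept = nodes - removed
    kept_edges = {(u, w) for (u, w) in edges | bridges
                  if u in kept and w in kept}
    return kept, kept_edges


# Hypothetical statement kinds and colors used to classify the nodes.
STATEMENT_COLOR = {"declaration": 0, "assignment": 1, "branch": 2, "call": 3}


def color_nodes(kinds, kept):
    """Map each surviving node to the color of its statement kind."""
    return {v: STATEMENT_COLOR[kinds[v]] for v in kept}


nodes = {"entry", "init", "cond", "body", "join", "out1", "out2", "lonely"}
edges = {("entry", "cond"), ("entry", "init"), ("init", "cond"),
         ("cond", "body"), ("cond", "join"), ("body", "join"),
         ("join", "out1"), ("join", "out2")}
kept, kept_edges = simplify_pass(nodes, edges)
colors = color_nodes(
    {"entry": "declaration", "cond": "branch", "join": "assignment"}, kept)
```

In this toy graph only the vertices with at least two edges on one side survive, and the bypassed vertices `init` and `body` leave their bridging edges behind.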
To form a complete code fingerprint, the parameter sequences of the system calls in the target code are extracted, as shown in FIG. 12. Starting from an entry function such as main, a call graph and a post-dominator tree of a function f are generated; the system call sequence k along a single execution path is extracted, and the set K of system call sequences over all possible execution paths is recorded. For each function f in a system call sequence k, the function domain d in which f resides is located, all parameters of f are parsed in the abstract syntax tree, and the data source s of each parameter is determined through static taint analysis, that is, whether the value of the parameter comes from outside or from inside the function. Combined with the data type t of the parameter, this forms the parameter vector $e_f = (d, f, t, s)$ of the function f. Applying this operation to a system call sequence k yields the parameter vector set $E_k$ of that sequence, and applying it to every sequence k in the set K yields the full-path parameter vector set $E_K$ of the target code. To mitigate the problem of $E_K$ being too large or containing too many invalid paths, only paths whose parameter vector set satisfies $|E_k| \ge 5$ are retained.
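The data shapes involved can be sketched in Python as follows. The sketch only shows how parameter vectors $(d, f, t, s)$ might be grouped per path and filtered by the $|E_k| \ge 5$ threshold; the call graph, AST parsing, and taint analysis that would produce the vectors are not shown. The class name `ParamVector`, the choice of one vector per parameter, and all sample values are assumptions made for illustration.

```python
from typing import NamedTuple


class ParamVector(NamedTuple):
    domain: str  # d: function domain in which the call is located
    func: str    # f: the called function
    dtype: str   # t: data type of the parameter
    source: str  # s: data source found by taint analysis


MIN_VECTORS = 5  # retention threshold |E_k| >= 5 stated in the text


def full_path_vector_set(paths):
    """Build E_K: one frozenset E_k per path, keeping only large enough sets."""
    candidate_sets = (frozenset(path) for path in paths)
    return {e_k for e_k in candidate_sets if len(e_k) >= MIN_VECTORS}


# Hand-written vectors standing in for the output of the real analysis.
long_path = [
    ParamVector("main", "open", "const char *", "external"),
    ParamVector("main", "read", "int", "internal"),
    ParamVector("main", "read", "void *", "internal"),
    ParamVector("main", "write", "int", "internal"),
    ParamVector("main", "close", "int", "internal"),
]
short_path = long_path[:2]  # only two vectors, so it is filtered out

E_K = full_path_vector_set([long_path, short_path])
```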
For the simplified program dependence graphs sPDG of the target codes, a progressive graph isomorphism solving algorithm is used to find the maximum isomorphic subgraph between the sPDGs, and the sPDG isomorphism coefficient is calculated from the result. Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ denote the sPDGs of the two input code files. The algorithm proceeds as follows:
Before the algorithm starts, the number of steps $n$ and the expansion size $m_i$ of each step are determined, where $m_i$ and $n$ are constrained such that, in particular, $m_i \ge 1$. The function Isomorph($G_{1,i}$, $G_2$), implemented with the open-source VFLib graph isomorphism framework, tests $G_{1,i}$ and $G_2$ for isomorphism, so as to obtain the maximum subgraph of $G_1$ that is isomorphic to $G_2$.
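The step-wise idea can be illustrated with a toy Python sketch. The algorithm listing itself is not reproduced in this text, so this is only one plausible reading: $G_{1,i}$ is taken to be the subgraph induced by the first $m_1 + \dots + m_i$ vertices of $G_1$ in a given order, and expansion stops at the first step that no longer matches. A brute-force edge-preserving embedding check stands in for the VFLib-based Isomorph function; it is suitable only for very small graphs, and all names and sample data are hypothetical.

```python
from itertools import permutations


def embeds(nodes_1, edges_1, nodes_2, edges_2):
    """Return an injective vertex map sending every edge of G1 onto an edge
    of G2, or None if there is none (brute force, small graphs only)."""
    nodes_1 = list(nodes_1)
    for image in permutations(nodes_2, len(nodes_1)):
        mapping = dict(zip(nodes_1, image))
        if all((mapping[u], mapping[w]) in edges_2 for (u, w) in edges_1):
            return mapping
    return None


def progressive_match(order_1, edges_1, nodes_2, edges_2, step_sizes):
    """Grow a prefix of G1 by m_i vertices per step while it still embeds."""
    best, size = {}, 0
    for m_i in step_sizes:
        size += m_i
        prefix = order_1[:size]
        sub_edges = {(u, w) for (u, w) in edges_1
                     if u in prefix and w in prefix}
        mapping = embeds(prefix, sub_edges, nodes_2, edges_2)
        if mapping is None:
            break            # the larger subgraph no longer matches
        best = mapping
    return best


# G1 is a 3-cycle; G2 is acyclic, so only a 2-vertex part of G1 can match.
order_1 = ["a", "b", "c"]
edges_1 = {("a", "b"), ("b", "c"), ("c", "a")}
nodes_2 = ["x", "y", "z"]
edges_2 = {("x", "y"), ("y", "z"), ("x", "z")}

best = progressive_match(order_1, edges_1, nodes_2, edges_2, [1, 1, 1])
```

The returned mapping gives the common vertex labelling from which the edge sets used by the evaluation function could be derived.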
Based on the result of the progressive graph isomorphism solving algorithm, the following evaluation function calculates the ratio of the number of edges that differ between the two graphs to the number of edges in the smaller graph:

$$I(e, E) = \begin{cases} 0 & \text{if } e \in E \\ 1 & \text{otherwise} \end{cases}, \qquad P = \frac{\sum_{e \in E_1} I(e, E_2) + \sum_{e \in E_2} I(e, E_1)}{|E_1|}$$

The result $P$ is the isomorphism coefficient of the simplified program dependence graphs sPDG. When $P = 0$, $G_1$ is a complete subgraph of $G_2$.
For the obtained sequences of key syntax information, comprising the variable quadruple sequence, the macro definition triple sequence and the struct sequence, the coincidence coefficient C is calculated with the Jaccard algorithm as follows.
The coincidence coefficient $H_\alpha$ of a single kind of syntax information $\alpha$ is the Jaccard coefficient of the two $\alpha$ sequences corresponding to codes S and T; when both sequences are empty, $H_\alpha$ is defined as 0. The coincidence coefficient of the key syntax information is then calculated from the $H_\alpha$ values, where $w_\alpha$ is the weight of syntax information $\alpha$. In this example, the variable information weight is set to 0.4, the macro definition information weight to 0.2, and the struct information weight to 0.4.
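A minimal Python sketch of this calculation follows. The weights 0.4, 0.2 and 0.4 and the rule that $H_\alpha = 0$ when both sequences are empty come from the text. Combining the per-kind coefficients as a weighted sum is an assumption, because the combining formula is not reproduced here, and the sample quadruples and triples are hypothetical.

```python
# Weights from the example: variables 0.4, macro definitions 0.2, structs 0.4.
WEIGHTS = {"variable": 0.4, "macro": 0.2, "struct": 0.4}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|, defined as 0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def coincidence_coefficient(info_s, info_t, weights=WEIGHTS):
    """ASSUMPTION: C is the weighted sum of the per-kind coefficients H_alpha."""
    return sum(w * jaccard(info_s.get(kind, ()), info_t.get(kind, ()))
               for kind, w in weights.items())


# Hypothetical syntax information: (scope, link attribute, storage type, name)
# quadruples for variables and (identifier, content type, name) triples for macros.
info_s = {
    "variable": {("global", "external", "static", "count"),
                 ("local", "none", "auto", "i")},
    "macro": {("BUF_SIZE", "integer", "BUF_SIZE")},
}
info_t = {
    "variable": {("global", "external", "static", "count"),
                 ("local", "none", "auto", "j")},
    "macro": {("BUF_SIZE", "integer", "BUF_SIZE")},
}

c_st = coincidence_coefficient(info_s, info_t)
```

Here the variable sequences share one of three distinct quadruples, the macro sequences are identical, and neither code defines a struct, so the struct term contributes 0.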
For the full-path parameter vector set $E_K$ of the target code, the Jaccard algorithm is used to compute the similarity coefficients between the subsets of $E_K$, and the highest of these is taken as the system call sequence similarity coefficient A.
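One way to read this step is sketched below in Python. The sketch assumes that the "subsets" are the per-path parameter vector sets $E_k$ and that every $E_k$ of code S is compared with every $E_k$ of code T; the algorithm listing is not reproduced in this text, so this pairing is an assumption. The sample vectors are hypothetical, and the $|E_k| \ge 5$ filter is left out to keep the example small.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets, taken as 0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def call_sequence_similarity(paths_s, paths_t):
    """Highest Jaccard coefficient over all (E_k of S, E_k of T) pairs."""
    return max((jaccard(a, b) for a in paths_s for b in paths_t), default=0.0)


# Hypothetical (d, f, t, s) parameter vectors grouped by execution path.
path_s1 = {("main", "open", "char *", "external"),
           ("main", "read", "int", "internal"),
           ("main", "close", "int", "internal")}
path_t1 = {("main", "open", "char *", "external"),
           ("main", "read", "int", "internal"),
           ("main", "write", "int", "internal")}
path_t2 = {("init", "socket", "int", "internal")}

a_st = call_sequence_similarity([path_s1], [path_t1, path_t2])
```

The best-matching pair shares two of four distinct vectors, so the coefficient for this toy input is 0.5.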
For codes S and T, after the sPDG isomorphism coefficient $P_{S,T}$, the syntax information coincidence coefficient $C_{S,T}$ and the system call sequence similarity coefficient $A_{S,T}$ have been calculated, the homology index Homology(S, T) of the two codes is calculated by a formula in which $w_P$ is the weight of the sPDG isomorphism coefficient, $w_C$ is the weight of the syntax information coincidence coefficient, and $w_A$ is the weight of the system call sequence similarity coefficient. The larger Homology(S, T) is, the more evident the homology relationship between the input samples.
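The homology formula is not reproduced in this text, so the following Python sketch shows only one plausible weighted combination and should not be taken as the patented formula. It assumes that the structural term uses $1 - P$ (clamped at 0), since $P = 0$ denotes matching graphs while larger $C$ and $A$ denote greater similarity; the equal default weights and the function name are hypothetical.

```python
def homology_index(p, c, a, w_p=1 / 3, w_c=1 / 3, w_a=1 / 3):
    """ASSUMED form: w_P * (1 - P) + w_C * C + w_A * A.

    P measures difference (0 means matching graphs), so it is turned into
    a similarity before weighting; values of P above 1 are clamped.
    """
    structural_similarity = max(0.0, 1.0 - p)
    return w_p * structural_similarity + w_c * c + w_a * a


identical = homology_index(p=0.0, c=1.0, a=1.0)   # all three parts agree
unrelated = homology_index(p=2.0, c=0.0, a=0.0)   # nothing in common
partial = homology_index(p=0.5, c=0.5, a=0.5)
```

Under these assumptions the index ranges from 0 to 1, which matches the statement that a larger value indicates a more evident homology relationship.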
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The elements of the various examples and method steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and the components and steps of the examples have been described in a functional generic sense in the foregoing description for clarity of hardware and software interchangeability. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Those skilled in the art will appreciate that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium, such as: read-only memory, magnetic or optical disk, and the like. Alternatively, all or part of the steps of the foregoing embodiments may also be implemented by using one or more integrated circuits, and accordingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A code homology detection method based on code fingerprints is characterized by comprising the following steps:
step 1, analyzing the dependency relationship of two input codes S and T to obtain an original program dependency graph PDG; carrying out structure simplification, nesting removal and coloring treatment on the original program dependency graph PDG to obtain a simplified program dependency graph sPDG;
step 2, analyzing key syntax information of the code based on the abstract syntax tree;
step 3, extracting a system calling sequence of the code execution path, acquiring a full path parameter vector set of the target code, and constructing a code fingerprint;
step 4, calculating the homology coefficients between the parts of the code fingerprints, wherein the homology coefficients comprise the isomorphism coefficient $P_{S,T}$ of the simplified program dependence graphs sPDG, the syntax information coincidence coefficient $C_{S,T}$ and the system call sequence similarity coefficient $A_{S,T}$;
And 5, calculating the homology indexes of the two codes S and T according to the homology coefficients, and judging the homology relation between the two codes according to the homology indexes.
2. The code homology detection method based on code fingerprints according to claim 1, wherein the original program dependency graph PDG is subjected to structure simplification, nesting removal and coloring processing in step 1 to obtain a simplified program dependency graph sPDG, and the method comprises the following steps:
step 11, simplifying the structure of the original program dependence graph PDG according to a simplification principle;
step 12, removing the nested input nodes and output nodes from the nodes containing the nested relation, and removing the edges of the corresponding dependency relation to the outer layer function call nodes;
and step 13, classifying and coloring the nodes according to the statement types, and acquiring the simplified program dependence graph sPDG.
3. The code homology detection method based on code fingerprint as claimed in claim 2, wherein the simplification principle in step 11 comprises: removing vertices that have only one outgoing edge and no incoming edge; removing vertices that have only one incoming edge and no outgoing edge; removing vertices that have only one incoming edge and one outgoing edge, and introducing an edge from the input vertex to the output vertex; and removing vertices that have no incoming or outgoing edges.
4. The method for detecting code homology based on code fingerprint as claimed in claim 1, wherein the key syntax information of the code is parsed based on the abstract syntax tree in the step 2, which comprises the following contents:
step 21, recording global variables, local variables and attributes thereof in the designated code domain to form a quadruple, wherein the quadruple comprises a scope of the variables, link attributes, storage types and names;
step 22, analyzing and recording the macro definition and the corresponding content thereof to form a triple, wherein the triple comprises a macro definition identifier, a content type and a name;
and step 23, analyzing a key data structure in the code based on the abstract syntax tree AST, and recording a custom structure body in the code in a sequence form.
5. The code homology detection method based on code fingerprint as claimed in claim 1, wherein the step 3 comprises the following steps:
step 31, starting from an entry function, generating a call graph and a post-dominator tree of a function f, extracting the system call sequence k in a single execution path, and recording the system call sequence set K over all possible execution paths;
step 32, for the function f in each system call sequence k, locating the function domain d in which the function f resides, parsing all parameters of the function f in the abstract syntax tree, determining the data source s of each parameter of f through static taint analysis so as to judge the source of the parameter value, and combining the data type t of the parameter to form the parameter vector $e_f = (d, f, t, s)$ of the function f; thereby obtaining the parameter vector set $E_k$ of the system call sequence k;
step 33, executing step 32 on each sequence k in the system call sequence set K to obtain the full-path parameter vector set $E_K$ of the target code;
step 34, constructing the code fingerprint from the full-path parameter vector set $E_K$ of the target code.
6. The code homology detection method based on code fingerprint as claimed in claim 1, wherein the step 4 comprises the following steps:
step 41, for the simplified program dependence graphs sPDG of the target codes, finding the maximum isomorphic subgraph between the sPDGs through a progressive graph isomorphism solving algorithm, and calculating the isomorphism coefficient $P_{S,T}$ between the sPDGs;
step 42, calculating the coincidence coefficient $C_{S,T}$ through the Jaccard algorithm from the key syntax information obtained in step 2;
step 43, for the full-path parameter vector set $E_K$ of the target code obtained in step 3, computing the similarity coefficients between the subsets of $E_K$ through the Jaccard algorithm, and taking the highest of these as the system call sequence similarity coefficient $A_{S,T}$.
7. The code homology detection method based on code fingerprints as claimed in claim 6, wherein in step 41 the original program dependency graph PDG is represented as a directed graph $G = (V, E)$, the node set $V$ represents a set of predicate expressions or statements, $E$ represents the data dependencies and control dependencies between them, and $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ respectively denote the simplified program dependence graphs sPDG, the isomorphism coefficient $P$ between the simplified program dependence graphs sPDG being calculated by the evaluation function:

$$I(e, E) = \begin{cases} 0 & \text{if } e \in E \\ 1 & \text{otherwise} \end{cases}, \qquad P = \frac{\sum_{e \in E_1} I(e, E_2) + \sum_{e \in E_2} I(e, E_1)}{|E_1|}$$
8. The method of claim 6, wherein step 42 comprises: calculating the coincidence coefficient of a single kind of syntax information $\alpha$ as the Jaccard coefficient of the two $\alpha$ sequences corresponding to the codes S and T; and calculating the coincidence coefficient of the key syntax information from these coefficients, wherein $w_\alpha$ is the weight of the syntax information $\alpha$.
9. The code homology detection method based on code fingerprint as claimed in claim 1, wherein in step 5 the homology index of the two codes S and T is calculated by a formula in which $w_P$ is the weight of the sPDG isomorphism coefficient, $w_C$ is the weight of the syntax information coincidence coefficient, and $w_A$ is the weight of the system call sequence similarity coefficient.
10. A code homology detection apparatus based on a code fingerprint, comprising: the system comprises a program simplifying module, a grammar analyzing module, a fingerprint constructing module, a homology coefficient acquiring module and a homology judging module;
the program simplification module is used for analyzing the dependency relationship between the two input codes S and T and acquiring an original program dependency graph PDG; carrying out structure simplification, nesting removal and coloring treatment on the original program dependency graph PDG to obtain a simplified program dependency graph sPDG;
the grammar parsing module is used for parsing key grammar information of the codes based on an abstract grammar tree and comprises a variable parsing unit, a macro definition parsing unit and a key data structure parsing unit, wherein the variable parsing unit is used for recording global variables, local variables and corresponding action domains, link attributes and storage types of the global variables and the local variables in a designated code domain, the macro definition parsing unit is used for recording macro definitions and corresponding content types of the macro definitions, and the key data structure parsing unit is used for parsing all classes and structural bodies defined in functions in a target code domain;
the fingerprint construction module is used for extracting a system calling sequence of a code execution path, acquiring a full-path parameter vector set of a target code and constructing a code fingerprint;
the homology coefficient acquisition module is used for calculating the homology coefficients between the parts of the code fingerprints according to the information acquired by the program simplification module, the grammar analysis module and the fingerprint construction module, wherein the homology coefficients comprise the isomorphism coefficient $P_{S,T}$ of the simplified program dependence graphs sPDG, the syntax information coincidence coefficient $C_{S,T}$ and the system call sequence similarity coefficient $A_{S,T}$;
And the homology judging module is used for calculating the homology indexes of the two codes S and T according to the homology coefficients obtained by the homology coefficient obtaining module and judging the homology relation between the two codes according to the homology indexes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710375425.4A CN107169358B (en) | 2017-05-24 | 2017-05-24 | Code homology detection method and its device based on code fingerprint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107169358A true CN107169358A (en) | 2017-09-15 |
CN107169358B CN107169358B (en) | 2019-10-08 |
Family
ID=59820829
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710375425.4A Active CN107169358B (en) | 2017-05-24 | 2017-05-24 | Code homology detection method and its device based on code fingerprint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107169358B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090172650A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | System and method for comparing partially decompiled software |
CN101697121A (en) * | 2009-10-26 | 2010-04-21 | 哈尔滨工业大学 | Method for detecting code similarity based on semantic analysis of program source code |
CN104407872A (en) * | 2014-12-04 | 2015-03-11 | 北京邮电大学 | Code clone detection method |
CN104933364A (en) * | 2015-07-08 | 2015-09-23 | 中国科学院信息工程研究所 | Automatic malicious code homology judgment method and system based on calling behaviors |
Non-Patent Citations (1)
Title |
---|
黄柳柳 等: "面向代码相似度检测的指纹选取方法", 《计算机工程与应用》 * |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399321A (en) * | 2017-11-03 | 2018-08-14 | 西安邮电大学 | Software based on dynamic instruction dependency graph birthmark locally plagiarizes detection method |
CN108399321B (en) * | 2017-11-03 | 2021-05-18 | 西安邮电大学 | Software local plagiarism detection method based on dynamic instruction dependence graph birthmark |
CN107967152A (en) * | 2017-12-12 | 2018-04-27 | 西安交通大学 | Software based on minimum individual path function birthmark locally plagiarizes evidence generation method |
CN107967152B (en) * | 2017-12-12 | 2020-06-19 | 西安交通大学 | Software local plagiarism evidence generation method based on minimum branch path function birthmarks |
CN108287996A (en) * | 2018-01-08 | 2018-07-17 | 北京工业大学 | A kind of malicious code obscures feature cleaning method |
CN108229170B (en) * | 2018-02-02 | 2020-05-12 | 中科软评科技(北京)有限公司 | Software analysis method and apparatus using big data and neural network |
CN108229170A (en) * | 2018-02-02 | 2018-06-29 | 中科软评科技(北京)有限公司 | Utilize big data and the software analysis method and device of neural network |
CN110347428A (en) * | 2018-04-08 | 2019-10-18 | 北京京东尚科信息技术有限公司 | A kind of detection method and device of code similarity |
CN110555305A (en) * | 2018-05-31 | 2019-12-10 | 武汉安天信息技术有限责任公司 | Malicious application tracing method based on deep learning and related device |
US11256487B2 (en) | 2018-06-05 | 2022-02-22 | Beihang University | Vectorized representation method of software source code |
WO2019233112A1 (en) * | 2018-06-05 | 2019-12-12 | 北京航空航天大学 | Vectorized representation method for software source codes |
CN109190653A (en) * | 2018-07-09 | 2019-01-11 | 四川大学 | Malicious code family homology analysis technology based on semi-supervised Density Clustering |
CN109101816A (en) * | 2018-08-10 | 2018-12-28 | 北京理工大学 | A kind of malicious code homology analysis method for calling controlling stream graph based on system |
CN109101816B (en) * | 2018-08-10 | 2022-02-08 | 北京理工大学 | Malicious code homology analysis method based on system call control flow graph |
CN109918128A (en) * | 2019-03-25 | 2019-06-21 | 湘潭大学 | A kind of detection method of code similarity and system based on relationship variogram |
CN109918128B (en) * | 2019-03-25 | 2022-04-08 | 湘潭大学 | Code similarity detection method and system based on relation variable graph |
CN110489973A (en) * | 2019-08-06 | 2019-11-22 | 广州大学 | A kind of intelligent contract leak detection method, device and storage medium based on Fuzz |
CN110955758A (en) * | 2019-12-18 | 2020-04-03 | 中国电子技术标准化研究院 | Code detection method, code detection server and index server |
CN111291373A (en) * | 2020-02-03 | 2020-06-16 | 思客云(北京)软件技术有限公司 | Method, apparatus and computer-readable storage medium for analyzing data pollution propagation |
CN113064633A (en) * | 2021-03-26 | 2021-07-02 | 山东师范大学 | Automatic code abstract generation method and system |
CN113138924A (en) * | 2021-04-23 | 2021-07-20 | 扬州大学 | Thread security code identification method based on graph learning |
CN113138924B (en) * | 2021-04-23 | 2023-10-31 | 扬州大学 | Thread safety code identification method based on graph learning |
CN113434145A (en) * | 2021-06-09 | 2021-09-24 | 华东师范大学 | Program code similarity measurement method based on abstract syntax tree path context |
CN114879974A (en) * | 2022-06-09 | 2022-08-09 | 西安交通大学 | Implicit dependency mode analysis method based on CPG + graph |
CN114879974B (en) * | 2022-06-09 | 2024-09-13 | 西安交通大学 | Implicit dependency pattern analysis method based on CPG+ graph |
CN115129364A (en) * | 2022-07-05 | 2022-09-30 | 四川大学 | Fingerprint identity recognition method and system based on abstract syntax tree and graph neural network |
Also Published As
Publication number | Publication date |
---|---|
CN107169358B (en) | 2019-10-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||