CN116383832A - Intelligent contract vulnerability detection method based on graph neural network - Google Patents


Publication number
CN116383832A
Authority
CN
China
Prior art keywords
graph
nodes
intelligent contract
layer
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310462926.1A
Other languages
Chinese (zh)
Inventor
陈铁明
周睿
吕明琪
朱添田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202310462926.1A
Publication of CN116383832A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/436 Semantic checking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a smart contract vulnerability detection method based on a graph neural network. The method takes the sensitive operations in a smart contract as the entry point of program analysis and extracts the code that has control dependencies and data dependencies on those sensitive operations as the key code to be analyzed. This reduces the program analysis workload, improves efficiency, reduces interference from irrelevant code, improves analysis accuracy, and helps locate vulnerabilities. Representing the key code as a graph structure captures both the semantic and the structural information of the code, which improves detection accuracy. In addition, by combining a graph neural network learning model with an attention mechanism, the invention can accurately diagnose and explain abnormal results.

Description

Intelligent contract vulnerability detection method based on graph neural network
Technical Field
The invention belongs to the technical field of smart contract security vulnerability detection, and in particular relates to a smart contract vulnerability detection method based on a graph neural network.
Background
A smart contract is essentially a computer program that runs on a blockchain, is written in a Turing-complete language, and can be executed automatically in the blockchain network. Tens of thousands of smart contracts are currently deployed on various blockchain platforms.
Security vulnerabilities in smart contracts not only cause huge economic losses but also undermine people's trust in blockchains and smart contracts, so vulnerability detection and security protection for smart contracts have become key problems and major challenges that urgently need to be solved. If, during smart contract development and before a project is actually deployed, developers can use tools or techniques to detect quickly and accurately whether a contract contains a vulnerability, potential vulnerabilities can be kept from being exposed to attackers, which safeguards the large number of transactions on the blockchain.
An automatic smart contract detection method can handle, accurately and with a single click, the ever-growing variety of smart contract vulnerabilities in blockchain networks, and it reduces the false positives and missed detections that manual verification and expert analysis may cause. Using an accurate and efficient automatic detection tool to mine and detect smart contract vulnerabilities is therefore of great significance to both practice and research.
With the development of machine learning, vulnerability detection methods based on machine learning have attracted wide attention. The machine learning techniques used for vulnerability detection include traditional shallow learning and deep learning. Deep learning can automatically learn complex nonlinear hidden features and generally offers higher accuracy and better generalization. Conventional deep learning models commonly used for vulnerability detection include the MLP (multi-layer perceptron), CNN (convolutional neural network), LSTM (long short-term memory network), autoencoder, and so on. However, because these traditional deep learning models convert code directly into word vectors, they cannot highlight the key variables in smart contract source code, so the semantic modeling is insufficient and the detection results are unsatisfactory. In addition, because a neural network is a "black box", its interpretability is poor in most cases: unlike traditional detection tools, it cannot give the exact location or the code where a vulnerability may exist.
In view of these problems, the following questions urgently need to be solved: what kind of graph a smart contract should be converted into; how to convert smart contract code into a non-Euclidean graph that contains the data and control dependencies of the contract, so that the semantic and syntactic information of the original contract is better represented and the vulnerability detection problem becomes a graph topology recognition problem; and, on that basis, what graph neural network architecture should be applied to the converted graph to achieve higher detection efficiency and accuracy.
Disclosure of Invention
The invention aims to provide a smart contract vulnerability detection method based on a graph neural network, so as to solve the problems existing in the prior art.
To achieve the above purpose, the technical scheme adopted by the invention is as follows:
The smart contract vulnerability detection method based on a graph neural network comprises the following steps:
Step 1: source code preprocessing: normalizing the source code of the smart contract, extracting a control flow graph from the normalized source code, marking the nodes in the control flow graph that contain sensitive operations according to a preset matching rule base, creating a program dependency graph from the marked control flow graph, and extracting a key code graph from the program dependency graph, wherein the matching rule base is formed from the sensitive operations that cause smart contract vulnerabilities;
Step 2: constructing a detection model based on a graph neural network, wherein the detection model comprises a symbolization module, a semantic feature extraction module and a prediction module; the symbolization module symbolizes the variable names and function names in the key code graph, the semantic feature extraction module extracts node semantic features from the symbolized key code graph, and the prediction module is built on a graph neural network and outputs the smart contract detection result according to the node semantic features;
Step 3: training, testing and applying the detection model: training the detection model, and using the detection model that passes testing to perform smart contract vulnerability detection on the key code graph corresponding to the source code to be detected.
Several optional refinements are provided below. They are not additional limitations on the overall scheme above but only further additions or preferences. Provided there is no technical or logical contradiction, each option may be combined individually with the overall scheme, and multiple options may be combined with one another.
Preferably, the sensitive operation is a specific function.
Preferably, the normalization first formats the source code of the smart contract to be detected with the Ethlint tool according to the Solidity Style Guide, then restores the default keywords omitted from the source code during programming, and then converts different code that expresses the same semantics into the same form.
Preferably, the control flow graph is extracted from an abstract syntax tree that reflects the syntactic relationships of the source code, and the abstract syntax tree is obtained by converting the normalized smart contract source code.
Preferably, the program dependency graph is generated by combining a control dependency graph and a data dependency graph, both of which are generated from the control flow graph; the nodes of the control dependency graph, the data dependency graph and the program dependency graph are identical, and the edge set of the program dependency graph is the union of the edge sets of the control dependency graph and the data dependency graph.
Preferably, the control dependency graph is generated as follows: an entry and an exit are added to the control flow graph, where the entry is the start of the smart contract source code and the exit is its end, and an edge is added between the entry and the exit; each edge of the control flow graph is reversed to obtain a reverse control flow graph; the dominators of each node are computed on the reverse control flow graph; the forward dominance frontier of each node is obtained from the dominators and the reverse control flow graph; and the control dependency graph is obtained from the forward dominance frontiers;
the data dependency graph is generated as follows: the nodes in the control flow graph that define variables are selected as primary nodes; the primary nodes are traversed, and the nodes that directly use the value of a variable defined by a primary node are selected as secondary nodes; an edge is formed between each primary node and its corresponding secondary nodes; and the data dependency graph is obtained once the traversal is complete.
Preferably, the key code graph consists of all nodes on the program dependency graph that have control dependency and data dependency relationships with the nodes containing sensitive operations.
Preferably, symbolizing the variable names and function names in the key code graph comprises: mapping the user-defined variable names and function names in the nodes of the key code graph to their respective unified forms.
Preferably, extracting the node semantic features from the symbolized key code graph comprises: splitting each node of the key code graph into basic units with minimal semantics, and extracting semantic information from the basic units of each node with a Doc2Vec model or a Word2Vec model to obtain the node semantic features.
Preferably, the prediction module comprises an input layer, graph convolution layers, a pooling layer, a readout layer and a multi-layer perceptron layer. The input of the input layer is the node semantic features and an adjacency matrix obtained from the key code graph. The graph convolution layers are combined with a self-attention mechanism to adaptively learn the importance of the nodes in the key code graph, and their output is connected to the pooling layer through skip connections. The pooling layer uses the SAGPool pooling mechanism, the readout layer uses mean pooling, and the multi-layer perceptron layer is composed of an activation layer, a fully connected layer, a Dropout layer and a Softmax layer.
The smart contract vulnerability detection method based on a graph neural network takes the sensitive operations in a smart contract as the entry point of program analysis and extracts the code that has control dependencies and data dependencies on those sensitive operations as the key code to be analyzed. This reduces the program analysis workload, improves efficiency, reduces interference from irrelevant code, improves analysis accuracy, and helps locate vulnerabilities. Representing the key code as a graph structure captures both the semantic and the structural information of the code, which improves detection accuracy. In addition, by combining a graph neural network learning model with an attention mechanism, the invention can accurately diagnose and explain abnormal results.
Drawings
FIG. 1 is a flow chart of an intelligent contract vulnerability detection method based on a graph neural network;
FIG. 2 is a schematic diagram of the creation of a control flow graph of the present invention;
FIG. 3 is a schematic diagram of a node marked as containing sensitive operations in accordance with the present invention;
FIG. 4 is a schematic diagram of the creation of a reverse control flow graph of the present invention;
FIG. 5 is a schematic diagram of the creation of a control dependency graph in accordance with the present invention;
FIG. 6 is a schematic diagram of creating a data dependency graph in accordance with the present invention;
FIG. 7 is a schematic diagram of creating a program dependency graph in accordance with the present invention;
FIG. 8 is a schematic diagram of extracting the key code graph from the program dependency graph in the present invention;
FIG. 9 is a schematic diagram of the structure of the prediction model of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The smart contract vulnerability detection method based on a graph neural network can improve the efficiency and accuracy of smart contract vulnerability detection. As shown in FIG. 1, the method comprises the following steps:
Step 1: source code preprocessing. The source code preprocessing can be divided into five parts.
First part: the sensitive operations that lead to smart contract vulnerabilities are collected and organized to form a matching rule base.
A sensitive operation is a specific function, that is, a function that security analysts have identified as prone to causing vulnerabilities, such as call.value, which can trigger reentrancy vulnerabilities; block.timestamp, which can cause infinite-loop vulnerabilities; logAndCall(), which can lead to injection attacks; delegatecall; specr.call(); and so on.
The sensitive operations need to be collected and organized only once. After the matching rule base has been formed, it can be used directly whenever smart contract source code is checked, and it can be supplemented later if necessary.
Second part: source code normalization. The source code to be detected is formatted according to a unified specification so that code written in different programming styles does not affect the training results.
The smart contract source code to be detected is first formatted according to the smart contract coding specification, so that the coding conventions of the source code are kept uniform. Next, the default keywords omitted from the source code during programming are restored, so that the model can extract more complete semantic information. Finally, different code that expresses the same semantics is converted into the same form, for example by converting ternary expressions into their equivalent form and expanding certain shorthand operators into their full form, so that the coding style becomes uniform.
Here the coding specification refers to the well-known Solidity Style Guide, and the formatting tool may be Ethlint or a similar tool.
In this embodiment, converting different code that expresses the same semantics into the same form is a conventional process. An example of converting ternary expressions into their equivalent form: every ternary expression "condition ? x : y" is converted into the unified "if … else" form. An example of expanding shorthand operators into their full form: the compound assignment and increment operators a+=b, a-=b, a/=b, a%=b, a|=b, a<<=b, a>>=b and a++ are converted into a=a+b, a=a-b, a=a/b, a=a%b, a=a|b, a=a<<b, a=a>>b and a=a+1, respectively. The specific way in which different code expressing the same semantics is converted into the same form is left to the developer.
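As a minimal sketch of the shorthand-operator expansion described above, the following Python function rewrites single statements with regular expressions. The function name `expand_shorthand` and the regex-based approach are assumptions made for illustration; the patent leaves the implementation to the developer, and a production tool would more likely rewrite the abstract syntax tree.

```python
import re

# Illustrative only: handles one statement of the form "<name> <op>= <expr>;"
# or "<name>++;". Real Solidity code needs a parser, not regexes.
_IDENT = r"[A-Za-z_]\w*"
_COMPOUND = re.compile(rf"\b({_IDENT})\s*(<<=|>>=|\+=|-=|/=|%=|\|=)\s*([^;]+);")
_INCREMENT = re.compile(rf"\b({_IDENT})\s*\+\+\s*;")


def _expand(match: re.Match) -> str:
    name, op, rhs = match.group(1), match.group(2)[:-1], match.group(3).strip()
    # Parenthesize compound right-hand sides so precedence is preserved.
    if not re.fullmatch(r"\w+", rhs):
        rhs = f"({rhs})"
    return f"{name} = {name} {op} {rhs};"


def expand_shorthand(statement: str) -> str:
    """Rewrite compound assignments and postfix increments in full form."""
    statement = _COMPOUND.sub(_expand, statement)
    return _INCREMENT.sub(r"\1 = \1 + 1;", statement)


print(expand_shorthand("a += b;"))      # a = a + b;
print(expand_shorthand("x <<= 2;"))     # x = x << 2;
print(expand_shorthand("i++;"))         # i = i + 1;
```

Wrapping a non-trivial right-hand side in parentheses is a design choice of this sketch: it keeps `a -= b + c;` equivalent to `a = a - (b + c);`.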
Third part: the control flow graph is extracted and the nodes containing sensitive operations are marked. The smart contract source code processed in the first and second parts is converted into an abstract syntax tree that reflects the syntactic relationships of the source code, and the control flow graph of the program is then extracted from the abstract syntax tree. The control flow graph is a directed graph whose nodes represent basic blocks and whose directed edges represent the transfer of control between basic blocks.
A basic block is a sequence of consecutive statements with a single entry and a single exit: control flow can enter only at the entry of the basic block and leave only at its exit, and it cannot stop or branch anywhere other than at the end of the basic block.
Each node in the control flow graph of the program is then traversed, and the nodes containing sensitive operations are marked according to the preset matching rule base. To aid understanding of the control flow graph extraction process, another embodiment provides the specific steps for a withdraw function, as shown in FIG. 2: the source code is converted into an abstract syntax tree (AST), and the abstract syntax tree is then converted into a control flow graph (CFG). As shown in FIG. 3, the gray-filled basic block contains the sensitive operation call.value(), and it is marked when the traversal matches this basic block.
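The marking step can be sketched in a few lines of Python. The rule list, the node numbering, and the plain substring matching below are assumptions for illustration; the patent only states that nodes are matched against a preset rule base.

```python
# Hypothetical subset of the matching rule base (names taken from the text).
SENSITIVE_RULES = ["call.value", "delegatecall", "block.timestamp"]


def mark_sensitive_nodes(cfg_nodes: dict, rules=SENSITIVE_RULES) -> set:
    """Return the ids of basic blocks whose code contains a sensitive operation.

    cfg_nodes maps a basic-block id to the source text of that block.
    """
    return {
        node_id
        for node_id, code in cfg_nodes.items()
        if any(rule in code for rule in rules)
    }


# Hypothetical basic blocks of a withdraw function.
blocks = {
    1: "uint amount = balances[msg.sender];",
    2: "require(amount > 0);",
    3: "msg.sender.call.value(amount)();",
    4: "balances[msg.sender] = 0;",
}
print(mark_sensitive_nodes(blocks))  # {3}
```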
Fourth part: a program dependency graph is created from the control flow graph. A control dependency graph and a data dependency graph are generated from the marked control flow graph obtained in the third part, and the two are combined to generate the program dependency graph.
Control dependence is defined as follows: for two nodes a and b in the CFG, if there is a path between the two nodes and the execution of b depends on the execution result of a, then b is said to be control-dependent on a. To generate the control dependency graph, an entry and an exit are therefore added to the control flow graph obtained in the third part, where the entry is the start of the source code and the exit is its end, and an edge pointing from the entry to the exit is added between them. Each edge of the control flow graph is then reversed to obtain the reverse control flow graph, denoted reversedCFG, as shown in FIG. 4.
The dominators of each node are computed on reversedCFG, yielding a dominator tree that represents the dominance relation, namely the forward dominance tree. The dominance relation is computed as follows: for a node A in reversedCFG, a path from the root node to A is obtained, and the nodes on that path are deleted one at a time; after each deletion, it is checked whether A is still reachable from the root node. If A is no longer reachable, the deleted node dominates A.
The dominator tree is a tree data structure in which each node dominates its descendants. The dominator closest to a node n in the tree is called the immediate dominator of n, and it dominates n directly. The forward dominance tree is the tree data structure obtained by reversing the edges of the control flow graph and then determining the dominators of each node.
As shown in FIG. 5, the forward dominance frontier of each node is obtained from reversedCFG and the forward dominance tree; the dominance frontier of a node is the boundary of the region that the node dominates. The forward dominance frontiers are then translated into a control dependency graph (CDG), that is, each node is control-dependent on the nodes in its forward dominance frontier. The forward dominance frontier can be computed with existing techniques, for example the algorithm proposed by Cooper et al. in "A Simple, Fast Dominance Algorithm".
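The following Python sketch illustrates one way to derive control dependency edges. It is not the procedure of the cited Cooper et al. paper: as a simplification, it computes post-dominator sets directly on the successor lists (which is equivalent to computing dominators on the reversed graph) with a plain iterative fixed point, and then walks the post-dominator tree for every CFG edge instead of materializing dominance frontiers. The function names and the example graph are assumptions, and every node is assumed to reach the exit.

```python
def post_dominators(succ, exit_node):
    """Return, for every node, the set of nodes that post-dominate it."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # A node is post-dominated by itself and by whatever
            # post-dominates all of its successors.
            common = set.intersection(*(pdom[s] for s in succ[n])) if succ[n] else set()
            new = {n} | common
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom


def control_dependence_edges(succ, exit_node):
    """Return edges (a, b) meaning 'b is control-dependent on a'."""
    pdom = post_dominators(succ, exit_node)
    # Immediate post-dominator: the strict post-dominator closest to the node,
    # i.e. the one whose own post-dominator set is largest.
    ipdom = {
        n: max(pdom[n] - {n}, key=lambda d: len(pdom[d]))
        for n in succ
        if n != exit_node
    }
    edges = set()
    for a in succ:
        for b in succ[a]:
            runner = b
            # Everything between b and ipdom(a) executes only if a takes this edge.
            while runner != ipdom[a]:
                edges.add((a, runner))
                runner = ipdom[runner]
    return edges


# Hypothetical CFG: block 1 branches to 2 or 3, block 2 falls through to 3.
# The entry -> exit edge is the extra edge described in the text.
cfg = {"entry": [1, "exit"], 1: [2, 3], 2: [3], 3: ["exit"], "exit": []}
print(sorted(control_dependence_edges(cfg, "exit"), key=str))
```

In this example, blocks 1 and 3 always execute and so depend only on the entry, while block 2 depends on the branch in block 1.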
As shown in FIG. 6, the data dependency graph is generated as follows: the nodes in the control flow graph that define variables are selected as primary nodes; the primary nodes are traversed, and the nodes that directly use the value of a variable defined by a primary node are selected as secondary nodes; an edge is formed between each primary node and its corresponding secondary nodes; and the data dependency graph (DDG) is obtained once the traversal is complete.
Specifically, let a and b be two nodes on the control flow graph. If there is a path from a to b, a variable v is defined at node a, v is used when node b executes, and v is not redefined along the execution path, then a data dependency edge is constructed between the two nodes.
In FIG. 6, basic block 1 defines the variable amount, and basic blocks 2 and 3 use the value of amount, so basic blocks 2 and 3 each have a data dependency on basic block 1.
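A minimal Python sketch of this definition-use rule is given below. The `succ`, `defs` and `uses` dictionaries and the three-block chain are hypothetical inputs chosen to mirror the amount example; how definitions and uses are collected from each basic block is outside the sketch.

```python
def data_dependency_edges(succ, defs, uses):
    """Return edges (a, b): b uses a variable defined in a with no redefinition between.

    succ: node -> list of CFG successors
    defs: node -> set of variables defined in that node
    uses: node -> set of variables used in that node
    """
    edges = set()
    for a, defined in defs.items():
        for v in defined:
            stack, seen = list(succ.get(a, [])), set()
            while stack:
                b = stack.pop()
                if b in seen:
                    continue
                seen.add(b)
                if v in uses.get(b, ()):
                    edges.add((a, b))
                if v in defs.get(b, ()):
                    continue  # v is redefined here, so the definition in a dies
                stack.extend(succ.get(b, []))
    return edges


# Hypothetical CFG: block 1 defines amount, blocks 2 and 3 use it.
succ = {1: [2], 2: [3], 3: []}
defs = {1: {"amount"}}
uses = {2: {"amount"}, 3: {"amount"}}
print(sorted(data_dependency_edges(succ, defs, uses)))  # [(1, 2), (1, 3)]
```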
Let the control dependency graph be $(N, C)$ and the data dependency graph be $(N, D)$, where $N = \{n_1, n_2, n_3, n_4, \dots, n_x\}$ is the set of nodes and $n_x$ is the $x$-th node. $C$ is the set of control dependency edges (i.e., the directed edges in the control dependency graph), $C = \{\langle n_a, n_b \rangle, \langle n_c, n_d \rangle, \dots\}$ with $n \in N$. $D$ is the set of data dependency edges (i.e., the directed edges in the data dependency graph), $D = \{\langle n_f, n_h \rangle, \langle n_c, n_d \rangle, \dots\}$ with $n \in N$. The program dependency graph can then be expressed as $P = (N, C \cup D)$. That is, the nodes of the control dependency graph, the data dependency graph and the program dependency graph are the same, and the edge set of the program dependency graph is the union of the edge sets of the control dependency graph and the data dependency graph.
FIG. 7 shows the process of combining the control dependency graph and the data dependency graph. In the figure, each node is represented by the number of the corresponding basic block in the smart contract source code, solid lines represent the directed edges of the control dependency graph, dotted lines represent the directed edges of the data dependency graph, and the nodes containing sensitive operations are marked in the program dependency graph.
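The merge itself is a set union over a shared node set. The short Python sketch below keeps a label on each edge so that control and data edges can still be told apart afterwards (the solid and dotted lines of the figure); the labelling and the function name `build_pdg` are assumptions, not part of the patent text.

```python
def build_pdg(nodes, control_edges, data_edges):
    """Return (N, labelled edges) for the program dependency graph (N, C ∪ D)."""
    labelled = {}
    for edge in control_edges:
        labelled.setdefault(edge, set()).add("control")
    for edge in data_edges:
        labelled.setdefault(edge, set()).add("data")
    return set(nodes), labelled


# Hypothetical edge sets over three basic blocks.
nodes, edges = build_pdg({1, 2, 3}, {(1, 2), (1, 3)}, {(1, 2), (2, 3)})
print(sorted(edges))  # [(1, 2), (1, 3), (2, 3)]
```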
Fifth part: the key code is extracted from the program dependency graph. Starting from the marked nodes, the program dependency graph is traversed, and the nodes and edges that are unrelated to the marked nodes are deleted. What remains, namely the nodes and edges that have a control dependency or data dependency relationship with the marked nodes, is called the key code graph.
In practice, all paths from the entry nodes to the marked nodes are found first, and the nodes on these paths are saved. Then, taking each marked node as a starting point, a depth-first traversal is performed and the reachable nodes are saved. Finally, the nodes and edges on the program dependency graph that have no control dependency or data dependency relationship with the marked nodes are deleted, leaving the nodes and edges that are strongly related to the vulnerable code fragments.
To aid understanding of how the key code graph is extracted from the program dependency graph, another embodiment provides the specific steps shown in FIG. 8. First, all paths from the entry node r to the marked node 3 in the program dependency graph (PDG) are found; in this embodiment, as can be seen in FIG. 8, the paths are r->1->3, r->2->3 and r->1->2->3. All nodes and edges on these paths are saved to obtain a set of nodes and edges. Next, a depth-first traversal is started from the marked node 3 and the reachable nodes are saved; in this embodiment node 3 has no successors, so an empty set is obtained. Finally, only the saved nodes and the related edges are kept and all other nodes and edges are deleted, which yields the nodes and edges that have control dependency and data dependency relationships with the marked nodes, that is, the key code graph (handledPDG) shown in FIG. 8.
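The extraction can be sketched in Python as follows. Instead of enumerating every entry-to-marked path explicitly, this sketch keeps the nodes that are both reachable from the entry and able to reach a marked node, which gives the same node set; the function names, the edge-list representation, and the extra nodes 4 and 5 are assumptions added for illustration.

```python
def reachable(start_nodes, adj):
    """Depth-first search: all nodes reachable from start_nodes via adj."""
    seen, stack = set(start_nodes), list(start_nodes)
    while stack:
        n = stack.pop()
        for m in adj.get(n, ()):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen


def extract_key_code_graph(edges, entry, marked):
    """Keep nodes on entry-to-marked paths plus nodes reachable from marked nodes."""
    succ, pred = {}, {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        pred.setdefault(b, []).append(a)
    on_paths = reachable({entry}, succ) & reachable(set(marked), pred)
    after_marked = reachable(set(marked), succ)
    keep = on_paths | after_marked
    kept_edges = {(a, b) for a, b in edges if a in keep and b in keep}
    return keep, kept_edges


# PDG with the three paths from the text, plus hypothetical unrelated nodes 4 and 5.
pdg_edges = [("r", 1), ("r", 2), (1, 2), (1, 3), (2, 3), ("r", 4), (4, 5)]
nodes, edges = extract_key_code_graph(pdg_edges, "r", {3})
print(sorted(nodes, key=str))  # [1, 2, 3, 'r']
```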
Preprocessing the source code in this way extracts and analyzes the code that has control dependencies and data dependencies on the sensitive operations, so that code fragments that may contain vulnerabilities are selected and irrelevant code is removed. This reduces the program analysis workload, improves analysis efficiency, reduces the interference of useless code with detection, improves analysis accuracy, and makes it easier to locate vulnerabilities.
Step 2: constructing a detection model based on a graph neural network. The detection model comprises a symbolization module, a semantic feature extraction module and a prediction module.
Symbolization module: symbolizes the variable names and function names in the key code graph.
All user-defined identifiers (variable names and function names) in the nodes of the key code graph obtained in step 1 are mapped to a unified form: user-defined variable names are mapped to VAR1, VAR2, …, and user-defined function names are mapped to FUNC1, FUNC2, ….
To aid understanding of the symbolization of variable names and function names, another embodiment provides the specific symbolization steps for the withdraw function: the withdraw function is mapped to FUNC1 and every occurrence of the variable amount is mapped to VAR1. This reduces the influence of user-defined identifier names on the features subsequently extracted from the key code graph.
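A minimal Python sketch of this mapping is shown below. It assumes the sets of user-defined variable and function names are already known (for example from the abstract syntax tree) and numbers them in order of first appearance; the function name `symbolize` and the regex-based replacement are illustrative assumptions.

```python
import re


def symbolize(code, variables, functions):
    """Replace user-defined names with VAR1, VAR2, ... and FUNC1, FUNC2, ...

    variables / functions: sets of user-defined identifier names, assumed known.
    Returns the rewritten code and the mapping that was applied.
    """
    mapping = {}
    counters = {"VAR": 0, "FUNC": 0}

    def replace(match):
        ident = match.group(0)
        if ident in functions:
            prefix = "FUNC"
        elif ident in variables:
            prefix = "VAR"
        else:
            return ident  # keywords and built-ins are left untouched
        if ident not in mapping:
            counters[prefix] += 1
            mapping[ident] = f"{prefix}{counters[prefix]}"
        return mapping[ident]

    return re.sub(r"[A-Za-z_]\w*", replace, code), mapping


source = "function withdraw() { msg.sender.call.value(amount)(); }"
print(symbolize(source, {"amount"}, {"withdraw"})[0])
# function FUNC1() { msg.sender.call.value(VAR1)(); }
```

Because the sketch matches bare identifiers, a built-in member whose name coincides with a user-defined name would also be replaced; a scope-aware implementation would avoid that.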
Semantic feature extraction module: extracts the node semantic features from the symbolized key code graph.
Each node of the key code graph obtained in step 1 is text, and it remains text after the symbolization of step 2. Using a lexical analysis tool for smart contracts, each node can be split into basic units with minimal semantics. For example, msg.sender.call.value(VAR1)() is split into: "msg", "sender", "call", "", "value", "(", "VAR1", ")", "(", ")".
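As a rough Python sketch of such a split, a single regular expression can stand in for a real Solidity lexer. The pattern below is an assumption for illustration; unlike the list in the text, it emits each "." as its own token.

```python
import re

# Identifiers, integer literals, or any single non-space character.
_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list:
    """Split a node's text into minimal lexical units."""
    return _TOKEN.findall(code)


print(tokenize("msg.sender.call.value(VAR1)()"))
# ['msg', '.', 'sender', '.', 'call', '.', 'value', '(', 'VAR1', ')', '(', ')']
```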
And extracting semantic information from each basic unit by using a Doc2Vec model or a Word2Vec model to obtain a semantic feature vector of each node. The Doc2Vec process is exemplified below:
$X = [\mathrm{Doc2Vec}(v_i)], \quad i = 1, 2, \ldots, n$
where $v_i$ is the $i$-th node and $n$ is the total number of nodes in the key code graph. A 30-dimensional feature vector is finally obtained for each node, so the feature matrix formed by the node features of the graph is an $n \times 30$ matrix. That is, the key code graph is represented as $G = (V, E)$, where $V = [v_1, v_2, v_3, \ldots, v_n]$ contains all nodes of the graph $G$, and the feature matrix corresponding to the key code graph is
[Formula image BDA0004201340760000081]
where $h_{v_i}$ is the 30-dimensional feature vector of the $i$-th node in the graph.
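The shape of the resulting feature matrix can be illustrated as follows. Note the substitution: to stay dependency-free, this sketch replaces Doc2Vec with a hashed bag-of-tokens embedding, so the vectors carry no learned semantics; only the n × 30 layout, with one row per node, mirrors the text. The names `embed` and `feature_matrix` are hypothetical.

```python
import hashlib

DIM = 30  # feature dimension stated in the text

def embed(tokens: list[str], dim: int = DIM) -> list[float]:
    """Stand-in for Doc2Vec: hashed bag-of-tokens, normalised to sum to 1."""
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    total = sum(vec) or 1.0
    return [x / total for x in vec]

def feature_matrix(nodes: list[list[str]]) -> list[list[float]]:
    """Row i is the feature vector of node i; the result has shape n x DIM."""
    return [embed(tokens) for tokens in nodes]

nodes = [
    ["msg", ".", "sender", ".", "call", ".", "value", "(", "VAR1", ")", "(", ")"],
    ["VAR1", "=", "0", ";"],
]
X = feature_matrix(nodes)
print(len(X), len(X[0]))  # 2 30
```

In a real pipeline the `embed` function would be replaced by inference with a trained Doc2Vec or Word2Vec model.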
Prediction module: built on a graph neural network, it outputs the prediction result of intelligent contract vulnerability detection from the node semantic features.
As shown in fig. 9, the network used in the prediction module includes an input layer, graph convolution layers, a pooling layer, a readout layer, and a multi-layer perceptron layer. The graph convolution layers aggregate and update the feature information of the nodes in the key code graph; the output of the five-layer graph convolutional network is connected to a final pooling layer through skip connections; the readout layer then reads out the final vector and passes it to the multi-layer perceptron layer for classification.
Input layer: the input data are a feature matrix formed by the feature vectors of all nodes of the key code graph and an adjacency matrix representing the edge relations. The feature matrix consists of the node semantic features extracted by the semantic feature extraction module in step 2 and is expressed as
[Formula image BDA0004201340760000091]
The adjacency matrix is obtained from the key code graph: its rows and columns correspond to the nodes of the key code graph, and each element represents the connection relation between two nodes, being 1 if there is an edge between the two nodes and 0 otherwise. It is expressed as
[Formula image BDA0004201340760000092]
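Building the adjacency matrix from an edge list can be sketched as below. The function name and the `symmetric` flag are illustrative assumptions; the text does not state whether edges of the key code graph are treated as directed or undirected when the matrix is filled in.

```python
def adjacency_matrix(n: int, edges: list[tuple[int, int]],
                     symmetric: bool = False) -> list[list[int]]:
    """Return an n x n 0/1 matrix; A[i][j] = 1 if there is an edge i -> j."""
    A = [[0] * n for _ in range(n)]
    for src, dst in edges:
        A[src][dst] = 1
        if symmetric:  # treat the edge as undirected
            A[dst][src] = 1
    return A

A = adjacency_matrix(3, [(0, 1), (1, 2)])
print(A)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```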
Graph convolution layer: multiple graph convolution layers are used. Each graph convolution layer aggregates and updates the features of the nodes in the graph and, combined with a self-attention mechanism, adaptively learns the importance of each node from the input graph. Each node is assigned an importance score using the following graph convolution formula:
[Formula image BDA0004201340760000093]
where $\delta$ denotes the activation function, $A$ the adjacency matrix, $D$ the degree matrix of the key code graph, $X$ the node semantic feature matrix, and $\theta_{att}$ the parameters of the self-attention layer. The number of graph convolution layers is not limited, but five layers are preferred in this embodiment.
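The formula image itself is not reproduced in this text. Given the symbols defined above and the SagPool mechanism named for the pooling layer, one plausible reading, offered here as an assumption rather than as the patent's exact formula, is the self-attention score of the published SAGPool layer:

```latex
Z = \delta\left( D^{-\frac{1}{2}} \, A \, D^{-\frac{1}{2}} \, X \, \theta_{att} \right)
```

Under this reading, $Z$ is an $n \times 1$ vector holding one importance score per node. In the published SAGPool formulation, $A$ is the adjacency matrix with self-loops added and $D$ is its degree matrix; whether the embodiment adds self-loops is not stated.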
Pooling layer: the output of the graph convolutional network is connected to a final pooling layer through skip connections. A SagPool pooling mechanism is adopted: a top-k pooling operation is performed according to the importance scores and the topological structure, the less important nodes are discarded, and a new graph structure is obtained, whereby the adjacency matrix and the node semantic features are updated.
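The top-k selection step can be sketched as follows, assuming the importance scores have already been computed. The pooling ratio of 0.5 and the gating of retained features by tanh(score) are assumptions borrowed from common SAGPool implementations; the text specifies neither.

```python
import math

def sag_pool(X, A, scores, ratio=0.5):
    """Keep the ceil(ratio * n) highest-scoring nodes and the edges among them."""
    n = len(scores)
    k = max(1, math.ceil(ratio * n))
    # Indices of the k best nodes, restored to their original order.
    keep = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    # Gate the retained features by tanh(score) (assumed, see lead-in).
    X_new = [[x * math.tanh(scores[i]) for x in X[i]] for i in keep]
    # Induced subgraph: keep only rows and columns of retained nodes.
    A_new = [[A[i][j] for j in keep] for i in keep]
    return X_new, A_new, keep

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
scores = [0.1, 0.9, 0.8, -0.3]
X_new, A_new, keep = sag_pool(X, A, scores, ratio=0.5)
print(keep)   # [1, 2]
print(A_new)  # [[0, 1], [1, 0]]
```

Both the feature matrix and the adjacency matrix shrink to the retained nodes, which is the update of "the adjacency matrix and the node semantic features" described above.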
Readout layer: mean pooling reduces graphs with different numbers of nodes to the same dimension, giving the final feature vector for the subsequent graph classification task.
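A short sketch of the mean readout, showing that graphs with different node counts yield vectors of the same length. The function name is illustrative.

```python
def mean_readout(X: list[list[float]]) -> list[float]:
    """Average the node feature rows into one fixed-length graph vector."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]

small = [[1.0, 2.0], [3.0, 4.0]]              # graph with 2 nodes
large = [[0.0, 0.0], [3.0, 6.0], [6.0, 3.0]]  # graph with 3 nodes
print(mean_readout(small))  # [2.0, 3.0]
print(mean_readout(large))  # [3.0, 3.0]
```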
Multi-layer perceptron layer: this network layer combines an activation layer, a fully connected layer, a Dropout layer, and a SoftMax layer, and classifies the feature vector output by the readout layer.
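An inference-time forward pass through such a classifier might look like the sketch below. The layer order (fully connected, ReLU, fully connected, SoftMax), the choice of ReLU, and the hand-picked weights are all assumptions for illustration; Dropout is omitted because it acts as the identity at inference.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, W, b):
    """Fully connected layer: W has one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def classify(graph_vec, W1, b1, W2, b2):
    """Return (label, class probabilities) for one graph-level vector."""
    hidden = relu(linear(graph_vec, W1, b1))
    probs = softmax(linear(hidden, W2, b2))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs

# Hypothetical hand-picked weights, only to exercise the code path.
W1, b1 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
label, probs = classify([0.2, 0.9], W1, b1, W2, b2)
print(label)  # 1
```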
Representing the code as a key code graph allows both the semantic information and the structural information of the code to be taken into account, which improves accuracy, and the features of the graph are learned by the graph neural network. In addition, the graph neural network model combined with the attention mechanism allows abnormal structures to be accurately diagnosed and interpreted.
Step 3: training, testing and application of the detection model: the detection model is trained, and the detection model that passes testing is used to perform intelligent contract vulnerability detection on the key code graph corresponding to the source code to be detected.
The detection model is trained and tested on existing source code; the source code containing intelligent contract vulnerabilities is labeled to obtain a training set and a test set. When the detection model is applied, the source code to be detected is first preprocessed to extract its key code graph, which is then input to the trained detection model for graph classification to check whether the key code graph contains an intelligent contract vulnerability. For example, a final output of 0 indicates that no intelligent contract vulnerability exists, and a final output of 1 indicates that an intelligent contract vulnerability exists.
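The data handling in this step can be sketched as below. The 80/20 split ratio, the fixed seed, and the function names are assumptions; the text only states that labeled source code yields a training set and a test set and that the model outputs 0 or 1.

```python
import random

def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle labeled (graph, label) pairs and split them into train/test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def describe(label: int) -> str:
    """Interpret the model output: 1 = vulnerable, 0 = not vulnerable."""
    return "vulnerability present" if label == 1 else "no vulnerability"

# Hypothetical placeholders standing in for labeled key code graphs.
samples = [(f"graph_{i}", i % 2) for i in range(10)]
train, test = split_dataset(samples)
print(len(train), len(test))  # 8 2
print(describe(1))            # vulnerability present
```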
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples merely represent a few embodiments of the present invention, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of the invention should be assessed as that of the appended claims.

Claims (10)

1. The intelligent contract vulnerability detection method based on the graph neural network is characterized by comprising the following steps of:
step 1: source code preprocessing: normalizing the source code of the intelligent contract, extracting a control flow graph from the normalized source code of the intelligent contract, marking the nodes containing sensitive operations in the control flow graph according to a preset matching rule base, creating a program dependency graph from the marked control flow graph, and extracting a key code graph from the program dependency graph, wherein the matching rule base is formed from the sensitive operations that cause intelligent contract vulnerabilities;
step 2: constructing a detection model based on a graph neural network, wherein the detection model comprises a symbolization module, a semantic feature extraction module and a prediction module, the symbolization module is used for symbolizing the variable names and function names in the key code graph, the semantic feature extraction module is used for extracting node semantic features from the symbolized key code graph, and the prediction module is constructed based on the graph neural network and is used for outputting a prediction result of intelligent contract vulnerability detection according to the node semantic features;
step 3: training, testing and application of the detection model: training the detection model, and taking the detection model passing the test to perform intelligent contract vulnerability detection on the key code diagram corresponding to the source code to be detected.
2. The intelligent contract vulnerability detection method based on graph neural network of claim 1, wherein the sensitive operation is a specific function.
3. The intelligent contract vulnerability detection method based on graph neural network as set forth in claim 1, wherein the normalization processing comprises: formatting the source code of the intelligent contract to be detected with the ethline tool according to the Solidity Style Guide specification, then completing the default keywords omitted from the source code of the intelligent contract during programming, and then converting different code expressing the same semantics into the same form of expression.
4. The intelligent contract vulnerability detection method based on graph neural network as set forth in claim 1, wherein the control flow graph is extracted from an abstract syntax tree reflecting the syntactic relationships of the source code, and the abstract syntax tree is obtained by converting the normalized source code of the intelligent contract.
5. The intelligent contract vulnerability detection method based on graph neural network of claim 1, wherein the program dependency graph is generated by combining a control dependency graph and a data dependency graph generated by the control flow graph, nodes of the control dependency graph, the data dependency graph and the program dependency graph are identical, and edges of the program dependency graph are a union of edges of the control dependency graph and edges of the data dependency graph.
6. The intelligent contract vulnerability detection method based on graph neural network as set forth in claim 5, wherein the control dependency graph is generated by: adding an exit and an entry to the control flow graph, wherein the exit is the last part of the source code of the intelligent contract and the entry is the first part of the source code of the intelligent contract; adding an edge between the exit and the entry; reversing each edge of the control flow graph to obtain a reverse control flow graph; computing the dominators of each node on the reverse control flow graph; obtaining the forward dominance frontier of each node from the dominators and the reverse control flow graph; and obtaining the control dependency graph from the forward dominance frontiers;
the data dependency graph is generated in the following way: selecting nodes which define variables in the control flow graph as main nodes, traversing the main nodes, screening nodes which directly use the values of the variables defined by the main nodes as secondary nodes, forming an edge between the main nodes and the corresponding secondary nodes, and obtaining the data dependency graph after traversing.
7. The intelligent contract vulnerability detection method based on graph neural network as set forth in claim 5, wherein the key code graph consists of all nodes on the program dependency graph that have control dependency and data dependency relationships with the nodes containing sensitive operations.
8. The intelligent contract vulnerability detection method based on graph neural network of claim 1, wherein the symbolizing processing of variable names and function names in the key code graph comprises: and mapping the user-defined variable names and function names in the nodes in the key code graph into respective unified forms.
9. The intelligent contract vulnerability detection method based on graph neural network as set forth in claim 1, wherein the extracting node semantic features from the symbolized key code graph includes: splitting the nodes in the key code graph into basic units with minimal semantics, and extracting semantic information from the basic units in each node through a Doc2Vec model or a Word2Vec model to obtain the node semantic features.
10. The intelligent contract vulnerability detection method based on graph neural network of claim 1, wherein the prediction module comprises an input layer, a graph convolution layer, a pooling layer, a readout layer and a multi-layer perceptron layer; the input of the input layer is the node semantic features and an adjacency matrix obtained from the key code graph; the graph convolution layer is combined with a self-attention mechanism to adaptively learn the importance of the nodes in the key code graph, and its output is connected to the pooling layer through a skip connection; the pooling layer adopts a SagPool pooling mechanism; the readout layer adopts mean pooling; and the multi-layer perceptron layer is formed by combining an activation layer, a fully connected layer, a Dropout layer and a SoftMax layer.
CN202310462926.1A 2023-04-26 2023-04-26 Intelligent contract vulnerability detection method based on graph neural network Pending CN116383832A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310462926.1A CN116383832A (en) 2023-04-26 2023-04-26 Intelligent contract vulnerability detection method based on graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310462926.1A CN116383832A (en) 2023-04-26 2023-04-26 Intelligent contract vulnerability detection method based on graph neural network

Publications (1)

Publication Number Publication Date
CN116383832A true CN116383832A (en) 2023-07-04

Family

ID=86961753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310462926.1A Pending CN116383832A (en) 2023-04-26 2023-04-26 Intelligent contract vulnerability detection method based on graph neural network

Country Status (1)

Country Link
CN (1) CN116383832A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556425A (en) * 2023-11-24 2024-02-13 烟台大学 Intelligent contract vulnerability detection method, system and equipment based on graph neural network
CN117556425B (en) * 2023-11-24 2024-04-23 烟台大学 Intelligent contract vulnerability detection method, system and equipment based on graph neural network

Similar Documents

Publication Publication Date Title
CN111639344B (en) Vulnerability detection method and device based on neural network
CN108446540B (en) Program code plagiarism type detection method and system based on source code multi-label graph neural network
CN111125716B (en) Method and device for detecting Ethernet intelligent contract vulnerability
CN102339252B (en) Static state detecting system based on XML (Extensive Makeup Language) middle model and defect mode matching
CN107229563A (en) A kind of binary program leak function correlating method across framework
Al-Obeidallah et al. A survey on design pattern detection approaches
CN106293891B (en) Multidimensional investment index monitoring method
CN113326187B (en) Data-driven memory leakage intelligent detection method and system
CN112733156A (en) Intelligent software vulnerability detection method, system and medium based on code attribute graph
CN115309451A (en) Code clone detection method, device, equipment, storage medium and program product
CN115146279A (en) Program vulnerability detection method, terminal device and storage medium
Zeng et al. EtherGIS: a vulnerability detection framework for ethereum smart contracts based on graph learning features
CN114547611A (en) Intelligent contract Pompe fraudster detection method and system based on multi-modal characteristics
CN116383832A (en) Intelligent contract vulnerability detection method based on graph neural network
CN117215935A (en) Software defect prediction method based on multidimensional code joint graph representation
CN115455429A (en) Vulnerability analysis method and system based on big data
CN116150757A (en) Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model
CN116578980A (en) Code analysis method and device based on neural network and electronic equipment
Zhou et al. Grapheye: A novel solution for detecting vulnerable functions based on graph attention network
CN115758388A (en) Vulnerability detection method of intelligent contract based on low-dimensional byte code characteristics
Xia et al. Source Code Vulnerability Detection Based On SAR-GIN
CN116628695A (en) Vulnerability discovery method and device based on multitask learning
Tang et al. An attention-based automatic vulnerability detection approach with GGNN
Ouyang et al. Binary vulnerability mining based on long short-term memory network
Yang et al. A function level Java code clone detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination