CN117272312A

CN117272312A - Interpretable smart contract vulnerability detection and localization method based on reinforcement learning

Info

Publication number: CN117272312A
Application number: CN202311113916.3A
Authority: CN
Inventors: 江池; 张引; 山长义; 施曼华
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2023-08-31
Filing date: 2023-08-31
Publication date: 2023-12-22

Abstract

The invention provides an interpretable smart contract vulnerability detection and localization method based on reinforcement learning. It is applied to blockchain supervision and addresses the problem that smart contracts deployed on blockchains may contain security vulnerabilities. The method first generates a contract graph from the smart contract source code, mines the features of the contract graph with a graph neural network model to classify vulnerabilities, and then uses reinforcement learning to extract the subgraphs with the greatest influence on the classification result. Deep learning is used to mine the features of the source code graph and to detect whether the contract source code contains security vulnerabilities before the smart contract is deployed on the blockchain, which safeguards the security of the smart contract. Reinforcement learning is used to extract the key subgraphs responsible for a vulnerability, so that the code related to the vulnerability is located and an interpretable localization of risky code segments is provided for smart contract vulnerability detection.

Description

Interpretable smart contract vulnerability detection and localization method based on reinforcement learning

Technical Field

The invention belongs to the field of blockchains and in particular relates to an interpretable smart contract vulnerability detection and localization method based on reinforcement learning.

Background

As one of the core innovations of blockchain technology, smart contracts have developed significantly across many fields. This automatically executed code not only ensures the reliability and transparency of transactions but also opens a wide range of possibilities for decentralized applications. However, as the application scope of smart contracts expands, their potential security hazards have become increasingly prominent, which makes smart contract vulnerability detection an important field. In the past few years, smart contracts have reached into finance, supply chain management, digital collectibles, and other areas. In finance, decentralized finance (DeFi) is an attractive direction for smart contract applications. DeFi implements decentralized financial services such as lending and trading through smart contracts, eliminates the cumbersome intermediaries of the traditional financial system, and provides a more efficient and open financial ecosystem. Smart contracts have also reshaped the field of digital collectibles: they make ownership and trading of digital collectibles easier and safer, which promotes innovation in areas such as digital artworks and virtual land. While the development of smart contracts brings many opportunities, it also faces challenges. The complexity of smart contracts makes them susceptible to vulnerabilities and errors that can lead to financial loss and security incidents. The security and reliability of smart contracts is therefore a problem that needs to be solved. Smart contract vulnerability detection techniques have emerged in response: by comprehensively examining and analyzing smart contract code, potential vulnerabilities can be discovered early and possible risks avoided. As blockchain technology continues to evolve, smart contracts will continue to attract attention and drive more innovative applications in different fields.

In the prior art there have been many studies on smart contract vulnerability detection on blockchains. Huang, Jianjun, et al., "Precise Dynamic Symbolic Execution for Nonuniform Data Access in Smart Contracts," IEEE Transactions on Computers 71.7 (2021): 1551-1563, consider the multiple addressing models in smart contracts; to handle nonuniform data access, they propose a dynamic symbolic execution (DSE) method and use it to implement an integer overflow vulnerability detector and a multi-transaction vulnerability detector. Li, Bixin, Zhenyu Pan, and Tianyuan Hu, "ReDefender: Detecting Reentrancy Vulnerabilities in Smart Contracts Automatically," IEEE Transactions on Reliability 71.2 (2022): 984-999, present ReDefender, a fuzzing-based smart contract vulnerability detection tool that executes a smart contract on generated fuzzing inputs and analyzes its execution log to determine whether the contract contains a vulnerability. Wu, Hongjun, et al., "Peculiar: Smart Contract Vulnerability Detection Based on Crucial Data Flow Graph and Pre-training Techniques," 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), IEEE, 2021, propose a deep learning method that detects vulnerabilities by extracting features of the crucial data flow graph of a smart contract using pre-training techniques from natural language processing. However, these schemes have the following problems:

(1) Lack of multi-class vulnerability detection results. Most existing work produces a binary result: it can judge whether a smart contract contains a vulnerability but cannot identify the specific vulnerability type.

(2) Lack of fine-grained vulnerability localization results. Most existing work only produces a detection result and cannot locate the code segment that contains the specific vulnerability.

(3) Poor interpretability of detection results. In existing deep-learning-based methods the model is mostly a black box, so it is difficult to provide an interpretable vulnerability detection result.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a smart contract vulnerability detection and localization method that safeguards the security of smart contract code and the operational reliability of the blockchain.

The technical solution adopted by the invention is an interpretable smart contract vulnerability detection and localization method based on reinforcement learning. The method mines contract graph features with a graph neural network model to classify vulnerabilities and uses reinforcement learning to extract the subgraphs with the greatest influence on the classification result. It thereby produces multi-class vulnerability detection results for smart contracts, locates the function code segments that contain specific vulnerabilities, and safeguards the code security of smart contracts and the operational reliability of the blockchain.

The method specifically comprises the following steps:

(1) First, a smart contract dataset is collected from Ethereum; the dataset comprises smart contract source code and the corresponding vulnerability types. The smart contract source code is then converted into a contract graph, in which nodes are variables in the source code and edges are call relationships in the source code. From the contract graphs and their vulnerability types a dataset $D = \{(G_i, y_i)\}_{i=1}^{N}$ is obtained, where $y_i$, $i \in \{1, 2, \ldots, N\}$, denotes the vulnerability type of the $i$-th contract graph $G_i$, with $y_i \in \{0, 1, 2, \ldots, C\}$; $y_i = 0$ indicates that contract graph $G_i$ has no vulnerability, and any other value identifies the vulnerability type present in $G_i$; $C$ is the total number of vulnerability types and $N$ is the number of contract graphs in the dataset;

(2) The contract graph is partitioned into subgraphs according to the code segment of each function; the code segment of each function corresponds to one subgraph, giving the subgraph set $\{g_1, g_2, \ldots, g_j, \ldots, g_M\}$ of each contract graph, where $M$ is the number of subgraphs. Each subgraph is input into a graph neural network, which first obtains the node feature representation of the subgraph; the node feature representation of the $j$-th subgraph $g_j$ is denoted $H(g_j)$, $j \in \{1, 2, \ldots, M\}$. The node feature representations $H(g_j)$ are then fused by weighting to obtain the subgraph feature representation $r_j$. Finally, the subgraph feature representations $r_j$ are fused by weighting to obtain the feature representation $c_i$ of the contract graph, $c_i = \sum_{g_j \in Sub(G_i)} W_{subgraph}^{j}\, r_j$, where $Sub(G_i)$ is the set of subgraphs of $G_i$ and $W_{subgraph}^{j}$ is the $j$-th row of the subgraph weight matrix $W_{subgraph}$, which is a trainable parameter;

(3) The vulnerability multi-classification model of the graph neural network passes the contract graph feature representation $c_i$ through a fully connected layer to perform multi-class vulnerability detection and outputs the multi-class vulnerability detection result of the contract graph;

(4) The subgraph weight matrix is obtained from the trained graph neural network, and the importance $im_j$ of each subgraph is defined from it. The subgraphs are sorted from high to low by importance, that is, by their weight coefficients for the vulnerability detection result of the contract graph, and the top $m$ subgraphs are selected; the functions corresponding to these $m$ subgraphs are the code segments that may contain vulnerabilities;

wherein, during training of the graph neural network, the subgraph weight matrix $W_{subgraph}$ is continuously updated, finally yielding the importance score of each subgraph for the vulnerability detection result. The model parameters of the vulnerability multi-classification model are then fixed, and the top $m = \lceil k \cdot M \rceil$ subgraphs are selected by reinforcement learning and merged for vulnerability detection, where $\lceil \cdot \rceil$ denotes rounding up and $k$ is a trainable ratio in the range $(0, 1]$. The reinforcement learning iteration stops when the vulnerability detection accuracy of the top $m+1$ subgraphs differs from that of the top $m$ subgraphs by less than a preset iteration threshold.
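The function-level partition in step (2) can be illustrated with a minimal Python sketch. It assumes each node carries a hypothetical label naming the function it belongs to, and it assumes that edges crossing function boundaries are left out of the subgraphs; the source does not say how such edges are handled, so both choices are illustrative only.

```python
from collections import defaultdict

def partition_by_function(nodes, edges):
    """Split a contract graph into one subgraph per function.

    nodes: dict mapping node name -> function name (hypothetical labelling).
    edges: iterable of (src, dst) node-name pairs.
    Returns a dict: function name -> {"nodes": [...], "edges": [...]}.
    """
    subgraphs = defaultdict(lambda: {"nodes": [], "edges": []})
    for name, func in nodes.items():
        subgraphs[func]["nodes"].append(name)
    for src, dst in edges:
        # Assumption: keep only edges whose endpoints share a function.
        if nodes[src] == nodes[dst]:
            subgraphs[nodes[src]]["edges"].append((src, dst))
    return dict(subgraphs)

# Toy contract with two functions; all names are made up for illustration.
nodes = {"balance": "withdraw", "amount": "withdraw", "owner": "setOwner"}
edges = [("amount", "balance"), ("owner", "balance")]
subgraphs = partition_by_function(nodes, edges)
M = len(subgraphs)  # number of subgraphs
```

Here `M` plays the role of the number of subgraphs, and each dictionary entry corresponds to one $g_j$.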

The invention discloses an interpretable smart contract vulnerability detection and localization method based on reinforcement learning, applied to blockchain supervision to address security vulnerabilities in smart contracts deployed on blockchains. The method first generates a contract graph from the smart contract source code, mines contract graph features with a graph neural network model to classify vulnerabilities, and then uses reinforcement learning to extract the subgraphs with the greatest influence on the classification result. Deep learning is used to mine the features of the source code graph and to detect whether the contract source code contains security vulnerabilities before the smart contract is deployed on the blockchain, which safeguards the security of the smart contract. Reinforcement learning is used to extract the key subgraphs responsible for a vulnerability, so that the code related to the vulnerability is located and an interpretable localization of risky code segments is provided for smart contract vulnerability detection.

The beneficial effects of the invention are as follows:

(1) Multi-class detection. The vulnerability detection and localization model provided by the invention outputs multi-class vulnerability detection results: it detects multiple vulnerability types at the same time and outputs the corresponding results, which improves vulnerability detection efficiency.

(2) Fine-grained localization. The model outputs the top $m$ subgraphs with the greatest influence on vulnerability detection. The function segments corresponding to these subgraphs are the code segments most likely to contain vulnerabilities, which gives fine-grained vulnerability localization.

(3) Interpretable results. In the reinforcement-learning-based vulnerability detection and localization model, the choices of the reinforcement learning agent are iterated continuously under a defined reward, which yields interpretable vulnerability localization and provides evidence for the vulnerability detection result.

Drawings

Fig. 1 is a schematic view of a scenario provided by an embodiment of the present invention.

Fig. 2 is a flow chart of an embodiment.

Detailed Description

The following describes embodiments of the invention with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted below where they might obscure the main content of the invention.

As shown in Fig. 1, the blockchain scenario of the embodiment is as follows. Security is always a key concern in smart-contract-based blockchain applications. After the contract code is designed and written, the development team performs vulnerability detection and repair to catch and fix potential vulnerabilities and ensure the reliability and stability of the contract. Only after strict vulnerability detection and confirmation is the contract deployed to the blockchain network, which protects users' funds and transaction data. This security-first development approach helps establish user trust and promotes the sustainable application of blockchain technology in critical fields.

The invention considers the following factors that affect the deployment efficiency of smart contracts:

First, smart contract code can contain many types of vulnerabilities, including but not limited to reentrancy attacks, integer overflows, and access control issues. A model capable of multi-class vulnerability detection is therefore needed to quickly identify the various potential vulnerability patterns in a contract.

Second, smart contracts often have quite complex code structures, and because contract code can be long, developers may spend a significant amount of time locating the problem once a vulnerability is found. To help developers repair vulnerabilities faster, accurate vulnerability localization results are needed.

Finally, the developer must be able to understand and trust the results a vulnerability detection model produces. Simply reporting that a contract contains a vulnerability is not sufficient: the developer needs to know why the model assigned that vulnerability class and how it was detected. The vulnerability detection model therefore needs to provide interpretable results.

Based on these points, the invention designs an interpretable smart contract vulnerability detection and localization method based on reinforcement learning, applied to blockchain supervision to address security vulnerabilities in smart contracts deployed on blockchains. The method first generates a contract graph from the smart contract source code, mines contract graph features with a graph neural network model to classify vulnerabilities, and then uses reinforcement learning to extract the subgraphs with the greatest influence on the classification result. Deep learning is used to mine the features of the source code graph and to detect whether the contract source code contains security vulnerabilities before the smart contract is deployed on the blockchain, which safeguards the security of the smart contract. Reinforcement learning is used to extract the key subgraphs responsible for a vulnerability, so that the code related to the vulnerability is located and an interpretable localization of risky code segments is provided for smart contract vulnerability detection.

The interpretable smart contract vulnerability detection and localization method based on reinforcement learning is described in detail below with reference to Fig. 2. It comprises the following steps:

(1) First, a smart contract dataset is collected from Ethereum; it comprises smart contract source code and the vulnerability type of each contract. The source code is then converted into a contract graph, in which nodes are variables in the source code and edges are call relationships in the source code. Each contract graph is denoted $G = (V, X, A)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the node set of the contract graph, $X$ is the $n \times d$ node feature matrix, and $A$ is the $n \times n$ adjacency matrix. The initial node features are obtained with the word2vec algorithm; $n$ is the number of nodes and $d$ is the dimension of the node features. This yields a dataset $D = \{(G_i, y_i)\}_{i=1}^{N}$, where $y_i$, $i \in \{1, 2, \ldots, N\}$, denotes the vulnerability type of contract graph $G_i$, with $y_i \in \{0, 1, 2, \ldots, C\}$; $y_i = 0$ indicates that contract graph $G_i$ has no vulnerability, and any other value identifies the vulnerability type present in $G_i$; $C$ is the number of vulnerability types and $N$ is the number of samples in the dataset.
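As an illustration of the representation $G = (V, X, A)$, the following Python sketch builds the three components for a toy contract. The helper `toy_embed` is a deterministic placeholder and is not word2vec; treating call edges as directed is also an assumption, since the source does not say whether the adjacency matrix is symmetric. All variable names are hypothetical.

```python
def build_contract_graph(variables, calls, embed):
    """Return (V, X, A) for one contract.

    variables: list of variable names (the nodes).
    calls: list of (src, dst) name pairs (the edges, assumed directed).
    embed: function mapping a name to a list of d floats.
    """
    index = {name: i for i, name in enumerate(variables)}
    n = len(variables)
    X = [embed(name) for name in variables]   # n x d node feature matrix
    A = [[0] * n for _ in range(n)]           # n x n adjacency matrix
    for src, dst in calls:
        A[index[src]][index[dst]] = 1
    return variables, X, A

def toy_embed(name, d=4):
    # Placeholder embedding, NOT word2vec: values derived from character codes.
    total = sum(ord(ch) for ch in name)
    return [((total * (k + 1)) % 17) / 17.0 for k in range(d)]

V, X, A = build_contract_graph(
    ["balance", "amount", "msg_sender"],
    [("amount", "balance"), ("msg_sender", "balance")],
    toy_embed,
)
# One dataset entry (G_i, y_i); label 1 stands for a hypothetical vulnerability type.
dataset = [((V, X, A), 1)]
```

In a real pipeline the embedding function would be replaced by a trained word2vec model, and the label would come from the collected dataset.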

(2) Sub-graph partitioning

The contract graph is partitioned into subgraphs according to the code segment contained in each function. Each function corresponds to one subgraph, which contains the variables and call relationships inside that function. This gives the subgraph set $\{g_1, g_2, \ldots, g_j, \ldots, g_M\}$ of each contract graph, where $M$ is the number of subgraphs. The number of nodes in each subgraph is limited to $s$. The node feature matrix of each subgraph is obtained with the graph neural network and denoted $H(g_j)$, given by:

$$H(g_j) = f(g_j) = \{h_j \mid v_j \in V(g_j)\}$$

where $v_j$, $j \in \{1, 2, \ldots, s\}$, denotes each node of subgraph $g_j$ (the index of $v_j$ and $h_j$ runs over the nodes within the subgraph), $h_j$ is the hidden vector of the corresponding node, and $f(\cdot)$ is the message propagation mechanism of the graph neural network:

$$h_j^{(l+1)} = U\left(h_j^{(l)},\ AGG\left(\{h_{j'}^{(l)} \mid v_{j'} \text{ adjacent to } v_j\}\right)\right)$$

where $U(\cdot)$ is the state update function of the graph neural network, $AGG(\cdot)$ is its message aggregation function, $h_{j'}$ is the hidden vector of a node $v_{j'}$ adjacent to $v_j$, the superscript $(l)$ denotes the current aggregation state, and the superscript $(l+1)$ denotes the next aggregation state.
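One round of the message propagation described above can be sketched in plain Python. The source does not specify $U(\cdot)$ or $AGG(\cdot)$, so this sketch assumes an element-wise mean for aggregation and a simple average of the node's own state and the aggregated message for the update; both are illustrative stand-ins, not the patent's functions.

```python
def message_passing_step(H, A, update):
    """One propagation round: h_j(l+1) = U(h_j(l), AGG(neighbour vectors)).

    H: list of n hidden vectors (each a list of d floats).
    A: n x n adjacency matrix; A[j][t] != 0 means v_t is adjacent to v_j.
    update: the state update function U.
    AGG is an element-wise mean here (assumption).
    """
    n, d = len(H), len(H[0])
    H_next = []
    for j in range(n):
        neigh = [H[t] for t in range(n) if A[j][t]]
        if neigh:
            agg = [sum(vec[k] for vec in neigh) / len(neigh) for k in range(d)]
        else:
            agg = [0.0] * d  # isolated node: empty message
        H_next.append(update(H[j], agg))
    return H_next

def average_update(h, agg):
    # Assumed U: average of the node's own state and the aggregated message.
    return [(a + b) / 2 for a, b in zip(h, agg)]

H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
H1 = message_passing_step(H0, A, average_update)
# H1 == [[0.75, 0.5], [0.5, 0.5], [1.0, 0.5]]
```

A trained graph neural network would replace both functions with learned layers; the loop structure over nodes and neighbours stays the same.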

(3) Constructing sub-graph feature representations

After the node feature representation $H(g_j)$ of the subgraph is obtained in step (2), the subgraph feature representation $r_j$ is obtained by weighted fusion. With the node weight matrix defined as $W_{node}$, the subgraph feature representation is computed as:

$$r_j = W_{node}^{T} H(g_j)$$

where the node weight matrix $W_{node}$ is a trainable parameter and $^{T}$ denotes transposition.
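The weighted fusion of node features into a subgraph feature can be sketched as follows. The sketch assumes the fusion has the form $r_j = W_{node}^{T} H(g_j)$ with $W_{node}$ holding one scalar weight per node (shape $s \times 1$) and $H(g_j)$ of shape $s \times d$; the exact shape of $W_{node}$ is not given in the text, so this is one plausible reading rather than the definitive one.

```python
def fuse_nodes(H, w_node):
    """Weighted sum of the s node vectors in H (s x d) using s scalar weights.

    Implements r = w_node^T H under the assumption that w_node is s x 1.
    """
    if len(H) != len(w_node):
        raise ValueError("one weight per node is required")
    d = len(H[0])
    return [sum(w * h[k] for w, h in zip(w_node, H)) for k in range(d)]

# Toy subgraph with s = 3 nodes and d = 2 features per node.
H = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w_node = [0.5, 0.25, 0.25]   # would be learned during training
r = fuse_nodes(H, w_node)
# r == [2.5, 3.5]
```

Nodes with larger weights contribute more to $r_j$, which is what makes the fusion "weighted" rather than a plain mean.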

(4) Building contract graph feature representations

Similarly, after the feature representation $r_j$ of each subgraph in the contract graph is obtained in step (3), the feature representation $c_i$ of the contract graph is again computed by weighted fusion. With the weight matrix defined as $W_{subgraph}$, the contract graph feature representation is computed as:

$$c_i = \sum_{g_j \in Sub(G_i)} W_{subgraph}^{j}\, r_j$$

where $Sub(G_i)$ is the set of all subgraphs of the $i$-th contract graph, the subgraph weight matrix $W_{subgraph}$ is a trainable parameter, and $W_{subgraph}^{j}$ denotes the $j$-th row of the matrix.
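A small worked example may help fix the shapes. It assumes the contract graph feature is the weighted sum $c_i = \sum_j W_{subgraph}^{j} r_j$, that $W_{subgraph}$ has a single scalar per row (shape $M \times 1$), and that $M = 2$; the numbers are made up for illustration.

```latex
% Assumption: one scalar weight per subgraph, M = 2, feature dimension d = 2.
\[
W_{subgraph} = \begin{bmatrix} 0.8 \\ 0.2 \end{bmatrix}, \qquad
r_1 = (2.5,\ 3.5), \qquad r_2 = (1.0,\ 0.0)
\]
\[
c_i = \sum_{g_j \in Sub(G_i)} W_{subgraph}^{j}\, r_j
    = 0.8\,(2.5,\ 3.5) + 0.2\,(1.0,\ 0.0)
    = (2.2,\ 2.8)
\]
```

Under this reading the first subgraph dominates $c_i$, which is the same signal later used to rank subgraphs by importance.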

(5) Contract vulnerability multi-class detection

After the feature representation of the contract graph is obtained in step (4), multi-class vulnerability detection of the contract graph is performed through the fully connected layer of the neural network. The cross-entropy loss function of the vulnerability multi-classification model is defined as:

$$L_{CE} = -\sum_{i=1}^{N} \sum_{c} y_{i,c} \log y'_{i,c}$$

where the inner sum runs over the vulnerability classes, $C$ is the number of vulnerability classes, $y$ is the true vulnerability class of a sample, and $y'$ is the classification result output by the model; $y_{i,c}$ is the true result of the $i$-th contract graph for the $c$-th vulnerability class, and $y'_{i,c}$ is the probability, output by the graph neural network, that the $i$-th contract graph has a vulnerability of the $c$-th class, that is, the multi-class smart contract vulnerability detection result.
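The classification loss can be illustrated with a short Python sketch. It assumes the fully connected layer produces one logit per class and that a softmax turns logits into the probabilities $y'_{i,c}$; the softmax is an assumption, since the source only names the fully connected layer. Class 0 stands for "no vulnerability", and the logits are made up.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(labels, probs):
    """L_CE = -sum_i sum_c y_{i,c} * log(y'_{i,c}).

    labels holds the true class index of each sample, so the one-hot
    inner sum reduces to the log-probability of the true class.
    """
    return -sum(math.log(p[y]) for y, p in zip(labels, probs))

# Two contract graphs, three classes (0 = no vulnerability, 1 and 2 = types).
logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
probs = [softmax(z) for z in logits]
labels = [0, 2]
loss = cross_entropy(labels, probs)
predicted = [max(range(len(p)), key=p.__getitem__) for p in probs]
```

`predicted` holds the detected class per contract graph, while `loss` is the quantity minimized during training.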

(6) Subgraph selection

To determine the top $m$ subgraphs with the greatest influence on the vulnerability classification result and to locate the vulnerable function segments, the top-$m$ subgraphs are selected with a reinforcement learning method. First, the importance $\{im_j \mid g_j \in G\}$ of each subgraph is defined from the weight matrix of each subgraph.

The subgraphs are then sorted from high to low by their weight coefficients for the contract graph's vulnerability detection result, and the $m$ most important subgraphs are selected with the reinforcement learning method; the functions corresponding to these $m$ subgraphs are the code segments most likely to contain vulnerabilities.
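The ranking and top-$m$ selection can be sketched as follows. The exact definition of $im_j$ is not reproduced in this text, so the sketch assumes one scalar weight per subgraph and defines importance as the normalized absolute weight; that definition, and the function names used as keys, are hypothetical.

```python
def rank_subgraphs(weights):
    """Sort subgraphs by importance, highest first.

    weights: dict mapping function name -> learned scalar subgraph weight.
    Importance is |w| / sum(|w|), an assumed stand-in for im_j.
    """
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        raise ValueError("all subgraph weights are zero")
    importance = {name: abs(w) / total for name, w in weights.items()}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

def top_m(weights, m):
    """Names of the m most important subgraphs (candidate vulnerable functions)."""
    return [name for name, _ in rank_subgraphs(weights)[:m]]

weights = {"withdraw": 1.5, "deposit": 0.3, "setOwner": -0.9, "getBalance": 0.3}
suspects = top_m(weights, 2)
# suspects == ["withdraw", "setOwner"]
```

The returned function names are the localization result: the code segments a developer would inspect first.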

(7) Reinforcement learning process

During training of the vulnerability multi-classification model on the contract graphs, the weight matrix $W_{subgraph}$ is continuously updated, finally yielding the importance score of each subgraph for the vulnerability detection result. The model parameters of the vulnerability multi-classification model are then fixed, and the top $m = \lceil k \cdot M \rceil$ subgraphs are selected by reinforcement learning and merged for vulnerability detection. The reinforcement learning iteration stops when the vulnerability detection accuracy of the current $m+1$ subgraphs no longer changes appreciably relative to that of the previous $m$ subgraphs. At that point the $m$ subgraphs are taken to be the code segments with the greatest influence on the vulnerability result, which locates the vulnerable function code segments of the contract graph. The invention uses reinforcement learning to find the optimal value of the ratio $k$ and models the problem as a Markov decision process: a loop in which an agent takes an action, changes its state, obtains a reward, and continues to interact with the environment.

State: The state $s_e$ of each round $e$ is defined as performing vulnerability detection on the concatenation of the top $\lceil k \cdot M \rceil$ subgraphs of the smart contract, where the $M$ subgraphs are sorted by importance $im_j$ and $\lceil \cdot \rceil$ denotes rounding up.

Action: In each round $e$ the reinforcement learning agent takes an action $a_e$ according to the reward. $a_e$ is defined as adding an adjustment value $\Delta k \in [0, 1]$ to, or subtracting it from, the current value of $k$.

Reward: A reward function $reward(s_e, a_e)$ is defined for each action $a_e$ from the accuracy of vulnerability detection. Depending on how the accuracy $accuracy_e$ of the current state $s_e$ compares with the accuracy $accuracy_{e-1}$ of the previous state $s_{e-1}$, the reward function takes one of three values: $+1$ if $accuracy_e > accuracy_{e-1}$; $0$ if $accuracy_e = accuracy_{e-1}$; and $-1$ if $accuracy_e < accuracy_{e-1}$.

Termination: The reinforcement learning algorithm terminates when the value of $m$ remains unchanged for 5 consecutive rounds, which indicates that reinforcement learning has reached the optimal threshold.
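The state size, reward, and termination rule above can be written down directly. The following Python sketch assumes that "unchanged for 5 consecutive rounds" means the last five recorded values of $m$ are identical; the function names are hypothetical.

```python
import math

def num_selected(k, M):
    """m = ceil(k * M): number of top subgraphs kept for a ratio k in (0, 1]."""
    if not 0 < k <= 1:
        raise ValueError("k must lie in (0, 1]")
    return math.ceil(k * M)

def reward(acc_now, acc_prev):
    """+1 if accuracy improved, 0 if unchanged, -1 if it dropped."""
    if acc_now > acc_prev:
        return 1
    if acc_now < acc_prev:
        return -1
    return 0

def should_stop(m_history, patience=5):
    """True when m has stayed the same over the last `patience` rounds."""
    if len(m_history) < patience:
        return False
    return len(set(m_history[-patience:])) == 1

m = num_selected(0.25, 10)                 # ceil(2.5) = 3
stop = should_stop([2, 3, 3, 3, 3, 3])     # last five values are all 3
```

Because $m$ is obtained by rounding up, small changes in $k$ often leave $m$ unchanged, which is what allows the termination rule to trigger.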

The Markov decision process is learned with the Q-learning method. Q-learning is an off-policy reinforcement learning algorithm that seeks the best action for the current state and satisfies the Bellman optimality equation. The method uses an $\epsilon$-greedy exploration strategy: with probability $\epsilon$ the reinforcement learning agent selects a random action to explore new states, and otherwise it selects the action with the maximum expected future reward.
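A minimal tabular Q-learning sketch with $\epsilon$-greedy exploration is shown below. It assumes that $k$ is discretized onto a small grid so that each state is a grid index, that the two actions move one grid step down or up, and that `toy_accuracy` is a hypothetical stand-in for running the fixed classifier on the selected subgraphs. The hyperparameters are illustrative, not values from the source.

```python
import random

ACTIONS = (-1, +1)  # subtract or add one grid step of delta_k

def choose_action(Q, s, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])

def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def train(accuracy_of_state, n_states, episodes=50, horizon=20,
          alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over a discretised k grid (state index <-> k value)."""
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            a = choose_action(Q, s, epsilon, rng)
            s_next = min(max(s + ACTIONS[a], 0), n_states - 1)
            diff = accuracy_of_state(s_next) - accuracy_of_state(s)
            r = (diff > 0) - (diff < 0)  # +1, 0 or -1, as in the reward rule
            q_update(Q, s, a, r, s_next, alpha, gamma)
            s = s_next
    return Q

def toy_accuracy(s):
    # Hypothetical accuracy curve peaking at state 2, for illustration only.
    return 1.0 - 0.1 * abs(s - 2)

Q = train(toy_accuracy, n_states=6)
```

The update in `q_update` is the standard off-policy rule: it bootstraps from the best action in the next state regardless of which action the $\epsilon$-greedy policy actually takes there.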

Although illustrative embodiments of the invention are described above to help those skilled in the art understand the invention, the invention is not limited to the scope of these embodiments; it is to be understood as covering all changes that fall within the spirit and scope of the invention as defined by the appended claims.

Claims

1. An interpretable smart contract vulnerability detection and localization method based on reinforcement learning, characterized by comprising the following steps:

(1) First, a smart contract dataset is collected from Ethereum; the dataset comprises smart contract source code and the corresponding vulnerability types. The smart contract source code is then converted into a contract graph, in which nodes are variables in the source code and edges are call relationships in the source code. From the contract graphs and their vulnerability types a dataset $D = \{(G_i, y_i)\}_{i=1}^{N}$ is obtained, where $y_i$, $i \in \{1, 2, \ldots, N\}$, denotes the vulnerability type of the $i$-th contract graph $G_i$, with $y_i \in \{0, 1, 2, \ldots, C\}$; $y_i = 0$ indicates that contract graph $G_i$ has no vulnerability, and any other value identifies the vulnerability type present in $G_i$; $C$ is the total number of vulnerability types and $N$ is the number of contract graphs in the dataset;

(2) The contract graph is partitioned into subgraphs according to the code segment of each function; the code segment of each function corresponds to one subgraph, giving the subgraph set $\{g_1, g_2, \ldots, g_j, \ldots, g_M\}$ of each contract graph, where $M$ is the number of subgraphs. Each subgraph is input into a graph neural network, which first obtains the node feature representation of the subgraph; the node feature representation of the $j$-th subgraph $g_j$ is denoted $H(g_j)$, $j \in \{1, 2, \ldots, M\}$. The node feature representations $H(g_j)$ are then fused by weighting to obtain the subgraph feature representation $r_j$. Finally, the subgraph feature representations $r_j$ are fused by weighting to obtain the feature representation $c_i$ of the contract graph, $c_i = \sum_{g_j \in Sub(G_i)} W_{subgraph}^{j}\, r_j$, where $Sub(G_i)$ is the set of subgraphs of $G_i$ and $W_{subgraph}^{j}$ is the $j$-th row of the subgraph weight matrix $W_{subgraph}$, which is a trainable parameter;

wherein, during training of the graph neural network, the subgraph weight matrix $W_{subgraph}$ is continuously updated, finally yielding the importance score of each subgraph for the vulnerability detection result; the model parameters of the vulnerability multi-classification model are then fixed, and the top $m = \lceil k \cdot M \rceil$ subgraphs are selected by reinforcement learning and merged for vulnerability detection, where $\lceil \cdot \rceil$ denotes rounding up and $k$ is a trainable ratio in the range $(0, 1]$; the reinforcement learning iteration stops when the vulnerability detection accuracy of the top $m+1$ subgraphs differs from that of the top $m$ subgraphs by less than a preset iteration threshold.

2. The method of claim 1, wherein each contract graph in step (1) is denoted $G = (V, X, A)$, where $V$ is the node set of the contract graph, $X$ is the node feature matrix, and $A$ is the adjacency matrix; the node features in the contract graph are obtained by encoding the source code corresponding to each node with word2vec.

3. The method of claim 1, wherein the specific method for obtaining the contract graph feature representation in step (2) is:

the graph neural network performs weighted fusion on the node feature representation $H(g_j)$ of the subgraph to obtain the subgraph feature representation $r_j$: $r_j = W_{node}^{T} H(g_j)$, where $W_{node}$ is the node weight matrix in the graph neural network and is a trainable parameter, and $^{T}$ denotes transposition.

4. The method according to claim 3, characterized in that the node feature representation $H(g_j)$ is obtained through the message propagation mechanism $f(\cdot)$ of the graph neural network: $H(g_j) = f(g_j) = \{h_j\}$, where $h_j$ is the hidden vector of each node $v_j$ of subgraph $g_j$; $h_j$ is updated during training as $h_j^{(l+1)} = U\left(h_j^{(l)},\ AGG\left(\{h_{j'}^{(l)} \mid v_{j'} \text{ adjacent to } v_j\}\right)\right)$, where $U(\cdot)$ is the state update function of the graph neural network, $AGG(\cdot)$ is its message aggregation function, $h_{j'}$ is the hidden vector of a node $v_{j'}$ adjacent to $v_j$, the superscript $(l)$ denotes the current aggregation state, and the superscript $(l+1)$ denotes the next aggregation state.

5. The method of claim 1, wherein the cross-entropy loss function $L_{CE}$ used to train the vulnerability multi-classification model in the graph neural network is:

$$L_{CE} = -\sum_{i=1}^{N} \sum_{c} y_{i,c} \log y'_{i,c}$$

where the inner sum runs over the vulnerability classes, $y_{i,c}$ is the true result of the $i$-th contract graph for the $c$-th vulnerability class, and $y'_{i,c}$ is the probability, output by the graph neural network, that the $i$-th contract graph has a vulnerability of the $c$-th class, that is, the multi-class smart contract vulnerability detection result.

6. The method of claim 1, wherein the reinforcement learning for the vulnerability multi-classification model uses a Markov decision process, which is a loop in which rewards are obtained by repeatedly taking actions that change the state;

the state is: the state $s_e$ of each round $e$ is set to performing vulnerability detection on the concatenation of the top $\lceil k \cdot M \rceil$ subgraphs of the smart contract;

the action is: an action $a_e$ is taken in each round $e$ according to the reward; $a_e$ is set to adding an adjustment value $\Delta k$ to, or subtracting it from, the current $k$;

the reward is: a reward function $reward(s_e, a_e)$ is set for each action $a_e$ from the accuracy of vulnerability detection; depending on how the accuracy $accuracy_e$ of the current state $s_e$ compares with the accuracy $accuracy_{e-1}$ of the previous state $s_{e-1}$, the reward function takes one of three values: $+1$ if $accuracy_e > accuracy_{e-1}$; $0$ if $accuracy_e = accuracy_{e-1}$; and $-1$ if $accuracy_e < accuracy_{e-1}$;

the termination condition of the Markov decision loop is: the value of $m$ remains unchanged for 5 consecutive rounds.