CN113626826A

CN113626826A - Intelligent contract security detection method, system, equipment, terminal and application

Info

Publication number: CN113626826A
Application number: CN202110862067.6A
Authority: CN
Inventors: 董学文; 田文生; 沈玉龙; 丛雅倩; 张志为; 佟威; 张涛; 冶英杰; 李光夏
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-07-29
Filing date: 2021-07-29
Publication date: 2021-11-09

Abstract

The invention belongs to the technical field of blockchain security and discloses an intelligent contract security detection method, system, equipment, terminal and application. The intelligent contract security detection method comprises the following steps: training a word2vec model using open source code; encapsulating the open source code into intelligent contract functions according to the intelligent contract syntax; converting the encapsulated functions into abstract syntax trees and extracting data flow and control flow information; converting the data flow and control flow information of the intelligent contract into a graph model; converting the graph nodes into vectors using the trained word2vec model; training the graph model using a graph neural network; reading out all node information and converting the intelligent contract function graph model into a vector; and judging, using a classification model, whether the function vector contains intelligent contract vulnerability information. The invention improves the efficiency of intelligent contract security detection and achieves good detection results.

Description

Intelligent contract security detection method, system, equipment, terminal and application

Technical Field

The invention belongs to the technical field of blockchain security, and particularly relates to an intelligent contract security detection method, system, equipment, terminal and application.

Background

Currently, intelligent contracts (smart contracts) on blockchain platforms mostly involve the transaction and processing of digital assets or cryptocurrency, so vulnerabilities in these contracts can be exploited by malicious attackers at the users' expense. For this reason, more and more researchers have in recent years begun to study intelligent contract security detection methods. However, existing research centers mainly on the Ethereum platform, and there is no targeted method for intelligent contract security detection on consortium chain platforms represented by Hyperledger Fabric. The main reasons are as follows:

(1) the intelligent contracts on the Fabric platform are mainly deployed in an organization and are difficult to obtain;

(2) Ethereum and Fabric have different platform characteristics, so the security detection tools and methods of one platform cannot be used directly on the other;

(3) the intelligent contracts disclosed by the Fabric platform are few, and large-scale analysis and research cannot be carried out.

An intelligent contract is a piece of software running on a blockchain system. As with other software systems, the earlier a problem in an intelligent contract is found, the less manpower and material resources its correction requires. As more and more assets are hosted on the Fabric platform, a problem in an intelligent contract running on that platform will affect an enterprise's business and cause financial loss, so security research on Fabric intelligent contracts is particularly urgent and necessary.

Through the above analysis, the problems and defects of the prior art are as follows:

(1) Existing research centers mainly on the Ethereum platform, and there is no targeted method for intelligent contract security detection on consortium chain platforms represented by Hyperledger Fabric.

(2) The intelligent contracts on the Fabric platform are mainly deployed inside organizations and are difficult to obtain; few Fabric intelligent contracts have been made public, so large-scale analysis and research cannot be carried out.

(3) Ethereum and Fabric have different platform characteristics, so the security detection tools and methods of one platform cannot be used directly on the other.

The difficulty in solving the above problems and defects is:

On the one hand, because little intelligent contract code has been made public for the Fabric platform, the data set available to a graph neural network is insufficient; the lack of training data limits the quality of the machine learning detection model and makes model training more difficult. On the other hand, existing blockchain contract security detection tools mainly target the Ethereum platform and cannot be used directly on the Fabric platform, and there is very little Fabric contract vulnerability detection data available for reference, which further increases the difficulty of implementing a concrete scheme.

The significance of solving the problems and the defects is as follows:

The invention addresses the characteristics of the Hyperledger Fabric platform. It systematically and completely analyzes the security problems and potential risks of intelligent contracts on the Fabric platform, comprehensively evaluates the security of the intelligent contracts, and improves the overall security of the Fabric blockchain system. It thereby helps to prevent enterprises from losing funds through contract security vulnerabilities and reduces the enterprise's risk of financial loss.

Disclosure of Invention

The invention provides an intelligent contract security detection method, a system, equipment, a terminal and application aiming at the problems in the prior art, and particularly relates to an intelligent contract security detection method, a system, equipment, a terminal and application based on a graph neural network.

The invention is realized as follows: an intelligent contract security detection method comprising the following steps:

step one, training a word2vec model using open source Go code from GitHub, so as to initialize the nodes of the program graph and obtain an initial vector for each node in the graph representation of each source code;

step two, encapsulating open source GitHub code into intelligent contract functions according to the intelligent contract syntax, as the main source of the data set;

step three, converting the encapsulated functions into abstract syntax trees, and extracting data flow and control flow information for constructing a relational graph;

step four, converting the data flow and control flow information of the intelligent contract into a graph model, which serves as the input of the graph neural network; the graph model retains and allows reasoning over more control flow and data flow information, so as to capture more structural information about intelligent contract vulnerabilities;

step five, converting the graph nodes into vectors using the trained word2vec model, the purpose being to construct a vector space in which words that are closely related by context in the source code lie close to each other;

step six, training the graph model using a graph neural network to obtain a set of model parameters, the model with these parameters being used to complete the classification task of the invention;

step seven, reading out all node information and converting the intelligent contract function graph model into a vector, this code embedding vector being passed to a downstream neural network for prediction;

and step eight, judging, using a classification model, whether the function vector contains intelligent contract vulnerability information, thereby completing the classification task of the invention.
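The last two steps can be pictured as a small classification head applied to the graph-level vector. The following is a minimal sketch only: the text does not specify the classifier architecture, so the single logistic unit, the 0.5 threshold, the example weights and the function name classify_function_vector are all assumptions made here for illustration.

```python
import math

def classify_function_vector(vector, weights, bias, threshold=0.5):
    """Return (probability, label) for one function embedding.

    label 1 = predicted to contain a vulnerability, 0 = predicted clean.
    A single logistic unit is assumed purely for illustration.
    """
    if len(vector) != len(weights):
        raise ValueError("vector and weights must have the same length")
    score = sum(x * w for x, w in zip(vector, weights)) + bias
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability, int(probability >= threshold)

# Hypothetical 4-dimensional embedding and parameters (not from the patent).
prob, label = classify_function_vector([0.5, -1.0, 0.25, 2.0],
                                       [1.0, 0.5, -2.0, 0.75], bias=-0.5)
print(round(prob, 3), label)
```

In practice the weights would be learned jointly with the graph neural network rather than fixed by hand as they are in this sketch.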

Further, the intelligent contract security detection method further comprises:

(1) training a word2vec model using open source code, encapsulating the open source code into intelligent contract functions, and establishing a training data set through manual labeling;

(2) dividing the data set into a training set, a validation set and a test set in the ratio 8:1:1;

(3) processing the source code files using an AST analysis tool developed in the Go language to generate the AST graph structure of each source file;

(4) saving the generated graph structure into files, each generated program graph structure file corresponding to one source file;

(5) training on the graph structure files of step (4) using the word2vec algorithm;

(6) training the graph neural network vulnerability detection model to complete classification detection.
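The 8:1:1 division of step (2) can be sketched as follows. This is an illustrative example rather than the implementation used in the embodiment: the shuffling, the fixed seed and the name split_dataset are assumptions, and any remainder after integer division is assigned to the test set.

```python
import random

def split_dataset(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle samples and split them into train/validation/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```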

Further, in step (4), the program graph is constructed from the syntax nodes and syntax tokens in the AST. The syntax nodes are the non-terminals of the language grammar, such as the AST nodes of if statements or function declarations; the syntax tokens are the terminals, such as identifier names and constant values. A standard AST has only one kind of edge, which represents the parent-child relationship between two AST nodes. To obtain additional syntax, data and control information, the model adds several further kinds of edges to the AST, including guard edges, jump edges and last-lexical-use edges, and records the edges of each relation using an adjacency matrix; for each edge, an additional backward edge is added to help propagate information in the relational graph.
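The adjacency-matrix bookkeeping described above can be illustrated with a short sketch. It assumes one matrix per edge type and stores each backward edge in a companion matrix with the suffix "_rev"; the edge-type labels, the four-node example and the function name build_adjacency are hypothetical and chosen only for this illustration.

```python
def build_adjacency(num_nodes, edges):
    """Build one adjacency matrix per edge type, plus a backward counterpart.

    edges: iterable of (source, edge_type, target) triples.
    Returns {edge_type: matrix}, where matrix[s][t] == 1 means s -> t.
    """
    matrices = {}
    for src, etype, dst in edges:
        # Record the forward edge and the additional backward edge.
        for name, (s, t) in ((etype, (src, dst)), (etype + "_rev", (dst, src))):
            matrix = matrices.setdefault(
                name, [[0] * num_nodes for _ in range(num_nodes)])
            matrix[s][t] = 1
    return matrices

# Hypothetical 4-node graph for `if cond { return x }`:
# 0 = IfStmt, 1 = cond, 2 = ReturnStmt, 3 = x
edges = [
    (0, "child", 1), (0, "child", 2), (2, "child", 3),  # AST parent-child
    (3, "guard", 1),  # illustrative guard edge: x is used under `cond`
    (2, "jump", 0),   # illustrative jump edge from the return statement
]
adj = build_adjacency(4, edges)
print(sorted(adj))
```

Keeping the backward edges in separate matrices lets a graph neural network learn different weights for the two directions of each relation.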

Further, in step (5), training on the graph structure files using the word2vec algorithm is implemented by calling the Gensim library; the word2vec network maps the nodes and tokens of each program graph to vectors, so that words that are closely related by context in the source code lie close to each other in the vector space.
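What "close context" means for word2vec can be made concrete with the context window from which its training pairs are drawn. The sketch below does not use the Gensim API; it only enumerates skip-gram style (center, context) pairs for a token sequence. The window size, the token names and the function name context_pairs are assumptions for illustration.

```python
def context_pairs(tokens, window=2):
    """Return (center, context) pairs as used for skip-gram style training."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical token sequence taken from one program graph.
tokens = ["FuncDecl", "Ident", "BlockStmt", "IfStmt", "ReturnStmt"]
pairs = context_pairs(tokens, window=1)
print(pairs[:3])
```

Tokens that repeatedly appear in each other's windows are pulled toward each other during training, which is what places contextually related nodes near one another in the vector space.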

Further, the source code files are divided into functions; each function is processed in turn with the trained word2vec model to initialize the node vectors of its function graph; the data flow and control flow graphs are created; and the graph nodes and graph structure of each function are stored in separate files. Each file is labeled 0 or 1 to indicate whether the function contains a vulnerability, and the files serve as the input data of the graph neural network.
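One possible shape for such a per-function record is sketched below. The on-disk layout is not specified in the text, so the JSON encoding, the field names and the function name make_graph_record are assumptions; only the 0/1 label and the separation of nodes from graph structure follow the description.

```python
import json

def make_graph_record(function_name, node_labels, edges, label):
    """Pack one function graph and its 0/1 vulnerability label into a dict."""
    if label not in (0, 1):
        raise ValueError("label must be 0 (clean) or 1 (vulnerable)")
    return {
        "function": function_name,
        "nodes": list(node_labels),
        "edges": [list(e) for e in edges],  # [source, edge_type, target]
        "label": label,
    }

record = make_graph_record(
    "transfer",                             # hypothetical function name
    ["FuncDecl", "IfStmt", "ReturnStmt"],
    [(0, "child", 1), (1, "child", 2)],
    label=1,
)
line = json.dumps(record)      # one JSON object per function
restored = json.loads(line)
print(restored["label"], len(restored["nodes"]))
```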

Further, in step (6), training the graph neural network vulnerability detection model includes:

learning the multi-relational graph using a neighborhood aggregation algorithm, wherein the GGNN model represents each node of the relational graph as a vector of 100 features and updates the node embeddings through a neighborhood aggregation scheme; the 100-dimensional embedding vector h_v of a graph node v is computed by the embedding layer by recursively aggregating and transforming the representation vectors of its neighboring nodes; the nodes exchange information by sending their current state, i.e. their embedding vector, as a message to all neighbors along the edges; at each node, the messages are aggregated and used to update the node representation at the next embedding layer, i.e. the next iteration; after this process has been repeated for a fixed number of iterations to update the node states, a readout function aggregates the node states of the graph into a single embedding vector.
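The message exchange and readout described above can be sketched in a few lines. This is a deliberately simplified illustration and not the GGNN of the embodiment: it uses an ungated tanh update in place of the gated recurrent update, has no learned weights, uses 2-dimensional states instead of 100-dimensional ones, and sums node states as the readout. The names propagate and readout are hypothetical.

```python
import math

def propagate(states, edges, steps=2):
    """Run simplified neighborhood aggregation for a fixed number of steps.

    states: {node: [float, ...]} initial embeddings.
    edges:  list of (source, target); messages flow source -> target.
    """
    dim = len(next(iter(states.values())))
    for _ in range(steps):
        incoming = {v: [0.0] * dim for v in states}
        for src, dst in edges:  # each node sends its current state
            incoming[dst] = [a + b for a, b in zip(incoming[dst], states[src])]
        # Update: combine own state with aggregated messages (no GRU gate here).
        states = {v: [math.tanh(h + m) for h, m in zip(states[v], incoming[v])]
                  for v in states}
    return states

def readout(states):
    """Sum all node embeddings into one graph-level vector."""
    dim = len(next(iter(states.values())))
    return [sum(vec[i] for vec in states.values()) for i in range(dim)]

# Hypothetical 3-node chain 0 -> 1 -> 2 with backward edges added.
init = {0: [1.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 1.0]}
edges = [(0, 1), (1, 2), (1, 0), (2, 1)]
graph_vector = readout(propagate(init, edges, steps=2))
print(len(graph_vector))
```

After two steps the middle node has received information from both ends of the chain, which is the effect the fixed number of iterations is meant to achieve on larger graphs.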

Another object of the present invention is to provide an intelligent contract security detection system using the intelligent contract security detection method, the intelligent contract security detection system comprising:

the word2vec model training module is used for training the word2vec model by using an open source Go code on Github;

the contract function packaging module is used for packaging the open source Github code into an intelligent contract function according to the intelligent contract grammar;

the conversion extraction module is used for converting the packaged functions into an abstract syntax tree and extracting data flow and control flow information;

the information conversion module is used for converting data flow and control flow information of the intelligent contract into a graph model;

the graph node conversion module is used for converting the graph nodes into vectors by using the trained word2vec model;

the graph model training module is used for training the graph model by using a graph neural network;

the graph model conversion module is used for reading out all node information and converting the intelligent contract function graph model into a vector;

and the contract vulnerability judgment module is used for judging whether the function vector contains intelligent contract vulnerability information by using the classification model.

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:

training a word2vec model using open source Go code on Github; encapsulating open source Github codes into intelligent contract functions according to intelligent contract syntax; converting the packaged function into an abstract syntax tree, and extracting data flow and control flow information; converting data flow and control flow information of the intelligent contract into a graph model;

converting the graph nodes into vectors by using a trained word2vec model; training a graph model by using a graph neural network; reading out all node information, and converting the intelligent contract function graph model into a vector; and judging whether the function vector contains intelligent contract vulnerability information or not by using the classification model.

It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the intelligent contract security detection method.

Another object of the present invention is to provide an information data processing terminal, which is used for implementing the intelligent contract security detection system.

The invention also aims to provide an application of the intelligent contract security detection system in detecting vulnerabilities in Hyperledger Fabric intelligent contracts.

By combining all the above technical schemes, the invention has the following advantages and positive effects: the intelligent contract security detection system provided by the invention realizes vulnerability detection for Hyperledger Fabric intelligent contracts.

Compared with existing intelligent contract vulnerability detection systems, the Fabric-based intelligent contract vulnerability detection system and method provided by the invention have the following advantages:

(1) the complexity of manually defining functions by human experts is reduced, and the efficiency of intelligent contract security detection is improved;

(2) the intelligent contract is learned through the graph neural network, and the model can retain and reason over more control flow and data flow information, so as to capture more structural information about intelligent contract vulnerabilities;

(3) source code from open source projects is labeled and used for training, which addresses the shortage of Fabric intelligent contract data and achieves good results.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of an intelligent contract security detection method according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of an intelligent contract security detection method provided by an embodiment of the present invention.

Fig. 3 is a block diagram of an intelligent contract security detection system provided by an embodiment of the present invention.

In the figure: 1. word2vec model training module; 2. contract function encapsulation module; 3. conversion extraction module; 4. information conversion module; 5. graph node conversion module; 6. graph model training module; 7. graph model conversion module; 8. contract vulnerability judgment module.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Example 1

1) In the experiment, the Fabric intelligent contract security detection model was implemented using TensorFlow 2.1.0.

2) An AST graph of the source code is constructed with the Go language ast package, and the different edge relations are extracted from it by traversing all AST nodes of the source code. During the traversal, all nodes are numbered sequentially, the different edge relations are obtained according to specific rules, and variable names are rewritten using a uniform naming scheme. This step ensures that differences such as variable names in the program do not affect the choice of token embedding.

3) A Python (.py) training file is created.

4) The GGNN model training code is written in this file. The code begins with the following package import code.

5) A Python (.py) test file is created in the same directory.

6) The GGNN model test code is written in this file. The code begins with the following package import code.

import json
import os
import random
import re
import sys
import psutil
import time
from typing import Dict, Optional, Callable, Any
import nni
import jsonlines
import numpy as np
import tensorflow as tf
from tensorflow.python.training.tracking import data_structures as tf_data_structures
from dpu_utils.utils import RichPath
from tensorflow_core.python.keras import Sequential
from tensorflow_core.python.keras.layers import Dense

7) Training and testing were performed on a Huawei Cloud server with the configuration GPU: 1 × V100 NV32; CPU: 8 cores, 64 GiB.

8) The training samples are divided into several batches according to vulnerability type, and the GGNN model is trained on each batch separately; each batch consists of positive samples and negative samples.

9) To minimize the distance between the probability distributions of the predicted values and the true values, the cross-entropy loss is chosen as the objective function.

10) The model is trained by mini-batch stochastic gradient descent (SGD) using the Adam algorithm with a learning rate of 0.001. Training terminates when the loss falls below 0.005 or when the maximum of 100 training epochs is reached.

11) All methods were evaluated on their respective data sets using five-fold cross-validation, a standard method for evaluating the generalization ability of a predictive model.

12) The following four standard metrics were selected as evaluation metrics:

Accuracy: the ratio of correctly labeled cases to the total number of test cases.

Precision: the ratio of correctly predicted samples to the total number of samples predicted to have a particular label.

Recall: the ratio of correctly predicted samples to the total number of test samples belonging to a class.

F1 score: the harmonic mean of Precision and Recall, calculated as 2 × (Recall × Precision)/(Recall + Precision). This metric is helpful for testing when the vulnerability types are unevenly distributed.

13) To verify the accuracy of the model based on the gated graph neural network, a graph embedding model that does not contain control flow and data flow was designed for the test and compared with the model of this embodiment. The model of this embodiment is denoted GGNN-CP, and the model without control flow and data flow is denoted GGNN-noCP. The results for each evaluation metric are as follows:

According to the test results, the accuracy of the GGNN-CP model and its ability to distinguish negative samples are higher than those of the GGNN-noCP model. This indicates that GGNN-CP has better source code representation capability than GGNN-noCP and can represent more of the internal relations of the source code.

14) The detection results for the samples were visualized using t-SNE (t-distributed Stochastic Neighbor Embedding), which shows that the detection performance of GGNN-CP is clearly superior to that of GGNN-noCP.
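The objective function and the termination rule used in Example 1 can be sketched as follows. This is a plain-Python illustration and not the TensorFlow code of the embodiment: the two-class form of the cross-entropy, the clipping constant eps, the convention that epochs are counted from 1, and the names binary_cross_entropy and should_stop are assumptions. The thresholds 0.005 and 100 are the values stated in the text.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def should_stop(loss, epoch, loss_threshold=0.005, max_epochs=100):
    """Stop when the loss drops below the threshold or the epoch budget is used.

    epoch is the number of completed epochs, counted from 1.
    """
    return loss < loss_threshold or epoch >= max_epochs

print(round(binary_cross_entropy([1, 0], [0.9, 0.2]), 4))
print(should_stop(0.004, 10), should_stop(0.2, 100), should_stop(0.2, 10))
```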

Aiming at the problems in the prior art, the invention provides an intelligent contract security detection method, a system, equipment, a terminal and application thereof, and the invention is described in detail below with reference to the accompanying drawings.

As shown in fig. 1, the intelligent contract security detection method provided by the embodiment of the present invention includes the following steps:

S101, training a word2vec model using open source Go code on GitHub;

S102, encapsulating open source GitHub code into intelligent contract functions according to the intelligent contract syntax;

S103, converting the encapsulated functions into abstract syntax trees, and extracting data flow and control flow information;

S104, converting the data flow and control flow information of the intelligent contract into a graph model;

S105, converting the graph nodes into vectors using the trained word2vec model;

S106, training the graph model using a graph neural network;

S107, reading out all node information, and converting the intelligent contract function graph model into a vector;

and S108, judging whether the function vector contains intelligent contract vulnerability information or not by using the classification model.
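The traversal behind step S103, in which AST nodes are numbered sequentially and variable names are rewritten under a uniform naming scheme, can be illustrated as follows. The embodiment works on Go source with the Go ast package; this sketch substitutes Python's standard ast module and a Python snippet as a stand-in, and the naming scheme VAR0, VAR1, ... and the function name number_and_normalize are assumptions.

```python
import ast

def number_and_normalize(source):
    """Number AST nodes in traversal order and rename variables uniformly.

    Returns (nodes, child_edges, renames); nodes[i] is the label of node i.
    """
    tree = ast.parse(source)
    nodes, child_edges, renames = [], [], {}

    def visit(node, parent_id):
        node_id = len(nodes)  # sequential numbering
        if isinstance(node, ast.Name):
            # Map each distinct variable name to VAR0, VAR1, ...
            label = renames.setdefault(node.id, "VAR%d" % len(renames))
        else:
            label = type(node).__name__
        nodes.append(label)
        if parent_id is not None:
            child_edges.append((parent_id, node_id))  # parent-child edge
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree, None)
    return nodes, child_edges, renames

nodes, child_edges, renames = number_and_normalize("total = price + tax")
print(renames)
```

Because identifiers are replaced by position-based names, two functions that differ only in their variable names produce the same node labels.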

A schematic diagram of an intelligent contract security detection method provided by an embodiment of the present invention is shown in fig. 2.

As shown in fig. 3, the intelligent contract security detection system provided in the embodiment of the present invention includes:

a word2vec model training module 1, configured to train a word2vec model using an open source Go code on Github;

the contract function packaging module 2 is used for packaging the open source Github code into an intelligent contract function according to the intelligent contract grammar;

the conversion extraction module 3 is used for converting the packaged functions into abstract syntax trees and extracting data flow and control flow information;

the information conversion module 4 is used for converting data flow and control flow information of the intelligent contract into a graph model;

a graph node conversion module 5, configured to convert a graph node into a vector using a trained word2vec model;

the graph model training module 6 is used for training a graph model by using a graph neural network;

the graph model conversion module 7 is used for reading out all node information and converting the intelligent contract function graph model into a vector;

and the contract vulnerability judging module 8 is used for judging whether the function vector contains intelligent contract vulnerability information by using the classification model.

The technical solution of the present invention will be further described with reference to the following examples.

The invention aims to provide a Hyperledger Fabric intelligent contract security detection system based on a graph neural network, which realizes vulnerability detection for Hyperledger Fabric intelligent contracts.

The graph-neural-network-based vulnerability detection method applied to Hyperledger Fabric comprises the following steps:

First, the model trains a word2vec model using open source code; the open source code is then encapsulated into intelligent contract functions, and a training data set is established through manual labeling.

Second, the data set from the first step is divided into a training set, a validation set and a test set in the ratio 8:1:1.

Third, the source code files are processed using an AST analysis tool developed in the Go language to generate the AST graph structure of each source file.

Fourth, the graph structures generated in the third step are stored in files, the generated program graph structure files corresponding one-to-one to the source files. The program graph is constructed from the syntax nodes in the AST (i.e., non-terminals of the language grammar, such as the AST nodes of if statements or function declarations) and the syntax tokens (terminals, such as identifier names and constant values). A standard AST has only one kind of edge, which represents the parent-child relationship between two AST nodes. To obtain additional syntax, data and control information, the model adds several further kinds of edges to the AST, such as guard edges, jump edges and last-lexical-use edges, records the edges of each relation using an adjacency matrix, and adds for each edge an additional backward edge that helps propagate information in the relational graph.

Fifth, the graph structure files from the fourth step are used for training with the word2vec algorithm, which is implemented by calling the Gensim library. The word2vec network maps the nodes and tokens of each program graph to vectors, so that words that are closely related by context in the source code lie close to each other in the vector space. In this embodiment, the source code files are divided into functions; each function is processed in turn with the trained word2vec model to initialize the node vectors of its function graph; the data flow and control flow graphs are created; and the graph nodes and graph structure of each function are stored in separate files. Each file is labeled 0 or 1 to indicate whether the function contains a vulnerability, and the files serve as the input data of the graph neural network.

Sixth, the graph neural network vulnerability detection model is trained to complete classification detection. The invention uses a neighborhood aggregation algorithm to learn the multi-relational graph: the GGNN model represents each node of the relational graph as a vector of 100 features and updates the node embeddings through a neighborhood aggregation scheme. The 100-dimensional embedding vector h_v of a graph node v is computed by the embedding layer by recursively aggregating and transforming the representation vectors of its neighboring nodes. Nodes exchange information by sending their current state (i.e., their embedding vector) as a message to all neighbors along the edges. At each node, the messages are aggregated and then used to update the node representation at the next embedding layer (i.e., the next iteration). After this process has been repeated for a fixed number of iterations to update the node states, a readout function aggregates the node states of the graph into a single embedding vector.

Compared with existing intelligent contract vulnerability detection systems, the Fabric-based intelligent contract vulnerability detection system and method provided by the embodiment of the invention, by applying a graph neural network, have the following advantages:

1. the complexity of manually defining functions by human experts is reduced, and the efficiency of intelligent contract security detection is improved;

2. the intelligent contract is learned through the graph neural network, and the model can retain and reason over more control flow and data flow information, so as to capture more structural information about intelligent contract vulnerabilities;

3. source code from open source projects is labeled and used for training, which addresses the shortage of Fabric intelligent contract data and achieves good results.

The graph embedding model of the invention reaches an accuracy of 0.942, a precision of 0.893, a recall of 1 and an F1 score of 0.943, whereas the graph embedding model without control flow and data flow reaches an accuracy of 0.915, a precision of 0.846, a recall of 1 and an F1 score of 0.917.
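The reported F1 scores can be cross-checked against the formula 2 × (Recall × Precision)/(Recall + Precision) given for the evaluation metrics, taking 0.893 and 0.846 as the precision values and 1 as the recall. The short sketch below performs that arithmetic; the function name f1_score is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the text for the two models.
with_flows = f1_score(0.893, 1.0)     # model with control flow and data flow
without_flows = f1_score(0.846, 1.0)  # model without control flow and data flow
print(round(with_flows, 3), round(without_flows, 3))  # 0.943 0.917
```

Both values agree with the F1 scores stated in the text to three decimal places.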

In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it may take the form of a computer program product that includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

The above description covers only specific embodiments of the present invention and does not limit its scope of protection; all modifications, equivalents and improvements made within the spirit and principles of the invention are intended to fall within the scope of protection defined by the appended claims.

Claims

1. An intelligent contract security detection method is characterized by comprising the following steps:

step one, training a word2vec model using open source Go code on GitHub;

step two, encapsulating open source GitHub code into intelligent contract functions according to the intelligent contract syntax;

step three, converting the encapsulated functions into abstract syntax trees, and extracting data flow and control flow information;

step four, converting the data flow and control flow information of the intelligent contract into a graph model;

step five, converting the graph nodes into vectors using the trained word2vec model;

step six, training the graph model using a graph neural network;

step seven, reading out all node information, and converting the intelligent contract function graph model into a vector;

and step eight, judging, using a classification model, whether the function vector contains intelligent contract vulnerability information.

2. The intelligent contract security detection method of claim 1, further comprising:

3. The intelligent contract security detection method of claim 2, wherein in step (4), the program graph is constructed from the syntax nodes and syntax tokens in the AST; a standard AST has only one kind of edge, which represents the parent-child relationship between two AST nodes; the model adds several further kinds of edges to the AST, including guard edges, jump edges and last-lexical-use edges, and records the edges of each relation using an adjacency matrix, so as to obtain additional syntax, data and control information; for each edge, an additional backward edge is added for propagating information in the relational graph; wherein the syntax nodes in the AST are the non-terminals of the language grammar, including the AST nodes of if statements or function declarations; and the syntax tokens are the terminals, including identifier names and constant values.

4. The intelligent contract security detection method of claim 2, wherein in step (5), training on the graph structure files using the word2vec algorithm is implemented by calling the Gensim library; the word2vec network maps the nodes and tokens of each program graph to vectors, so that words that are closely related by context in the source code lie close to each other in the vector space.

5. The intelligent contract security detection method according to claim 2, wherein in step (6), the training graph neural network vulnerability detection model comprises:

learning the multi-relational graph using a neighborhood aggregation algorithm, wherein the GGNN model represents each node of the relational graph as a vector of 100 features and updates the node embeddings through a neighborhood aggregation scheme; the 100-dimensional embedding vector h_v of a graph node v is computed by the embedding layer by recursively aggregating and transforming the representation vectors of its neighboring nodes; the nodes exchange information by sending their current state, i.e. their embedding vector, as a message to all neighbors along the edges; at each node, the messages are aggregated and used to update the node representation at the next embedding layer, i.e. the next iteration; after this process has been repeated for a fixed number of iterations to update the node states, the graph is aggregated into a single embedding vector using the readout function.

6. An intelligent contract security detection system for implementing the intelligent contract security detection method according to any one of claims 1 to 5, wherein the intelligent contract security detection system comprises:

7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:

8. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

9. An information data processing terminal characterized by being used for implementing the intelligent contract security detection system according to claim 6.

10. An application of the intelligent contract security detection system of claim 6 in detecting vulnerabilities in Hyperledger Fabric intelligent contracts.