CN113626826A - Intelligent contract security detection method, system, equipment, terminal and application - Google Patents
- Publication number
- CN113626826A (application CN202110862067.6A)
- Authority
- CN
- China
- Prior art keywords
- graph
- intelligent contract
- model
- converting
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention belongs to the technical field of blockchain security and discloses an intelligent contract security detection method, system, equipment, terminal and application. The method comprises the following steps: training a word2vec model on open source code; encapsulating the open source code into intelligent contract functions according to the intelligent contract syntax; converting the encapsulated functions into abstract syntax trees and extracting data flow and control flow information; converting the data flow and control flow information of the intelligent contract into a graph model; converting the graph nodes into vectors using the trained word2vec model; training on the graph model with a graph neural network; reading out all node information to convert the intelligent contract function graph model into a vector; and using a classification model to judge whether the function vector contains intelligent contract vulnerability information. The invention improves the efficiency of intelligent contract security detection and achieves better detection results.
Description
Technical Field
The invention belongs to the technical field of blockchain security, and particularly relates to an intelligent contract security detection method, system, equipment, terminal and application.
Background
Currently, intelligent contracts (smart contracts) on blockchain platforms mostly involve the transaction and processing of digital assets or cryptocurrency, so vulnerabilities in them can be exploited and expose users to malicious attacks. For this reason, more and more researchers have in recent years begun to study intelligent contract security detection methods. However, existing research focuses mainly on the Ethereum platform, and there is no targeted detection method for intelligent contracts on consortium chain platforms represented by Hyperledger Fabric. The main reasons are as follows:
(1) the intelligent contracts on the Fabric platform are mainly deployed in an organization and are difficult to obtain;
(2) Ethereum and Fabric have different platform characteristics, so the security detection tools and methods of one platform cannot be used directly on the other;
(3) the intelligent contracts disclosed by the Fabric platform are few, and large-scale analysis and research cannot be carried out.
An intelligent contract is a piece of software running on a blockchain system. As with other software systems, the earlier a problem in an intelligent contract is found, the less manpower and material resources its correction requires. As more and more assets are hosted on the Fabric platform, a problem in an intelligent contract running on it will affect an enterprise's business and cause capital loss, so security research on Fabric intelligent contracts is particularly urgent and necessary.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) Existing research focuses mainly on the Ethereum platform, and there is no targeted detection method for intelligent contracts on consortium chain platforms represented by Hyperledger Fabric.
(2) Intelligent contracts on the Fabric platform are mainly deployed inside organizations and are difficult to obtain; few Fabric intelligent contracts are publicly available, so large-scale analysis and research cannot be carried out.
(3) Ethereum and Fabric have different platform characteristics, so the security detection tools and methods of one platform cannot be used directly on the other.
The difficulty in solving the above problems and defects is:
On the one hand, because little intelligent contract code is publicly available for the Fabric platform, the data set available to a graph neural network is insufficient; the lack of training data limits the quality of the machine learning detection model and makes the model harder to train. On the other hand, existing blockchain contract security detection tools mainly target the Ethereum platform and cannot be used directly on the Fabric platform, and there is very little Fabric contract vulnerability detection data to refer to, which further increases the difficulty of implementing the scheme.
The significance of solving the problems and the defects is as follows:
The invention targets the characteristics of the Hyperledger Fabric platform, systematically and completely analyzes the security problems and potential risks of Fabric intelligent contracts, and comprehensively evaluates their security. This improves the overall security of the Fabric blockchain system, helps prevent enterprises from losing capital through contract security vulnerabilities, and greatly reduces the risk of such losses.
Disclosure of Invention
The invention provides an intelligent contract security detection method, a system, equipment, a terminal and application aiming at the problems in the prior art, and particularly relates to an intelligent contract security detection method, a system, equipment, a terminal and application based on a graph neural network.
The invention is realized as follows: an intelligent contract security detection method comprising the following steps:
step one, training a word2vec model on open source Go code from GitHub, so as to initialize the nodes of the program graph and obtain an initial vector for each node in the graph representation of each piece of source code;
step two, encapsulating the open source GitHub code into intelligent contract functions according to the intelligent contract syntax, as the main source of the data set;
step three, converting the encapsulated functions into abstract syntax trees, and extracting data flow and control flow information for constructing a relational graph;
step four, converting the data flow and control flow information of the intelligent contract into a graph model, which can retain and reason about more control flow and data flow information and thus capture more structural information about intelligent contract vulnerabilities, as the input of a graph neural network;
step five, converting the graph nodes into vectors using the trained word2vec model, the purpose being to construct a vector space in which words that are closely related by context in the source code lie close to each other;
step six, training on the graph model with a graph neural network to obtain a set of model parameters, the model with these parameters finally being used to complete the classification task of the invention;
step seven, reading out all node information and converting the intelligent contract function graph model into a vector, the resulting code embedding vector being passed to a downstream neural network for prediction;
and step eight, judging with a classification model whether the function vector contains intelligent contract vulnerability information, thereby completing the classification task of the invention.
Further, the intelligent contract security detection method further comprises:
(1) training a word2vec model on open source code, encapsulating the open source code into intelligent contract functions, and establishing a training data set through manual labeling;
(2) dividing the data set into a training set, a validation set and a test set in the ratio 8:1:1;
(3) processing the source code files with an AST parsing tool written in Go to generate the AST graph structure of each source file;
(4) saving the generated graph structures to files, each generated program graph structure file corresponding to a source file;
(5) training on the graph structure files from step (4) using the word2vec algorithm;
(6) training the graph neural network vulnerability detection model to complete classification detection.
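The 8:1:1 division in item (2) can be illustrated with a minimal sketch. The function name `split_8_1_1`, the shuffling step and the fixed seed are assumptions made for illustration; the source does not say how samples are assigned to the three sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_8_1_1(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the samples and split them into train/validation/test at 8:1:1."""
    shuffled = list(samples)
    # Assumption: a seeded shuffle, so that the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Toy data set of 100 sample ids standing in for labeled contract functions.
train, val, test = split_8_1_1(range(100))
```

With sample counts that are not multiples of ten, this sketch assigns the leftover samples to the test set; other rounding choices are equally possible.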
Further, in step (4), the program graph is constructed from the syntax nodes and syntax tokens in the AST. A syntax node is a non-terminal of the language grammar, such as the AST node for an if statement or a function declaration; a syntax token is a terminal, such as an identifier name or a constant value. A standard AST has only one kind of edge, which represents the parent-child relationship between two AST nodes. To capture additional syntactic, data and control information, the model adds several further kinds of edges to the AST, including guard edges, jump edges and last-lexical-use edges, and records the edges of each relation with an adjacency matrix. For each edge, a backward edge is additionally added to help propagate information through the relational graph.
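As a rough illustration of recording each relation in its own adjacency matrix and adding a backward edge for every edge, consider the following sketch. The edge-type labels in `EDGE_TYPES`, the `_bwd` suffix and the toy graph are hypothetical; the source does not specify the data layout.

```python
from typing import Dict, List, Tuple

# Hypothetical labels for the parent-child, guard and jump relations.
EDGE_TYPES = ["child", "guard", "jump"]

Matrix = List[List[int]]


def build_adjacency(num_nodes: int, edges: List[Tuple[int, int, str]]) -> Dict[str, Matrix]:
    """Return one adjacency matrix per edge type and one per backward edge type."""
    matrices: Dict[str, Matrix] = {}
    for name in EDGE_TYPES:
        matrices[name] = [[0] * num_nodes for _ in range(num_nodes)]
        matrices[name + "_bwd"] = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst, name in edges:
        matrices[name][src][dst] = 1           # forward edge src -> dst
        matrices[name + "_bwd"][dst][src] = 1  # additional backward edge dst -> src
    return matrices


# Toy graph: node 0 is an if statement, node 1 its condition, node 2 its body.
edges = [(0, 1, "child"), (0, 2, "child"), (2, 1, "guard")]
adj = build_adjacency(3, edges)
```

Keeping backward edges as separate relations lets a graph neural network learn different weights for the two directions of the same relation.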
Further, in step (5), the graph structure files are trained with the word2vec algorithm by calling the Gensim library; the word2vec network maps the nodes and tokens of each program graph to vectors such that words that are closely related by context in the source code lie close to each other in the vector space.
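Once an embedding table has been trained, initializing the nodes of a graph amounts to a lookup. The sketch below assumes the trained vectors have already been exported into a plain dictionary; the token names, the two-dimensional vectors and the zero vector for unseen tokens are illustrative assumptions, not details from the source.

```python
from typing import Dict, List

Vector = List[float]


def init_node_vectors(node_tokens: List[str], embeddings: Dict[str, Vector], dim: int) -> List[Vector]:
    """Look up an initial vector for every graph node token."""
    # Assumption: tokens missing from the embedding table start as zero vectors.
    unknown = [0.0] * dim
    return [list(embeddings.get(token, unknown)) for token in node_tokens]


# Hypothetical two-dimensional embeddings standing in for trained word2vec output.
embeddings = {"FuncDecl": [0.1, 0.2], "VAR0": [0.3, 0.4]}
vectors = init_node_vectors(["FuncDecl", "VAR0", "ReturnStmt"], embeddings, dim=2)
```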
Each source code file is divided into functions. The functions are processed one by one with the trained word2vec model to initialize the node vectors of each function graph, and the data flow and control flow graphs are created. The graph nodes and graph structure of each function are stored in a separate file, labeled 0 or 1 to indicate whether the function contains a vulnerability, and these files serve as the input data of the graph neural network.
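One possible way to store labeled function graphs is one JSON record per line. This is only a sketch: the field names `function`, `nodes`, `edges` and `label`, and the choice of JSON Lines, are assumptions and are not taken from the source.

```python
import io
import json
from typing import Iterable, List, TextIO


def write_function_graphs(records: Iterable[dict], out: TextIO) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    for record in records:
        out.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


def read_function_graphs(src: TextIO) -> List[dict]:
    """Read the records back, skipping blank lines."""
    return [json.loads(line) for line in src if line.strip()]


example = {
    "function": "transfer",                 # hypothetical function name
    "nodes": ["FuncDecl", "VAR0", "VAR1"],  # node tokens in numbering order
    "edges": {"child": [[0, 1], [0, 2]]},   # edge lists grouped by edge type
    "label": 1,                             # 1 = contains a vulnerability, 0 = does not
}

# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
written = write_function_graphs([example], buffer)
buffer.seek(0)
loaded = read_function_graphs(buffer)
```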
Further, in step (6), the training graph neural network vulnerability detection model includes:
learning the multi-relational graph with a neighborhood aggregation algorithm: the GGNN model represents each node of the relational graph as a vector of 100 features and updates the node embeddings through a neighborhood aggregation scheme; the 100-dimensional embedding vector h_v of graph node v is computed by the embedding layer by recursively aggregating and transforming the representation vectors of its neighboring nodes; the nodes exchange information by sending their current state, i.e. the embedding vector, as a message to all neighbors along the edges; at each node, the incoming messages are aggregated and used to update the node representation at the next embedding layer, i.e. the next iteration; after this process has been repeated for a fixed number of iterations, the node states are aggregated into a single embedding vector by a readout function.
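The source does not give the update equations. For orientation, one common formulation of a gated graph neural network is shown here as an assumption rather than as the patent's exact model. In it, E is the set of typed edges (u, v, k), W_k is a learned weight matrix for edge type k, T is the fixed number of iterations, and f and g are small learned networks used in the readout:

```latex
\begin{aligned}
m_v^{(t)} &= \sum_{(u,\,v,\,k)\in E} W_k\, h_u^{(t-1)}, \\
h_v^{(t)} &= \mathrm{GRU}\bigl(h_v^{(t-1)},\, m_v^{(t)}\bigr), \qquad t = 1,\dots,T, \\
h_G &= \sum_{v\in V} \sigma\bigl(f(h_v^{(T)})\bigr)\odot g\bigl(h_v^{(T)}\bigr).
\end{aligned}
```

Here h_v^{(0)} would be the word2vec initial vector of node v, and h_G the single embedding vector produced by the readout and passed to the classifier.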
Another object of the present invention is to provide an intelligent contract security detection system using the intelligent contract security detection method, the intelligent contract security detection system comprising:
the word2vec model training module is used for training the word2vec model by using an open source Go code on Github;
the contract function packaging module is used for packaging the open source Github code into an intelligent contract function according to the intelligent contract grammar;
the conversion extraction module is used for converting the packaged functions into an abstract syntax tree and extracting data flow and control flow information;
the information conversion module is used for converting data flow and control flow information of the intelligent contract into a graph model;
the graph node conversion module is used for converting the graph nodes into vectors by using the trained word2vec model;
the graph model training module is used for training the graph model by using a graph neural network;
the graph model conversion module is used for reading out all node information and converting the intelligent contract function graph model into a vector;
and the contract vulnerability judgment module is used for judging whether the function vector contains intelligent contract vulnerability information by using the classification model.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
training a word2vec model using open source Go code on Github; encapsulating open source Github codes into intelligent contract functions according to intelligent contract syntax; converting the packaged function into an abstract syntax tree, and extracting data flow and control flow information; converting data flow and control flow information of the intelligent contract into a graph model;
converting the graph nodes into vectors by using a trained word2vec model; training a graph model by using a graph neural network; reading out all node information, and converting the intelligent contract function graph model into a vector; and judging whether the function vector contains intelligent contract vulnerability information or not by using the classification model.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
training a word2vec model using open source Go code on Github; encapsulating open source Github codes into intelligent contract functions according to intelligent contract syntax; converting the packaged function into an abstract syntax tree, and extracting data flow and control flow information; converting data flow and control flow information of the intelligent contract into a graph model;
converting the graph nodes into vectors by using a trained word2vec model; training a graph model by using a graph neural network; reading out all node information, and converting the intelligent contract function graph model into a vector; and judging whether the function vector contains intelligent contract vulnerability information or not by using the classification model.
Another object of the present invention is to provide an information data processing terminal, which is used for implementing the intelligent contract security detection system.
The invention also aims to provide application of the intelligent contract security detection system in detecting vulnerabilities in Hyperledger Fabric intelligent contracts.
By combining all the technical schemes, the invention has the following advantages and positive effects: the intelligent contract security detection system provided by the invention realizes vulnerability detection for Hyperledger Fabric intelligent contracts.
Compared with existing intelligent contract vulnerability detection systems, the Fabric-based intelligent contract vulnerability detection system and method provided by the invention have the following advantages:
(1) the complexity of having human experts define functions manually is reduced, and the efficiency of intelligent contract security detection is improved;
(2) the intelligent contract is learned through the graph neural network, and the model can retain and reason about more control flow and data flow information so as to capture more structural information about intelligent contract vulnerabilities;
(3) source code from open source projects is labeled and used for training, which addresses the shortage of Fabric intelligent contract data sets and achieves good results.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of an intelligent contract security detection method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an intelligent contract security detection method provided by an embodiment of the present invention.
FIG. 3 is a block diagram of an intelligent contract security detection system provided by an embodiment of the present invention;
in the figure: 1. a word2vec model training module; 2. a contract function encapsulation module; 3. a conversion extraction module; 4. an information conversion module; 5. a graph node conversion module; 6. a graph model training module; 7. a graph model conversion module; 8. and a contract vulnerability judgment module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
1) In the experiment, the Fabric intelligent contract security detection model was implemented using TensorFlow 2.1.0.
2) An AST graph of the source code is constructed with the Go language AST package, and the different edge relations are extracted from it by traversing all AST nodes of the source code. During the traversal, all nodes are numbered sequentially, the relations for the different edge types are obtained according to specific rules, and variable names are rewritten using a uniform naming scheme. This step ensures that differences such as the variable names used in a program do not affect the choice of token embedding.
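The sequential numbering and uniform renaming described in item 2) can be sketched as follows. The token representation as `(kind, text)` pairs and the `VAR0`, `VAR1`, ... naming scheme are assumptions for illustration; the source does not state the actual scheme.

```python
from typing import Dict, List, Tuple


def normalize_identifiers(tokens: List[Tuple[str, str]]) -> List[Tuple[int, str]]:
    """Number tokens in traversal order and rename identifiers to VAR0, VAR1, ..."""
    names: Dict[str, str] = {}
    result: List[Tuple[int, str]] = []
    for index, (kind, text) in enumerate(tokens):
        if kind == "ident":
            # The first occurrence of a name gets the next free placeholder;
            # later occurrences reuse the same placeholder.
            text = names.setdefault(text, f"VAR{len(names)}")
        result.append((index, text))
    return result


# Tokens of the Go statement `balance := amount + balance`, in traversal order.
tokens = [("ident", "balance"), ("op", ":="), ("ident", "amount"),
          ("op", "+"), ("ident", "balance")]
normalized = normalize_identifiers(tokens)
```

Two functions that differ only in their variable names produce the same normalized token sequence, so they map to the same embeddings.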
3) A .py training file is created.
4) The GGNN model training code is written in this file. The code starts with the package import code listed below.
5) A .py test file is created in the same directory.
6) The GGNN model test code is written in this file. The code starts with the package import code listed below.
import json
import os
import random
import re
import sys
import psutil
import time
from typing import Dict, Optional, Callable, Any
import nni
import jsonlines
import numpy as np
import tensorflow as tf
from tensorflow.python.training.tracking import data_structures as tf_data_structures
from dpu_utils.utils import RichPath
from tensorflow_core.python.keras import Sequential
from tensorflow_core.python.keras.layers import Dense
7) Training and testing were performed on a Huacheng cloud server equipped with GPU: 1 × V100 NV32; CPU: 8 cores, 64 GiB.
8) The training samples are divided into several batches according to vulnerability type, and the GGNN model is trained on each batch separately; each batch consists of positive samples and negative samples.
9) In order to minimize the distance between the two probability distributions for the predicted and true values, the cross-entropy loss is chosen as the objective function.
10) The model is trained with mini-batch stochastic gradient descent (SGD) using the Adam algorithm with a learning rate of 0.001. Training terminates when the loss falls below 0.005 or the maximum of 100 training epochs is reached.
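The cross-entropy objective and the stopping rule can be sketched in plain Python. This is a minimal sketch, assuming binary 0/1 labels; `run_epoch` is a hypothetical callback standing in for one epoch of real training that returns the epoch loss.

```python
import math
from typing import Callable, List, Sequence


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean cross-entropy between 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def train_until_converged(run_epoch: Callable[[int], float],
                          loss_threshold: float = 0.005,
                          max_epochs: int = 100) -> List[float]:
    """Run epochs until the loss drops below the threshold or max_epochs is reached."""
    history: List[float] = []
    for epoch in range(max_epochs):
        loss = run_epoch(epoch)
        history.append(loss)
        if loss < loss_threshold:
            break
    return history


# Stand-in for real training: the loss halves every epoch, starting at 1.0.
history = train_until_converged(lambda epoch: 1.0 / (2 ** epoch))
```

The defaults 0.005 and 100 are the values stated in the text; everything else is illustrative.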
11) All methods were evaluated on their respective data sets using five-fold cross-validation, a standard method for evaluating the generalization ability of a predictive model.
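Five-fold cross-validation can be sketched as an index generator. The contiguous, unshuffled folds and the name `k_fold_indices` are assumptions; the source does not describe how the folds were formed.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # The first n_samples % k folds receive one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Every sample is used for testing exactly once across the five folds.
folds = list(k_fold_indices(12, k=5))
```

The model would be trained and evaluated once per fold, and the per-fold metrics averaged.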
12) The following four standard indices were selected as evaluation indices:
accuracy (Accuracy): ratio of correctly labeled cases to total number of test cases.
Precision (Precision): the ratio of correctly predicted samples to the total number of samples predicted to have a particular label.
Recall (Recall): the ratio of correctly predicted samples to the total number of test samples belonging to a class.
F1 score: the harmonic mean of Precision and Recall, calculated as 2 × (Recall × Precision)/(Recall + Precision). This index is helpful for testing across the vulnerability type distribution.
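The four indices above can be computed from true and predicted labels as in the following sketch. It assumes binary labels with 1 as the vulnerable class, and returns 0.0 where a ratio is undefined; both are choices made for this illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels (1 = vulnerable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: two true positives, one false negative, one true negative, one false positive.
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```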
13) To verify the accuracy of the model based on the gated graph neural network, a graph embedding model that contains no control flow or data flow information was designed for comparison testing against the model of this embodiment. The model of this embodiment is denoted GGNN-CP, and the model without control flow and data flow is denoted GGNN-noCP. The results for each evaluation index are shown below:
According to the test results, the accuracy of the GGNN-CP model and its ability to distinguish negative samples are higher than those of the GGNN-noCP model. GGNN-CP therefore has better source code representation capability than GGNN-noCP and can represent more of the internal relations of the source code.
14) The detection results for the samples are visualized using t-SNE (t-distributed Stochastic Neighbor Embedding), and the detection effect of GGNN-CP is clearly superior to that of GGNN-noCP.
Aiming at the problems in the prior art, the invention provides an intelligent contract security detection method, a system, equipment, a terminal and application thereof, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the intelligent contract security detection method provided by the embodiment of the present invention includes the following steps:
S101, using an open source Go code on Github to train a word2vec model;
S102, encapsulating open source Github codes into intelligent contract functions according to intelligent contract syntax;
S103, converting the packaged functions into an abstract syntax tree, and extracting data flow and control flow information;
S104, converting data flow and control flow information of the intelligent contract into a graph model;
S105, converting the graph nodes into vectors by using the trained word2vec model;
S106, training the graph model by using a graph neural network;
S107, reading out all node information, and converting the intelligent contract function graph model into a vector;
and S108, judging whether the function vector contains intelligent contract vulnerability information or not by using the classification model.
A schematic diagram of an intelligent contract security detection method provided by an embodiment of the present invention is shown in fig. 2.
As shown in fig. 3, the intelligent contract security detection system provided in the embodiment of the present invention includes:
a word2vec model training module 1, configured to train a word2vec model using an open source Go code on Github;
the contract function packaging module 2 is used for packaging the open source Github code into an intelligent contract function according to the intelligent contract grammar;
the conversion extraction module 3 is used for converting the packaged functions into abstract syntax trees and extracting data flow and control flow information;
the information conversion module 4 is used for converting data flow and control flow information of the intelligent contract into a graph model;
a graph node conversion module 5, configured to convert a graph node into a vector using a trained word2vec model;
the graph model training module 6 is used for training a graph model by using a graph neural network;
the graph model conversion module 7 is used for reading out all node information and converting the intelligent contract function graph model into a vector;
and the contract vulnerability judging module 8 is used for judging whether the function vector contains intelligent contract vulnerability information by using the classification model.
The technical solution of the present invention will be further described with reference to the following examples.
The invention aims to provide a Hyperledger Fabric intelligent contract security detection system based on a graph neural network, which realizes vulnerability detection for Hyperledger Fabric intelligent contracts.
The graph neural network based vulnerability detection method applied to Hyperledger Fabric comprises the following steps:
firstly, the model firstly trains a word2vec model by using open source codes, then the open source codes are packaged into intelligent contract functions, and a training data set is established through manual marking.
And secondly, for the data set in the first step, the data set is divided into 8 parts: 1: 1, sorting the training set, the verification set and the test set;
thirdly, processing the source code file by using an AST analysis tool developed by Go language to generate an AST graph structure of the source file;
and fourthly, storing the graph structure generated in the third step into a file, wherein the generated program graph structure file corresponds to the source file one by one. The program graph is constructed from grammar nodes in the AST (i.e., non-terminals in the language grammar, such as AST nodes declared by if statements or functions) and grammar tokens (terminals, such as identifier names and constant values). The standard AST node has only one edge to represent the parent-child relationship between two AST nodes. To obtain additional syntax, data, and control information, the model adds a number of edges to the AST, such as guard edges, jumps, last dictionaries, etc., records the edges of each relationship graph using an adjacency matrix, and adds, for each edge, an additional backward edge that helps propagate information in the relationship graph.
Fifth, the graph structure files from the fourth step are used for training with the word2vec algorithm, which is implemented by calling the Gensim library. The word2vec network maps the nodes and tokens of each program graph to vectors, so that words in the source code that are closely related in context lie close to each other in vector space. In this embodiment, each source code file is divided into functions, and the functions are processed one by one with the trained word2vec model to initialize the node vectors of each function graph. Data flow and control flow graphs are created, and the graph nodes and graph structure of each function are stored in separate files. The files are labeled 0 or 1 to indicate whether the function contains a vulnerability, and they serve as input data for the graph neural network.
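A minimal sketch of the node-vector initialization and labeling might look like the following. Several things here are assumptions made for illustration: the `embeddings` dictionary stands in for the lookup table of a trained word2vec model, unseen tokens fall back to a zero vector, the JSON record layout and the function name `readAsset` are hypothetical, and the vectors have 4 dimensions rather than 100 to keep the example short.

```python
import json

DIM = 4  # the description uses 100-dimensional vectors; 4 keeps the example small

# Stand-in for a trained word2vec lookup table (token -> vector).
embeddings = {
    "FuncDecl": [0.1, 0.2, 0.0, 0.5],
    "CallExpr": [0.3, 0.1, 0.4, 0.0],
    "GetState": [0.0, 0.6, 0.2, 0.1],
}


def init_node_vectors(node_labels, table, dim=DIM):
    """Look up each node label; unseen labels get a zero vector (assumption)."""
    return [list(table.get(label, [0.0] * dim)) for label in node_labels]


def function_record(name, node_labels, edges, vulnerable, table):
    """Bundle one function's graph with its 0/1 vulnerability label."""
    return {
        "function": name,
        "node_features": init_node_vectors(node_labels, table),
        "edges": edges,  # [edge_type, src, dst] triples
        "label": 1 if vulnerable else 0,
    }


record = function_record(
    name="readAsset",
    node_labels=["FuncDecl", "CallExpr", "GetState", "unknownToken"],
    edges=[["child", 0, 1], ["child", 1, 2], ["child", 1, 3]],
    vulnerable=False,
    table=embeddings,
)
serialized = json.dumps(record)  # one such record could be written per function
```

Storing node features, edges and the label together in one record per function keeps each training example self-contained when it is later read back as graph neural network input.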
Sixth, the graph neural network vulnerability detection model is trained to perform classification detection. The invention uses a neighborhood aggregation algorithm to learn the multi-relation graph: the GGNN model represents each node of the relation graph as a vector of 100 features, and the node embeddings are updated through a neighborhood aggregation scheme. The 100-dimensional embedding vector h_v of a graph node v is computed by the embedding layer by recursively aggregating and transforming the representation vectors of its neighboring nodes. Nodes exchange information by sending their current state (i.e., the embedding vector) as a message to all neighbors along the edges. At each node, the messages are aggregated and then used to update the node representation at the next embedding layer (i.e., the next iteration). After this process has been repeated for a fixed number of iterations to update the node states, the graph is aggregated into a single embedding vector using the readout function.
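The message-passing loop and readout can be illustrated with a deliberately simplified sketch. It is not the GGNN itself: a real GGNN applies a learned gated (GRU-style) update with trainable weights per edge type, whereas this example uses a fixed tanh of the node's own state plus the summed incoming messages. The element-wise sum readout, the function names and the toy 2-dimensional states are likewise assumptions.

```python
import math


def propagate(states, edges, steps):
    """Run a fixed number of neighbourhood-aggregation rounds.

    states: list of per-node embedding vectors
    edges:  list of (src, dst) pairs; each message flows src -> dst
    """
    dim = len(states[0])
    for _ in range(steps):
        # Each node receives the sum of its in-neighbours' current states.
        incoming = [[0.0] * dim for _ in states]
        for src, dst in edges:
            for k in range(dim):
                incoming[dst][k] += states[src][k]
        # Simplified update; a GGNN would apply a learned gated cell here.
        states = [
            [math.tanh(h[k] + m[k]) for k in range(dim)]
            for h, m in zip(states, incoming)
        ]
    return states


def readout(states):
    """Aggregate all node states into one graph-level vector (element-wise sum)."""
    return [sum(column) for column in zip(*states)]


node_states = [[0.5, 0.0], [0.0, 0.5], [0.1, 0.1]]
# Forward edges together with their backward counterparts.
edge_list = [(0, 1), (1, 0), (1, 2), (2, 1)]
graph_vector = readout(propagate(node_states, edge_list, steps=3))
```

The resulting `graph_vector` plays the role of the function vector that a classifier would then label as vulnerable or not.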
Compared with existing intelligent contract vulnerability detection systems, the Fabric-based intelligent contract vulnerability detection system and method provided by the embodiment of the invention, by applying a graph neural network, have the following advantages:
1. the complexity of manually defining functions by human experts is reduced, and the safety detection efficiency of intelligent contracts is improved;
2. the intelligent contract is learned through the graph neural network, and the model can reserve and reason more control and data flow information so as to capture more structural information of the intelligent contract vulnerability;
3. by labeling source code from open-source projects and training on the labeled code, the shortage of Fabric intelligent contract data sets is overcome and a good effect is achieved.
The graph embedding model of the invention reaches an accuracy of 0.942, a precision of 0.893, a recall of 1 and an F1 score of 0.943, whereas the graph embedding model without control flow and data flow reaches an accuracy of 0.915, a precision of 0.846, a recall of 1 and an F1 score of 0.917.
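The F1 scores can be cross-checked against the other figures, since F1 is the harmonic mean of precision and recall. The short check below treats 0.893 and 0.846 as the precision values and 1 as the recall; the function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


with_flow = f1_score(0.893, 1.0)     # model with control flow and data flow
without_flow = f1_score(0.846, 1.0)  # model without control flow and data flow
print(round(with_flow, 3), round(without_flow, 3))  # prints: 0.943 0.917
```

Both rounded values match the F1 scores reported in the text.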
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it may take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is intended only to illustrate the present invention and is not to be construed as limiting its scope, which covers all modifications, equivalents and improvements that fall within the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. An intelligent contract security detection method is characterized by comprising the following steps:
step one, training a word2vec model using open-source Go code on GitHub;
step two, encapsulating the open-source GitHub code into intelligent contract functions according to the intelligent contract syntax;
step three, converting the encapsulated functions into an abstract syntax tree, and extracting data flow and control flow information;
step four, converting the data flow and control flow information of the intelligent contract into a graph model;
step five, converting the graph nodes into vectors by using the trained word2vec model;
step six, training the graph model by using a graph neural network;
step seven, reading out all node information, and converting the intelligent contract function graph model into a vector;
and step eight, judging whether the function vector contains intelligent contract vulnerability information by using a classification model.
2. The intelligent contract security detection method of claim 1, further comprising:
(1) training a word2vec model on open-source code, packaging the open-source code into intelligent contract functions, and building a training data set through manual labeling;
(2) splitting the data set into a training set, a validation set and a test set in an 8:1:1 ratio;
(3) processing the source code file with an AST analysis tool developed in the Go language to generate the AST graph structure of the source file;
(4) saving the generated graph structure into a file, wherein the generated program graph structure files correspond one-to-one to the source files;
(5) training on the graph structure files from step (4) by using the word2vec algorithm;
(6) training a graph neural network vulnerability detection model to complete classification detection.
3. The intelligent contract security detection method of claim 2, wherein in step (4), the program graph is constructed from syntax nodes and syntax tokens in the AST; a standard AST has only one kind of edge, which represents the parent-child relationship between two AST nodes; the model adds a plurality of edges, including a protection edge, a jump and a final dictionary, to the AST to acquire additional syntax, data and control information, and records the edges of each relation graph by using an adjacency matrix; for each edge, a backward edge is additionally added for propagating information in the relation graph; wherein the syntax nodes in the AST are non-terminals in the language grammar, including the AST nodes for if statements or function declarations; and the syntax tokens are terminals, including identifier names and constant values.
4. The intelligent contract security detection method of claim 2, wherein in step (5), the training on the graph structure files using the word2vec algorithm is implemented by calling the Gensim library; the word2vec network maps the nodes and tokens of each program graph to vectors, so that words with a close contextual relation in the source code are adjacent to each other in vector space;
the source code file is divided by taking a function as a unit, and the functions are processed one by one by using the trained word2vec model to initialize the node vectors of each function graph; data flow and control flow graphs are created, and the graph nodes and graph structure of each function are stored in separate files, wherein the files are labeled 0 or 1 to indicate whether the function contains a vulnerability, and the files are used as input data of the graph neural network.
5. The intelligent contract security detection method according to claim 2, wherein in step (6), the training graph neural network vulnerability detection model comprises:
learning a multi-relation graph by using a neighborhood aggregation algorithm, representing each node of the relation graph as a vector containing 100 features by using a GGNN model, and updating the node embeddings by using a neighborhood aggregation scheme; the 100-dimensional embedding vector h_v of a graph node v is computed by the embedding layer by recursively aggregating and transforming the representation vectors of its neighboring nodes; the nodes exchange information by sending their current state, i.e. the embedding vector, as a message to all neighbors along the edges; at each node, the messages are aggregated for updating the node representation at the next embedding layer, i.e. the next iteration; after this process has been repeated for a fixed number of iterations to update the node states, the graph is aggregated into a single embedding vector by using the readout function.
6. An intelligent contract security detection system for implementing the intelligent contract security detection method according to any one of claims 1 to 5, wherein the intelligent contract security detection system comprises:
the word2vec model training module is used for training the word2vec model by using open-source Go code on GitHub;
the contract function packaging module is used for packaging the open-source GitHub code into intelligent contract functions according to the intelligent contract syntax;
the conversion extraction module is used for converting the packaged functions into an abstract syntax tree and extracting data flow and control flow information;
the information conversion module is used for converting data flow and control flow information of the intelligent contract into a graph model;
the graph node conversion module is used for converting the graph nodes into vectors by using the trained word2vec model;
the graph model training module is used for training the graph model by using a graph neural network;
the graph model conversion module is used for reading out all node information and converting the intelligent contract function graph model into a vector;
and the contract vulnerability judgment module is used for judging whether the function vector contains intelligent contract vulnerability information by using the classification model.
7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
training a word2vec model using open-source Go code on GitHub; encapsulating the open-source GitHub code into intelligent contract functions according to the intelligent contract syntax; converting the encapsulated functions into an abstract syntax tree, and extracting data flow and control flow information; converting the data flow and control flow information of the intelligent contract into a graph model;
converting the graph nodes into vectors by using a trained word2vec model; training a graph model by using a graph neural network; reading out all node information, and converting the intelligent contract function graph model into a vector; and judging whether the function vector contains intelligent contract vulnerability information or not by using the classification model.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
training a word2vec model using open-source Go code on GitHub; encapsulating the open-source GitHub code into intelligent contract functions according to the intelligent contract syntax; converting the encapsulated functions into an abstract syntax tree, and extracting data flow and control flow information; converting the data flow and control flow information of the intelligent contract into a graph model;
converting the graph nodes into vectors by using a trained word2vec model; training a graph model by using a graph neural network; reading out all node information, and converting the intelligent contract function graph model into a vector; and judging whether the function vector contains intelligent contract vulnerability information or not by using the classification model.
9. An information data processing terminal characterized by being used for implementing the intelligent contract security detection system according to claim 6.
10. An application of the intelligent contract security detection system of claim 6 in detecting vulnerabilities of Hyperledger Fabric intelligent contracts.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110862067.6A CN113626826A (en) | 2021-07-29 | 2021-07-29 | Intelligent contract security detection method, system, equipment, terminal and application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110862067.6A CN113626826A (en) | 2021-07-29 | 2021-07-29 | Intelligent contract security detection method, system, equipment, terminal and application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113626826A true CN113626826A (en) | 2021-11-09 |
Family
ID=78381463
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110862067.6A Pending CN113626826A (en) | 2021-07-29 | 2021-07-29 | Intelligent contract security detection method, system, equipment, terminal and application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113626826A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109697162A (en) * | 2018-11-15 | 2019-04-30 | 西北大学 | A kind of software defect automatic testing method based on Open Source Code library |
KR20200094618A (en) * | 2019-01-30 | 2020-08-07 | 주식회사 린아레나 | Method for auditing source code using smart contract similarity analysis and apparatus thereof |
WO2021037196A1 (en) * | 2019-08-28 | 2021-03-04 | 杭州趣链科技有限公司 | Smart contract code vulnerability detection method and apparatus, computer device and storage medium |
CN110674503A (en) * | 2019-09-24 | 2020-01-10 | 杭州云象网络技术有限公司 | Intelligent contract endless loop detection method based on graph convolution neural network |
CN111259394A (en) * | 2020-01-15 | 2020-06-09 | 中山大学 | Fine-grained source code vulnerability detection method based on graph neural network |
CN111488582A (en) * | 2020-04-01 | 2020-08-04 | 杭州云象网络技术有限公司 | Intelligent contract reentry vulnerability detection method based on graph neural network |
US11036614B1 (en) * | 2020-08-12 | 2021-06-15 | Peking University | Data control-oriented smart contract static analysis method and system |
CN112035842A (en) * | 2020-08-17 | 2020-12-04 | 杭州云象网络技术有限公司 | Intelligent contract vulnerability detection interpretability method based on codec |
CN112035841A (en) * | 2020-08-17 | 2020-12-04 | 杭州云象网络技术有限公司 | Intelligent contract vulnerability detection method based on expert rules and serialized modeling |
CN113127933A (en) * | 2021-03-22 | 2021-07-16 | 西北大学 | Intelligent contract Ponzi scheme detection method and system based on graph matching network |
CN113157723A (en) * | 2021-04-06 | 2021-07-23 | 福州大学 | SQL access method for Hyperledger Fabric |
Non-Patent Citations (3)
Title |
---|
ZHEN YANG ET AL.: "A Multi-Modal Transformer-based Code Summarization Approach for Smart Contracts", 《IEEE XPLORE》, 28 June 2021 (2021-06-28) * |
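YANG XIAOZHOU: "Smart contract collaborative development system based on Fabric blockchain", 《Journal of Nanjing University of Information Science and Technology (Natural Science Edition)》, vol. 11, no. 5, 31 May 2019 (2019-05-31) * |
CHEN ZHAOXUAN; ZOU DEQING; LI ZHEN; JIN HAI: "Intelligent vulnerability detection system based on abstract syntax tree", Journal of Cyber Security, no. 04, 15 July 2020 (2020-07-15) * |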
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114465887A (en) * | 2021-12-23 | 2022-05-10 | 杭州溪塔科技有限公司 | Method and device for block chain configuration management based on git |
CN114465887B (en) * | 2021-12-23 | 2024-01-23 | 杭州溪塔科技有限公司 | Block chain configuration management method and device based on git |
CN115080981A (en) * | 2022-06-22 | 2022-09-20 | 东北大学 | Intelligent contract vulnerability detection method based on local and sequence feature fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111783100B (en) | Source code vulnerability detection method for code graph representation learning based on graph convolution network | |
CN112818023B (en) | Big data analysis method and cloud computing server in associated cloud service scene | |
CN113626826A (en) | Intelligent contract security detection method, system, equipment, terminal and application | |
CN112580328A (en) | Event information extraction method and device, storage medium and electronic equipment | |
CN113486357A (en) | Intelligent contract security detection method based on static analysis and deep learning | |
CN112464233B (en) | RNN-based malicious software detection method on cloud platform | |
CN112925914A (en) | Data security classification method, system, device and storage medium | |
CN112507912A (en) | Method and device for identifying illegal picture | |
CN115204886A (en) | Account identification method and device, electronic equipment and storage medium | |
CN116992052B (en) | Long text abstracting method and device for threat information field and electronic equipment | |
CN117453917A (en) | Model training method and device, storage medium and electronic equipment | |
CN117370980A (en) | Malicious code detection model generation and detection method, device, equipment and medium | |
CN117291722A (en) | Object management method, related device and computer readable medium | |
CN116663018A (en) | Vulnerability detection method and device based on code executable path | |
CN107463578A (en) | Using download statistics De-weight method, device and terminal device | |
CN117056919A (en) | Software vulnerability detection method and system based on deep learning | |
CN115622810A (en) | Business application identification system and method based on machine learning algorithm | |
CN115935358A (en) | Malicious software identification method and device, electronic equipment and storage medium | |
CN115905293A (en) | Switching method and device of job execution engine | |
CN112765236B (en) | Adaptive abnormal equipment mining method, storage medium, equipment and system | |
CN115473734A (en) | Remote code execution attack detection method based on single classification and federal learning | |
CN114912628A (en) | Feature selection method and device, electronic equipment and computer-readable storage medium | |
CN115293872A (en) | Method for establishing risk identification model and corresponding device | |
CN116090538A (en) | Model weight acquisition method and related system | |
CN112132367A (en) | Modeling method and device for enterprise operation management risk identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||