CN113486357B - Smart contract security detection method based on static analysis and deep learning - Google Patents

Smart contract security detection method based on static analysis and deep learning

Info

Publication number
CN113486357B
Authority
CN
China
Prior art keywords
matrix
abstract
source program
intelligent contract
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110766768.XA
Other languages
Chinese (zh)
Other versions
CN113486357A (en
Inventor
周福才
罗熙霖
焦梓
孙劲桐
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学
Priority to CN202110766768.XA
Publication of CN113486357A
Application granted
Publication of CN113486357B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/425 Lexical analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a smart contract security detection method based on static analysis and deep learning, and relates to the technical field of blockchain smart contract security. The method comprises: performing static analysis on the Solidity source program of a smart contract to obtain a graph structure of the source program; extracting abstract facts from the graph structure; building, according to the abstract facts, a deep learning model that classifies vulnerabilities in the Solidity source program, the model comprising an input module, an attention module, a residual connection module and an output module; constructing a training data set; training the deep learning model with the training data set; and performing vulnerability detection on input smart contracts with the trained model and outputting a security detection result for the Solidity source program. The method can comprehensively analyze the behavior of the Solidity source program of a smart contract and improve the accuracy of its security detection.

Description

Smart contract security detection method based on static analysis and deep learning
Technical Field
The invention relates to the technical field of blockchain smart contract security, and in particular to a smart contract security detection method based on static analysis and deep learning.
Background
A smart contract is a special protocol deployed on a blockchain. Buterin recognized that decentralized computing is applicable beyond transactions and designed the Ethereum blockchain, which supports the execution of smart contracts. A smart contract contains code functions whose uses include trading, decision making and sending Ether. Smart contracts have proven suitable for many applications, including securities, communications, banking and medicine. However, smart contracts are transparent: participants can view their source code. Moreover, a smart contract cannot be changed once deployed, so it cannot be patched promptly after a vulnerability is discovered, and losses can only be reduced by measures such as suspending transactions or forking the chain. If a smart contract does not undergo security detection, its vulnerabilities cannot be repaired in time, which affects normal use of the contract and may even harm the interests of its users with serious consequences. Examples include the DAO attack, in which anonymous hackers exploited a reentrancy vulnerability in a smart contract to steal 3.6 million Ether; the Parity incident, in which an attacker found a timestamp vulnerability in the smart contract code library and exploited the timestamp inconsistency to destroy the library, causing a loss of USD 150 million; and a malicious contract incident, in which five hackers maliciously published 34,000 flawed smart contracts, which disrupted Ethereum and set off an abnormal chain reaction that led to the theft of Ether worth USD 4.4 million.
Under such severe security threats, there is currently no good general-purpose means of detecting smart contract vulnerabilities, and smart contract security still depends largely on the security expertise of contract developers and on code auditing based on expert experience. A scheme for effective, automated smart contract security detection is therefore needed. Existing automated security detection has the following problems: (1) it cannot analyze smart contract code with full coverage; (2) its false alarm rate is high; (3) it targets only specific attacks and is not easily extended to detect others.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a smart contract security detection method based on static analysis and deep learning, which aims to solve the problem of smart contract security detection.
The technical scheme of the invention is as follows:
A smart contract security detection method based on static analysis and deep learning, characterized by comprising the following steps:
Step 1: performing static analysis on the Solidity source program of the smart contract to obtain a graph structure of the source program; the static analysis comprises lexical analysis and syntax analysis; the graph structure comprises an abstract syntax tree AST and a control flow graph CFG;
Step 2: extracting abstract facts from the graph structure of the Solidity source program obtained in Step 1;
Step 3: building, according to the abstract facts of the Solidity source program obtained in Step 2, a deep learning model that classifies vulnerabilities in the Solidity source program, the model comprising an input module, an attention module, a residual connection module and an output module;
Step 4: constructing a training data set for the deep learning model;
Step 5: training the deep learning model with the training data set;
Step 6: performing vulnerability detection on input smart contracts with the trained deep learning model, and outputting a security detection result for the Solidity source program of the smart contract.
Further, according to the smart contract security detection method based on static analysis and deep learning, Step 1 specifically comprises the following steps:
Step 1.1: preprocessing the Solidity source program of the smart contract, deleting all content irrelevant to its security detection;
Step 1.2: importing the source code files referenced by import statements into the preprocessed Solidity source program to obtain a complete Solidity source program;
Step 1.3: converting the complete Solidity source program into an abstract syntax tree with an ANTLR parser;
Step 1.4: constructing the control flow graph CFG of the Solidity source program from the abstract syntax tree.
Further, according to the smart contract security detection method based on static analysis and deep learning, Step 1.3 specifically comprises the following steps:
Step 1.3.1: performing lexical analysis on the complete Solidity source program with the ANTLR parser, and labeling the words of the program with their attributes according to predefined word attribute categories, to obtain for each program statement a word sequence with word attribute labels;
Step 1.3.2: performing syntax analysis with the ANTLR parser on the word sequence of each program statement produced by lexical analysis, and determining the syntactic structure of each program statement according to predefined grammar rules; the syntactic structures comprise a contract structure, a function structure, a variable structure, an expression structure and a statement control flow structure;
Step 1.3.3: converting the Solidity source program into an abstract syntax tree with the ANTLR parser according to the syntactic structure of each program statement.
Further, according to the smart contract security detection method based on static analysis and deep learning, the word attribute categories include the keyword <keyword>, the visibility qualifier <qualifier>, the variable data type <variabletype>, the identifier <identifier>, the operator <operator> and the constant <constant>.
Further, according to the smart contract security detection method based on static analysis and deep learning, the predefined grammar rules are as follows:
a) Contract ::= "contract" <identifier> "{" [ContractBlock] "}";
b) ContractBlock ::= [Function] | [Variable];
c) Function ::= "function" <identifier> "(" [Variable] ")" <qualifier> [keyword] [Return] [";" | Block];
d) Variable ::= <variabletype> <qualifier> <identifier> ["=" Expression] ";";
e) Expression ::= Functioncall | <identifier> | Expression <operator> | Expression <operator> Expression | <identifier> <operator> <constant>;
f) Functioncall ::= Expression "(" Variable ")";
g) Block ::= "{" Statement "}";
h) Statement ::= IfStatement | WhileStatment | ForStatement | Variable | Expression | Block | "break" | "continue" | Return;
i) IfStatement ::= "if" "(" Expression ")" Block ["else" Block];
j) WhileStatment ::= "while" "(" Expression ")" Block;
k) ForStatement ::= "for" "(" [Variable] ";" [Expression] ";" [Expression] ")" Block;
l) Return ::= "return" [Expression].
Further, according to the smart contract security detection method based on static analysis and deep learning, Step 1.4 specifically comprises the following steps:
Step 1.4.1: constructing the basic blocks Block from the Block nodes of the abstract syntax tree AST, using the program statements that belong to a statement control flow structure; recording the statement number StmtId of every statement in each basic block; and recording the incoming and outgoing edges of each basic block;
Step 1.4.2: connecting the basic blocks: when the outgoing edge of one basic block equals the incoming edge of another, the two blocks are connected; when a basic block has more than one outgoing edge, its jump condition is recorded;
Step 1.4.3: recording the variable numbers VarId and the assignment operations in the program statements in static single assignment form, that is, each variable is assigned only once, and a variable that is assigned a second time is renamed.
Further, according to the smart contract security detection method based on static analysis and deep learning, the abstract facts contain all control flow information, data information and function information of the smart contract and are written in the Datalog language, with the following structural form:
predicate(arg1, ..., argn)
where predicate is the predicate name defined according to the structure of the Solidity source program, covering data types, function types, expression structures and control flow structures; and arg1, ..., argn are the parameters related to the content of the specific Solidity program statement.
Further, according to the smart contract security detection method based on static analysis and deep learning, abstract facts are extracted from the graph structure of the Solidity source program obtained in Step 1 as follows: the graph structure is traversed, and the abstract facts of the Solidity source program are extracted by keyword matching.
Further, according to the smart contract security detection method based on static analysis and deep learning, Step 3 specifically comprises the following steps:
Step 3.1: building the input module: the abstract facts obtained in Step 2 are represented by a 0-1 coding matrix X; word embedding and position embedding are each applied to the abstract facts represented by X; and the E matrix obtained by concatenating the word-embedding matrix with the position-embedding matrix is used as the input of the attention module;
Step 3.2: building the attention module, which specifically comprises the following steps:
Step 3.2.1: obtaining the Q matrix, K matrix and V matrix of the abstract facts from the E matrix through three linear transformations, and obtaining the attention coefficient matrix A of the abstract facts according to formula (4);
$$A = QK^{T} \tag{4}$$
where the Q matrix is the Query matrix of the abstract facts and consists of the Query vectors of each word of each abstract fact; the K matrix is the Key matrix of the abstract facts and consists of the Key vectors of each word of each abstract fact; and the V matrix is the Value matrix of the abstract facts and consists of the Value vectors of each word of each abstract fact;
Step 3.2.2: updating the element values of the V matrix according to the attention coefficient matrix A of the abstract facts and formula (5) to obtain the updated V matrix V';
where $d_k$ represents the arithmetic sum of squares of the K matrix, and the softmax function is the activation function;
Step 3.2.3: adding a layer normalization mechanism to the matrix V' of the attention module, which normalizes the elements of V', accelerates convergence and keeps the feature distribution stable;
Step 3.3: building the residual connection module, whose matrix calculation formula is:
$$Z = H(E) = E + F(E) = E + V'' \tag{9}$$
where the matrix E is the input of the attention module; V″ is the output of the attention module; Z is the output of the residual connection module; and F is the residual function. In the attention module, the mapping $H(E) \to Z$ is obtained through backpropagation; without the residual connection module, $F(E) \to 0$;
Step 3.4: building the output module, which outputs the probability of each possible vulnerability of the abstract facts, through the following steps:
Step 3.4.1: defining the vulnerability class output formula shown in formula (10), which outputs the vulnerability classification result for the abstract facts of a smart contract;
$$P_k = \mathrm{softmax}(\mathrm{Linear}(Z)) \tag{10}$$
where Linear denotes a linear function that applies one linear transformation to the matrix Z, and $P_k$ is the probability of each vulnerability type;
Step 3.4.2: constructing the loss function of the deep learning model, which gives the model its vulnerability classification capability.
Further, according to the smart contract security detection method based on static analysis and deep learning, the loss function is the multi-class cross-entropy loss shown in formula (11):
$$\mathrm{Loss}_1 = -\sum_k y_k \log(P_k) \tag{11}$$
where $y_k$ is the one-hot label of the abstract facts and k indexes the vulnerability class of the abstract facts.
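Formulas (9) to (11) can be traced with a small numeric sketch. The version below is only an illustration under stated assumptions: the text does not say how the matrix Z is reduced to one score per vulnerability class, so the sketch mean-pools the rows of Z before the linear layer, and the names `residual`, `output_module`, `cross_entropy`, `W` and `b` are hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def residual(E, V2):
    """Formula (9): Z = E + V'' (element-wise sum of two same-shape matrices)."""
    return [[e + v for e, v in zip(e_row, v_row)] for e_row, v_row in zip(E, V2)]

def output_module(Z, W, b):
    """Formula (10): P = softmax(Linear(Z)).
    Assumption: rows of Z are mean-pooled into one vector before the linear layer."""
    pooled = [sum(col) / len(Z) for col in zip(*Z)]
    logits = [sum(p * w for p, w in zip(pooled, w_col)) + b_k
              for w_col, b_k in zip(zip(*W), b)]
    return softmax(logits)

def cross_entropy(y, P):
    """Formula (11): Loss = -sum_k y_k * log(P_k), with y one-hot."""
    return -sum(y_k * math.log(p_k) for y_k, p_k in zip(y, P) if y_k)

E = [[1.0, 0.0], [0.0, 1.0]]             # attention-module input (toy values)
V2 = [[0.0, 1.0], [1.0, 0.0]]            # attention-module output V'' (toy values)
W = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # 2 features -> 3 vulnerability classes
b = [0.0, 0.0, 0.0]

Z = residual(E, V2)
P = output_module(Z, W, b)
loss = cross_entropy([1, 0, 0], P)
```

With these toy values the pooled vector is [1, 1], the class scores are [1, 0, 0], and the loss is the negative log-probability assigned to the first class.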
Compared with the prior art, the invention has the following beneficial effects:
1. The behavior of the Solidity source program of a smart contract can be analyzed comprehensively. Security detection of a smart contract first requires a comprehensive analysis of its code behavior. The method first derives the abstract syntax tree and control flow graph of the Solidity source program and then abstracts this graph structure into a fact representation. The abstract facts cover code behavior more completely and represent the semantic features of the program effectively, providing support for the deep learning model.
2. The extensibility of security detection for Solidity source programs is enhanced. Traditional security detection methods rely mainly on predefined rules and address only known security vulnerabilities. The deep learning model used by the method is not limited to specific vulnerabilities; it can be trained on a supplemented training set to detect additional kinds of vulnerabilities and is therefore easy to extend. For vulnerabilities that were previously unknown, the method gains the ability to detect them simply by retraining the model, so it extends better than traditional security detection methods.
3. The accuracy of security detection for Solidity source programs is improved. The method combines static analysis and deep learning to perform security detection on smart contracts and improves an existing deep learning model by adding an attention module that learns the key information in the abstract facts. By improving the vectorized representation of the abstract facts, it raises the accuracy of security detection classification and lowers the false alarm rate for security vulnerabilities.
Drawings
FIG. 1 is a flow chart of the smart contract security detection method based on static analysis and deep learning of the present invention;
FIG. 2 is an abstract syntax tree diagram of example code in an embodiment of the invention;
FIG. 3 is a schematic diagram of a deep learning model structure in an embodiment of the invention;
FIG. 4 is a schematic diagram of an attention module according to an embodiment of the invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and specific embodiments. The following examples merely illustrate the invention and do not limit its scope.
FIG. 1 is a flow chart of the smart contract security detection method based on static analysis and deep learning of the present invention, which comprises the following steps:
Step 1: performing static analysis on the Solidity source program of the smart contract to obtain a graph structure of the source program; the static analysis comprises lexical analysis and syntax analysis; the graph structure includes an abstract syntax tree (Abstract Syntax Tree, AST) and a control flow graph (Control Flow Graph, CFG).
Step 1.1: preprocessing the Solidity source program of the smart contract, deleting all content irrelevant to its security detection;
In a preferred embodiment, the preprocessing of the Solidity source program includes deleting single-line comments "//", multi-line comments "/* … */", spaces " ", line breaks "\n" and all other content unrelated to the security detection of the Solidity source program.
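The preprocessing described above might be sketched as follows. This is a minimal illustration, not the patented implementation: the helper `preprocess` and the sample contract `SimpleStorage` are hypothetical, the regular expressions ignore string literals that happen to contain `//` or `/*`, and whitespace is collapsed to single spaces rather than removed entirely so that adjacent words stay separable.

```python
import re

# Assumed patterns: block comments first, then line comments, then whitespace runs.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"//[^\n]*")
_WHITESPACE = re.compile(r"\s+")

def preprocess(source: str) -> str:
    """Strip comments, then collapse spaces and line breaks to a single space."""
    source = _BLOCK_COMMENT.sub(" ", source)
    source = _LINE_COMMENT.sub(" ", source)
    return _WHITESPACE.sub(" ", source).strip()

example = """
// stores a number
contract SimpleStorage {
    uint storedData; /* state
    variable */
}
"""
cleaned = preprocess(example)
print(cleaned)
```

A production preprocessor would more likely drop comments and whitespace in the lexer, where string literals are already recognized.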
Step 1.2: importing the source code files referenced by import statements into the preprocessed Solidity source program to obtain a complete Solidity source program.
Step 1.3: converting the complete Solidity source program into an abstract syntax tree with an ANTLR parser.
Step 1.3.1: performing lexical analysis on the complete Solidity source program with the ANTLR parser, and labeling the words of the program with their attributes according to predefined word attribute categories, to obtain for each program statement a word sequence with word attribute labels;
The word attribute categories include the keyword <keyword>, the visibility qualifier <qualifier>, the variable data type <variabletype>, the identifier <identifier>, the operator <operator> and the constant <constant>.
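The labeling performed in Step 1.3.1 can be illustrated with a hand-written tokenizer. This sketch does not use ANTLR; the word lists, the regular expression and the extra `punctuation` label are assumptions made for the example, and a real lexer would be generated from a full Solidity grammar.

```python
import re

# Illustrative, incomplete word lists (assumptions, not the patent's definitions).
KEYWORDS = {"contract", "function", "if", "else", "while", "for",
            "return", "break", "continue"}
QUALIFIERS = {"public", "private", "internal", "external"}
VARIABLE_TYPES = {"uint", "int", "bool", "address", "string"}
PUNCTUATION = set("{}();,.")

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||[-+*/%=<>!]|[{}();,.]")

def classify(word: str) -> str:
    """Map one word to a word attribute category."""
    if word in KEYWORDS:
        return "keyword"
    if word in QUALIFIERS:
        return "qualifier"
    if word in VARIABLE_TYPES:
        return "variabletype"
    if word[0].isdigit():
        return "constant"
    if word[0].isalpha() or word[0] == "_":
        return "identifier"
    if word in PUNCTUATION:
        return "punctuation"  # extra label, not one of the six categories in the text
    return "operator"

def tokenize(statement: str):
    """Return the word sequence of a statement with attribute labels."""
    return [(w, classify(w)) for w in TOKEN_RE.findall(statement)]

tokens = tokenize("uint public storedData = 1;")
print(tokens)
```

The labeled sequence for this statement lines up with the Variable rule of the grammar: a variable type, a qualifier, an identifier, then an optional "=" and expression.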
Step 1.3.2: performing syntax analysis with the ANTLR parser on the word sequence of each program statement produced by lexical analysis, and determining the syntactic structure of each program statement according to predefined grammar rules; the syntactic structures comprise a contract structure, a function structure, a variable structure, an expression structure and a statement control flow structure;
In a preferred embodiment, the grammar rules, predefined in BNF (Backus-Naur Form) according to the characteristics of the Solidity language, are as follows:
m) Contract ::= "contract" <identifier> "{" [ContractBlock] "}";
n) ContractBlock ::= [Function] | [Variable];
o) Function ::= "function" <identifier> "(" [Variable] ")" <qualifier> [keyword] [Return] [";" | Block];
p) Variable ::= <variabletype> <qualifier> <identifier> ["=" Expression] ";";
q) Expression ::= Functioncall | <identifier> | Expression <operator> | Expression <operator> Expression | <identifier> <operator> <constant>;
r) Functioncall ::= Expression "(" Variable ")";
s) Block ::= "{" Statement "}";
t) Statement ::= IfStatement | WhileStatment | ForStatement | Variable | Expression | Block | "break" | "continue" | Return;
u) IfStatement ::= "if" "(" Expression ")" Block ["else" Block];
v) WhileStatment ::= "while" "(" Expression ")" Block;
w) ForStatement ::= "for" "(" [Variable] ";" [Expression] ";" [Expression] ")" Block;
x) Return ::= "return" [Expression].
Step 1.3.3: converting the Solidity source program into an abstract syntax tree with the ANTLR parser according to the syntactic structure of each program statement;
For example, the code shown below is converted by the ANTLR parser into the abstract syntax tree shown in FIG. 2.
Step 1.4: constructing the control flow graph CFG of the Solidity source program from the abstract syntax tree, through the following steps:
Step 1.4.1: constructing the basic blocks Block from the Block nodes of the abstract syntax tree AST, using the program statements that belong to a statement control flow structure; recording the statement number StmtId of every statement in each basic block; and recording the incoming and outgoing edges of each basic block;
Step 1.4.2: connecting the basic blocks: when the outgoing edge of one basic block equals the incoming edge of another, the two blocks are connected; when a basic block has more than one outgoing edge, its jump condition is recorded;
Step 1.4.3: recording the variable numbers VarId and the assignment operations in the program statements in static single assignment form (SSA form), that is, each variable is assigned only once, and a variable that is assigned a second time is renamed.
For example, the static single assignment form of the assignment sequence "x=1; y=x+1; x=y;" is "x1=1; y=x1+1; x2=y;". Recording variable assignments in static single assignment form facilitates the subsequent analysis of abstract facts.
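The renaming in this example can be reproduced with a short sketch for straight-line code. It rests on simplifying assumptions: every statement is a plain `name = expression` assignment, there is no branching (so no merge points need handling), and version suffixes never collide with existing names. The function name `to_ssa` is hypothetical.

```python
import re
from collections import Counter

ASSIGN_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(.+?)\s*$")
NAME_RE = re.compile(r"[A-Za-z_]\w*")

def to_ssa(code: str) -> str:
    """Rename variables that are assigned more than once so each name is assigned once."""
    statements = [s for s in code.split(";") if s.strip()]
    parsed = [ASSIGN_RE.match(s).groups() for s in statements]
    counts = Counter(target for target, _ in parsed)
    version = {}  # latest version number of each multiply-assigned variable
    result = []
    for target, expr in parsed:
        def rename(match):
            name = match.group(0)
            return f"{name}{version[name]}" if name in version else name
        new_expr = NAME_RE.sub(rename, expr)   # uses refer to the latest version
        if counts[target] > 1:                 # only re-assigned variables get a suffix
            version[target] = version.get(target, 0) + 1
            target = f"{target}{version[target]}"
        result.append(f"{target}={new_expr}")
    return "; ".join(result) + ";"

ssa = to_ssa("x=1; y=x+1; x=y;")
print(ssa)
```

As in the example from the text, `y` keeps its name because it is assigned only once, while each assignment to `x` receives a new version.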
Step 2: extracting abstract facts from the graph structure of the Solidity source program obtained in Step 1; specifically, the graph structure is traversed and the abstract facts of the Solidity source program are extracted by keyword matching.
The abstract facts are written in the Datalog language and contain all control flow information, data information and function information of the smart contract; this information comprises the key features related to security vulnerabilities.
In a preferred embodiment, the abstract facts are structured as follows:
predicate(arg1, ..., argn)
Here, predicate is the predicate name defined according to the structure of the Solidity program, and arg1, ..., argn are the parameters related to the content of the specific Solidity program statement.
In the preferred embodiment, there are four kinds of predicate names: data type, function type, expression structure and control flow structure. The predicate names and parameters are defined as follows:
All nodes of the AST of the Solidity source program are traversed. The predicate name of the data type operation node Variable is defined as VarDecl. The predicate name of the function type operation node Function is defined as FunDecl. The predicate name of the function call node Functioncall in an expression structure is defined as FunCall. Calls to special functions include the address-related functions call, callcode, delegatecall, send and transfer and the error handling functions revert, assert and require; the predicate name of a special function is its original name. The parameters are the statement number, the variable number and the parameters of all leaf nodes corresponding to the node.
The control flow graph of the Solidity source program is traversed. The predicate name of an assignment operation Assign between variables is defined as VarAss, and its parameters are the corresponding statement number StmtId and the related variable numbers VarId. The predicate name for statements in the same basic block is defined as Block, and its parameters are the basic block number BlockId and the statement number StmtId. When a path exists between basic blocks, the predicate name is defined as BlockPath, and its parameters are the corresponding basic block numbers BlockId.
For example, the abstract facts extracted by traversing the graph structure generated from the example code in Step 1.3.3 are shown below:
VarDecl(StmtId='S00',VarId='V00',variabletype='uint',identifier='storedData')
Block(BlockId='B00',StmtId='S00')
FunDecl(identifier='set',VarId='V01',qualifier='public')
VarDecl(VarId='V01',variabletype='uint',identifier='x')
Block(BlockId='B01',StmtId='S01')
VarAss(StmtId='S01',VarId='V00',VarId='V01')
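How facts of this shape might be produced from already-parsed nodes is sketched below. The node dictionaries and their keys (`type`, `stmt`, `var`, `vtype`, `name`, `param`, `qualifier`) are a hypothetical layout chosen for the example; the patent does not describe the in-memory form of the AST. Arguments are passed as (name, value) pairs because a fact such as VarAss repeats the VarId argument.

```python
def fact(predicate, *args):
    """Render predicate(name='value',...) from (name, value) pairs."""
    body = ",".join(f"{name}='{value}'" for name, value in args)
    return f"{predicate}({body})"

def extract_facts(ast_nodes):
    """Emit VarDecl / FunDecl facts for a flat list of toy AST nodes."""
    facts = []
    for node in ast_nodes:
        if node["type"] == "Variable":
            facts.append(fact("VarDecl",
                              ("StmtId", node["stmt"]), ("VarId", node["var"]),
                              ("variabletype", node["vtype"]),
                              ("identifier", node["name"])))
        elif node["type"] == "Function":
            facts.append(fact("FunDecl",
                              ("identifier", node["name"]),
                              ("VarId", node["param"]),
                              ("qualifier", node["qualifier"])))
    return facts

toy_ast = [
    {"type": "Variable", "stmt": "S00", "var": "V00",
     "vtype": "uint", "name": "storedData"},
    {"type": "Function", "name": "set", "param": "V01", "qualifier": "public"},
]
facts = extract_facts(toy_ast)
assignment = fact("VarAss", ("StmtId", "S01"), ("VarId", "V00"), ("VarId", "V01"))
print("\n".join(facts + [assignment]))
```

The printed lines match the first, third and last facts of the listing above.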
Step 3: building, according to the abstract facts of the Solidity source program obtained in Step 2, a deep learning model that classifies vulnerabilities in the Solidity source program;
In a preferred embodiment, the deep learning model is designed on the basis of the Transformer architecture and, as shown in FIG. 3, comprises four modules: an input module, an attention module, a residual connection module and an output module. The model is built as follows:
Step 3.1: building the input module: the abstract facts obtained in Step 2 are vectorized and represented by a 0-1 coding matrix X. Because X is too sparse, the abstract facts it represents undergo dimensionality reduction, namely word embedding and position embedding, and the resulting matrix is the input required by the attention module. The specific steps are:
Step 3.1.1: performing word embedding on the abstract facts represented by the 0-1 coding matrix X according to formula (1) to obtain the word matrix X':
$$X'_{l \times d} = \tanh(X_{l \times v} W_1) \tag{1}$$
where $W_1$ is the parameter matrix to be trained in the input module; l is the number of lines of the longest set of abstract facts among those of the different Solidity source programs; v is the vocabulary size of the abstract facts; and d is the word-vector dimension after dimensionality reduction.
Step 3.1.2: performing position embedding on the abstract facts represented by the 0-1 coding matrix X;
To help the deep learning model capture the position information of the abstract facts, the input module introduces a position coding mechanism for them, namely position embedding.
In a preferred embodiment, the position of each statement in the abstract facts is represented by a matrix P, and an activation function is applied to P according to formula (2) to obtain the position coding matrix P':
$$P'_{l \times d} = \tanh(P_{l \times d}) \tag{2}$$
The matrix P is randomly initialized before training; after training, the position coding matrix P', made up of the position vector for each position, is obtained.
Step 3.1.3: for the abstract facts of a smart contract, concatenating the position coding matrix P' with the word matrix X' according to formula (3) to obtain the E matrix, which is the input of the attention module.
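The input module of Steps 3.1.1 to 3.1.3 can be sketched in plain Python. Formula (3) is not reproduced in this text, so the sketch assumes that "splicing" means joining each row of X' with the matching row of P', giving an l × 2d matrix; that reading, the random toy parameters and the helper names are all assumptions.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def tanh_matrix(m):
    return [[math.tanh(x) for x in row] for row in m]

def one_hot(token_ids, vocab_size):
    """0-1 coding matrix X: one row per word, one column per vocabulary entry."""
    return [[1.0 if j == t else 0.0 for j in range(vocab_size)] for t in token_ids]

def input_module(token_ids, W1, P):
    X = one_hot(token_ids, len(W1))          # l x v
    X_emb = tanh_matrix(matmul(X, W1))       # formula (1): l x d
    P_emb = tanh_matrix(P)                   # formula (2): l x d
    # Assumed reading of formula (3): row-wise concatenation, l x 2d.
    return [x_row + p_row for x_row, p_row in zip(X_emb, P_emb)]

rng = random.Random(0)
v, d = 5, 3                                  # toy vocabulary size and embedding width
token_ids = [0, 2, 4, 1]                     # toy word indices of one abstract-fact sequence
W1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(v)]
P = [[rng.uniform(-1, 1) for _ in range(d)] for _ in token_ids]
E = input_module(token_ids, W1, P)
```

Because X is one-hot, the product X W_1 simply selects the row of W_1 that belongs to each word, which is why the embedding acts as a dimensionality reduction from v to d.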
Step 3.2: Build the attention module. A schematic diagram of the attention module is shown in Fig. 4.
The attention module is the core of the deep learning model. Its attention mechanism computes the attention coefficients between the words of the abstract facts, so that the vector corresponding to each word of each abstract fact also carries information from the vectors of the other words, which helps capture the key information in the abstract facts. The attention coefficients between each word and the other words in the abstract facts are obtained by matrix multiplication.
In a preferred embodiment, the specific steps for building the attention module are as follows:
Step 3.2.1: Calculate the attention coefficients between the words of the abstract facts to obtain the attention coefficient matrix of the abstract facts.
The attention coefficients in the preferred embodiment are calculated in a manner similar to BERT, using three matrices: the Q matrix, the K matrix and the V matrix. The Q matrix is the Query matrix of the abstract facts and consists of the Query vectors corresponding to each word of each abstract fact; the K matrix is the Key matrix of the abstract facts and consists of the Key vectors corresponding to each word of each abstract fact; the V matrix is the Value matrix of the abstract facts and consists of the Value vectors corresponding to each word of each abstract fact. The three matrices are obtained from the E matrix through three separate linear transformations. Their values are random in the initial state and become meaningful representations after training.
The attention coefficient matrix A of the abstract facts is obtained according to formula (4):
$$A = QK^{T} \tag{4}$$
Step 3.2.2: Update the element values in the V matrix according to the attention coefficient matrix A of the abstract facts to obtain the updated V matrix V'.
In a preferred embodiment, after the attention coefficient matrix A is obtained, the element values in the V matrix are updated according to formula (5) to obtain the updated V matrix V'.
Here d_k denotes the arithmetic sum of squares of the K matrix. The square root in formula (5) rescales the multiplied dimension back to its original size, which reduces jitter in the gradient updates during back propagation. Softmax is an activation function; applying it adds a nonlinear transformation that strengthens the representational capacity of the V' matrix.
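The attention computation can be sketched in Python as below. The sketch assumes that formula (5) takes the standard scaled dot-product form V' = softmax(A / sqrt(d_k)) V used by BERT-style models, with d_k taken to be the length of a key vector; both points are assumptions. The weight matrices wq, wk and wv are hypothetical names for the three linear transformations.

```python
import math

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(e, wq, wk, wv):
    """Return (A, V') for input E and the three linear maps (assumed bias-free)."""
    q, k, v = matmul(e, wq), matmul(e, wk), matmul(e, wv)
    a = matmul(q, transpose(k))                      # formula (4): A = Q K^T
    d_k = len(k[0])                                  # assumption: key vector length
    weights = [softmax([x / math.sqrt(d_k) for x in row]) for row in a]
    return a, matmul(weights, v)                     # assumed form of formula (5)

# Tiny illustrative input: 3 words, 2 features, identity transformations.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
A, V_new = attention(E, I2, I2, I2)
```

Each row of V' is a weighted average of the rows of V, which is how every word vector comes to carry information from the other words.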
Step 3.2.3: Add a layer normalization mechanism to the matrix V' of the attention module, so that the elements of V' are standardized, convergence is accelerated, and the feature distribution remains stable.
The layer normalization mechanism considers the inputs of all dimensions of the matrix V', calculates the mean and the variance of the inputs, and then transforms the input of each dimension with the same normalization operation. The mean formula for all elements of the V' matrix is as follows:
the variance formula for all elements of the V' matrix is as follows:
where n^(v) is the number of elements in V', μ^(v) is the mean, (σ^(v))² is the variance, and σ^(v) is the standard deviation. Each element v_i in the matrix V' is normalized according to formula (8):
In the above, v_i' is the normalized value of each element v_i in the matrix V'.
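A minimal sketch of the layer normalization described above, assuming the usual definitions: the mean and variance are taken over all n^(v) elements of V', and each element is normalized as (v_i - μ^(v)) / σ^(v). The small constant eps, added to avoid division by zero, is an assumption and is not stated in the text.

```python
import math

def layer_normalize(matrix, eps=1e-5):
    """Normalize all elements of a matrix with one shared mean and standard deviation."""
    values = [v for row in matrix for v in row]
    n = len(values)                                    # n^(v): number of elements
    mean = sum(values) / n                             # mu^(v)
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance + eps)                    # sigma^(v), eps is an assumption
    return [[(v - mean) / std for v in row] for row in matrix]

V_prime = [[1.0, 2.0], [3.0, 4.0]]       # illustrative values only
V_norm = layer_normalize(V_prime)        # taken here to play the role of V''
```

After normalization the elements have a mean of approximately 0 and a variance of approximately 1, regardless of the scale of the original matrix.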
Step 3.3: Build the residual connection module.
Because the vocabulary of the source input (the abstract facts) of the deep learning model in the preferred embodiment is small, the attention module may over-capture the relations between words; adding the residual connection module can mitigate this problem to some extent.
In a preferred embodiment, the matrix calculation formula of the residual connection module is as follows:
$$Z = H(E) = E + F(E) = E + V'' \tag{9}$$
where the matrix E is the input of the attention module and V'' is the output of the attention module; adding these two matrices yields the output Z of the residual connection module. F is the residual function; in the attention module, the mapping H(E) → Z is obtained through back propagation, and if there is no residual connection module, F(E) → 0.
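Formula (9) is an element-wise matrix addition, which requires E and V'' to have the same shape. A short sketch, with a hypothetical helper name and illustrative values:

```python
def residual_connect(e, v_out):
    """Formula (9): Z = E + V'', element-wise; both matrices must share one shape."""
    if len(e) != len(v_out) or any(len(a) != len(b) for a, b in zip(e, v_out)):
        raise ValueError("E and V'' must have the same shape")
    return [[x + y for x, y in zip(e_row, v_row)] for e_row, v_row in zip(e, v_out)]

E = [[1.0, 2.0], [0.5, -1.0]]          # illustrative attention-module input
V_out = [[0.5, -2.0], [0.0, 0.25]]     # illustrative attention-module output V''
Z = residual_connect(E, V_out)
```

When the attention output is zero, Z equals E, so the input passes through the module unchanged.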
Step 3.4: Build the output module.
The output module outputs the probability that the abstract facts contain each vulnerability, and the loss function is used to maximize the security vulnerability detection capability of the deep learning model.
In a preferred embodiment, the specific steps for building the output module are as follows:
Step 3.4.1: Define the vulnerability class output formula shown in formula (10), which outputs the vulnerability class results for the abstract facts of the intelligent contract.
$$P_k = \mathrm{softmax}(\mathrm{Linear}(Z)) \tag{10}$$
where Linear denotes a linear function, that is, one linear transformation applied to the matrix Z; the softmax function is the activation function; and P_k is the probability value of each vulnerability type.
Step 3.4.2: Construct the loss function of the deep learning model, through which the model acquires its vulnerability classification capability. The loss function is the multi-class cross entropy loss function shown in formula (11).
$$\mathrm{Loss}_1 = -\sum_{k} y_k \log(P_k) \tag{11}$$
where y_k denotes the one-hot coded label corresponding to the abstract fact, and k denotes the vulnerability class corresponding to the abstract fact.
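Formulas (10) and (11) can be sketched as follows. The example starts from a vector of class scores, assumed to be the result of Linear(Z) for one contract; how the matrix Z is reduced to one score per class is not specified here and is left out. The four classes, the score values and the small constant eps are illustrative assumptions.

```python
import math

def softmax(logits):
    """Formula (10), activation part: turn class scores into probabilities P_k."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_onehot, probs, eps=1e-12):
    """Formula (11): Loss_1 = -sum_k y_k * log(P_k); eps guards against log(0)."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_onehot, probs))

logits = [2.0, 0.5, -1.0, 0.0]   # assumed output of Linear(Z), one score per class
probs = softmax(logits)          # P_k
label = [1, 0, 0, 0]             # one-hot label y_k: the first class is the true one
loss = cross_entropy(label, probs)
```

With a one-hot label only the term of the true class survives, so the loss reduces to the negative log of the probability assigned to that class.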
Step 4: Construct the training data set of the deep learning model.
The vulnerability detection problem can be regarded as a multi-class classification problem in machine learning. Since classification is a supervised learning problem, both data (Solidity programs) and labels for the data (vulnerability types) are required. Constructing the training data set of the deep learning model therefore includes acquiring the data and labeling it with vulnerability types.
In the preferred embodiment, a total of 1500 program files of real Ethereum intelligent contracts are first collected. The 1500 program files are then manually labeled according to the SWC Registry definitions of intelligent contract vulnerabilities, yielding the training data set of the deep learning model. SWC Registry is currently the mainstream reference for intelligent contract vulnerability labeling. It is built by Ethereum security personnel and developers in the Smart Contract Security organization. The registry provides a classification of security vulnerabilities, partial test cases of Ethereum intelligent contracts, and the consequences of the vulnerabilities. The number and proportion of vulnerabilities of each class in the training data set are shown in Table 2.
Table 2. Vulnerability counts and proportions
Vulnerability class                   Count   Proportion
Reentrant vulnerability               1014    67.6%
Timestamp dependency vulnerability    715     46.7%
Endless loop vulnerability            326     21.7%
No vulnerability                      293     19.5%
Step 5: Train the deep learning model using the training data set.
In a preferred embodiment, the training of the deep learning model is split into two steps. The first step is pre-training, whose aim is to make the value of the loss function of the deep learning model drop rapidly. The second step is fine-tuning training (Finetune Train), which further improves the security detection capability of the deep learning model. Combining pre-training with fine-tuning training gives the deep learning model better robustness and extensibility.
In a preferred embodiment, the pre-training and fine-tuning training of the deep learning model are performed on a Jupyter Notebook platform with GPU resources. During pre-training, the batch size is set to 16, the number of epochs to 80, and the optimizer to Adam; pre-training stops and fine-tuning training starts when the loss value stabilizes at 1. During fine-tuning training, the batch size is set to 4, the number of epochs to 20, and the optimizer to SGD; fine-tuning training stops when the loss value stabilizes at 0.1. After pre-training and fine-tuning training, the deep learning model is able to classify vulnerabilities in intelligent contracts.
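The two-phase schedule can be written down as plain configuration, as in the sketch below. The hyperparameters are the ones given above; the class name, the function name and the stopping rule itself (the last few losses lie at or below the target and differ by at most a small tolerance) are assumptions, since the text only says that training stops when the loss value stabilizes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phase:
    name: str
    batch_size: int
    epochs: int
    optimizer: str
    stop_loss: float  # loss level at which the phase is considered finished

PRETRAIN = Phase("pre-training", batch_size=16, epochs=80, optimizer="Adam", stop_loss=1.0)
FINETUNE = Phase("fine-tuning", batch_size=4, epochs=20, optimizer="SGD", stop_loss=0.1)

def should_stop(losses, phase, window=3, tol=0.05):
    """Assumed rule: the last `window` losses are near or below stop_loss and nearly constant."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    near_target = max(recent) <= phase.stop_loss + tol
    stable = max(recent) - min(recent) <= tol
    return near_target and stable

history = [3.0, 2.0, 1.02, 1.0, 0.99]      # illustrative per-epoch loss values
switch_to_finetune = should_stop(history, PRETRAIN)
```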
Step 6: Perform vulnerability detection on the input intelligent contract using the trained deep learning model, and output the security detection result of the intelligent contract Solidity source program.
The trained deep learning model performs vulnerability detection on the intelligent contract and outputs a probability value for each vulnerability type. If the output probability value is greater than or equal to 0.5, the intelligent contract is considered to have that vulnerability; if it is less than 0.5, the contract is considered not to have it. The method can thus detect the security of intelligent contracts effectively and automatically.
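The decision rule above amounts to thresholding each output probability at 0.5. A short sketch, in which the function name, the class names and the probability values are illustrative:

```python
def detect(probabilities, threshold=0.5):
    """Flag a vulnerability type when its probability is greater than or equal to the threshold."""
    return {name: prob >= threshold for name, prob in probabilities.items()}

# Illustrative model output for one contract.
output = {"reentrant": 0.83, "timestamp_dependency": 0.5, "endless_loop": 0.12}
report = detect(output)
```

A probability of exactly 0.5 is flagged, matching the "greater than or equal to" wording of the rule.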
It should be apparent that the above-described embodiments are merely some, but not all, embodiments of the present invention. The above examples are only for explaining the present invention and do not limit the scope of the present invention. Based on the above embodiments, all other embodiments, i.e. all modifications, equivalents and improvements made within the spirit and principles of the present application, which are obtained by persons skilled in the art without making creative efforts are within the scope of the present invention claimed.

Claims (8)

1. An intelligent contract security detection method based on static analysis and deep learning, characterized by comprising the following steps:
step 1: performing static analysis on the intelligent contract Solidity source program to obtain a graph structure of the intelligent contract Solidity source program; the static analysis comprises lexical analysis and syntax analysis; the graph structure comprises an abstract syntax tree AST and a control flow graph CFG;
step 2: extracting abstract facts from the graph structure of the Solidity source program obtained in step 1;
step 3: according to the abstract facts of the Solidity source program obtained in step 2, building a deep learning model for performing vulnerability classification on the Solidity source program, the deep learning model comprising: an input module, an attention module, a residual connection module and an output module;
step 4: constructing a training data set of the deep learning model;
step 5: training the deep learning model by utilizing the training data set;
step 6: performing vulnerability detection on the input intelligent contract by using the trained deep learning model, and outputting a security detection result of the intelligent contract Solidity source program;
the step 3 specifically comprises the following steps:
step 3.1: building an input module: the abstract facts obtained in the step 2 are represented by a 0-1 coding matrix X, word embedding processing and position embedding processing are respectively carried out on the abstract facts represented by the 0-1 coding matrix X, and an E matrix obtained by splicing a matrix obtained after the word embedding processing and a matrix obtained after the position embedding processing is used as input of an attention module;
step 3.2: building the attention module, which specifically comprises the following steps:
step 3.2.1: obtaining the Q matrix, the K matrix and the V matrix of the abstract facts from the E matrix through three linear transformations, and obtaining the attention coefficient matrix A of the abstract facts according to formula (4);
$$A = QK^{T} \tag{4}$$
the Q matrix is the Query matrix of the abstract facts and consists of the Query vectors corresponding to each word of each abstract fact; the K matrix is the Key matrix of the abstract facts and consists of the Key vectors corresponding to each word of each abstract fact; the V matrix is the Value matrix of the abstract facts and consists of the Value vectors corresponding to each word of each abstract fact;
step 3.2.2: updating the element values in the V matrix according to the attention coefficient matrix A of the abstract facts and formula (5) to obtain the updated V matrix V';
wherein d_k represents the arithmetic sum of squares of the K matrix; the softmax function is the activation function;
step 3.2.3: a layer normalization mechanism is added into a matrix V 'of the attention module, so that elements in the matrix V' are more standard, convergence is accelerated, and stability of feature distribution is ensured;
step 3.3: building the residual connection module, wherein the matrix calculation formula of the residual connection module is as follows:
$$Z = H(E) = E + F(E) = E + V'' \tag{9}$$
wherein the matrix E is the input of the attention module; V'' is the output of the attention module; Z is the output of the residual connection module; F is the residual function; in the attention module, the mapping H(E) → Z is obtained through back propagation, and if there is no residual connection module, F(E) → 0;
step 3.4: building an output module to output the possible vulnerability probability of abstract facts, wherein the concrete steps of building the output module are as follows:
step 3.4.1: defining the vulnerability class output formula shown in formula (10) for outputting the vulnerability class results for the abstract facts of the intelligent contract;
$$P_k = \mathrm{softmax}(\mathrm{Linear}(Z)) \tag{10}$$
wherein Linear represents a linear function that performs one linear transformation on the matrix Z; P_k is the probability value of each vulnerability type;
step 3.4.2: constructing the loss function of the deep learning model, so that the model has vulnerability classification capability;
the loss function is the multi-class cross entropy loss function shown in formula (11):
$$\mathrm{Loss}_1 = -\sum_{k} y_k \log(P_k) \tag{11}$$
wherein y_k represents the one-hot coded label corresponding to the abstract fact, and k represents the vulnerability class corresponding to the abstract fact.
2. The intelligent contract security detection method based on static analysis and deep learning as claimed in claim 1, wherein step 1 specifically includes the following steps:
step 1.1: preprocessing the intelligent contract Solidity source program, and deleting all content irrelevant to the security detection of the Solidity source program;
step 1.2: importing the source code files corresponding to the import statements into the preprocessed intelligent contract Solidity source program to obtain a complete Solidity source program;
step 1.3: converting the complete Solidity source program into an abstract syntax tree by using an ANTLR parser;
step 1.4: constructing the control flow graph CFG of the Solidity source program according to the abstract syntax tree.
3. The intelligent contract security detection method based on static analysis and deep learning as claimed in claim 2, wherein step 1.3 specifically includes the following steps:
step 1.3.1: performing lexical analysis on the complete Solidity source program by using an ANTLR parser, and marking the attribute of each word in the Solidity source program according to the predefined word attribute categories to obtain, for each program statement, a word sequence with word attribute marks;
step 1.3.2: for the word sequence corresponding to each program statement generated by lexical analysis, performing syntax analysis by using an ANTLR parser, and determining the grammar structure of each program statement according to the predefined grammar rules; the grammar structure comprises a contract structure, a function structure, a variable structure, an expression structure and a statement control flow structure;
step 1.3.3: according to the grammar structure of each program statement, converting the Solidity source program into an abstract syntax tree by using an ANTLR parser.
4. The intelligent contract security detection method based on static analysis and deep learning of claim 3, wherein the word attribute categories include keyword <keyword>, visibility qualifier <qualifier>, variable data type <variabletype>, identifier <identifier>, operator <operator> and constant <constant>.
5. The intelligent contract security detection method based on static analysis and deep learning of claim 3, wherein the predefined grammar rules are as follows:
a) Contract ::= "contract" <identifier> "{" [ContractBlock] "}";
b) ContractBlock ::= [Function] | [Variable];
c) Function ::= "function" <identifier> "(" [Variable] ")" <qualifier> [keyword] [Return] [";" | Block];
d) Variable ::= <variabletype> <qualifier> <identifier> ["=" Expression] ";";
e) Expression ::= Functioncall | <identifier> | Expression <operator> | Expression <operator> Expression | <identifier> <operator> <constant>;
f) Functioncall ::= Expression "(" Variable ")";
g) Block ::= "{" Statement "}";
h) Statement ::= IfStatement | WhileStatment | ForStatement | Variable | Expression | Block | "break" | "continue" | Return;
i) IfStatement ::= "if" "(" Expression ")" Block ["else" Block];
j) WhileStatment ::= "while" "(" Expression ")" Block;
k) ForStatement ::= "for" "(" [Variable] ";" [Expression] ";" [Expression] ")" Block;
l) Return ::= "return" [Expression].
6. The intelligent contract security detection method based on static analysis and deep learning as claimed in claim 2, wherein step 1.4 specifically includes the following steps:
step 1.4.1: constructing the different basic blocks Block according to the Block nodes in the abstract syntax tree AST, using the program statements that belong to the statement control flow structure; recording the statement number StmtId of each statement in each basic block, and recording the in-edges and out-edges of each basic block;
step 1.4.2: connecting the basic blocks: when the out-edge of one basic block is equal to the in-edge of another basic block, the two basic blocks are connected, and when a basic block has more than one out-edge, the jump condition of the basic block is recorded;
step 1.4.3: recording the number VarId of each variable in the program statements and its assignment operation in static single assignment form, that is, each variable is assigned only once, and a variable that is assigned a second time is renamed.
7. The intelligent contract security detection method based on static analysis and deep learning as claimed in claim 1, wherein the abstract facts include all control flow information, data information and function information of the intelligent contract and are written in the Datalog language, and the abstract facts have the following structural form:
wherein predicate is the predicate name defined according to the structure of the Solidity source program, covering data types, function types, expression structures and control flow structures; arg1, ..., argn are the other parameters related to the content of the specific Solidity program statement.
8. The intelligent contract security detection method based on static analysis and deep learning according to claim 1 or 7, wherein the method for extracting abstract facts from the graph structure of the Solidity source program obtained in step 1 is as follows: traversing the graph structure of the Solidity source program, and extracting the abstract facts of the Solidity source program by keyword matching.
CN202110766768.XA 2021-07-07 2021-07-07 Intelligent contract security detection method based on static analysis and deep learning Active CN113486357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110766768.XA CN113486357B (en) 2021-07-07 2021-07-07 Intelligent contract security detection method based on static analysis and deep learning

Publications (2)

Publication Number Publication Date
CN113486357A CN113486357A (en) 2021-10-08
CN113486357B true CN113486357B (en) 2024-02-13

Family

ID=77941656

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant