CN110737899B - Intelligent contract security vulnerability detection method based on machine learning - Google Patents


Publication number
CN110737899B
Authority
CN
China
Prior art keywords
intelligent contract
vulnerability
code
contract
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910904539.2A
Other languages
Chinese (zh)
Other versions
CN110737899A (en
Inventor
翁健
陈新凯
李明
袁浩宸
张斌
卢贺贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN201910904539.2A
Publication of CN110737899A
Application granted
Publication of CN110737899B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a machine-learning-based method for detecting security vulnerabilities in smart contracts. The method first collects smart contract source code data, preprocesses the data, and constructs a sample set for machine learning. It then determines vulnerability labels for the sample set data using publicly available smart contract vulnerability detectors, translates the smart contract source code into XML structured text, and extracts features of the smart contract source code in the data set on that basis. Because Solidity smart contract sample data is currently limited for some vulnerability types, the method applies two different machine learning algorithms depending on the number of labeled samples: a random forest algorithm builds the model for vulnerability labels with many samples, and transfer learning builds the detection model for labels with few samples. In this way the method obtains models for detecting Solidity smart contract vulnerabilities more efficiently and automatically.

Description

Intelligent contract security vulnerability detection method based on machine learning
Technical Field
The invention relates to the technical field of cyberspace security, and in particular to a smart contract security vulnerability detection method based on machine learning.
Background
Ethereum is the most mature public blockchain apart from Bitcoin and, with its continued development and maturation worldwide, has become the industry's leading development platform for underlying blockchain infrastructure. Ethereum supports Turing-complete smart contracts, which breaks through the limits that Bitcoin places on blockchain applications: the understanding of blockchain is no longer confined to digital currency, and the application field has expanded to many industries in the form of smart contracts, for example blockchain decentralized applications (DApps). Economic losses caused by flaws in blockchain mechanisms, ecosystem security, and user security have reached billions of dollars, and losses caused by smart contract security vulnerabilities account for the highest share, up to 41.8%. The growing economic value of blockchains motivates criminals to obtain sensitive data through various attacks, such as theft, ransomware, and illicit mining, and attacks that exploit blockchain concepts and technology make the blockchain security situation more complex. According to survey data from the network security company Besec, digital cryptocurrency worth several billion dollars in total has been stolen in recent years, and the amount lost to blockchain security incidents is rising worldwide. Repeated theft incidents have pushed the cryptocurrency market, with a market value as high as 1 trillion dollars, into the spotlight.
Once a smart contract is deployed on Ethereum it cannot be tampered with, so a vulnerability discovered after deployment cannot be fixed by patching or updating; in most cases the only option is to disable the contract to keep the loss from growing. Traditional security analysis of smart contracts is very valuable for analyzing predefined vulnerability properties. However, most traditional analysis tools must perform complex analysis steps, such as searching execution paths up to a predetermined call depth, and the search time increases as the depth increases. Since December 2015, the number of contracts on blockchains such as Ethereum has increased 176-fold. If these tools cannot analyze the growing number of contracts in time, more and more security vulnerabilities will irreparably harm the smart contract community.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a smart contract security vulnerability detection method based on machine learning. By establishing a smart contract security vulnerability detection model, the method can detect security vulnerabilities in smart contract code as well as problems caused by the properties of the blockchain, and it displays specific vulnerability information so that participating users clearly understand the security vulnerabilities in the contracts they care about. Faced with the growing number of smart contracts, the method also obtains detection results faster than traditional analysis.
The purpose of the invention can be achieved by adopting the following technical scheme:
A smart contract security vulnerability detection method based on machine learning comprises the following steps:
S1, collecting a large amount of Solidity smart contract code and Java/C++ code from the network to form a basic data set for machine learning, and selecting from the basic data set the contracts whose Solidity compiler version is higher than a specified version number and whose code content repetition rate is lower than a repetition threshold as the machine learning sample set;
S2, determining vulnerability labels for the sample set data with smart contract vulnerability detectors, that is, generating vulnerability label data with Solidity-based smart contract vulnerability detection tools, and counting the number of Solidity smart contract samples for each vulnerability label in the sample set;
S3, branching according to the number of Solidity smart contract samples of each label in the sample set: for labels whose sample count is greater than or equal to a preset comparison threshold, constructing a detection model with a random forest algorithm; for labels whose sample count is smaller than the preset comparison threshold, constructing a detection model by transfer learning from a Java/C++ vulnerability model;
S4, performing security vulnerability detection on the smart contract to be detected with the constructed detection model to obtain the security vulnerability information present in the smart contract.
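The branching in step S3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `route_labels`, the label strings, and the threshold value are all hypothetical, since the text leaves the comparison threshold open.

```python
from collections import Counter

def route_labels(sample_labels, threshold):
    """Count samples per vulnerability label and pick a modelling branch.

    sample_labels: one set of labels per contract in the sample set.
    threshold: hypothetical comparison threshold (not specified in the text).
    """
    counts = Counter(label for labels in sample_labels for label in set(labels))
    routing = {}
    for label, n in counts.items():
        # Labels with enough samples go to the random forest branch;
        # the rest fall back to transfer learning from Java/C++ models.
        routing[label] = "random_forest" if n >= threshold else "transfer_learning"
    return counts, routing

# Illustrative label data only.
samples = [
    {"reentrancy"},
    {"reentrancy", "integer_overflow"},
    {"timestamp_dependency"},
    {"reentrancy"},
]
counts, routing = route_labels(samples, threshold=2)
```

Each label is routed independently, so a single contract can contribute to a random forest model for one label and to a transfer-learning model for another.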
Further, the procedure of step S1 is as follows:
S11, collecting Solidity smart contract code from the Ethereum smart contract platform with a crawler script, and at the same time collecting Java/C++ code from open source communities;
S12, converting the Solidity smart contract code into XML text, obtaining the Solidity compiler version directly from it, then comparing the code segments within the converted XML text and calculating the proportion of identical code segments to obtain the content repetition rate;
S13, selecting from the basic data set the contracts whose Solidity compiler version is higher than the specified version number and whose code content repetition rate is lower than the repetition threshold as the machine learning sample set.
Further, in step S13, contracts with a Solidity compiler version higher than 4.14 and a code content repetition rate lower than 30% are selected as the machine learning sample set.
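The selection in step S13 might look like the sketch below. It rests on several assumptions that are not in the text: "4.14" is read as Solidity 0.4.14, the version is taken from the `pragma solidity` line rather than from the XML text, and the repetition rate is supplied precomputed. The names `compiler_version` and `select_samples` are hypothetical.

```python
import re

# Matches e.g. "pragma solidity ^0.4.24;" and captures the first version given.
PRAGMA_RE = re.compile(r"pragma\s+solidity\s+[\^>=<~\s]*(\d+)\.(\d+)\.(\d+)")

def compiler_version(source):
    """Return the declared compiler version as a tuple, or None if absent."""
    match = PRAGMA_RE.search(source)
    return tuple(int(part) for part in match.groups()) if match else None

def select_samples(contracts, min_version=(0, 4, 14), max_repetition=0.30):
    """Keep contracts above the version floor and below the repetition cap.

    contracts: iterable of (source_text, repetition_rate) pairs.
    min_version: assumption, reading the text's "4.14" as 0.4.14.
    """
    selected = []
    for source, repetition_rate in contracts:
        version = compiler_version(source)
        if version is not None and version > min_version and repetition_rate < max_repetition:
            selected.append(source)
    return selected

candidates = [
    ("pragma solidity ^0.4.24; contract A {}", 0.10),  # kept
    ("pragma solidity ^0.4.11; contract B {}", 0.10),  # version too old
    ("pragma solidity ^0.5.0; contract C {}", 0.50),   # too repetitive
]
sample_set = select_samples(candidates)
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "0.4.9" would sort above "0.4.14".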
Further, the procedure of step S2 is as follows:
S21, running one or more Solidity smart contract vulnerability detection tools on the Solidity smart contract code in the sample set, each tool outputting a number of vulnerability labels;
S22, aggregating the detection results of the different detection tools, and recording a vulnerability label as a Solidity smart contract vulnerability label when it appears in 50% or more of the detection results;
S23, counting the number of Solidity smart contract samples for every vulnerability label in the sample set.
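The aggregation rule in step S22 can be sketched as a simple vote. The tool names come from the text, but the per-tool results below are invented for illustration, and `vote_labels` is a hypothetical helper rather than part of any of those tools.

```python
from collections import Counter

def vote_labels(tool_results, min_share=0.5):
    """Keep a label when at least min_share of the tools reported it.

    tool_results: mapping of tool name -> set of labels for one contract.
    """
    tool_count = len(tool_results)
    counts = Counter(label for labels in tool_results.values() for label in set(labels))
    return {label for label, n in counts.items() if n / tool_count >= min_share}

# Illustrative results for a single contract (not real tool output).
results = {
    "Oyente": {"reentrancy", "timestamp_dependency"},
    "ZEUS": {"reentrancy"},
    "Osiris": {"reentrancy", "integer_overflow"},
}
final_labels = vote_labels(results)
```

With three tools, a label needs two reports to pass the 50% bar; with two tools, a single report is already enough.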
Further, the process of constructing the detection model with the random forest algorithm is as follows:
P1, converting the Solidity smart contract code into XML text, where each node in the XML text represents a syntax element of the contract code and provides all details about the source code characteristics;
P2, extracting features from the XML text according to the characteristics of Solidity smart contracts, considering the Solidity syntax, the contract semantics and the function behavior respectively;
P3, training a random forest model with the random forest algorithm, taking the feature vectors and label data of the Solidity smart contracts as input; in view of the inherent characteristics of Solidity smart contracts, representative execution path features, namely function calls and code flow features, are trained as high-weight features.
Further, the procedure of step P2 is as follows:
The XML text is traversed using the dom4j package and the XPath language, and the Solidity source code information contained in the XML text is encapsulated into a SolFileBean entity. Here dom4j is an open source XML parsing package for parsing XML text, XPath is the XML Path Language, a language for locating parts of an XML document, and SolFileBean is a programming entity that encapsulates Solidity source code information. The SolFileBean provides complete details about the characteristics of the Solidity source code, including the contract set, method set, variable set and modifier set;
According to the characteristics of Solidity smart contracts, and considering the Solidity syntax, the contract semantics and the function behavior respectively, various features are extracted from the SolFileBean. The features fall into four categories: 1) contract basic information features; 2) binary operator features; 3) code complexity features; 4) path features.
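The traversal described above can be approximated as follows. The text uses dom4j and XPath in Java; this sketch swaps in Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The element names, the `name` attribute, and the `SolFileInfo` class (a stand-in for SolFileBean) are assumptions, because the actual XML schema is not given.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class SolFileInfo:
    """Hypothetical stand-in for the SolFileBean entity."""
    contracts: list = field(default_factory=list)
    functions: list = field(default_factory=list)
    variables: list = field(default_factory=list)
    modifiers: list = field(default_factory=list)

def names(root, tag):
    # ".//tag" selects every descendant element with that tag.
    return [element.get("name") for element in root.findall(f".//{tag}")]

def parse_sol_xml(xml_text):
    root = ET.fromstring(xml_text)
    return SolFileInfo(
        contracts=names(root, "contractDefinition"),
        functions=names(root, "functionDefinition"),
        variables=names(root, "stateVariableDeclaration"),
        modifiers=names(root, "modifierDefinition"),
    )

# Assumed XML layout, for illustration only.
SAMPLE_XML = """
<sourceUnit>
  <contractDefinition name="Bank">
    <stateVariableDeclaration name="balances"/>
    <modifierDefinition name="onlyOwner"/>
    <functionDefinition name="withdraw"/>
    <functionDefinition name="deposit"/>
  </contractDefinition>
</sourceUnit>
"""
info = parse_sol_xml(SAMPLE_XML)
```

Collecting the four sets into one object keeps later feature extraction independent of the XML layout.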
Further, the process of constructing the detection model by transfer learning from the Java/C++ vulnerability model is as follows:
Q1, extracting the vulnerability types in Solidity smart contracts that resemble those of the programming languages Java or C++, including integer overflow vulnerabilities, reentrancy vulnerabilities and inter-function call exception vulnerabilities;
Q2, training a detection model covering integer overflow vulnerabilities, reentrancy vulnerabilities and inter-function call exception vulnerabilities with a large amount of Java or C++ sample data;
Q3, applying the vulnerability detection model trained on traditional code to Solidity smart contract test samples by transfer learning, checking the accuracy of the results, and adjusting the detection model trained on the traditional programming language accordingly.
Further, the Solidity-based smart contract vulnerability detection tools include Oyente, ZEUS and Osiris.
Compared with the prior art, the invention has the following advantages and effects:
1) The invention establishes a smart contract security vulnerability detection model and performs combined multi-feature extraction and analysis on Solidity code, so that it can detect security vulnerabilities in smart contract code as well as problems caused by the properties of the blockchain, and it displays specific vulnerability information so that participating users can see at a glance the security vulnerabilities in the smart contracts they care about.
2) Based on the features of the smart contract source code and the vulnerability labels, the method uses random forests and transfer learning to learn smart contract detection models for different vulnerability types automatically. Because the smart contract source code reflects the behavior of the contract, which is closely related to its vulnerabilities, extracting source code features for machine learning allows better features to be learned and the vulnerabilities in the smart contract to be detected. The invention can therefore obtain models for detecting Solidity smart contract vulnerabilities more efficiently and automatically.
Drawings
Fig. 1 is an operational flow diagram of a smart contract security vulnerability detection method based on machine learning according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the XML structured text in a smart contract security vulnerability detection method based on machine learning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
This embodiment discloses a smart contract security vulnerability detection method based on machine learning. As shown in Fig. 1, the detection method comprises the following steps:
S1, collecting a large amount of Solidity smart contract code and Java/C++ code from the network to form a basic data set for machine learning, and selecting from the basic data set the contracts whose Solidity compiler version is higher than a specified version number and whose code content repetition rate is lower than a repetition threshold as the machine learning sample set;
Specifically, in this embodiment, the process of step S1 is as follows:
S11, collecting Solidity smart contract code from the Ethereum smart contract platform with a crawler script, and collecting Java/C++ code from open source communities;
S12, converting the Solidity smart contract code into structured XML text, obtaining the Solidity compiler version directly from it, then comparing the code segments within the converted XML text and calculating the proportion of identical code segments to obtain the content repetition rate;
S13, selecting from the basic data set the contracts whose Solidity compiler version is higher than the specified version number and whose code content repetition rate is lower than the repetition threshold as the machine learning sample set.
In this embodiment, contracts with a Solidity compiler version higher than 4.14 and a code content repetition rate lower than 30% are selected as the machine learning sample set.
In the above embodiment, the specified version number is Solidity compiler version 4.14 and the repetition threshold is 30%. These values do not limit the technical solution of the present invention, and other values still fall within its protection scope.
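One way to compute the content repetition rate of step S12 is sketched below. The text does not say how code segments are delimited or what they are compared against, so this sketch assumes that a segment is a function-level snippet, that whitespace differences are ignored, and that segments are compared against hashes of segments already seen in the data set. The helper names are hypothetical.

```python
import hashlib

def segment_hash(segment):
    """Hash a code segment after collapsing whitespace (an assumed normalization)."""
    normalized = " ".join(segment.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def repetition_rate(segments, seen_hashes):
    """Share of this contract's segments that already occur in seen_hashes."""
    if not segments:
        return 0.0
    repeated = sum(1 for segment in segments if segment_hash(segment) in seen_hashes)
    return repeated / len(segments)

# Segments already present in the data set (illustrative).
seen = {segment_hash("function a() public {}")}
rate = repetition_rate(
    ["function  a()   public {}", "function b() public {}"],
    seen,
)
```

Here one of the two segments matches after whitespace normalization, so the rate is 0.5 and the contract would be rejected under a 30% threshold.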
S2, determining vulnerability labels for the sample set data with smart contract vulnerability detectors, that is, generating vulnerability label data with Solidity-based smart contract vulnerability detection tools (including Oyente, ZEUS and Osiris), and counting the number of Solidity smart contract samples for each vulnerability label in the sample set;
Specifically, in this embodiment, the process of step S2 is as follows:
S21, running one or more Solidity smart contract vulnerability detection tools on the Solidity smart contract code in the sample set, each tool outputting a number of vulnerability labels;
The smart contract vulnerability detector is a smart contract security detection tool based on semantic analysis, and can automatically detect the following recent Ethereum security vulnerability types: 1) integer underflow; 2) integer overflow; 3) multi-signature wallet vulnerability (Parity Multisig Bug 2); 4) call stack depth attack vulnerability; 5) transaction-ordering dependence (TOD) vulnerability; 6) timestamp dependency vulnerability; 7) reentrancy vulnerability.
Table 1. Vulnerability categories of the machine-learning-based smart contract security vulnerability detection mechanism
[Table 1 is provided as an image (BDA0002212885630000071) in the original publication.]
S22, aggregating the detection results of the different detection tools; a vulnerability label is recorded as a Solidity smart contract vulnerability label only when it appears in 50% or more of the detection results.
S23, counting the number of Solidity smart contract samples for every vulnerability label in the sample set.
S3, branching according to the number of Solidity smart contract samples of each label in the sample set: for labels whose sample count is greater than or equal to a preset comparison threshold, constructing a detection model with a random forest algorithm; for labels whose sample count is smaller than the preset comparison threshold, constructing a detection model by transfer learning from the Java/C++ vulnerability model.
S4, performing smart contract security vulnerability detection with the constructed detection model.
The detection model is constructed with the random forest algorithm as follows:
To convert the Solidity smart contract source code into formatted XML structured text, this embodiment uses ANTLR, a parser generator based on the LL algorithm (left-to-right scan, leftmost derivation), and applies top-down recursive-descent LL parsing. The resulting XML structured text retains all information of the Solidity contract, which facilitates the subsequent security analysis. The generated XML structured data can be regarded as an abstract syntax tree of the Solidity source code. Each node in the XML represents a syntax element of the programming language; for example, a <functionDefinition> node represents a function definition statement in the Solidity code. The XML can thus provide rich details about the characteristics of the source code, such as the number of contracts, the number of functions, and the specific content of the functions.
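The last stage of this conversion, turning a parse tree into XML structured text, can be sketched as below. The ANTLR parsing step itself is omitted; the nested-tuple tree, the rule names, and `tree_to_xml` are hypothetical placeholders for whatever tree the generated parser produces.

```python
import xml.etree.ElementTree as ET

def tree_to_xml(node):
    """Convert a (rule_name, attributes, children) tuple into an XML element."""
    rule_name, attributes, children = node
    element = ET.Element(rule_name, attributes)
    for child in children:
        element.append(tree_to_xml(child))
    return element

# Hand-written stand-in for a parse tree of a tiny contract.
parse_tree = (
    "sourceUnit", {}, [
        ("contractDefinition", {"name": "Bank"}, [
            ("functionDefinition", {"name": "withdraw"}, []),
        ]),
    ],
)
xml_text = ET.tostring(tree_to_xml(parse_tree), encoding="unicode")
```

Because each grammar rule becomes one element, later XPath queries such as `.//functionDefinition` map directly onto syntax elements.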
Features are then extracted from the XML structured text. The XML text is traversed using the dom4j package and the XPath language, and the Solidity source code information contained in the XML text is encapsulated into a SolFileBean entity. Here dom4j is an open source XML parsing package for parsing XML text, XPath is the XML Path Language, a language for locating parts of an XML document, and SolFileBean is a programming entity that encapsulates Solidity source code information. The SolFileBean provides complete details about the characteristics of the Solidity source code, including all source code information such as the contract set, method set, variable set and modifier set. According to the characteristics of Solidity smart contracts, and considering the Solidity syntax, the contract semantics, the function behavior and similar aspects, a number of features are extracted from the SolFileBean. The features fall into four categories: 1) contract basic information features; 2) binary operator features; 3) code complexity features; 4) path features.
1) The contract basic information features are the number and definitions of the contracts, functions, events and modifiers of a smart contract. The contract definition records whether a contract has a parent contract; the function definition covers the access modifier, return value and input parameter list of the function; the event definition covers the input parameter list of the event; the modifier definition covers the input parameter list of the modifier;
2) The binary operator features are the number and frequency of occurrences of binary operators such as +, -, /, >, <, == in each contract and each function;
3) The code complexity features approximate the complexity of the code by the number of code lines, the length of the code, the number of loop statements, and the number of basic blocks in the code flow graph;
4) The path features are the call relations among functions, the modifier relations of functions, and the control statements in the code flow graph. The call relation between functions records which other functions a function calls and which functions call it. The modifier relation of a function means that the function is guarded by a modifier and can be used normally only if the modifier's condition is met. The control statements in the code flow graph are its branch statements, each of which represents a different code execution path.
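Feature categories 2) and 3) can be approximated directly on source text, as in the rough sketch below. It is an assumption-laden simplification: the text extracts features from the SolFileBean rather than from raw source, the operator set is illustrative, string literals are not handled, and basic blocks are not counted. `extract_features` is a hypothetical name.

```python
import re
from collections import Counter

# Multi-character tokens come first so "=>", "++" or "+=" are not miscounted
# as the single-character binary operators they contain.
TOKEN_RE = re.compile(r"=>|\+\+|--|[-+*/]=|==|!=|>=|<=|[-+*/<>]")
BINARY_OPERATORS = {"+", "-", "*", "/", ">", "<", "==", "!=", ">=", "<="}

def extract_features(source):
    # Strip block and line comments so "//" is not counted as division.
    code = re.sub(r"/\*.*?\*/", "", source, flags=re.S)
    code = re.sub(r"//[^\n]*", "", code)
    operators = Counter(t for t in TOKEN_RE.findall(code) if t in BINARY_OPERATORS)
    return {
        "line_count": sum(1 for line in code.splitlines() if line.strip()),
        "char_count": len(code),
        "loop_count": len(re.findall(r"\b(?:for|while)\b", code)),
        "operator_counts": dict(operators),
    }

SAMPLE = """
contract C {
    mapping(address => uint) balances; // ledger
    function f(uint a, uint b) public returns (uint) {
        for (uint i = 0; i < b; i++) { a = a + i; }
        return a - b;
    }
}
"""
features = extract_features(SAMPLE)
```

Flattening `operator_counts` into fixed positions, one per operator, would give the numeric feature vector a classifier expects.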
The model is trained with the random forest algorithm and 10-fold cross-validation, taking the feature vectors and label data of the Solidity smart contracts as input. In view of the inherent characteristics of Solidity smart contracts, path features such as function calls and code flow are trained as high-weight features, and the detection model with the highest accuracy is obtained by repeatedly modifying the weights and testing.
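The 10-fold cross-validation loop can be sketched in plain Python as below. To keep the example dependency-free, a majority-class baseline stands in for the random forest; in practice the `fit` and `predict` callables would wrap a random forest implementation such as scikit-learn's `RandomForestClassifier`. The text does not specify how the feature weights are applied, so that part is not shown. All names and data here are illustrative.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_validate(features, labels, fit, predict, k=10):
    """Return the mean test accuracy over k folds (requires len(labels) >= k)."""
    accuracies = []
    for train, test in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

# Stand-in classifier: always predicts the most common training label.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, feature_vector):
    return model

toy_features = [[i] for i in range(20)]
toy_labels = [0] * 15 + [1] * 5
mean_accuracy = cross_validate(toy_features, toy_labels, fit_majority, predict_majority)
```

Repeating this loop for each candidate weighting and keeping the best mean accuracy mirrors the "modify the weights and test" procedure described above.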
The detection model is constructed by transfer learning from the Java/C++ vulnerability model as follows:
The vulnerability types in Solidity smart contracts that resemble those of traditional programming languages (Java/C++) are extracted, including integer overflow vulnerabilities, reentrancy vulnerabilities, inter-function call exception vulnerabilities and the like;
A detection model covering integer overflow vulnerabilities, reentrancy vulnerabilities, inter-function call exception vulnerabilities and the like is trained with a large amount of sample data from traditional programming languages (Java/C++); specifically, these vulnerability types are trained in the traditional programming language using the machine learning steps of VulDeePecker to obtain the detection model;
VulDeePecker is a known deep-learning-based method for detecting Java code vulnerabilities. Its training process consists of the following four steps:
1) Library/API function calls and the corresponding program slices are extracted from the training data (program source code). One or more program slices are extracted for each parameter of a library/API function call, where a program slice is one or more lines of program code related to that parameter;
2) Code gadgets and the corresponding labels are generated. A code gadget is composed of multiple semantically related lines of code (code in the CFG), and each code gadget is labeled 1 (vulnerable) or 0 (not vulnerable);
3) Code gadgets are converted to vector representations. Each code gadget is first converted into a symbolic representation that preserves the semantic information of the training data, and the symbolic representation is then encoded into a vector, which is the input to the BLSTM;
4) The BLSTM neural network is trained. The BLSTM model is trained on the training sample data set following the standard model training procedure.
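Step 3 above can be illustrated with the simplified sketch below. It replaces user-defined identifiers with symbolic names and maps tokens to padded integer ids; this is only a stand-in for the published approach, which feeds learned word embeddings to the BLSTM. The keyword and library-call lists are small illustrative subsets, the gadget is C-style as in the published VulDeePecker work, and all function names are hypothetical.

```python
import re

KEYWORDS = {"int", "char", "if", "else", "for", "while", "return", "void"}
LIBRARY_CALLS = {"strcpy", "memcpy", "malloc"}  # illustrative subset only
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def symbolize(gadget_lines):
    """Tokenize a code gadget and rename user-defined identifiers to VAR1, VAR2, ..."""
    names = {}
    tokens = []
    for line in gadget_lines:
        for token in TOKEN_RE.findall(line):
            is_identifier = re.match(r"[A-Za-z_]", token) is not None
            if is_identifier and token not in KEYWORDS and token not in LIBRARY_CALLS:
                token = names.setdefault(token, f"VAR{len(names) + 1}")
            tokens.append(token)
    return tokens

def encode(tokens, vocabulary, length):
    """Map tokens to integer ids (0 is padding) and pad or truncate to length."""
    ids = [vocabulary.setdefault(token, len(vocabulary) + 1) for token in tokens]
    return (ids + [0] * length)[:length]

gadget = ["char buf[10];", "strcpy(buf, input);"]
gadget_tokens = symbolize(gadget)
vocabulary = {}
vector = encode(gadget_tokens, vocabulary, length=16)
```

Renaming identifiers keeps the vocabulary small and makes gadgets from different programs comparable, which is also what makes the representation largely independent of a specific code base.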
The vulnerability detection model trained on traditional code is then applied to the Solidity smart contract test samples through the parameter/model transfer mode of transfer learning. The accuracy of the results is checked, and the parameters of the detection model trained on the traditional programming language are adjusted accordingly so that it better fits smart contract security vulnerability detection.
The parameter/model transfer mode assumes that the source task and the target task share some parameters, or share a prior distribution over the model's hyperparameters, and on that basis transfers the original model to a new domain to achieve better accuracy.
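The idea of parameter transfer can be shown with a toy example: train on plentiful source data, then start the target model from the source parameters and fine-tune it on a few target samples. A small logistic regression stands in for the BLSTM here, and the data, learning rate and epoch counts are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def loss(weights, X, y):
    """Mean logistic loss of the model on (X, y)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(weights, xi)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)

def train(X, y, initial_weights, lr=0.1, epochs=200):
    """Full-batch gradient descent starting from initial_weights."""
    weights = list(initial_weights)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            error = predict(weights, xi) - yi
            for j, xij in enumerate(xi):
                gradient[j] += error * xij
        weights = [w - lr * g / len(y) for w, g in zip(weights, gradient)]
    return weights

# Source task: many samples (first component is a constant bias input).
source_X = [[1.0, a, b] for a in (0.0, 1.0) for b in (0.0, 1.0)] * 10
source_y = [int(x[1]) for x in source_X]
# Target task: few samples that share the same label space.
target_X = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
target_y = [0, 1, 1, 0]

source_weights = train(source_X, source_y, [0.0, 0.0, 0.0])
# Parameter transfer: initialize from the source model, then fine-tune briefly.
target_weights = train(target_X, target_y, source_weights, epochs=20)
```

Starting from the source parameters means the few target samples only have to correct the model rather than train it from scratch.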
The source task is the vulnerability detection task for traditional code (including integer overflow vulnerabilities, reentrancy vulnerabilities and the like), and the target task is the Solidity smart contract vulnerability detection task. The source task and the target task keep the same label space to ensure a more effective transfer. The same label space means that labels such as integer overflow vulnerability and reentrancy vulnerability have the same practical meaning in the vulnerability detection task for traditional code and in the vulnerability detection task for Solidity smart contracts, so the two remain consistent. The VulDeePecker training method trains on the semantic relations among lines of code and does not depend on the syntax of a specific programming language. This also indicates that the feature space and probability distribution of vulnerability detection for Solidity smart contracts (the target domain) are highly similar to those of vulnerability detection for traditional code (the source domain), which supports constructing the model by transfer learning.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (4)

1. A machine-learning-based smart contract security vulnerability detection method, characterized by comprising the following steps:
S1, collecting Solidity smart contract code and Java/C++ code from the network to form a basic data set for machine learning, and selecting, from the basic data set, contracts whose Solidity compiler version is higher than a specified version number and whose code content repetition rate is lower than a repetition threshold as the machine learning sample set;
S2, determining vulnerability labels for the sample set data through smart contract vulnerability detectors, generating vulnerability label data based on Solidity smart contract vulnerability detection tools, and counting the number of Solidity smart contract samples for each vulnerability label in the sample set; the procedure of step S2 is as follows:
S21, using one or more Solidity smart contract vulnerability detection tools, inputting the Solidity smart contract code in the sample set, and outputting a plurality of vulnerability labels;
S22, aggregating the detection results of the different detection tools, and recording a vulnerability label when the frequency with which it appears across the different detection results is equal to or higher than 50%, thereby generating the Solidity smart contract vulnerability labels;
S23, counting the number of Solidity smart contract samples for each vulnerability label in the sample set;
S3, performing branch processing according to the number of samples of each Solidity smart contract label in the sample set: for data-rich labels whose sample count is greater than or equal to a preset comparison threshold, constructing a detection model by a random forest algorithm; for data labels whose sample count is smaller than the preset comparison threshold, constructing a detection model by transfer learning from a Java/C++ vulnerability model;
S4, performing smart contract security vulnerability detection on a smart contract to be detected through the constructed detection model to obtain the security vulnerability information present in the smart contract; the process of constructing the detection model by the random forest algorithm is as follows:
P1, converting the Solidity smart contract code into XML text, wherein each node in the XML text represents a syntax element of the contract code and provides all details about the source code features;
P2, extracting features based on the XML text and according to the characteristics of Solidity smart contracts, considering Solidity syntax, contract semantics and function behavior respectively;
P3, training a random forest model with the random forest algorithm, taking the feature vectors and label data corresponding to the Solidity smart contracts as input, and, in view of the inherent characteristics of Solidity smart contracts, treating representative execution paths as high-weight features during training;
the procedure of step P2 is as follows:
traversing the XML text using the dom4j package and the XPath language, and encapsulating the Solidity source code information contained in the XML text into a SolFileBean entity, wherein dom4j is an open source XML parsing package used to parse the XML text, XPath is the XML path language, a language for locating parts of an XML document, and SolFileBean is a programming entity that encapsulates the Solidity source code information and provides complete details about the Solidity source code features, including a contract set, a method set, a variable set and a modifier set;
according to the characteristics of Solidity smart contracts, considering Solidity syntax, contract semantics and function behavior respectively, extracting a plurality of features from the SolFileBean, the features being divided into four types: 1) basic contract information features; 2) binary operator features; 3) code complexity features; 4) path features;
the process of constructing the detection model through transfer learning from the Java/C++ vulnerability model is as follows:
Q1, extracting the vulnerability types in Solidity smart contracts that are similar to those of the programming languages Java or C++, including the integer overflow vulnerability, the reentrancy vulnerability and the inter-function call exception vulnerability;
Q2, training a detection model covering the integer overflow vulnerability, the reentrancy vulnerability and the inter-function call exception vulnerability using a large amount of sample data in the programming languages Java or C++;
Q3, using transfer learning, applying the vulnerability detection model for traditional code to Solidity smart contract test samples, checking the accuracy of the results, and adjusting the detection model trained on the traditional programming languages accordingly.
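Steps P1 and P2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: it uses Python's standard `xml.etree.ElementTree` (with its limited XPath subset) in place of dom4j, the XML element and attribute names (`contract`, `function`, `variable`, `modifier`, `binaryOperation`, `if`) are a hypothetical schema that the source does not specify, and the Python `SolFileBean` class only mirrors the four sets named in the claim. Path features are left out because the source does not describe how execution paths are encoded.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical XML layout for a converted contract; the real schema is not given.
SAMPLE_XML = """
<sourceUnit>
  <contract name="Bank">
    <variable name="balances"/>
    <modifier name="onlyOwner"/>
    <function name="deposit">
      <binaryOperation operator="+"/>
    </function>
    <function name="withdraw">
      <binaryOperation operator="-"/>
      <binaryOperation operator="&gt;="/>
      <if/>
    </function>
  </contract>
</sourceUnit>
"""


@dataclass
class SolFileBean:
    """Holds the contract, method, variable and modifier sets of one source file."""
    contracts: list = field(default_factory=list)
    methods: list = field(default_factory=list)
    variables: list = field(default_factory=list)
    modifiers: list = field(default_factory=list)


def parse_sol_xml(xml_text):
    """Traverse the XML with XPath-style queries and fill a SolFileBean."""
    root = ET.fromstring(xml_text)
    bean = SolFileBean(
        contracts=[e.get("name") for e in root.findall(".//contract")],
        methods=[e.get("name") for e in root.findall(".//function")],
        variables=[e.get("name") for e in root.findall(".//variable")],
        modifiers=[e.get("name") for e in root.findall(".//modifier")],
    )
    return root, bean


def extract_features(root, bean):
    """Build basic-information, binary-operator and complexity features."""
    ops = [e.get("operator") for e in root.findall(".//binaryOperation")]
    return {
        "n_contracts": len(bean.contracts),       # basic contract information
        "n_methods": len(bean.methods),
        "n_variables": len(bean.variables),
        "n_modifiers": len(bean.modifiers),
        "n_add": ops.count("+"),                  # binary operator features
        "n_sub": ops.count("-"),
        "n_branches": len(root.findall(".//if")),  # crude complexity proxy
    }


root, bean = parse_sol_xml(SAMPLE_XML)
features = extract_features(root, bean)
```

The resulting dictionary could be flattened into a feature vector and passed, together with the vulnerability labels, to a random forest implementation as in step P3.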
2. The machine-learning-based smart contract security vulnerability detection method of claim 1, wherein the procedure of step S1 is as follows:
S11, collecting Solidity smart contract code from the Ethereum smart contract platform using a crawler script, and collecting Java/C++ code from open source communities;
S12, converting the Solidity smart contract code into XML text and reading the Solidity compiler version directly from it, then comparing the internal code fragments of the converted XML texts and calculating the proportion of identical code fragments to obtain the content repetition rate;
S13, selecting, from the basic data set, contracts whose Solidity compiler version is higher than the specified version number and whose code content repetition rate is lower than the repetition threshold as the machine learning sample set.
3. The machine-learning-based smart contract security vulnerability detection method of claim 2, wherein in step S13, contracts with a Solidity compiler version higher than 4.14 and a code content repetition rate lower than 30% are selected as the machine learning sample set.
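The filtering of steps S12 and S13 can be sketched as below. Several points are assumptions rather than details from the source: the repetition rate is computed here as the share of a contract's code fragments already seen in previously accepted contracts, fragments are represented as plain strings, and the version bound is passed as a tuple. The claim writes the bound as 4.14; the example value `(0, 4, 14)` is one assumed reading of it.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.4.24' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def repetition_rate(fragments, seen):
    """Share of this contract's fragments that already occur in `seen`."""
    if not fragments:
        return 0.0
    return sum(1 for f in fragments if f in seen) / len(fragments)


def select_samples(contracts, min_version, max_repetition):
    """Keep contracts above `min_version` whose repetition rate is below the bound."""
    seen, selected = set(), []
    for name, version, fragments in contracts:
        if parse_version(version) <= min_version:
            continue  # compiler version not higher than the bound
        if repetition_rate(fragments, seen) >= max_repetition:
            continue  # too much content repeated from accepted contracts
        selected.append(name)
        seen.update(fragments)
    return selected


# Hypothetical contracts: (file name, compiler version, code fragments).
contracts = [
    ("A.sol", "0.4.24", ["f1", "f2", "f3", "f4"]),
    ("B.sol", "0.4.11", ["g1", "g2"]),              # version too old
    ("C.sol", "0.4.25", ["f1", "f2", "h1", "h2"]),  # 50% repeated
    ("D.sol", "0.5.0", ["f1", "k1", "k2", "k3"]),   # 25% repeated
]
selected = select_samples(contracts, min_version=(0, 4, 14), max_repetition=0.30)
```

With these inputs only `A.sol` and `D.sol` pass both filters.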
4. The machine-learning-based smart contract security vulnerability detection method of claim 1, wherein the Solidity smart contract vulnerability detection tools comprise Oyente, ZEUS and Osiris.
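The label aggregation of steps S21 to S23 and the branch of step S3 in claim 1 can be sketched with the three tools named in claim 4. The tool outputs, contract names and the threshold value of 2 are hypothetical; the sketch does not call the real tools, it only shows the 50% vote and the routing by sample count.

```python
from collections import Counter


def vote_labels(tool_results, min_share=0.5):
    """Keep the labels reported by at least `min_share` of the tools (step S22)."""
    counts = Counter(
        label for labels in tool_results.values() for label in set(labels)
    )
    n_tools = len(tool_results)
    return sorted(label for label, c in counts.items() if c / n_tools >= min_share)


def route_labels(contract_labels, threshold):
    """Count samples per label (S23) and pick a model-building branch (S3)."""
    counts = Counter(
        label for labels in contract_labels.values() for label in labels
    )
    return {
        label: "random_forest" if n >= threshold else "transfer_learning"
        for label, n in counts.items()
    }


# Hypothetical per-contract outputs of the three detection tools (step S21).
raw = {
    "Bank.sol": {
        "Oyente": ["reentrancy", "integer_overflow"],
        "ZEUS": ["reentrancy"],
        "Osiris": ["integer_overflow"],
    },
    "Token.sol": {
        "Oyente": ["integer_overflow"],
        "ZEUS": [],
        "Osiris": ["integer_overflow"],
    },
    "Vault.sol": {"Oyente": ["timestamp"], "ZEUS": [], "Osiris": []},
}

contract_labels = {name: vote_labels(results) for name, results in raw.items()}
routing = route_labels(contract_labels, threshold=2)
```

Here `timestamp` is dropped because only one of three tools reports it, `integer_overflow` reaches the threshold and goes to the random forest branch, and `reentrancy` falls below it and goes to the transfer learning branch.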
CN201910904539.2A 2019-09-24 2019-09-24 Intelligent contract security vulnerability detection method based on machine learning Active CN110737899B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910904539.2A CN110737899B (en) 2019-09-24 2019-09-24 Intelligent contract security vulnerability detection method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910904539.2A CN110737899B (en) 2019-09-24 2019-09-24 Intelligent contract security vulnerability detection method based on machine learning

Publications (2)

Publication Number Publication Date
CN110737899A CN110737899A (en) 2020-01-31
CN110737899B true CN110737899B (en) 2022-09-06

Family

ID=69269516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910904539.2A Active CN110737899B (en) 2019-09-24 2019-09-24 Intelligent contract security vulnerability detection method based on machine learning

Country Status (1)

Country Link
CN (1) CN110737899B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310191B (en) * 2020-02-12 2022-12-23 广州大学 Block chain intelligent contract vulnerability detection method based on deep learning
CN111339535A (en) * 2020-02-17 2020-06-26 扬州大学 Vulnerability prediction method and system for intelligent contract codes, computer equipment and storage medium
CN111353160B (en) * 2020-02-25 2022-08-16 融合安全(深圳)信息科技有限公司 Software bug abnormity intelligent detection system and method
CN112184432A (en) * 2020-03-16 2021-01-05 北京天德科技有限公司 Intelligent contract development method based on legal language
CN111563040B (en) * 2020-05-08 2023-08-15 中国工商银行股份有限公司 Block chain intelligent contract code testing method and device
CN111628997B (en) * 2020-05-26 2022-04-26 中国联合网络通信集团有限公司 Attack prevention method and device
CN112115326B (en) * 2020-08-19 2022-07-29 北京交通大学 Multi-label classification and vulnerability detection method for Ethereum intelligent contracts
CN112257076B (en) * 2020-11-11 2023-12-15 厦门美域中央信息科技有限公司 Vulnerability detection method based on random detection algorithm and information aggregation
CN112416358B (en) * 2020-11-20 2022-04-29 武汉大学 Intelligent contract code defect detection method based on structured word embedded network
CN112613043B (en) * 2020-12-30 2024-02-27 杭州趣链科技有限公司 Intelligent contract vulnerability detection method based on intelligent contract calling network
CN112699375B (en) * 2020-12-30 2024-07-02 杭州趣链科技有限公司 Block chain intelligent contract security vulnerability detection method based on network embedded similarity
CN113268732B (en) * 2021-04-19 2022-12-20 中国人民解放军战略支援部队信息工程大学 Method and system for detecting similarity of Solidity intelligent contracts
CN113221125B (en) * 2021-05-31 2022-09-27 河海大学 TreeGAN-based method and system for generating intelligent contract with vulnerability
CN117909981A (en) * 2021-07-21 2024-04-19 三峡大学 Distributed detection system for intelligent contract conflict in industrial block chain
CN114048464B (en) * 2022-01-12 2022-03-15 北京大学 Ethereum intelligent contract security vulnerability detection method and system based on deep learning
CN114490388A (en) * 2022-01-27 2022-05-13 广西教育学院 Deep learning intelligent contract vulnerability detection method based on code segments

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101947760B1 (en) * 2018-09-04 2019-02-13 김종현 Secure authentication server for smart contract
CN109375899A (en) * 2018-09-25 2019-02-22 杭州趣链科技有限公司 A kind of method of formal verification Solidity intelligence contract
CN109933991A (en) * 2019-03-20 2019-06-25 杭州拜思科技有限公司 A kind of method, apparatus of intelligence contract Hole Detection
CN109977682A (en) * 2019-04-01 2019-07-05 中山大学 A kind of block chain intelligence contract leak detection method and device based on deep learning
CN110011986A (en) * 2019-03-20 2019-07-12 中山大学 A kind of source code leak detection method based on deep learning
CN110175454A (en) * 2019-04-19 2019-08-27 肖银皓 A kind of intelligent contract safety loophole mining method and system based on artificial intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"智能合约安全漏洞挖掘技术研究";付梦琳 等;《计算机应用技术》;20190710;第39卷(第7期);1959-1966 *

Also Published As

Publication number Publication date
CN110737899A (en) 2020-01-31

Similar Documents

Publication Publication Date Title
CN110737899B (en) Intelligent contract security vulnerability detection method based on machine learning
US11568055B2 (en) System and method for automatically detecting a security vulnerability in a source code using a machine learning model
Tann et al. Towards safer smart contracts: A sequence learning approach to detecting security threats
CN109697162B (en) Software defect automatic detection method based on open source code library
CN111459799B (en) Software defect detection model establishing and detecting method and system based on Github
Saccente et al. Project achilles: A prototype tool for static method-level vulnerability detection of Java source code using a recurrent neural network
US11403536B2 (en) System and method for anti-pattern detection for computing applications
CN113360915A (en) Intelligent contract multi-vulnerability detection method and system based on source code graph representation learning
WO2022089188A1 (en) Code processing method, apparatus, device, and medium
CN102054149A (en) Method for extracting malicious code behavior characteristic
CN112307473A (en) Malicious JavaScript code detection model based on Bi-LSTM network and attention mechanism
CN113158189B (en) Method, device, equipment and medium for generating malicious software analysis report
CN110750297B (en) Python code reference information generation method based on program analysis and text analysis
CN111881300A (en) Third-party library dependency-oriented knowledge graph construction method and system
CN109902487B (en) Android application malicious property detection method based on application behaviors
CN112688966A (en) Webshell detection method, device, medium and equipment
CN115022026A (en) Block chain intelligent contract threat detection device and method
CN112817877B (en) Abnormal script detection method and device, computer equipment and storage medium
CN116932381A (en) Automatic evaluation method for security risk of applet and related equipment
CN117725592A (en) Intelligent contract vulnerability detection method based on directed graph annotation network
CN116975881A (en) LLVM (LLVM) -based vulnerability fine-granularity positioning method
CN114996705B (en) Cross-software vulnerability detection method and system based on vulnerability type and Bi-LSTM
CN116821903A (en) Detection rule determination and malicious binary file detection method, device and medium
CN116305131A (en) Static confusion removing method and system for script
Tatarinova et al. Extended vulnerability feature extraction based on public resources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant