CN116541845A - Intelligent contract multi-label vulnerability detection method and system based on AdaBoost - Google Patents

Info

Publication number
CN116541845A
CN116541845A CN202310419740.8A CN202310419740A CN116541845A CN 116541845 A CN116541845 A CN 116541845A CN 202310419740 A CN202310419740 A CN 202310419740A CN 116541845 A CN116541845 A CN 116541845A
Authority
CN
China
Prior art keywords
adaboost
model
word
sample
loopholes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310419740.8A
Other languages
Chinese (zh)
Inventor
张明武
黄梦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Technology
Original Assignee
Hubei University of Technology

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57: Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577: Assessing vulnerabilities and evaluating computer system security
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an AdaBoost-based multi-label vulnerability detection method and system for smart contracts. The method first extracts the bytecode of the smart contract to be detected, then decompiles the bytecode into opcodes, then extracts slice features from the opcodes, and finally performs joint vulnerability detection on the slice features using a One-Vs-Rest model and an AdaBoost model. The method improves the detection of security vulnerabilities in Ethereum smart contracts and achieves higher accuracy.

Description

Smart contract multi-label vulnerability detection method and system based on AdaBoost
Technical Field
The invention belongs to the technical field of information security, relates to a method and system for detecting smart contract vulnerabilities in cyberspace, and in particular relates to an AdaBoost-based multi-label vulnerability detection method and system for smart contracts.
Background
Smart contracts were first proposed by the computer scientist Nick Szabo as computer protocols that propagate, verify, or execute contracts in a digital manner. At that time, smart contracts could not be used in practice because no trusted execution environment existed. This problem was not solved until the advent of Bitcoin and blockchain technology. However, Bitcoin cannot provide complex services because its scripting language is not Turing-complete.
As one of the most popular blockchain platforms, Ethereum has had tens of thousands of smart contracts deployed on it. Ethereum introduced a Turing-complete programming language (Solidity) for smart contracts, enabling developers to deploy smart-contract-based decentralized applications (DApps) on the blockchain. Smart contracts are deterministic, real-time, verifiable, and decentralized, and can be used in many scenarios, including digital identity, financial transactions, securities, digital records, the Internet of Things, supply chains, distributed computing, and insurance.
As application scenarios multiply, most smart contracts involve cryptocurrency, and the assets at stake may be worth millions of dollars. The security of smart contracts therefore directly affects the security of cryptocurrency. Security vulnerabilities in smart contracts not only cause huge financial losses but also undermine trust in smart contracts and blockchains. Smart contract development thus has very high security requirements, and vulnerability detection has become an urgent problem in the field of blockchain security.
Currently, the main methods for detecting smart contract vulnerabilities are feature matching, formal verification, symbolic execution, static analysis, fuzzing, and deep learning. Symbolic execution tools include Oyente, Mythril, and Securify. These tools must explore all executable paths in a contract or analyze its control flow graph to detect vulnerabilities, which makes detection inefficient and time-consuming.
Disclosure of Invention
To solve the above technical problems, the invention provides an AdaBoost-based multi-label vulnerability detection method, system, and electronic device for smart contracts, used to detect vulnerabilities in Ethereum smart contract source code (Solidity source code).
The technical scheme adopted by the method is as follows: an AdaBoost-based multi-label vulnerability detection method for smart contracts, comprising the following steps:
step 1: extracting the bytecode of the smart contract to be detected;
step 2: decompiling the bytecode into opcodes;
step 3: extracting slice features of the opcodes;
step 4: performing joint vulnerability detection on the slice features using a One-Vs-Rest model and an AdaBoost model; the vulnerabilities comprise reentrancy vulnerabilities, integer overflow vulnerabilities, exception handling vulnerabilities, call stack overflow vulnerabilities, tx.origin vulnerabilities, timestamp dependence vulnerabilities, and transaction order dependence vulnerabilities;
in the joint vulnerability detection, the One-Vs-Rest model converts the multi-class problem into several binary classification problems: one class is selected as the positive class and all other classes are treated as the negative class; each binary classification task is then handled by one AdaBoost classifier, six AdaBoost classifiers in total, and the classification results are combined to give the final multi-label classification result.
The technical scheme adopted by the system is as follows: an AdaBoost-based multi-label vulnerability detection system for smart contracts, comprising the following modules:
the bytecode extraction module, used for extracting the bytecode of the smart contract to be detected;
the opcode extraction module, used for decompiling the bytecode into opcodes;
the slice feature extraction module, used for extracting slice features of the opcodes;
the vulnerability detection module, used for performing joint vulnerability detection on the slice features using a One-Vs-Rest model and an AdaBoost model; the vulnerabilities comprise reentrancy vulnerabilities, integer overflow vulnerabilities, exception handling vulnerabilities, call stack overflow vulnerabilities, tx.origin vulnerabilities, timestamp dependence vulnerabilities, and transaction order dependence vulnerabilities;
in the joint vulnerability detection, the One-Vs-Rest model converts the multi-class problem into several binary classification problems: one class is selected as the positive class and all other classes are treated as the negative class; each binary classification task is then handled by one AdaBoost classifier, six AdaBoost classifiers in total, and the classification results are combined to give the final multi-label classification result.
The invention realizes AdaBoost-based multi-label vulnerability detection for smart contracts using word2vec and AdaBoost. The scheme focuses on the opcodes of a smart contract, which consist of several opcode fragments. Because the opcodes contain the logic executed by the contract, extracting their slice features with word2vec and PCA and using these features as the input for training the AdaBoost model can effectively improve the detection of smart contract security vulnerabilities.
Drawings
Fig. 1 is a schematic diagram of the detection method according to an embodiment of the present invention.
Fig. 2 is a training flow chart of the One-Vs-Rest model and the AdaBoost model according to an embodiment of the present invention.
Detailed Description
To help those of ordinary skill in the art understand and practice the invention, it is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described here are for illustration and explanation only and are not intended to limit the invention.
Referring to Fig. 1, the AdaBoost-based multi-label vulnerability detection method for smart contracts provided by the invention comprises the following steps:
step 1: for the smart contract to be detected, extracting the bytecode with the solc command;
step 2: decompiling the bytecode into opcodes with the Vandal tool, wherein the opcodes consist of several opcode fragments;
step 3: extracting slice features of the opcodes;
in this embodiment, the features of each opcode fragment are extracted with a word2vec model to generate a word vector matrix; the word vector matrix is summed to obtain one feature per fragment, and the features of all opcode fragments together form the slice feature, which represents the contract;
in this embodiment, the word2vec model consists of a CBOW model and a skip-gram model, each comprising an input layer, a hidden layer, and an output layer. The CBOW model predicts the target word in the middle from its context. The one-hot codes of the context words of the current word are fed to the input layer, and each is multiplied by the same center word matrix $W_{V \times N}$ to obtain a $1 \times N$ vector, where $V$ is the number of words in the vocabulary and $N$ is the dimension of the word vector; these $1 \times N$ vectors are averaged into a single $1 \times N$ vector, which is multiplied by the (transposed) context matrix $U_{V \times N}$ to obtain a $1 \times V$ vector; softmax normalization of this vector yields a $1 \times V$ probability vector over the words, and the word with the highest probability is taken as the predicted word; if the prediction does not match the actual word, the center word matrix $W_{V \times N}$ and the context matrix $U_{V \times N}$ are corrected with the back propagation algorithm. The skip-gram model predicts the context from the center word. It takes the one-hot code of the center word, a $1 \times V$ vector, as input; this vector is multiplied by the center word matrix $W_{V \times N}$ to obtain a $1 \times N$ vector, which is multiplied by the (transposed) context matrix $U_{V \times N}$ to obtain a $V$-dimensional vector; after softmax normalization, the word with the highest probability is taken as the model's predicted word; if the prediction does not match the actual context word, the center word matrix $W_{V \times N}$ and the context matrix $U_{V \times N}$ are corrected with the back propagation algorithm.
In the CBOW model, the current word $W(t)$ is predicted given its context $W(t-2), W(t-1), W(t+1), W(t+2)$, and the objective function $P_{\text{CBOW}}$ to be learned is the log-likelihood to be maximized:
$P_{\text{CBOW}} = \sum \log p\bigl(W(t) \mid W(t-2), W(t-1), W(t+1), W(t+2)\bigr)$;
in the Skip-Gram model, the context $W(t-2), W(t-1), W(t+1), W(t+2)$ is predicted given the current word $W(t)$, and the objective function $P_{\text{Skip-Gram}}$ is:
$P_{\text{Skip-Gram}} = \sum \log p\bigl(W(t-2), W(t-1), W(t+1), W(t+2) \mid W(t)\bigr)$.
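The CBOW forward pass and the summation of word vectors into a fragment feature can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the four-word vocabulary, the matrix values, and the function names (`cbow_predict`, `fragment_feature`) are hypothetical, the matrices are untrained, and back propagation is omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def embed(index, W):
    """One-hot(index) times W (V x N) is simply row `index` of W."""
    return W[index]

def cbow_predict(context_ids, W, U):
    """Return a probability for each vocabulary word being the centre word."""
    n = len(W[0])
    # Average the 1 x N vectors of the context words.
    hidden = [sum(embed(i, W)[d] for i in context_ids) / len(context_ids)
              for d in range(n)]
    # Score every word: hidden (1 x N) times the transpose of U (V x N).
    scores = [sum(h * u for h, u in zip(hidden, row)) for row in U]
    return softmax(scores)

def fragment_feature(opcode_ids, W):
    """Sum the word vectors of one opcode fragment into a single feature."""
    n = len(W[0])
    return [sum(W[i][d] for i in opcode_ids) for d in range(n)]

# Hypothetical toy vocabulary (V = 4) and untrained matrices (N = 2).
VOCAB = ["PUSH1", "MSTORE", "CALL", "SSTORE"]
W = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.4]]
U = [[0.2, 0.1], [0.1, 0.0], [-0.1, 0.3], [0.4, -0.2]]

probs = cbow_predict([0, 1, 3], W, U)      # context: PUSH1, MSTORE, SSTORE
feature = fragment_feature([0, 1, 2], W)   # fragment: PUSH1, MSTORE, CALL
```

In practice a library implementation of word2vec would learn `W` and `U` from the opcode corpus; the sketch only shows how the matrix shapes fit together.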
step 4: performing joint vulnerability detection on the slice features using a One-Vs-Rest model and an AdaBoost model; the vulnerabilities comprise reentrancy vulnerabilities, integer overflow vulnerabilities, exception handling vulnerabilities, call stack overflow vulnerabilities, tx.origin vulnerabilities, timestamp dependence vulnerabilities, and transaction order dependence vulnerabilities;
in this embodiment, the One-Vs-Rest model converts the multi-class problem into several binary classification problems: one class is selected as the positive class and all other classes are treated as the negative class; each binary classification task is then handled by one AdaBoost classifier, and the classification results are combined to give the final multi-label classification result.
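The One-Vs-Rest decomposition described in step 4 can be sketched as follows. The sketch assumes that each contract carries a set of vulnerability labels and that any binary learner can be plugged in; the single-threshold `train_stump` is a hypothetical stand-in for the AdaBoost classifier, and the two-feature toy data are invented for illustration.

```python
def train_stump(X, y):
    """Stand-in binary learner: best single-feature threshold rule."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[f] >= t else -sign for row in X]
                errors = sum(p != target for p, target in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    _, f, t, sign = best
    return lambda row: sign if row[f] >= t else -sign

def train_one_vs_rest(X, label_sets, labels, train_binary):
    """Train one binary classifier per label: that label vs. everything else."""
    models = {}
    for label in labels:
        y = [1 if label in s else -1 for s in label_sets]
        models[label] = train_binary(X, y)
    return models

def predict_labels(models, row):
    """Combine the binary decisions into a multi-label result."""
    return {label for label, clf in models.items() if clf(row) == 1}

# Hypothetical slice features and label sets for four contracts.
X = [[0.9, 0.1], [0.8, 0.9], [0.1, 0.8], [0.2, 0.1]]
label_sets = [{"reentrancy"}, {"reentrancy", "timestamp"}, {"timestamp"}, set()]
models = train_one_vs_rest(X, label_sets, ["reentrancy", "timestamp"], train_stump)

both = predict_labels(models, [0.85, 0.95])
none = predict_labels(models, [0.1, 0.1])
```

Because every label gets its own classifier, a contract can receive several labels at once or none at all, which is what makes the result multi-label.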
Referring to Fig. 2, the One-Vs-Rest model and the AdaBoost model of this embodiment are trained models; the training process comprises the following steps:
(1) Collecting and preprocessing the original data set;
Ethereum smart contracts are collected to form the original data set; the original data are preprocessed and labelled with the smart contract vulnerability detection tool Oyente, which marks whether each contract contains a vulnerability and, if so, its type; the contract source code is compiled into bytecode with the solc command;
(2) Decompiling the bytecode of the source code into opcodes with the Vandal tool, wherein the opcodes consist of several opcode fragments;
because opcode fragments differ in length, the feature lengths of the contracts differ; this embodiment therefore uses the PCA algorithm to reduce the feature dimension and fix the feature length, and normalizes the opcode fragments by zero padding (shorter features are padded with 0 up to the longest feature length);
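To make the bytecode-to-opcode step and the zero-padding rule concrete, here is a toy disassembler in plain Python. It is not the Vandal tool: the `OPCODES` table is a small hand-picked subset of EVM opcodes, bytes outside it are reported as `UNKNOWN_0x..`, and the helper names are hypothetical.

```python
# Illustrative subset of the EVM opcode table (assumption: incomplete).
OPCODES = {0x00: "STOP", 0x01: "ADD", 0x52: "MSTORE", 0x54: "SLOAD",
           0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
           0xF1: "CALL", 0xF3: "RETURN"}

def disassemble(bytecode_hex):
    """Translate a hex bytecode string into a list of opcode mnemonics."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    ops, pc = [], 0
    while pc < len(code):
        byte = code[pc]
        if 0x60 <= byte <= 0x7F:       # PUSH1..PUSH32 carry 1..32 data bytes
            width = byte - 0x5F
            ops.append(f"PUSH{width}")
            pc += width                # skip the immediate operand
        else:
            ops.append(OPCODES.get(byte, f"UNKNOWN_0x{byte:02x}"))
        pc += 1
    return ops

def zero_pad(sequences):
    """Pad every numeric sequence with 0 up to the longest length."""
    longest = max(len(s) for s in sequences)
    return [list(s) + [0] * (longest - len(s)) for s in sequences]

ops = disassemble("0x6080604052")      # ['PUSH1', 'PUSH1', 'MSTORE']
padded = zero_pad([[1, 2, 3], [4]])    # [[1, 2, 3], [4, 0, 0]]
```

The mnemonics play the role of "words" for word2vec, and padding gives every contract a feature of the same length before classification.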
(3) Extracting the features of each opcode fragment with word2vec to generate a word vector matrix, summing the word vector matrix to obtain one feature per fragment, and combining the features of all opcode fragments into the slice feature, which represents the contract, thereby forming a feature data set;
(4) Applying the oversampling SMOTE algorithm, the undersampling SMOTETomek algorithm, and the SMOTEENN algorithm to the feature data set to obtain 3 training data sets;
in this embodiment, the oversampling SMOTE algorithm is applied to the feature data set to obtain a SMOTE data set; the specific implementation comprises the following steps:
1) For each sample a in a minority class (a class whose sample count is below the threshold), calculating the Euclidean distance from a to all samples in the minority class sample set to obtain its k nearest neighbors;
2) Setting a sampling ratio according to the class imbalance ratio to determine the sampling rate N; for each minority sample a, randomly selecting several samples from its k nearest neighbors, each selected neighbor being denoted b;
3) For each randomly selected neighbor b, constructing a new sample from it and the original sample a according to the formula $c = a + \mathrm{rand}(0,1) \cdot |a - b|$.
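The three SMOTE steps can be sketched in plain Python as below. Two points are assumptions rather than part of the text: the sketch interpolates with the signed difference `b - a`, the form commonly given for SMOTE, instead of `|a - b|` as written above, and the parameter names (`k`, `per_sample`, `seed`) and toy points are hypothetical.

```python
import math
import random

def k_nearest(sample, pool, k):
    """The k samples of `pool` closest to `sample` by Euclidean distance."""
    others = [p for p in pool if p is not sample]
    return sorted(others, key=lambda p: math.dist(sample, p))[:k]

def smote(minority, k=2, per_sample=1, seed=0):
    """Create `per_sample` synthetic points for every minority sample."""
    rng = random.Random(seed)
    synthetic = []
    for a in minority:
        neighbours = k_nearest(a, minority, k)
        for _ in range(per_sample):
            b = rng.choice(neighbours)   # randomly chosen neighbour
            r = rng.random()             # rand(0, 1)
            # New point on the segment between a and b (assumed b - a form).
            synthetic.append([ai + r * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

# Hypothetical minority-class features.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
synthetic = smote(minority, k=2, per_sample=2)
```

Every synthetic point lies between an original sample and one of its neighbours, so the minority class grows without duplicating existing samples.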
In this embodiment, the undersampling SMOTETomek algorithm is applied to the feature data set to obtain a SMOTETomek data set; the specific implementation comprises the following steps:
1) Generating new minority samples with the SMOTE method to obtain an expanded data set T;
2) Removing the Tomek link pairs in T to clean the data;
where a Tomek link is defined as a pair of nearest-neighbor samples of opposite classes: given a sample pair $(x_i, x_j)$ with $x_i \in S_{maj}$ and $x_j \in S_{min}$, let $d(x_i, x_j)$ denote the distance between $x_i$ and $x_j$; if there is no sample $x_k$ such that $d(x_i, x_k) < d(x_i, x_j)$, then the pair $(x_i, x_j)$ is called a Tomek link.
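The cleaning step can be sketched as below. As an assumption, the sketch uses the mutual form of the test (each sample of the pair must be the other's nearest neighbor), which is slightly stricter than the one-sided condition written above; the one-dimensional toy data and function names are hypothetical.

```python
import math

def nearest_index(i, X):
    """Index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))

def tomek_links(X, y):
    """Pairs (i, j) of opposite-class samples that are mutual nearest neighbours."""
    links = []
    for i in range(len(X)):
        j = nearest_index(i, X)
        if y[i] != y[j] and nearest_index(j, X) == i and i < j:
            links.append((i, j))
    return links

def remove_tomek_links(X, y):
    """Drop both samples of every Tomek link."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical data: samples 1 and 2 sit next to each other across the class border.
X = [[0.0], [1.0], [1.1], [3.0], [4.0]]
y = [0, 0, 1, 1, 1]
links = tomek_links(X, y)
X_clean, y_clean = remove_tomek_links(X, y)
```

Removing such pairs clears samples that sit right on the class boundary, which is the data cleaning effect step 2) refers to.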
In this embodiment, the undersampling SMOTEENN algorithm is applied to the feature data set to obtain a SMOTEENN data set; the specific implementation comprises the following steps:
1) Generating new minority samples with the SMOTE method to obtain an expanded data set T;
2) Predicting each sample in T with the KNN method (K is generally taken as 3); if the prediction does not match the actual class label, the sample is removed to clean the data;
where the KNN method comprises 4 steps: (1) preparing and preprocessing the data; (2) calculating the distance from the test sample point to every other sample point; (3) sorting the distances and selecting the K points with the smallest distances; (4) comparing the classes of the K points and, following the majority rule, assigning the test sample point to the class with the highest proportion among the K points.
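The KNN-based cleaning in step 2) can be sketched as follows, with K = 3 as suggested above. The function names and the one-dimensional toy data are hypothetical, and the SMOTE expansion of step 1) is assumed to have been done already.

```python
import math
from collections import Counter

def knn_predict(i, X, y, k=3):
    """Majority label among the k nearest neighbours of sample i (excluding i)."""
    order = sorted((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    votes = Counter(y[j] for j in order[:k])
    return votes.most_common(1)[0][0]

def edited_nearest_neighbours(X, y, k=3):
    """Keep only the samples whose KNN prediction agrees with their own label."""
    keep = [i for i in range(len(X)) if knn_predict(i, X, y, k) == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical data: the sample at 0.15 is labelled 1 but surrounded by class 0.
X = [[0.0], [0.1], [0.2], [0.15], [5.0], [5.1], [5.2], [5.3]]
y = [0, 0, 0, 1, 1, 1, 1, 1]
X_clean, y_clean = edited_nearest_neighbours(X, y, k=3)
```

Only the sample whose neighbours all disagree with its label is rejected; the remaining seven samples are kept unchanged.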
(5) Training the One-Vs-Rest model and the AdaBoost model with the training data sets;
in this embodiment, training the One-Vs-Rest model comprises the following steps:
1) Selecting one of the classes as the positive class and treating all other classes as the negative class;
2) Training one AdaBoost classifier for each binary classification task, six AdaBoost classifiers being trained in total;
3) Combining the results of the six classifiers to give the final multi-label classification result.
In this embodiment, training the AdaBoost model comprises the following steps:
1) Initializing the weight distribution of the training data, each training sample being given the same initial weight:
$D_1 = (w_{11}, w_{12}, w_{13}, \ldots, w_{1j}, \ldots, w_{1N})$;
where $w_{1j}$ is the initial weight of the j-th training sample, $N$ is the total number of samples, and $1 \le j \le N$;
2) Performing M iterations, each comprising the following steps:
a. learning from the training data set with weight distribution $D_m$ to obtain the base classifier $G_m(x)$:
$G_m(x): x \to \{-1, +1\}$;
b. calculating the classification error rate $e_m$ of $G_m(x)$ on the training data set,
where $G_m(x_i)$ is the classification result of the base classifier $G_m(x)$ for training sample $x_i$, $y_i$ is the true class of $x_i$, $w_{m,i}$ is the weight of sample $x_i$ at the m-th iteration, $P(\cdot)$ denotes the probability of an event, and $I(\cdot)$ equals 1 when the event in brackets is true and 0 otherwise;
c. calculating the weight $\alpha_m$ of the base classifier $G_m(x)$ in the final classifier,
where $e_m$ is the classification error rate of $G_m(x)$ on the training data set;
d. updating the weight distribution of the training data set:
$D_{m+1} = (w_{m+1,1}, w_{m+1,2}, w_{m+1,3}, \ldots, w_{m+1,i}, \ldots, w_{m+1,N})$;
where $w_{m+1,i}$ is the updated weight of the i-th training sample after m iterations, $Z_m$ is the normalization factor, and $\exp(\cdot)$ is the exponential function with the natural constant e as its base;
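The formulas for the initial weights and for steps b to d do not appear in this text. Assuming the standard AdaBoost formulation, which uses exactly the symbols defined above ($e_m$, $P$, $I$, $\alpha_m$, $Z_m$, $\exp$), they would read as follows; this is a reconstruction under that assumption, not a quotation of the original figures.

```latex
\begin{align}
w_{1j} &= \frac{1}{N}, \qquad j = 1, \dots, N \\
e_m &= P\bigl(G_m(x_i) \neq y_i\bigr)
     = \sum_{i=1}^{N} w_{m,i}\, I\bigl(G_m(x_i) \neq y_i\bigr) \\
\alpha_m &= \frac{1}{2} \ln \frac{1 - e_m}{e_m} \\
w_{m+1,i} &= \frac{w_{m,i}}{Z_m} \exp\bigl(-\alpha_m\, y_i\, G_m(x_i)\bigr), \qquad
Z_m = \sum_{i=1}^{N} w_{m,i} \exp\bigl(-\alpha_m\, y_i\, G_m(x_i)\bigr)
\end{align}
```

Under this form, a base classifier with a lower error rate receives a larger $\alpha_m$, and misclassified samples (where $y_i G_m(x_i) = -1$) gain weight in the next iteration.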
3) Combining all the base classifiers, the final classification result being given by all the base classifiers together,
where $G_m(x)$ is a base classifier, $\alpha_m$ is the weight of $G_m(x)$ in the final classifier, and M is the number of iterations.
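The training loop can be sketched in plain Python as below. The sketch assumes the standard AdaBoost update rules, a weighted decision stump as the base classifier $G_m(x)$, and a final decision given by the sign of the $\alpha_m$-weighted vote; the text does not specify the base classifier, and the function names and ten-point toy data set are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Weighted decision stump: returns (error, feature, threshold, sign)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[f] >= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best

def stump_predict(stump, row):
    _, f, t, sign = stump
    return sign if row[f] >= t else -sign

def adaboost_fit(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n                                 # equal initial weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)                  # step a
        err = min(max(stump[0], 1e-10), 1 - 1e-10)    # step b, kept off 0 and 1
        alpha = 0.5 * math.log((1 - err) / err)       # step c
        ensemble.append((alpha, stump))
        # step d: raise the weight of misclassified samples, then normalise
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all base classifiers."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy set that no single stump can separate.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost_fit(X, y, rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
```

On this toy set the best single stump misclassifies three of the ten points, while the weighted combination of three stumps classifies all of them correctly, which illustrates why boosting weak classifiers helps.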
The invention realizes AdaBoost-based multi-label vulnerability detection for smart contracts using word2vec and AdaBoost. The scheme focuses on the opcodes of a smart contract, which consist of several opcode fragments. Because the opcodes contain the logic executed by the contract, extracting their sequence slice features with word2vec and PCA and using these features as the input for training the AdaBoost model can effectively improve the detection of smart contract security vulnerabilities.
The foregoing description of the preferred embodiments should not be construed as limiting the scope of the invention; persons of ordinary skill in the art may make substitutions or alterations without departing from the scope of the invention as set forth in the appended claims.

Claims (9)

1. An AdaBoost-based smart contract multi-label vulnerability detection method, characterized by comprising the following steps:
step 1: extracting the bytecode of the smart contract to be detected;
step 2: decompiling the bytecode into opcodes;
step 3: extracting slice features of the opcodes;
step 4: performing joint vulnerability detection on the slice features using a One-Vs-Rest model and an AdaBoost model; the vulnerabilities comprise reentrancy vulnerabilities, integer overflow vulnerabilities, exception handling vulnerabilities, call stack overflow vulnerabilities, tx.origin vulnerabilities, timestamp dependence vulnerabilities, and transaction order dependence vulnerabilities;
in the joint vulnerability detection, the One-Vs-Rest model converts the multi-class problem into several binary classification problems: one class is selected as the positive class and all other classes are treated as the negative class; each binary classification task is then handled by one AdaBoost classifier, six AdaBoost classifiers in total, and the classification results are combined to give the final multi-label classification result.
2. The AdaBoost-based smart contract multi-label vulnerability detection method according to claim 1, characterized in that: in step 1, the bytecode is extracted with the solc command.
3. The AdaBoost-based smart contract multi-label vulnerability detection method according to claim 1, characterized in that: in step 2, the bytecode is decompiled into opcodes with the Vandal tool, wherein the opcodes consist of several opcode fragments.
4. The AdaBoost-based smart contract multi-label vulnerability detection method according to claim 1, characterized in that: in step 3, the features of each opcode fragment are extracted with a word2vec model to generate a word vector matrix; the word vector matrix is summed to obtain one feature per fragment, and the features of all opcode fragments together form the slice feature, which represents the contract;
the word2vec model consists of a CBOW model and a skip-gram model, each comprising an input layer, a hidden layer, and an output layer; the CBOW model predicts the target word in the middle from its context: the one-hot codes of the context words of the current word are fed to the input layer, and each is multiplied by the same center word matrix $W_{V \times N}$ to obtain a $1 \times N$ vector, where $V$ is the number of words in the vocabulary and $N$ is the dimension of the word vector; these $1 \times N$ vectors are averaged into a single $1 \times N$ vector, which is multiplied by the (transposed) context matrix $U_{V \times N}$ to obtain a $1 \times V$ vector; softmax normalization of this vector yields a $1 \times V$ probability vector over the words, and the word with the highest probability is taken as the predicted word; if the prediction does not match the actual word, the center word matrix $W_{V \times N}$ and the context matrix $U_{V \times N}$ are corrected with the back propagation algorithm; the skip-gram model predicts the context from the center word: it takes the one-hot code of the center word, a $1 \times V$ vector, as input; this vector is multiplied by the center word matrix $W_{V \times N}$ to obtain a $1 \times N$ vector, which is multiplied by the (transposed) context matrix $U_{V \times N}$ to obtain a $V$-dimensional vector; after softmax normalization, the word with the highest probability is taken as the model's predicted word; if the prediction does not match the actual context word, the center word matrix $W_{V \times N}$ and the context matrix $U_{V \times N}$ are corrected with the back propagation algorithm;
in the CBOW model, the current word $W(t)$ is predicted given its context $W(t-2), W(t-1), W(t+1), W(t+2)$, and the objective function $P_{\text{CBOW}}$ to be learned is the log-likelihood to be maximized:
$P_{\text{CBOW}} = \sum \log p\bigl(W(t) \mid W(t-2), W(t-1), W(t+1), W(t+2)\bigr)$;
in the Skip-Gram model, the context $W(t-2), W(t-1), W(t+1), W(t+2)$ is predicted given the current word $W(t)$, and the objective function $P_{\text{Skip-Gram}}$ is:
$P_{\text{Skip-Gram}} = \sum \log p\bigl(W(t-2), W(t-1), W(t+1), W(t+2) \mid W(t)\bigr)$.
5. The AdaBoost-based smart contract multi-label vulnerability detection method according to any one of claims 1-4, characterized in that: the One-Vs-Rest model and the AdaBoost model are trained models; the training process comprises the following steps:
(1) Collecting and preprocessing the original data set;
Ethereum smart contracts are collected to form the original data set; the original data are preprocessed and labelled with the smart contract vulnerability detection tool Oyente, which marks whether each contract contains a vulnerability and, if so, its type; the contract source code is compiled into bytecode with the solc command;
(2) Decompiling the bytecode of the source code into opcodes with the Vandal tool, wherein the opcodes consist of several opcode fragments;
the PCA algorithm is used to reduce the feature dimension and fix the feature length, and the opcode fragments are normalized by zero padding;
(3) Extracting the features of each opcode fragment with word2vec to generate a word vector matrix, summing the word vector matrix to obtain one feature per fragment, and combining the features of all opcode fragments into the slice feature, which represents the contract, thereby forming a feature data set;
(4) Applying the oversampling SMOTE algorithm, the undersampling SMOTETomek algorithm, and the SMOTEENN algorithm to the feature data set to obtain 3 training data sets;
(5) Training the One-Vs-Rest model and the AdaBoost model with the training data sets;
training the One-Vs-Rest model comprises the following steps:
1) Selecting one of the classes as the positive class and treating all other classes as the negative class;
2) Training one AdaBoost classifier for each binary classification task, six AdaBoost classifiers being trained in total;
3) Combining the results of the six classifiers to give the final multi-label classification result.
Training the AdaBoost model comprises the following steps:
1) Initializing the weight distribution of the training data, each training sample being given the same initial weight:
$D_1 = (w_{11}, w_{12}, w_{13}, \ldots, w_{1j}, \ldots, w_{1N})$;
where $w_{1j}$ is the initial weight of the j-th training sample, $N$ is the total number of samples, and $1 \le j \le N$;
2) Performing M iterations, each comprising the following steps:
a. learning from the training data set with weight distribution $D_m$ to obtain the base classifier $G_m(x)$:
$G_m(x): x \to \{-1, +1\}$;
b. calculating the classification error rate $e_m$ of $G_m(x)$ on the training data set,
where $G_m(x_i)$ is the classification result of the base classifier $G_m(x)$ for training sample $x_i$, $y_i$ is the true class of $x_i$, $w_{m,i}$ is the weight of sample $x_i$ at the m-th iteration, $P(\cdot)$ denotes the probability of an event, and $I(\cdot)$ equals 1 when the event in brackets is true and 0 otherwise;
c. calculating the weight $\alpha_m$ of the base classifier $G_m(x)$ in the final classifier,
where $e_m$ is the classification error rate of $G_m(x)$ on the training data set;
d. updating the weight distribution of the training data set:
$D_{m+1} = (w_{m+1,1}, w_{m+1,2}, w_{m+1,3}, \ldots, w_{m+1,i}, \ldots, w_{m+1,N})$;
where $w_{m+1,i}$ is the updated weight of the i-th training sample after m iterations, $Z_m$ is the normalization factor, and $\exp(\cdot)$ is the exponential function with the natural constant e as its base;
3) Combining all the base classifiers, the final classification result being given by all the base classifiers together,
where $G_m(x)$ is a base classifier, $\alpha_m$ is the weight of $G_m(x)$ in the final classifier, and M is the number of iterations.
6. The AdaBoost-based intelligent contract multi-label vulnerability detection method of claim 5, wherein the method comprises the following steps: the feature data set is subjected to an oversampling SMOTE algorithm to obtain an SMOTE data set; the specific implementation method comprises the following steps:
(1) For each sample a of a minority class (a class whose sample count is below the threshold), calculate the Euclidean distance from a to every other sample in the minority class sample set to obtain its k nearest neighbors;
(2) Set a sampling ratio according to the sample imbalance ratio to determine the sampling rate N; for each minority sample a, randomly select several samples from its k nearest neighbors, and denote a selected neighbor by b;
(3) For each randomly selected neighbor b, construct a new sample c from b and the original sample a according to the formula c = a + rand(0, 1) × |a − b|.
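The three SMOTE steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `smote` and its parameters `k`, `n_new_per_sample` and `seed` are hypothetical, and the code uses the commonly published SMOTE interpolation c = a + rand(0, 1) × (b − a), which places c on the segment between a and b, rather than the absolute-difference form written in the claim.

```python
import math
import random

def smote(minority, k=5, n_new_per_sample=1, seed=0):
    """Generate synthetic minority samples by interpolating towards nearest neighbours.

    minority: list of feature vectors (lists of floats), at least two samples.
    n_new_per_sample: sampling rate, i.e. synthetic samples created per original sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, a in enumerate(minority):
        # (1) k nearest minority neighbours of a by Euclidean distance, excluding a itself
        others = [b for j, b in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda b: math.dist(a, b))[:k]
        for _ in range(n_new_per_sample):
            # (2) pick one of the k neighbours at random
            b = rng.choice(neighbours)
            # (3) interpolate between a and b with a random factor in [0, 1)
            r = rng.random()
            synthetic.append([ai + r * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic
```

Because every synthetic point lies between an existing minority sample and one of its neighbours, the new samples stay inside the region already occupied by the minority class instead of being plain duplicates.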
7. The AdaBoost-based intelligent contract multi-label vulnerability detection method of claim 6, wherein the feature data set is processed with the undersampling SMOTETomek algorithm to obtain a SMOTETomek data set; the specific implementation comprises the following steps:
(1) Generate new minority samples with the SMOTE method to obtain an expanded data set T;
(2) Remove the Tomek Links pairs from T to clean the data;
where a Tomek Link is defined as a pair of nearest-neighbor samples from opposite classes: given a sample pair $(x_i, x_j)$ with $x_i \in S_{maj}$ and $x_j \in S_{min}$, let $d(x_i, x_j)$ denote the distance between samples $x_i$ and $x_j$; if there is no sample $x_k$ such that $d(x_i, x_k) < d(x_i, x_j)$, the sample pair $(x_i, x_j)$ is called a Tomek Link.
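The cleaning step can be sketched as below. This is a minimal sketch with hypothetical function names. It assumes the usual symmetric reading of the definition, in which the two samples of a link are each other's nearest neighbour, and it assumes that both samples of every link are removed, which is one plausible reading of "remove the Tomek Links pairs".

```python
import math

def tomek_link_indices(X, y):
    """Return index pairs (i, j), i < j, of opposite-class samples that are
    each other's nearest neighbour under Euclidean distance."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

def remove_tomek_links(X, y):
    """Drop both samples of every Tomek Link and return the cleaned (X, y)."""
    drop = {idx for pair in tomek_link_indices(X, y) for idx in pair}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]
```

In a SMOTETomek pipeline this function would be applied to the expanded data set T produced by the oversampling step, so that overlapping samples near the class boundary are removed.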
8. The AdaBoost-based intelligent contract multi-label vulnerability detection method of claim 6, wherein the feature data set is processed with the undersampling SMOTENN algorithm to obtain a SMOTENN data set; the specific implementation comprises the following steps:
(1) Generate new minority samples with the SMOTE method to obtain an expanded data set T;
(2) Predict the class of each sample in T with the KNN method; if the prediction does not match the sample's actual class label, remove the sample to clean the data;
wherein the KNN method comprises 4 steps: (1) prepare and preprocess the data; (2) calculate the distance from the test sample point to every other sample point; (3) sort the distances and select the K points with the smallest distances; (4) compare the classes of the K points and, following the majority-rule principle, assign the test sample point to the class with the highest proportion among the K points.
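The KNN-based cleaning step can be sketched as follows. The function names `knn_predict` and `enn_clean` and the default `k=3` are assumptions for illustration; every prediction is made against the full expanded set T (excluding the sample itself), and samples are only removed after all predictions are done.

```python
import math
from collections import Counter

def knn_predict(X, y, i, k=3):
    """Predict the class of sample i by majority vote among its k nearest other samples."""
    order = sorted((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    votes = Counter(y[j] for j in order[:k])
    return votes.most_common(1)[0][0]

def enn_clean(X, y, k=3):
    """Keep only the samples whose KNN prediction agrees with their actual label."""
    keep = [i for i in range(len(X)) if knn_predict(X, y, i, k) == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]
```

A sample surrounded by samples of another class is voted out, which removes noisy points that oversampling may have created or amplified. This cleaning rule is commonly known as Edited Nearest Neighbours.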
9. An AdaBoost-based intelligent contract multi-label vulnerability detection system, characterized by comprising the following modules:
a bytecode extraction module, used for extracting the bytecode of the intelligent contract to be detected;
an opcode extraction module, used for decompiling the bytecode into opcodes;
a slice feature extraction module, used for extracting slice features of the opcodes;
a vulnerability detection module, used for performing joint vulnerability detection on the slice features with a One-Vs-Rest model and an AdaBoost model; the vulnerabilities comprise reentrancy vulnerabilities, integer overflow vulnerabilities, exception handling vulnerabilities, call stack overflow vulnerabilities, tx.origin vulnerabilities, timestamp dependence vulnerabilities and transaction order dependence vulnerabilities;
wherein, in the joint vulnerability detection with the One-Vs-Rest model and the AdaBoost model, the One-Vs-Rest model converts the multi-class problem into a plurality of binary classification problems by selecting one class as positive and treating all other classes as negative; each binary classification task is then handled by one AdaBoost classifier, six AdaBoost classifiers being used in total, and their classification results are combined to give the final multi-label classification result.
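The One-Vs-Rest decomposition described above can be sketched as follows. This only illustrates how the label sets are split into binary tasks and recombined. The names `ovr_fit` and `ovr_predict` are hypothetical, the label strings are placeholders, and `fit_nearest_centroid` is a deliberately simple stand-in for the per-label AdaBoost classifier so that the example stays self-contained.

```python
import math

def ovr_fit(X, Y, labels, fit_binary):
    """Train one binary model per label.

    Y[i] is the set of labels carried by sample i; fit_binary(X, y_bin) must
    return a function mapping a feature vector to +1 or -1.
    """
    models = {}
    for label in labels:
        # The chosen label is the positive class; every other sample is negative.
        y_bin = [1 if label in tags else -1 for tags in Y]
        models[label] = fit_binary(X, y_bin)
    return models

def ovr_predict(models, x):
    """Combine the binary decisions: the set of labels whose model votes +1."""
    return {label for label, predict in models.items() if predict(x) == 1}

def fit_nearest_centroid(X, y_bin):
    """Stand-in binary learner (not AdaBoost): predicts the class of the closer centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = centroid([x for x, t in zip(X, y_bin) if t == 1])
    neg = centroid([x for x, t in zip(X, y_bin) if t == -1])
    return lambda x: 1 if math.dist(x, pos) <= math.dist(x, neg) else -1
```

Because each label gets its own independent decision, a contract can be reported with several vulnerabilities at once or with none, which is what makes the combined result multi-label rather than multi-class.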
CN202310419740.8A 2023-04-13 2023-04-13 Intelligent contract multi-label vulnerability detection method and system based on AdaBoost Pending CN116541845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310419740.8A CN116541845A (en) 2023-04-13 2023-04-13 Intelligent contract multi-label vulnerability detection method and system based on AdaBoost

Publications (1)

Publication Number Publication Date
CN116541845A (en) 2023-08-04

Family

ID=87444494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310419740.8A Pending CN116541845A (en) 2023-04-13 2023-04-13 Intelligent contract multi-label vulnerability detection method and system based on AdaBoost

Country Status (1)

Country Link
CN (1) CN116541845A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination