CN116150757A - Smart contract unknown vulnerability detection method based on CNN-LSTM multi-classification model - Google Patents

Publication number
CN116150757A
Authority
CN
China
Prior art keywords
operation code
code sequence
cnn
vulnerability
classification model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211228581.5A
Other languages
Chinese (zh)
Inventor
彭滔
李旭彬
王国军
李培强
顾婉仪
翟广鑫
黎相彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Application filed by Guangzhou University
Priority to CN202211228581.5A
Publication of CN116150757A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a smart contract unknown vulnerability detection method based on a CNN-LSTM multi-classification model, comprising the following steps: S1, instrumenting the Ethereum client Geth; S2, obtaining a normal opcode sequence set and a vulnerability opcode sequence set through the instrumented Geth client; S3, training embedding word vectors with a pre-training model to obtain a word vector index dictionary; S4, converting the replayed opcode sequences into feature vector matrices according to the word vector index dictionary, and reducing their dimension with a CNN; S5, training an LSTM classification model on the dimension-reduced feature vector matrices; S6, collecting in real time, through the instrumented Geth client, the opcode sequences to be tested that transactions generate; S7, in the detection stage, for each opcode sequence to be tested, summing the probability values that the classification model assigns to all vulnerability categories and comparing the sum with a threshold to decide whether an unknown vulnerability is present.

Description

Smart contract unknown vulnerability detection method based on CNN-LSTM multi-classification model
Technical Field
The invention relates to the technical field of vulnerability detection, and in particular to a smart contract unknown vulnerability detection method based on a CNN-LSTM multi-classification model.
Background
In recent years deep learning has developed rapidly, and researchers have begun to use it to detect smart contract vulnerabilities. Several smart contract vulnerability detection schemes built on deep learning are described below.
1) Zhang et al. proposed a study of smart contract vulnerability detection and analysis methods based on N-grams and deep learning. In that work, an opcode sequence is the instruction sequence obtained by mapping the bytecode produced by compiling a smart contract onto the official Ethereum instruction set; the instructions are assembly opcodes. The contract source code is compiled and the bytecode is parsed to obtain a static opcode data stream, a dictionary is built from the correspondence between opcodes and hexadecimal numbers in the Ethereum Yellow Paper, and different flows are then executed depending on the scheme. N-gram-based scheme: the opcodes are simplified and sliced with n-grams, a feature matrix is constructed, and different machine learning classification algorithms are trained on it. Deep-learning-based scheme: the opcode data stream is converted into an opcode sequence expressed in hexadecimal numbers and fed to 4 different deep learning network structures for training. Comparing the training results of the two schemes shows that the CNN+LSTM network structure gives the best vulnerability detection performance.
2) He J et al. train a symbolic execution expert by imitating conventional symbolic execution, based on a graph convolutional network (GCN) from deep learning. In the model learning stage, the team runs the symbolic execution expert on tens of thousands of smart contracts, generating thousands of high-quality opcode sequences that are fed into the next layer for training, and finally obtains a fuzzing policy with higher program coverage. In the usage stage, the fuzzing policy learned by the model generates the opcode sequences to be tested, and the fuzzing results are analyzed to obtain the vulnerability detection result. The opcode sequences in this scheme come from transaction executions simulated by symbolic execution and are not necessarily feasible, so their feasibility has to be checked by fuzzing.
3) Huang H D et al. translate the hexadecimal bytecode compiled from smart contracts into RGB color codes, so that each smart contract becomes a fixed-size image, which is then used as the input of a CNN (convolutional neural network) deep learning model for training and detection. However, because this approach converts the code directly into an image encoding, the layers of the model may make otherwise unrelated bytecodes appear correlated, or may destroy the logical relationships between contexts, so the scheme must choose a suitable CNN model structure to reduce this effect.
In the above schemes, the raw data is generally static smart contract source code or compiled bytecode, so vulnerabilities cannot be detected dynamically from the real-time execution of on-chain transactions. In addition, the feature vectors derived from this raw data lack the semantic information of the opcodes and the contextual relationships within the sequence. Finally, the classification models can only detect a few known vulnerabilities and cannot detect unknown vulnerabilities that have not yet been discovered. As far as the inventors are aware, little research currently addresses unknown vulnerability detection. Therefore, the invention provides a smart contract unknown vulnerability detection method based on a CNN-LSTM multi-classification model.
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a smart contract unknown vulnerability detection method based on a CNN-LSTM multi-classification model to solve the problems described above.
(II) Technical solution
To achieve the above purpose, the invention provides the following technical solution:
A smart contract unknown vulnerability detection method based on a CNN-LSTM multi-classification model comprises the following steps:
S1, instrumenting the Ethereum client Geth;
S2, replaying Ethereum block transactions and obtaining a normal opcode sequence set and a vulnerability opcode sequence set through the instrumented Geth client;
S3, training embedding word vectors with a Word2vec pre-training model to obtain a word vector index dictionary;
S4, converting the replayed opcode sequences into feature vector matrices according to the word vector index dictionary, and reducing their dimension with a CNN;
S5, in the training stage, training the LSTM classification model on the dimension-reduced feature vector matrices;
S6, collecting in real time, through the instrumented Geth client, the opcode sequences to be tested that transactions generate;
S7, in the detection stage, for each opcode sequence to be tested, summing the probability values that the classification model assigns to all vulnerability categories and comparing the sum with a threshold to decide whether an unknown vulnerability is present.
Further, in step S1, instrumentation means inserting into the source code of the Ethereum client Geth code segments, written in Golang, that output transaction data. The transaction data collected includes:
block parameters such as block number, timestamp, nonce value, root hash and gas value; transaction information such as the account addresses and transfer amounts involved in the transaction; the address of the executed smart contract; account balances; the opcode sequence formed by assembly opcodes such as PUSH1, MSTORE, CALLDATASIZE and ISZERO together with their operands; and the memory, storage and stack of the Ethereum virtual machine.
Further, in step S2, replay means re-executing, on a local private chain, transactions that have already been executed on Ethereum. The instrumented Geth client then yields the normal opcode sequence set and the vulnerability opcode sequence set, which together form the data set fed to the subsequent models. The specific flow is as follows:
S201, replaying Ethereum block transactions;
S202, obtaining the normal opcode sequence set and the vulnerability opcode sequence set through the instrumented Geth client.
Further, in step S3, the Word2vec pre-training model is a neural network from the NLP field for generating embedding word vectors; in the trained word vector index dictionary, each opcode corresponds to one multidimensional word vector. The specific process is as follows:
S301, building a Word2vec pre-training model with the word vector dimension set to 128, the number of iterations set to 8 (n_epoch=8) and the batch size set to 100 (batch_size=100), using the skip-gram algorithm with negative sampling;
S302, feeding the normal opcode sequence set and the vulnerability opcode sequence set into the Word2vec pre-training model, which outputs a 129 x 128 word vector index dictionary, where 129 is the number of distinct opcodes appearing in all opcode sequences and 128 is the word vector dimension.
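The structure of the word vector index dictionary can be illustrated with a minimal sketch. It assumes that opcodes are indexed in order of first appearance and uses an all-zero placeholder table in place of trained vectors; the function name `build_opcode_index` and the toy sequences are hypothetical and do not come from the patent.

```python
from typing import Dict, List


def build_opcode_index(sequences: List[List[str]]) -> Dict[str, int]:
    """Map each distinct opcode to a row index, in order of first appearance."""
    index: Dict[str, int] = {}
    for seq in sequences:
        for op in seq:
            if op not in index:
                index[op] = len(index)
    return index


# Toy sequences for illustration only (not taken from the patent's data set).
normal = ["PUSH1", "PUSH1", "MSTORE", "CALLDATASIZE", "ISZERO"]
vulnerable = ["PUSH1", "CALLDATASIZE", "CALL", "SSTORE"]

opcode_index = build_opcode_index([normal, vulnerable])
EMBEDDING_DIM = 128  # word vector dimension stated in S301

# Placeholder table with one 128-dimensional row per distinct opcode.
# In the described method these rows would be the trained Word2vec vectors.
embedding_table = [[0.0] * EMBEDDING_DIM for _ in opcode_index]

print(len(embedding_table), len(embedding_table[0]))
```

With the full data set described above, the same table would have 129 rows instead of the 6 rows produced by the toy input.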
Further, the specific flow of step S4 is as follows:
S401, converting each opcode in an opcode sequence into its word vector according to the word vector index dictionary, so that each opcode sequence becomes a 128 x 5000 feature vector matrix, where 128 is the word vector dimension and 5000 is the opcode sequence length;
S402, constructing a single-layer CNN;
S403, feeding all the feature vector matrices into the CNN, whose convolution and pooling operations reduce each of them to a 64 x 1249 feature vector matrix.
Further, in step S5, the normal opcode sequence set and the vulnerability opcode sequence set form the data set fed to the LSTM neural network, and the LSTM classification model is trained as follows:
S501, constructing a single-layer LSTM neural network with the number of iterations set to 5 (n_epoch=5), sigmoid as the activation function, sparse_categorical_crossentropy as the loss function, and a batch size of 100 (batch_size=100);
S502, feeding the feature vector matrices reduced by the CNN into the LSTM neural network to obtain a classification model that can distinguish the vulnerability categories.
Further, the opcode sequences to be tested that are collected in step S6 are those generated by transactions newly executed on Ethereum.
Further, the specific flow of step S7 is as follows:
S701, feeding the collected opcode sequence to be tested into the trained CNN-LSTM multi-classification model to obtain the probability value of each vulnerability category, and summing these values;
S702, comparing the probability sum with a threshold: if the sum is greater than the threshold, a new unknown vulnerability has been found; otherwise, none has been found.
(III) Beneficial effects
Compared with the prior art, the smart contract unknown vulnerability detection method based on the CNN-LSTM multi-classification model has the following beneficial effects:
1. The method supports dynamic detection of unknown smart contract vulnerabilities. On the premise that the opcode sequence of an unknown vulnerability bears some similarity to the opcode sequences of certain known vulnerabilities, the method combines the strengths of dynamic detection and deep learning: it builds a CNN-LSTM deep learning multi-classification model, uses the trained model as the tool for detecting unknown vulnerabilities, and improves the accuracy of unknown vulnerability discrimination through a user-set threshold.
2. The method uses the Word2vec pre-training model to build an embedding word vector dictionary for the opcodes, which preserves the semantic information of the opcodes and the contextual relationships within an opcode sequence, so that the extracted feature vector matrix is more reasonable and the unknown vulnerability detection result is better founded.
Drawings
FIG. 1 is a flow diagram of the smart contract unknown vulnerability detection method;
FIG. 2 is a schematic diagram of the system model structure of the smart contract unknown vulnerability detection method;
FIG. 3 is a system flow diagram of the smart contract unknown vulnerability detection method.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
The embodiment of the invention discloses a smart contract unknown vulnerability detection method based on a CNN-LSTM multi-classification model. The system model involved is as follows:
The detection system is divided into 2 stages: a data preprocessing stage, and a model training and testing stage.
The data preprocessing stage focuses on word vector training and on unifying the feature vector length. Because opcode sequences of genuinely unknown vulnerabilities are not available in a real environment, the invention treats some known-vulnerability opcode sequences as unknown-vulnerability opcode sequences for detection in the experimental stage. The invention obtains 8 classes of opcode sequences from a database: S0 (normal sequence), S1 (reentrancy vulnerability sequence), S2 (unexpected function call vulnerability sequence), S3 (incorrect permission check vulnerability sequence), S4 (mishandled exception vulnerability sequence), S5 (missing standard event vulnerability sequence), S6 (strict balance check vulnerability sequence) and S7 (timestamp/block number dependency vulnerability sequence). These are divided into 2 parts: known-vulnerability opcode sequences plus part of the normal sequences, and unknown-vulnerability opcode sequences plus part of the normal sequences. The first part is used to train the model and the second part is used to detect unknown vulnerabilities. Because S1 has few samples, which could hurt the accuracy of the classification model, it is treated as an unknown vulnerability by default, and one of the other 6 vulnerabilities is additionally selected as an unknown vulnerability. This gives 5 known vulnerabilities and 2 unknown vulnerabilities. All opcode sequences are pre-trained with a Word2vec neural network to obtain embedding word vectors, which are used to fill the feature vector matrix and unify its length, effectively preserving the semantic information of the opcodes and the logical relationships between neighboring opcodes in a sequence.
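The known/unknown partition described above can be sketched as follows. This is only an illustration: the short class names paraphrase the labels in the text, the helper `partition_classes` is hypothetical, and picking S7 as the additional unknown class is an arbitrary assumption, since the text does not say which class was selected.

```python
from typing import Dict, List, Tuple

# Class labels paraphrased from the description; S0 is the normal class.
CLASSES: Dict[str, str] = {
    "S0": "normal",
    "S1": "reentrancy",
    "S2": "unexpected function call",
    "S3": "incorrect permission check",
    "S4": "mishandled exception",
    "S5": "missing standard event",
    "S6": "strict balance check",
    "S7": "timestamp/block number dependency",
}


def partition_classes(extra_unknown: str) -> Tuple[List[str], List[str]]:
    """Split vulnerability classes into (known, unknown).

    S1 is always unknown (few samples); `extra_unknown` is the one additional
    class, chosen from S2..S7, that is held out as unknown.
    """
    if extra_unknown not in CLASSES or extra_unknown in ("S0", "S1"):
        raise ValueError("extra_unknown must be one of S2..S7")
    unknown = ["S1", extra_unknown]
    known = [c for c in CLASSES if c != "S0" and c not in unknown]
    return known, unknown


# Assumption: S7 is held out; any of S2..S7 would fit the description.
known, unknown = partition_classes("S7")
print(known)    # 5 known vulnerability classes used for training
print(unknown)  # 2 classes treated as unknown during detection
```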
The model training and testing stage comprises two parts, training and unknown vulnerability determination. The complete model built by the method consists of an input layer, a CNN (convolutional neural network), an LSTM (long short-term memory network), a fully connected layer and an output layer. Training stage: the known-vulnerability sequences and the normal sequences are fed into the model; the CNN reduces each 128 x 5000 matrix to a 64 x 1249 matrix; the LSTM is trained over multiple batches, with parameters optimized through backpropagation; and the fully connected layer and the output layer perform classification and output the model loss and accuracy. Unknown vulnerability determination stage: the feature vector matrix of the processed unknown-vulnerability sequence is fed into the trained model, and the summed probability over all vulnerability classes is compared with a threshold to obtain the detection result. The complete model is extensible: introducing the CNN makes it possible to process long opcode sequences, and the adjustable threshold makes the determination of unknown vulnerabilities more reliable. High accuracy at a high threshold indicates that the unknown vulnerability is strongly similar to the known vulnerabilities.
Referring to FIGS. 1-2, the smart contract unknown vulnerability detection method based on the CNN+LSTM multi-classification model provided in this embodiment
mainly comprises 6 procedures: Geth instrumentation, embedding word vector training, data preprocessing, feature vector dimension reduction, classification model training and unknown vulnerability determination.
(1) Geth instrumentation
The invention inserts code segments into the source code of the Ethereum client Geth so that the client outputs the corresponding opcode sequence whenever it executes a transaction. In this way it collects both the opcode sequences generated by replayed transactions and the opcode sequences to be tested that newly executed transactions generate. The replayed opcode sequences are used to train the embedding word vectors and are fed into the CNN-LSTM multi-classification model for training; the opcode sequences to be tested are used to detect unknown vulnerabilities.
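How the instrumented client's output might be turned into per-transaction opcode sequences can be sketched as below. The output format is not given in the text, so this sketch assumes a hypothetical format of one JSON object per executed instruction with `tx` and `op` fields; the function name `group_opcodes_by_tx` is likewise illustrative.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_opcodes_by_tx(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the opcodes of each transaction in execution order.

    Each non-empty line is assumed (hypothetically) to be a JSON object with
    a transaction hash under "tx" and an opcode mnemonic under "op".
    """
    sequences: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        sequences[record["tx"]].append(record["op"])
    return dict(sequences)


# Toy trace in the assumed format (not real Geth output).
trace = [
    '{"tx": "0xaa", "pc": 0, "op": "PUSH1"}',
    '{"tx": "0xaa", "pc": 2, "op": "PUSH1"}',
    '{"tx": "0xaa", "pc": 4, "op": "MSTORE"}',
    '{"tx": "0xbb", "pc": 0, "op": "CALLDATASIZE"}',
    '{"tx": "0xbb", "pc": 1, "op": "ISZERO"}',
]

sequences_by_tx = group_opcodes_by_tx(trace)
print(sequences_by_tx)
```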
(2) Training of embedding word vectors
The invention uses the Word2vec interface of the gensim library to train the embedding word vectors. The training data are the normal opcode sequences and all vulnerability opcode sequences. In the parameter settings the word vector dimension is 128, the number of iterations is 8 (n_epoch=8) and the batch size is 100 (batch_size=100); the skip-gram algorithm is used with negative sampling. Because the Ethereum virtual machine supports only 130 kinds of opcode, deleting low-frequency opcodes would only hurt the accuracy of the model, so the invention keeps them. The result is a 129 x 128 word vector index dictionary, where 129 is the number of distinct opcodes appearing in all opcode sequences and 128 is the word vector dimension.
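The skip-gram objective trains on (center, context) pairs drawn from a sliding window over each sequence. The following minimal sketch shows only that pair generation, with a fixed window size chosen for illustration; it is a simplification and not gensim's implementation, which among other things varies the effective window and adds negative samples.

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every position in the sequence.

    A context opcode is any opcode within `window` positions of the center.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


# Toy opcode sequence for illustration only.
seq = ["PUSH1", "PUSH1", "MSTORE", "CALLDATASIZE"]
pairs = skipgram_pairs(seq, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Because opcodes that occur in similar contexts produce similar pairs, their learned vectors end up close together, which is what lets the embedding carry the contextual information mentioned above.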
(3) Data preprocessing
The 5 classes of known-vulnerability opcode sequences and the normal opcode sequences are stored in 6 files, one per category. The files are read in turn, and the sequences and their labels are each stored as a list. Both lists are then split into a training set and a test set through the train_test_split interface of the sklearn library, with training set : test set = 4:1; the test set is used to measure how accurately the model detects known vulnerabilities. The proportion of each category in both data sets matches its proportion in the initial data set, so that the split does not bias the model results. The data sets are tokenized by opcode, and each opcode is converted into its word vector according to the word vector index dictionary. The pad_sequences interface of the sequence module unifies the sequence length to 5000: sequences shorter than 5000 are padded with zero vectors, and opcodes beyond position 5000 are truncated. This finally produces two three-dimensional feature vector matrices of size 128 x 5000 x num, where num is the number of opcode sequences.
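The two preprocessing operations, length unification and a class-preserving 4:1 split, can be sketched in plain Python. This is an illustrative stand-in for `pad_sequences` and `train_test_split`, not a reproduction of them: it works on integer opcode indices rather than word vectors, reserves 0 as the padding value, and pads and truncates at the end of the sequence, all of which are assumptions (the text does not state which end is used, and Keras's `pad_sequences` defaults to the front).

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

MAX_LEN = 5000  # unified sequence length from the description


def pad_or_truncate(seq: Sequence[int], max_len: int = MAX_LEN,
                    pad_value: int = 0) -> List[int]:
    """Cut the sequence to max_len items, then right-pad with pad_value."""
    trimmed = list(seq[:max_len])
    return trimmed + [pad_value] * (max_len - len(trimmed))


def stratified_split(labels: Sequence[int], test_ratio: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with per-class proportions kept."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


short = pad_or_truncate([3, 1, 4], max_len=5)          # padded with zeros
long_ = pad_or_truncate(list(range(10)), max_len=5)    # truncated
labels = [0] * 10 + [1] * 5                            # toy labels, 2 classes
train_idx, test_idx = stratified_split(labels)         # 4:1 split per class
print(short, long_, len(train_idx), len(test_idx))
```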
(4) Feature vector dimension reduction
Because the feature vector matrix produced by data preprocessing is very large, a CNN is built to reduce its dimension before training. Through convolution and pooling operations, the matrix is reduced to a three-dimensional feature vector matrix of size 64 x 1249 x num, where num is the number of opcode sequences. This speeds up training without affecting the accuracy of the model.
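The reduction of the sequence axis from 5000 to 1249 follows from standard convolution and pooling length arithmetic. The text does not state the kernel size, stride or pool size, so the values below (kernel size 3, stride 1, no padding, pool size 4) are assumptions that happen to reproduce 1249; other combinations do too. Under this reading, 64 would be the number of convolution filters.

```python
def conv1d_output_len(length: int, kernel_size: int, stride: int = 1) -> int:
    """Output length of a 1-D convolution without padding."""
    return (length - kernel_size) // stride + 1


def pool1d_output_len(length: int, pool_size: int) -> int:
    """Output length of non-overlapping 1-D pooling (stride == pool_size)."""
    return length // pool_size


SEQ_LEN = 5000  # unified opcode sequence length

# Assumed hyperparameters: kernel size 3, pool size 4 (not given in the text).
after_conv = conv1d_output_len(SEQ_LEN, kernel_size=3)
after_pool = pool1d_output_len(after_conv, pool_size=4)
print(after_conv, after_pool)
```

For example, kernel size 5 with pool size 4 gives 4996 and then 1249 as well, so the reported shape alone does not pin down the configuration.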
(5) Classification model training
The invention selects an LSTM neural network as the classification model, with the number of iterations set to 5 (n_epoch=5), sigmoid as the activation function, sparse_categorical_crossentropy as the loss function, and a batch size of 100 (batch_size=100). The model loss and accuracy are output at the end. The classification model can distinguish 6 categories: the normal category and the 5 known vulnerability categories.
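Sparse categorical cross-entropy takes integer class labels and, for each sample, equals the negative log of the probability the model assigns to the true class. The sketch below computes it by hand for a 6-class output; the probability values are made up for illustration, and index 0 standing for the normal class is an assumption.

```python
import math
from typing import Sequence


def sparse_categorical_crossentropy(probs: Sequence[float], label: int) -> float:
    """Loss for one sample: -log of the probability of the true class."""
    # Clip to avoid log(0) when the model assigns zero probability.
    return -math.log(max(probs[label], 1e-12))


def batch_loss(batch_probs: Sequence[Sequence[float]],
               labels: Sequence[int]) -> float:
    """Mean per-sample loss over a batch."""
    losses = [sparse_categorical_crossentropy(p, y)
              for p, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)


# Made-up model output over 6 classes (index 0 assumed to be "normal").
probs = [0.7, 0.1, 0.05, 0.05, 0.05, 0.05]
loss_if_normal = sparse_categorical_crossentropy(probs, 0)   # confident, correct
loss_if_class1 = sparse_categorical_crossentropy(probs, 1)   # true class got 0.1
print(round(loss_if_normal, 4), round(loss_if_class1, 4))
```

The loss is small when the true class receives high probability and grows quickly as that probability falls, which is what backpropagation minimizes during training.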
(6) Unknown vulnerability determination
The invention sets aside two classes of vulnerability sequences as unknown-vulnerability sequences for testing. The unknown-vulnerability sequences are read and stored as a list, and the pad_sequences interface of the sequence module unifies the sequence length to 5000: sequences shorter than 5000 are padded with zero vectors, and opcodes beyond position 5000 are truncated. This produces two three-dimensional vector matrices of size 128 x 5000 x num, where num is the number of opcode sequences. These are fed into the classification model, which returns the probabilities of the 6 categories. The probabilities of all vulnerability categories, that is, every category except the normal one, are summed to give the probability of an unknown vulnerability. If this probability is greater than the set threshold (threshold=0.5), a new unknown vulnerability has been found; otherwise, none has been found.
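The decision rule can be written down directly. In this minimal sketch the class order (index 0 for the normal category) and the example probability vectors are assumptions made for illustration; only the summation over vulnerability categories and the threshold of 0.5 come from the text.

```python
from typing import Sequence

NORMAL_CLASS = 0   # assumed index of the normal category
THRESHOLD = 0.5    # threshold stated in the description


def is_unknown_vulnerability(probs: Sequence[float],
                             threshold: float = THRESHOLD,
                             normal_class: int = NORMAL_CLASS) -> bool:
    """Sum the probabilities of all non-normal classes and compare to threshold."""
    vulnerability_prob = sum(p for i, p in enumerate(probs) if i != normal_class)
    return vulnerability_prob > threshold


# Made-up outputs of a 6-class model for two sequences.
suspicious = [0.2, 0.3, 0.1, 0.2, 0.1, 0.1]       # vulnerability mass 0.8
benign = [0.9, 0.02, 0.02, 0.02, 0.02, 0.02]      # vulnerability mass 0.1

print(is_unknown_vulnerability(suspicious))
print(is_unknown_vulnerability(benign))
```

When the outputs form a probability distribution, the rule is equivalent to testing whether 1 minus the normal-class probability exceeds the threshold. Raising the threshold makes the detector flag only sequences that resemble known vulnerabilities more strongly.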
The smart contract unknown vulnerability detection method based on the CNN-LSTM multi-classification model provided by the embodiment of the invention supports dynamic detection of unknown smart contract vulnerabilities. On the premise that the opcode sequence of an unknown vulnerability bears some similarity to the opcode sequences of certain known vulnerabilities, the method combines the strengths of dynamic detection and deep learning: it builds a CNN-LSTM deep learning multi-classification model, uses the trained model as the tool for detecting unknown vulnerabilities, and improves the accuracy of unknown vulnerability discrimination through a user-set threshold.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A smart contract unknown vulnerability detection method based on a CNN-LSTM multi-classification model, characterized by comprising the following steps:
S1, instrumenting the Ethereum client Geth;
S2, replaying Ethereum block transactions and obtaining a normal opcode sequence set and a vulnerability opcode sequence set through the instrumented Geth client;
S3, training embedding word vectors with a Word2vec pre-training model to obtain a word vector index dictionary;
S4, converting the replayed opcode sequences into feature vector matrices according to the word vector index dictionary, and reducing their dimension with a CNN;
S5, in the training stage, training the LSTM classification model on the dimension-reduced feature vector matrices;
S6, collecting in real time, through the instrumented Geth client, the opcode sequences to be tested that transactions generate;
S7, in the detection stage, for each opcode sequence to be tested, summing the probability values that the classification model assigns to all vulnerability categories and comparing the sum with a threshold to decide whether an unknown vulnerability is present.
2. The smart contract unknown vulnerability detection method based on the CNN-LSTM multi-classification model according to claim 1, characterized in that: in step S1, instrumentation means inserting into the source code of the Ethereum client Geth code segments, written in Golang, that output transaction data, and the collected transaction data includes:
block parameters such as block number, timestamp, nonce value, root hash and gas value;
transaction information such as the account addresses and transfer amounts involved in the transaction;
the address of the executed smart contract;
account balances;
the opcode sequence formed by assembly opcodes such as PUSH1, MSTORE, CALLDATASIZE and ISZERO together with their operands;
and low-level information of the Ethereum virtual machine, such as memory, storage and stack.
3. The smart contract unknown vulnerability detection method based on the CNN-LSTM multi-classification model according to claim 1, characterized in that: in step S2, replay means re-executing, on a local private chain, transactions that have already been executed on Ethereum; the instrumented Geth client yields the normal opcode sequence set and the vulnerability opcode sequence set, which together form the data set fed to the subsequent models; and the specific flow is as follows:
S201, replaying Ethereum block transactions;
S202, obtaining the normal opcode sequence set and the vulnerability opcode sequence set through the instrumented Geth client.
4. The intelligent contract unknown vulnerability detection method based on the CNN-LSTM multi-classification model as claimed in claim 1, wherein: in step S3, the Word2vec pre-training model is a neural network used in the NLP field to generate embedding word vectors, and in the trained word vector index dictionary each operation code corresponds to one multidimensional word vector; the specific process is as follows:
S301, building a Word2vec pre-training model that adopts the skip-gram algorithm with negative sampling optimization;
S302, inputting the normal operation code sequence set and the vulnerability operation code sequence set into the Word2vec pre-training model, and outputting the word vector index dictionary.
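Steps S301 and S302 can be sketched with a small pure-Python skip-gram trainer that uses negative sampling. This is a teaching sketch under stated assumptions, not the patent's implementation: the function name `train_skipgram`, all hyperparameter values, and the uniform negative sampler are illustrative (the reference Word2vec tool draws negatives from a smoothed unigram distribution instead).

```python
import math
import random
from typing import Dict, List


def train_skipgram(
    sequences: List[List[str]],
    dim: int = 8,
    window: int = 2,
    negatives: int = 3,
    epochs: int = 5,
    lr: float = 0.05,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Train skip-gram vectors with negative sampling; return opcode -> vector."""
    rng = random.Random(seed)
    vocab = sorted({op for seq in sequences for op in seq})
    index = {op: i for i, op in enumerate(vocab)}
    # Input (word) vectors start small and random; output (context) vectors start at zero.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    def sigmoid(x: float) -> float:
        x = max(-10.0, min(10.0, x))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for seq in sequences:
            ids = [index[op] for op in seq]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    positive = ids[ctx_pos]
                    # One positive target (label 1) plus random negatives (label 0).
                    targets = [(positive, 1.0)]
                    for _ in range(negatives):
                        neg = rng.randrange(len(vocab))
                        if neg != positive:
                            targets.append((neg, 0.0))
                    v = w_in[center]
                    grad_center = [0.0] * dim
                    for target, label in targets:
                        u = w_out[target]
                        score = sigmoid(sum(a * b for a, b in zip(v, u)))
                        g = lr * (label - score)
                        for k in range(dim):
                            grad_center[k] += g * u[k]  # uses u before its update
                            u[k] += g * v[k]
                    for k in range(dim):
                        v[k] += grad_center[k]
    return {op: w_in[i] for op, i in index.items()}


corpus = [["PUSH1", "PUSH1", "MSTORE"], ["CALLDATASIZE", "ISZERO", "PUSH1"]]
dictionary = train_skipgram(corpus, dim=4)
print(sorted(dictionary))
```

The returned mapping plays the role of the word vector index dictionary: one fixed-length vector per operation code seen in either sequence set.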
5. The intelligent contract unknown vulnerability detection method based on the CNN-LSTM multi-classification model as claimed in claim 1, wherein the specific flow of step S4 is as follows:
S401, according to the word vector index dictionary, converting each operation code in an operation code sequence into its corresponding word vector, so that each operation code sequence is converted into a multidimensional feature vector matrix;
S402, constructing a single-layer CNN neural network;
S403, inputting all the multidimensional feature vector matrices into the CNN neural network for dimension reduction.
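The flow of S401 to S403 can be illustrated with a dictionary lookup followed by one convolution layer and max pooling. The claim states only that a single-layer CNN reduces the dimension; the kernel width, the ReLU activation, the pooling size, and the toy two-dimensional vectors below are assumptions chosen to keep the example checkable by hand.

```python
from typing import Dict, List

Matrix = List[List[float]]


def embed(sequence: List[str], dictionary: Dict[str, List[float]]) -> Matrix:
    """S401: map each operation code to its word vector (rows = sequence positions)."""
    return [dictionary[op] for op in sequence]


def conv1d_relu(matrix: Matrix, kernels: List[Matrix], bias: List[float]) -> Matrix:
    """Slide each kernel (width x dim) along the sequence axis, then apply ReLU."""
    width = len(kernels[0])
    out: Matrix = []
    for start in range(len(matrix) - width + 1):
        window = matrix[start:start + width]
        row = []
        for kernel, b in zip(kernels, bias):
            s = b + sum(
                x * w
                for x_row, w_row in zip(window, kernel)
                for x, w in zip(x_row, w_row)
            )
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool(matrix: Matrix, size: int) -> Matrix:
    """Non-overlapping max pooling along the sequence axis."""
    pooled: Matrix = []
    for start in range(0, len(matrix) - size + 1, size):
        block = matrix[start:start + size]
        pooled.append([max(column) for column in zip(*block)])
    return pooled


# Toy dictionary and a single all-ones kernel of width 2 (illustrative values).
toy_dictionary = {"PUSH1": [1.0, 0.0], "MSTORE": [0.0, 1.0], "ISZERO": [-1.0, -1.0]}
sequence = ["PUSH1", "PUSH1", "MSTORE", "ISZERO", "PUSH1"]
matrix = embed(sequence, toy_dictionary)
features = max_pool(conv1d_relu(matrix, [[[1.0, 1.0], [1.0, 1.0]]], [0.0]), size=2)
print(features)  # [[2.0], [0.0]]
```

Here a 5 x 2 input matrix is reduced to a 2 x 1 feature matrix, which is the kind of shortened sequence that would then be passed to the LSTM.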
6. The intelligent contract unknown vulnerability detection method based on the CNN-LSTM multi-classification model as claimed in claim 1, wherein: in step S5, the normal operation code sequence set and the vulnerability operation code sequence set form the data set input to the LSTM neural network, and the training process of the LSTM classification model is as follows:
S501, constructing a single-layer LSTM neural network;
S502, inputting the feature vector matrices reduced in dimension by the CNN neural network into the LSTM neural network to obtain a classification model capable of distinguishing the various vulnerability categories.
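To make the structure in S501 and S502 concrete, the sketch below runs the forward pass of a single-layer LSTM followed by a softmax over classes. It is deliberately incomplete: the weights are random and untrained, and the training itself (backpropagation through time) is omitted. The class name `TinyLSTMClassifier`, the layer sizes, and the initialisation range are assumptions for illustration only.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """Single-layer LSTM with a softmax output layer (forward pass only)."""

    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.gates = {name: mat(hidden_dim, input_dim + hidden_dim) for name in "ifog"}
        self.bias = {name: [0.0] * hidden_dim for name in "ifog"}
        self.w_out = mat(num_classes, hidden_dim)

    def _affine(self, name: str, z: List[float]) -> List[float]:
        return [
            sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(self.gates[name], self.bias[name])
        ]

    def forward(self, features: List[List[float]]) -> List[float]:
        """Consume the reduced feature matrix row by row; return class probabilities."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in features:
            z = list(x) + h
            i = [sigmoid(v) for v in self._affine("i", z)]    # input gate
            f = [sigmoid(v) for v in self._affine("f", z)]    # forget gate
            o = [sigmoid(v) for v in self._affine("o", z)]    # output gate
            g = [math.tanh(v) for v in self._affine("g", z)]  # candidate cell state
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        logits = [sum(w * v for w, v in zip(row, h)) for row in self.w_out]
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        return [e / total for e in exps]


model = TinyLSTMClassifier(input_dim=1, hidden_dim=4, num_classes=3)
probs = model.forward([[2.0], [0.0]])
print(len(probs))
```

The softmax output gives one probability per class, which is the quantity that the detection step compares against a threshold.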
7. The intelligent contract unknown vulnerability detection method based on the CNN-LSTM multi-classification model as claimed in claim 1, wherein the operation code sequence to be tested that is collected in real time in step S6 is the operation code sequence generated by a newly executed transaction on Ethereum.
8. The intelligent contract unknown vulnerability detection method based on the CNN-LSTM multi-classification model as claimed in claim 1, wherein the specific flow of step S7 is as follows:
S701, inputting the collected operation code sequence to be tested into the trained CNN-LSTM multi-classification model, obtaining the probability value of each vulnerability category, and summing these values;
S702, comparing the probability sum with a threshold: if the probability sum is greater than the threshold, a new unknown vulnerability is considered discovered; otherwise, no vulnerability is considered discovered.
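The decision rule in S701 and S702 reduces to a sum and a comparison. The sketch below assumes that the model output also contains a class for normal sequences that is excluded from the sum; that reading, the label `"normal"`, the category names, and the threshold value 0.7 are all assumptions for illustration, since the claim does not state them.

```python
from typing import Dict


def detect_unknown_vulnerability(
    class_probs: Dict[str, float],
    threshold: float,
    normal_label: str = "normal",
) -> bool:
    """Return True when the summed vulnerability-category probability exceeds the threshold."""
    vulnerability_sum = sum(
        p for label, p in class_probs.items() if label != normal_label
    )
    return vulnerability_sum > threshold


# Hypothetical model outputs for two transactions.
suspicious = {"normal": 0.2, "category_a": 0.5, "category_b": 0.3}
benign = {"normal": 0.9, "category_a": 0.06, "category_b": 0.04}
print(detect_unknown_vulnerability(suspicious, threshold=0.7))  # True
print(detect_unknown_vulnerability(benign, threshold=0.7))      # False
```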
CN202211228581.5A 2022-10-08 2022-10-08 Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model Pending CN116150757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211228581.5A CN116150757A (en) 2022-10-08 2022-10-08 Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model

Publications (1)

Publication Number Publication Date
CN116150757A (en) 2023-05-23

Family

ID=86349600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211228581.5A Pending CN116150757A (en) 2022-10-08 2022-10-08 Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model

Country Status (1)

Country Link
CN (1) CN116150757A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361816A (en) * 2023-06-01 2023-06-30 江西农业大学 Intelligent contract vulnerability detection method, system, storage medium and equipment
CN116361816B (en) * 2023-06-01 2023-08-11 江西农业大学 Intelligent contract vulnerability detection method, system, storage medium and equipment
CN117574214A (en) * 2024-01-15 2024-02-20 中科链安(北京)科技有限公司 Intelligent contract classification model training method, intelligent contract classification method and device
CN117574214B (en) * 2024-01-15 2024-04-12 中科链安(北京)科技有限公司 Intelligent contract classification model training method, intelligent contract classification method and device

Similar Documents

Publication Publication Date Title
CN109408389B (en) Code defect detection method and device based on deep learning
CN110232280B (en) Software security vulnerability detection method based on tree structure convolutional neural network
CN102339252B (en) Static state detecting system based on XML (Extensible Markup Language) middle model and defect mode matching
CN116150757A (en) Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model
CN112711953A (en) Text multi-label classification method and system based on attention mechanism and GCN
CN109857457B (en) Function level embedding representation method in source code learning in hyperbolic space
CN111124487B (en) Code clone detection method and device and electronic equipment
CN112883714B (en) ABSC task syntactic constraint method based on dependency graph convolution and transfer learning
CN111427775B (en) Method level defect positioning method based on Bert model
CN109886021A (en) A kind of malicious code detecting method based on API overall situation term vector and layered circulation neural network
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN111475820A (en) Binary vulnerability detection method and system based on executable program and storage medium
CN113672931A (en) Software vulnerability automatic detection method and device based on pre-training
CN112364352A (en) Interpretable software vulnerability detection and recommendation method and system
CN115146279A (en) Program vulnerability detection method, terminal device and storage medium
CN114036531A (en) Multi-scale code measurement-based software security vulnerability detection method
CN115526234A (en) Cross-domain model training and log anomaly detection method and device based on transfer learning
CN116340952A (en) Intelligent contract vulnerability detection method based on operation code program dependency graph
CN112035345A (en) Mixed depth defect prediction method based on code segment analysis
CN116702157B (en) Intelligent contract vulnerability detection method based on neural network
CN116975881A (en) LLVM (LLVM) -based vulnerability fine-granularity positioning method
CN116611071A (en) Function-level vulnerability detection method based on multiple modes
CN116361788A (en) Binary software vulnerability prediction method based on machine learning
CN115878498A (en) Key byte extraction method for predicting program behavior based on machine learning
CN116522334A (en) RTL-level hardware Trojan detection method based on graph neural network and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination