CN113312268A - Intelligent contract code similarity detection method - Google Patents

Info

Publication number
CN113312268A
Authority
CN
China
Prior art keywords
intelligent contract
detection method
similarity
similarity detection
source code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110695693.0A
Other languages
Chinese (zh)
Inventor
王荣
蔡维德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202110695693.0A
Publication of CN113312268A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/433 Dependency analysis; Data or control flow analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/436 Semantic checking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a smart contract code similarity detection method comprising the following steps: (1) building a smart contract source code library; (2) generating a smart contract abstract syntax tree (AST) for each smart contract; (3) extracting a variable sequence and the dependency relationships between variables based on the AST; (4) building a smart contract semantic graph; (5) inputting the smart contract source code and the semantic graph into a BERT pre-trained model; (6) building a vector matrix for the smart contract source code library; (7) calculating the similarity between the learned smart contract vector and the source code library vector matrix; and (8) generating a similarity detection report. The method can automatically learn the features of smart contract code and automatically detect similarity between smart contracts.

Description

Intelligent contract code similarity detection method
Technical Field
The invention relates to the field of blockchain, and in particular to a smart contract code similarity detection method.
Background
As the number of smart contracts grows, developers habitually copy existing source code from code libraries to improve development efficiency. However, security problems in smart contracts are both common and varied, and the losses they cause are the hardest to control. Copied code is prone to unforeseen bugs that reduce the reliability and security of the program or software. In recent years, various methods and tools for detecting similar smart contract code have appeared, but most existing methods are based on a grammar model that depends on the textual information of the source code and ignores syntactic structure information, so that syntactic and semantic information of the source code is lost and many similar features are missed.
BERT is a pre-trained model proposed by Google AI Research in October 2018. It provides a model for transfer learning on other tasks: depending on the task, it can be fine-tuned or kept fixed and then used as a feature extractor.
Disclosure of Invention
To address the problems in the prior art, the invention provides a smart contract code similarity detection method.
To solve this technical problem, the invention adopts the following technical solution:
A smart contract code similarity detection method comprises the following steps:
S01, constructing a smart contract source code library;
S02, generating a smart contract abstract syntax tree (AST) for each smart contract;
S03, extracting the variable sequence and the dependency relationships between variables based on the AST;
S04, constructing a smart contract semantic graph;
S05, inputting the smart contract source code and the semantic graph into a BERT pre-trained model;
S06, constructing a smart contract source code library vector matrix;
S07, calculating the similarity between the learned smart contract vector and the smart contract source code library vector matrix;
S08, generating a similarity detection report.
Specifically, the smart contracts involved may run on public chains, consortium chains, or private chains; a separate smart contract source code library is created for each type of smart contract, and each library consists of smart contract source code. The smart contract abstract syntax tree can be generated with an open-source smart contract AST tool, or a smart contract compilation tool can be used to extract the abstract syntax tree. Extracting the variable sequence and the dependency relationships between variables based on the AST means recursively traversing the AST nodes to collect the variable sequence and the dependencies between variables. The smart contract semantic graph is a graph data structure composed of the variable sequence and the dependency relationships between variables: each variable in the variable sequence is a node of the graph, and each dependency relationship between variables is an edge. The BERT pre-trained model may be the original BERT pre-trained model or another type of BERT pre-trained model. The smart contract source code library vector matrix is built from the dynamic vectors obtained by inputting the smart contract source code and the smart contract semantic graph into the BERT pre-trained model. The similarity calculation may be a dot product or a cosine similarity calculation.
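The recursive traversal and graph construction described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the patent does not specify an AST node format, so the nested-dictionary layout used here (`nodeType`, `name`, `leftHandSide`, `rightHandSide`) is a simplified, hypothetical one loosely modelled on compiler JSON output, and treating every assignment as a dependency of the assigned variable on the variables read is likewise an assumption. The function names `iter_identifiers` and `build_semantic_graph` are invented for this example.

```python
from typing import Any, Iterator, List, Tuple


def iter_identifiers(node: Any) -> Iterator[str]:
    """Yield every identifier name in a (sub)tree, depth-first."""
    if isinstance(node, dict):
        if node.get("nodeType") == "Identifier":
            yield node["name"]
        for child in node.values():
            yield from iter_identifiers(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_identifiers(child)


def build_semantic_graph(ast: Any) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (V, E): variables in first-seen order and edges (target, source)."""
    variables: List[str] = []
    edges: List[Tuple[str, str]] = []

    def add_variable(name: str) -> None:
        if name not in variables:
            variables.append(name)

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            kind = node.get("nodeType")
            if kind in ("VariableDeclaration", "Identifier"):
                add_variable(node["name"])
            if kind == "Assignment":
                # Assumption: the assigned variable depends on every variable read.
                for target in iter_identifiers(node["leftHandSide"]):
                    for source in iter_identifiers(node["rightHandSide"]):
                        if (target, source) not in edges:
                            edges.append((target, source))
            for child in node.values():
                visit(child)
        elif isinstance(node, list):
            for child in node:
                visit(child)

    visit(ast)
    return variables, edges


# Hypothetical toy AST for: fee = amount * rate; total = balance - fee;
toy_ast = {
    "nodeType": "FunctionDefinition",
    "name": "transfer",
    "body": [
        {"nodeType": "VariableDeclaration", "name": "fee"},
        {"nodeType": "Assignment",
         "leftHandSide": {"nodeType": "Identifier", "name": "fee"},
         "rightHandSide": {"nodeType": "BinaryOperation",
                           "left": {"nodeType": "Identifier", "name": "amount"},
                           "right": {"nodeType": "Identifier", "name": "rate"}}},
        {"nodeType": "Assignment",
         "leftHandSide": {"nodeType": "Identifier", "name": "total"},
         "rightHandSide": {"nodeType": "BinaryOperation",
                           "left": {"nodeType": "Identifier", "name": "balance"},
                           "right": {"nodeType": "Identifier", "name": "fee"}}},
    ],
}

V, E = build_semantic_graph(toy_ast)
print("V =", V)
print("E =", E)
```

In this sketch the nodes V are the variables in first-seen order and each edge in E points from an assigned variable to a variable it was computed from; a real AST would need additional node kinds (function calls, state variables, returns) handled in the same recursive way.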
Drawings
Some specific embodiments of the invention will be described in detail hereinafter, by way of illustration and not limitation, with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. The objects and features of the present invention will become more apparent in view of the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flow chart of the smart contract code similarity detection method according to the present invention.
Fig. 2 is a schematic diagram of extracting the variable sequence and the dependency relationships between variables based on the AST according to the present invention.
Detailed Description
In order to clearly illustrate the present invention and make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, so that those skilled in the art can implement the technical solutions in reference to the description text. The technology of the present invention will be described in detail below with reference to the accompanying drawings in conjunction with specific embodiments.
An embodiment of the invention provides a smart contract code similarity detection method, implemented as follows:
S01, crawling Ethereum smart contract code to construct an Ethereum smart contract source code library;
S02, compiling the smart contracts in the Ethereum smart contract source code library with the open-source compiler solc to generate a smart contract abstract syntax tree (AST);
S03, determining the variable sequence and the dependency relationships between variables by recursively traversing the AST, where the variable sequence is denoted $V = \{v_1, v_2, \ldots, v_k\}$ and the set of dependency edges between variables is denoted $E = \{e_1, e_2, \ldots, e_l\}$;
S04, constructing a smart contract semantic graph, which is a graph data structure composed of the variable sequence and the dependency relationships between variables: each variable in the variable sequence is a node of the graph, and each dependency relationship between variables is an edge. The smart contract semantic graph can be written as $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_k\}$ and $E = \{e_1, e_2, \ldots, e_l\}$;
S05, inputting the smart contract source code $C = \{c_1, c_2, \ldots, c_n\}$ and the semantic graph $G = (V, E)$ into the BERT pre-trained model to obtain a smart contract source code library vector $w_1 = \{v_1, v_2, \ldots, v_k\}$ formed by dynamic vectors;
S06, constructing an AST for each smart contract in the Ethereum smart contract library, extracting the variable sequence and the dependency relationships between variables from the AST, and generating a smart contract semantic graph. The smart contract semantic graph is input into the BERT language model to extract features. Finally, the smart contract source code library vector matrix $W = \{w_1, w_2, \ldots, w_n\}$ is constructed;
S07, performing steps S02-S05 on the smart contract program under test to obtain a vector $R = \{r_1, r_2, \ldots\}$, and calculating the cosine similarity between $R$ and the vectors of the library vector matrix $W$. A threshold $\sigma = 0.75$ is set: when the calculated cosine similarity is greater than $\sigma$, the smart contract program under test is considered similar to a program in the smart contract library; otherwise it is considered not similar;
S08, generating a similarity detection report from the calculation result of step S07, the report containing the specific smart contract similarity data and the similarity calculation result.
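The thresholded comparison of steps S07 and S08 can be sketched in Python as follows. This is a sketch under assumptions, not the patented implementation: the vectors are tiny hand-written stand-ins for the vectors a BERT model would produce, the contract names are invented, and the report layout (a list of dictionaries sorted by score) is one possible reading of "similarity detection report". Only the cosine similarity measure and the threshold value 0.75 come from the text.

```python
import math
from typing import Dict, List, Sequence

SIGMA = 0.75  # threshold stated in step S07


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_report(r: Sequence[float],
                      library: Dict[str, Sequence[float]],
                      sigma: float = SIGMA) -> List[dict]:
    """Compare the vector under test with every library vector, best match first."""
    rows = []
    for name, w in library.items():
        score = cosine_similarity(r, w)
        rows.append({"contract": name, "cosine": score, "similar": score > sigma})
    rows.sort(key=lambda row: row["cosine"], reverse=True)
    return rows


# Hypothetical library matrix W: one illustrative vector per contract.
library = {
    "TokenA.sol": [1.0, 0.0, 1.0],
    "TokenB.sol": [0.0, 1.0, 0.0],
    "TokenC.sol": [2.0, 0.1, 2.0],
}

# Hypothetical vector R for the contract under test.
report = similarity_report([1.0, 0.0, 1.0], library)
for row in report:
    flag = "similar" if row["similar"] else "not similar"
    print(f'{row["contract"]}: cosine={row["cosine"]:.4f} ({flag})')
```

Cosine similarity is insensitive to vector length, which is why the scaled vector for the hypothetical `TokenC.sol` still scores close to 1; a plain dot product, the other option the description allows, would not be bounded and would need a different threshold.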

Claims (9)

1. A smart contract code similarity detection method, characterized in that the method comprises the following steps:
S01, constructing a smart contract source code library;
S02, generating a smart contract abstract syntax tree (AST) for each smart contract;
S03, extracting the variable sequence and the dependency relationships between variables based on the AST;
S04, constructing a smart contract semantic graph;
S05, inputting the smart contract source code and the semantic graph into a BERT pre-trained model;
S06, constructing a smart contract source code library vector matrix;
S07, calculating the similarity between the learned smart contract vector and the smart contract source code library vector matrix;
S08, generating a similarity detection report.
2. The smart contract code similarity detection method according to claim 1, wherein in constructing the smart contract source code library in step S01, the smart contracts may run on a public chain, a consortium chain, or a private chain, different smart contract source code libraries are created for different classes of smart contracts, and each smart contract source code library consists of smart contract source code.
3. The smart contract code similarity detection method according to claim 1, wherein in generating the smart contract abstract syntax tree in step S02, an open-source smart contract abstract syntax tree tool such as the compiler tool solc is used, or a smart contract compilation tool is used to extract the abstract syntax tree.
4. The smart contract code similarity detection method according to claim 1, wherein in step S03, the variable sequence and the dependency relationships between variables are extracted based on the AST by recursively traversing the AST nodes.
5. The smart contract code similarity detection method according to claim 1, wherein in constructing the smart contract semantic graph in step S04, the smart contract semantic graph is a graph data structure composed of the variable sequence and the dependency relationships between variables, each variable in the variable sequence is a node in the smart contract semantic graph, and the dependency relationships between variables are edges in the smart contract semantic graph.
6. The smart contract code similarity detection method according to claim 1, wherein the BERT pre-trained model in step S05 may be the original BERT pre-trained model or another type of BERT pre-trained model.
7. The smart contract code similarity detection method according to claim 1, wherein the smart contract source code library vector matrix constructed in step S06 is composed of the dynamic vectors obtained by passing the smart contract source code and the smart contract semantic graph through the BERT pre-trained model.
8. The smart contract code similarity detection method according to claim 1, wherein the similarity calculation in step S07 may be a dot product or a cosine similarity calculation.
9. The smart contract code similarity detection method according to claim 1, wherein the similarity detection report generated in step S08 includes the specific smart contract similarity data and the similarity calculation result.
CN202110695693.0A 2021-07-29 2021-07-29 Intelligent contract code similarity detection method Pending CN113312268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110695693.0A CN113312268A (en) 2021-07-29 2021-07-29 Intelligent contract code similarity detection method

Publications (1)

Publication Number Publication Date
CN113312268A true CN113312268A (en) 2021-08-27

Family

ID=77379844

Country Status (1)

Country Link
CN (1) CN113312268A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112286575A (en) * 2020-10-20 2021-01-29 杭州云象网络技术有限公司 Intelligent contract similarity detection method and system based on graph matching model
US11036614B1 (en) * 2020-08-12 2021-06-15 Peking University Data control-oriented smart contract static analysis method and system
CN113032001A (en) * 2021-03-26 2021-06-25 中山大学 Intelligent contract classification method and device
WO2021139271A1 (en) * 2020-06-30 2021-07-15 平安科技(深圳)有限公司 Fm model based method and apparatus for predicting medical hot spot, and computer device
CN113157385A (en) * 2021-02-08 2021-07-23 北京航空航天大学 Intelligent contract vulnerability automatic detection method based on graph neural network
CN113177107A (en) * 2021-05-25 2021-07-27 浙江工商大学 Intelligent contract similarity detection method based on syntax tree matching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JUAN HE,RONG WANG,WEI-TEK TSAI: "SDFS: A Scalable Data Feed Service for Smart Contracts", 《2019 IEEE 10TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113760941A (en) * 2021-09-10 2021-12-07 北京航空航天大学 Intelligent contract program function retrieval method
CN113760941B (en) * 2021-09-10 2024-01-05 北京航空航天大学 Intelligent contract program function retrieval method
CN113904844A (en) * 2021-10-08 2022-01-07 浙江工商大学 Intelligent contract vulnerability detection method based on cross-modal teacher-student network
CN113904844B (en) * 2021-10-08 2023-09-12 浙江工商大学 Intelligent contract vulnerability detection method based on cross-mode teacher-student network
CN113672209A (en) * 2021-10-22 2021-11-19 环球数科集团有限公司 System for automatically generating intelligent contract according to distribution protocol
CN113672209B (en) * 2021-10-22 2021-12-21 环球数科集团有限公司 System for automatically generating intelligent contract according to distribution protocol
CN115129364A (en) * 2022-07-05 2022-09-30 四川大学 Fingerprint identity recognition method and system based on abstract syntax tree and graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination