CN113312268A - Smart contract code similarity detection method
- Publication number: CN113312268A
- Application number: CN202110695693.0A
- Authority: CN (China)
- Prior art keywords: smart contract, detection method, similarity, similarity detection, source code
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3608—Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/433—Dependency analysis; Data or control flow analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/436—Semantic checking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Stored Programmes (AREA)
Abstract
The invention discloses a smart contract code similarity detection method comprising the following steps: (1) building a smart contract source code library; (2) generating a smart contract abstract syntax tree (AST) for each smart contract; (3) extracting a variable sequence and the dependency relationships between variables based on the AST; (4) building a smart contract semantic graph; (5) inputting the smart contract source code and the semantic graph into a BERT pre-trained model; (6) building a smart contract source code library vector matrix; (7) calculating the similarity between the learned smart contract vector and the source code library vector matrix; and (8) generating a similarity detection report. The method can automatically learn the features of smart contract code and automatically detect smart contract code similarity.
Description
Technical Field
The invention relates to the field of blockchain, and in particular to a smart contract code similarity detection method.
Background
As the number of smart contracts grows, developers often copy existing source code from code libraries to improve development efficiency. Security problems, however, are the most common and varied problems in smart contracts, and the losses they cause are the hardest to control. Copied code is prone to unforeseen bugs that reduce the reliability and security of the program or software. In recent years, various methods and tools for detecting similar smart contract code have appeared, but most existing methods depend on the textual information of the source code and ignore its syntactic structure, so the syntactic and semantic information of the source code is lost and many similar features are missed.
BERT is a pre-trained model proposed by Google AI in October 2018. It provides a model that other tasks can use for transfer learning: depending on the task, it can be fine-tuned or kept fixed and then used as a feature extractor.
Disclosure of Invention
To address the problems in the prior art, the invention provides a smart contract code similarity detection method.
To solve the technical problem, the invention adopts the following technical solution:
A smart contract code similarity detection method comprises the following steps:
S01, constructing a smart contract source code library;
S02, generating a smart contract abstract syntax tree (AST) for each smart contract;
S03, extracting the variable sequence and the dependency relationships between variables based on the AST;
S04, constructing a smart contract semantic graph;
S05, inputting the smart contract source code and the semantic graph into a BERT pre-trained model;
S06, constructing a smart contract source code library vector matrix;
S07, calculating the similarity between the learned smart contract vector and the smart contract source code library vector matrix;
S08, generating a similarity detection report.
Specifically, the smart contracts involved may run on public chains, consortium chains or private chains; different smart contract source code libraries are created for different types of smart contracts, and each library consists of smart contract source code. The smart contract abstract syntax tree can be generated with an open-source smart contract AST tool, or extracted with a smart contract compilation tool. Extracting the variable sequence and the dependency relationships between variables based on the AST means recursively traversing the AST nodes. The smart contract semantic graph is a graph data structure consisting of the variable sequence and the dependency relationships between variables: each variable in the variable sequence is a node of the graph, and each dependency relationship between variables is an edge. The BERT pre-trained model may be the original BERT pre-trained model or another type of BERT pre-trained model. The smart contract source code library vector matrix is built from the dynamic vectors obtained by inputting the smart contract source code and the smart contract semantic graph into the BERT pre-trained model. The similarity may be calculated as a dot product or as cosine similarity.
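The traversal and graph construction described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. The toy AST is a hand-written nested dict that loosely follows the node naming of solc's JSON AST (nodeType, VariableDeclaration, Assignment, Identifier) and omits most real fields. The helper names collect_identifiers and build_semantic_graph are hypothetical, and the rule that an assignment creates an edge from each right-hand variable to the left-hand variable is an assumption, since the text does not define the dependency relationship more precisely.

```python
from typing import Any, List, Tuple


def collect_identifiers(node: Any) -> List[str]:
    """Return the names of all Identifier nodes under `node`, in visiting order."""
    names: List[str] = []
    if isinstance(node, dict):
        if node.get("nodeType") == "Identifier":
            names.append(node["name"])
        for child in node.values():
            names.extend(collect_identifiers(child))
    elif isinstance(node, list):
        for child in node:
            names.extend(collect_identifiers(child))
    return names


def build_semantic_graph(ast: Any) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Recursively traverse the AST and return (V, E).

    V is the variable sequence (nodes) and E the dependency edges.
    Assumption: `x = expr` yields an edge (y, x) for every variable y in expr.
    """
    variables: List[str] = []
    edges: List[Tuple[str, str]] = []

    def add_variable(name: str) -> None:
        if name not in variables:
            variables.append(name)

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            kind = node.get("nodeType")
            if kind == "VariableDeclaration":
                add_variable(node["name"])
            elif kind == "Assignment":
                targets = collect_identifiers(node["leftHandSide"])
                sources = collect_identifiers(node["rightHandSide"])
                for target in targets:
                    add_variable(target)
                    for source in sources:
                        add_variable(source)
                        if (source, target) not in edges:
                            edges.append((source, target))
            for child in node.values():
                visit(child)
        elif isinstance(node, list):
            for child in node:
                visit(child)

    visit(ast)
    return variables, edges


# Hypothetical toy AST for a contract containing `total = balance + fee;`.
toy_ast = {
    "nodeType": "ContractDefinition",
    "nodes": [
        {"nodeType": "VariableDeclaration", "name": "balance"},
        {"nodeType": "VariableDeclaration", "name": "fee"},
        {"nodeType": "VariableDeclaration", "name": "total"},
        {
            "nodeType": "Assignment",
            "leftHandSide": {"nodeType": "Identifier", "name": "total"},
            "rightHandSide": {
                "nodeType": "BinaryOperation",
                "leftExpression": {"nodeType": "Identifier", "name": "balance"},
                "rightExpression": {"nodeType": "Identifier", "name": "fee"},
            },
        },
    ],
}

V, E = build_semantic_graph(toy_ast)
print("V =", V)
print("E =", E)
```

Storing V as an ordered list keeps the variable sequence in first-seen order, and storing E as a list of (source, target) pairs gives a directed graph G = (V, E). A real solc AST also uses Identifier nodes for function names and other symbols, so a practical version would need to filter identifiers that do not refer to variables.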
Drawings
Some specific embodiments of the invention will be described in detail hereinafter, by way of illustration and not limitation, with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. The objects and features of the present invention will become more apparent in view of the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flow chart of the smart contract code similarity detection method according to the present invention.
Fig. 2 is a schematic diagram of the AST-based extraction of the variable sequence and the dependency relationships between variables according to the present invention.
Detailed Description
To illustrate the present invention clearly and make the objects, technical solutions and advantages of its embodiments clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings, so that those skilled in the art can implement them by following the description. The technology of the present invention is described in detail below with reference to the drawings and specific embodiments.
The embodiment of the invention provides a smart contract code similarity detection method, which is implemented as follows:
S01, crawling Ethereum smart contract code to construct an Ethereum smart contract source code library;
S02, compiling the smart contracts in the Ethereum smart contract source code library with the open-source compiler solc to generate a smart contract abstract syntax tree (AST);
S03, determining the variable sequence and the dependency relationships between variables by recursively traversing the AST, where the variable sequence is denoted V = {v1, v2, ..., vk} and the set of dependency edges between variables is E = {e1, e2, ..., el};
S04, constructing a smart contract semantic graph, which is a graph data structure composed of the variable sequence and the dependency relationships between variables: each variable in the variable sequence is a node of the graph, and each dependency relationship between variables is an edge. The graph can be represented as G = (V, E), where V = {v1, v2, ..., vk} and E = {e1, e2, ..., el};
S05, inputting the smart contract source code C = {c1, c2, ..., cn} and the semantic graph G = (V, E) into the BERT pre-trained model to obtain a smart contract source code library vector w1 = {v1, v2, ..., vk} formed of dynamic vectors;
S06, constructing an AST for each smart contract in the Ethereum smart contract library, extracting the variable sequence and the dependency relationships between variables from the AST to generate a smart contract semantic graph, and inputting the smart contract semantic graph into the BERT language model to extract features, finally constructing the smart contract source code library vector matrix W = {w1, w2, ..., wn};
S07, performing steps S02-S05 on the smart contract program under test to obtain a vector R = {r1, r2, ...}. A threshold σ = 0.75 is set: when the calculated cosine similarity is greater than σ, the smart contract program under test is considered similar to a program in the smart contract library; otherwise it is not;
S08, generating a similarity detection report from the calculation result of step S07, the report comprising the specific smart contract similarity data and the similarity calculation result.
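Steps S07 and S08 can be illustrated with the short Python sketch below. It is only a sketch under stated assumptions: the three-dimensional vectors are made-up placeholders standing in for the BERT output, the contract names TokenA and TokenB and the function names cosine_similarity and similarity_report are hypothetical, and the report layout is one possible choice. Only the use of cosine similarity and the threshold σ = 0.75 come from the description.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_report(
    query: List[float], library: Dict[str, List[float]], sigma: float = 0.75
) -> List[dict]:
    """Compare the vector under test with every library vector.

    Returns one row per library contract, sorted by descending similarity.
    A row is flagged as similar when its cosine similarity exceeds sigma.
    """
    rows = []
    for name, vector in library.items():
        score = cosine_similarity(query, vector)
        rows.append({"contract": name, "cosine": score, "similar": score > sigma})
    rows.sort(key=lambda row: row["cosine"], reverse=True)
    return rows


# Placeholder library matrix W (one row per contract) and vector under test R.
W = {
    "TokenA": [1.0, 0.0, 1.0],
    "TokenB": [0.0, 1.0, 0.0],
}
R = [1.0, 0.0, 0.9]

report = similarity_report(R, W)
for row in report:
    print(f'{row["contract"]}: cosine={row["cosine"]:.4f} similar={row["similar"]}')
```

Because cosine similarity normalizes by vector length, it compares direction only. A plain dot product, the other option named in the description, would also depend on vector magnitude, so a fixed threshold such as 0.75 is easier to interpret with cosine similarity.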
Claims (9)
1. A smart contract code similarity detection method, characterized in that the method comprises the following steps:
S01, constructing a smart contract source code library;
S02, generating a smart contract abstract syntax tree (AST) for each smart contract;
S03, extracting the variable sequence and the dependency relationships between variables based on the AST;
S04, constructing a smart contract semantic graph;
S05, inputting the smart contract source code and the semantic graph into a BERT pre-trained model;
S06, constructing a smart contract source code library vector matrix;
S07, calculating the similarity between the learned smart contract vector and the smart contract source code library vector matrix;
S08, generating a similarity detection report.
2. The smart contract code similarity detection method according to claim 1, wherein in constructing the smart contract source code library in step S01, the smart contracts may run on a public chain, a consortium chain or a private chain, different smart contract source code libraries are created for different classes of smart contracts, and each smart contract source code library is composed of smart contract source code.
3. The smart contract code similarity detection method according to claim 1, wherein in step S02 the smart contract abstract syntax tree is generated using an open-source smart contract abstract syntax tree tool, such as the compiler solc, or is extracted using a smart contract compilation tool.
4. The smart contract code similarity detection method according to claim 1, wherein in step S03 the variable sequence and the dependency relationships between variables are extracted based on the AST by recursively traversing the AST nodes.
5. The smart contract code similarity detection method according to claim 1, wherein the smart contract semantic graph constructed in step S04 is a graph data structure composed of the variable sequence and the dependency relationships between variables, each variable in the variable sequence being a node of the smart contract semantic graph and each dependency relationship between variables being an edge of the smart contract semantic graph.
6. The smart contract code similarity detection method according to claim 1, wherein the BERT pre-trained model in step S05 may be the original BERT pre-trained model or another type of BERT pre-trained model.
7. The smart contract code similarity detection method according to claim 1, wherein the smart contract source code library vector matrix constructed in step S06 is composed of the dynamic vectors obtained by passing the smart contract source code and the smart contract semantic graph through the BERT pre-trained model.
8. The smart contract code similarity detection method according to claim 1, wherein the similarity calculation in step S07 may be a dot product or a cosine similarity calculation.
9. The smart contract code similarity detection method according to claim 1, wherein the similarity detection report generated in step S08 includes the specific smart contract similarity data and the similarity calculation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110695693.0A CN113312268A (en) | 2021-07-29 | 2021-07-29 | Smart contract code similarity detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113312268A (en) | 2021-08-27 |
Family
ID=77379844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110695693.0A Pending CN113312268A (en) | Smart contract code similarity detection method | 2021-07-29 | 2021-07-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113312268A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||