CN113177107A - Intelligent contract similarity detection method based on syntax tree matching - Google Patents
- Publication number
- CN113177107A (application number CN202110569353.3A)
- Authority
- CN
- China
- Prior art keywords
- syntax tree
- similarity
- intelligent contract
- vector
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an intelligent contract similarity detection method based on syntax tree matching. The method captures the syntactic information of an intelligent contract with an abstract syntax tree extraction tool, obtains the semantic information of each node in a syntax tree through the attention mechanism of an encoder, and extracts a high-level semantic feature vector for each syntax tree of the contract. The feature vectors are used as the input of a similarity calculation function to obtain similarity values between different syntax trees, and the similarity detection result for two pieces of intelligent contract source code is obtained by averaging. Compared with traditional code clone detection methods, the method achieves a more accurate detection result, provides a similarity explanation that is accurate to the code line, and has good generality and practical value.
Description
Technical Field
The invention belongs to the technical field of program similarity detection, and particularly relates to an intelligent contract similarity detection method based on syntax tree matching.
Background
In recent years, to improve software development efficiency, more and more developers have adopted code reuse techniques, such as reusing existing program code, general software frameworks, and common design patterns. However, blindly reusing existing program code can cause many problems, such as additional project cost, exposure of the software to vulnerabilities, and infringement of software copyright.
Code similarity detection, also called code clone detection, is one of the effective techniques for checking code reuse; it determines whether identical or similar code fragments exist in two programs. According to the degree of similarity, code similarity is generally divided into four levels: (1) identical program code; (2) code that is fully reused except for changes to whitespace, comments, and variable or function names; (3) code that is slightly modified on the basis of type (2); (4) code that is implemented differently but is semantically or functionally the same. Traditional detection methods usually consider similarity only at the syntactic level and therefore reach only types (1) and (2), whereas existing similarity detection methods that combine lexical, syntactic, and semantic analysis can detect similarity at types (3) and (4).
Intelligent contract similarity detection is code clone detection applied to blockchain intelligent contracts (commonly called smart contracts). An intelligent contract is program code written in a Turing-complete language and is irreversible and immutable, that is, the contract cannot be modified or updated once it has been deployed. If an intelligent contract has a vulnerability, derivative contracts cloned from it may carry the corresponding vulnerability. It is therefore necessary to study similarity detection methods for intelligent contract code, which can effectively limit the propagation of contract vulnerabilities and thereby improve the reliability and security of intelligent contracts.
A method based on syntax tree matching can effectively address intelligent contract similarity detection. It converts intelligent contract code into abstract syntax trees (ASTs), splits each AST into several syntax trees to obtain the corresponding syntax tree sequence, and computes the contract similarity matrix of two syntax tree sequences with a similarity detection algorithm. Such a method can detect intelligent contract similarity efficiently and accurately, can give a similarity explanation that is accurate to the code line, and has reference value for related work.
Disclosure of Invention
In view of the above, the present invention provides an intelligent contract similarity detection method based on syntax tree matching, which can implement intelligent contract source code similarity detection at semantic level.
An intelligent contract similarity detection method based on syntax tree matching comprises the following steps:
(1) constructing an abstract syntax tree: taking Ethereum intelligent contracts as the research object, an abstract syntax tree is extracted from the intelligent contract source code with a syntax tree extraction tool;
(2) constructing a syntax tree sequence: taking intelligent contract code segments $Z_1$ and $Z_2$ as the contract clone pair to be tested, the abstract syntax trees $F_1$ and $F_2$ corresponding to $Z_1$ and $Z_2$ are obtained with the syntax tree extraction tool; $F_1$ and $F_2$ are split by statement and traversed in preorder to obtain the syntax tree sequences $S_1$ and $S_2$;
(3) syntax tree feature extraction: a syntax tree encoder based on the Attention mechanism is constructed and used to extract the feature vector of each syntax tree in $S_1$ and $S_2$, giving the feature vector set $\{p_1, \dots, p_m\}$ of $S_1$ and the feature vector set $\{p_1, \dots, p_k\}$ of $S_2$, where every vector lies in $\mathbb{R}^n$, $n$ denotes the dimension of the vectors, and $m$ and $k$ denote the number of syntax trees in $S_1$ and $S_2$ respectively;
(4) similarity calculation: the Pearson similarity algorithm is used to compute the similarity between each vector in the feature vector set of $S_1$ and each vector in the feature vector set of $S_2$, giving the contract similarity matrix $T_{m \times k}$, where the element in row $i$ and column $j$ of $T_{m \times k}$ is the similarity between the $i$-th syntax tree of $S_1$ and the $j$-th syntax tree of $S_2$;
(5) contract similarity detection: thresholds $a_1$ and $a_2$ are set; the elements of $T_{m \times k}$ higher than $a_1$ are kept unchanged and the elements lower than $a_1$ are set to zero; the average $M$ of all non-zero elements of the matrix is computed and taken as the similarity of intelligent contract code segments $Z_1$ and $Z_2$; $M$ is then compared with $a_2$ to decide whether $Z_1$ and $Z_2$ are similar;
(6) interpretability analysis: if the element in row $i$ and column $j$ of $T_{m \times k}$ is the maximum element of the matrix, the $i$-th syntax tree of $S_1$ and the $j$-th syntax tree of $S_2$ have the highest similarity, so the specific similar code lines in contract code segments $Z_1$ and $Z_2$ can be located.
Further, the specific implementation manner of step (2) is as follows: first, the intelligent contract code segments $Z_1$ and $Z_2$ are converted into the abstract syntax trees $F_1$ and $F_2$ with the syntax tree extraction tool; then $F_1$ and $F_2$ are split at the statement level, and the syntax tree sequences $S_1 = \{f_i \in F_1 \mid f_1, \dots, f_m\}$ and $S_2 = \{f_j \in F_2 \mid f_1, \dots, f_k\}$ are obtained by preorder traversal, where each syntax tree corresponds to one statement in the intelligent contract, i.e. $Z_1$ and $Z_2$ contain $m$ statements and $k$ statements respectively, $i$ and $j$ are natural numbers, $1 \le i \le m$, and $1 \le j \le k$;
in particular, for nested statements a set of independent separator nodes $N_s = \{\text{block}, \text{body}\}$ needs to be defined, where block is used to split the header and body of a nested statement and body is used for method declarations; the syntax tree rooted at node $s$ consists of $s$ and all of its descendant nodes $D(s)$; if there is a path between nodes $s$ and $d$ that passes through $n$, where $d \in D(s)$ and $n \in N_s$, then node $d$ belongs to a statement contained in the body of $s$.
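The statement-level splitting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the node labels in `STATEMENTS`, and the toy tree are hypothetical and do not come from the patent or from any particular extraction tool; only the separator labels block and body are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List

# Separator labels named in the text; statement labels are hypothetical.
SEPARATORS = {"block", "body"}
STATEMENTS = {"FunctionDefinition", "VariableDeclaration", "IfStatement", "Return"}


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def split_into_statement_trees(root: Node) -> List[Node]:
    """Preorder walk that emits one syntax tree per statement node.

    Each emitted tree keeps the statement's own nodes but stops at
    separator nodes, so statements nested in a body become separate trees.
    """
    sequence: List[Node] = []

    def copy_without_body(node: Node) -> Node:
        kept = [copy_without_body(c) for c in node.children
                if c.label not in SEPARATORS]
        return Node(node.label, kept)

    def visit(node: Node) -> None:
        if node.label in STATEMENTS:
            sequence.append(copy_without_body(node))
        for child in node.children:  # preorder: parent first, then children
            visit(child)

    visit(root)
    return sequence


# Toy AST for a function containing a declaration and an if statement.
ast = Node("FunctionDefinition", [
    Node("Name"),
    Node("body", [
        Node("VariableDeclaration", [Node("Type"), Node("Name")]),
        Node("IfStatement", [
            Node("Condition"),
            Node("block", [Node("Return")]),
        ]),
    ]),
])

trees = split_into_statement_trees(ast)
print([t.label for t in trees])
```

In this sketch the function header and the statements inside its body end up as separate entries of the sequence, which is the property the comparison at statement level relies on.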
Further, the specific implementation manner of the step (3) is as follows:
first, all nodes of the syntax tree to be encoded (for example, if the syntax tree represents a statement that defines a variable, the nodes may include the type definition of that variable) are converted into corresponding vector representations with the word2vec tool, giving a vector sequence $X = \{x_1, \dots, x_m\}$; $X$ is taken as the input of the syntax tree encoder, and $m$ is the number of nodes in the syntax tree;
then, a syntax tree encoder based on Attention is constructed to learn the semantic relations among the vectors in the sequence $X$, and the semantic vector sequence $Y = \{y_1, \dots, y_m\}$ corresponding to the input vector sequence $X$ is obtained through multi-layer iterative learning;
finally, all vectors in the sequence $Y$ are input into a convolution pooling layer to generate the feature vector corresponding to the syntax tree.
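The data flow of the encoder (node vectors, several attention layers, pooling to one feature vector) can be illustrated with a deliberately simplified sketch. It is not the patent's encoder: the node vectors are made-up numbers standing in for word2vec output, the attention layer is plain scaled dot-product self-attention with no learned projections, and simple max pooling stands in for the convolution pooling layer. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X: List[Vector]) -> List[Vector]:
    """One layer of scaled dot-product self-attention (identity projections)."""
    d = len(X[0])
    Y = []
    for xi in X:
        scores = [sum(a * b for a, b in zip(xi, xj)) / math.sqrt(d) for xj in X]
        weights = softmax(scores)
        # Each output vector is a weighted mix of all node vectors.
        Y.append([sum(w * xj[t] for w, xj in zip(weights, X)) for t in range(d)])
    return Y


def max_pool(Y: List[Vector]) -> Vector:
    """Collapse the node vectors into one tree vector, dimension by dimension."""
    return [max(column) for column in zip(*Y)]


def encode_tree(X: List[Vector], layers: int = 2) -> Vector:
    """Apply several attention layers, then pool to a single feature vector."""
    Y = X
    for _ in range(layers):
        Y = self_attention(Y)
    return max_pool(Y)


# Three hypothetical 2-dimensional node vectors standing in for word2vec output.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p = encode_tree(X)
print(len(p))
```

The point of the sketch is the shape of the computation: a sequence of m node vectors goes in, and a single fixed-length vector per syntax tree comes out, whatever the number of nodes.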
Further, the specific implementation manner of step (4) is as follows: each vector $p_i$ in the feature vector set of $S_1$ and each vector $p_j$ in the feature vector set of $S_2$ are substituted into the following similarity calculation function to obtain the similarity value $t_{ij}$ of $p_i$ and $p_j$, i.e. $\mathrm{sim}(p_i, p_j)$:
$$t_{ij} = \mathrm{sim}(p_i, p_j) = \frac{\sum_{t=1}^{n} (p_{it} - \bar{p}_i)(p_{jt} - \bar{p}_j)}{\sqrt{\sum_{t=1}^{n} (p_{it} - \bar{p}_i)^2}\,\sqrt{\sum_{t=1}^{n} (p_{jt} - \bar{p}_j)^2}}$$
Wherein: $p_{it}$ denotes the value of the $t$-th element of vector $p_i$, $\bar{p}_i$ denotes the average value of the elements of $p_i$, $p_{jt}$ denotes the value of the $t$-th element of vector $p_j$, $\bar{p}_j$ denotes the average value of the elements of $p_j$, and $t_{ij}$ is the element in row $i$ and column $j$ of the matrix $T_{m \times k}$, i.e. the similarity between the $i$-th syntax tree in the sequence $S_1$ corresponding to code segment $Z_1$ and the $j$-th syntax tree in the sequence $S_2$ corresponding to code segment $Z_2$.
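The Pearson similarity and the resulting contract similarity matrix can be sketched as follows. The function names are hypothetical, and returning 0.0 for a constant vector (where the Pearson coefficient is undefined) is an assumption of this sketch, not something the patent specifies.

```python
import math
from typing import List


def pearson(p: List[float], q: List[float]) -> float:
    """Pearson correlation of two equal-length vectors, in [-1, 1]."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    norm_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    norm_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / (norm_p * norm_q)


def similarity_matrix(P1: List[List[float]], P2: List[List[float]]) -> List[List[float]]:
    """Row i, column j holds the similarity of P1[i] and P2[j]."""
    return [[pearson(p, q) for q in P2] for p in P1]


# Two small hypothetical feature vector sets (m = 2, k = 3, n = 3).
P1 = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
P2 = [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0], [1.0, 1.0, 1.0]]
T = similarity_matrix(P1, P2)
print(len(T), len(T[0]))
```

Note that Pearson values can be negative; with a positive threshold $a_1$ such entries are simply filtered out in the next step.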
Further, the specific implementation manner of the step (5) is as follows:
first, thresholds $a_1$ and $a_2$ are set, where $a_1$ is used to filter out the elements of the contract similarity matrix with lower similarity values and $a_2$ is used to decide whether the two code segments are similar;
then, the elements of the matrix $T_{m \times k}$ higher than $a_1$ are kept unchanged and the elements lower than $a_1$ are set to zero, and the average $M$ of all non-zero elements of the matrix is computed; $M$ is the similarity of intelligent contract code segments $Z_1$ and $Z_2$;
finally, $M$ is compared with $a_2$ to judge the similarity of $Z_1$ and $Z_2$: if $M \ge a_2$, $Z_1$ and $Z_2$ are similar; otherwise $Z_1$ and $Z_2$ are not similar.
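The filtering and averaging step can be sketched as below. The matrix and threshold values are hypothetical. Two details are assumptions of this sketch because the text does not settle them: an element exactly equal to $a_1$ is kept, and $M$ is taken as 0.0 when no element survives the filter.

```python
from typing import List, Tuple


def filter_matrix(T: List[List[float]], a1: float) -> List[List[float]]:
    """Keep elements at or above a1 and set the rest to zero."""
    return [[v if v >= a1 else 0.0 for v in row] for row in T]


def contract_similarity(T: List[List[float]], a1: float, a2: float) -> Tuple[float, bool]:
    """Return the mean M of the surviving elements and the verdict M >= a2."""
    filtered = filter_matrix(T, a1)
    nonzero = [v for row in filtered for v in row if v != 0.0]
    M = sum(nonzero) / len(nonzero) if nonzero else 0.0  # assumption for the empty case
    return M, M >= a2


# Hypothetical 2 x 2 similarity matrix and thresholds.
T = [[0.9, 0.2],
     [0.1, 0.7]]
M, similar = contract_similarity(T, a1=0.5, a2=0.75)
print(round(M, 2), similar)
```

Averaging only the surviving elements means that M reflects how strongly the matched statement pairs agree, not how many statement pairs were matched.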
Further, the specific implementation manner of step (6) is as follows: first, by comparing the values of the elements of the matrix $T_{m \times k}$, the elements with higher values can be located and their positions in the matrix obtained; specifically, the value in row $i$ and column $j$ of $T_{m \times k}$ represents the similarity between the $i$-th syntax tree of $S_1$ and the $j$-th syntax tree of $S_2$; if this similarity value is larger than a set threshold, the $i$-th statement of intelligent contract code segment $Z_1$ and the $j$-th statement of $Z_2$ are highly similar, so specific code lines in the intelligent contracts can be located, which gives the intelligent contract similarity detection its interpretability.
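Locating the highly similar statement pairs amounts to scanning the matrix for entries above the threshold. A minimal sketch, with a hypothetical function name and made-up values, reporting 1-based indices to match the statement numbering used in the text:

```python
from typing import List, Tuple


def similar_statement_pairs(T: List[List[float]],
                            threshold: float) -> List[Tuple[int, int, float]]:
    """Return (i, j, value) for entries above the threshold, best match first.

    i and j are 1-based: statement i of the first contract and statement j
    of the second contract.
    """
    pairs = [(i + 1, j + 1, v)
             for i, row in enumerate(T)
             for j, v in enumerate(row)
             if v > threshold]
    return sorted(pairs, key=lambda item: item[2], reverse=True)


# Hypothetical 2 x 2 similarity matrix.
T = [[1.0, 0.3],
     [0.2, 0.85]]
print(similar_statement_pairs(T, threshold=0.8))
```

Mapping a statement index back to a source line number additionally needs the position information recorded by the syntax tree extraction tool, which this sketch does not model.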
The intelligent contract similarity detection method based on syntax tree matching effectively addresses similarity detection for Ethereum intelligent contracts. Compared with traditional code clone detection methods, the method of the invention achieves a more accurate detection result, provides a similarity explanation accurate to the code line, and has good generality and practical value. Its main beneficial technical effects and innovations lie in the following four aspects:
1. The intelligent contract syntax tree construction method of the invention captures the syntactic information of an intelligent contract with an abstract syntax tree extraction tool and subdivides that information at the statement level, so that the similarity of two code segments can be compared more accurately.
2. The syntax tree encoder based on the Attention mechanism can extract high-level semantic vector representations from the contract syntax trees, which improves the accuracy and efficiency of similarity detection.
3. The invention quantifies the similarity of different contract code lines, which makes the intelligent contract similarity detection interpretable and gives it reliable reference value.
4. The intelligent contract similarity detection method based on syntax tree matching has good extensibility and reference value.
Drawings
Fig. 1 is a schematic flow chart of an intelligent contract similarity detection method based on syntax tree matching according to the present invention.
FIG. 2 is a flow diagram illustrating splitting an abstract syntax tree into syntax trees according to the present invention.
FIG. 3 is a schematic diagram of the coding of an Attention-based encoder according to the present invention.
Fig. 4 is a simulation diagram illustrating detection of similarity of intelligent contracts according to an embodiment of the present invention.
Detailed Description
In order to more specifically describe the present invention, the following detailed description is provided for the technical solution of the present invention with reference to the accompanying drawings and the specific embodiments.
The method captures the syntactic information of an intelligent contract with an abstract syntax tree extraction tool, obtains the semantic information of each node in a syntax tree through the attention mechanism of an encoder, and extracts a high-level semantic feature vector for each syntax tree of the contract; the feature vectors are used as the input of a similarity calculation function to obtain similarity values between different syntax trees, and the similarity detection result for two pieces of intelligent contract source code is obtained by averaging. The overall flow is shown in FIG. 1.
As shown in FIG. 2, the method of the invention for splitting an intelligent contract abstract syntax tree (AST) into syntax trees can be summarized as follows: the AST is split by statement and traversed in preorder to obtain a syntax tree sequence $S$, i.e. each syntax tree corresponds to one statement in the source code. For nested statements, a set of independent separator nodes $N_s = \{\text{block}, \text{body}\}$ is defined, where block is used to split the header and body of a nested statement (e.g., the nesting relationship between syntax tree $f_4$ and syntax tree $f_5$) and body is used for method declarations. A syntax tree is defined as follows: the syntax tree rooted at $s$ consists of $s$ and all of its descendant nodes $D(s)$, where $s \in S$. A descendant node is defined as follows: if there is a path between $s$ and $d$ that passes through $n$, where $d \in D(s)$ and $n \in N_s$, then node $d$ belongs to a statement contained in the body of $s$.
Taking the FunStatement syntax tree in FIG. 2 as an example, the syntax tree encoder based on the Attention mechanism works as follows: (1) all nodes of the syntax tree are converted into initial vector representations with the word2vec tool, giving the sequence $X = \{x_1, \dots, x_m\}$, where $x_i$ is the vector representation of a node; (2) the initial vectors $X$ of the syntax tree are input into the Attention network, which uses the Attention mechanism to learn the semantic relationships among the vectors and thereby extracts the semantic relationships among the nodes of the syntax tree; (3) step (2) is iterated several times and the learning result is normalized by a Softmax layer, giving the semantic vector representation $Y = \{y_1, \dots, y_m\}$ corresponding to the initial vectors, where $y_i$ corresponds to $x_i$ in order; $Y$ contains not only syntactic information but also semantic information. The semantic vector referred to in this invention is defined as follows: because the code corresponding to a syntax tree node may have different meanings in different code lines, the initial vector of the node is converted into a semantic vector that takes its context of use into account; for example, public appears in both line 3 and line 4 of the code segment in FIG. 2, where public in line 3 denotes a variable type and public in line 4 denotes a function type; (4) $Y$ is converted into the final syntax tree vector $p_i$ by a convolution pooling layer.
The following embodiment takes the intelligent contract similarity detection shown in fig. 4 as an example, and the specific detection flow is as follows:
(1) First, intelligent contracts A and B are converted into the abstract syntax trees $F_1$ and $F_2$ respectively with the syntax tree extraction tool.
(2) As shown in FIG. 2, $F_1$ and $F_2$ are split by statement, and the syntax tree sequences $S_1 = \{f_i \in F_1 \mid f_1, \dots, f_8\}$ and $S_2 = \{f_j \in F_2 \mid f_1, \dots, f_{10}\}$ are obtained by preorder traversal. In this example, contracts A and B contain 8 statements and 10 statements respectively, so $S_1$ contains 8 syntax trees and $S_2$ contains 10 syntax trees.
(3) As shown in FIG. 3, the syntax tree encoder based on the Attention mechanism generates the feature vector of each syntax tree in $S_1$ and $S_2$, giving the feature vector set of $S_1$ and the feature vector set of $S_2$ (all feature vectors in this example have dimension 64).
(4) Each vector $p_i$ in the feature vector set of $S_1$ and each vector $p_j$ in the feature vector set of $S_2$ are substituted in turn into the similarity calculation function to obtain the similarity value $t_{ij}$ of $p_i$ and $p_j$, giving the contract similarity matrix $T_{8 \times 10}$; a threshold $a_1$ is set, and the elements of $T_{8 \times 10}$ lower than $a_1$ are filtered out by element-wise comparison. The specific implementation process is as follows:
4.1 The Pearson similarity calculation function is used to compute the similarity between the syntax tree vectors of the two feature vector sets:
$$t_{ij} = \mathrm{sim}(p_i, p_j) = \frac{\sum_{t=1}^{n} (p_{it} - \bar{p}_i)(p_{jt} - \bar{p}_j)}{\sqrt{\sum_{t=1}^{n} (p_{it} - \bar{p}_i)^2}\,\sqrt{\sum_{t=1}^{n} (p_{jt} - \bar{p}_j)^2}}, \quad n = 64$$
This finally gives the contract similarity matrix $T_{8 \times 10}$, whose element in row $i$ and column $j$ is the similarity value between the $i$-th syntax tree of $S_1$ and the $j$-th syntax tree of $S_2$.
4.2 The threshold $a_1$ is set to 0.75; the elements of $T_{8 \times 10}$ higher than $a_1$ are kept unchanged and the elements lower than $a_1$ are set to zero.
(5) A threshold $a_2$ is set; the average of the non-zero elements of the contract similarity matrix $T_{8 \times 10}$ is computed to obtain the similarity value $M$ of intelligent contracts A and B, and $M$ is compared with $a_2$ to judge whether intelligent contracts A and B are similar. The specific implementation process is as follows:
5.1 The average $M$ of the non-zero elements of the matrix $T_{8 \times 10}$ is computed; $M$ is the similarity of intelligent contracts A and B.
5.2 The threshold $a_2$ is set to 0.8. If $M \ge a_2$, i.e. the similarity value of contracts A and B reaches 0.8, contracts A and B are semantically similar; otherwise, contracts A and B are not semantically similar.
5.3 In this example the similarity value $M$ is greater than 0.8, which indicates that contracts A and B are semantically similar.
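As a worked illustration of the arithmetic in steps 4.2 and 5.1 to 5.2, consider a small matrix with made-up values. The $2 \times 3$ matrix below is hypothetical and is not the $T_{8 \times 10}$ of this embodiment, whose entries are not reproduced in the text; only the thresholds $a_1 = 0.75$ and $a_2 = 0.8$ are taken from the example.

```latex
T = \begin{pmatrix} 1.00 & 0.40 & 0.82 \\ 0.30 & 0.90 & 0.76 \end{pmatrix}
\;\xrightarrow{\;a_1 = 0.75\;}\;
T' = \begin{pmatrix} 1.00 & 0 & 0.82 \\ 0 & 0.90 & 0.76 \end{pmatrix}

M = \frac{1.00 + 0.82 + 0.90 + 0.76}{4} = \frac{3.48}{4} = 0.87 \;\ge\; a_2 = 0.8
```

With these hypothetical values the two contracts would be judged similar, and the entry 1.00 in row 1, column 1 would point to the first statement of each contract as the most similar pair.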
(6) The elements of $T_{8 \times 10}$ are analyzed further: if the similarity value in row $i$ and column $j$ of $T_{8 \times 10}$ is high, the $i$-th statement of contract A and the $j$-th statement of contract B are highly similar, which reflects the interpretability analysis of the similarity detection method. The specific implementation process is as follows:
6.1 The element values of the contract similarity matrix $T_{8 \times 10}$ are compared to obtain the elements with higher similarity values and their positions in the matrix.
6.2 In this example, the first and second statements of contracts A and B are the most similar (similarity value 1.00), so the similarity can be located at lines 1 and 2 of the code.
The embodiments described above are presented to enable a person of ordinary skill in the art to understand and use the invention. Those skilled in the art can readily make various modifications to the above embodiments and apply the general principles described herein to other embodiments without inventive effort. The present invention is therefore not limited to the above embodiments, and improvements and modifications made by those skilled in the art on the basis of this disclosure should fall within the protection scope of the present invention.
Claims (6)
1. An intelligent contract similarity detection method based on syntax tree matching comprises the following steps:
(1) constructing an abstract syntax tree: taking Ethereum intelligent contracts as the research object, an abstract syntax tree is extracted from the intelligent contract source code with a syntax tree extraction tool;
(2) constructing a syntax tree sequence: taking intelligent contract code segments $Z_1$ and $Z_2$ as the contract clone pair to be tested, the abstract syntax trees $F_1$ and $F_2$ corresponding to $Z_1$ and $Z_2$ are obtained with the syntax tree extraction tool; $F_1$ and $F_2$ are split by statement and traversed in preorder to obtain the syntax tree sequences $S_1$ and $S_2$;
(3) syntax tree feature extraction: a syntax tree encoder based on the Attention mechanism is constructed and used to extract the feature vector of each syntax tree in $S_1$ and $S_2$, giving the feature vector set $\{p_1, \dots, p_m\}$ of $S_1$ and the feature vector set $\{p_1, \dots, p_k\}$ of $S_2$, where every vector lies in $\mathbb{R}^n$, $n$ denotes the dimension of the vectors, and $m$ and $k$ denote the number of syntax trees in $S_1$ and $S_2$ respectively;
(4) similarity calculation: the Pearson similarity algorithm is used to compute the similarity between each vector in the feature vector set of $S_1$ and each vector in the feature vector set of $S_2$, giving the contract similarity matrix $T_{m \times k}$, where the element in row $i$ and column $j$ of $T_{m \times k}$ is the similarity between the $i$-th syntax tree of $S_1$ and the $j$-th syntax tree of $S_2$;
(5) contract similarity detection: thresholds $a_1$ and $a_2$ are set; the elements of $T_{m \times k}$ higher than $a_1$ are kept unchanged and the elements lower than $a_1$ are set to zero; the average $M$ of all non-zero elements of the matrix is computed and taken as the similarity of intelligent contract code segments $Z_1$ and $Z_2$; $M$ is then compared with $a_2$ to decide whether $Z_1$ and $Z_2$ are similar;
(6) interpretability analysis: if the element in row $i$ and column $j$ of $T_{m \times k}$ is the maximum element of the matrix, the $i$-th syntax tree of $S_1$ and the $j$-th syntax tree of $S_2$ have the highest similarity, so the specific similar code lines in contract code segments $Z_1$ and $Z_2$ can be located.
2. The intelligent contract similarity detection method according to claim 1, wherein: the specific implementation manner of step (2) is as follows: first, the intelligent contract code segments $Z_1$ and $Z_2$ are converted into the abstract syntax trees $F_1$ and $F_2$ with the syntax tree extraction tool; then $F_1$ and $F_2$ are split at the statement level, and the syntax tree sequences $S_1 = \{f_i \in F_1 \mid f_1, \dots, f_m\}$ and $S_2 = \{f_j \in F_2 \mid f_1, \dots, f_k\}$ are obtained by preorder traversal, where each syntax tree corresponds to one statement in the intelligent contract, i.e. $Z_1$ and $Z_2$ contain $m$ statements and $k$ statements respectively, $i$ and $j$ are natural numbers, $1 \le i \le m$, and $1 \le j \le k$;
in particular, for nested statements a set of independent separator nodes $N_s = \{\text{block}, \text{body}\}$ needs to be defined, where block is used to split the header and body of a nested statement and body is used for method declarations; the syntax tree rooted at node $s$ consists of $s$ and all of its descendant nodes $D(s)$; if there is a path between nodes $s$ and $d$ that passes through $n$, where $d \in D(s)$ and $n \in N_s$, then node $d$ belongs to a statement contained in the body of $s$.
3. The intelligent contract similarity detection method according to claim 1, wherein: the specific implementation manner of the step (3) is as follows:
first, all nodes of the syntax tree to be encoded are converted into corresponding vector representations with the word2vec tool, giving a vector sequence $X = \{x_1, \dots, x_m\}$; $X$ is taken as the input of the syntax tree encoder, and $m$ is the number of nodes in the syntax tree;
then, a syntax tree encoder based on Attention is constructed to learn the semantic relations among the vectors in the sequence $X$, and the semantic vector sequence $Y = \{y_1, \dots, y_m\}$ corresponding to the input vector sequence $X$ is obtained through multi-layer iterative learning;
finally, all vectors in the sequence $Y$ are input into a convolution pooling layer to generate the feature vector corresponding to the syntax tree.
4. The intelligent contract similarity detection method according to claim 1, wherein: the specific implementation manner of step (4) is as follows: each vector $p_i$ in the feature vector set of $S_1$ and each vector $p_j$ in the feature vector set of $S_2$ are substituted into the following similarity calculation function to obtain the similarity value $t_{ij}$ of $p_i$ and $p_j$, i.e. $\mathrm{sim}(p_i, p_j)$:
$$t_{ij} = \mathrm{sim}(p_i, p_j) = \frac{\sum_{t=1}^{n} (p_{it} - \bar{p}_i)(p_{jt} - \bar{p}_j)}{\sqrt{\sum_{t=1}^{n} (p_{it} - \bar{p}_i)^2}\,\sqrt{\sum_{t=1}^{n} (p_{jt} - \bar{p}_j)^2}}$$
Wherein: $p_{it}$ denotes the value of the $t$-th element of vector $p_i$, $\bar{p}_i$ denotes the average value of the elements of $p_i$, $p_{jt}$ denotes the value of the $t$-th element of vector $p_j$, $\bar{p}_j$ denotes the average value of the elements of $p_j$, and $t_{ij}$ is the element in row $i$ and column $j$ of the matrix $T_{m \times k}$, i.e. the similarity between the $i$-th syntax tree in the sequence $S_1$ corresponding to code segment $Z_1$ and the $j$-th syntax tree in the sequence $S_2$ corresponding to code segment $Z_2$.
5. The intelligent contract similarity detection method according to claim 1, wherein: the specific implementation manner of the step (5) is as follows:
first, thresholds $a_1$ and $a_2$ are set, where $a_1$ is used to filter out the elements of the contract similarity matrix with lower similarity values and $a_2$ is used to decide whether the two code segments are similar;
then, the elements of the matrix $T_{m \times k}$ higher than $a_1$ are kept unchanged and the elements lower than $a_1$ are set to zero, and the average $M$ of all non-zero elements of the matrix is computed; $M$ is the similarity of intelligent contract code segments $Z_1$ and $Z_2$;
finally, $M$ is compared with $a_2$ to judge the similarity of $Z_1$ and $Z_2$: if $M \ge a_2$, $Z_1$ and $Z_2$ are similar; otherwise $Z_1$ and $Z_2$ are not similar.
6. The intelligent contract similarity detection method according to claim 1, wherein: the specific implementation manner of step (6) is as follows: first, by comparing the values of the elements of the matrix $T_{m \times k}$, the elements with higher values can be located and their positions in the matrix obtained; specifically, the value in row $i$ and column $j$ of $T_{m \times k}$ represents the similarity between the $i$-th syntax tree of $S_1$ and the $j$-th syntax tree of $S_2$; if this similarity value is larger than a set threshold, the $i$-th statement of intelligent contract code segment $Z_1$ and the $j$-th statement of $Z_2$ are highly similar, so specific code lines in the intelligent contracts can be located, which gives the intelligent contract similarity detection its interpretability.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110569353.3A CN113177107B (en) | 2021-05-25 | 2021-05-25 | Intelligent contract similarity detection method based on syntax tree matching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110569353.3A CN113177107B (en) | 2021-05-25 | 2021-05-25 | Intelligent contract similarity detection method based on syntax tree matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113177107A (en) | 2021-07-27 |
CN113177107B (en) | 2022-05-27 |
Family
ID=76929930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110569353.3A Active CN113177107B (en) | 2021-05-25 | 2021-05-25 | Intelligent contract similarity detection method based on syntax tree matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113177107B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200162473A1 (en) * | 2018-11-20 | 2020-05-21 | Microsoft Technology Licensing, Llc | Blockchain smart contracts for digital asset access |
CN110162750A (en) * | 2019-01-24 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Text similarity detection method, electronic equipment and computer readable storage medium |
CN111898360A (en) * | 2019-07-26 | 2020-11-06 | 创新先进技术有限公司 | Text similarity detection method and device based on block chain and electronic equipment |
Non-Patent Citations (2)
Title |
---|
Xiaojun Xu et al.: "Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection", Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security * |
Fan Junsong et al.: "SGX-based privacy and security protection method for blockchain transactions", Journal of Applied Sciences (应用科学学报) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113312268A (en) * | 2021-07-29 | 2021-08-27 | 北京航空航天大学 | Intelligent contract code similarity detection method |
CN114201406A (en) * | 2021-12-16 | 2022-03-18 | 中国电信股份有限公司 | Code detection method, system, equipment and storage medium based on open source component |
CN114201406B (en) * | 2021-12-16 | 2024-02-02 | 中国电信股份有限公司 | Code detection method, system, equipment and storage medium based on open source component |
Also Published As
Publication number | Publication date |
---|---|
CN113177107B (en) | 2022-05-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107516041B (en) | WebShell detection method and system based on deep neural network | |
WO2019233112A1 (en) | Vectorized representation method for software source codes | |
CN110941716B (en) | Automatic construction method of information security knowledge graph based on deep learning | |
CN113177107B (en) | Intelligent contract similarity detection method based on syntax tree matching | |
CN113761221B (en) | Knowledge graph entity alignment method based on graph neural network | |
CN112306494A (en) | Code classification and clustering method based on convolution and cyclic neural network | |
CN113901474B (en) | Vulnerability detection method based on function-level code similarity | |
CN114201406B (en) | Code detection method, system, equipment and storage medium based on open source component | |
CN114547619B (en) | Vulnerability restoration system and restoration method based on tree | |
CN114237621B (en) | Semantic code searching method based on fine granularity co-attention mechanism | |
CN114064117A (en) | Code clone detection method and system based on byte code and neural network | |
CN115033890A (en) | Comparison learning-based source code vulnerability detection method and system | |
CN114332519A (en) | Image description generation method based on external triple and abstract relation | |
CN112860904A (en) | External knowledge-integrated biomedical relation extraction method | |
CN115617395A (en) | Intelligent contract similarity detection method fusing global and local features | |
CN116975634A (en) | Micro-service extraction method based on program static attribute and graph neural network | |
CN116302089B (en) | Picture similarity-based code clone detection method, system and storage medium | |
CN115878177A (en) | Code clone detection method and system | |
CN115422541A (en) | Intelligent contract code clone detection method based on AST multi-dimensional feature fusion | |
CN116226864A (en) | Network security-oriented code vulnerability detection method and system | |
CN115185728A (en) | Software system architecture recovery method based on graph node embedding | |
CN116628695A (en) | Vulnerability discovery method and device based on multitask learning | |
CN114254130A (en) | Relation extraction method of network security emergency response knowledge graph | |
CN117435246B (en) | Code clone detection method based on Markov chain model | |
CN117850871A (en) | Code clone detection method and device based on semantic word analysis |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2023-12-13. Address after: No. 811 Xingbo Third Road, Chengdong Street, Boxing County, Binzhou City, Shandong Province, 256500. Patentee after: Shandong Rendui Network Co.,Ltd. Address before: 310018, No. 18 Jiao Tong Street, Xiasha Higher Education Park, Hangzhou, Zhejiang. Patentee before: Zhejiang Gongshang University |