CN113656066A - Clone code detection method based on feature alignment - Google Patents

Publication number
CN113656066A
Authority
CN
China
Prior art keywords
code
feature
bidirectional
tree
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110936377.8A
Other languages
Chinese (zh)
Other versions
CN113656066B (en)
Inventor
方黎明
张爱平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-08-16
Filing date
2021-08-16
Publication date
2021-11-16
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202110936377.8A
Publication of CN113656066A
Application granted
Publication of CN113656066B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding
    • G06F8/751 Code clone detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a clone code detection method based on feature alignment. First, the source code is parsed into an abstract syntax tree, the abstract syntax tree is divided into a statement tree sequence, and word embedding and statement tree encoding are performed. Second, a bidirectional causal convolutional neural network extracts a feature representation of each code fragment that is rich in structural and semantic information. After feature extraction, an alignment matrix representing the correspondence between the two code fragments is learned in a data-driven manner through sparse reconstruction, so that the two code fragments are aligned and their similarity is obtained. Compared with the prior art, the method extracts richer features, addresses the structural differences that arise between functionally similar code when statements appear at different positions, and achieves higher detection accuracy.

Description

Clone code detection method based on feature alignment
Technical Field
The invention belongs to the technical field of software code analysis.
Background
The purpose of code clone detection is to decide whether two code fragments are clones by measuring their similarity. Code clone detection has proven valuable throughout the software development lifecycle. Identifying textually, syntactically, or functionally similar code fragments is the basis for many software engineering tasks, such as code classification, code refactoring, bug detection, and malicious code detection. In recent years, deep learning techniques have achieved good results in code clone detection, especially in detecting clones that are functionally similar.
However, the prior art focuses only on how to extract more distinctive features from source code, and some problems, such as the structural differences between functionally similar code, have not been explicitly addressed. During software development, a programmer who copies a code fragment often adds or deletes several statements, or uses a more flexible syntactic structure to implement the same function. As a result, the statements before and after copying are shifted relative to one another, which produces structural differences.
Code fragments are usually converted into an abstract syntax tree or a program dependency graph, a CNN or an RNN is then used to learn a feature representation, and the similarity between the features is calculated to decide whether the code fragments are similar. The learned features are typically two-dimensional tensors, and a global pooling operation is typically used to turn them into vectors from which the similarity is computed. However, global pooling is inherently weak at handling code misalignment, so the features remain misaligned. When the similarity of a code pair with different structure but similar function is calculated, the misaligned features lead to a low similarity and therefore to a wrong decision. Performing an alignment operation on the code features can bridge the structural gap between functionally similar code.
Disclosure of Invention
Purpose of the invention: to solve the above problems in the prior art, the invention provides a clone code detection method based on feature alignment.
Technical solution: the invention provides a clone code detection method based on feature alignment, which specifically comprises: inputting a target code x and a code y into a trained clone code detection model; the trained clone code detection model outputs the similarity of code x and code y, and whether code x and code y are similar codes is judged according to this similarity. The clone code detection model performs the following processing on the input code x and code y:
Step 1: generating an abstract syntax tree $T_x$ for code x and an abstract syntax tree $T_y$ for code y using a code parsing tool;
Step 2: dividing $T_x$ into a plurality of statement trees according to the statement nodes of $T_x$, and forming the statement trees into a statement tree sequence $ST_x$ in the order of a preorder traversal of the original abstract syntax tree $T_x$; dividing $T_y$ into a plurality of statement trees according to the statement nodes of $T_y$, and forming the statement trees into a statement tree sequence $ST_y$ in the order of a preorder traversal of the original abstract syntax tree $T_y$;
Step 3: constructing statement vector matrices: performing word embedding on the node entities of each statement tree in each statement tree sequence, encoding each embedded statement tree into a statement vector with an encoder, and arranging the statement vectors of each statement tree sequence, in sequence order, into a statement vector matrix;
Step 4: using a bidirectional causal convolutional network to extract features from the statement vector matrix $X_x$ of code x and the statement vector matrix $X_y$ of code y respectively, obtaining the code feature $F_x$ of code x and the code feature $F_y$ of code y;
Step 5: computing, by a sparse reconstruction method, the alignment feature of code feature $F_y$ with respect to code feature $F_x$ and the alignment feature of code feature $F_x$ with respect to code feature $F_y$;
Step 6: computing $R_{xy}$, the absolute value of the difference between $F_x$ and the alignment feature of $F_y$ with respect to $F_x$, and $R_{yx}$, the absolute value of the difference between $F_y$ and the alignment feature of $F_x$ with respect to $F_y$, and performing a max pooling operation on $R_{xy}$ and $R_{yx}$ respectively to obtain the similarity feature vectors $V_{xy}$ and $V_{yx}$;
Step 7: concatenating $V_{xy}$ and $V_{yx}$ along the feature dimension, inputting the concatenated vector into a fully connected layer, and inputting the output of the fully connected layer into a sigmoid function layer, which outputs the similarity S of code x and code y.
Further, the bidirectional causal convolutional network in step 4 comprises a first bidirectional causal convolution module and a second bidirectional causal convolution module connected to each other. The two modules have the same structure, and each comprises a 1 × 1 convolutional layer, a first bidirectional causal convolutional layer and a second bidirectional causal convolutional layer. The outputs of the 1 × 1 convolutional layer, the first bidirectional causal convolutional layer and the second bidirectional causal convolutional layer are added to form the output of the module. The first bidirectional causal convolutional layer has a 3 × 1 convolution kernel, and the second bidirectional causal convolutional layer has a 3 × 1 convolution kernel and a step size of 2.
Further, the bidirectional causal convolutional layer performs the following operations on the input features:
[Formula images not reproduced: forward convolution, backward convolution, and concatenation of the two outputs]
where the input features of the layer are a sequence of feature vectors indexed by t = 0, 1, 2, …, T'; when features are extracted from code x, T' is the length of the statement tree sequence $ST_x$; when features are extracted from code y, T' is the length of the statement tree sequence $ST_y$; K is the convolution kernel size of the bidirectional causal convolutional layer; the convolution kernels are marked f for the forward direction and b for the backward direction; the forward convolution operation and the backward convolution operation each output a t-th feature vector; and concat denotes concatenation, so that the t-th output feature vector $F_t$ of the bidirectional causal convolutional layer is obtained by concatenating the feature vectors of the two directions.
Further, step 5 specifically comprises:
computing the alignment feature of code feature $F_y$ with respect to code feature $F_x$, as follows:
constructing, by sparse reconstruction, the objective function for aligning code feature $F_y$ to code feature $F_x$:
[Formula image not reproduced: objective function for aligning $F_y$ to $F_x$]
where $W_{yx}$ is the sparse reconstruction coefficient and β is a balance coefficient;
solving the objective function by the least squares method to obtain the sparse reconstruction coefficient:
[Formula image not reproduced: least squares solution]
where $I_y$ is an identity matrix whose size is given by the length of the statement tree sequence $ST_y$, and the superscript T denotes the transpose; the alignment feature of $F_y$ with respect to $F_x$ is thereby obtained:
[Formula image not reproduced: alignment feature of $F_y$ with respect to $F_x$]
computing the alignment feature of code feature $F_x$ with respect to code feature $F_y$, as follows:
constructing, by sparse reconstruction, the objective function for aligning code feature $F_x$ to code feature $F_y$:
[Formula image not reproduced: objective function for aligning $F_x$ to $F_y$]
where $W_{xy}$ is the sparse reconstruction coefficient;
solving the objective function by the least squares method to obtain the sparse reconstruction coefficient:
[Formula image not reproduced: least squares solution]
and thereby the alignment feature of $F_x$ with respect to $F_y$:
[Formula image not reproduced: alignment feature of $F_x$ with respect to $F_y$]
where $I_x$ is an identity matrix whose size is given by the length of the statement tree sequence $ST_x$.
Further, the loss function of the clone code detection model is the binary cross-entropy loss:
$$\mathcal{L} = -\frac{1}{N}\sum_{j=1}^{N}\left[y_j \log S_j + (1 - y_j)\log(1 - S_j)\right]$$
where N is the total number of samples in the training set, each sample comprises two codes, $y_j$ is the label of the j-th sample, and $S_j$ is the similarity between the two codes in the j-th sample.
Beneficial effects: the method extracts code features with a bidirectional causal convolutional network, which captures context information in both directions and improves the representational power of the features. Compared with the bidirectional RNN (recurrent neural network) used by other methods, it has fewer parameters, runs faster, and achieves higher accuracy in code clone detection. Aligning the code features through sparse reconstruction effectively eliminates the influence of misaligned code features. For the more difficult clone types, namely similar code in which statements have been added, deleted or modified, and similar code with the same semantics but a very different syntactic structure, the invention significantly improves detection accuracy and recall.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of a method of code alignment;
FIG. 3 is a schematic diagram of a bidirectional causal convolution network, where (a) is a schematic diagram of the structure of the bidirectional causal convolution network and (b) is a schematic diagram of the structure of the bidirectional causal convolution module.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention.
This embodiment provides a clone code detection method based on feature alignment, which specifically comprises: inputting code x and code y into a trained clone code detection model; the trained model outputs the similarity of code x and code y, and whether they are similar codes is judged according to this similarity. The clone code detection model converts the original pair of code fragments into abstract syntax trees, divides each abstract syntax tree into a statement tree sequence consisting of a plurality of statement trees, constructs statement vector matrices, extracts features with a bidirectional causal convolutional network, aligns the features with the proposed sparse reconstruction feature alignment method, and calculates the similarity from the aligned code features.
As shown in FIG. 1 and FIG. 2, the clone code detection model processes code x and code y as follows:
(1) taking code x and code y as a code pair, and generating the abstract syntax trees $T_x$ and $T_y$ of code x and code y with an existing code parsing tool;
(2) dividing $T_x$ into a plurality of statement trees according to the statement nodes of the abstract syntax tree, and forming them into a statement tree sequence $ST_x$ in the traversal order of the original abstract syntax tree; dividing $T_y$ into a plurality of statement trees in the same way, and forming them into a statement tree sequence $ST_y$ in the traversal order of the original abstract syntax tree. This step serves two purposes: first, keeping a fixed order preserves as much of the structural information of the original code as possible; second, once a sequence has been formed, the subsequent feature extraction can be carried out;
(3) performing word embedding on the node entities in the statement trees using Word2Vec; encoding each embedded statement tree into a statement vector with a statement encoder, and constructing a statement vector matrix in the order of the statement tree sequence, thereby obtaining the statement vector matrix of $ST_x$ and the statement vector matrix of $ST_y$;
(4) inputting the statement vector matrices into the bidirectional causal convolutional network for feature extraction, thereby obtaining the code feature $F_x$ of code x and the code feature $F_y$ of code y;
(5) generating alignment features from the code features of the code pair by sparse reconstruction, namely the alignment feature of code feature $F_y$ with respect to code feature $F_x$ and the alignment feature of code feature $F_x$ with respect to code feature $F_y$;
(6) subtracting the alignment feature from the original feature, taking the absolute value, and performing max pooling along the dimension that indexes the statement trees, to obtain the similarity feature vector of one code with respect to the other;
(7) concatenating the two similarity feature vectors along the feature dimension, inputting the concatenated vector into a fully connected layer, and inputting the output of the fully connected layer into a sigmoid function layer, which outputs the similarity S of code x and code y.
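The statement-tree splitting in steps (1) and (2) can be sketched with Python's standard `ast` module. This is only an illustration under stated assumptions: the patent does not name a parser or a target programming language, and the function name `statement_tree_sequence` and the choice of `ast.stmt` as the "statement node" type are hypothetical choices made for this example.

```python
import ast

def statement_tree_sequence(source):
    """Return the statement subtrees of `source` in preorder.

    Assumption: every node that is an instance of ast.stmt is treated as
    the root of one statement tree; the patent does not fix the node set.
    """
    tree = ast.parse(source)
    sequence = []

    def visit(node):
        # Preorder: record the node before descending into its children.
        if isinstance(node, ast.stmt):
            sequence.append(node)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence

example = """
def total(values):
    s = 0
    for v in values:
        s += v
    return s
"""

statement_kinds = [type(n).__name__ for n in statement_tree_sequence(example)]
print(statement_kinds)
```

In this sketch a compound statement such as `for` still contains its nested statements; a fuller implementation would presumably cut the nested statements out of the parent so that each statement tree is disjoint.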
In step 4, as shown in FIG. 3, the bidirectional causal convolutional network is formed by stacking two bidirectional causal convolution modules in series. Each bidirectional causal convolution module consists of a 1 × 1 convolutional layer and two bidirectional causal convolutional layers and is used to capture information at different scales. The convolution kernels of the two bidirectional causal convolutional layers are both 3 × 1 in size, and their step sizes are 1 and 2 respectively.
The bidirectional causal convolutional layer can be expressed as:
[Formula images not reproduced: forward convolution, backward convolution, and concatenation of the two outputs]
where the input features of the layer are a sequence of feature vectors indexed by t = 0, 1, 2, …, T'; when features are extracted from code x, T' is the length of the statement tree sequence $ST_x$; when features are extracted from code y, T' is the length of the statement tree sequence $ST_y$; K is the convolution kernel size of the bidirectional causal convolutional layer; the convolution kernels are marked f for the forward direction and b for the backward direction; the forward convolution operation and the backward convolution operation each output a t-th feature vector; and concat denotes concatenation, so that the t-th output feature vector $F_t$ of the bidirectional causal convolutional layer is obtained by concatenating the feature vectors of the two directions.
The output of a bidirectional causal convolution module is obtained by adding the output of the 1 × 1 convolutional layer to the outputs of the two bidirectional causal convolutional layers. Passing the statement vector matrices $X_x$ and $X_y$ through the bidirectional causal convolutional network yields the code features $F_x$ and $F_y$ respectively; their sizes are determined by the length of the statement tree sequence of code x, the length of the statement tree sequence of code y, and D, the dimension of the features.
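Because the layer formulas survive only as image placeholders, the following NumPy sketch shows one plausible reading of a bidirectional causal convolution module rather than the patent's exact definition. The assumptions are: the forward branch at position t sees only positions up to t and the backward branch only positions from t onward; positions outside the sequence are zero; the "step size 2" of the second layer is treated as a dilation of 2 so that all three branch outputs have the same length and can be added; and the 1 × 1 convolution projects to the concatenated width. All function and parameter names are hypothetical.

```python
import numpy as np

def causal_conv(x, w, dilation=1, reverse=False):
    """One-directional causal convolution over a sequence.

    x: (T, C_in) statement vectors; w: (K, C_in, C_out) kernel.
    Forward (reverse=False): output t depends on x[t], x[t-d], ..., x[t-(K-1)d].
    Backward (reverse=True): output t depends on x[t], x[t+d], ..., x[t+(K-1)d].
    Positions outside the sequence are treated as zeros (an assumption).
    """
    if reverse:
        return causal_conv(x[::-1], w, dilation)[::-1]
    T = x.shape[0]
    K = w.shape[0]
    pad = (K - 1) * dilation
    xp = np.vstack([np.zeros((pad, x.shape[1])), x])   # left zero padding
    out = np.zeros((T, w.shape[2]))
    for k in range(K):
        # Tap k looks k * dilation steps into the past.
        start = pad - k * dilation
        out += xp[start:start + T] @ w[k]
    return out

def bidirectional_causal_layer(x, w_f, w_b, dilation=1):
    """Concatenate forward and backward causal outputs along the feature axis."""
    f = causal_conv(x, w_f, dilation)
    b = causal_conv(x, w_b, dilation, reverse=True)
    return np.concatenate([f, b], axis=1)

def bidirectional_causal_module(x, params):
    """Sum of a 1x1 convolution and two bidirectional causal layers."""
    skip = x @ params["w_1x1"]   # a 1x1 convolution is a per-position linear map
    a = bidirectional_causal_layer(x, params["w_f1"], params["w_b1"], dilation=1)
    b = bidirectional_causal_layer(x, params["w_f2"], params["w_b2"], dilation=2)
    return skip + a + b

rng = np.random.default_rng(0)
T, C_in, C_half = 6, 4, 3
params = {
    "w_1x1": rng.normal(size=(C_in, 2 * C_half)),
    "w_f1": rng.normal(size=(3, C_in, C_half)),
    "w_b1": rng.normal(size=(3, C_in, C_half)),
    "w_f2": rng.normal(size=(3, C_in, C_half)),
    "w_b2": rng.normal(size=(3, C_in, C_half)),
}
x = rng.normal(size=(T, C_in))
features = bidirectional_causal_module(x, params)
print(features.shape)
```

The backward branch reuses the forward routine on the reversed sequence, which keeps the two directions symmetric by construction.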
In an embodiment of the present invention, step 5 specifically comprises:
computing the alignment feature of code feature $F_y$ with respect to code feature $F_x$, as follows:
constructing, by sparse reconstruction, the objective function for aligning code feature $F_y$ to code feature $F_x$:
[Formula image not reproduced: objective function for aligning $F_y$ to $F_x$]
where $W_{yx}$ is the sparse reconstruction coefficient, i.e. the alignment matrix, and β is a balance coefficient. Solving by the least squares method gives:
[Formula image not reproduced: least squares solution]
where the superscript T denotes the matrix transpose, the superscript −1 denotes matrix inversion, and $I_y$ is an identity matrix. The alignment feature of code y with respect to code x is thus obtained:
[Formula image not reproduced: alignment feature of code y with respect to code x]
computing the alignment feature of code feature $F_x$ with respect to code feature $F_y$, as follows:
constructing, by sparse reconstruction, the objective function for aligning code feature $F_x$ to code feature $F_y$:
[Formula image not reproduced: objective function for aligning $F_x$ to $F_y$]
where $W_{xy}$ is the sparse reconstruction coefficient. Solving by the least squares method gives:
[Formula image not reproduced: least squares solution]
where the superscript T denotes the matrix transpose, the superscript −1 denotes matrix inversion, and $I_x$ is an identity matrix. The alignment feature of code x with respect to code y is thus obtained:
[Formula image not reproduced: alignment feature of code x with respect to code y]
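The alignment step can be illustrated with a small NumPy sketch. The original formulas are not recoverable from this text, so the code shows one form that is consistent with the description (a reconstruction coefficient matrix, a balance coefficient β, a least squares solution involving a transpose, an inverse and an identity matrix). It assumes that features are stored row-wise as (number of statements, D), that both terms of the objective are squared Frobenius norms, and that the alignment matrix multiplies the source feature from the left; the function name `align` is hypothetical.

```python
import numpy as np

def align(f_src, f_tgt, beta=1.0):
    """Align f_src (T_src, D) to f_tgt (T_tgt, D).

    Assumed objective: ||f_tgt - W @ f_src||_F^2 + beta * ||W||_F^2, whose
    closed-form minimiser is W = f_tgt f_src^T (f_src f_src^T + beta I)^(-1).
    Returns the alignment matrix W (T_tgt, T_src) and the aligned feature
    W @ f_src, which has the same shape as f_tgt.
    """
    gram = f_src @ f_src.T + beta * np.eye(f_src.shape[0])
    # Solve W @ gram = f_tgt @ f_src.T without forming the inverse explicitly;
    # gram is symmetric, so we solve the transposed system and transpose back.
    w = np.linalg.solve(gram, f_src @ f_tgt.T).T
    return w, w @ f_src

rng = np.random.default_rng(0)
f_x = rng.normal(size=(5, 8))   # 5 statements, feature dimension 8
# Code y: the same statements in a different order, plus 2 extra statements.
f_y = np.vstack([f_x[[2, 0, 1, 4, 3]], rng.normal(size=(2, 8))])

w_yx, f_y_aligned = align(f_y, f_x, beta=1e-3)
print(f_y_aligned.shape)                  # same shape as f_x
print(np.abs(f_x - f_y_aligned).max())    # small residual after alignment
```

In this toy case the reordered and padded features of code y are mapped back onto the statement order of code x, so the residual that step 6 pools over is close to zero even though the two raw feature matrices differ in both order and length.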
In an embodiment of the present invention, in step 6, the feature of code x is $F_x$; the difference between $F_x$ and the alignment feature of code y with respect to code x is calculated and its absolute value is taken to give $R_{xy}$, and max pooling of $R_{xy}$ gives the vector $V_{xy}$. Likewise, the feature of code y is $F_y$; the difference between $F_y$ and the alignment feature of code x with respect to code y is calculated and its absolute value is taken to give $R_{yx}$, and max pooling of $R_{yx}$ gives the vector $V_{yx}$.
In an embodiment of the present invention, in step 7, the vectors $V_{xy}$ and $V_{yx}$ are concatenated along the feature dimension to obtain a vector V, V is input into a fully connected layer FC, and the similarity S is obtained through a sigmoid function.
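Steps 6 and 7 can be sketched as follows. The sketch assumes row-wise features of shape (number of statements, D) and a fully connected layer that maps the concatenated 2D-dimensional vector to a single logit; the weights here are random stand-ins, and the names `similarity_score`, `w_fc` and `b_fc` are hypothetical.

```python
import numpy as np

def similarity_score(f_x, f_y_to_x, f_y, f_x_to_y, w_fc, b_fc):
    """Residuals, max pooling, fully connected layer, sigmoid."""
    r_xy = np.abs(f_x - f_y_to_x)          # (T_x, D)
    r_yx = np.abs(f_y - f_x_to_y)          # (T_y, D)
    v_xy = r_xy.max(axis=0)                # max pooling over statements -> (D,)
    v_yx = r_yx.max(axis=0)                # (D,)
    v = np.concatenate([v_xy, v_yx])       # concatenation on the feature axis -> (2D,)
    logit = v @ w_fc + b_fc                # fully connected layer with one output
    return 1.0 / (1.0 + np.exp(-logit))    # sigmoid

rng = np.random.default_rng(0)
T_x, T_y, D = 5, 7, 8
f_x = rng.normal(size=(T_x, D))
f_y = rng.normal(size=(T_y, D))
w_fc, b_fc = rng.normal(size=2 * D), 0.0

# Stand-in alignment features: random here, produced by step 5 in the real model.
s = similarity_score(f_x, rng.normal(size=(T_x, D)), f_y, rng.normal(size=(T_y, D)), w_fc, b_fc)
# Perfectly aligned features give zero residuals, so the score reduces to sigmoid(b_fc).
s_identical = similarity_score(f_x, f_x, f_y, f_y, w_fc, b_fc)
print(s, s_identical)
```

Pooling over the statement axis removes the dependence on sequence length, which is why the two codes may contain different numbers of statements.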
The clone code detection model uses the binary cross-entropy loss function:
$$\mathcal{L} = -\frac{1}{N}\sum_{j=1}^{N}\left[y_j \log S_j + (1 - y_j)\log(1 - S_j)\right]$$
where N is the total number of samples in the training set, each sample comprises two codes, $y_j$ is the label of the j-th sample (set manually: the label is 1 if the two codes in the sample are similar codes, i.e. clone codes, and 0 otherwise), and $S_j$ is the similarity between the two codes in the j-th sample (a value between 0 and 1). During training, the loss $\mathcal{L}$ is calculated and its gradient is back-propagated, and the model parameters are updated by gradient descent so that $\mathcal{L}$ decreases. Training ends after a given number of iterations or when $\mathcal{L}$ falls below a given value, yielding the final clone code detection model.
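As a worked illustration of the binary cross-entropy loss, the following pure-Python sketch averages the loss over N samples. The averaging by N and the clipping constant `eps` are assumptions of this example (the clipping is a numerical safeguard, not something the patent specifies), and the function name is hypothetical.

```python
import math

def binary_cross_entropy(labels, scores, eps=1e-12):
    """Mean binary cross entropy over N samples.

    labels: 1 for clone pairs, 0 otherwise; scores: model similarities S_j.
    eps clips scores away from 0 and 1 so that log() stays finite.
    """
    total = 0.0
    for y, s in zip(labels, scores):
        s = min(max(s, eps), 1.0 - eps)
        total += y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return -total / len(labels)

labels = [1, 0, 1, 0]
good = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])   # scores agree with labels
bad = binary_cross_entropy(labels, [0.4, 0.6, 0.3, 0.7])    # scores contradict labels
print(good < bad)
```

Scores that agree with the labels give a lower loss, which is the quantity gradient descent drives down during training.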
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. To avoid unnecessary repetition, the possible combinations are not described separately.

Claims (5)

1. A clone code detection method based on feature alignment, characterized in that a target code x and a code y are input into a trained clone code detection model; the trained clone code detection model outputs the similarity of code x and code y, and whether code x and code y are similar codes is judged according to the similarity of code x and code y; the clone code detection model performs the following processing on the input code x and code y:
Step 1: generating an abstract syntax tree $T_x$ for code x and an abstract syntax tree $T_y$ for code y using a code parsing tool;
Step 2: dividing $T_x$ into a plurality of statement trees according to the statement nodes of the abstract syntax tree $T_x$, and forming the statement trees into a statement tree sequence $ST_x$ in preorder traversal order; dividing $T_y$ into a plurality of statement trees according to the statement nodes of the abstract syntax tree $T_y$, and forming the statement trees into a statement tree sequence $ST_y$ in preorder traversal order;
Step 3: constructing statement vector matrices: performing word embedding on the node entities of each statement tree in each statement tree sequence, encoding each embedded statement tree into a statement vector with an encoder, and arranging the statement vectors of each statement tree sequence, in sequence order, into a statement vector matrix;
Step 4: using a bidirectional causal convolutional network to extract features from the statement vector matrix $X_x$ of code x and the statement vector matrix $X_y$ of code y respectively, obtaining the code feature $F_x$ of code x and the code feature $F_y$ of code y;
Step 5: computing the alignment feature of code feature $F_y$ with respect to code feature $F_x$ and the alignment feature of code feature $F_x$ with respect to code feature $F_y$;
Step 6: computing $R_{xy}$, the absolute value of the difference between $F_x$ and the alignment feature of $F_y$ with respect to $F_x$, and $R_{yx}$, the absolute value of the difference between $F_y$ and the alignment feature of $F_x$ with respect to $F_y$, and performing a max pooling operation on $R_{xy}$ and $R_{yx}$ respectively to obtain the similarity feature vectors $V_{xy}$ and $V_{yx}$;
Step 7: concatenating $V_{xy}$ and $V_{yx}$ along the feature dimension, inputting the concatenated vector into a fully connected layer, and inputting the output of the fully connected layer into a sigmoid function layer, which outputs the similarity S of code x and code y.
2. The clone code detection method based on feature alignment according to claim 1, wherein the bidirectional causal convolutional network in step 4 comprises a first bidirectional causal convolution module and a second bidirectional causal convolution module connected to each other; the first and second bidirectional causal convolution modules have the same structure and each comprise a 1 × 1 convolutional layer, a first bidirectional causal convolutional layer and a second bidirectional causal convolutional layer; the outputs of the 1 × 1 convolutional layer, the first bidirectional causal convolutional layer and the second bidirectional causal convolutional layer are added to form the output of the bidirectional causal convolution module; the first bidirectional causal convolutional layer has a 3 × 1 convolution kernel, and the second bidirectional causal convolutional layer has a 3 × 1 convolution kernel and a step size of 2.
3. The method according to claim 2, wherein the bidirectional causal convolutional layer performs the following operations on the input features:
[Formula images not reproduced: forward convolution, backward convolution, and concatenation of the two outputs]
where the input features of the layer are a sequence of feature vectors indexed by t = 0, 1, 2, …, T'; when features are extracted from code x, T' is the length of the statement tree sequence $ST_x$; when features are extracted from code y, T' is the length of the statement tree sequence $ST_y$; K is the convolution kernel size of the bidirectional causal convolutional layer; the convolution kernels are marked f for the forward direction and b for the backward direction; the forward convolution operation and the backward convolution operation each output a t-th feature vector; and concat denotes concatenation, so that the t-th output feature vector $F_t$ of the bidirectional causal convolutional layer is obtained by concatenating the feature vectors of the two directions.
4. The method according to claim 1, wherein step 5 specifically comprises:
computing the alignment feature of code feature $F_y$ with respect to code feature $F_x$, as follows:
constructing, by sparse reconstruction, the objective function for aligning code feature $F_y$ to code feature $F_x$:
[Formula image not reproduced: objective function for aligning $F_y$ to $F_x$]
where $W_{yx}$ is the sparse reconstruction coefficient and β is a balance coefficient;
solving the objective function by the least squares method to obtain the sparse reconstruction coefficient:
[Formula image not reproduced: least squares solution]
where $I_y$ is an identity matrix whose size is given by the length of the statement tree sequence $ST_y$, and the superscript T denotes the transpose; the alignment feature of $F_y$ with respect to $F_x$ is thereby obtained:
[Formula image not reproduced: alignment feature of $F_y$ with respect to $F_x$]
computing the alignment feature of code feature $F_x$ with respect to code feature $F_y$, as follows:
constructing, by sparse reconstruction, the objective function for aligning code feature $F_x$ to code feature $F_y$:
[Formula image not reproduced: objective function for aligning $F_x$ to $F_y$]
where $W_{xy}$ is the sparse reconstruction coefficient;
solving the objective function by the least squares method to obtain the sparse reconstruction coefficient:
[Formula image not reproduced: least squares solution]
and thereby the alignment feature of $F_x$ with respect to $F_y$:
[Formula image not reproduced: alignment feature of $F_x$ with respect to $F_y$]
where $I_x$ is an identity matrix whose size is given by the length of the statement tree sequence $ST_x$.
5. The method according to claim 1, wherein the loss function of the clone code detection model is the binary cross-entropy loss:
$$\mathcal{L} = -\frac{1}{N}\sum_{j=1}^{N}\left[y_j \log S_j + (1 - y_j)\log(1 - S_j)\right]$$
where N is the total number of samples in the training set, each sample comprises two codes, $y_j$ is the label of the j-th sample, and $S_j$ is the similarity between the two codes in the j-th sample.
CN202110936377.8A 2021-08-16 2021-08-16 Clone code detection method based on feature alignment Active CN113656066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110936377.8A CN113656066B (en) 2021-08-16 2021-08-16 Clone code detection method based on feature alignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110936377.8A CN113656066B (en) 2021-08-16 2021-08-16 Clone code detection method based on feature alignment

Publications (2)

Publication Number Publication Date
CN113656066A (en) 2021-11-16
CN113656066B (en) 2022-08-05

Family

ID=78479194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110936377.8A Active CN113656066B (en) 2021-08-16 2021-08-16 Clone code detection method based on feature alignment

Country Status (1)

Country Link
CN (1) CN113656066B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201406A (en) * 2021-12-16 2022-03-18 中国电信股份有限公司 Code detection method, system, equipment and storage medium based on open source component

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110246968A1 (en) * 2010-04-01 2011-10-06 Microsoft Corporation Code-Clone Detection and Analysis
CN104407872A (en) * 2014-12-04 2015-03-11 北京邮电大学 Code clone detection method
US20190265955A1 (en) * 2016-07-21 2019-08-29 Ramot At Tel-Aviv University Ltd. Method and system for comparing sequences
CN109101235A (en) * 2018-06-05 2018-12-28 北京航空航天大学 A kind of intelligently parsing method of software program
CN109062910A (en) * 2018-07-26 2018-12-21 苏州大学 Sentence alignment method based on deep neural network
CN110059188A (en) * 2019-04-11 2019-07-26 四川黑马数码科技有限公司 A kind of Chinese sentiment analysis method based on two-way time convolutional network
CN111562943A (en) * 2020-04-29 2020-08-21 海南大学 Code clone detection method and device based on event embedded tree and GAT network
CN112035165A (en) * 2020-08-26 2020-12-04 山谷网安科技股份有限公司 Code clone detection method and system based on homogeneous network
CN112215013A (en) * 2020-11-02 2021-01-12 天津大学 Clone code semantic detection method based on deep learning
CN112394973A (en) * 2020-11-23 2021-02-23 山东理工大学 Multi-language code plagiarism detection method based on pseudo-twin network
CN112306494A (en) * 2020-12-03 2021-02-02 南京航空航天大学 Code classification and clustering method based on convolution and cyclic neural network
CN112507337A (en) * 2020-12-18 2021-03-16 四川长虹电器股份有限公司 Implementation method of malicious JavaScript code detection model based on semantic analysis
CN112596736A (en) * 2020-12-24 2021-04-02 哈尔滨工业大学 Semantic-based cross-instruction architecture binary code similarity detection method
CN112560502A (en) * 2020-12-28 2021-03-26 桂林电子科技大学 Semantic similarity matching method and device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DANIEL PEREZ ET AL.: "Cross-Language Clone Detection by Learning Over Abstract Syntax Trees", 《2019 IEEE/ACM 16TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR)》 *
MARTIN WHITE ET AL.: "Deep learning code fragments for code clone detection", 《2016 31ST IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE)》 *
YUEMING WU ET AL.: "SCDetector: Software Functional Clone Detection Based on Semantic Tokens Analysis", 《2020 35TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201406A (en) * 2021-12-16 2022-03-18 中国电信股份有限公司 Code detection method, system, equipment and storage medium based on open source component
CN114201406B (en) * 2021-12-16 2024-02-02 中国电信股份有限公司 Code detection method, system, equipment and storage medium based on open source component

Also Published As

Publication number Publication date
CN113656066B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN108170736B (en) Document rapid scanning qualitative method based on cyclic attention mechanism
US11256487B2 (en) Vectorized representation method of software source code
CN110134757B (en) Event argument role extraction method based on multi-head attention mechanism
CN110059188B (en) Chinese emotion analysis method based on bidirectional time convolution network
CN113312500B (en) Method for constructing event map for safe operation of dam
CN108763191B (en) Text abstract generation method and system
CN109948340B (en) PHP-Webshell detection method combining convolutional neural network and XGboost
CN109726400B (en) Entity word recognition result evaluation method, device, equipment and entity word extraction system
CN114925195A (en) Standard content text abstract generation method integrating vocabulary coding and structure coding
CN116245513A (en) Automatic operation and maintenance system and method based on rule base
CN113656066B (en) Clone code detection method based on feature alignment
CN113641819A (en) Multi-task sparse sharing learning-based argument mining system and method
CN116046810A (en) Nondestructive testing method based on RPC cover plate damage load
CN115688784A (en) Chinese named entity recognition method fusing character and word characteristics
CN116523583A (en) Electronic commerce data analysis system and method thereof
CN114153942B (en) Event time sequence relation extraction method based on dynamic attention mechanism
CN116992304A (en) Policy matching analysis system and method based on artificial intelligence
CN117036778A (en) Potential safety hazard identification labeling method based on image-text conversion model
CN116955644A (en) Knowledge fusion method, system and storage medium based on knowledge graph
CN111382333A (en) Case element extraction method in news text sentence based on case correlation joint learning and graph convolution
CN114169447B (en) Event detection method based on self-attention convolution bidirectional gating cyclic unit network
CN116129251A (en) Intelligent manufacturing method and system for office desk and chair
CN112559750B (en) Text data classification method, device, nonvolatile storage medium and processor
CN113947085A (en) Named entity identification method for intelligent question-answering system
CN114118058A (en) Emotion analysis system and method based on fusion of syntactic characteristics and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant