CN113656066A - Clone code detection method based on feature alignment - Google Patents
- Publication number
- CN113656066A (application number CN202110936377.8A)
- Authority
- CN
- China
- Prior art keywords
- code
- feature
- bidirectional
- tree
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/75—Structural analysis for program understanding
- G06F8/751—Code clone detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a clone code detection method based on feature alignment. First, source code is parsed into an abstract syntax tree, the abstract syntax tree is divided into a statement tree sequence, and word embedding and statement tree encoding are performed. Second, a bidirectional causal convolutional neural network extracts feature representations of the code segments that are rich in structural and semantic information. After feature extraction, an alignment matrix representing the correspondence between the two code segments is learned in a data-driven manner through sparse reconstruction, so that the two code segments are aligned and their similarity is obtained. Compared with the prior art, the method extracts richer features, resolves the structural differences that arise between functionally similar codes because of differing statement positions, and achieves higher detection precision.
Description
Technical Field
The invention belongs to the technical field of software code analysis.
Background
The purpose of code clone detection is to decide whether two code fragments are clones by measuring their similarity. Code clone detection has proven valuable throughout the software development lifecycle. Identifying textually, syntactically, or functionally similar code fragments is the basis for many software engineering tasks, such as code classification, code refactoring, bug detection, and malicious code detection. In recent years, deep learning techniques have achieved good results in code clone detection, especially in detecting functionally similar clones.
However, the prior art focuses only on how to extract more discriminative features from source code, and some problems, such as the structural differences between functionally similar codes, have not been explicitly solved. During software development, when a programmer copies a code segment, several statements are often added or deleted, or a more flexible syntactic structure is used to implement the same function; as a result, the statements of the code before and after copying are misaligned, which produces structural differences.
Code fragments are usually converted into an abstract syntax tree or a program dependency graph, a CNN or RNN is then used to learn a feature representation, and the similarity between the features is computed to decide whether the code fragments are similar. The learned features are typically two-dimensional tensors, and a global pooling operation is typically used to turn them into vectors from which a similarity can be computed. However, global pooling is inherently weak at handling code misalignment, so the features remain misaligned. When similarity is computed for code pairs that differ in structure but are functionally similar, the misaligned features yield a low similarity, which can lead to a wrong decision. Performing an alignment operation on the code features closes the gap between the differing structures of functionally similar codes.
Disclosure of Invention
The purpose of the invention is as follows: in order to solve the problems in the prior art, the invention provides a clone code detection method based on feature alignment.
The technical scheme is as follows: the invention provides a clone code detection method based on feature alignment, which specifically comprises the following steps: inputting the target code x and the code y into a trained clone code detection model; the trained clone code detection model outputs the similarity of the code x and the code y, and whether the code x and the code y are similar codes is judged according to the similarity of the code x and the code y; the clone code detection model performs the following processing on the input code x and the input code y:
Step 1: generating an abstract syntax tree $T_x$ for the code x and an abstract syntax tree $T_y$ for the code y using a code parsing tool;
Step 2: dividing $T_x$ into a plurality of statement trees at the statement nodes of $T_x$, and forming the statement trees into a statement tree sequence $ST_x$ in the order of a pre-order traversal of the original abstract syntax tree $T_x$; dividing $T_y$ into a plurality of statement trees at the statement nodes of $T_y$, and forming the statement trees into a statement tree sequence $ST_y$ in the order of a pre-order traversal of the original abstract syntax tree $T_y$;
Step 3: constructing a statement vector matrix: performing word embedding on the node entities of each statement tree in each statement tree sequence, encoding each word-embedded statement tree into a statement vector with an encoder, and arranging the statement vectors in the order of the statement tree sequence to form the statement vector matrix;
Step 4: extracting features from the statement vector matrix $X_x$ of the code x and the statement vector matrix $X_y$ of the code y, respectively, using a bidirectional causal convolutional network, to obtain the code feature $F_x$ of the code x and the code feature $F_y$ of the code y;
Step 5: computing, by a sparse reconstruction method, the alignment feature of the code feature $F_y$ with respect to the code feature $F_x$, denoted $\tilde{F}_{yx}$, and the alignment feature of the code feature $F_x$ with respect to the code feature $F_y$, denoted $\tilde{F}_{xy}$;
Step 6: computing $R_{xy} = |F_x - \tilde{F}_{yx}|$ and $R_{yx} = |F_y - \tilde{F}_{xy}|$, and performing a max-pooling operation on each of $R_{xy}$ and $R_{yx}$ to obtain the similarity feature vectors $V_{xy}$ and $V_{yx}$;
Step 7: concatenating $V_{xy}$ and $V_{yx}$ along the feature dimension, inputting the concatenated vector into a fully connected layer, and inputting the output of the fully connected layer into a sigmoid function layer, which outputs the similarity S of the code x and the code y.
Further, the bidirectional causal convolutional network in step 4 comprises a first bidirectional causal convolution module and a second bidirectional causal convolution module that are connected to each other; the first and second bidirectional causal convolution modules have the same structure, and each comprises a 1 × 1 convolutional layer, a first bidirectional causal convolutional layer and a second bidirectional causal convolutional layer; the outputs of the 1 × 1 convolutional layer, the first bidirectional causal convolutional layer and the second bidirectional causal convolutional layer are added to form the output of the bidirectional causal convolution module; the first bidirectional causal convolutional layer has a 3 × 1 convolution kernel, and the second bidirectional causal convolutional layer has a 3 × 1 convolution kernel and a stride of 2.
Further, the bidirectional causal convolutional layer performs the following operations on the input features:
wherein $x_t$ is the t-th feature vector of the input features of the bidirectional causal convolutional layer, t = 0, 1, 2, …, T'; when features are extracted from the code x, T' is the length of the statement tree sequence $ST_x$; when features are extracted from the code y, T' is the length of the statement tree sequence $ST_y$; k is the convolution kernel size of the bidirectional causal convolutional layer; $W^{*}$ is the convolution kernel, with * = f or b, where f and b denote forward and backward, respectively; $F_t^{f}$ is the t-th feature vector output by the forward convolution operation in the bidirectional causal convolutional layer, and $F_t^{b}$ is the t-th feature vector output by the backward convolution operation; concat denotes concatenation, and the t-th output feature vector $F_t$ of the bidirectional causal convolutional layer is obtained by concatenating the feature vectors of the two directions.
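Further, step 5 specifically comprises:
computing the alignment feature of the code feature $F_y$ with respect to the code feature $F_x$, denoted $\tilde{F}_{yx}$, specifically:
constructing, by sparse reconstruction, the objective function for aligning the code feature $F_y$ to the code feature $F_x$:
$$\min_{W_{yx}} \; \lVert F_x - W_{yx} F_y \rVert_F^2 + \beta \lVert W_{yx} \rVert_F^2$$
wherein $W_{yx}$ is the sparse reconstruction coefficient and $\beta$ is a balance coefficient;
solving the objective function by the least squares method to obtain
$$W_{yx}^{\top} = \left( F_y F_y^{\top} + \beta I_y \right)^{-1} F_y F_x^{\top}$$
wherein $I_y$ is an identity matrix of size $T_y \times T_y$, the superscript $\top$ denotes the transpose, and $T_y$ is the length of the statement tree sequence $ST_y$; thereby obtaining
$$\tilde{F}_{yx} = W_{yx} F_y$$
computing the alignment feature of the code feature $F_x$ with respect to the code feature $F_y$, denoted $\tilde{F}_{xy}$, specifically:
constructing, by sparse reconstruction, the objective function for aligning the code feature $F_x$ to the code feature $F_y$:
$$\min_{W_{xy}} \; \lVert F_y - W_{xy} F_x \rVert_F^2 + \beta \lVert W_{xy} \rVert_F^2$$
wherein $W_{xy}$ is the sparse reconstruction coefficient;
solving the objective function by the least squares method to obtain
$$W_{xy}^{\top} = \left( F_x F_x^{\top} + \beta I_x \right)^{-1} F_x F_y^{\top}$$
thereby obtaining
$$\tilde{F}_{xy} = W_{xy} F_x$$
wherein $I_x$ is an identity matrix of size $T_x \times T_x$, and $T_x$ is the length of the statement tree sequence $ST_x$.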
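The least-squares solution referred to in step 5 can be checked with a short derivation. This is a sketch under stated assumptions, not the patent's own derivation: it assumes the code features are matrices $F_x \in \mathbb{R}^{T_x \times D}$ and $F_y \in \mathbb{R}^{T_y \times D}$ with one row per statement tree, and that both the reconstruction error and the penalty on the coefficient matrix are squared Frobenius norms weighted by the balance coefficient $\beta$. The symbol $J$ is introduced here for illustration only.

```latex
\begin{aligned}
J(W_{yx}) &= \lVert F_x - W_{yx} F_y \rVert_F^2 + \beta \lVert W_{yx} \rVert_F^2 \\
\frac{\partial J}{\partial W_{yx}} &= -2\,(F_x - W_{yx} F_y)\,F_y^{\top} + 2\beta\, W_{yx} = 0 \\
W_{yx}\,\bigl(F_y F_y^{\top} + \beta I_y\bigr) &= F_x F_y^{\top} \\
W_{yx} &= F_x F_y^{\top}\bigl(F_y F_y^{\top} + \beta I_y\bigr)^{-1},
\qquad I_y \in \mathbb{R}^{T_y \times T_y}
\end{aligned}
```

Because $F_y F_y^{\top} + \beta I_y$ is symmetric and, for $\beta > 0$, positive definite, the inverse always exists, and transposing the last line gives the equivalent form $W_{yx}^{\top} = (F_y F_y^{\top} + \beta I_y)^{-1} F_y F_x^{\top}$. Swapping the roles of x and y gives the corresponding result for $W_{xy}$ with an identity matrix of size $T_x \times T_x$.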
Further, the loss function of the clone code detection model is:
$$\mathcal{L} = -\frac{1}{N} \sum_{j=1}^{N} \left[ y_j \log S_j + (1 - y_j) \log (1 - S_j) \right]$$
wherein N is the total number of samples in the training set, each sample comprising two codes; $y_j$ is the label of the j-th sample; and $S_j$ is the similarity between the two codes in the j-th sample.
Beneficial effects: the method extracts code features with a bidirectional causal convolutional network, which captures context information in both directions and improves the representational capability for code; the network has fewer parameters than the bidirectional RNNs (recurrent neural networks) used by other methods, and is faster and more accurate in code clone detection. Aligning the code features through sparse reconstruction effectively eliminates the influence of code feature misalignment. For the more difficult clone types, namely similar codes in which statements have been added, deleted or modified, and similar codes with the same semantics but greatly different syntactic structures, the invention significantly improves detection accuracy and recall.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of a method of code alignment;
FIG. 3 is a schematic diagram of a bidirectional causal convolution network, where (a) is a schematic diagram of the structure of the bidirectional causal convolution network and (b) is a schematic diagram of the structure of the bidirectional causal convolution module.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention.
The embodiment provides a clone code detection method based on feature alignment, which specifically comprises the following steps: inputting the code x and the code y into a trained clone code detection model; the trained clone code detection model outputs the similarity of the code x and the code y, and whether the code x and the code y are similar codes is judged according to the similarity of the code x and the code y; the clone code detection model converts an original code segment pair into an abstract syntax tree, then divides the abstract syntax tree to generate a statement tree sequence consisting of a plurality of statement trees, constructs a statement vector matrix, extracts features by using a bidirectional causal convolution network, aligns the features by using the proposed sparse reconstruction feature alignment method, and calculates the similarity of the aligned code features.
As shown in fig. 1 and 2, the specific clone code detection model processes the code x and the code y as follows:
(1) taking the code x and the code y as a code pair, and generating the abstract syntax trees $T_x$ and $T_y$ of the code x and the code y using an existing code parsing tool;
(2) dividing $T_x$ into a plurality of statement trees at the statement nodes of the abstract syntax tree, and forming the statement trees into a statement tree sequence $ST_x$ in the order in which the original abstract syntax tree is traversed; dividing $T_y$ into a plurality of statement trees, and forming the statement trees into a statement tree sequence $ST_y$ in the order in which the original abstract syntax tree is traversed; the purpose of this step is, first, to keep a fixed order so that as little of the structural information of the original code as possible is lost and, second, to form a sequence on which the subsequent feature extraction can be carried out;
(3) performing word embedding on the node entities in the statement trees using Word2Vec; encoding each word-embedded statement tree into a statement vector with a statement encoder, and constructing a statement vector matrix according to the order of the statement tree sequence, thereby obtaining the statement vector matrix of the statement tree sequence $ST_x$ and the statement vector matrix of the statement tree sequence $ST_y$;
(4) inputting the statement vector matrices into a bidirectional causal convolutional network for feature extraction, thereby obtaining the code feature $F_x$ of the code x and the code feature $F_y$ of the code y;
(5) generating alignment features from the code features of the code pair by sparse reconstruction, namely the alignment feature of the code feature $F_y$ with respect to the code feature $F_x$, denoted $\tilde{F}_{yx}$, and the alignment feature of the code feature $F_x$ with respect to the code feature $F_y$, denoted $\tilde{F}_{xy}$;
(6) subtracting the alignment features from the original features, taking the absolute value, and performing max pooling along the statement-tree-count dimension to obtain the similarity feature vector of each code with respect to the other;
(7) concatenating the two similarity feature vectors along the feature dimension, inputting the concatenated vector into a fully connected layer, and inputting the output of the fully connected layer into a sigmoid function layer, which outputs the similarity S of the code x and the code y.
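Steps (1) and (2) can be illustrated with a minimal sketch. The patent only refers to "an existing code parsing tool", so the use of Python's standard `ast` module here is an assumption made for illustration, as are the hypothetical helper names `statement_tree` and `statement_tree_sequence` and the choice to cut each statement tree at its nested statements.

```python
import ast


def statement_tree(node):
    """Return a nested (label, children) tuple for one statement node.

    Nested statements are excluded here because they become statement
    trees of their own (an assumption about how the split is done).
    """
    children = [
        statement_tree(child)
        for child in ast.iter_child_nodes(node)
        if not isinstance(child, ast.stmt)
    ]
    return (type(node).__name__, children)


def statement_tree_sequence(source):
    """Split the AST of `source` into statement trees, in pre-order."""
    sequence = []

    def visit(node):
        if isinstance(node, ast.stmt):      # a statement node starts a new tree
            sequence.append(statement_tree(node))
        for child in ast.iter_child_nodes(node):
            visit(child)                    # pre-order: node first, then children

    visit(ast.parse(source))
    return sequence


code = """
def total(values):
    s = 0
    for v in values:
        s += v
    return s
"""
seq = statement_tree_sequence(code)
print([label for label, _ in seq])
# ['FunctionDef', 'Assign', 'For', 'AugAssign', 'Return']
```

Each entry of `seq` would then be word-embedded and encoded into one statement vector, so the length of `seq` is the sequence length used in the later steps.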
In step 4, as shown in fig. 3, the bidirectional causal convolutional network is formed by stacking two bidirectional causal convolution modules in series. Each bidirectional causal convolution module consists of a 1 × 1 convolutional layer and two bidirectional causal convolutional layers, and is used to capture information at different scales. The convolution kernels of the two bidirectional causal convolutional layers are both 3 × 1 in size, with strides of 1 and 2, respectively.
The bidirectional causal convolutional layer can be represented as:
wherein $x_t$ is the t-th feature vector of the input features of the bidirectional causal convolutional layer, t = 0, 1, 2, …, T'; when features are extracted from the code x, T' is the length of the statement tree sequence $ST_x$; when features are extracted from the code y, T' is the length of the statement tree sequence $ST_y$; k is the convolution kernel size of the bidirectional causal convolutional layer; $W^{*}$ is the convolution kernel, with * = f or b, where f and b denote forward and backward, respectively; $F_t^{f}$ is the t-th feature vector output by the forward convolution operation in the bidirectional causal convolutional layer, and $F_t^{b}$ is the t-th feature vector output by the backward convolution operation; concat denotes concatenation, and the t-th output feature vector $F_t$ of the bidirectional causal convolutional layer is obtained by concatenating the feature vectors of the two directions.
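The layer equation itself is not reproduced in this text. One plausible form that is consistent with the symbols defined above is sketched below. It is an assumption rather than the patent's exact formula: the input vector is written $x_t$, the kernel taps are indexed $W_i^{*}$, inputs outside the range $0 \le t \le T'$ are taken as zero, and any bias, activation, or stride is omitted.

```latex
\begin{aligned}
F_t^{f} &= \sum_{i=0}^{k-1} W_i^{f}\, x_{t-i} \\
F_t^{b} &= \sum_{i=0}^{k-1} W_i^{b}\, x_{t+i} \\
F_t &= \mathrm{concat}\bigl(F_t^{f},\, F_t^{b}\bigr)
\end{aligned}
```

In this form the forward branch sees only the current and preceding statement vectors and the backward branch only the current and following ones, which is what makes each branch causal in its own direction.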
The output of a bidirectional causal convolution module is obtained by adding the output of the 1 × 1 convolutional layer to the outputs of the two bidirectional causal convolutional layers. The features obtained by passing the statement vector matrices $X_x$ and $X_y$ through the bidirectional causal convolutional network are $F_x \in \mathbb{R}^{T_x \times D}$ and $F_y \in \mathbb{R}^{T_y \times D}$, respectively, where $T_x$ is the length of the statement tree sequence of the code x, $T_y$ is the length of the statement tree sequence of the code y, and D is the feature dimension.
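A single bidirectional causal convolutional layer can be sketched in NumPy as follows. This is an illustrative sketch, not the patented implementation: the function names, the zero padding, the kernel layout `(k, D_in, D_out)`, and the absence of bias and activation are all assumptions, and the stride-2 layer and the 1 × 1 branch of the module are left out.

```python
import numpy as np


def causal_conv(x, w):
    """Causal 1-D convolution along the sequence axis.

    x: (T, D_in) sequence of statement vectors.
    w: (k, D_in, D_out) kernel; tap j multiplies x[t - k + 1 + j].
    Output row t depends only on x[t - k + 1 .. t] (zero padding on the left).
    """
    k = w.shape[0]
    padded = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
    return sum(padded[j:j + x.shape[0]] @ w[j] for j in range(k))


def bidirectional_causal_conv(x, w_f, w_b):
    """Concatenate a forward and a backward causal convolution per position."""
    forward = causal_conv(x, w_f)
    # Running the same causal convolution on the reversed sequence makes
    # output row t depend only on x[t .. t + k - 1].
    backward = causal_conv(x[::-1], w_b)[::-1]
    return np.concatenate([forward, backward], axis=1)


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))          # 6 statement vectors of dimension 4
w_f = rng.normal(size=(3, 4, 5))     # kernel size k = 3
w_b = rng.normal(size=(3, 4, 5))
out = bidirectional_causal_conv(x, w_f, w_b)
print(out.shape)                     # (6, 10)
```

Changing `x[4]` leaves the forward half of rows 0 to 3 and the backward half of row 5 unchanged, which is a quick way to confirm that each branch is causal in its own direction.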
In an embodiment of the present invention, step 5 specifically comprises:
computing the alignment feature of the code feature $F_y$ with respect to the code feature $F_x$, denoted $\tilde{F}_{yx}$, specifically:
constructing, by sparse reconstruction, the objective function for aligning the code feature $F_y$ to the code feature $F_x$:
$$\min_{W_{yx}} \; \lVert F_x - W_{yx} F_y \rVert_F^2 + \beta \lVert W_{yx} \rVert_F^2$$
wherein $W_{yx} \in \mathbb{R}^{T_x \times T_y}$ is the sparse reconstruction coefficient, i.e. the alignment matrix, and $\beta$ is the balance coefficient. Solving by the least squares method gives:
$$W_{yx}^{\top} = \left( F_y F_y^{\top} + \beta I_y \right)^{-1} F_y F_x^{\top}$$
wherein $W_{yx}^{\top}$ is the transpose of the matrix $W_{yx}$, $(\cdot)^{-1}$ denotes the matrix inverse, and $I_y \in \mathbb{R}^{T_y \times T_y}$ is an identity matrix. The alignment feature of the code y with respect to the code x is therefore:
$$\tilde{F}_{yx} = W_{yx} F_y$$
computing the alignment feature of the code feature $F_x$ with respect to the code feature $F_y$, denoted $\tilde{F}_{xy}$, specifically:
constructing, by sparse reconstruction, the objective function for aligning the code feature $F_x$ to the code feature $F_y$:
$$\min_{W_{xy}} \; \lVert F_y - W_{xy} F_x \rVert_F^2 + \beta \lVert W_{xy} \rVert_F^2$$
wherein $W_{xy} \in \mathbb{R}^{T_y \times T_x}$ is the sparse reconstruction coefficient. Solving by the least squares method gives:
$$W_{xy}^{\top} = \left( F_x F_x^{\top} + \beta I_x \right)^{-1} F_x F_y^{\top}$$
wherein $W_{xy}^{\top}$ is the transpose of the matrix $W_{xy}$, $(\cdot)^{-1}$ denotes the matrix inverse, and $I_x \in \mathbb{R}^{T_x \times T_x}$ is an identity matrix. The alignment feature of the code x with respect to the code y is therefore:
$$\tilde{F}_{xy} = W_{xy} F_x$$
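The closed-form alignment of step 5 can be tried out numerically. The sketch below assumes a ridge-style objective (squared Frobenius reconstruction error plus a squared Frobenius penalty weighted by the balance coefficient) and features stored as one row per statement tree. The function name `align`, the default value of `beta`, and the toy data are hypothetical.

```python
import numpy as np


def align(f_src, f_tgt, beta=0.1):
    """Reconstruct f_tgt (T_tgt, D) from the rows of f_src (T_src, D).

    Returns the alignment matrix W of shape (T_tgt, T_src) and the aligned
    feature W @ f_src, which has the same shape as f_tgt.
    """
    gram = f_src @ f_src.T + beta * np.eye(f_src.shape[0])
    # W^T = (f_src f_src^T + beta I)^-1 f_src f_tgt^T, via a linear solve
    # rather than an explicit matrix inverse.
    w = np.linalg.solve(gram, f_src @ f_tgt.T).T
    return w, w @ f_src


rng = np.random.default_rng(0)
f_x = rng.normal(size=(4, 8))        # code x: 4 statements, 8-dim features
perm = [2, 0, 3, 1]
f_y = f_x[perm]                      # code y: the same statements, reordered

w, aligned = align(f_y, f_x, beta=1e-8)
print(np.allclose(aligned, f_x, atol=1e-5))   # True
print(w.argmax(axis=1))                       # [1 3 0 2]
```

With the statements of code y merely reordered, the alignment matrix is close to the inverse permutation and the aligned feature matches $F_x$ row by row, which is the statement-position difference the method is meant to absorb. A larger `beta` shrinks the coefficients and makes the reconstruction only approximate.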
In an embodiment of the present invention, in step 6, the feature of the code x is $F_x$ and the alignment feature of the code y with respect to the code x is $\tilde{F}_{yx}$; the difference between the two features is taken and its absolute value computed, $R_{xy} = |F_x - \tilde{F}_{yx}|$, and max pooling over $R_{xy}$ yields the vector $V_{xy}$. Likewise, the feature of the code y is $F_y$ and the alignment feature of the code x with respect to the code y is $\tilde{F}_{xy}$; the difference between the two features is taken and its absolute value computed, $R_{yx} = |F_y - \tilde{F}_{xy}|$, and max pooling over $R_{yx}$ yields the vector $V_{yx}$.
In one embodiment of the invention, in step 7, the vectors $V_{xy}$ and $V_{yx}$ are concatenated along the feature dimension to obtain a vector V; the vector V is input into the fully connected layer FC, and the similarity S is obtained through a sigmoid function.
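Steps 6 and 7 amount to a small scoring head, sketched below in NumPy. The function name `similarity`, the single-output fully connected layer, and the random stand-in inputs are assumptions made for illustration; in the actual model the aligned features come from step 5 and the layer weights are learned.

```python
import numpy as np


def similarity(f_x, f_y, aligned_yx, aligned_xy, fc_weight, fc_bias):
    """Score a code pair from its features and alignment features.

    f_x, aligned_yx: (T_x, D); f_y, aligned_xy: (T_y, D);
    fc_weight: (2 * D,), fc_bias: scalar.
    """
    # Step 6: absolute difference, then max pooling over the statement axis.
    v_xy = np.abs(f_x - aligned_yx).max(axis=0)    # (D,)
    v_yx = np.abs(f_y - aligned_xy).max(axis=0)    # (D,)
    # Step 7: concatenate on the feature dimension, FC layer, sigmoid.
    v = np.concatenate([v_xy, v_yx])               # (2 * D,)
    logit = v @ fc_weight + fc_bias
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
f_x, aligned_yx = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
f_y, aligned_xy = rng.normal(size=(7, 8)), rng.normal(size=(7, 8))
fc_weight = rng.normal(size=16)

s = similarity(f_x, f_y, aligned_yx, aligned_xy, fc_weight, fc_bias=0.0)
print(0.0 < s < 1.0)   # True
```

Because max pooling runs over the statement axis, the two codes may have different numbers of statements while both similarity feature vectors still have length D.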
The clone code detection model adopts a binary cross-entropy loss function, as follows:
$$\mathcal{L} = -\frac{1}{N} \sum_{j=1}^{N} \left[ y_j \log S_j + (1 - y_j) \log (1 - S_j) \right]$$
wherein N is the total number of samples in the training set, each sample comprising two codes; $y_j$ is the label of the j-th sample (set manually: the label is 1 if the two codes in a sample are similar codes, namely clone codes, and 0 otherwise); and $S_j$ is the similarity between the two codes in the j-th sample (its value lies between 0 and 1). During training, $\mathcal{L}$ is computed and its gradient is back-propagated, and the parameters of the model are updated by gradient descent so that $\mathcal{L}$ decreases; training ends after a given number of iterations or when $\mathcal{L}$ falls below a given value, which yields the final clone code detection model.
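The binary cross-entropy loss can be computed as in the following sketch. Averaging over the N samples and clipping the scores by a small `eps` to avoid `log(0)` are assumptions of this example, not details stated in the text.

```python
import numpy as np


def bce_loss(labels, scores, eps=1e-12):
    """Mean binary cross-entropy between clone labels and similarities."""
    labels = np.asarray(labels, dtype=float)
    # Keep scores strictly inside (0, 1) so both logarithms are finite.
    scores = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    per_sample = labels * np.log(scores) + (1.0 - labels) * np.log(1.0 - scores)
    return float(-np.mean(per_sample))


# One clone pair (label 1) and one non-clone pair (label 0).
confident = bce_loss([1, 0], [0.9, 0.1])
hesitant = bce_loss([1, 0], [0.6, 0.4])
print(confident < hesitant)   # True
```

A model that scores the clone pair high and the non-clone pair low gets the smaller loss, which is the quantity gradient descent drives down during training.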
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. To avoid unnecessary repetition, the possible combinations are not described separately.
Claims (5)
1. A clone code detection method based on feature alignment is characterized in that a target code x and a code y are input into a trained clone code detection model; the trained clone code detection model outputs the similarity of the code x and the code y, and whether the code x and the code y are similar codes is judged according to the similarity of the code x and the code y; the clone code detection model performs the following processing on the input code x and the input code y:
Step 1: generating an abstract syntax tree $T_x$ for the code x and an abstract syntax tree $T_y$ for the code y using a code parsing tool;
Step 2: dividing $T_x$ into a plurality of statement trees at the statement nodes of $T_x$, and forming the statement trees into a statement tree sequence $ST_x$ in the order of a pre-order traversal; dividing $T_y$ into a plurality of statement trees at the statement nodes of $T_y$, and forming the statement trees into a statement tree sequence $ST_y$ in the order of a pre-order traversal;
Step 3: constructing a statement vector matrix: performing word embedding on the node entities of each statement tree in each statement tree sequence, encoding each word-embedded statement tree into a statement vector with an encoder, and arranging the statement vectors in the order of the statement tree sequence to form the statement vector matrix;
Step 4: extracting features from the statement vector matrix $X_x$ of the code x and the statement vector matrix $X_y$ of the code y, respectively, using a bidirectional causal convolutional network, to obtain the code feature $F_x$ of the code x and the code feature $F_y$ of the code y;
Step 5: computing the alignment feature of the code feature $F_y$ with respect to the code feature $F_x$, denoted $\tilde{F}_{yx}$, and the alignment feature of the code feature $F_x$ with respect to the code feature $F_y$, denoted $\tilde{F}_{xy}$;
Step 6: computing $R_{xy} = |F_x - \tilde{F}_{yx}|$ and $R_{yx} = |F_y - \tilde{F}_{xy}|$, and performing a max-pooling operation on each of $R_{xy}$ and $R_{yx}$ to obtain the similarity feature vectors $V_{xy}$ and $V_{yx}$;
Step 7: concatenating $V_{xy}$ and $V_{yx}$ along the feature dimension, inputting the concatenated vector into a fully connected layer, and inputting the output of the fully connected layer into a sigmoid function layer, which outputs the similarity S of the code x and the code y.
2. The clone code detection method based on feature alignment of claim 1, wherein the bidirectional causal convolutional network in step 4 comprises a first bidirectional causal convolution module and a second bidirectional causal convolution module connected to each other; the first and second bidirectional causal convolution modules are identical in structure, and each comprises a 1 × 1 convolutional layer, a first bidirectional causal convolutional layer and a second bidirectional causal convolutional layer; the outputs of the 1 × 1 convolutional layer, the first bidirectional causal convolutional layer and the second bidirectional causal convolutional layer are added to form the output of the bidirectional causal convolution module; the first bidirectional causal convolutional layer has a 3 × 1 convolution kernel, and the second bidirectional causal convolutional layer has a 3 × 1 convolution kernel and a stride of 2.
3. The method according to claim 2, wherein the bidirectional causal convolutional layer performs the following operations on the input features:
wherein $x_t$ is the t-th feature vector of the input features of the bidirectional causal convolutional layer, t = 0, 1, 2, …, T'; when features are extracted from the code x, T' is the length of the statement tree sequence $ST_x$; when features are extracted from the code y, T' is the length of the statement tree sequence $ST_y$; k is the convolution kernel size of the bidirectional causal convolutional layer; $W^{*}$ is the convolution kernel, with * = f or b, where f and b denote forward and backward, respectively; $F_t^{f}$ is the t-th feature vector output by the forward convolution operation in the bidirectional causal convolutional layer, and $F_t^{b}$ is the t-th feature vector output by the backward convolution operation; concat denotes concatenation, and the t-th output feature vector $F_t$ of the bidirectional causal convolutional layer is obtained by concatenating the feature vectors of the two directions.
4. The method according to claim 1, wherein the step 5 specifically comprises:
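computing the alignment feature of the code feature $F_y$ with respect to the code feature $F_x$, denoted $\tilde{F}_{yx}$, specifically:
constructing, by sparse reconstruction, the objective function for aligning the code feature $F_y$ to the code feature $F_x$:
$$\min_{W_{yx}} \; \lVert F_x - W_{yx} F_y \rVert_F^2 + \beta \lVert W_{yx} \rVert_F^2$$
wherein $W_{yx}$ is the sparse reconstruction coefficient and $\beta$ is a balance coefficient;
solving the objective function by the least squares method to obtain
$$W_{yx}^{\top} = \left( F_y F_y^{\top} + \beta I_y \right)^{-1} F_y F_x^{\top}$$
wherein $I_y$ is an identity matrix of size $T_y \times T_y$, the superscript $\top$ denotes the transpose, and $T_y$ is the length of the statement tree sequence $ST_y$; thereby obtaining
$$\tilde{F}_{yx} = W_{yx} F_y$$
computing the alignment feature of the code feature $F_x$ with respect to the code feature $F_y$, denoted $\tilde{F}_{xy}$, specifically:
constructing, by sparse reconstruction, the objective function for aligning the code feature $F_x$ to the code feature $F_y$:
$$\min_{W_{xy}} \; \lVert F_y - W_{xy} F_x \rVert_F^2 + \beta \lVert W_{xy} \rVert_F^2$$
wherein $W_{xy}$ is the sparse reconstruction coefficient;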
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110936377.8A CN113656066B (en) | 2021-08-16 | 2021-08-16 | Clone code detection method based on feature alignment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113656066A (en) | 2021-11-16 |
CN113656066B (en) | 2022-08-05 |
Family ID: 78479194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110936377.8A (Active, granted as CN113656066B) | Clone code detection method based on feature alignment | 2021-08-16 | 2021-08-16 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113656066B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114201406A (en) * | 2021-12-16 | 2022-03-18 | 中国电信股份有限公司 | Code detection method, system, equipment and storage medium based on open source component |
CN114201406B (en) * | 2021-12-16 | 2024-02-02 | 中国电信股份有限公司 | Code detection method, system, equipment and storage medium based on open source component |
Also Published As
Publication number | Publication date |
---|---|
CN113656066B (en) | 2022-08-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |