CN112416431B - Source code segment pair comparison method based on coding sequence representation - Google Patents
Source code segment pair comparison method based on coding sequence representation
- Publication number
- CN112416431B
- Authority
- CN
- China
- Prior art keywords
- source code
- similarity
- coding sequence
- sequence
- seed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/75—Structural analysis for program understanding
- G06F8/751—Code clone detection
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a source code segment pairwise comparison method based on coding sequence representation, belonging to the technical field of computer programs. A coding sequence source code representation method based on static program analysis converts the source code text into a coding sequence representation; the coding sequence of each source code segment is processed with the Burrows-Wheeler transform to obtain an index of the coding sequence; seed screening finds high-similarity seeds from the index of the coding sequence; the Smith-Waterman algorithm takes the high-similarity seeds as the starting positions of subsequence comparison and extends them over the following sequence for as long as a given similarity threshold is maintained; and the high-similarity parts between the source code segments are located according to the source code line number information corresponding to the coding sequence. The method solves the technical problems that cross-granularity similarity matching cannot be supported and that high-similarity segments are not located accurately enough: it supports cross-granularity source code similarity comparison and does not require the source code texts being compared to have the same granularity.
Description
Technical Field
The invention belongs to the technical field of computer programs, and relates to a source code segment pair comparison method based on coded sequence representation.
Background
Source code similarity detection is widely used in software development tasks: for example, detecting code plagiarism and duplicated code through clone detection, locating software faults through similarity matching, and recommending code or generating repair patches from highly similar code. These tasks require a source code similarity matching algorithm to search for and quantitatively analyze similar code.
Common code similarity calculation methods represent the source code text as text, symbols, a tree structure or a graph structure, and then calculate the similarity of two pieces of source code using the corresponding similarity definition. Text-based methods treat the source code text as a sequence or set of character strings and perform text matching; symbol-based methods abstract the source code into a sequence or set of symbols and match the symbol strings or sets; tree-based methods convert the source code into its syntax tree and calculate similarity with algorithms such as subtree matching; graph-based methods represent the source code by its control flow graph or data dependency graph and calculate similarity by subgraph matching.
Current common code similarity calculation methods require the two pieces of source code being compared to have the same granularity, that is, both at the level of functions, classes or files. Such methods typically require the compared source code to be completed code and cannot compare source code fragments that are still being written. According to software engineering experience, the earlier a potential problem in software code is found and fixed, the lower the cost of solving it. Therefore, for software development tasks such as duplicated code detection, fault location and code recommendation, performing similarity comparison during programming can effectively improve efficiency. This requires the source code fragment comparison method to support similarity matching across granularities (matching between incomplete code fragments and completed code text).
Disclosure of Invention
The invention aims to provide a source code segment pair comparison method based on coding sequence representation, and solves the technical problems that cross-granularity similarity matching cannot be supported and high-similarity segment positioning is not accurate enough in the conventional source code similarity matching algorithm.
In order to achieve the purpose, the invention adopts the following technical scheme:
a source code segment pair-wise comparison method based on coded sequence representation comprises the following steps:
step 1: establishing a source code database for storing a source code text;
establishing a code conversion module, and converting a source code fragment of a source code text into a code sequence in the code conversion module by using a code sequence source code representation method based on static program analysis;
step 2: in a code conversion module, performing data processing on a coding sequence by using a Burrows-Wheeler block ordering compression algorithm to obtain an index of the coding sequence;
step 3: establishing a sequence alignment module, and finding high-similarity subsequence alignment seeds from the index of the coding sequence through seed screening in the sequence alignment module;
step 4: in the sequence alignment module, using the Smith-Waterman algorithm to take the high-similarity seeds as the starting positions of subsequence alignment and to extend them over the following sequence for as long as a given similarity threshold is maintained;
step 5: locating the high-similarity parts between the source code segments according to the line number information of the source code segments corresponding to the coding sequences.
Preferably, when step 1 is executed, the coding sequence source code representation method based on static program analysis used in the code conversion module specifically includes: processing the source code text in units of code segments and converting it into a coding sequence.
Preferably, when step 2 is executed, the method specifically includes using a Burrows-Wheeler block ordering compression algorithm to index all nodes of the coding sequence, and obtaining the index sequence of all nodes in the coding sequence.
Preferably, when step 3 is executed, the method specifically includes the following steps:
step A1: according to the indexes of the nodes of the coding sequences of the two source code segments to be compared, selecting seed position pairs whose structure codes are both not greater than 0 and whose type codes are the same;
step A2: for any seed position pair, if the number of identical nodes at corresponding positions in the subsequent K nodes is not less than K multiplied by r, where r is a similarity threshold, marking the seed position pair as a high-similarity subsequence candidate seed pair.
Preferably, when step 4 is executed, the method specifically includes: for each subsequence candidate seed pair, extending over the following sequence using the Smith-Waterman algorithm, taking K subsequent nodes per extension so that the extended subsequence has length nK; if the similarity of the extended subsequence is less than r, where r is a similarity threshold, the extension stops; otherwise the extension continues.
Preferably, when the step 5 is executed, the method specifically includes obtaining the position range of the source code segment corresponding to the high-similarity subsequence according to the line number information of the source code segment corresponding to the coding sequence, so as to locate the position of the high-similarity part in the two segments of source code segments.
The invention relates to a source code segment pair-wise comparison method based on coding sequence representation, which solves the technical problems that cross-granularity similarity matching cannot be supported and high-similarity segment positioning is not accurate enough in the existing source code similarity matching algorithm; meanwhile, because the coding sequence obtained by conversion based on the abstract syntax tree is used for matching, the similar segments can be accurately positioned to the corresponding code lines, cross-line matching can be supported, and the method has better applicability and matching performance compared with the prior art.
Drawings
FIG. 1 is a general flow frame diagram of the present invention;
fig. 2 is a source code fragment text a provided in the present embodiment;
fig. 3 is a source code fragment text B provided in the present embodiment;
FIG. 4 is a coding sequence of text A conversion of a source code segment in the present embodiment;
FIG. 5 is a coding sequence of the source code segment text B conversion in the present embodiment;
FIG. 6 is an abstract syntax tree of the source code segment A and the coding sequence of each node in the present embodiment;
FIG. 7 is an abstract syntax tree of the source code segment B and the coding sequence of each node in the present embodiment;
FIG. 8 is a high similarity seed pair provided in the present embodiment;
FIG. 9 is a matching subsequence provided in the present embodiment;
fig. 10 is a high similarity part in the two pieces of source code text provided in the present embodiment.
Detailed Description
For those skilled in the art to better understand the technical solutions in the present invention, the technical solutions in the embodiments of the present invention will be described in detail below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, belong to the scope of the present invention.
As shown in fig. 1-10, a method for comparing pairs of source code segments based on coded sequence representation includes the following steps:
step 1: establishing a source code database for storing a source code text;
establishing a code conversion module, and converting a source code fragment of a source code text into a code sequence in the code conversion module by using a code sequence source code representation method based on static program analysis;
in this embodiment, a source code analysis tool is used in the code conversion module to convert the source code text into the original abstract syntax tree; combining the node type of the abstract syntax tree, simplifying redundant information of the original abstract syntax tree, and only reserving the tree structure and the node type of the abstract syntax tree; a coding sequence format is adopted, structure coding and type coding of nodes are simultaneously contained, the simplified abstract syntax tree is subjected to traversal coding, and coding sequence representation of a source code text is generated.
Step 2: in a code conversion module, performing data processing on a coding sequence by using a Burrows-Wheeler block ordering compression algorithm to obtain an index of the coding sequence;
step 3: establishing a sequence alignment module, and finding high-similarity subsequence alignment seeds from the index of the coding sequence through seed screening in the sequence alignment module;
step 4: in the sequence alignment module, using the Smith-Waterman algorithm to take the high-similarity seeds as the starting positions of subsequence alignment and to extend them over the following sequence for as long as a given similarity threshold is maintained;
step 5: locating the high-similarity parts between the source code segments according to the line number information of the source code segments corresponding to the coding sequences.
Preferably, when step 1 is executed, the coding sequence source code representation method based on static program analysis used in the code conversion module specifically includes: processing the source code text in units of code segments and converting it into a coding sequence.
Preferably, when step 2 is executed, the method specifically includes using a Burrows-Wheeler block ordering compression algorithm to index all nodes of the coding sequence, and obtaining the index sequence of all nodes in the coding sequence.
The original coding sequence is transformed using the Burrows-Wheeler transform to form an increasing sequence ordered by the value of each coding node, and the position in the original coding sequence of each node of the increasing sequence is recorded. The starting positions of high-similarity seeds are searched through the index of the increasing sequence, and the subsequence that follows a starting node is found through that node's position in the original sequence.
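The position-recording index described above can be sketched as follows. This is a simplified stand-in, not a full Burrows-Wheeler transform: it only sorts the nodes by value and remembers each node's original position, which is the part of the index this paragraph relies on. The node representation as a `(structure_code, type_code)` tuple and the names `build_index` and `positions_of` are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

Node = Tuple[int, str]  # (structure_code, type_code); representation assumed


def build_index(sequence: Sequence[Node]) -> List[Tuple[Node, int]]:
    """Return (node, original_position) pairs sorted by node value."""
    return sorted((node, pos) for pos, node in enumerate(sequence))


def positions_of(index: List[Tuple[Node, int]], node: Node) -> List[int]:
    """Look up every original position at which `node` occurs."""
    keys = [entry[0] for entry in index]
    lo, hi = bisect_left(keys, node), bisect_right(keys, node)
    return [pos for _, pos in index[lo:hi]]


if __name__ == "__main__":
    seq = [(1, "If"), (0, "Call"), (-1, "Return"), (0, "Call")]
    idx = build_index(seq)
    print(positions_of(idx, (0, "Call")))  # original positions 1 and 3
```

Because equal nodes sit next to each other in the sorted order, all occurrences of a node value can be found with two binary searches and then mapped back to their places in the original sequence.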
Preferably, when step 3 is executed, the method specifically includes the following steps:
step A1: according to the indexes of the nodes of the coding sequences of the two source code segments to be compared, selecting seed position pairs whose structure codes are both not greater than 0 and whose type codes are the same;
step A2: for any seed position pair, if the number of the same nodes at the corresponding positions in the subsequent K nodes is not less than K multiplied by r, wherein r is a similarity threshold, the seed position pair is marked as a subsequence candidate seed pair with high similarity.
The invention first selects candidate seed pairs according to the characteristics of the coding sequences. Each node $N_i$ of a coding sequence consists of a structure code $SC_i$ and a type code $TC_i$. If there is a pair of nodes $(N_i^A, N_j^B)$ in the two coding sequences A and B whose structure codes are both not greater than 0 and whose type codes are the same, i.e. $SC_i^A \le 0$, $SC_j^B \le 0$ and $TC_i^A = TC_j^B$, then the node pair $(N_i^A, N_j^B)$ may be referred to as a candidate seed pair.
Then the two short sequences of K nodes that take the candidate seed pair $(N_i^A, N_j^B)$ as their starting positions, one in A and one in B, are compared. Let the number of identical nodes at corresponding positions be $t$; if $t \ge K \times r_1$, where $r_1$ is a manually set similarity threshold, the seed pair is recorded as a high-similarity seed pair.
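Steps A1 and A2 can be sketched as below. This is a minimal illustration under stated assumptions: nodes are modelled as `(structure_code, type_code)` tuples, the K-node window is taken to start at the seed position itself, windows that run past the end of a sequence are skipped, and candidate pairs are enumerated by brute force rather than through the sequence index. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Node = Tuple[int, str]  # (structure_code, type_code); representation assumed


def candidate_seed_pairs(a: Sequence[Node], b: Sequence[Node]) -> List[Tuple[int, int]]:
    """Step A1: positions (i, j) with both structure codes <= 0 and equal type codes."""
    return [
        (i, j)
        for i, (sc_a, tc_a) in enumerate(a)
        for j, (sc_b, tc_b) in enumerate(b)
        if sc_a <= 0 and sc_b <= 0 and tc_a == tc_b
    ]


def high_similarity_seeds(a: Sequence[Node], b: Sequence[Node],
                          k: int, r1: float) -> List[Tuple[int, int]]:
    """Step A2: keep pairs whose K-node windows agree in at least K * r1 positions."""
    seeds = []
    for i, j in candidate_seed_pairs(a, b):
        window_a, window_b = a[i:i + k], b[j:j + k]
        if len(window_a) < k or len(window_b) < k:
            continue  # ASSUMPTION: windows running past the end are skipped
        # t = number of identical nodes at corresponding positions
        t = sum(1 for x, y in zip(window_a, window_b) if x == y)
        if t >= k * r1:
            seeds.append((i, j))
    return seeds
```

A call such as `high_similarity_seeds(seq_a, seq_b, k=5, r1=0.8)` returns the start positions `(i, j)` of the seed pairs that pass the threshold.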
Preferably, when step 4 is executed, the method specifically includes: for each subsequence candidate seed pair, extending over the following sequence using the Smith-Waterman algorithm, taking K subsequent nodes per extension so that the extended subsequence has length nK; if the similarity of the extended subsequence is less than r, where r is a similarity threshold, the extension stops; otherwise the extension continues until it reaches the starting position of the next seed pair.
Taking a high-similarity seed pair obtained by seed screening as the starting position of the subsequences, the following Q nodes in each coding sequence are extended using the Smith-Waterman algorithm, with $Q > K$, giving two subsequences, one in A and one in B. If the similarity score of the two subsequences under the Smith-Waterman algorithm satisfies $\max S \ge r_2$, where $r_2$ is a manually set similarity threshold, the two subsequences may be referred to as matching subsequences.
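The extension step can be sketched as follows. The scoring scheme is not given in the text, so the values used here (match +1, mismatch -1, gap -1) and the normalization of the best score by the extension length before comparing it with the threshold are assumptions; the stop condition "until the next seed pair is reached" is left out for brevity. The names `smith_waterman_score` and `extend_seed` are hypothetical.

```python
from typing import Sequence, Tuple

Node = Tuple[int, str]  # (structure_code, type_code); representation assumed


def smith_waterman_score(a: Sequence[Node], b: Sequence[Node],
                         match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score (max S) between two node sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def extend_seed(a: Sequence[Node], b: Sequence[Node],
                i: int, j: int, k: int, r2: float) -> int:
    """Grow the match from seed (i, j) in blocks of K nodes.

    Returns the longest extension length whose normalized score
    (max S divided by the extension length; an assumption) is >= r2.
    """
    length = 0
    while True:
        trial = length + k
        if i + trial > len(a) or j + trial > len(b):
            break  # ran out of nodes in one of the sequences
        score = smith_waterman_score(a[i:i + trial], b[j:j + trial])
        if score / trial < r2:
            break  # similarity dropped below the threshold
        length = trial
    return length
```

Only two rows of the dynamic-programming matrix are kept, since the sketch needs the best score and not the alignment itself.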
Preferably, when the step 5 is executed, the method specifically includes obtaining the position range of the source code segment corresponding to the high-similarity subsequence according to the line number information of the source code segment corresponding to the coding sequence, so as to locate the position of the high-similarity part in the two segments of source code segments.
Each node of the coding sequence has corresponding code line number information when being generated, and the corresponding code line in the source code segment can be positioned according to the line number information of each node in the high-similarity subsequence, so that the high-similarity part between the source code segments can be positioned.
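Mapping a matched subsequence back to source lines can be sketched as below, assuming the line numbers are kept in a list parallel to the coding sequence (the embodiment stores them in a separate file). The function name `locate_lines` and the use of `None` for nodes without a line number are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def locate_lines(line_numbers: Sequence[Optional[int]],
                 start: int, length: int) -> Optional[Tuple[int, int]]:
    """Map the matched subsequence [start, start + length) to a source line range."""
    lines: List[int] = [n for n in line_numbers[start:start + length] if n is not None]
    if not lines:
        return None  # no node in the match carries a line number
    return min(lines), max(lines)
```

Applying this once to the matched positions in each of the two coding sequences gives the pair of line ranges that delimit the high-similarity part in the two source code segments.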
In this embodiment, as shown in fig. 4, the source code text a may be converted into the encoding sequence, each line corresponds to one abstract syntax tree node, and each node corresponds to the source code line number information and is stored in another file.
As shown in fig. 6, the source code fragment a can be parsed into the abstract syntax tree, where each node in the abstract syntax tree is an abstract syntax tree node, and each node has a C language abstract syntax tree node type defined by its corresponding Clang parser and a code of the node.
As shown in fig. 5, the source code text B may be converted into the encoding sequence, each line corresponds to an abstract syntax tree node, and each node corresponds to source code line number information and is stored in another file.
As shown in fig. 7, the source code fragment B can be parsed into the abstract syntax tree, where each node in the abstract syntax tree is an abstract syntax tree node, and each node has a C language abstract syntax tree node type defined by its corresponding Clang parser and a code of the node.
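The conversion from source text to a coding sequence with one entry per abstract syntax tree node can be sketched roughly as follows. The sketch uses Python's built-in `ast` module in place of the Clang parser of this embodiment, and it does not reproduce the code format shown in the figures: the structure code used here (depth of the node minus depth of the previously emitted node) and the function name `encode_source` are assumptions made only for illustration.

```python
import ast
from typing import List, Optional, Tuple

# One entry per AST node: (structure_code, type_code, line_number).
Entry = Tuple[int, str, Optional[int]]


def encode_source(source: str) -> List[Entry]:
    """Pre-order traversal of the AST, keeping only tree shape and node type."""
    tree = ast.parse(source)
    sequence: List[Entry] = []
    prev_depth = 0

    def visit(node: ast.AST, depth: int) -> None:
        nonlocal prev_depth
        # ASSUMPTION: structure code = change in depth relative to the
        # previously emitted node (1 = first child, 0 = sibling, <0 = ascent).
        structure_code = depth - prev_depth
        type_code = type(node).__name__
        line = getattr(node, "lineno", None)  # some nodes carry no line number
        sequence.append((structure_code, type_code, line))
        prev_depth = depth
        for child in ast.iter_child_nodes(node):
            visit(child, depth + 1)

    visit(tree, 0)
    return sequence


if __name__ == "__main__":
    for entry in encode_source("x = 1\nif x:\n    y = x + 2\n"):
        print(entry)
```

Keeping the line number next to each entry is what later allows a matched subsequence to be traced back to source lines.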
Fig. 8 shows the high-similarity seed pairs obtained by screening. With K = 5 and similarity threshold $r_1$ = 0.8 in the seed screening process, two seed pairs are obtained. The first column gives the start position in the coding sequence of source code segment A, the second column gives the start position in the coding sequence of source code segment B, and the third column gives the K value of the K short sequences.
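With the values used in this example, the screening condition of step A2 works out as follows, where $t$ is the number of identical nodes at corresponding positions:

```latex
t \ge K \times r_1 = 5 \times 0.8 = 4
```

A candidate pair is therefore kept when at least 4 of the 5 nodes in the two short sequences are identical at corresponding positions.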
As shown in fig. 9, Smith-Waterman extension is performed on the two seed pairs to obtain two high-similarity matching subsequences; different colors represent different matching subsequences, and the high-similarity matching subsequences are marked with boxes in fig. 9.
As shown in fig. 10, from the matching subsequences and the source code line number information retained during coding sequence conversion, the high-similarity code segments in the source code text can be traced back, as marked by the boxes in fig. 10.
The source code segment pair-wise comparison method based on coding sequence representation solves the technical problems that cross-granularity similarity matching cannot be supported and high-similarity segment positioning is not accurate enough in the existing source code similarity matching algorithm, can support cross-granularity source code similarity comparison, does not require the same granularity of source code texts to be compared, and can be applied to a similarity matching task of codes in a programming development stage; meanwhile, because the coding sequence obtained based on abstract syntax tree conversion is used for matching, the similar segments can be accurately positioned to the corresponding code lines, cross-line matching can be supported, and the method has better applicability and matching performance compared with the prior art.
In the present invention, any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (5)
1. A method for comparing source code segments in pairs based on coded sequence representation, characterized in that the method comprises the following steps:
step 1: establishing a source code database for storing a source code text;
establishing a code conversion module, and converting a source code fragment of a source code text into a code sequence in the code conversion module by using a code sequence source code representation method based on static program analysis;
step 2: in the code conversion module, performing data processing on the coding sequence by using a Burrows-Wheeler block ordering compression algorithm to obtain an index of the coding sequence;
step 3: establishing a sequence alignment module, and finding high-similarity subsequence alignment seeds from the index of the coding sequence through seed screening in the sequence alignment module;
when step 3 is executed, the method specifically comprises the following steps:
step A1: according to the indexes of the nodes of the coding sequences of the two source code segments to be compared, selecting seed position pairs whose structure codes are both not greater than 0 and whose type codes are the same;
step A2: for any seed position pair, if the number of the same nodes at the corresponding positions in the subsequent K nodes is not less than K multiplied by r, wherein r is a similarity threshold, marking the seed position pair as a subsequence candidate seed pair with high similarity;
first, candidate seed pairs are selected according to the characteristics of the coding sequences: each node $N_i$ of a coding sequence consists of a structure code $SC_i$ and a type code $TC_i$; if there is a pair of nodes $(N_i^A, N_j^B)$ in the two coding sequences A and B whose structure codes are both not greater than 0 and whose type codes are the same, i.e. $SC_i^A \le 0$, $SC_j^B \le 0$ and $TC_i^A = TC_j^B$, then the node pair $(N_i^A, N_j^B)$ may be referred to as a candidate seed pair;
then, the two short sequences of K nodes that take the candidate seed pair $(N_i^A, N_j^B)$ as their starting positions, one in A and one in B, are compared; the number of identical nodes at corresponding positions is $t$, and if $t \ge K \times r_1$, where $r_1$ is a manually set similarity threshold, the seed pair is recorded as a high-similarity seed pair;
step 4: in the sequence alignment module, using the Smith-Waterman algorithm to take the high-similarity seeds as the starting positions of subsequence alignment and to extend them over the following sequence for as long as a given similarity threshold is maintained;
step 5: locating the high-similarity parts between the source code segments according to the line number information of the source code segments corresponding to the coding sequences.
2. A method for pairwise comparison of source code segments based on coded sequence representations, according to claim 1, wherein: when step 1 is executed, the coding sequence source code representation method based on static program analysis used in the code conversion module specifically comprises: processing the source code text in units of code segments and converting it into a coding sequence.
3. A method for pairwise comparison of source code segments based on coded sequence representations, according to claim 1, wherein: when the step 2 is executed, specifically, the method includes using a Burrows-Wheeler block ordering compression algorithm to index all nodes of the coding sequence, and obtaining the index sequence of all nodes in the coding sequence.
4. A method of pairwise comparison of source code segments based on coded sequence representation according to claim 3, characterized by: when step 4 is executed, for each subsequence candidate seed pair, the following sequence is extended using the Smith-Waterman algorithm, taking K subsequent nodes per extension so that the extended subsequence has length nK; if the similarity of the extended subsequence is less than r, where r is a similarity threshold, the extension stops; otherwise the extension continues until it reaches the starting position of the next seed pair.
5. A method of comparing pairs of source code fragments based on a representation of an encoded sequence as claimed in claim 4, wherein: when the step 5 is executed, specifically obtaining the position range of the source code segment corresponding to the high-similarity subsequence according to the line number information of the source code segment corresponding to the coding sequence, thereby positioning the position of the high-similarity part in the two segments of source code segments.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011324413.7A CN112416431B (en) | 2020-11-23 | 2020-11-23 | Source code segment pair comparison method based on coding sequence representation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011324413.7A CN112416431B (en) | 2020-11-23 | 2020-11-23 | Source code segment pair comparison method based on coding sequence representation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112416431A (en) | 2021-02-26 |
CN112416431B (en) | 2023-02-14 |
Family
ID=74777426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011324413.7A Active CN112416431B (en) | 2020-11-23 | 2020-11-23 | Source code segment pair comparison method based on coding sequence representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112416431B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20240087936A (en) * | 2022-12-12 | 2024-06-20 | Sookmyung Women's University Industry-Academic Cooperation Foundation | Method and apparatus for automatically generating natural language comments based on transformer |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015069393A (en) * | 2013-09-27 | 2015-04-13 | Toshiba Corporation | Document data comparison method, document data comparison apparatus, and document data comparison program |
CN105051741A (en) * | 2012-12-17 | 2015-11-11 | Microsoft Technology Licensing, LLC | Parallel local sequence alignment |
CN107066837A (en) * | 2017-04-01 | 2017-08-18 | Shanghai Jiao Tong University | Reference-based DNA sequence compression method and system |
CN107615240A (en) * | 2015-04-17 | 2018-01-19 | Battelle Memorial Institute | Biological-sequence-based scheme for analyzing binary files |
CN108345468A (en) * | 2018-01-29 | 2018-07-31 | Huaqiao University | Programming language code duplicate checking method based on tree and sequence similarity |
CN108920902A (en) * | 2018-06-29 | 2018-11-30 | Zhengzhou Yunhai Information Technology Co., Ltd. | Gene sequence processing method and related device |
CN109634594A (en) * | 2018-11-05 | 2019-04-16 | Nanjing University of Aeronautics and Astronautics | Code snippet recommendation method considering code statement sequence information |
CN110310705A (en) * | 2018-03-16 | 2019-10-08 | Beijing Zheyuan Technology Co., Ltd. | Sequence alignment method and device supporting SIMD |
CN110737466A (en) * | 2019-10-16 | 2020-01-31 | Nanjing University of Aeronautics and Astronautics | Source code coding sequence representation method based on static program analysis |
CN111562920A (en) * | 2020-06-08 | 2020-08-21 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for determining similarity of small program codes, server and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063488A (en) * | 2010-12-29 | 2011-05-18 | Nanjing University of Aeronautics and Astronautics | Code searching method based on semantics |
KR102141272B1 (en) * | 2014-06-30 | 2020-08-04 | Microsoft Technology Licensing, LLC | Code recommendation |
Non-Patent Citations (2)
Title |
---|
SENSORY: Leveraging Code Statement Sequence Information for Code Snippets Recommendation; Lei Ai et al.; 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC); 2019-07-19; Vol. 1; pp. 27-36 * |
Data-driven student program code recommendation; Teng Changzhi; China Master's Theses Full-text Database, Information Science and Technology; 2020-02-15 (No. 02); I138-763 * |
Also Published As
Publication number | Publication date |
---|---|
CN112416431A (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112464641B (en) | BERT-based machine reading understanding method, device, equipment and storage medium | |
US10380236B1 (en) | Machine learning system for annotating unstructured text | |
JP2009512099A (en) | Method and apparatus for restartable hashing in a trie | |
CN111316296A (en) | Structure of learning level extraction model | |
CN113064586A (en) | Code completion method based on abstract syntax tree augmented graph model | |
CN112732862B (en) | Neural network-based bidirectional multi-section reading zero sample entity linking method and device | |
CN112416431B (en) | Source code segment pair comparison method based on coding sequence representation | |
CN114579168B (en) | Code updating method and device, electronic equipment and computer-readable storage medium | |
CN113296755A (en) | Code structure tree library construction method and information push method | |
CN112509644B (en) | Molecular optimization method, system, terminal equipment and readable storage medium | |
US20200159846A1 (en) | Optimizing hash storage and memory during caching | |
JP6261669B2 (en) | Query calibration system and method | |
JP5149063B2 (en) | Data comparison apparatus and program | |
CN114676155A (en) | Code prompt information determining method, data set determining method and electronic equipment | |
CN112925874A (en) | Similar code searching method and system based on case marks | |
CN111581270A (en) | Data extraction method and device | |
US11409806B2 (en) | Apparatus and method for constructing Aho-Corasick automata for detecting regular expression pattern | |
CN116089491B (en) | Retrieval matching method and device based on time sequence database | |
US20230138152A1 (en) | Apparatus and method for generating valid neural network architecture based on parsing | |
CN117951221A (en) | Multimode sentence processing method, multimode sentence processing device, computer equipment and storage medium | |
CN115718696A (en) | Source code cryptography misuse detection method and device, electronic equipment and storage medium | |
CN118782143A (en) | DNA storage data reconstruction method and system based on non-redundant de Bruijn graph | |
CN117931275A (en) | Automatic code merging conflict resolution method based on machine learning | |
CN118051257A (en) | Method and device for detecting code clone fusing depth subtree interaction and electronic equipment | |
Mengel et al. | Extracting structured data from web pages with maximum entropy segmental markov model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||