CN112115362B - Programming information recommendation method and device based on similar code recognition - Google Patents

Publication number
CN112115362B
Authority
CN
China
Prior art keywords
code
programming information
clone
extracting
doubt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010996554.7A
Other languages
Chinese (zh)
Other versions
CN112115362A (en
Inventor
王彤
黄袁
陈湘萍
周晓聪
郑子彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202010996554.7A
Publication of CN112115362A
Application granted
Publication of CN112115362B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding

Abstract

The invention provides a programming information recommendation method and device based on similar code identification. The method comprises: extracting the in-doubt code to obtain a plurality of first code segments and a plurality of first comments; extracting programming information to obtain a plurality of second code segments and a plurality of first scene keywords; inputting the first code segments and the second code segments into a clone detection tool and performing clone detection to obtain a plurality of clone pairs, wherein each clone pair consists of one first code segment and one second code segment; taking a first comment and a first scene keyword as a sentence pair, and calculating the sentence pair similarity using a trained Word2Vec model; and sorting the clone pairs in descending order of sentence pair similarity, and recommending the higher-ranked clone pairs as programming information. The method considers both the scene keywords and the code similarity, can help locate the causes of code defects, makes the recommended programming information more targeted, and can improve defect repair efficiency.

Description

Programming information recommendation method and device based on similar code recognition
Technical Field
The invention relates to the technical field of software detection and repair, in particular to a programming information recommendation method and device based on similar code identification.
Background
Programmers often encounter technical problems during programming that are difficult to solve independently, and usually rely on internet searches to resolve them. Traditional code defect retrieval depends mainly on the programmer running text-matching searches on a description of the defect, the program's error messages, and the like, in order to find information for fixing the defect in sources such as blogs and question-and-answer communities. However, programmers cannot always describe the problems they encounter accurately in words, and program error messages can be hard to localize, which makes solving code problems in the traditional way a considerable challenge.
Existing code problem retrieval mainly takes two forms: text-matching search based on the programmer's description of the defect, the program's error messages, and the like; and defect detection based on similar code.
First, existing search techniques based on text matching usually require the programmer to summarize the problem encountered, extract keywords for the current scene, and enter those keywords into an internet search engine. If the programmer cannot adequately summarize the cause of the problem, or if similar keywords point to other problems, a search technique that relies only on scene keywords cannot reliably provide a way to modify the defective code.
Second, defect detection based on similar code ranks its recommendations by code similarity alone and does not consider the actual scene in which the code is reused. Even identical code may exhibit different vulnerabilities in different scenes. If the referenced code does not match the current scene, it may mislead the programmer's modifications.
Repairing code defects is complex and time-consuming, and related studies show that software companies invest a great deal of time and effort in it every year. Meanwhile, with the spread of open source, large amounts of source code are reused, and during reuse a defective code fragment is likely to have been modified into similar code in which the defect is repaired. Therefore, for a defective code fragment, if other similar fragments that reuse it can be found, the programmer can derive a repair by comparing the defective code with code that has already been repaired.
Disclosure of Invention
The invention aims to provide a programming information recommendation method and device based on similar code identification, and aims to solve the technical problems that the code defect description is not accurate enough and the code is difficult to repair in the programming process.
The purpose of the invention can be realized by the following technical scheme:
A programming information recommendation method based on similar code identification comprises the following steps:
extracting the in-doubt code to obtain a plurality of first code segments and a plurality of first comments; wherein each first code segment corresponds to a respective first comment;
extracting programming information on a network to obtain a plurality of second code segments and a plurality of first scene keywords; wherein each second code segment corresponds to a respective first scene keyword;
inputting the plurality of first code segments and the plurality of second code segments into a clone detection tool and performing clone detection to obtain a plurality of clone pairs; wherein each clone pair consists of one first code segment and one second code segment;
taking the first comment and the first scene keyword as a sentence pair, and calculating the sentence pair similarity using the trained Word2Vec model; wherein the sentence pair corresponds to the clone pair;
and sorting the corresponding clone pairs in descending order of sentence pair similarity, and recommending the higher-ranked clone pairs as referable programming information.
Optionally, before taking the first comment and the first scene keyword as a sentence pair, the method further comprises: extracting the project code in which the in-doubt code is located to obtain a plurality of second comments and a plurality of second scene keywords, inputting the second comments and the second scene keywords into the corpus of a Word2Vec model, and training the Word2Vec model.
Optionally, extracting the in-doubt code to obtain the plurality of first code segments specifically comprises: selecting the front, middle and rear sections of the in-doubt code, and segmenting them according to the code structure to obtain the plurality of first code segments; wherein the character length of each first code segment is greater than 30.
Optionally, the programming information on the network specifically includes: answers on programmer communication websites, programming teaching experience, programming textbooks, and programmers' blogs.
Optionally, extracting the programming information on the network to obtain the plurality of second code segments specifically comprises: extracting, according to the language of the in-doubt code, the parts of the programming information that carry the corresponding language label, and obtaining second code segments with a character length greater than 30.
The invention also provides a programming information recommendation device based on similar code identification, which comprises:
an in-doubt code extraction module, configured to extract the in-doubt code to obtain a plurality of first code segments and a plurality of first comments; wherein each first code segment corresponds to a respective first comment;
a programming information extraction module, configured to extract programming information on a network to obtain a plurality of second code segments and a plurality of first scene keywords; wherein each second code segment corresponds to a respective first scene keyword;
a clone pair detection module, configured to input the first code segments and the second code segments into a clone detection tool and perform clone detection to obtain a plurality of clone pairs; wherein each clone pair consists of one first code segment and one second code segment;
a sentence pair similarity calculation module, configured to take the first comment and the first scene keyword as a sentence pair and calculate the sentence pair similarity using the trained Word2Vec model; wherein the sentence pair corresponds to the clone pair;
and a programming information recommendation module, configured to sort the corresponding clone pairs in descending order of sentence pair similarity and recommend the higher-ranked clone pairs as referable programming information.
Optionally, the device further comprises:
a project code extraction module, configured to extract the project code in which the in-doubt code is located to obtain a plurality of second comments and a plurality of second scene keywords, input the second comments and the second scene keywords into the corpus of the Word2Vec model, and train the Word2Vec model.
Optionally, the in-doubt code extraction module is specifically configured to: select the front, middle and rear sections of the in-doubt code, and segment them according to the code structure to obtain the plurality of first code segments; wherein the character length of each first code segment is greater than 30.
Optionally, the programming information extraction module is specifically configured to: extract, according to the language of the in-doubt code, the parts of the programming information that carry the corresponding language label, and obtain second code segments with a character length greater than 30.
The present invention also provides an electronic device, comprising:
a memory for storing a computer program;
and a processor for executing the computer program to implement the programming information recommendation method based on similar code identification described above.
The invention provides a programming information recommendation method and device based on similar code identification. The method comprises: extracting the in-doubt code to obtain a plurality of first code segments and a plurality of first comments, wherein each first code segment corresponds to a respective first comment; extracting programming information on a network to obtain a plurality of second code segments and a plurality of first scene keywords, wherein each second code segment corresponds to a respective first scene keyword; inputting the first code segments and the second code segments into a clone detection tool and performing clone detection to obtain a plurality of clone pairs, wherein each clone pair consists of one first code segment and one second code segment; taking the first comment and the first scene keyword as a sentence pair, and calculating the sentence pair similarity using the trained Word2Vec model, wherein the sentence pair corresponds to the clone pair; and sorting the corresponding clone pairs in descending order of sentence pair similarity, and recommending the higher-ranked clone pairs as referable programming information.
When recommending programming information, the method and device provided by the invention consider both the current scene keywords and the code similarity, which is more comprehensive than recommending on scene keywords alone or on code similarity alone. They can provide reference and help for a programmer in understanding the causes of a defect and designing a repair scheme, thereby improving defect repair efficiency and reducing defect repair cost.
Drawings
FIG. 1 is a schematic flow diagram of the programming information recommendation method based on similar code identification according to the present invention;
FIG. 2 is a schematic structural diagram of the programming information recommendation device based on similar code identification according to the present invention.
Detailed Description
The embodiment of the invention provides a programming information recommendation method and device based on similar code identification, and aims to solve the technical problems that the code defect description is not accurate enough and the code is difficult to repair in the programming process.
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
When textual information is unreliable, the main problem to be solved is how to use both the textual information and the code to search for programming information. A programmer therefore needs a method of searching for programming information based on code comparison to help solve the problems encountered during programming.
The programmer needs a method that searches for programming information using both the current scene keywords and the code similarity. The method considers both factors: it first performs a preliminary screening by code similarity, and then ranks the recommendations by scene keywords.
Referring to FIG. 1, the following is an embodiment of a programming information recommendation method based on similar code identification according to the present invention, comprising:
S101: extracting the in-doubt code to obtain a plurality of first code segments and a plurality of first comments; wherein each first code segment corresponds to a respective first comment;
S102: extracting programming information on a network to obtain a plurality of second code segments and a plurality of first scene keywords; wherein each second code segment corresponds to a respective first scene keyword;
S103: inputting the plurality of first code segments and the plurality of second code segments into a clone detection tool and performing clone detection to obtain a plurality of clone pairs; wherein each clone pair consists of one first code segment and one second code segment;
S104: taking the first comment and the first scene keyword as a sentence pair, and calculating the sentence pair similarity using the trained Word2Vec model; wherein the sentence pair corresponds to the clone pair;
S105: sorting the corresponding clone pairs in descending order of sentence pair similarity, and recommending the higher-ranked clone pairs as referable programming information.
In step S101, for in-doubt code suspected of containing a defect, the embodiment selects several code segments from the front, middle and rear of the in-doubt code, for example three code segments {fragment1, fragment2, fragment3}, as input data.
Specifically, for a section of in-doubt codes, front, middle and rear sections of codes are selected, and the three sections of codes are segmented according to a code structure to obtain a plurality of first code segments; the character length of each first code segment should be greater than 30, that is, the number of characters contained in each first code segment should be greater than 30.
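As an illustration only, the segmentation described for step S101 might be sketched in Python as follows. The function name `extract_first_segments`, the equal three-way split by line count, and the use of blank lines as a stand-in for "code structure" boundaries are all assumptions; the embodiment does not specify how the structure is determined.

```python
def extract_first_segments(in_doubt_code: str, min_chars: int = 30) -> list[str]:
    """Split in-doubt code into front, middle and rear parts, then into segments."""
    lines = in_doubt_code.splitlines()
    third = max(1, len(lines) // 3)
    # Front, middle and rear sections of the in-doubt code.
    parts = [lines[:third], lines[third:2 * third], lines[2 * third:]]
    segments = []
    for part in parts:
        block = []
        # Assumption: a blank line approximates a code-structure boundary.
        for line in part + [""]:
            if line.strip():
                block.append(line)
            elif block:
                segments.append("\n".join(block))
                block = []
    # Keep only segments whose character length is greater than min_chars.
    return [s for s in segments if len(s) > min_chars]
```

A real implementation would more likely segment on syntactic units (methods, loops, blocks) obtained from a parser for the language in question.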
In step S102, the programming information on the network is extracted to obtain a plurality of second code segments {A1, A2, A3, …, An} and the corresponding first scene keywords {title1, title2, title3, …, titlen}.
The programming information code library of the embodiment integrates code from the major question-and-answer websites and blogs, and is more comprehensive than the traditional approach of relying only on isolated source repositories on GitHub. Specifically, the programming information on the network may include: answers on programmer communication websites, programming teaching experience, programming textbooks, and programmers' blogs.
In this embodiment, the code portions of the programming information on the network are extracted to obtain the plurality of second code segments as follows: for each unit of programming information, the code portions carrying the language label that corresponds to the language of the in-doubt code in S101 are extracted, and only second code segments with a character length greater than 30 are retained. To obtain the second code segments, the <code></code> tags are identified and their contents are taken as the recognizable second code segments.
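The extraction of second code segments could, for example, be sketched with Python's standard `html.parser` module as below. The class `CodeBlockParser`, the function `extract_second_segments`, and the convention that the language label appears in the tag's `class` attribute (as `java` or `language-java`) are assumptions made for illustration; the embodiment only states that `<code>` tags and a language label are used.

```python
from html.parser import HTMLParser


class CodeBlockParser(HTMLParser):
    """Collect text inside <code> tags whose class attribute names the language."""

    def __init__(self, language: str):
        super().__init__()
        self.language = language.lower()
        self.blocks: list[str] = []
        self._depth = 0  # nesting depth inside a matching <code> tag
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "code":
            return
        if self._depth:
            self._depth += 1
            return
        # Assumption: the language label is a token of the class attribute.
        tokens = (dict(attrs).get("class") or "").lower().split()
        if self.language in tokens or f"language-{self.language}" in tokens:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "code" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_second_segments(html: str, language: str, min_chars: int = 30) -> list[str]:
    """Return code blocks in the given language longer than min_chars characters."""
    parser = CodeBlockParser(language)
    parser.feed(html)
    parser.close()
    return [block for block in parser.blocks if len(block) > min_chars]
```

In practice, the title of the page or post from which each block is taken would be stored alongside the block so that it can serve as the first scene keyword.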
In step S103, the plurality of first code segments and the plurality of second code segments are input into the clone detection tool. The first code segments corresponding to the in-doubt code form a first clone set, and the second code segments corresponding to the programming information form a second clone set. Clone pairs are detected and collected between the first clone set and the second clone set to obtain a plurality of clone pairs {(fragment1, Ai), (fragment1, Aj), …, (fragment3, Ak)}, where Ai, Aj, Ak ∈ {A1, A2, A3, …, An}.
Specifically, the parameters of the clone detection tool may be configured according to the lengths of the first code segments and second code segments input to it, and may be reduced appropriately when the in-doubt code is short.
It should be noted that, while collecting clone pairs, the first scene keywords corresponding to the second code segments must also be collected, in preparation for the subsequent sentence pair similarity calculation.
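The embodiment does not name a particular clone detection tool. Purely as a stand-in, the following Python sketch pairs segments whose character-level similarity, computed with the standard library's `difflib.SequenceMatcher`, reaches a threshold. The function name `detect_clone_pairs` and the threshold value of 0.8 are assumptions, and a dedicated clone detector would normally compare tokens or syntax trees rather than raw characters.

```python
from difflib import SequenceMatcher


def detect_clone_pairs(first_segments, second_segments, titles, threshold=0.8):
    """Return (first_index, second_index, title) for each detected clone pair.

    titles[j] is the first scene keyword associated with second_segments[j];
    it is collected here so the later similarity step can use it.
    """
    pairs = []
    for i, fragment in enumerate(first_segments):
        for j, candidate in enumerate(second_segments):
            # Stand-in similarity measure, not a real clone detection tool.
            ratio = SequenceMatcher(None, fragment, candidate).ratio()
            if ratio >= threshold:
                pairs.append((i, j, titles[j]))
    return pairs
```

Lowering `threshold` for short in-doubt code would correspond loosely to the parameter reduction mentioned above.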
Before step S104, the scene keywords of the programming information are extracted to obtain a plurality of second scene keywords, which are input into the Word2Vec corpus; the commit comments of the project in which the in-doubt code is located are extracted to obtain a plurality of second comments, which are also input into the Word2Vec corpus. The second scene keywords and second comments thus obtained serve as the corpus on which the Word2Vec model is trained.
It should be noted that the Word2Vec model used in this embodiment is an existing, open-source model for generating word vectors. Once the Word2Vec model has been trained, each word can be mapped to a vector. Using the Word2Vec model involves two main steps: first, training the model on a corpus; second, using the trained model to generate word vectors.
Specifically, in this embodiment, the first step is training: the plurality of second comments and the plurality of second scene keywords are used as the corpus to train the Word2Vec model. The second step is use: the sentence pair is input into the trained Word2Vec model, and the sentence pair similarity is calculated with it.
It should be noted that the second scene keywords and second comments input into the Word2Vec corpus are not limited to those matched by the clone pairs in S103; they should include all second scene keywords collected from the programming information and all historical commit comments of the project, i.e. the second comments.
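A minimal sketch of assembling such a corpus in Python is given below. The function name `build_corpus` and the simple lowercase, alphanumeric tokenization are assumptions; the embodiment does not describe how the text is tokenized. The result is a list of token lists, which is the input shape commonly accepted by Word2Vec implementations.

```python
import re


def build_corpus(second_scene_keywords, second_comments):
    """Tokenize all scene keywords and commit comments into one training corpus."""
    token_pattern = re.compile(r"[A-Za-z0-9_]+")
    corpus = []
    # Every collected keyword and comment is included, not only those that
    # belong to detected clone pairs.
    for text in list(second_scene_keywords) + list(second_comments):
        tokens = token_pattern.findall(text.lower())
        if tokens:
            corpus.append(tokens)
    return corpus
```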
In step S104, the first comment and the first scene keyword are taken as a sentence pair, the sentence pair is input into the trained Word2Vec model, and the sentence pair similarity is calculated; the sentence pair corresponds to the clone pair.
Specifically, the first comment corresponds to the first code segment, the first scene keyword corresponds to the second code segment, and each clone pair consists of one first code segment and one second code segment; therefore, when the first comment and the first scene keyword are taken as a sentence pair, the sentence pair corresponds to the clone pair.
The comment used in the sentence pair similarity comparison should be the first comment corresponding to the in-doubt code, and the scene keyword input to Word2Vec for the comparison should be the first scene keyword corresponding to the programming information in the clone pair.
It should be noted that the sentence pair similarity in this embodiment is obtained by calculating the cosine similarity between the two sets of vectors to which the two sets of words (the two sentences) are mapped.
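One common way to realize this calculation, shown here only as a sketch, is to average the word vectors of each sentence and take the cosine similarity of the two averages. The averaging step is an assumption, since the embodiment does not state how a set of word vectors is reduced to a single comparison; the function names and the `vectors` dictionary, which stands in for the lookup table of a trained Word2Vec model, are likewise hypothetical.

```python
import math


def sentence_vector(tokens, vectors):
    """Average the vectors of the known tokens (assumed aggregation)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_pair_similarity(comment, keyword, vectors):
    """Similarity between a first comment and a first scene keyword."""
    u = sentence_vector(comment.lower().split(), vectors)
    v = sentence_vector(keyword.lower().split(), vectors)
    if u is None or v is None:
        return 0.0  # no known words in one of the sentences
    return cosine_similarity(u, v)
```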
In step S105, the clone pairs are sorted in a descending order according to the similarity of the corresponding sentence pairs, that is, the clone pairs are sorted according to the similarity of the sentence pairs from high to low, and the clone pairs with higher rank are recommended and output as the reference programming information.
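The ranking in step S105 amounts to a descending sort followed by a cut-off. A short Python sketch follows, in which the function name `recommend` and the `top_k` parameter are assumptions (the embodiment says only that higher-ranked clone pairs are recommended, without fixing a number):

```python
def recommend(clone_pairs, similarities, top_k=5):
    """Sort clone pairs by sentence pair similarity, highest first, and keep top_k."""
    ranked = sorted(
        zip(clone_pairs, similarities),
        key=lambda item: item[1],
        reverse=True,
    )
    return [pair for pair, _ in ranked[:top_k]]
```

For instance, with similarities 0.41, 0.87 and 0.63 for three clone pairs and `top_k=2`, the second and third pairs are returned, in that order.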
The key of the embodiment is that the programming information is recommended by combining two parts of information, namely the scene keyword and the code similarity, and the method is more comprehensive compared with the traditional method of recommending by only depending on the scene keyword or only depending on the code similarity.
In addition to code similarity, the present embodiment also considers the influence of the semantic information of the scene keywords on the recommendation result. Because the same code has a different emphasis in different application scenes, the code recommended by the embodiment comes from application scenes similar to that of the original problem, and the recommended programming information is therefore more targeted.
The programming information recommendation method based on similar code identification provided by the embodiment of the invention comprehensively considers the similarity information of the current scene keywords and the codes when recommending the programming information, and is more comprehensive compared with a method of recommending only depending on the scene keywords or only depending on the code similarity. The method and the system can provide reference and help for a programmer to understand the reasons of defect generation and design a defect repair scheme, thereby improving defect repair efficiency and reducing defect repair cost.
The following is an embodiment of a programming information recommendation device based on similar code identification, comprising:
an in-doubt code extraction module, configured to extract the in-doubt code to obtain a plurality of first code segments and a plurality of first comments; wherein each first code segment corresponds to a respective first comment;
a programming information extraction module, configured to extract programming information on a network to obtain a plurality of second code segments and a plurality of first scene keywords; wherein each second code segment corresponds to a respective first scene keyword;
a clone pair detection module, configured to input the first code segments and the second code segments into a clone detection tool and perform clone detection to obtain a plurality of clone pairs; wherein each clone pair consists of one first code segment and one second code segment;
a sentence pair similarity calculation module, configured to take the first comment and the first scene keyword as a sentence pair, input the sentence pair into the trained Word2Vec model, and calculate the sentence pair similarity; wherein the sentence pair corresponds to the clone pair;
and a programming information recommendation module, configured to sort the corresponding clone pairs in descending order of sentence pair similarity and recommend the higher-ranked clone pairs as referable programming information.
The device further comprises:
a project code extraction module, configured to extract the project code in which the in-doubt code is located to obtain a plurality of second comments and a plurality of second scene keywords, input the second comments and the second scene keywords into the corpus, and train the Word2Vec model.
In addition, the present invention also provides an electronic device including:
a memory for storing a computer program;
and a processor for executing the computer program to implement the programming information recommendation method based on similar code identification described above.
The embodiment first preprocesses the in-doubt code and the programming information, extracting the code segments and comment information they contain; groups the processed code segments and inputs them into a code clone detection tool; collates the results output by the clone detection tool; after enriching the Word2Vec corpus, ranks the clone detection results by semantic similarity; and recommends the ranked results to the programmer. The method can provide reference and help for a programmer in understanding the causes of a defect and designing a repair scheme, thereby improving defect repair efficiency and reducing defect repair cost.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A programming information recommendation method based on similar code identification is characterized by comprising the following steps:
extracting the in-doubt code to obtain a plurality of first code segments and a plurality of first comments; wherein each of the first code segments corresponds to each of the first comments;
extracting programming information on a network to obtain a plurality of second code segments and a plurality of first scene keywords; wherein each of the second code segments corresponds to each of the first scene keywords;
inputting the plurality of first code segments and the plurality of second code segments into a clone detection tool and carrying out clone detection to obtain a plurality of clone pairs; wherein each clone pair consists of a first code segment and a second code segment;
taking the first comment and the first scene keyword as a sentence pair, and calculating the sentence pair similarity according to the trained Word2Vec model; wherein the sentence pair corresponds to the clone pair;
and sorting the corresponding clone pairs in descending order of sentence pair similarity, and recommending the higher-ranked clone pairs as referable programming information.
2. The programming information recommendation method based on similar code identification as claimed in claim 1, wherein, before taking the first comment and the first scene keyword as a sentence pair, the method further comprises: extracting the project code in which the in-doubt code is located to obtain a plurality of second comments and a plurality of second scene keywords, inputting the second comments and the second scene keywords into a corpus of the Word2Vec model, and training the Word2Vec model.
3. The programming information recommendation method based on similar code identification as claimed in claim 2, wherein the extracting the in-doubt code to obtain the plurality of first code segments specifically comprises: selecting the front, middle and rear sections of the in-doubt code, and segmenting them according to code structure to obtain the plurality of first code segments; wherein the character length of each first code segment is greater than 30.
4. The programming information recommendation method based on similar code identification as claimed in claim 3, wherein the programming information on the network specifically comprises: answers on programmer discussion websites, programming teaching experience, programming textbooks, and programmers' blogs.
5. The programming information recommendation method based on similar code identification as claimed in claim 4, wherein the extracting the programming information on the network to obtain the plurality of second code segments specifically comprises: extracting, according to the language of the in-doubt code, the parts of the programming information that carry the corresponding language tag, and obtaining second code segments with a character length greater than 30.
6. A programming information recommendation device based on similar code identification, comprising:
an in-doubt code extraction module for extracting the in-doubt code to obtain a plurality of first code segments and a plurality of first comments; wherein each of the first code segments corresponds to each of the first comments;
a programming information extraction module for extracting the programming information on the network to obtain a plurality of second code segments and a plurality of first scene keywords; wherein each of the second code segments corresponds to each of the first scene keywords;
a clone pair detection module for inputting the first code segments and the second code segments into a clone detection tool and performing clone detection to obtain a plurality of clone pairs; wherein each clone pair consists of a first code segment and a second code segment;
a sentence pair similarity calculation module for taking the first comment and the first scene keyword as a sentence pair and calculating the sentence pair similarity according to the trained Word2Vec model; wherein the sentence pair corresponds to the clone pair;
and a programming information recommendation module for sorting the corresponding clone pairs in descending order of sentence pair similarity and recommending the higher-ranked clone pairs as referable programming information.
7. The programming information recommendation device based on similar code identification as claimed in claim 6, further comprising:
a project code extraction module for extracting the project code in which the in-doubt code is located to obtain a plurality of second comments and a plurality of second scene keywords, inputting the plurality of second comments and the plurality of second scene keywords into a corpus of the Word2Vec model, and training the Word2Vec model.
8. The programming information recommendation device based on similar code identification as claimed in claim 7, wherein the in-doubt code extraction module is specifically configured to: select the front, middle and rear sections of the in-doubt code, and segment them according to code structure to obtain the plurality of first code segments; wherein the character length of each first code segment is greater than 30.
9. The programming information recommendation device based on similar code identification as claimed in claim 7, wherein the programming information extraction module is specifically configured to: extract, according to the language of the in-doubt code, the parts of the programming information that carry the corresponding language tag, and obtain second code segments with a character length greater than 30.
10. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the programming information recommendation method based on similar code identification according to any one of claims 1 to 5.
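
As an informal illustration of the ranking step recited in claims 1 and 6 and the length filter in claims 3 and 5, the following Python sketch scores each clone pair by the similarity of its sentence pair and sorts the pairs in descending order. It is a sketch under stated assumptions, not the patented implementation: the claims do not specify how the sentence pair similarity is derived from the Word2Vec model, so averaging word vectors and taking the cosine is assumed here; `TOY_VECTORS` is a hypothetical hand-made table standing in for a trained model; and all function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Claims 3 and 5 keep only segments whose character length is greater than 30.
MIN_SEGMENT_CHARS = 30

# ASSUMPTION: hypothetical toy vectors standing in for a trained Word2Vec model.
TOY_VECTORS: Dict[str, List[float]] = {
    "read": [0.9, 0.1, 0.0],
    "file": [0.8, 0.2, 0.1],
    "lines": [0.7, 0.3, 0.0],
    "sort": [0.0, 0.9, 0.2],
    "list": [0.1, 0.8, 0.3],
}


def keep_long_segments(segments: Sequence[str]) -> List[str]:
    """Drop code segments that are not longer than 30 characters."""
    return [s for s in segments if len(s) > MIN_SEGMENT_CHARS]


def sentence_vector(sentence: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the known words (ASSUMED pooling strategy)."""
    dim = len(next(iter(vectors.values())))
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return [0.0] * dim  # no known word: fall back to the zero vector
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def rank_clone_pairs(
    clone_pairs: Sequence[Tuple[str, str, str]],
    vectors: Dict[str, List[float]],
) -> List[Tuple[float, str]]:
    """Rank clone pairs by sentence pair similarity, highest first.

    Each input item is (first_comment, scene_keywords, second_code_segment).
    """
    scored = [
        (cosine(sentence_vector(c, vectors), sentence_vector(k, vectors)), seg)
        for c, k, seg in clone_pairs
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Calling `rank_clone_pairs` with a list of `(comment, scene keywords, code segment)` tuples returns `(similarity, code segment)` tuples, from which the first few entries would be taken as the recommendation. In practice the toy table would be replaced by vectors from a model trained on the project comments and scene keywords described in claim 2.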

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010996554.7A CN112115362B (en) 2020-09-21 2020-09-21 Programming information recommendation method and device based on similar code recognition

Publications (2)

Publication Number Publication Date
CN112115362A (en) 2020-12-22
CN112115362B (en) 2022-01-11

Family

ID=73801182

Country Status (1)

Country Link
CN (1) CN112115362B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116720196B (en) * 2023-08-03 2023-11-07 北京比瓴科技有限公司 Code homology detection method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107608732A (en) * 2017-09-13 2018-01-19 扬州大学 A bug search and localization method based on a bug knowledge graph
CN109522011A (en) * 2018-10-17 2019-03-26 南京航空航天大学 A code line recommendation method based on deep perception of the programming-site context
CN109670022A (en) * 2018-12-13 2019-04-23 南京航空航天大学 A Java API usage pattern recommendation method based on semantic similarity
CN110716749A (en) * 2019-09-03 2020-01-21 东南大学 Code searching method based on function similarity matching

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10705809B2 (en) * 2017-09-08 2020-07-07 Devfactory Innovations Fz-Llc Pruning engine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Automatically detecting the scopes of source code comments; Huanchao Chen et al.; The Journal of Systems and Software; 2019-03-12; Vol. 153; pp. 45–63 *
Code search method based on enhanced descriptions; Li Xuan et al.; Journal of Software (软件学报); 2017-02-20; Vol. 28, No. 6; pp. 1405-1417 *

Also Published As

Publication number Publication date
CN112115362A (en) 2020-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant