CN115774667A - Code detection method, device, equipment and storage medium

Publication number: CN115774667A
Application number: CN202211447960.3A
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Pending
Prior art keywords: name, function, code, sample, model
Inventors: 宋欢, 刘杰, 林蓝东, 杨萍, 吴沁芸
Original and current assignee: Douyin Vision Co Ltd
Application filed by Douyin Vision Co Ltd; priority to CN202211447960.3A

Classification: Stored Programmes (AREA)

Abstract

The present disclosure provides a code detection method, apparatus, device and storage medium. A source code file is preprocessed to obtain the function code in the source code file, the preset function name of the function code, and the general code segment into which the function code is converted; the general code segment is then input into a function name generation model to obtain a model generation name, so that name consistency judgment can be performed on the preset function name and the model generation name, and a name optimization prompt is issued when the judgment result indicates that the names are inconsistent. In this way, preprocessing the source code file increases the richness of the general code segment; further, in the process of generating a name for the source code file through the model, the analysis effect of the model on the function code is improved and the precision of the name generated by the model is raised, so that the name of the function code can be automatically corrected and recommended, which improves the name accuracy of the function code and the matching degree between the function code and the function name.

Description

Code detection method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a code detection method, apparatus, device, and storage medium.
Background
With the progress of science and technology, the demand for software development has also grown, and developers often need to write function code themselves. In practice, software is mostly developed by groups of developers working collaboratively. Developers in the same group need to understand the function code written by other group members and conduct Code Review (CR) on it, so that problems in the function code can be found in time and the quality and maintainability of the code are improved. A function name reflects the function implemented by the function code, and the function name corresponding to a piece of function code is critical in processes such as code review, program understanding and function calling.
To obtain a function name adapted to a piece of function code, existing approaches mostly recommend a name based on existing function names or generate a name from the function code. However, the function names obtained in this way have low precision and match the function code poorly. If an ambiguous function name is chosen for the function code, other developers' understanding of the code's function may be affected when the function is used later, and software calling errors may also occur.
Disclosure of Invention
The embodiment of the disclosure at least provides a code detection method, a device, equipment and a storage medium.
The embodiment of the disclosure provides a code detection method, which includes:
preprocessing a source code file to obtain at least one section of function code, a preset function name of the function code in the source code file and a universal code section converted from the function code;
inputting the general code segment into a pre-trained function name generation model to obtain a model generation name of the function code, wherein the function name generation model is obtained by training according to a plurality of sections of sample function codes and corresponding sample function names;
and performing name consistency judgment on the preset function name and the model generation name, and performing name optimization prompting on the preset function name under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name.
In an optional embodiment, the preprocessing a source code file to obtain at least one segment of function code, a preset function name of the function code in the source code file, and a universal code segment into which the function code is converted includes:
extracting at least one section of function code from a source code file and a preset function name of the function code in the source code file;
performing data cleaning processing on each segment of the function code to obtain a general code segment corresponding to the function code;
the data cleaning processing is carried out on each piece of the function code, and comprises one or more of the following items:
deleting the annotation information in the function code; deleting a preset symbol in the function code; splitting the variable name in the function code into single character strings; and replacing the constant in the function code with a single character string.
In an optional embodiment, the function name generation model is obtained by training according to the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of sections of sample function codes and a sample function name corresponding to each section of the sample function codes;
for each section of the sample function code, performing data cleaning processing on the sample function code to obtain a sample code segment corresponding to the sample function code;
training a pre-constructed neural network based on the sample code segment and the corresponding sample function name, and taking the obtained model meeting a training cutoff condition as the function name generation model, wherein the pre-constructed neural network comprises a unified pre-trained encoder-decoder network with identifier awareness.
In an optional embodiment, before the training the pre-constructed neural network based on the sample code segment and the corresponding sample function name, the method further includes:
splitting the sample function name into single character strings to obtain the converted sample function name;
the training of the pre-constructed neural network based on the sample code segment and the corresponding sample function name, and the taking of the obtained model meeting the training cutoff condition as the function name generation model, include the following steps:
and training a pre-constructed neural network based on the sample code segment and the corresponding converted sample function name, and taking the obtained model meeting the training cutoff condition as a function name generation model.
In an optional embodiment, the training a pre-constructed neural network based on the sample code segment and the corresponding sample function name, and taking an obtained model satisfying a training cutoff condition as a function name generation model includes:
inputting the sample code segment into a pre-constructed neural network to obtain a training generation name corresponding to the sample code segment;
adjusting network parameters of the neural network based on the obtained training generation name and the sample function name corresponding to the sample code segment to obtain an adjusted neural network;
and returning to the step of inputting the sample code segment into the neural network until the neural network meets the training cutoff condition, and taking the trained model as the function name generation model.
In an optional implementation manner, the performing name consistency judgment on the preset function name and the model generation name, and performing name optimization prompting on the preset function name under the condition that a judgment result indicates that the preset function name is inconsistent with the model generation name includes:
acquiring a preset similarity threshold;
determining the similarity between the preset function name and the model generation name;
and under the condition that the similarity is smaller than the similarity threshold value, determining that the judgment result is that the preset function name is inconsistent with the model generation name, and performing name optimization prompting on the preset function name.
In an optional embodiment, when the determination result indicates that the preset function name is inconsistent with the model generation name, performing name optimization prompting on the preset function name includes:
and sending name optimization prompt information including a recommended name to a user submitting the source code file under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name, wherein the recommended name is the model generation name.
The embodiment of the present disclosure further provides a code detection apparatus, where the apparatus includes:
the data processing module is used for preprocessing a source code file to obtain at least one section of function code, a preset function name of the function code in the source code file and a general code section converted from the function code;
the name generation module is used for inputting the universal code segment into a pre-trained function name generation model to obtain a model generation name of the function code, and the function name generation model is obtained by training according to a plurality of sections of sample function codes and corresponding sample function names;
and the name judging module is used for performing name consistency judgment on the preset function name and the model generation name, and performing name optimization prompting on the preset function name under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name.
In an optional implementation manner, the data processing module is specifically configured to:
extracting at least one section of function code and a preset function name of the function code in a source code file from the source code file;
performing data cleaning processing on each segment of the function code to obtain a general code segment corresponding to the function code;
when the data processing module is used for performing data cleaning processing on each piece of the function code, the data processing module comprises one or more of the following contents:
deleting the annotation information in the function code; deleting a preset symbol in the function code; splitting variable names in the function codes into single character strings; replacing constants in the function code with a single string.
In an optional embodiment, the apparatus further comprises a model training module, the model training module is configured to:
acquiring a training sample set, wherein the training sample set comprises a plurality of sections of sample function codes and a sample function name corresponding to each section of the sample function codes;
for each section of the sample function code, performing data cleaning processing on the sample function code to obtain a sample code segment corresponding to the sample function code;
training a pre-constructed neural network based on the sample code segment and the corresponding sample function name, and taking the obtained model meeting a training cutoff condition as the function name generation model, wherein the pre-constructed neural network comprises a unified pre-trained encoder-decoder network with identifier awareness.
In an optional embodiment, the model training module is further configured to:
splitting the sample function name into single character strings to obtain the converted sample function name;
When configured to train a pre-constructed neural network based on the sample code segment and the corresponding sample function name and to take an obtained model satisfying a training cutoff condition as the function name generation model, the model training module is specifically configured to:
and training a pre-constructed neural network based on the sample code segment and the corresponding converted sample function name, and taking the obtained model meeting the training cutoff condition as a function name generation model.
In an optional embodiment, when the model training module is configured to train a pre-constructed neural network based on the sample code segment and the corresponding sample function name and to take an obtained model meeting a training cutoff condition as the function name generation model, the model training module is specifically configured to:
inputting the sample code segment into a pre-constructed neural network to obtain a training generation name corresponding to the sample code segment;
adjusting network parameters of the neural network based on the obtained training generation name and the sample function name corresponding to the sample code segment to obtain an adjusted neural network;
and returning to the step of inputting the sample code segment into the neural network until the neural network meets the training cutoff condition, and taking the trained model as the function name generation model.
In an optional implementation manner, the name determining module is specifically configured to:
acquiring a preset similarity threshold;
determining the similarity between the preset function name and the model generation name;
and under the condition that the similarity is smaller than the similarity threshold value, determining that the judgment result is that the preset function name is inconsistent with the model generation name, and performing name optimization prompting on the preset function name.
In an optional implementation manner, when the name determining module is configured to perform name optimization prompting on the preset function name under the condition that the determination result indicates that the preset function name is inconsistent with the model generation name, the name determining module is specifically configured to:
and sending name optimization prompt information including a recommended name to a user submitting the source code file under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name, wherein the recommended name is the model generation name.
An embodiment of the present disclosure further provides an electronic device, including a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the code detection method.
Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program performs the steps of the code detection method.
The code detection method, apparatus, device and storage medium provided by the embodiments of the present disclosure preprocess a source code file to obtain at least one section of function code, the preset function name of the function code in the source code file, and the universal code segment converted from the function code; the universal code segment is then input into a pre-trained function name generation model to obtain a model generation name of the function code, wherein the function name generation model is obtained by training according to a plurality of sections of sample function codes and corresponding sample function names; name consistency judgment is then performed on the preset function name and the model generation name, and a name optimization prompt is issued for the preset function name under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name. Therefore, by preprocessing the source code file, the richness of the general code segment can be increased and the semantic richness improved; further, in the process of generating a name for the function code in the source code file through the function name generation model, the analysis effect of the model on the function code and the precision of the generated name are improved, so that the name of the function code can be automatically corrected and recommended.
Furthermore, by improving the name accuracy of the function codes and the matching degree between the function codes and the corresponding function names, the readability and the understandability of the function names can be improved, and the comprehension degree of the function codes based on the function names can be further improved, so that the software development efficiency can be improved, and the maintenance cost of the software can be reduced.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments will be briefly described below; the drawings herein are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is to be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; for those of ordinary skill in the art, other related drawings can be derived from these drawings without any creative effort.
FIG. 1 is a flow chart illustrating a method for code detection provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a specific file preprocessing method in the code detection method provided in the embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a specific method of model training in the code detection method provided by the embodiment of the present disclosure;
FIG. 4 is a process diagram illustrating a method for code detection provided by an embodiment of the present disclosure;
FIG. 5 shows a first schematic diagram of a code detection apparatus provided by an embodiment of the present disclosure;
FIG. 6 shows a second schematic diagram of a code detection apparatus provided by an embodiment of the present disclosure;
FIG. 7 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings of the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the disclosure, as provided in the accompanying drawings, is not intended to limit the scope of the disclosure as claimed, but merely represents selected embodiments of the disclosure. All other embodiments obtained by a person skilled in the art based on the embodiments of the disclosure without making any creative effort shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the set consisting of A, B and C.
Research shows that, to obtain a function name adapted to function code, most existing approaches either recommend a name based on existing function names or generate a name based on the function code. The recommendation approach determines, from the existing function codes and function names stored in a database, existing functions that are similar or identical to the current function code, and then derives the function name of the current function code from the function names corresponding to those existing function codes.
Based on the research, the disclosure provides a code detection method, aiming at a source code file, at least one section of function code and a preset function name of the function code in the source code file can be obtained by preprocessing the source code file, and a universal code segment converted from the function code can be obtained, so that the richness of the universal code segment can be increased, the semantic richness can be improved, and then a model generation name can be generated for the function code through a function name generation model.
For the convenience of understanding the present embodiment, first, a code detection method disclosed in the embodiments of the present disclosure is described in detail, and an execution subject of the code detection method provided in the embodiments of the present disclosure may be a code detection apparatus, or an electronic device with certain computing capability. In this embodiment, the electronic device may be a server. The server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and can also be a cloud server for providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud storage, big data, an artificial intelligence platform and the like.
In other embodiments, the electronic device may also be a terminal device or other processing device, and the terminal device may be a mobile device, a terminal, an in-vehicle device, a computing device, or the like. The other processing device may be a device that includes a processor and a memory, and is not limited thereto. In some possible implementations, the code detection method may be implemented by a processor invoking computer readable instructions stored in a memory.
The following describes a code detection method provided by the embodiments of the present disclosure.
Referring to fig. 1, fig. 1 is a flowchart illustrating a code detection method according to an embodiment of the disclosure. As shown in fig. 1, a code detection method provided by an embodiment of the present disclosure includes:
s101: preprocessing a source code file to obtain at least one section of function code, a preset function name of the function code in the source code file, and a general code section converted from the function code.
Here, a source code file may be obtained from a code database. In practical applications, the source code file includes functions, classes, variables, and the like, and a function in turn includes function code, a function name, parameters, and the like. Therefore, the source code file may be preprocessed to obtain at least one piece of function code, the preset function name of the function code in the source code file, and the general code segment into which the function code is converted.
Optionally, at least one section of function code may be parsed from the source code file, the function name currently recorded in the function code in the source code file is the corresponding preset function name, and subsequently, the preset function name of the function code may be subjected to name judgment.
Optionally, the preset function name may be generated by a developer of the function code through self-naming, or may be obtained through other manners, which is not limited herein.
It should be noted that the function code in a source code file can be written in various programming languages; a programming language is a formal language used to define computer programs and to issue instructions to a computer so that the computer can implement various functions. Current programming languages include many types, such as C, C++, Objective-C, Golang, VB, Java, and so on. The present scheme can convert the various programming languages into a universal language by using an interpreter, so the scheme is not limited by the type of programming language, and the language type of the source code file is not limited at all.
Specifically, referring to fig. 2, fig. 2 is a flowchart of a specific file preprocessing method in the code detection method according to the embodiment of the present disclosure. As shown in fig. 2, a file preprocessing method provided by the embodiment of the present disclosure includes:
s1011: at least one section of function code and a preset function name of the function code in a source code file are extracted from the source code file.
As can be seen from the above, functions are written in the source code file, each function corresponds to a function code and a function name, and then at least one section of function code and a preset function name of the function code in the source code file can be extracted from the source code file.
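As an illustration of this extraction step, the following minimal sketch parses a source code file with Python's standard-library ast module and collects each function together with its preset function name. The choice of Python and the helper name extract_functions are assumptions made only for the example, since the scheme itself is not tied to any programming language.

import ast

def extract_functions(source_text: str) -> list:
    """Return (preset_function_name, function_code) pairs found in a source code file."""
    functions = []
    for node in ast.walk(ast.parse(source_text)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.get_source_segment recovers the exact text of the function definition.
            functions.append((node.name, ast.get_source_segment(source_text, node)))
    return functions

sample_file = "def add_numbers(a, b):\n    return a + b\n"
print(extract_functions(sample_file))  # [('add_numbers', 'def add_numbers(a, b):\n    return a + b')]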
S1012: and carrying out data cleaning processing on each section of the function code to obtain a universal code section corresponding to the function code.
In this step, after at least one section of the function code is extracted, data cleaning processing may be performed on each section of the function code, information unrelated to the semantics of the function code may be removed through the data cleaning processing, and fine-grained conversion may be performed on the function code, so as to obtain a general code segment into which the function code is converted.
Specifically, in some possible embodiments, the performing a data washing process on each piece of the function code includes one or more of the following:
deleting the annotation information in the function code; deleting a preset symbol in the function code; splitting variable names in the function codes into single character strings; and replacing the constant in the function code with a single character string.
The preset symbol may be a preset symbol unrelated to the function code semantics, such as a punctuation mark such as a bracket.
Therefore, by deleting the annotation information and the preset symbol, only the function body can be reserved, and the influence of other information on the code semantics is reduced, so that the accuracy of the model generated name obtained by the function name generation model in the follow-up process is improved, and the accuracy of the name consistency judgment of the function code in the follow-up process is ensured.
In the present embodiment, the variable name is split into a single character string, that is, the compound word is split into a plurality of words, so that the richness of the variable name can be increased, the semantic richness is improved, and the analysis effect of the subsequent function name generation model on the function code is improved.
In the embodiment, the constants are replaced by the single character strings, so that the universality of the constants is improved, and the recognition effect of the subsequent function name generation model on the function codes is improved.
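A minimal sketch of one possible form of this data cleaning processing is given below: it deletes comment (annotation) information, deletes a configurable set of preset symbols, splits compound variable names into single character strings, and replaces string and numeric constants with single placeholder strings. The helper names (clean_function_code, split_identifier), the concrete symbol set and the placeholder tokens are assumptions made for the illustration, not requirements of the scheme.

import re

SPLIT_PATTERN = re.compile(r"_|(?<=[a-z0-9])(?=[A-Z])")  # snake_case and camelCase word boundaries
PRESET_SYMBOLS = "{}()[];,"                               # example set of preset symbols to delete

def split_identifier(name: str) -> str:
    """Split a compound identifier into lower-case single character strings (words)."""
    return " ".join(part.lower() for part in SPLIT_PATTERN.split(name) if part)

def clean_function_code(code: str) -> str:
    """Convert a piece of function code into a general code segment."""
    code = re.sub(r"#.*", "", code)                                # delete annotation (comment) information
    code = re.sub(r"\"[^\"]*\"|'[^']*'", " STR ", code)            # replace string constants with a single string
    code = re.sub(r"\b\d+(\.\d+)?\b", " NUM ", code)               # replace numeric constants with a single string
    code = code.translate(str.maketrans("", "", PRESET_SYMBOLS))   # delete preset symbols
    return " ".join(split_identifier(token) for token in code.split())  # split variable names into single words

print(clean_function_code("def addNumbers(a, b):\n    total_sum = a + b  # add two numbers\n    return total_sum"))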
Following the above S101, S102: inputting the universal code segment into a pre-trained function name generation model to obtain a model generation name of the function code, wherein the function name generation model is obtained by training according to a plurality of sections of sample function codes and corresponding sample function names.
In this step, a function name generation model is obtained in advance through training according to multiple sections of sample function codes and corresponding sample function names, and the function name generation model is used for generating corresponding function names for the function codes.
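The following minimal sketch illustrates this generation step, assuming the function name generation model is a fine-tuned CodeT5-style network (the network type discussed later with reference to fig. 3) loaded through the Hugging Face Transformers library. The checkpoint name "Salesforce/codet5-base" merely stands in for the actually fine-tuned model, and generate_name is an assumed helper name rather than part of the scheme.

from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

def generate_name(general_code_segment: str) -> str:
    """Feed a cleaned (general) code segment to the network and decode a model generation name."""
    inputs = tokenizer(general_code_segment, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(inputs.input_ids, max_length=16, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate_name("total = NUM for x in values : total += x return total"))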
As described above, in the embodiments of the present disclosure, the function name generation model is trained in advance, please refer to fig. 3, and fig. 3 is a flowchart of a specific method for model training in the code detection method provided in the embodiments of the present disclosure. As shown in fig. 3, a model training method provided by the embodiment of the present disclosure includes:
s301: the method comprises the steps of obtaining a training sample set, wherein the training sample set comprises a plurality of sections of sample function codes and sample function names corresponding to the sample function codes.
In order to improve the name generation accuracy of the subsequently trained function name generation model, high-quality function names may be selected as the sample function names corresponding to the sample function codes; that is, the wording of each sample function name is standardized, the sample function name matches its sample function code to a high degree, and the sample function name clearly and intuitively represents the function that the sample function code realizes.
Optionally, the sample function codes and the sample function names corresponding to the sample function codes may be extracted from an open source data set, the open source data set may include multiple program language types, and the sample function codes and the sample function names corresponding to the sample function codes may also be manually labeled and filtered, which is not limited in this respect.
In some possible embodiments, the training may be performed in batches according to the program language type corresponding to the sample function code.
Therefore, the neural network is trained by selecting the high-quality sample function codes and the sample function names, and the name generation precision of the trained function name generation model can be improved.
S302: and for each section of the sample function code, performing data cleaning processing on the sample function code to obtain a sample code section corresponding to the sample function code.
Here, the process of performing data cleaning processing on the sample function code is the same as the process of performing data cleaning processing on the function code in the foregoing description, and is not described again here.
Therefore, the sample function codes are subjected to data cleaning processing, so that the number of words in the converted sample code fragments is increased, the richness of the sample code fragments can be increased, the semantic richness is improved, the learning effect of a neural network on various characters is improved, the robustness of a function name generation model obtained through training is greatly enhanced, and the model generation name obtained through the function name generation model and the adaptation degree of the function codes are effectively improved.
S303: training a pre-constructed neural network based on the sample code segment and the corresponding sample function name, and taking the obtained model meeting a training cutoff condition as the function name generation model, wherein the pre-constructed neural network comprises a unified pre-trained encoder-decoder network with identifier awareness.
In a specific application, when used for generating function names, the pre-constructed neural network may be a unified pre-trained encoder-decoder network with identifier awareness, such as a CodeT5 network. Training the function name generation model with CodeT5 not only yields a high name generation speed but also a good detection and recognition effect for arbitrary function code; the network is, however, not limited thereto.
Here, a sample function name is mostly formed by arranging several English words in series and merging them into one compound word. In order to improve the learning effect of the neural network on the sample function name, in some possible embodiments, the sample function name may be split into single character strings to obtain the converted sample function name. Splitting the sample function name into single character strings, that is, splitting the compound word into a plurality of words, increases the richness of the sample function name and improves the semantic richness, thereby improving the analysis effect of the neural network on the sample function name.
Accordingly, after the converted sample function name is obtained, a pre-constructed neural network may be trained based on the sample code segment and the corresponding converted sample function name, so as to obtain a trained function name generation model.
Optionally, in some possible embodiments, a pre-constructed neural network for generating a function name may be obtained first, then the sample code segment is used as an input of the neural network, the corresponding converted sample function name is used as an output of the neural network, so that the neural network learns, and an obtained model meeting a training cutoff condition is used as a function name generation model.
In other possible embodiments, the sample code segment may be input to a pre-constructed neural network to obtain a training generation name corresponding to the sample code segment; network parameters of the neural network are adjusted based on the obtained training generation name and the sample function name corresponding to the sample code segment to obtain an adjusted neural network; and the step of inputting the sample code segment into the neural network is returned to until the neural network meets the training cutoff condition, and the trained model is taken as the function name generation model.
In the above step, the sample code segment may be used as the input of the neural network, and a training generation name corresponding to each sample code segment is output through the learning and prediction of the neural network. The parameters of the neural network are adjusted based on the difference between the output training generation name and the corresponding sample function name, so as to complete one round of training. The step of inputting the sample code segment into the neural network is then returned to: the sample code segment is input again into the neural network whose parameters have just been adjusted, and the network parameters are continuously adjusted through the subsequent processing steps, until the neural network, after multiple rounds of training and adjustment, meets the training cutoff condition; at that point the neural network may be considered fully trained, and the function name generation model is obtained.
When the parameters of the neural network are adjusted based on the difference between the output training generation name and the corresponding sample function name, the loss of the neural network in each dimension can be calculated by calling a preset loss function, and the adjustment direction, adjustment magnitude and the like of each network parameter in the neural network are then determined according to the calculated loss.
Accordingly, the training cutoff condition may be that the loss of the neural network in each dimension is smaller than the loss threshold corresponding to that dimension, in which case the training cutoff condition is satisfied. The training cutoff condition is not limited to this; in other embodiments, the training cutoff condition may be satisfied when the number of times the parameters of the neural network have been adjusted is greater than or equal to a preset number of times.
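The training procedure described above can be sketched as follows, again assuming a CodeT5-style network loaded through the Hugging Face Transformers library; the single toy sample, the loss threshold and the maximum step count are placeholder values standing in for a real training sample set and real training cutoff conditions.

import torch
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Each pair is (sample code segment, converted sample function name split into single words).
training_samples = [("total = NUM for x in values : total += x return total", "sum values")]
LOSS_THRESHOLD, MAX_STEPS = 0.05, 1000  # example training cutoff conditions

model.train()
for step in range(MAX_STEPS):
    code_segment, target_name = training_samples[step % len(training_samples)]
    inputs = tokenizer(code_segment, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(target_name, return_tensors="pt").input_ids
    loss = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, labels=labels).loss
    loss.backward()           # adjust network parameters based on the difference between
    optimizer.step()          # the training generation name and the sample function name
    optimizer.zero_grad()
    if loss.item() < LOSS_THRESHOLD:  # training cutoff condition met
        break

model.save_pretrained("function-name-generation-model")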
Following the above S102, S103: performing name consistency judgment on the preset function name and the model generation name, and performing name optimization prompting on the preset function name under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name.
In this step, after the model generation name is obtained, name consistency judgment may be performed on the preset function name and the model generation name. Since the function name generation model is obtained by training on high-quality sample function codes and corresponding sample function names, the model generation name is considered to have high accuracy and a high degree of conformity with the function code, while the preset function name is the current function name of the function code. Different processing may then be performed according to different judgment results; specifically, when the judgment result indicates that the preset function name and the model generation name are inconsistent, a name optimization prompt may be issued for the preset function name.
Therefore, in some possible embodiments, the performing of name consistency judgment on the preset function name and the model generation name, and the performing of name optimization prompting on the preset function name in a case that the judgment result indicates that the preset function name is inconsistent with the model generation name, include:
acquiring a preset similarity threshold;
determining the similarity between the preset function name and the model generation name;
and under the condition that the similarity is smaller than the similarity threshold value, determining that the judgment result is that the preset function name is inconsistent with the model generation name, and performing name optimization prompting on the preset function name.
Here, as can be seen from the above, in practical applications a function name is mostly formed by several English words arranged in series and combined into one compound word. When determining the similarity between the preset function name and the model generation name, the preset function name and the model generation name may each be divided into a plurality of tokens, and the tokens shared between the preset function name and the model generation name may be compared.
Specifically, the preset function name and the model generation name may be parsed and lexically analyzed, respectively, to segment the preset function name and the model generation name into a plurality of tokens, where a Token is a group of predefined recognizable character strings. Optionally, each english word in the preset function name and the model generation name may be divided into one Token.
Then, the ratio of the number of tokens shared between the preset function name and the model generation name to the sum of the numbers of tokens of the preset function name and the model generation name may be determined; the larger this ratio, the larger the number of shared tokens and the higher the similarity between the preset function name and the model generation name.
Specifically, the calculation formula of the similarity is shown in the following formula (1):
Sim(p, c) = NumOfSharedTokens(p, c) / (NumOfTokens(p) + NumOfTokens(c))    (1)
where Sim(p, c) is the similarity between the preset function name and the model generation name, p is the preset function name, c is the model generation name, NumOfTokens(p) is the number of tokens in the preset function name, NumOfTokens(c) is the number of tokens in the model generation name, and NumOfSharedTokens(p, c) is the number of tokens shared between the preset function name and the model generation name.
Here, the value corresponding to the similarity is between 0 and 1, and by comparing the similarity with the similarity threshold, it can be determined whether recommendation and error correction are required for the preset function name based on the model generation name, and further name optimization prompting is performed for the preset function name.
In other embodiments, the similarity between the preset function name and the model generation name may also be obtained by calculating cosine similarity, edit distance, and the like.
After the similarity between the preset function name and the model generation name and the preset similarity threshold are obtained, the similarity is compared with the similarity threshold. Under the condition that the similarity is smaller than the similarity threshold, the judgment result is determined to be that the preset function name is inconsistent with the model generation name; in this case the preset function name is considered to differ greatly from the model generation name, recommendation and error correction need to be performed on the preset function name based on the model generation name, and a name optimization prompt is therefore issued for the preset function name.
For example, if the similarity between the preset function name and the model generation name obtained by the above calculation formula is 0.5 and the similarity threshold is 0.8, and at this time, the similarity is smaller than the similarity threshold, name optimization prompt may be performed on the preset function name.
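A minimal sketch of this name consistency judgment is given below: both names are split into lower-case tokens at underscores and camel-case boundaries, the shared-token ratio of formula (1) is computed, and a name optimization prompt is printed when the score falls below a threshold. The tokenization rule, the example names and the threshold value used here are assumptions made only for the illustration.

import re

def tokenize_name(name: str) -> list:
    """Split a compound function name into lower-case single-word tokens."""
    return [part.lower() for part in re.split(r"_|(?<=[a-z0-9])(?=[A-Z])", name) if part]

def name_similarity(preset_name: str, generated_name: str) -> float:
    """Shared-token ratio between the preset function name and the model generation name, as in formula (1)."""
    p, c = tokenize_name(preset_name), tokenize_name(generated_name)
    shared = sum(min(p.count(token), c.count(token)) for token in set(p) & set(c))
    return shared / (len(p) + len(c))

SIMILARITY_THRESHOLD = 0.3  # placeholder for the preset similarity threshold
preset_name, model_generated_name = "getData", "read_user_profile"
if name_similarity(preset_name, model_generated_name) < SIMILARITY_THRESHOLD:
    print(f"Name optimization prompt: recommended name is '{model_generated_name}'.")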
Accordingly, in some possible embodiments, in a case that the determination result indicates that the preset function name is not consistent with the model generation name, name optimization prompt information including a recommended name is sent to a user submitting the source code file, where the recommended name is the model generation name.
Optionally, the name optimization prompt information may further include a judgment result of the name consistency judgment, so that a user knows that the current preset function name of the function code is insufficient, and may perform adjustment based on the recommended name.
Specifically, the user may replace the preset function name of the function code with the model generation name, and determine the model generation name as the function name of the function code, so that the function code has a function name with high accuracy and high matching degree.
In other possible embodiments, if the similarity is greater than or equal to the similarity threshold, it is determined that the determination result is that the preset function name is consistent with the model generation name, at this time, it may be considered that the difference between the preset function name and the model generation name is small, recommendation and error correction for the preset function name based on the model generation name are not needed, and the current preset function name of the function code may be continuously used.
Optionally, in a case that the determination result indicates that the preset function name is consistent with the model generation name, the determination result may be sent to a user who submits the source code file, so that the user knows that the current preset function name of the function code is highly accurate, and may continue to use the preset function name.
Referring to fig. 4, fig. 4 is a schematic process diagram of a code detection method according to an embodiment of the present disclosure. As shown in fig. 4, a source code file is preprocessed to obtain a function code, the preset function name of the function code in the source code file, and the universal code segment converted from the function code. The universal code segment is then input into a pre-trained function name generation model, and the model generation name output by the function name generation model is obtained; name consistency judgment is then performed on the preset function name and the model generation name. If the judgment result indicates that the preset function name is consistent with the model generation name, the user who submitted the source code file can be informed that the preset function name has high accuracy and can continue to be used; if the judgment result indicates that the preset function name is inconsistent with the model generation name, name optimization prompt information including a recommended name can be sent to the user who submitted the source code file, where the recommended name is the model generation name.
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict order of execution or constitute any limitation on the implementation process; the specific order of execution of the steps should be determined by their functions and possible internal logic.
The code detection method provided by the embodiments of the present disclosure preprocesses a source code file to obtain at least one section of function code, the preset function name of the function code in the source code file, and the universal code segment converted from the function code; the universal code segment is then input into a pre-trained function name generation model to obtain a model generation name of the function code, wherein the function name generation model is obtained by training according to a plurality of sections of sample function codes and corresponding sample function names; name consistency judgment is then performed on the preset function name and the model generation name, and a name optimization prompt is issued for the preset function name under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name. Therefore, by preprocessing the source code file, the richness of the general code segment can be increased and the semantic richness improved; a name can then be generated for the function code in the source code file through the function name generation model, with an improved analysis effect of the model on the function code and improved precision of the generated name.
Furthermore, by improving the name accuracy of the function codes and the matching degree between the function codes and the corresponding function names, the readability and the understandability of the function names can be improved, and the comprehension degree of the function codes based on the function names can be further improved, so that the software development efficiency can be improved, and the maintenance cost of the software can be reduced.
Based on the same inventive concept, a code detection device corresponding to the code detection method is also provided in the embodiments of the present disclosure, and as the principle of solving the problem of the device in the embodiments of the present disclosure is similar to the code detection method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 5 and fig. 6, fig. 5 is a first schematic diagram of a code detection apparatus provided in an embodiment of the disclosure, and fig. 6 is a second schematic diagram of a code detection apparatus provided in an embodiment of the disclosure.
As shown in fig. 5, a code detection apparatus 500 provided by the embodiment of the present disclosure includes:
a data processing module 510, configured to pre-process a source code file to obtain at least one segment of function code, a preset function name of the function code in the source code file, and a universal code segment converted from the function code;
a name generation module 520, configured to input the generic code segment into a pre-trained function name generation model to obtain a model generation name of the function code, where the function name generation model is obtained by training according to multiple segments of sample function codes and corresponding sample function names;
a name determining module 530, configured to perform name consistency determination on the preset function name and the model generation name, and perform name optimization prompting on the preset function name when a determination result indicates that the preset function name is inconsistent with the model generation name.
In an optional implementation manner, the data processing module 510 is specifically configured to:
extracting at least one section of function code and a preset function name of the function code in a source code file from the source code file;
performing data cleaning processing on each segment of the function code to obtain a universal code segment corresponding to the function code;
the data processing module 510, when configured to perform data cleansing processing on each piece of the function code, includes one or more of the following:
deleting the annotation information in the function code; deleting a preset symbol in the function code; splitting the variable name in the function code into single character strings; replacing constants in the function code with a single string.
In an alternative embodiment, as shown in fig. 6, the apparatus further comprises a model training module 540, the model training module 540 is configured to:
acquiring a training sample set, wherein the training sample set comprises a plurality of sections of sample function codes and a sample function name corresponding to each section of the sample function codes;
for each section of the sample function code, performing data cleaning processing on the sample function code to obtain a sample code segment corresponding to the sample function code;
training a pre-constructed neural network based on the sample code segment and the corresponding sample function name, and taking the obtained model meeting a training cutoff condition as the function name generation model, wherein the pre-constructed neural network comprises a unified pre-trained encoder-decoder network with identifier awareness.
In an alternative embodiment, the model training module 540 is further configured to:
splitting the sample function name into single character strings to obtain the converted sample function name;
the model training module 540 is specifically configured to, when the model training module is configured to train a pre-constructed neural network based on the sample code segment and the corresponding sample function name, and use an obtained model satisfying a training cutoff condition as a function name generation model:
and training a pre-constructed neural network based on the sample code segment and the corresponding converted sample function name, and taking the obtained model meeting the training cutoff condition as a function name generation model.
In an optional embodiment, when the model training module 540 is configured to train a pre-constructed neural network based on the sample code segment and the corresponding sample function name and to take an obtained model meeting a training cutoff condition as the function name generation model, the model training module 540 is specifically configured to:
inputting the sample code segment into a pre-constructed neural network to obtain a training generation name corresponding to the sample code segment;
adjusting network parameters of the neural network based on the obtained training generation name and the sample function name corresponding to the sample code segment to obtain an adjusted neural network;
and returning to the step of inputting the sample code segment into the adjusted neural network until the neural network meets the training cutoff condition, and taking the trained model as the function name generation model.
In an optional implementation manner, the name determining module 530 is specifically configured to:
acquiring a preset similarity threshold;
determining the similarity between the preset function name and the model generation name;
and under the condition that the similarity is smaller than the similarity threshold value, determining that the judgment result is that the preset function name is inconsistent with the model generation name, and performing name optimization prompting on the preset function name.
In an optional embodiment, when the name determining module 530 is configured to perform name optimization prompting on the preset function name when the determination result indicates that the preset function name is inconsistent with the model generation name, specifically, the name determining module is configured to:
and sending name optimization prompt information including a recommended name to a user submitting the source code file under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name, wherein the recommended name is the model generation name.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
The code detection apparatus provided by the embodiments of the present disclosure can preprocess a source code file to obtain at least one section of function code, the preset function name of the function code in the source code file, and the universal code segment converted from the function code, and then input the universal code segment into a pre-trained function name generation model to obtain a model generation name of the function code, wherein the function name generation model is obtained by training according to a plurality of sections of sample function codes and corresponding sample function names; name consistency judgment is then performed on the preset function name and the model generation name, and a name optimization prompt is issued for the preset function name under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name. Therefore, by preprocessing the source code file, the richness of the general code segment can be increased and the semantic richness improved; a name can then be generated for the function code in the source code file through the function name generation model, so that the name of the function code can be automatically corrected and recommended.
Furthermore, by improving the name accuracy of the function codes and the matching degree between the function codes and the corresponding function names, the readability and the understandability of the function names can be improved, and the comprehension degree of the function codes based on the function names can be further improved, so that the software development efficiency can be improved, and the maintenance cost of the software can be reduced.
Corresponding to the code detection method in fig. 1, an embodiment of the present disclosure further provides an electronic device 700, as shown in fig. 7, which is a schematic structural diagram of the electronic device 700 provided in the embodiment of the present disclosure, and includes:
a processor 710, a memory 720 and a bus 730. The memory 720 is used for storing execution instructions and includes a memory 721 and an external memory 722. The memory 721, also referred to as an internal memory, is used to temporarily store operation data in the processor 710 and data exchanged with the external memory 722 such as a hard disk; the processor 710 exchanges data with the external memory 722 through the memory 721. When the electronic device 700 operates, the processor 710 communicates with the memory 720 through the bus 730, so that the processor 710 executes the steps of the code detection method described above.
The embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the code detection method described in the above method embodiments are performed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product includes computer instructions, and when the computer instructions are executed by a processor, the steps of the code detection method in the foregoing method embodiments may be executed.
The computer program product may be implemented by hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatuses and devices described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and other division manners may be adopted in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the couplings, direct couplings or communication connections shown or discussed between components may be implemented through some communication interfaces, and the indirect couplings or communication connections between devices or units may be in electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific embodiments of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, easily conceive of changes to them, or make equivalent substitutions for some of their technical features within the technical scope of the present disclosure; such modifications, changes and substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall be covered within its protection scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method of code detection, the method comprising:
preprocessing a source code file to obtain at least one section of function code, a preset function name of the function code in the source code file, and a general code segment converted from the function code;
inputting the general code segment into a pre-trained function name generation model to obtain a model generation name of the function code, wherein the function name generation model is obtained by training according to a plurality of sections of sample function codes and corresponding sample function names;
and performing name consistency judgment on the preset function name and the model generation name, and performing name optimization prompting on the preset function name under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name.
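To show how the three claimed steps fit together, the following compact Python sketch wires them into one pass over a source code file; every helper is injected as a parameter because the concrete cleaning, generation, and judgment logic (sketched under the later claims) is not fixed by this claim, and all names are illustrative assumptions.

```python
from typing import Callable, Iterable, Tuple

def detect_code(source_file_text: str,
                extract_functions: Callable[[str], Iterable[Tuple[str, str]]],
                clean: Callable[[str], str],
                generate_name: Callable[[str], str],
                names_consistent: Callable[[str, str], bool],
                prompt_user: Callable[[str, str], None]) -> None:
    """Illustrative pipeline over one source code file; all helpers are hypothetical."""
    for function_code, preset_name in extract_functions(source_file_text):  # preprocessing
        general_segment = clean(function_code)                              # general code segment
        model_name = generate_name(general_segment)                         # model generation name
        if not names_consistent(preset_name, model_name):                   # name consistency judgment
            prompt_user(preset_name, model_name)                            # name optimization prompt
```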
2. The method according to claim 1, wherein preprocessing the source code file to obtain at least one piece of function code, a preset function name of the function code in the source code file, and a general code segment into which the function code is converted comprises:
extracting at least one section of function code from a source code file and a preset function name of the function code in the source code file;
performing data cleaning processing on each segment of the function code to obtain a general code segment corresponding to the function code;
wherein the data cleaning processing performed on each piece of the function code comprises one or more of the following:
deleting annotation information in the function code; deleting preset symbols in the function code; splitting a variable name in the function code into single character strings; and replacing a constant in the function code with a single character string.
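The following Python sketch illustrates what such cleaning could look like for one piece of C/Java-style function code; which symbols count as preset symbols, how identifiers are split, and the placeholder strings used for constants are all illustrative assumptions rather than the claimed implementation.

```python
import re

def clean_function_code(code: str) -> str:
    """Turn raw function code into a general code segment (illustrative cleaning only)."""
    code = re.sub(r"//[^\n]*|/\*.*?\*/", " ", code, flags=re.S)   # delete annotation (comment) information
    code = re.sub(r"[;{}()\[\],]", " ", code)                      # delete assumed preset symbols
    code = code.replace("_", " ")                                  # snake_case -> separate strings
    code = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", code)            # camelCase -> separate strings
    code = re.sub(r'"[^"]*"', "STR", code)                         # replace string constants with a single string
    code = re.sub(r"\b\d+(\.\d+)?\b", "NUM", code)                 # replace numeric constants with a single string
    return re.sub(r"\s+", " ", code).strip()
```

Under these assumptions, `int getUserName(int id) { return names[id]; }` would become `int get User Name int id return names id`.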
3. The method of claim 1, wherein the function name generation model is trained according to the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of sections of sample function codes and a sample function name corresponding to each section of the sample function codes;
for each section of the sample function code, performing data cleaning processing on the sample function code to obtain a sample code segment corresponding to the sample function code;
training a pre-constructed neural network based on the sample code segment and the corresponding sample function name, and taking the obtained model that meets a training cutoff condition as the function name generation model, wherein the pre-constructed neural network comprises an identifier-aware unified pre-trained encoder-decoder network.
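The claim does not name a specific checkpoint; as one hedged example, an identifier-aware unified pre-trained encoder-decoder such as CodeT5 could be fine-tuned on (sample code segment, sample function name) pairs roughly as follows, where the checkpoint name, sequence lengths, and learning rate are assumptions made only for the sketch.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")   # assumed checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def fine_tune_step(sample_code_segment: str, sample_function_name: str) -> float:
    """One fine-tuning step: code segment as input, function name as the generation target."""
    inputs = tokenizer(sample_code_segment, return_tensors="pt",
                       truncation=True, max_length=512)
    labels = tokenizer(sample_function_name, return_tensors="pt",
                       truncation=True, max_length=32).input_ids
    loss = model(**inputs, labels=labels).loss    # sequence-to-sequence cross-entropy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```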
4. The method of claim 3, wherein before the training of the pre-constructed neural network based on the sample code fragments and the corresponding sample function names, the method further comprises:
splitting the sample function name into single character strings to obtain the converted sample function name;
wherein training the pre-constructed neural network based on the sample code segment and the corresponding sample function name, and taking the obtained model meeting the training cutoff condition as the function name generation model, comprises:
and training a pre-constructed neural network based on the sample code segment and the corresponding converted sample function name, and taking the obtained model meeting the training cutoff condition as a function name generation model.
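One plausible reading of splitting the sample function name "into single character strings" is subtoken splitting of camelCase and snake_case names into individual word strings; the sketch below follows that assumption.

```python
import re

def split_function_name(name: str) -> str:
    """'parseHttpHeader' or 'parse_http_header' -> 'parse http header' (illustrative)."""
    name = name.replace("_", " ")
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return " ".join(name.lower().split())
```

The converted name, e.g. `parse http header`, would then be used as the training target in place of the original sample function name.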
5. The method of claim 3, wherein training a pre-constructed neural network based on the sample code segment and the corresponding sample function name, and taking the obtained model satisfying a training cutoff condition as the function name generation model, comprises:
inputting the sample code segment into a pre-constructed neural network to obtain a training generation name corresponding to the sample code segment;
adjusting network parameters of the neural network based on the obtained training generation name and the sample function name corresponding to the sample code segment to obtain an adjusted neural network;
and returning to the step of inputting the sample code segment into the adjusted neural network, until the neural network meets the training cutoff condition, and taking the trained model as the function name generation model.
6. The method according to claim 1, wherein the performing a name consistency determination on the preset function name and the model generation name, and performing a name optimization prompt on the preset function name if the determination result indicates that the preset function name is inconsistent with the model generation name comprises:
acquiring a preset similarity threshold;
determining the similarity between the preset function name and the model generation name;
and under the condition that the similarity is smaller than the similarity threshold value, determining that the judgment result is that the preset function name is inconsistent with the model generation name, and performing name optimization prompting on the preset function name.
7. The method according to claim 1, wherein performing name optimization prompting for the preset function name when the determination result indicates that the preset function name is inconsistent with the model generation name includes:
and sending name optimization prompt information including a recommended name to a user submitting the source code file under the condition that the judgment result indicates that the preset function name is inconsistent with the model generation name, wherein the recommended name is the model generation name.
8. A code detection apparatus, characterized in that the apparatus comprises:
the data processing module is used for preprocessing a source code file to obtain at least one section of function code, a preset function name of the function code in the source code file and a general code segment converted from the function code;
the name generation module is used for inputting the general code segment into a pre-trained function name generation model to obtain a model generation name of the function code, wherein the function name generation model is obtained by training according to a plurality of sections of sample function codes and corresponding sample function names;
and the name judging module is used for performing name consistency judgment on the preset function name and the model generation name, and performing a name optimization prompt for the preset function name in the case that the judgment result indicates that the preset function name is inconsistent with the model generation name.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the code detection method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the code detection method according to one of claims 1 to 7.
CN202211447960.3A 2022-11-18 2022-11-18 Code detection method, device, equipment and storage medium Pending CN115774667A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211447960.3A CN115774667A (en) 2022-11-18 2022-11-18 Code detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211447960.3A CN115774667A (en) 2022-11-18 2022-11-18 Code detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115774667A true CN115774667A (en) 2023-03-10

Family

ID=85389398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211447960.3A Pending CN115774667A (en) 2022-11-18 2022-11-18 Code detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115774667A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination