CN112463162B

CN112463162B - Code naming recommendation method, system, storage medium and equipment

Info

Publication number: CN112463162B
Application number: CN202011447014.XA
Authority: CN
Inventors: 曹文康
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2022-12-20
Anticipated expiration: 2040-12-11
Also published as: CN112463162A

Abstract

The invention provides a code naming recommendation method, system, storage medium and device. The method comprises the following steps: scanning finished code in a public code base and an internal code base, and training on the naming patterns of that code with a seq2seq model; scanning unfinished code in the internal code base and performing inference on it based on the training result to obtain an initial recommended name; scanning the unfinished code according to defined attribute judgment rules to obtain a corresponding category tag value and a corresponding service tag value; and defining pruning rules based on the category tag value and the service tag value, and reconstructing the initial recommended name through the pruning rules to obtain the recommended name. By recommending names, the invention can improve standardization during code writing, reduce the time spent optimizing code, improve code readability, and make it easier for auditors to review code quickly, which improves audit efficiency.

Description

Code naming recommendation method, system, storage medium and equipment

Technical Field

The invention relates to the technical field of software, in particular to a code naming recommendation method, system, storage medium and device.

Background

When writing code, naming files, functions and variables intuitively and in accordance with a specification increases the readability of the program and reflects the programmer's coding style. Good naming habits effectively improve the maintainability of a program. Early code naming conventions, such as Hungarian notation, identified the scope, type and so on of a variable by prefixing the variable name with the corresponding lower-case letters. Current common naming specifications generally require that names be self-explanatory and searchable and that they avoid ambiguity and reserved words, and they lay down general rules for naming variables, constants, functions (methods), annotations and the like. The naming recommendation features of current mainstream integrated development environments (IDEs) use simple naming rules for automatic completion, and they can hardly produce self-explanatory names for methods, classes and the like. If names are chosen manually, they are easily influenced by the programmer's experience and personal style. How to automate naming during coding while keeping the resulting names self-explanatory has therefore become a problem that urgently needs to be solved.

Disclosure of Invention

In view of this, the present invention provides a method, a system, a storage medium, and a device for recommending code naming, which are used to implement recommendation of code naming in a code writing process for a user to select.

Based on the above purpose, the invention provides a code naming recommendation method, which comprises the following steps:

scanning finished code in a public code base and an internal code base, and training on the naming patterns of the finished code with a seq2seq model;

scanning unfinished code in the internal code base and performing inference on it based on the training result to obtain an initial recommended name;

scanning the unfinished code according to defined attribute judgment rules to obtain a corresponding category tag value and a corresponding service tag value;

and defining pruning rules based on the category tag value and the service tag value, and reconstructing the initial recommended name through the pruning rules to obtain the recommended name.

In some embodiments, scanning unfinished code in the internal code base and performing inference on it based on the training result comprises: obtaining, through training, the parameters that maximize the log-likelihood conditional probability function, and using those parameters to scan the unfinished code in the internal code base and recommend names for it.

In some embodiments, scanning the unfinished code according to the defined attribute judgment rules to obtain the corresponding category tag value and the corresponding service tag value comprises: scanning the unfinished code according to the defined attribute judgment rules to obtain the corresponding category tag value, and obtaining the corresponding service tag value based on the category tag value.

In some embodiments, the method further comprises: in response to the user choosing to use the recommended name, placing the recommended name into the internal code base as an object of the next training.

In some embodiments, code naming includes class naming and/or method naming.

In some embodiments, the incomplete code in the internal code library comprises code that has not been audited and/or code that has not passed the audit.

In some embodiments, the category tag value comprises METHOD and/or CLASS and/or POJO and/or INTERFACE.

In another aspect of the present invention, a code naming recommendation system is further provided, including:

a training module configured to scan finished code in the public code base and the internal code base and to train on the naming patterns of the finished code with a seq2seq model;

an inference module configured to scan unfinished code in the internal code base and perform inference on it based on the training result to obtain an initial recommended name;

an attribute judgment module configured to scan the unfinished code according to defined attribute judgment rules to obtain a corresponding category tag value and a corresponding service tag value; and

a rule pruning module configured to define pruning rules based on the category tag value and the service tag value, and to reconstruct the initial recommended name through the pruning rules to obtain the recommended name.

In yet another aspect of the present invention, there is also provided a computer readable storage medium storing computer program instructions which, when executed, implement any one of the methods described above.

In yet another aspect of the present invention, a computer device is provided, which includes a memory and a processor, the memory storing a computer program, the computer program executing any one of the above methods when executed by the processor.

The invention has at least the following beneficial technical effects:

According to the invention, a model is trained on the finished code, inference is performed on the unfinished code according to the training result, and the recommended name is obtained by defining attribute judgment rules and pruning rules. Based on the recommended name, standardization during code writing can be improved, the time spent optimizing code is reduced, and code readability is enhanced; at the same time, auditors can review the code quickly, which improves audit efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.

FIG. 1 is a schematic diagram of an embodiment of a recommendation method for code naming provided in accordance with the present invention;

FIG. 2 is a schematic diagram of an embodiment of a code naming recommendation system provided in accordance with the present invention;

FIG. 3 is a schematic hardware structure diagram of an embodiment of a computer device for executing the recommendation method for code naming according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two different entities or parameters that share the same name; "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention. Moreover, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements, but may include other steps or elements that are not expressly listed or that are inherent to such a process, method, system, article, or apparatus.

In view of the above object, a first aspect of the embodiments of the present invention provides an embodiment of a recommendation method for code naming. Fig. 1 is a schematic diagram illustrating an embodiment of a recommendation method for code naming provided by the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:

S10, scanning finished code in a public code base and an internal code base, and training on the naming patterns of the finished code with a seq2seq model;

S20, scanning unfinished code in the internal code base and performing inference on it based on the training result to obtain an initial recommended name;

S30, scanning the unfinished code according to defined attribute judgment rules to obtain a corresponding category tag value and a corresponding service tag value;

and S40, defining pruning rules based on the category tag value and the service tag value, and reconstructing the initial recommended name through the pruning rules to obtain the recommended name.

According to the embodiment of the invention, a model is trained on the finished code, inference is performed on the unfinished code according to the training result, and the recommended name is obtained by defining attribute judgment rules and pruning rules. Based on the recommended name, standardization during code writing can be improved, the time spent optimizing code is reduced, code readability is enhanced, and auditors can review the code quickly.
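Steps S20 to S40 can be read as a small pipeline applied to each block of unfinished code. The following is a minimal sketch in Python, not the patented implementation: the names `recommend_name`, `infer`, `judge` and `prune` are hypothetical and do not appear in the source, and the trained model and the rule sets are simply passed in as callables. The toy stand-ins at the end exist only so that the sketch runs.

```python
from typing import Callable, Tuple

# Hypothetical type aliases for illustration; none of these names come from the source.
Infer = Callable[[str], str]                # S20: code -> initial recommended name
Judge = Callable[[str], Tuple[str, int]]    # S30: code -> (KIND, AMOUNT)
Prune = Callable[[str, str, int], str]      # S40: (name, KIND, AMOUNT) -> recommended name


def recommend_name(code: str, infer: Infer, judge: Judge, prune: Prune) -> str:
    """Chain steps S20 to S40 for one block of unfinished code."""
    initial = infer(code)                   # S20: inference with the trained model
    kind, amount = judge(code)              # S30: attribute judgment rules
    return prune(initial, kind, amount)     # S40: pruning rules


# Toy stand-ins (assumptions) so that the sketch runs end to end.
def toy_infer(code: str) -> str:
    return "user not found"


def toy_judge(code: str) -> Tuple[str, int]:
    return ("CLASS", 1)


def toy_prune(name: str, kind: str, amount: int) -> str:
    camel = "".join(word.capitalize() for word in name.split())
    if (kind, amount) == ("CLASS", 1) and not camel.endswith("Exception"):
        camel += "Exception"
    return camel
```

Passing the three stages in as callables keeps the learned part (S20) separate from the rule-based parts (S30, S40), which mirrors the module split of the system described later.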

In some embodiments, scanning unfinished code in the internal code base and performing inference on it based on the training result comprises: obtaining, through training, the parameters that maximize the log-likelihood conditional probability function, and using those parameters to scan the unfinished code in the internal code base and recommend names for it. The process of training with the seq2seq model and obtaining the parameters is as follows:

(1) Embedding: the input information is the code content, and the output consists of two parts: a word vector $w = (w_1, w_2, \ldots, w_T)$ composed of the code content, and a position vector $p = (p_1, p_2, \ldots, p_T)$ composed of information such as syntax structure and indentation. Their element-wise sum forms the input sequence $x = (x_1, x_2, \ldots, x_T) = (w_1 + p_1, w_2 + p_2, \ldots, w_T + p_T)$. Similarly, the output sequence is split, following the CamelCase style, into a character string containing spaces, and Embedding yields $y = (y_1, y_2, \ldots, y_T)$.

(2) Softmax: softmax is very widely used in machine learning and deep learning, and is defined as follows:

$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{C} e^{V_j}}$$

where $V_i$ denotes the $i$-th input element, $i$ is the category index, $C$ is the total number of categories, and $S_i$ is the ratio of the exponential of the current element to the sum of the exponentials of all elements.

Suppose the output probability of a word vector is $P(x_t \mid x_1, x_2, \ldots, x_{t-1})$. Each neuron of the softmax layer computes this probability from $h_t$, where $h_t$ is the hidden state, which depends on the input at the previous moment and on the current input, i.e. $h_t = f(h_{t-1}, x_t)$. The probability of the entire sequence is expressed as:

$$P(x) = \prod_{t=1}^{T} P(x_t \mid x_1, x_2, \ldots, x_{t-1})$$

(3) Encoder-decoder: the seq2seq model is an RNN (recurrent neural network) deep learning prediction framework that consists of an encoder and a decoder. In the encoder-decoder, given that the input sequence $x = (x_1, x_2, \ldots, x_T)$ occurs, the probability that the output sequence $y = (y_1, y_2, \ldots, y_T)$ occurs equals the product of the terms $p(y_t \mid x, y_1, y_2, \ldots, y_{t-1})$:

$$p(y_1, y_2, \ldots, y_T \mid x_1, x_2, \ldots, x_T) = \prod_{t=1}^{T} p(y_t \mid x, y_1, y_2, \ldots, y_{t-1})$$

The probability distribution of the decoder at a given time step can be expressed as follows:

$$p(y_t \mid x, y_1, y_2, \ldots, y_{t-1}) = g(h_t, y_{t-1}, v)$$

(4) Backpropagation: each layer of the seq2seq deep learning neural network contains a number of parameters. During forward propagation, each input dimension of a layer is multiplied by the parameter at the corresponding position to obtain the values of the next layer, and so on until the output is obtained. Since the result of the final output layer may be far from the true expected output, back propagation (Backpropagation) is required. For the training samples, the sum of the probabilities $p(y_1, y_2, \ldots, y_T \mid x_1, x_2, \ldots, x_T)$ over all training samples is to be maximized, and the corresponding log-likelihood conditional probability function is:

$$\max_{\theta} \sum_{n=1}^{N} \log p_{\theta}\left(y^{(n)} \mid x^{(n)}\right)$$

where $\theta$ denotes the model parameters, $N$ is the number of training samples, and $(x^{(n)}, y^{(n)})$ is the $n$-th pair of input and output sequences.

The parameters $\theta$ of each layer are corrected by differentiating layer by layer in the reverse direction, and a well-trained $\theta$ is obtained after multiple rounds (epochs) of training. The result of back propagation is the set of $\theta$ parameters that maximizes the function above.
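Step (1) above describes two concrete operations: splitting a CamelCase name into a space-separated string for the output sequence, and summing word and position vectors element by element for the input sequence. The sketch below illustrates both in Python under stated assumptions: the function names `split_camel_case` and `sum_embeddings` are hypothetical, vectors are plain lists of floats, and the handling of acronyms such as "HTTPServer" is a choice made here, not something the source specifies.

```python
import re
from typing import List


def split_camel_case(name: str) -> str:
    """Split a CamelCase identifier into a space-separated string."""
    # Boundary between a lower-case letter or digit and an upper-case letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Boundary between an acronym and a following capitalised word (assumption).
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)
    return spaced


def sum_embeddings(w: List[List[float]], p: List[List[float]]) -> List[List[float]]:
    """Compute x_t = w_t + p_t element by element for every position t."""
    if len(w) != len(p):
        raise ValueError("word and position sequences must have the same length T")
    return [[wi + pi for wi, pi in zip(wt, pt)] for wt, pt in zip(w, p)]
```

For example, `split_camel_case("getUserList")` yields the string `get User List`, which can then be tokenized and embedded like any other text sequence.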

In the inference process, after training on the finished code in the public code base and the internal code base, the optimized set of θ parameters is used to recommend names for the local unfinished code.
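The quantities used in training can be illustrated numerically. The sketch below, in Python, computes a softmax over a list of scores and the log-probability of one target sequence as the sum of per-step log-probabilities, which is the logarithm of the product described for the encoder-decoder. It is an illustration only: the function names are hypothetical, the per-step scores stand in for whatever the decoder would output, and no actual seq2seq model is trained here.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Return exp(v_i) / sum_j exp(v_j), shifted by max(v) for numerical stability."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_log_likelihood(step_scores: Sequence[Sequence[float]],
                            targets: Sequence[int]) -> float:
    """Sum over t of log p(y_t | x, y_1..y_{t-1}), given per-step scores.

    step_scores[t] holds one score per vocabulary entry at step t (assumed to
    come from a decoder); targets[t] is the index of the correct token y_t.
    """
    return sum(math.log(softmax(scores)[y]) for scores, y in zip(step_scores, targets))
```

Training then amounts to adjusting θ so that the sum of such log-likelihoods over all training pairs is as large as possible.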

In some embodiments, scanning the unfinished code according to the defined attribute judgment rules to obtain the corresponding category tag value and the corresponding service tag value comprises: scanning the unfinished code according to the defined attribute judgment rules to obtain the corresponding category tag value, and obtaining the corresponding service tag value based on the category tag value. In some embodiments, the category tag value comprises METHOD and/or CLASS and/or POJO and/or INTERFACE.

In this embodiment, the unfinished code is scanned according to the defined attribute judgment rules to obtain the corresponding category tag value KIND, specifically as follows:

1. if the code header does not contain class and the code content does not contain a statement defining an internal attribute, mark the code block as METHOD;

2. if the code header contains class, the code content contains statements defining internal attributes, and methods other than the basic methods (constructor/setter/toString/… methods) are defined, mark the code block as CLASS;

3. if the code header contains class, the code content contains statements defining internal attributes, and no methods other than the basic methods are defined, mark the code block as POJO;

4. if the code header contains abstract and class and there is no code content, mark the code block as ABSTRACT_CLASS;

5. if the code header contains abstract but not class and there is no code content, mark the code block as ABSTRACT;

6. if the code header contains neither abstract nor class and there is no code content, mark the code block as ABSTRACT_METHOD;

7. if the code header contains interface, mark the code block as INTERFACE.
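The seven KIND rules can be sketched as a single decision function. The Python below is a minimal illustration, not the patented implementation. It assumes that the relevant features of a code block have already been extracted into a hypothetical `CodeBlock` structure, and it fixes an evaluation order (interface first, then the empty-body cases, then the rest) because the source does not say which rule wins when several match. Cases not covered by any rule return `None`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class CodeBlock:
    """Pre-extracted features of one code block (hypothetical structure)."""
    header_keywords: FrozenSet[str]   # keywords found in the header, e.g. {"public", "class"}
    has_body: bool                    # False when there is no code content
    declares_fields: bool             # content defines internal attributes
    has_non_basic_methods: bool       # content defines methods beyond the basic ones


def judge_kind(block: CodeBlock) -> Optional[str]:
    """Return the KIND tag for a code block, or None if no rule applies."""
    kw = block.header_keywords
    if "interface" in kw:                                   # rule 7
        return "INTERFACE"
    if not block.has_body:                                  # rules 4 to 6
        if "abstract" in kw:
            return "ABSTRACT_CLASS" if "class" in kw else "ABSTRACT"
        return None if "class" in kw else "ABSTRACT_METHOD"
    if "class" not in kw:                                   # rule 1
        return None if block.declares_fields else "METHOD"
    if block.declares_fields:                               # rules 2 and 3
        return "CLASS" if block.has_non_basic_methods else "POJO"
    return None                                             # not covered by the seven rules
```

Working on extracted features rather than raw source text keeps the rule logic independent of how the header and content are parsed.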

The corresponding service tag value AMOUNT is obtained based on the category tag value KIND, specifically as follows:

1. if the category tag value KIND = METHOD, the attribute judgment rules are as follows:

(1) if the code content calls a single other non-tool method, mark AMOUNT = 1;

(2) if the code content is a tool method, mark AMOUNT = 2;

(3) in cases other than (1) and (2) above, mark AMOUNT = 0;

2. if the category tag value KIND = CLASS, the attribute judgment rules are as follows:

(1) if the code header contains extends Exception, mark AMOUNT = 1;

(2) if the code header contains the annotation @Test or @SpringTest, mark AMOUNT = 2;

(3) if the code header contains implements and the implemented interface comes from the internal code base, mark AMOUNT = 3;

(4) if none of the above 3 conditions is met, mark AMOUNT = 0;

3. if the category tag value KIND = POJO, the attribute judgment rules are as follows:

(1) if the code header contains the annotation @ApiModel and the code content contains the annotation @ApiModelProperty, mark AMOUNT = 1;

(2) if the code header contains the annotation @ApiModel but the code content does not contain the annotation @ApiModelProperty, mark AMOUNT = 2;

(3) if the code header contains the annotation @ measure, mark AMOUNT = 3;

(4) if none of the above 3 conditions is met, mark AMOUNT = 4;

4. if the category tag value KIND = INTERFACE, the attribute judgment rules are as follows:

(1) if the code header contains the annotation @RequestMapping, mark AMOUNT = 1;

(2) if the code header has no annotation, the code content contains other code headers, and the return value contains a BO-class POJO object, mark AMOUNT = 2;

(3) if the code header has no annotation, the code content contains other code headers, and the return value contains a DO-class POJO object, mark AMOUNT = 3.
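The AMOUNT rules above form a second decision table keyed by KIND. The following Python sketch encodes them under stated assumptions: the function name `judge_amount` and every feature flag string (for example `"extends_exception"` or `"returns_bo"`) are hypothetical labels for facts that a scanner would have to extract, and `None` is returned where the source states no rule.

```python
from typing import AbstractSet, Optional


def judge_amount(kind: str, f: AbstractSet[str]) -> Optional[int]:
    """Map a KIND and a set of hypothetical feature flags to an AMOUNT value."""
    if kind == "METHOD":
        if "calls_single_non_tool_method" in f:
            return 1
        return 2 if "is_tool_method" in f else 0
    if kind == "CLASS":
        if "extends_exception" in f:
            return 1
        if "test_annotation" in f:                 # @Test or @SpringTest in the header
            return 2
        return 3 if "implements_internal_interface" in f else 0
    if kind == "POJO":
        if "api_model" in f:                       # @ApiModel in the header
            return 1 if "api_model_property" in f else 2
        # "measure_annotation" stands for the annotation rendered as "@ measure" in the source.
        return 3 if "measure_annotation" in f else 4
    if kind == "INTERFACE":
        if "request_mapping" in f:                 # @RequestMapping in the header
            return 1
        if "no_annotation" in f and "contains_other_headers" in f:
            if "returns_bo" in f:
                return 2
            if "returns_do" in f:
                return 3
    return None                                    # no AMOUNT rule is stated for this case
```

Within each KIND the conditions are checked in the order in which the rules are listed, so an earlier rule takes precedence when more than one condition holds.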

Pruning rules are defined based on the category tag value KIND and the service tag value AMOUNT, specifically as follows:

METHOD:

1. reconstruct the incoming recommendation result in lowerCamelCase;

2. if AMOUNT = 1 and the method's return value is a List, but the recommended name does not end with List, append List to the name;

3. if AMOUNT = 1 and the method's return value is a Set, but the recommended name does not end with Set, append Set to the name;

4. if AMOUNT = 1 and the method's return value is a Map, but the recommended name does not end with Map, append Map to the name;

CLASS:

1. reconstruct the incoming recommendation result in UpperCamelCase;

2. if AMOUNT = 1 and the recommended name does not end with Exception, append Exception to the name;

3. if AMOUNT = 2 and the recommended name does not end with Test, append Test to the name;

4. if AMOUNT = 3 and the recommended name does not end with Impl, append Impl to the name;

POJO:

2. if AMOUNT = 1 and the recommended name does not end with VO, append VO to the name;

3. if AMOUNT = 2 and the recommended name does not end with DTO, append DTO to the name;

4. if AMOUNT = 3 and the recommended name does not end with DO, append DO to the name;

5. if AMOUNT = 4 and the recommended name does not end with BO, append BO to the name;

INTERFACE:

2. if AMOUNT = 1 and the recommended name does not end with Api, append Api to the name;

3. if AMOUNT = 2 and the recommended name does not end with Service, append Service to the name;

4. if AMOUNT = 3 and the recommended name does not end with Dao, append Dao to the name;

No special treatment is performed for ABSTRACT_CLASS, ABSTRACT and ABSTRACT_METHOD.
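The pruning rules amount to a case conversion followed by a conditional suffix. The Python sketch below transcribes the suffix table from the rules above and applies it; it is an illustration under assumptions, not the patented implementation. The names `prune`, `to_camel`, `SUFFIXES` and the `return_type` parameter are hypothetical, the initial name is assumed to arrive as words separated by spaces or underscores, and applying UpperCamelCase to POJO and INTERFACE names is an assumption, since the text states a casing rule only for METHOD and CLASS.

```python
import re
from typing import Optional

# Suffixes transcribed from the pruning rules; keys are (KIND, AMOUNT).
SUFFIXES = {
    ("CLASS", 1): "Exception", ("CLASS", 2): "Test", ("CLASS", 3): "Impl",
    ("POJO", 1): "VO", ("POJO", 2): "DTO", ("POJO", 3): "DO", ("POJO", 4): "BO",
    ("INTERFACE", 1): "Api", ("INTERFACE", 2): "Service", ("INTERFACE", 3): "Dao",
}
COLLECTION_SUFFIXES = ("List", "Set", "Map")


def to_camel(words, upper_first: bool) -> str:
    """Join words into UpperCamelCase or lowerCamelCase."""
    name = "".join(w[:1].upper() + w[1:] for w in words if w)
    return name if upper_first else name[:1].lower() + name[1:]


def prune(initial: str, kind: str, amount: int, return_type: Optional[str] = None) -> str:
    """Reconstruct an initial recommended name according to KIND and AMOUNT."""
    words = re.split(r"[\s_]+", initial.strip())
    if kind == "METHOD":
        name = to_camel(words, upper_first=False)
        if amount == 1 and return_type in COLLECTION_SUFFIXES and not name.endswith(return_type):
            name += return_type
        return name
    if kind in ("ABSTRACT_CLASS", "ABSTRACT", "ABSTRACT_METHOD"):
        return initial                              # no special treatment
    # Assumption: POJO and INTERFACE names are also written in UpperCamelCase.
    name = to_camel(words, upper_first=True)
    suffix = SUFFIXES.get((kind, amount))
    if suffix and not name.endswith(suffix):
        name += suffix
    return name
```

Keeping the suffixes in a lookup table rather than in nested conditionals makes it straightforward to extend the rules for other tag values.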

In some embodiments, the method further comprises: in response to the user choosing to use the recommended name, placing the recommended name into the internal code base as an object of the next training. The user can choose whether to use the recommended name as the code name; if the user chooses to use it, the name is pulled into the internal code base and serves as new training data. By using recommended names, the user can reduce the time spent on thinking up and editing names and can write code more efficiently.
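The feedback step can be sketched as follows in Python. This is a minimal illustration with hypothetical names (`InternalCodeBase`, `handle_choice`): an accepted recommendation is stored together with its code as a new training sample, while a rejected one is not stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class InternalCodeBase:
    """Hypothetical stand-in for the internal code base used as training data."""
    samples: List[Tuple[str, str]] = field(default_factory=list)   # (code, name) pairs

    def add(self, code: str, name: str) -> None:
        self.samples.append((code, name))


def handle_choice(repo: InternalCodeBase, code: str, recommended: str,
                  accepted: bool, manual_name: Optional[str] = None) -> Optional[str]:
    """Return the name to use; store it for the next training only if accepted."""
    if accepted:
        repo.add(code, recommended)
        return recommended
    return manual_name
```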

In some embodiments, code naming includes class naming and/or method naming, for example Java class naming and Java method naming; class naming and method naming in other programming languages are also possible.

In some embodiments, the unfinished code in the internal code base comprises code that has not been audited and/or code that has not passed the audit. The internal code base refers to the code managed by a project group's internal code resource management tool. Common tools include GitHub, GitLab, Gitee (Code Cloud) and the like, which provide developers with functions such as code auditing, version control and code merging. After a code auditor has audited the code, code without problems enters the code base and code with problems is returned to the developer for modification. Code that has entered the code base is finished code and can be pulled and used by other developers; all code before that point is unfinished code, which is invisible to other developers.

In a second aspect of the embodiments of the present invention, a recommendation system for code naming is further provided. Fig. 2 is a schematic diagram of an embodiment of a code naming recommendation system provided by the present invention. The recommendation system for code naming comprises: a training module 10 configured to scan finished code in the public code base and the internal code base and to train on the naming patterns of the finished code with a seq2seq model; an inference module 20 configured to scan unfinished code in the internal code base and perform inference on it based on the training result to obtain an initial recommended name; an attribute judgment module 30 configured to scan the unfinished code according to defined attribute judgment rules to obtain a corresponding category tag value and a corresponding service tag value; and a rule pruning module 40 configured to define pruning rules based on the category tag value and the service tag value, and to reconstruct the initial recommended name through the pruning rules to obtain the recommended name.

According to the code naming recommendation system, a model is trained on the finished code, inference is performed on the unfinished code according to the training result, and the recommended name is obtained by defining attribute judgment rules and pruning rules. Based on the recommended name, standardization during code writing can be improved, the time spent optimizing code is reduced, code readability is enhanced, and auditors can review the code quickly.

In a third aspect of the embodiments of the present invention, a computer storage medium is further provided, where the computer storage medium stores computer program instructions, and the computer program instructions, when executed, implement any one of the above-mentioned embodiment methods.

It is to be understood that all embodiments, features and advantages set forth above with respect to the recommendation method for code naming according to the present invention are equally applicable, without conflict therewith, to the recommendation system and storage medium for code naming according to the present invention. That is, all of the embodiments of the recommendation method applied to code naming and variations thereof described above may be directly transferred to and applied to the system and storage medium according to the present invention, and directly incorporated herein. For the sake of brevity of the present disclosure, no repeated explanation is provided herein.

In a fourth aspect of the embodiments of the present invention, there is further provided a computer device, including a memory 302 and a processor 301, where the memory stores therein a computer program, and the computer program, when executed by the processor, implements any one of the above-mentioned method embodiments.

Fig. 3 is a schematic hardware structure diagram of an embodiment of a computer device for executing the recommendation method for code naming according to the present invention. Taking the computer device shown in fig. 3 as an example, the computer device includes a processor 301 and a memory 302, and may further include an input device 303 and an output device 304. The processor 301, the memory 302, the input device 303 and the output device 304 may be connected by a bus or by other means; fig. 3 takes connection by a bus as an example. The input device 303 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the code naming recommendation system. The output device 304 may comprise a display device such as a display screen. The processor 301 executes the various functional applications and data processing of the server, i.e., implements the code naming recommendation method of the above method embodiments, by running the nonvolatile software programs, instructions and modules stored in the memory 302.

Finally, it is noted that the computer-readable storage medium (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims

1. A method for recommending code naming, comprising the steps of:

scanning finished codes in a public code base and an internal code base, and training a code naming mode of the finished codes through a seq2seq model;

scanning the unfinished code according to a defined attribute judgment rule to obtain a corresponding category tag value and a corresponding service tag value;

defining a pruning rule based on the category tag value and the service tag value, and reconstructing the initial recommended name through the pruning rule to obtain a recommended name;

wherein the category tag value comprises METHOD and/or CLASS and/or POJO and/or INTERFACE;

if the code header does not contain class and the code content does not contain a statement defining an internal attribute, marking the code block as METHOD;

if the code header contains class, the code content contains statements defining internal attributes, and methods other than the basic methods are defined, marking the code block as CLASS, wherein the basic methods at least comprise constructor/getter/setter/toString methods;

if the code header contains class, the code content contains statements defining internal attributes, and no methods other than the basic methods are defined, marking the code block as POJO;

if the code header contains abstract and class and there is no code content, marking the code block as ABSTRACT_CLASS;

if the code header contains abstract but not class and there is no code content, marking the code block as ABSTRACT;

if the code header contains neither abstract nor class and there is no code content, marking the code block as ABSTRACT_METHOD;

if the code header contains interface, marking the code block as INTERFACE.

2. The method of claim 1, wherein scanning and reasoning incomplete codes in the internal code library based on training results comprises:

and obtaining a parameter which enables the log-likelihood conditional probability function to take the maximum value through training, and scanning incomplete codes in an internal code library and naming recommended codes through the parameter.

3. The method of claim 1, wherein scanning the incomplete code according to a defined attribute determination rule to obtain a corresponding class tag value and a corresponding service tag value comprises:

scanning the unfinished code according to the defined attribute judgment rules to obtain the corresponding category tag value, and obtaining the corresponding service tag value based on the category tag value.

4. The method of claim 1, further comprising:

and in response to the user selecting to use the recommended name, putting the recommended name into an internal code base and serving as a training object of the next time.

5. The method of claim 1, wherein the code naming comprises class naming and/or method naming.

6. The method of claim 1, wherein the unfinished code in the internal code base comprises code that has not been audited and/or code that has not passed the audit.

7. A code naming recommendation system, comprising:

the attribute judgment module is configured to scan the unfinished code through a defined attribute judgment rule to obtain a corresponding category tag value and a corresponding service tag value; and

the rule pruning module is configured to define a pruning rule based on the category tag value and the service tag value, and reconstruct the initial recommended name according to the pruning rule to obtain a recommended name;

if the code header contains class, the code content contains statements defining internal attributes, and no methods other than the basic methods are defined, marking the code block as POJO;

if the code header contains abstract and class and there is no code content, marking the code block as ABSTRACT_CLASS;

8. A computer-readable storage medium, characterized in that computer program instructions are stored which, when executed, implement the method according to any one of claims 1-6.

9. A computer device comprising a memory and a processor, characterized in that the memory has stored therein a computer program which, when executed by the processor, performs the method according to any one of claims 1-6.