CN114443476A - Code review method and device - Google Patents

Code review method and device

Info

Publication number
CN114443476A
CN114443476A (application CN202210025868.1A)
Authority
CN
China
Prior art keywords
code
historical
review
code review
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210025868.1A
Other languages
Chinese (zh)
Inventor
喻煜阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd filed Critical Alibaba Cloud Computing Ltd
Priority to CN202210025868.1A priority Critical patent/CN114443476A/en
Publication of CN114443476A publication Critical patent/CN114443476A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3604 Software analysis for verifying properties of programs
    • G06F 11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3604 Software analysis for verifying properties of programs
    • G06F 11/3616 Software analysis for verifying properties of programs using software metrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The code review method provided by the embodiments of the present application comprises the following steps: obtaining a code segment to be reviewed; and obtaining, according to a code review model, a code review suggestion for the code segment to be reviewed. The code review model searches, among the words included in historical code review suggestions, for words whose degree of association with the characters included in the code segment to be reviewed satisfies a preset association-degree condition, takes these words as associated words, and generates the code review suggestion for the code segment to be reviewed according to the associated words. Compared with the limited domain knowledge of a human reviewer, the code review model is trained on a large amount of prior knowledge and can recognize more problems; obtaining the code review suggestion from the model therefore improves both the efficiency and the quality of code review.

Description

Code review method and device
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a code review method, a code review device, an electronic device, and a storage medium.
Background
Code review means that, after a developer finishes writing or modifying code and submits the code file to the code repository, a reviewer reads the code to check whether the submitted code meets the coding standard and to assess its quality. Code review is an indispensable link in the software development process; it improves the readability and maintainability of the code and safeguards software development quality.
Currently, code review of newly written or modified code is generally performed with either a static scanning tool or a deep learning model. A static scanning tool, however, relies on a fixed rule set mapping code modifications to review opinions; modifications not covered by the limited rule set cannot be matched to any review opinion, which narrows the review coverage for code modifications. A deep learning model, in turn, typically requires reviewing the full set of code source files, and the resulting workload makes the review inefficient.
Therefore, how to improve the efficiency of code review for code modifications is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The embodiments of the present application provide a code review method, a code review device, an electronic device, and a computer storage medium, so as to solve the technical problem in the prior art that code review for code modifications is inefficient. The present application also relates to a training method and device for the code review model, an electronic device, and a storage medium.
The embodiments of the present application provide a code review method, comprising: obtaining a code segment to be reviewed; and obtaining, according to a code review model, a code review suggestion for the code segment to be reviewed. The code review model searches, among the words included in historical code review suggestions, for associated words corresponding to the characters included in the code segment to be reviewed, where the degree of association between those characters and the associated words satisfies a preset association-degree condition, and generates the code review suggestion for the code segment according to the associated words.
Optionally, obtaining the code review suggestion for the code segment to be reviewed according to the code review model comprises: obtaining the code character embedding vector corresponding to the code segment to be reviewed; and generating, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library, a code review suggested word embedding vector whose degree of association with the code character embedding vector satisfies the preset association-degree condition.
Optionally, obtaining the code character embedding vector corresponding to the code segment to be reviewed comprises: splitting the code segment into characters to obtain a plurality of sub-code characters; obtaining sub-code character embedding vectors corresponding to the plurality of sub-code characters; and combining the obtained sub-code character embedding vectors in a preset combination mode to form the code character embedding vector.
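The split-then-combine step above can be sketched as follows. The toy character vocabulary, the 4-dimensional embeddings, and element-wise summation as the "preset combination mode" are illustrative assumptions, not details fixed by the application.

```python
# Illustrative sketch: split a code fragment into sub-code characters, look
# up a per-character embedding, and combine by element-wise summation.
# The vocabulary and the 4-dimensional embeddings are assumptions.

def split_characters(code_segment):
    """Character splitting: each non-whitespace character is a sub-code character."""
    return [ch for ch in code_segment if not ch.isspace()]

def embed_segment(code_segment, char_embeddings, dim=4):
    """Sum the sub-code character embedding vectors into one segment vector."""
    total = [0.0] * dim
    for ch in split_characters(code_segment):
        vec = char_embeddings.get(ch, [0.0] * dim)  # unseen characters map to zeros
        total = [t + v for t, v in zip(total, vec)]
    return total

char_embeddings = {"i": [1, 0, 0, 0], "+": [0, 1, 0, 0], "1": [0, 0, 1, 0]}
print(embed_segment("i + 1", char_embeddings))  # -> [1.0, 1.0, 1.0, 0.0]
```

Summation is only one possible combination mode; concatenation or averaging would fit the same description.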
Optionally, the method further comprises: obtaining sub-code character vectors corresponding to the plurality of sub-code characters. Obtaining the sub-code character embedding vectors corresponding to the plurality of sub-code characters then comprises: generating, for each sub-code character vector, a dimension-reduced sub-code character embedding vector through a multi-layer embedding network model. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
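A minimal sketch of such a multi-layer embedding network follows: a high-dimensional (here, one-hot) character vector is passed through two layers to obtain a lower-dimensional embedding. The layer sizes (6 to 4 to 2), the uniform weights, and the ReLU non-linearity are all assumptions for illustration.

```python
# Minimal sketch of a multi-layer embedding network that maps a
# high-dimensional one-hot character vector to a lower-dimensional
# embedding. Layer sizes (6 -> 4 -> 2) and weights are illustrative.

def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a column vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def embed(char_vec, layers):
    """Apply each layer in turn; a ReLU keeps the mapping non-linear."""
    out = char_vec
    for weights in layers:
        out = [max(0.0, x) for x in matvec(weights, out)]
    return out

one_hot = [0, 0, 1, 0, 0, 0]             # character in a 6-symbol vocabulary
layer1 = [[0.5] * 6 for _ in range(4)]   # 6 -> 4
layer2 = [[0.25] * 4 for _ in range(2)]  # 4 -> 2
print(embed(one_hot, [layer1, layer2]))  # -> [0.5, 0.5]
```

In practice the layer weights would be learned during the model training described later, rather than fixed constants.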
Optionally, generating the code review suggested word embedding vector whose degree of association with the code character embedding vector satisfies the preset association-degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library, comprises: matching each of the sub-code character embedding vectors included in the code character embedding vector against the sub-historical code character embedding vectors included in the historical code segment character embedding vectors in the historical code review suggestion library; obtaining the sub-historical code character embedding vector corresponding to each sub-code character embedding vector; obtaining the sub-historical code review suggested word embedding vector corresponding to that sub-historical code character embedding vector; taking that sub-historical code review suggested word embedding vector as the associated word embedding vector corresponding to the sub-code character embedding vector; and generating, from the obtained associated word embedding vectors, the code review suggested word embedding vector whose degree of association with the code character embedding vector satisfies the preset association-degree condition.
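The matching step above can be sketched as a similarity lookup over stored (character embedding, suggested-word embedding) pairs. The library contents, the use of cosine similarity as the association degree, and the 0.9 threshold are assumptions; the application does not fix a specific association measure for this step.

```python
import math

# Sketch of the matching step: each sub-code character embedding is matched
# against the sub-historical code character embeddings in the suggestion
# library, and the paired review-suggestion word embedding is returned as
# the associated word embedding. Library contents and threshold are assumed.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# (sub-historical code character embedding, paired suggested-word embedding)
library = [
    ([1.0, 0.0], [0.2, 0.8]),
    ([0.0, 1.0], [0.9, 0.1]),
]

def associated_word_embeddings(sub_char_embeddings, threshold=0.9):
    """Return suggested-word embeddings whose paired character embedding is
    sufficiently associated (cosine >= threshold) with each query vector."""
    result = []
    for query in sub_char_embeddings:
        for hist_char, hist_word in library:
            if cosine(query, hist_char) >= threshold:
                result.append(hist_word)
    return result

print(associated_word_embeddings([[0.95, 0.05]]))  # -> [[0.2, 0.8]]
```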
Optionally, the method further comprises: screening the plurality of sub-code characters according to a preset screening condition to obtain a plurality of target sub-code characters. Obtaining the sub-code character vectors corresponding to the plurality of sub-code characters then comprises: obtaining target sub-code character vectors corresponding to the plurality of target sub-code characters.
Optionally, the code segment to be reviewed is a changed code segment in the source code; obtaining the code review suggestion for the code segment to be reviewed according to the code review model comprises: obtaining a code review suggestion for the changed code segment according to the code review model.
The embodiments of the present application further provide a training method for the code review model, comprising: obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment; obtaining a historical code segment character embedding vector corresponding to the historical code segment and a historical code review suggested word embedding vector corresponding to the historical code review suggestion; obtaining, according to an initial code review model, a first code review suggested word embedding vector for the historical code segment character embedding vector; calculating the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, the cosine similarity representing the similarity between the two vectors; and training the initial code review model according to the cosine similarity.
Optionally, obtaining the historical code segment character embedding vector corresponding to the historical code segment and the historical code review suggested word embedding vector corresponding to the historical code review suggestion comprises: splitting the historical code segment into characters to obtain a plurality of sub-historical code characters; performing word segmentation on the historical code review suggestion to obtain a plurality of sub-historical code review suggested words; obtaining sub-historical code character embedding vectors corresponding to the sub-historical code characters and sub-historical code review suggested word embedding vectors corresponding to the sub-historical code review suggested words; taking the vector obtained by accumulating the obtained sub-historical code character embedding vectors as the historical code segment character embedding vector; and, correspondingly, taking the vector obtained by accumulating the obtained sub-historical code review suggested word embedding vectors as the historical code review suggested word embedding vector.
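The accumulation described above is the same operation on both sides; one helper covers both the character embeddings of a historical code segment and the word embeddings of a historical review suggestion. The 3-dimensional toy embeddings are assumptions.

```python
# Sketch of the accumulation step: the segment-level embedding on either
# side (characters of a historical code segment, words of a historical
# review suggestion) is the element-wise sum of its sub-embeddings.

def accumulate(sub_embeddings, dim=3):
    """Element-wise sum of a list of equal-length embedding vectors."""
    total = [0.0] * dim
    for vec in sub_embeddings:
        total = [t + v for t, v in zip(total, vec)]
    return total

# e.g. word embeddings for a review suggestion tokenised into two words
word_vecs = [[1, 2, 3], [4, 1, 0]]
print(accumulate(word_vecs))  # -> [5.0, 3.0, 3.0]
```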
Optionally, obtaining the sub-historical code character embedding vectors corresponding to the plurality of sub-historical code characters and the sub-historical code review suggested word embedding vectors corresponding to the plurality of sub-historical code review suggested words comprises: obtaining sub-historical code character vectors corresponding to the sub-historical code characters and sub-historical code review suggested word vectors corresponding to the sub-historical code review suggested words; and generating a plurality of dimension-reduced sub-historical code character embedding vectors and a plurality of dimension-reduced sub-historical code review suggested word embedding vectors through a multi-layer embedding network model. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
Optionally, calculating the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector comprises: calculating the cosine similarity according to a first formula, wherein the first formula is:
similarity(A, B) = ( Σ_{i=1..n} A_i·B_i ) / ( √(Σ_{i=1..n} A_i²) · √(Σ_{i=1..n} B_i²) )

wherein similarity(A, B) represents the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector; A represents the first code review suggested word embedding vector, and A_i represents its i-th element; B represents the historical code review suggested word embedding vector, and B_i represents its i-th element.
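The first formula translates directly into code. The following is a plain implementation of cosine similarity for checking the formula by hand, not the application's actual training code.

```python
import math

def cosine_similarity(a, b):
    """similarity(A, B) = sum_i(A_i * B_i) / (||A|| * ||B||),
    matching the first formula above."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors have similarity 1; orthogonal vectors have similarity 0.
print(cosine_similarity([1.0, 2.0, 2.0], [1.0, 2.0, 2.0]))  # -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # -> 0.0
```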
Optionally, training the initial code review model according to the cosine similarity comprises: obtaining, according to the cosine similarity, a first loss between the first code review suggested word embedding vector and the historical code review suggested word embedding vector; passing the first loss back into the code review model being trained; and having the model adjust its parameters according to the first loss.
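The application specifies only that a first loss is derived from the cosine similarity and passed back to adjust the model's parameters. The toy sketch below makes one possible concrete choice: the loss is taken as 1 − cosine similarity, and a single hypothetical scalar parameter is adjusted by numerical gradient descent. None of these specifics are fixed by the application.

```python
import math

# Toy sketch of the parameter-adjustment step, assuming the first loss is
# 1 - cosine_similarity and the model has one scalar parameter theta.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

segment_vec = [1.0, 1.0]  # historical code segment character embedding
target_vec = [1.0, 1.0]   # historical code review suggested word embedding

def predict(theta):
    """Hypothetical one-parameter model: theta shifts the first component."""
    return [segment_vec[0] + theta, segment_vec[1]]

def first_loss(theta):
    return 1.0 - cosine_similarity(predict(theta), target_vec)

theta, lr, eps = -0.5, 0.5, 1e-6
for _ in range(50):  # pass the loss back and adjust the parameter
    grad = (first_loss(theta + eps) - first_loss(theta - eps)) / (2 * eps)
    theta -= lr * grad

print(round(first_loss(theta), 8))  # loss shrinks toward 0 as theta -> 0
```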
Optionally, the method further comprises: terminating the training of the code review model when the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector is less than a predetermined threshold; at this point, the degree of association between the first code review suggestion corresponding to the first code review suggested word embedding vector and the historical code segment satisfies the preset association-degree condition.
An embodiment of the present application further provides a code review device, comprising: an obtaining unit, configured to obtain the code segment to be reviewed; and a code review suggestion generating unit, configured to obtain, according to the code review model, a code review suggestion for the code segment to be reviewed. The code review model searches, among the words included in historical code review suggestions, for associated words corresponding to the characters included in the code segment to be reviewed, where the degree of association between those characters and the associated words satisfies a preset association-degree condition, and generates the code review suggestion according to the associated words.
The embodiments of the present application further provide a training device for the code review model, comprising: a first obtaining unit, configured to obtain a historical code segment and a historical code review suggestion corresponding to the historical code segment; a second obtaining unit, configured to obtain a historical code segment character embedding vector corresponding to the historical code segment and a historical code review suggested word embedding vector corresponding to the historical code review suggestion; a third obtaining unit, configured to obtain, according to an initial code review model, a first code review suggested word embedding vector for the historical code segment character embedding vector; a calculating unit, configured to calculate the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, the cosine similarity representing the similarity between the two vectors; and a training unit, configured to train the initial code review model according to the cosine similarity.
An embodiment of the present application further provides an electronic device, comprising: a processor; and a memory for storing a program which, when read and run by the processor, performs any one of the methods described above.
Embodiments of the present application further provide a computer storage medium storing a computer program which, when executed, performs any one of the methods described above.
Compared with the prior art, the method has the following advantages:
the code review method provided by the embodiment of the application comprises the following steps: obtaining a code segment to be evaluated; obtaining a code evaluation suggestion aiming at the code segment to be evaluated according to a code evaluation model; the code review model is used for searching relevant words corresponding to characters included in the code segment to be reviewed from words included in historical code review suggestions, the relevance between the characters included in the code segment to be reviewed and the relevant words meets a preset relevance condition, and the code review suggestions are generated for the code segment to be reviewed according to the relevant words.
The code review method provided by the application adopts the code review model to obtain the code review suggestions according to the code segments to be reviewed, and specifically, the code review model generates the corresponding code review suggestions for the code segments to be reviewed according to the corresponding relation between the characters included in the historical code segments pre-stored in the historical code review suggestion library and the words included in the historical code review suggestions. Compared with the limited domain knowledge of the reviewer, the code review model is obtained through massive priori knowledge training, more problems can be recognized, and the code review suggestion corresponding to the code segment to be reviewed is obtained according to the code review model, so that the code review efficiency is improved, and the code review quality is improved.
Drawings
Fig. 1 is an application scenario diagram of a training method for a code review model according to an embodiment of the present application.
Fig. 2 is a flowchart of a code review method according to a first embodiment of the present application.
Fig. 3 is a schematic diagram of a code review device according to a second embodiment of the present application.
Fig. 4 is a flowchart of a method for training a code review model according to a third embodiment of the present application.
Fig. 5 is a schematic diagram of a training apparatus for a code review model according to a fourth embodiment of the present application.
Fig. 6 is a schematic view of an electronic device according to a fifth embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The embodiments of the present application provide a code review method in which, according to the correspondence between the characters included in historical code segments stored in a historical code review suggestion library within the code review model and the words included in the historical code review suggestions, associated words whose degree of association with the characters of the code segment to be reviewed satisfies a preset association-degree condition are searched for, and a code review suggestion corresponding to the code segment to be reviewed is generated according to the associated words. Because the review does not require reading the full code data, the efficiency of reviewing changed code segments is improved. Compared with the limited domain knowledge of a human reviewer, the code review model is trained on a large amount of prior knowledge and can recognize more problems; obtaining the code review suggestion from the model therefore improves both the efficiency and the quality of code review.
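The flow just described can be sketched end to end. The library contents, the character-overlap measure standing in for the trained association degree, and the suggestion strings are all illustrative assumptions, not the application's actual model.

```python
# End-to-end toy sketch of the described flow: take a changed code segment,
# find the most associated historical segment in a hypothetical historical
# code review suggestion library, and emit its stored review suggestion.
# Character overlap stands in for the learned association degree.

def association_degree(segment, historical_segment):
    """Fraction of the segment's characters that also occur historically."""
    chars = [c for c in segment if not c.isspace()]
    hist = set(historical_segment)
    return sum(1 for c in chars if c in hist) / len(chars)

library = {  # historical code segment -> historical review suggestion
    "for(i=0;i<n;i++)": "Consider a range-based loop for readability.",
    "if(p!=NULL)": "Prefer an explicit null-check helper.",
}

def review(segment):
    """Return the suggestion paired with the most associated history entry."""
    best = max(library, key=lambda h: association_degree(segment, h))
    return library[best]

print(review("for(j=0;j<m;j++)"))  # matches the stored loop segment
```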
The training method of the code review model provided by the embodiment of the application can be executed by any computer with data processing capability, such as a terminal device or a server.
Fig. 1 is a schematic application scenario diagram of a training method for a code review model according to an embodiment of the present application, which is described in detail below with reference to fig. 1.
As shown in fig. 1, the training process for the code review model includes: a data processing phase, a model training phase, and a code review proposal generation phase.
In the data processing stage 101, the server receives historical code review request data 101-1 sent by the user side and splits it to obtain the historical code review suggestion 101-2 and the historical code alteration segment 101-3. Word segmentation is performed on the historical code review suggestion 101-2 to obtain a plurality of sub-historical code review suggested words, and character splitting is performed on the historical code alteration segment 101-3 to obtain a plurality of sub-historical code characters.
Sub-historical code character vectors 101-5 corresponding to the plurality of sub-historical code characters and sub-historical code review suggested word vectors corresponding to the plurality of sub-historical code review suggested words are obtained; a plurality of dimension-reduced sub-historical code character embedding vectors 101-4 and a plurality of dimension-reduced sub-historical code review suggested word embedding vectors are then generated through a multi-layer embedding network model. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
The vector obtained by accumulating the obtained sub-historical code character embedding vectors is used as the historical code segment character embedding vector; correspondingly, the vector obtained by accumulating the obtained sub-historical code review suggested word embedding vectors is used as the historical code review suggested word embedding vector.
In addition, the obtained sub-historical code character embedding vectors may be accumulated and the accumulated vector normalized, with the normalized vector used as the historical code segment character embedding vector; likewise, the obtained sub-historical code review suggested word embedding vectors may be accumulated and the accumulated vector normalized, with the normalized vector used as the historical code review suggested word embedding vector.
For example, suppose the vector obtained by accumulating the obtained sub-historical code character embedding vectors is {3, 2, 0, 1, 3}; to improve computational efficiency, this vector is normalized, and the normalized vector is used as the historical code segment character embedding vector. Correspondingly, suppose the vector obtained by accumulating the obtained sub-historical code review suggested word embedding vectors is {3, 3, 6, 2, 3}; this vector is likewise normalized, and the normalized vector is used as the historical code review suggested word embedding vector.
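Assuming the normalization above is Euclidean (L2) normalization (the application does not specify which norm is used), the example vectors can be normalized as follows:

```python
import math

# L2 normalization applied to the example vectors from the text: each
# accumulated vector is divided by its Euclidean norm, so that downstream
# similarity computations work on unit-length vectors.

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

char_sum = [3, 2, 0, 1, 3]  # accumulated sub-historical code character embeddings
word_sum = [3, 3, 6, 2, 3]  # accumulated review-suggestion word embeddings

unit_char = normalize(char_sum)
unit_word = normalize(word_sum)
print([round(x, 4) for x in unit_char])
```

After normalization, the cosine similarity of two vectors reduces to a plain dot product, which is the efficiency gain the text alludes to.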
In the model training stage 102, a first code review suggested word embedding vector is obtained for the historical code segment character embedding vector according to the initial code review model; the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector is calculated, and a first loss between the two vectors is obtained according to the cosine similarity; the first loss is passed back into the code review model being trained, and the model adjusts its parameters according to the first loss.
The historical code review suggested word embedding vector is 102-1, the historical code segment character embedding vector is 102-2, and Rel(C, R) 102-3 represents the degree of association between the historical code review suggested word embedding vector 102-1 and the historical code segment character embedding vector 102-2.
The training of the code review model is terminated when the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector is less than the predetermined threshold; at this point, the degree of association between the first code review suggestion corresponding to the first code review suggested word embedding vector and the historical code segment satisfies the preset association-degree condition.
In the code review suggestion generation stage 103, the code segment 103-1 to be reviewed is input into the trained code review model 103-2, and a code review suggestion 103-3 corresponding to the code segment 103-1 to be reviewed output by the trained code review model 103-2 is obtained.
The code review model searches for relevant words with the relevance degree meeting a preset relevance degree condition with the characters included in the code segment to be reviewed according to the corresponding relation between the characters included in the historical code segment in the historical code review suggestion library and the words included in the historical code review suggestion, and generates corresponding code review suggestions for the code segment to be reviewed according to the relevant words.
The code review method is described in more detail below with reference to specific embodiments.
First embodiment
Fig. 2 is a flowchart of a code review method according to a first embodiment of the present application, which includes the following steps.
Step S201: obtain the code segment to be reviewed.
This step obtains the code segment to be reviewed, providing the data basis for subsequently deriving the code review suggestion corresponding to it.
The code segment to be reviewed is a changed code segment in the source code.
The present application uses the code review model to generate the corresponding code review suggestion for the code segment to be reviewed; the suggestion corresponding to a changed code segment is derived from the changed segment itself rather than from the full code source file, which improves the efficiency of generating review suggestions for the changed part of the code.
Step S202: obtain a code review suggestion for the code segment to be reviewed according to the code review model.
This step uses the code review model to obtain a code review suggestion for the code segment to be reviewed.
The code review model is used for searching relevant words corresponding to characters included in the code segment to be reviewed from words included in historical code review suggestions, the relevance between the characters included in the code segment to be reviewed and the relevant words meets a preset relevance condition, and the code review suggestions are generated for the code segment to be reviewed according to the relevant words.
The words included in the historical code review suggestions are the words of the historical code review suggestions stored in advance in the historical code review suggestion library within the code review model.
Specifically, the historical code review suggestion library stores the correspondence between historical code segments and historical code review suggestions in embedded form: each historical code segment character embedding vector is stored together with the historical code review suggested word embedding vector of the review suggestion to which it corresponds.
The obtaining of the code review suggestion for the code segment to be reviewed according to the code review model may be: acquiring the code character embedding vector to be reviewed corresponding to the code segment to be reviewed; and generating a code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies a preset association degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library.
That is, after the code segment to be reviewed is obtained, the code character embedding vector corresponding to it is acquired, the historical code review suggestion library is queried for embedding vectors related to that code character embedding vector, and the historical code review suggestion data corresponding to the code review suggested word embedding vectors whose similarity to the code character embedding vector reaches the preset degree are then used to generate the review suggestion.
The acquiring of the code character embedding vector to be reviewed corresponding to the code segment to be reviewed may refer to: performing character splitting processing on the code segment to be reviewed to obtain a plurality of sub-code characters; obtaining the sub-code character embedding vectors corresponding to the plurality of sub-code characters; and combining the obtained plurality of sub-code character embedding vectors in a preset combination mode to form the code character embedding vector to be reviewed.
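The splitting-and-combining step above can be sketched as follows. This is an illustrative sketch only: the whitespace-stripping tokenization rule, the 8-dimensional hash-based stand-in for a learned character embedding, and element-wise accumulation as the "preset combination mode" are all assumptions, not part of the disclosure.

```python
# Hypothetical sketch of character splitting and embedding combination.
import hashlib

EMBED_DIM = 8  # assumed embedding dimensionality

def split_into_characters(code_fragment: str) -> list:
    """Split a code fragment to be reviewed into sub-code characters."""
    return [ch for ch in code_fragment if not ch.isspace()]

def char_embedding(ch: str) -> list:
    """Deterministic stand-in for a learned character embedding lookup."""
    digest = hashlib.md5(ch.encode()).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]

def combine(embeddings: list) -> list:
    """Combine sub-code character embeddings by element-wise accumulation
    (one possible 'preset combination mode')."""
    return [sum(col) for col in zip(*embeddings)]

chars = split_into_characters("int x=1;")          # 7 sub-code characters
fragment_vec = combine([char_embedding(c) for c in chars])
```

A real system would replace the hash-based lookup with the learned embedding table produced by the training procedure of the third embodiment.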
After obtaining the plurality of sub-code characters, the method further comprises: obtaining the sub-code character vectors corresponding to the plurality of sub-code characters.
The obtaining of the sub-code character embedding vectors corresponding to the plurality of sub-code characters includes: for each sub-code character vector, generating a dimension-reduced sub-code character embedding vector through a multi-layer embedding network model. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
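A minimal sketch of such a multi-layer embedding network is given below. The layer sizes (64 → 16 → 4), the tanh nonlinearity, and the random initialization are illustrative assumptions; the patent does not specify the network architecture.

```python
# Hypothetical two-layer embedding network: maps a high-dimensional one-hot
# character vector to a low-dimensional (dimension-reduced) embedding vector.
import math
import random

random.seed(0)

def make_layer(n_in, n_out):
    """Random linear layer (weight matrix of shape n_out x n_in)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(layer, vec):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in layer]

VOCAB, HIDDEN, EMBED = 64, 16, 4   # high-dimensional input -> reduced embedding

layer1 = make_layer(VOCAB, HIDDEN)
layer2 = make_layer(HIDDEN, EMBED)

def embed(one_hot):
    hidden = [math.tanh(v) for v in forward(layer1, one_hot)]
    return forward(layer2, hidden)

one_hot = [0.0] * VOCAB
one_hot[7] = 1.0                   # hypothetical character index 7
reduced = embed(one_hot)           # 4-dimensional embedding vector
```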
After obtaining the plurality of sub-code characters, the method further comprises: screening the plurality of sub-code characters according to a preset screening condition to obtain a plurality of target sub-code characters. In this case, the obtaining of the sub-code character vectors corresponding to the plurality of sub-code characters includes: obtaining the target sub-code character vectors corresponding to the plurality of target sub-code characters.
The generating of the code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies the preset association degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library, comprises: matching each of the plurality of sub-code character embedding vectors included in the code character embedding vector to be reviewed against the sub-historical code character embedding vectors included in the historical code segment character embedding vectors in the historical code review suggestion library; obtaining the sub-historical code character embedding vector corresponding to each sub-code character embedding vector; obtaining the sub-historical code review suggested word embedding vector corresponding to that sub-historical code character embedding vector; taking the sub-historical code review suggested word embedding vector as the related word embedding vector corresponding to the sub-code character embedding vector; and generating, from the plurality of related word embedding vectors thus obtained, a code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies the preset association degree condition.
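The matching-and-retrieval steps above can be sketched as follows. The library contents, the toy two-dimensional vectors, and the 0.8 similarity threshold are assumptions made for illustration; the patent only requires that the association degree satisfy "a preset association degree condition".

```python
# Hedged sketch: match each sub-code character embedding against the
# sub-historical code character embeddings in the library, and collect the
# suggestion-word embedding associated with the best match.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# library entries: (sub-historical code character embedding,
#                   sub-historical code review suggested word embedding)
library = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([0.0, 1.0], [0.2, 0.8]),
]

def related_word_embeddings(sub_char_embeddings, threshold=0.8):
    related = []
    for sub_vec in sub_char_embeddings:
        best = max(library, key=lambda entry: cosine(sub_vec, entry[0]))
        if cosine(sub_vec, best[0]) >= threshold:  # preset association condition
            related.append(best[1])
    return related

related = related_word_embeddings([[0.9, 0.1], [0.1, 0.95]])
```

The final aggregation of the related word embedding vectors into one code review suggested word embedding vector is left unspecified here, as the patent does not fix a particular aggregation rule.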
Second embodiment
Corresponding to the application scenario embodiment of the training method of the code review model provided by the present application and the code review method provided by the first embodiment, a second embodiment of the present application further provides a code review device. Since the device embodiment is basically similar to the application scenario embodiment and the first embodiment, the description is relatively simple, and reference may be made to the application scenario embodiment and a part of the description of the first embodiment for relevant points. The device embodiments described below are merely illustrative.
Please refer to fig. 3, which is a schematic diagram of a code review device provided in a second embodiment of the present application.
The code review device comprises:
an obtaining unit 301, configured to obtain a code segment to be evaluated;
the code review generating unit 302 is configured to obtain a code review suggestion for the code segment to be reviewed according to the code review model;
the code review model is used to search, among the words included in the historical code review suggestions, for related words corresponding to the characters included in the code segment to be reviewed, where the association degree between those characters and the related words satisfies a preset association degree condition; the code review suggestion for the code segment to be reviewed is then generated from the related words.
Optionally, the obtaining, according to the code review model, of a code review suggestion for the code segment to be reviewed includes: acquiring the code character embedding vector to be reviewed corresponding to the code segment to be reviewed; and generating a code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies a preset association degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library.
Optionally, the acquiring of the code character embedding vector to be reviewed corresponding to the code segment to be reviewed includes: performing character splitting processing on the code segment to be reviewed to obtain a plurality of sub-code characters; obtaining the sub-code character embedding vectors corresponding to the plurality of sub-code characters; and combining the obtained plurality of sub-code character embedding vectors in a preset combination mode to form the code character embedding vector to be reviewed.
Optionally, the apparatus is further configured to: obtain the sub-code character vectors corresponding to the plurality of sub-code characters. The obtaining of the sub-code character embedding vectors corresponding to the plurality of sub-code characters includes: for each sub-code character vector, generating a dimension-reduced sub-code character embedding vector through a multi-layer embedding network model; the multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
Optionally, the generating of the code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies the preset association degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library, includes: matching each of the plurality of sub-code character embedding vectors included in the code character embedding vector to be reviewed against the sub-historical code character embedding vectors included in the historical code segment character embedding vectors in the historical code review suggestion library; obtaining the sub-historical code character embedding vector corresponding to each sub-code character embedding vector; obtaining the sub-historical code review suggested word embedding vector corresponding to that sub-historical code character embedding vector; taking the sub-historical code review suggested word embedding vector as the related word embedding vector corresponding to the sub-code character embedding vector; and generating, from the plurality of related word embedding vectors thus obtained, a code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies the preset association degree condition.
Optionally, the apparatus is further configured to: screen the plurality of sub-code characters according to a preset screening condition to obtain a plurality of target sub-code characters. The obtaining of the sub-code character vectors corresponding to the plurality of sub-code characters includes: obtaining the target sub-code character vectors corresponding to the plurality of target sub-code characters.
Optionally, the code segment to be reviewed is a code segment changed in the source code; and the obtaining of the code review suggestion for the code segment to be reviewed according to the code review model includes: obtaining a code review suggestion for the changed code segment according to the code review model.
Third embodiment
The first embodiment provides a code review method, which mainly uses a code review model to generate, for a code segment to be reviewed, a code review suggestion whose association degree satisfies a preset association degree condition. Correspondingly, the third embodiment of the present application provides a training method for the code review model. Since the code review method has been described in detail in the first embodiment, reference is made to that description for relevant points, and details are not repeated here.
A third embodiment of the present application provides a method for training a code review model, please refer to fig. 4, which is a flowchart of the method for training a code review model provided in the third embodiment of the present application.
Step S401: obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment.
This step is used to obtain the historical code segments and the historical code review suggestions as training data for training the code review model.
The historical code segments and their corresponding historical code review suggestions are used as training data because the association degree between a historical code segment and its historical code review suggestion satisfies the preset association degree condition. The code review model obtained by training in this embodiment can thus generate, for a code segment to be reviewed, a code review suggestion whose association degree with the code segment satisfies the preset association degree condition.
Therefore, in the embodiment of the present application, each historical code segment and its historical code review suggestion are used as a sample data pair. A first code review suggestion corresponding to the historical code segment is generated through the initial code review model; cosine similarity is calculated between the first code review suggestion and the historical code review suggestion stored in the historical code review suggestion library for that historical code segment; the difference degree between the first code review suggestion and the historical code review suggestion is determined according to the obtained cosine similarity value; and the code review model is further trained according to the difference degree.
The training process of the code review model is described in detail by the following steps.
Before the code review model is trained, data processing needs to be performed on the above historical code segments and the historical code review suggestions, please refer to step S402.
Step S402: and acquiring historical code segment character embedding vectors corresponding to the historical code segments, and historical code review suggestion word embedding vectors corresponding to the historical code review suggestions.
The step is used for carrying out data vectorization processing on the obtained historical code segments and the historical code review suggestions.
The obtaining of the historical code segment character embedding vector corresponding to the historical code segment and the historical code review suggestion word embedding vector corresponding to the historical code review suggestion includes:
performing character splitting processing on the historical code segments to obtain a plurality of sub-historical code characters; and performing word segmentation processing on the historical code review suggestion to obtain a plurality of sub-historical code review suggestion words.
A historical code segment is a changed historical code segment in the source code. A changed historical code segment usually includes special tokens that are specific to the code project and occur with low frequency, such as variable names, derived words, method names, and compound words. To avoid the situation in which such special tokens cannot be expressed by word vectors, the historical code segment is processed by character splitting and data vectorization to obtain the sub-historical code character embedding vectors corresponding to the plurality of sub-historical code characters.
Word segmentation processing and data vectorization processing are performed on the historical code review suggestion to obtain the sub-historical code review suggested word embedding vectors corresponding to the plurality of sub-historical code review suggested words.
Accumulating the obtained multiple sub-historical code character embedding vectors to obtain a vector which is used as the historical code segment character embedding vector; correspondingly, the vector obtained by accumulating the obtained multiple sub-historical code review suggested word embedding vectors is used as the historical code review suggested word embedding vector.
In addition, the obtained multiple sub-history code character embedding vectors can be accumulated, and the accumulated vectors are normalized to obtain the vectors as the history code segment character embedding vectors. And accumulating the obtained multiple sub-historical code review suggested word embedding vectors, and normalizing the accumulated vectors to obtain the vectors serving as the historical code review suggested word embedding vectors.
For example, the vector obtained by accumulating the obtained sub-historical code character embedding vectors is {3, 2, 0, 1, 3}; to improve calculation efficiency, this vector is normalized, and the normalized vector is used as the historical code segment character embedding vector. Correspondingly, the vector obtained by accumulating the obtained sub-historical code review suggested word embedding vectors is {3, 3, 6, 2, 3}; this vector is likewise normalized, and the normalized vector is used as the historical code review suggested word embedding vector.
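The accumulation-plus-normalization described above can be sketched with the example vector {3, 2, 0, 1, 3} from the text. L2 (unit-length) normalization is an assumption made here; the patent does not specify which normalization is used.

```python
# Sketch of accumulating sub-embedding vectors and normalizing the result.

def l2_normalize(vec):
    """Scale a vector to unit length (assumed normalization scheme)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec]

# sum of the sub-historical code character embedding vectors, per the example
accumulated = [3, 2, 0, 1, 3]
normalized = l2_normalize(accumulated)  # historical code segment embedding
```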
In addition, the historical code character embedding vector and the historical code review suggested word embedding vector are obtained by the following method:
the obtaining of the sub-historical code character embedding vectors corresponding to the plurality of sub-historical code characters and the sub-historical code review suggested word embedding vectors corresponding to the plurality of sub-historical code review suggested words includes: obtaining the sub-historical code character vectors corresponding to the sub-historical code characters and the sub-historical code review suggested word vectors corresponding to the sub-historical code review suggested words; and generating, through a multi-layer embedding network model, a plurality of dimension-reduced sub-historical code character embedding vectors and a plurality of dimension-reduced sub-historical code review suggested word embedding vectors. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
For example, suppose the historical code review suggestion includes 5 sub-historical code review suggested words, each corresponding to one sub-historical code review suggested word embedding vector. The first vector may be defined with its first element as 1 and the remaining 4 elements as 0, i.e. {1, 0, 0, 0, 0}. Accordingly, the second vector is {0, 1, 0, 0, 0}; the third is {0, 0, 1, 0, 0}; the fourth is {0, 0, 0, 1, 0}; and the fifth is {0, 0, 0, 0, 1}.
The historical code review suggested word embedding vector obtained by accumulating these five sub-historical code review suggested word embedding vectors can then be expressed as {1, 1, 1, 1, 1}.
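The worked example above can be reproduced in a few lines; the code below only restates the one-hot accumulation already described and introduces no new assumptions beyond the helper names.

```python
# Five one-hot sub-historical code review suggested word embedding vectors
# accumulate element-wise into {1, 1, 1, 1, 1}, as in the example above.

def one_hot(index, size=5):
    vec = [0] * size
    vec[index] = 1
    return vec

sub_vectors = [one_hot(i) for i in range(5)]
accumulated = [sum(col) for col in zip(*sub_vectors)]
```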
Accordingly, the historical code segment character embedding vector is obtained by a vector characterization method similar to that of the historical code review suggested word embedding vector.
Step S403: obtaining a first code review suggested word embedding vector aiming at the historical code segment character embedding vector according to an initial code review model;
in this step, the initial code review model analyzes the historical code segment character embedding vector and generates a first code review suggested word embedding vector for it.
The first code review suggested word embedding vector obtained in this step is compared with the historical code review suggestion corresponding to the historical code segment, and the initial code review model is further trained according to the difference degree obtained from the comparison.
Specifically, cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector is calculated, and the difference degree between the two is determined according to the cosine similarity, as specifically described in step S404.
Step S404: and calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, wherein the cosine similarity is used for representing the similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector.
The step is used for calculating the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector according to a formula, so as to provide a training basis for further training of the initial code review model in the step S405.
Specifically, the calculating the cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector includes:
calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector according to a first formula;
wherein the first formula is:
similarity(A, B) = (Σ_{i=1}^{n} A_i · B_i) / (√(Σ_{i=1}^{n} A_i²) · √(Σ_{i=1}^{n} B_i²))
wherein similarity(A, B) represents the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector; A represents the first code review suggested word embedding vector and A_i its i-th element; B represents the historical code review suggested word embedding vector and B_i its i-th element; n is the dimension of both embedding vectors, and n is not less than 2.
The cosine similarity value lies between 0 and 1. The larger the value, the smaller the angle between the two vectors and the more similar they are; the smaller the value, the larger the angle and the greater the degree of difference between them.
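The first formula can be implemented directly as follows; the function names are illustrative, but the computation is term-by-term the formula given above.

```python
# Direct implementation of the cosine similarity formula between the first
# code review suggested word embedding vector A and the historical one B.
import math

def similarity(A, B):
    assert len(A) == len(B) and len(A) >= 2   # n >= 2, per the text
    dot = sum(a * b for a, b in zip(A, B))
    norm_a = math.sqrt(sum(a * a for a in A))
    norm_b = math.sqrt(sum(b * b for b in B))
    return dot / (norm_a * norm_b)

# Identical vectors give similarity ~1; orthogonal non-negative vectors give 0.
s_same = similarity([1, 1, 1, 1, 1], [1, 1, 1, 1, 1])
s_orth = similarity([1, 0], [0, 1])
```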
Step S405: and training the initial code review model according to the cosine similarity.
This step is used to train the initial code review model according to the cosine similarity and to determine when to stop training it.
The training the initial code review model according to the cosine similarity comprises the following steps:
according to the cosine similarity, a first loss between the first code review suggested word embedding vector and the historical code review suggested word embedding vector is obtained; the first loss is propagated back into the code review model being trained; and the code review model adjusts its parameters according to the first loss.
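One common way to derive such a loss from the cosine similarity is 1 − similarity, so that the loss shrinks as the predicted suggestion embedding approaches the historical one. This choice is an assumption for illustration; the patent does not fix the exact form of the first loss.

```python
# Hedged sketch of deriving the first loss from the cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def first_loss(predicted, historical):
    """Loss approaches 0 as the two embedding vectors align (assumed form)."""
    return 1.0 - cosine(predicted, historical)

# Vectors pointing in the same direction yield a loss near 0.
loss = first_loss([0.5, 0.5], [1.0, 1.0])
```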
As can be seen from the above formula, a larger cosine similarity value means a smaller angle between the two vectors and a smaller degree of difference between the first code review suggested word embedding vector and the historical code review suggested word embedding vector, which indicates a better training effect of the initial code review model.
A smaller cosine similarity value means a larger angle between the two vectors and a larger degree of difference between the first code review suggested word embedding vector and the historical code review suggested word embedding vector, which indicates a poorer training effect; the initial code review model then needs to be trained further.
The training of the code review model is terminated if the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector is not less than a predetermined threshold; at this point, the association degree between the first code review suggestion corresponding to the first code review suggested word embedding vector and the historical code segment satisfies the preset association degree condition.
When the cosine similarity reaches the preset threshold, the degree of difference between the first code review suggested word embedding vector and the historical code review suggested word embedding vector falls within the preset difference degree threshold range, the initial code review model has been adjusted to a state that meets the requirement, and the training purpose is achieved.
In addition, the following condition may also be used to terminate the training of the initial code review model: as historical code segments are continuously provided to the initial code review model, the cosine similarity value between the first code review suggestion and the historical code review suggestion tends to become stable.
Such a training result indicates that the initial code review model currently in use has reached its capacity limit due to the constraints of its internal architecture; continued training is of little benefit, and training may also be stopped at this point.
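The "similarity has stabilized" stopping condition can be sketched as a plateau check over recent training rounds. The window size and tolerance below are assumptions; the patent only states that the similarity "tends to be stable".

```python
# Illustrative plateau detection for the alternative stopping condition.

def similarity_plateaued(history, window=5, tol=1e-3):
    """True when the last `window` cosine similarity values vary by less
    than `tol`, i.e. training has stopped improving."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tol

# Rising similarities: keep training. Flat similarities: stop.
still_improving = similarity_plateaued([0.2, 0.4, 0.6, 0.7, 0.8, 0.9])
stable = similarity_plateaued([0.90, 0.9003, 0.9001, 0.9002, 0.9001, 0.9002])
```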
The above is a specific description of the training method of the code review model provided in the embodiment of the present application.
The training method of the code review model provided by the embodiment of the application comprises the following steps: obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment; acquiring historical code segment character embedding vectors corresponding to the historical code segments and historical code review suggestion word embedding vectors corresponding to the historical code review suggestions; obtaining a first code review suggested word embedding vector aiming at the historical code segment character embedding vector according to the initial code review model; calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, wherein the cosine similarity is used for representing the similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector; and training the initial code review model according to the cosine similarity.
According to this method, the historical code review suggestions corresponding to the historical code segments are used as learning samples to train the initial code review model, so that the trained code review model generates, for a code segment to be reviewed, code review suggestions whose association degree with the code segment meets the association degree threshold. Specifically, for the first code review suggestion that the initial code review model produces for a historical code segment, the cosine similarity between the word embedding vector of the first code review suggestion and the embedding vector of the historical code review suggestion is calculated; the degree of difference between the two is determined according to the cosine similarity, so as to decide whether the initial code review model needs further training. As a result, the code review model has better analysis capability and obtains, for code segments to be reviewed, code review suggestions whose association degree satisfies the association degree condition.
Fourth embodiment
Corresponding to the application scenario embodiment of the training method for the code review model provided by the present application and the training method for the code review model provided by the third embodiment, the fourth embodiment of the present application further provides a training device for the code review model. Since the device embodiment is basically similar to the application scenario embodiment and the third embodiment, the description is relatively simple, and reference may be made to the partial description of the application scenario embodiment and the third embodiment for relevant points. The device embodiments described below are merely illustrative.
Please refer to fig. 5, which is a schematic diagram of a training apparatus for a code review model provided in a fourth embodiment of the present application.
The training device of the code review model comprises:
a first obtaining unit 501, configured to obtain a historical code segment and a historical code review suggestion corresponding to the historical code segment;
a second obtaining unit 502, configured to obtain a historical code segment character embedding vector corresponding to a historical code segment, and a historical code review suggestion word embedding vector corresponding to the historical code review suggestion;
a third obtaining unit 503, configured to obtain a first code review suggested word embedding vector for the historical code segment character embedding vector according to the initial code review model;
a calculating unit 504, configured to calculate cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector, where the cosine similarity is used to represent a similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector;
and a training unit 505, configured to train the initial code review model according to the cosine similarity.
Optionally, the obtaining of the historical code segment character embedding vector corresponding to the historical code segment and the historical code review suggestion word embedding vector corresponding to the historical code review suggestion includes: performing character splitting processing on the historical code segments to obtain a plurality of sub-historical code characters; performing word segmentation processing on the historical code review suggestion to obtain a plurality of sub-historical code review suggestion words; obtaining sub-history code character embedding vectors corresponding to the sub-history code characters and sub-history code review suggested word embedding vectors corresponding to the sub-history code review suggested words; accumulating the obtained multiple sub-historical code character embedding vectors to obtain a vector which is used as the historical code segment character embedding vector; correspondingly, the vector obtained by accumulating the obtained multiple sub-historical code review suggested word embedding vectors is used as the historical code review suggested word embedding vector.
Optionally, the obtaining of the sub-history code character embedding vectors corresponding to the plurality of sub-history code characters and the sub-history code review suggested word embedding vectors corresponding to the plurality of sub-history code review suggested words includes: obtaining sub-history code character vectors corresponding to the sub-history code characters and sub-history code review suggested word vectors corresponding to the sub-history code review suggested words; generating a plurality of dimensionality-reduced sub-history code character embedding vectors and a plurality of dimensionality-reduced sub-history code review suggested word embedding vectors through a multi-layer embedding network model; the multi-layer embedded network model is used for obtaining word vectors with reduced dimensionality aiming at input high-dimensional word vectors or obtaining character vectors with reduced dimensionality aiming at input high-dimensional character vectors.
Optionally, the calculating the cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector includes: calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector according to a first formula; wherein the first formula is:
$$\mathrm{similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \; \sqrt{\sum_{i=1}^{n} B_i^2}}$$

wherein similarity(A, B) represents the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector; A represents the first code review suggested word embedding vector, and A_i represents the i-th element in the first code review suggested word embedding vector; B represents the historical code review suggested word embedding vector, and B_i represents the i-th element of the historical code review suggested word embedding vector.
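The first formula is the standard cosine similarity, which can be transcribed directly (the helper name is illustrative, not from the disclosure):

```python
import numpy as np

def cosine_similarity(a, b):
    """similarity(A, B) = (A · B) / (|A| * |B|): 1.0 for identical
    directions, 0.0 for orthogonal embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the measure depends only on direction, scaling an embedding vector leaves the similarity unchanged.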
Optionally, the training of the initial code review model according to the cosine similarity includes: obtaining, according to the cosine similarity, a first loss between the first code review suggested word embedding vector and the historical code review suggested word embedding vector; propagating the first loss back into the code review model being trained; and the code review model being trained adjusting its parameters according to the first loss.
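As a sketch, the first loss can be taken as 1 − cosine similarity, so that it vanishes when the generated and historical suggestion embeddings align. The gradient-descent loop below adjusts a predicted embedding vector directly; this is purely illustrative, whereas the text describes propagating the loss back through the full model:

```python
import numpy as np

def cosine_loss(pred, target):
    """First loss derived from cosine similarity: 0 when aligned."""
    cos = np.dot(pred, target) / (np.linalg.norm(pred) * np.linalg.norm(target))
    return 1.0 - cos

pred = np.array([1.0, 0.2, 0.0, 0.0])    # generated suggestion embedding
target = np.array([0.0, 1.0, 0.0, 0.0])  # historical suggestion embedding
lr = 0.1

for _ in range(500):
    n_p = np.linalg.norm(pred)
    n_t = np.linalg.norm(target)
    cos = np.dot(pred, target) / (n_p * n_t)
    # analytic gradient of (1 - cos) with respect to pred
    grad = -(target / (n_p * n_t) - cos * pred / n_p**2)
    pred -= lr * grad                     # parameter adjustment from the loss
```

After the loop, pred points in nearly the same direction as target, i.e. the first loss is close to zero.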
Optionally, the apparatus is further configured to: terminate the training of the code review model when the cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector is less than a predetermined threshold; at this point, the association degree between the first code review suggestion corresponding to the first code review suggested word embedding vector and the historical code segment meets the preset association degree condition.
Fifth embodiment
Corresponding to the methods provided in the first and third embodiments of the present application, a fifth embodiment of the present application further provides an electronic device. Since the fifth embodiment is substantially similar to the method embodiments provided above, it is described relatively briefly; for relevant details, reference may be made to the description of the method embodiments above. The fifth embodiment described below is merely illustrative.
As shown in fig. 6, fig. 6 is a schematic view of an electronic device according to the fifth embodiment of the present application.
The electronic device includes:
at least one processor 601, at least one communication interface 602, at least one memory 603, and at least one communication bus 604;
optionally, the communication interface 602 may be an interface of a communication module, such as an interface of a WLAN (Wireless Local Area Network) module;
the processor 601 may be a processor CPU, or an application Specific Integrated circuit (asic), or one or more Integrated circuits configured to implement embodiments of the present application.
The memory 603 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The memory 603 stores a program, and the processor 601 calls the program stored in the memory 603 to execute the method provided by the embodiment of the present application.
Sixth embodiment
Corresponding to the above method embodiments provided by the present application, the sixth embodiment of the present application further provides a computer storage medium, and since the sixth embodiment is substantially similar to the above method embodiments provided by the present application, the description is relatively simple, and relevant points can be referred to the description of the above method embodiment section provided by the present application. The sixth embodiment described below is merely illustrative.
The computer storage medium stores a computer program that, when executed, implements the methods provided in the above-described method embodiments.
It should be noted that, for the detailed description of the electronic device and the computer storage medium provided in the embodiments of the present application, reference may be made to the related description of the foregoing method embodiments provided in the present application, and details are not repeated here.
Although the present application has been described with reference to preferred embodiments, these are not intended to limit the present application. Those skilled in the art can make possible variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the scope defined by the claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a system or an electronic device. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (12)

1. A code review method, comprising:
obtaining a code segment to be reviewed;

obtaining a code review suggestion for the code segment to be reviewed according to a code review model;
the code review model is used for searching for associated words corresponding to characters included in the code segment to be reviewed from words included in historical code review suggestions, the association degree between the characters included in the code segment to be reviewed and the associated words meets a preset association degree condition, and the code review suggestions are generated for the code segment to be reviewed according to the associated words.
2. The method of claim 1, wherein obtaining a code review suggestion for the code segment to be reviewed according to a code review model comprises:
obtaining a code character embedding vector to be reviewed corresponding to the code segment to be reviewed;
and generating a code review suggested word embedding vector of which the association degree with the code character embedding vector to be reviewed meets a preset association degree condition through the corresponding relation between the historical code segment character embedding vector and the historical code review suggested word embedding vector which are pre-stored in the historical code review suggested word library.
3. The method according to claim 2, wherein the obtaining of the code character embedding vector to be reviewed corresponding to the code segment to be reviewed comprises:

carrying out character splitting processing on the code segment to be reviewed to obtain a plurality of sub-code characters;

obtaining sub-code character embedding vectors corresponding to the plurality of sub-code characters;

and forming the code character embedding vector to be reviewed from the obtained plurality of sub-code character embedding vectors according to a preset combination mode.
4. The method of claim 2, wherein the generating of the code review suggestion embedding vector with the association degree with the code character embedding vector to be reviewed satisfying a preset association degree condition through the correspondence between the historical code segment character embedding vector and the historical code review suggestion embedding vector pre-stored in the historical code review suggestion library comprises:
matching a plurality of sub-code character embedding vectors included in the code character embedding vector to be reviewed with sub-historical code character embedding vectors included in historical code segment character embedding vectors in the historical code review suggestion library, respectively;

obtaining a sub-history code character embedding vector corresponding to each sub-code character embedding vector;
obtaining a sub-history code review suggested word embedding vector corresponding to the sub-history code character embedding vector;
taking the sub-history code review suggested word embedding vector as an associated word embedding vector corresponding to the sub-code character embedding vector;
and generating a code review suggested word embedding vector of which the association degree with the code character embedding vector to be reviewed meets a preset association degree condition according to the obtained multiple associated word embedding vectors.
5. The method of claim 1, wherein the code segment to be reviewed is a code segment changed in source code;

the obtaining of the code review suggestion for the code segment to be reviewed according to the code review model includes:

obtaining a code review suggestion for the changed code segment according to the code review model.
6. A method for training a code review model, comprising:
obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment;
acquiring historical code segment character embedding vectors corresponding to the historical code segments and historical code review suggestion word embedding vectors corresponding to the historical code review suggestions;
obtaining a first code review suggested word embedding vector aiming at the historical code segment character embedding vector according to an initial code review model;
calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, wherein the cosine similarity is used for representing the similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector;
and training the initial code review model according to the cosine similarity.
7. The method of claim 6, wherein the obtaining of the historical code segment character embedding vectors corresponding to the historical code segments and the historical code review suggestion word embedding vectors corresponding to the historical code review suggestions comprises:
performing character splitting processing on the historical code segments to obtain a plurality of sub-historical code characters; performing word segmentation processing on the historical code review suggestion to obtain a plurality of sub-historical code review suggestion words;
obtaining sub-history code character embedding vectors corresponding to the sub-history code characters and sub-history code review suggested word embedding vectors corresponding to the sub-history code review suggested words;
accumulating the obtained multiple sub-historical code character embedding vectors to obtain a vector which is used as the historical code segment character embedding vector;
correspondingly, the vector obtained by accumulating the obtained multiple sub-historical code review suggested word embedding vectors is used as the historical code review suggested word embedding vector.
8. The method of claim 6, wherein the training the initial code review model according to the cosine similarity comprises:
according to the cosine similarity, obtaining a first loss between the first code review suggested word embedding vector and the historical code review suggested word embedding vector;
propagating the first loss back into the code review model being trained;

and the code review model being trained adjusts parameters according to the first loss.
9. A code review device, comprising:
the obtaining unit is used for obtaining the code segments to be evaluated;
the code review suggestion generation unit is used for obtaining a code review suggestion aiming at the code segment to be reviewed according to the code review model;
the code review model is used for searching for associated words corresponding to characters included in the code segment to be reviewed from words included in historical code review suggestions, the association degree between the characters included in the code segment to be reviewed and the associated words meets a preset association degree condition, and the code review suggestion is generated for the code segment to be reviewed according to the associated words.
10. A training apparatus for a code review model, comprising:
the device comprises a first obtaining unit, a second obtaining unit and a third obtaining unit, wherein the first obtaining unit is used for obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment;
the second obtaining unit is used for obtaining historical code segment character embedding vectors corresponding to the historical code segments and historical code review suggestion word embedding vectors corresponding to the historical code review suggestions;
the third obtaining unit is used for obtaining a first code review suggested word embedding vector aiming at the historical code segment character embedding vector according to the initial code review model;
the computing unit is used for computing cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, and the cosine similarity is used for representing the similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector;
and the training unit is used for training the initial code review model according to the cosine similarity.
11. An electronic device, comprising:
a processor;
a memory for storing a program of the method, which, when read and run by a processor, performs the method of any one of claims 1-8.
12. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed, performs the method of any one of claims 1-8.
CN202210025868.1A 2022-01-11 2022-01-11 Code review method and device Pending CN114443476A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210025868.1A CN114443476A (en) 2022-01-11 2022-01-11 Code review method and device


Publications (1)

Publication Number Publication Date
CN114443476A true CN114443476A (en) 2022-05-06

Family

ID=81368269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210025868.1A Pending CN114443476A (en) 2022-01-11 2022-01-11 Code review method and device

Country Status (1)

Country Link
CN (1) CN114443476A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563555A (en) * 2018-01-10 2018-09-21 江苏工程职业技术学院 Failure based on four objective optimizations changes code prediction method
CN110781072A (en) * 2019-09-10 2020-02-11 中国平安财产保险股份有限公司 Code auditing method, device and equipment based on machine learning and storage medium
CN111552969A (en) * 2020-04-21 2020-08-18 中国电力科学研究院有限公司 Embedded terminal software code vulnerability detection method and device based on neural network
CN111639344A (en) * 2020-07-31 2020-09-08 中国人民解放军国防科技大学 Vulnerability detection method and device based on neural network
CN111832028A (en) * 2020-07-27 2020-10-27 中国工商银行股份有限公司 Code auditing method and device, electronic equipment and medium
CN112416358A (en) * 2020-11-20 2021-02-26 武汉大学 Intelligent contract code defect detection method based on structured word embedded network
CN112416765A (en) * 2020-11-18 2021-02-26 平安普惠企业管理有限公司 Code review method, device, equipment and medium
CN112733137A (en) * 2020-12-24 2021-04-30 哈尔滨工业大学 Binary code similarity analysis method for vulnerability detection
KR102311644B1 (en) * 2021-01-29 2021-10-12 한국과학기술정보연구원 Data analysis apparatus, and control method thereof
CN113535229A (en) * 2021-06-30 2021-10-22 中国人民解放军战略支援部队信息工程大学 Anti-confusion binary code clone detection method based on software gene
CN113672931A (en) * 2021-07-13 2021-11-19 中国人民解放军军事科学院国防科技创新研究院 Software vulnerability automatic detection method and device based on pre-training


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HARSH LAL et al.: "Code review analysis of software system using machine learning techniques", Retrieved from the Internet <URL:2017 11th International Conference on Intelligent Systems and Control> *
程序员的那些事 (blog): "Is code review being automated too? Amazon launches a new tool", Retrieved from the Internet <URL:https://www.51cto.com/article/607337.html> *

Similar Documents

Publication Publication Date Title
CN111382255B (en) Method, apparatus, device and medium for question-answering processing
EP2657884B1 (en) Identifying multimedia objects based on multimedia fingerprint
US8612367B2 (en) Learning similarity function for rare queries
WO2017181866A1 (en) Making graph pattern queries bounded in big graphs
CN112131322B (en) Time sequence classification method and device
CN113326420B (en) Question retrieval method, device, electronic equipment and medium
US20200364216A1 (en) Method, apparatus and storage medium for updating model parameter
CN111144109A (en) Text similarity determination method and device
CN111177719A (en) Address category determination method, device, computer-readable storage medium and equipment
CN111144098B (en) Recall method and device for extended question
US11960846B2 (en) Embedding inference
CN108108379A (en) Keyword opens up the method and device of word
US20230274161A1 (en) Entity linking method, electronic device, and storage medium
US11755671B2 (en) Projecting queries into a content item embedding space
CN114443476A (en) Code review method and device
CN116028626A (en) Text matching method and device, storage medium and electronic equipment
US20230237076A1 (en) Automatically drawing infographics for statistical data based on a data model
CN113239149B (en) Entity processing method, device, electronic equipment and storage medium
US20170293863A1 (en) Data analysis system, and control method, program, and recording medium therefor
CN109815312B (en) Document query method and device, computing equipment and computer storage medium
CN112784600A (en) Information sorting method and device, electronic equipment and storage medium
CN110019771B (en) Text processing method and device
CN113157870B (en) Knowledge association method and device and electronic equipment
CN113283229B (en) Text similarity calculation method and device, electronic equipment and storage medium
US20240211698A1 (en) Embedding inference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination