CN114443476A - Code review method and device - Google Patents

Code review method and device

Info

Publication number
CN114443476A
CN114443476A (application CN202210025868.1A)
Authority
CN
China
Prior art keywords
code
historical
review
code review
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210025868.1A
Other languages
Chinese (zh)
Inventor
喻煜阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd filed Critical Alibaba Cloud Computing Ltd
Priority to CN202210025868.1A priority Critical patent/CN114443476A/en
Publication of CN114443476A publication Critical patent/CN114443476A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3604 Software analysis for verifying properties of programs
    • G06F 11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3604 Software analysis for verifying properties of programs
    • G06F 11/3616 Software analysis for verifying properties of programs using software metrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The code review method provided by the embodiments of the present application comprises the following steps: obtaining a code segment to be reviewed; and obtaining, according to a code review model, a code review suggestion for the code segment to be reviewed. The code review model searches, among the words included in historical code review suggestions, for words whose degree of association with the characters included in the code segment to be reviewed satisfies a preset association-degree condition, takes these words as associated words, and generates the code review suggestion for the code segment to be reviewed according to the associated words. Compared with the limited domain knowledge of a human reviewer, the code review model is trained on a large amount of prior knowledge and can recognize more problems; obtaining the code review suggestion from the model therefore improves both the efficiency and the quality of code review.

Description

Code review method and device
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a code review method, a code review device, an electronic device, and a storage medium.
Background
Code review means that, after a developer finishes writing or modifying code and submits the code file to the code repository, a reviewer reads the code to check whether the submitted code meets the coding standard and to assess its quality. Code review is an indispensable link in the software development process; it improves the readability and maintainability of the code and safeguards software development quality.
Currently, code review of newly written or modified code is generally performed with either a static scanning tool or a deep learning model. A static scanning tool, however, relies on a fixed rule set mapping code modifications to review opinions; modifications not covered by the limited rule set cannot be matched to any review opinion, which narrows the review coverage for code modifications. A deep learning model, in turn, typically requires reviewing the full set of code source files, and the resulting workload makes the review inefficient.
Therefore, how to improve the efficiency of code review for code modifications is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The embodiments of the present application provide a code review method, a code review device, an electronic device, and a computer storage medium, so as to solve the technical problem in the prior art that code review for code modifications is inefficient. The present application also relates to a training method and device for the code review model, an electronic device, and a storage medium.
The embodiments of the present application provide a code review method, comprising: obtaining a code segment to be reviewed; and obtaining, according to a code review model, a code review suggestion for the code segment to be reviewed. The code review model searches, among the words included in historical code review suggestions, for associated words corresponding to the characters included in the code segment to be reviewed, where the degree of association between those characters and the associated words satisfies a preset association-degree condition, and generates the code review suggestion for the code segment according to the associated words.
Optionally, obtaining the code review suggestion for the code segment to be reviewed according to the code review model comprises: obtaining the code character embedding vector corresponding to the code segment to be reviewed; and generating, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library, a code review suggested word embedding vector whose degree of association with the code character embedding vector satisfies the preset association-degree condition.
Optionally, obtaining the code character embedding vector corresponding to the code segment to be reviewed comprises: splitting the code segment into characters to obtain a plurality of sub-code characters; obtaining sub-code character embedding vectors corresponding to the plurality of sub-code characters; and combining the obtained sub-code character embedding vectors in a preset combination mode to form the code character embedding vector.
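The split-then-combine step above can be sketched as follows. The toy character vocabulary, the 4-dimensional embeddings, and element-wise summation as the "preset combination mode" are illustrative assumptions, not details fixed by the application.

```python
# Illustrative sketch: split a code fragment into sub-code characters, look
# up a per-character embedding, and combine by element-wise summation.
# The vocabulary and the 4-dimensional embeddings are assumptions.

def split_characters(code_segment):
    """Character splitting: each non-whitespace character is a sub-code character."""
    return [ch for ch in code_segment if not ch.isspace()]

def embed_segment(code_segment, char_embeddings, dim=4):
    """Sum the sub-code character embedding vectors into one segment vector."""
    total = [0.0] * dim
    for ch in split_characters(code_segment):
        vec = char_embeddings.get(ch, [0.0] * dim)  # unseen characters map to zeros
        total = [t + v for t, v in zip(total, vec)]
    return total

char_embeddings = {"i": [1, 0, 0, 0], "+": [0, 1, 0, 0], "1": [0, 0, 1, 0]}
print(embed_segment("i + 1", char_embeddings))  # -> [1.0, 1.0, 1.0, 0.0]
```

Summation is only one possible combination mode; concatenation or averaging would fit the same description.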
Optionally, the method further comprises: obtaining sub-code character vectors corresponding to the plurality of sub-code characters. Obtaining the sub-code character embedding vectors corresponding to the plurality of sub-code characters then comprises: generating, for each sub-code character vector, a dimension-reduced sub-code character embedding vector through a multi-layer embedding network model. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
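A minimal sketch of such a multi-layer embedding network follows: a high-dimensional (here, one-hot) character vector is passed through two layers to obtain a lower-dimensional embedding. The layer sizes (6 to 4 to 2), the uniform weights, and the ReLU non-linearity are all assumptions for illustration.

```python
# Minimal sketch of a multi-layer embedding network that maps a
# high-dimensional one-hot character vector to a lower-dimensional
# embedding. Layer sizes (6 -> 4 -> 2) and weights are illustrative.

def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a column vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def embed(char_vec, layers):
    """Apply each layer in turn; a ReLU keeps the mapping non-linear."""
    out = char_vec
    for weights in layers:
        out = [max(0.0, x) for x in matvec(weights, out)]
    return out

one_hot = [0, 0, 1, 0, 0, 0]             # character in a 6-symbol vocabulary
layer1 = [[0.5] * 6 for _ in range(4)]   # 6 -> 4
layer2 = [[0.25] * 4 for _ in range(2)]  # 4 -> 2
print(embed(one_hot, [layer1, layer2]))  # -> [0.5, 0.5]
```

In practice the layer weights would be learned during the model training described later, rather than fixed constants.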
Optionally, generating the code review suggested word embedding vector whose degree of association with the code character embedding vector satisfies the preset association-degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library, comprises: matching each of the sub-code character embedding vectors included in the code character embedding vector against the sub-historical code character embedding vectors included in the historical code segment character embedding vectors in the historical code review suggestion library; obtaining the sub-historical code character embedding vector corresponding to each sub-code character embedding vector; obtaining the sub-historical code review suggested word embedding vector corresponding to that sub-historical code character embedding vector; taking that sub-historical code review suggested word embedding vector as the associated word embedding vector corresponding to the sub-code character embedding vector; and generating, from the obtained associated word embedding vectors, the code review suggested word embedding vector whose degree of association with the code character embedding vector satisfies the preset association-degree condition.
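The matching step above can be sketched as a similarity lookup over stored (character embedding, suggested-word embedding) pairs. The library contents, the use of cosine similarity as the association degree, and the 0.9 threshold are assumptions; the application does not fix a specific association measure for this step.

```python
import math

# Sketch of the matching step: each sub-code character embedding is matched
# against the sub-historical code character embeddings in the suggestion
# library, and the paired review-suggestion word embedding is returned as
# the associated word embedding. Library contents and threshold are assumed.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# (sub-historical code character embedding, paired suggested-word embedding)
library = [
    ([1.0, 0.0], [0.2, 0.8]),
    ([0.0, 1.0], [0.9, 0.1]),
]

def associated_word_embeddings(sub_char_embeddings, threshold=0.9):
    """Return suggested-word embeddings whose paired character embedding is
    sufficiently associated (cosine >= threshold) with each query vector."""
    result = []
    for query in sub_char_embeddings:
        for hist_char, hist_word in library:
            if cosine(query, hist_char) >= threshold:
                result.append(hist_word)
    return result

print(associated_word_embeddings([[0.95, 0.05]]))  # -> [[0.2, 0.8]]
```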
Optionally, the method further comprises: screening the plurality of sub-code characters according to a preset screening condition to obtain a plurality of target sub-code characters. Obtaining the sub-code character vectors corresponding to the plurality of sub-code characters then comprises: obtaining target sub-code character vectors corresponding to the plurality of target sub-code characters.
Optionally, the code segment to be reviewed is a changed code segment in the source code; obtaining the code review suggestion for the code segment to be reviewed according to the code review model comprises: obtaining a code review suggestion for the changed code segment according to the code review model.
The embodiments of the present application further provide a training method for the code review model, comprising: obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment; obtaining a historical code segment character embedding vector corresponding to the historical code segment and a historical code review suggested word embedding vector corresponding to the historical code review suggestion; obtaining, according to an initial code review model, a first code review suggested word embedding vector for the historical code segment character embedding vector; calculating the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, the cosine similarity representing the similarity between the two vectors; and training the initial code review model according to the cosine similarity.
Optionally, obtaining the historical code segment character embedding vector corresponding to the historical code segment and the historical code review suggested word embedding vector corresponding to the historical code review suggestion comprises: splitting the historical code segment into characters to obtain a plurality of sub-historical code characters; performing word segmentation on the historical code review suggestion to obtain a plurality of sub-historical code review suggested words; obtaining sub-historical code character embedding vectors corresponding to the sub-historical code characters and sub-historical code review suggested word embedding vectors corresponding to the sub-historical code review suggested words; taking the vector obtained by accumulating the obtained sub-historical code character embedding vectors as the historical code segment character embedding vector; and, correspondingly, taking the vector obtained by accumulating the obtained sub-historical code review suggested word embedding vectors as the historical code review suggested word embedding vector.
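The accumulation described above is the same operation on both sides; one helper covers both the character embeddings of a historical code segment and the word embeddings of a historical review suggestion. The 3-dimensional toy embeddings are assumptions.

```python
# Sketch of the accumulation step: the segment-level embedding on either
# side (characters of a historical code segment, words of a historical
# review suggestion) is the element-wise sum of its sub-embeddings.

def accumulate(sub_embeddings, dim=3):
    """Element-wise sum of a list of equal-length embedding vectors."""
    total = [0.0] * dim
    for vec in sub_embeddings:
        total = [t + v for t, v in zip(total, vec)]
    return total

# e.g. word embeddings for a review suggestion tokenised into two words
word_vecs = [[1, 2, 3], [4, 1, 0]]
print(accumulate(word_vecs))  # -> [5.0, 3.0, 3.0]
```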
Optionally, obtaining the sub-historical code character embedding vectors corresponding to the plurality of sub-historical code characters and the sub-historical code review suggested word embedding vectors corresponding to the plurality of sub-historical code review suggested words comprises: obtaining sub-historical code character vectors corresponding to the sub-historical code characters and sub-historical code review suggested word vectors corresponding to the sub-historical code review suggested words; and generating a plurality of dimension-reduced sub-historical code character embedding vectors and a plurality of dimension-reduced sub-historical code review suggested word embedding vectors through a multi-layer embedding network model. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
Optionally, calculating the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector comprises: calculating the cosine similarity according to a first formula, wherein the first formula is:
similarity(A, B) = ( Σ_{i=1..n} A_i·B_i ) / ( √(Σ_{i=1..n} A_i²) · √(Σ_{i=1..n} B_i²) )

wherein similarity(A, B) represents the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector; A represents the first code review suggested word embedding vector, and A_i represents its i-th element; B represents the historical code review suggested word embedding vector, and B_i represents its i-th element.
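The first formula translates directly into code. The following is a plain implementation of cosine similarity for checking the formula by hand, not the application's actual training code.

```python
import math

def cosine_similarity(a, b):
    """similarity(A, B) = sum_i(A_i * B_i) / (||A|| * ||B||),
    matching the first formula above."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors have similarity 1; orthogonal vectors have similarity 0.
print(cosine_similarity([1.0, 2.0, 2.0], [1.0, 2.0, 2.0]))  # -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # -> 0.0
```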
Optionally, training the initial code review model according to the cosine similarity comprises: obtaining, according to the cosine similarity, a first loss between the first code review suggested word embedding vector and the historical code review suggested word embedding vector; passing the first loss back into the code review model being trained; and having the model adjust its parameters according to the first loss.
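The application specifies only that a first loss is derived from the cosine similarity and passed back to adjust the model's parameters. The toy sketch below makes one possible concrete choice: the loss is taken as 1 − cosine similarity, and a single hypothetical scalar parameter is adjusted by numerical gradient descent. None of these specifics are fixed by the application.

```python
import math

# Toy sketch of the parameter-adjustment step, assuming the first loss is
# 1 - cosine_similarity and the model has one scalar parameter theta.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

segment_vec = [1.0, 1.0]  # historical code segment character embedding
target_vec = [1.0, 1.0]   # historical code review suggested word embedding

def predict(theta):
    """Hypothetical one-parameter model: theta shifts the first component."""
    return [segment_vec[0] + theta, segment_vec[1]]

def first_loss(theta):
    return 1.0 - cosine_similarity(predict(theta), target_vec)

theta, lr, eps = -0.5, 0.5, 1e-6
for _ in range(50):  # pass the loss back and adjust the parameter
    grad = (first_loss(theta + eps) - first_loss(theta - eps)) / (2 * eps)
    theta -= lr * grad

print(round(first_loss(theta), 8))  # loss shrinks toward 0 as theta -> 0
```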
Optionally, the method further comprises: terminating the training of the code review model when the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector is less than a predetermined threshold; at this point, the degree of association between the first code review suggestion corresponding to the first code review suggested word embedding vector and the historical code segment satisfies the preset association-degree condition.
An embodiment of the present application further provides a code review device, comprising: an obtaining unit, configured to obtain the code segment to be reviewed; and a code review suggestion generating unit, configured to obtain, according to the code review model, a code review suggestion for the code segment to be reviewed. The code review model searches, among the words included in historical code review suggestions, for associated words corresponding to the characters included in the code segment to be reviewed, where the degree of association between those characters and the associated words satisfies a preset association-degree condition, and generates the code review suggestion according to the associated words.
The embodiments of the present application further provide a training device for the code review model, comprising: a first obtaining unit, configured to obtain a historical code segment and a historical code review suggestion corresponding to the historical code segment; a second obtaining unit, configured to obtain a historical code segment character embedding vector corresponding to the historical code segment and a historical code review suggested word embedding vector corresponding to the historical code review suggestion; a third obtaining unit, configured to obtain, according to an initial code review model, a first code review suggested word embedding vector for the historical code segment character embedding vector; a calculating unit, configured to calculate the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, the cosine similarity representing the similarity between the two vectors; and a training unit, configured to train the initial code review model according to the cosine similarity.
An embodiment of the present application further provides an electronic device, comprising: a processor; and a memory for storing a program which, when read and run by the processor, performs any one of the methods described above.
Embodiments of the present application further provide a computer storage medium storing a computer program which, when executed, performs any one of the methods described above.
Compared with the prior art, the method has the following advantages:
the code review method provided by the embodiment of the application comprises the following steps: obtaining a code segment to be evaluated; obtaining a code evaluation suggestion aiming at the code segment to be evaluated according to a code evaluation model; the code review model is used for searching relevant words corresponding to characters included in the code segment to be reviewed from words included in historical code review suggestions, the relevance between the characters included in the code segment to be reviewed and the relevant words meets a preset relevance condition, and the code review suggestions are generated for the code segment to be reviewed according to the relevant words.
The code review method provided by the application adopts the code review model to obtain the code review suggestions according to the code segments to be reviewed, and specifically, the code review model generates the corresponding code review suggestions for the code segments to be reviewed according to the corresponding relation between the characters included in the historical code segments pre-stored in the historical code review suggestion library and the words included in the historical code review suggestions. Compared with the limited domain knowledge of the reviewer, the code review model is obtained through massive priori knowledge training, more problems can be recognized, and the code review suggestion corresponding to the code segment to be reviewed is obtained according to the code review model, so that the code review efficiency is improved, and the code review quality is improved.
Drawings
Fig. 1 is an application scenario diagram of a training method for a code review model according to an embodiment of the present application.
Fig. 2 is a flowchart of a code review method according to a first embodiment of the present application.
Fig. 3 is a schematic diagram of a code review device according to a second embodiment of the present application.
Fig. 4 is a flowchart of a method for training a code review model according to a third embodiment of the present application.
Fig. 5 is a schematic diagram of a training apparatus for a code review model according to a fourth embodiment of the present application.
Fig. 6 is a schematic view of an electronic device according to a fifth embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The embodiments of the present application provide a code review method in which, according to the correspondence between the characters included in historical code segments stored in a historical code review suggestion library within the code review model and the words included in the historical code review suggestions, associated words whose degree of association with the characters of the code segment to be reviewed satisfies a preset association-degree condition are searched for, and a code review suggestion corresponding to the code segment to be reviewed is generated according to the associated words. Because the review does not require reading the full code data, the efficiency of reviewing changed code segments is improved. Compared with the limited domain knowledge of a human reviewer, the code review model is trained on a large amount of prior knowledge and can recognize more problems; obtaining the code review suggestion from the model therefore improves both the efficiency and the quality of code review.
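The flow just described can be sketched end to end. The library contents, the character-overlap measure standing in for the trained association degree, and the suggestion strings are all illustrative assumptions, not the application's actual model.

```python
# End-to-end toy sketch of the described flow: take a changed code segment,
# find the most associated historical segment in a hypothetical historical
# code review suggestion library, and emit its stored review suggestion.
# Character overlap stands in for the learned association degree.

def association_degree(segment, historical_segment):
    """Fraction of the segment's characters that also occur historically."""
    chars = [c for c in segment if not c.isspace()]
    hist = set(historical_segment)
    return sum(1 for c in chars if c in hist) / len(chars)

library = {  # historical code segment -> historical review suggestion
    "for(i=0;i<n;i++)": "Consider a range-based loop for readability.",
    "if(p!=NULL)": "Prefer an explicit null-check helper.",
}

def review(segment):
    """Return the suggestion paired with the most associated history entry."""
    best = max(library, key=lambda h: association_degree(segment, h))
    return library[best]

print(review("for(j=0;j<m;j++)"))  # matches the stored loop segment
```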
The training method of the code review model provided by the embodiment of the application can be executed by any computer with data processing capability, such as a terminal device or a server.
Fig. 1 is a schematic application scenario diagram of a training method for a code review model according to an embodiment of the present application, which is described in detail below with reference to fig. 1.
As shown in fig. 1, the training process for the code review model includes: a data processing phase, a model training phase, and a code review proposal generation phase.
In the data processing stage 101, the server receives historical code review request data 101-1 sent by the user side and splits it to obtain the historical code review suggestion 101-2 and the historical code alteration segment 101-3. Word segmentation is performed on the historical code review suggestion 101-2 to obtain a plurality of sub-historical code review suggested words, and character splitting is performed on the historical code alteration segment 101-3 to obtain a plurality of sub-historical code characters.
Sub-historical code character vectors 101-5 corresponding to the plurality of sub-historical code characters and sub-historical code review suggested word vectors corresponding to the plurality of sub-historical code review suggested words are obtained; a plurality of dimension-reduced sub-historical code character embedding vectors 101-4 and a plurality of dimension-reduced sub-historical code review suggested word embedding vectors are then generated through a multi-layer embedding network model. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
The vector obtained by accumulating the obtained sub-historical code character embedding vectors is used as the historical code segment character embedding vector; correspondingly, the vector obtained by accumulating the obtained sub-historical code review suggested word embedding vectors is used as the historical code review suggested word embedding vector.
In addition, the obtained sub-historical code character embedding vectors may be accumulated and the accumulated vector normalized, with the normalized vector used as the historical code segment character embedding vector; likewise, the obtained sub-historical code review suggested word embedding vectors may be accumulated and the accumulated vector normalized, with the normalized vector used as the historical code review suggested word embedding vector.
For example, suppose the vector obtained by accumulating the obtained sub-historical code character embedding vectors is {3, 2, 0, 1, 3}; to improve computational efficiency, this vector is normalized, and the normalized vector is used as the historical code segment character embedding vector. Correspondingly, suppose the vector obtained by accumulating the obtained sub-historical code review suggested word embedding vectors is {3, 3, 6, 2, 3}; this vector is likewise normalized, and the normalized vector is used as the historical code review suggested word embedding vector.
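Assuming the normalization above is Euclidean (L2) normalization (the application does not specify which norm is used), the example vectors can be normalized as follows:

```python
import math

# L2 normalization applied to the example vectors from the text: each
# accumulated vector is divided by its Euclidean norm, so that downstream
# similarity computations work on unit-length vectors.

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

char_sum = [3, 2, 0, 1, 3]  # accumulated sub-historical code character embeddings
word_sum = [3, 3, 6, 2, 3]  # accumulated review-suggestion word embeddings

unit_char = normalize(char_sum)
unit_word = normalize(word_sum)
print([round(x, 4) for x in unit_char])
```

After normalization, the cosine similarity of two vectors reduces to a plain dot product, which is the efficiency gain the text alludes to.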
In the model training stage 102, a first code review suggested word embedding vector is obtained for the historical code segment character embedding vector according to the initial code review model; the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector is calculated, and a first loss between the two vectors is obtained according to the cosine similarity; the first loss is passed back into the code review model being trained, and the model adjusts its parameters according to the first loss.
The historical code review suggested word embedding vector is 102-1, the historical code segment character embedding vector is 102-2, and Rel(C, R) 102-3 represents the degree of association between the historical code review suggested word embedding vector 102-1 and the historical code segment character embedding vector 102-2.
The training of the code review model is terminated when the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector is less than the predetermined threshold; at this point, the degree of association between the first code review suggestion corresponding to the first code review suggested word embedding vector and the historical code segment satisfies the preset association-degree condition.
In the code review suggestion generation stage 103, the code segment 103-1 to be reviewed is input into the trained code review model 103-2, and a code review suggestion 103-3 corresponding to the code segment 103-1 to be reviewed output by the trained code review model 103-2 is obtained.
The code review model searches for relevant words with the relevance degree meeting a preset relevance degree condition with the characters included in the code segment to be reviewed according to the corresponding relation between the characters included in the historical code segment in the historical code review suggestion library and the words included in the historical code review suggestion, and generates corresponding code review suggestions for the code segment to be reviewed according to the relevant words.
The code review method is described in more detail below with reference to specific embodiments.
First embodiment
Fig. 2 is a flowchart of a code review method according to a first embodiment of the present application, which includes the following steps.
Step S201: obtain the code segment to be reviewed.
This step obtains the code segment to be reviewed, providing the data basis for subsequently deriving the code review suggestion corresponding to it.
The code segment to be reviewed is a changed code segment in the source code.
The present application uses the code review model to generate the corresponding code review suggestion for the code segment to be reviewed; the suggestion corresponding to a changed code segment is derived from the changed segment itself rather than from the full code source file, which improves the efficiency of generating review suggestions for the changed part of the code.
Step S202: obtain a code review suggestion for the code segment to be reviewed according to the code review model.
This step uses the code review model to obtain a code review suggestion for the code segment to be reviewed.
The code review model is used for searching relevant words corresponding to characters included in the code segment to be reviewed from words included in historical code review suggestions, the relevance between the characters included in the code segment to be reviewed and the relevant words meets a preset relevance condition, and the code review suggestions are generated for the code segment to be reviewed according to the relevant words.
The words included in the historical code review suggestions are the words of the historical code review suggestions stored in advance in the historical code review suggestion library within the code review model.
Specifically, the historical code review suggestion library stores the correspondence between historical code segments and historical code review suggestions in embedded form: each historical code segment character embedding vector is stored together with the historical code review suggested word embedding vector of the review suggestion to which it corresponds.
The obtaining of the code review suggestion for the code segment to be reviewed according to the code review model may be: acquiring the code character embedding vector to be reviewed corresponding to the code segment to be reviewed; and generating a code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies a preset association degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library.
That is, after the code segment to be reviewed is obtained, the code character embedding vector corresponding to it is acquired, the historical code review suggestion library is queried for embedding vectors related to that code character embedding vector, and the historical code review suggestion data corresponding to the code review suggested word embedding vectors whose similarity to the code character embedding vector reaches the preset degree are then used to generate the review suggestion.
The acquiring of the code character embedding vector to be reviewed corresponding to the code segment to be reviewed may refer to: performing character splitting processing on the code segment to be reviewed to obtain a plurality of sub-code characters; obtaining the sub-code character embedding vectors corresponding to the plurality of sub-code characters; and combining the obtained plurality of sub-code character embedding vectors in a preset combination mode to form the code character embedding vector to be reviewed.
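The splitting-and-combining step above can be sketched as follows. This is an illustrative sketch only: the whitespace-stripping tokenization rule, the 8-dimensional hash-based stand-in for a learned character embedding, and element-wise accumulation as the "preset combination mode" are all assumptions, not part of the disclosure.

```python
# Hypothetical sketch of character splitting and embedding combination.
import hashlib

EMBED_DIM = 8  # assumed embedding dimensionality

def split_into_characters(code_fragment: str) -> list:
    """Split a code fragment to be reviewed into sub-code characters."""
    return [ch for ch in code_fragment if not ch.isspace()]

def char_embedding(ch: str) -> list:
    """Deterministic stand-in for a learned character embedding lookup."""
    digest = hashlib.md5(ch.encode()).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]

def combine(embeddings: list) -> list:
    """Combine sub-code character embeddings by element-wise accumulation
    (one possible 'preset combination mode')."""
    return [sum(col) for col in zip(*embeddings)]

chars = split_into_characters("int x=1;")          # 7 sub-code characters
fragment_vec = combine([char_embedding(c) for c in chars])
```

A real system would replace the hash-based lookup with the learned embedding table produced by the training procedure of the third embodiment.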
After obtaining the plurality of sub-code characters, the method further comprises: obtaining the sub-code character vectors corresponding to the plurality of sub-code characters.
The obtaining of the sub-code character embedding vectors corresponding to the plurality of sub-code characters includes: for each sub-code character vector, generating a dimension-reduced sub-code character embedding vector through a multi-layer embedding network model. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
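A minimal sketch of such a multi-layer embedding network is given below. The layer sizes (64 → 16 → 4), the tanh nonlinearity, and the random initialization are illustrative assumptions; the patent does not specify the network architecture.

```python
# Hypothetical two-layer embedding network: maps a high-dimensional one-hot
# character vector to a low-dimensional (dimension-reduced) embedding vector.
import math
import random

random.seed(0)

def make_layer(n_in, n_out):
    """Random linear layer (weight matrix of shape n_out x n_in)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(layer, vec):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in layer]

VOCAB, HIDDEN, EMBED = 64, 16, 4   # high-dimensional input -> reduced embedding

layer1 = make_layer(VOCAB, HIDDEN)
layer2 = make_layer(HIDDEN, EMBED)

def embed(one_hot):
    hidden = [math.tanh(v) for v in forward(layer1, one_hot)]
    return forward(layer2, hidden)

one_hot = [0.0] * VOCAB
one_hot[7] = 1.0                   # hypothetical character index 7
reduced = embed(one_hot)           # 4-dimensional embedding vector
```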
After obtaining the plurality of sub-code characters, the method further comprises: screening the plurality of sub-code characters according to a preset screening condition to obtain a plurality of target sub-code characters. In this case, the obtaining of the sub-code character vectors corresponding to the plurality of sub-code characters includes: obtaining the target sub-code character vectors corresponding to the plurality of target sub-code characters.
The generating of the code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies the preset association degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library, comprises: matching each of the plurality of sub-code character embedding vectors included in the code character embedding vector to be reviewed against the sub-historical code character embedding vectors included in the historical code segment character embedding vectors in the historical code review suggestion library; obtaining the sub-historical code character embedding vector corresponding to each sub-code character embedding vector; obtaining the sub-historical code review suggested word embedding vector corresponding to that sub-historical code character embedding vector; taking the sub-historical code review suggested word embedding vector as the related word embedding vector corresponding to the sub-code character embedding vector; and generating, from the plurality of related word embedding vectors thus obtained, a code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies the preset association degree condition.
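The matching-and-retrieval steps above can be sketched as follows. The library contents, the toy two-dimensional vectors, and the 0.8 similarity threshold are assumptions made for illustration; the patent only requires that the association degree satisfy "a preset association degree condition".

```python
# Hedged sketch: match each sub-code character embedding against the
# sub-historical code character embeddings in the library, and collect the
# suggestion-word embedding associated with the best match.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# library entries: (sub-historical code character embedding,
#                   sub-historical code review suggested word embedding)
library = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([0.0, 1.0], [0.2, 0.8]),
]

def related_word_embeddings(sub_char_embeddings, threshold=0.8):
    related = []
    for sub_vec in sub_char_embeddings:
        best = max(library, key=lambda entry: cosine(sub_vec, entry[0]))
        if cosine(sub_vec, best[0]) >= threshold:  # preset association condition
            related.append(best[1])
    return related

related = related_word_embeddings([[0.9, 0.1], [0.1, 0.95]])
```

The final aggregation of the related word embedding vectors into one code review suggested word embedding vector is left unspecified here, as the patent does not fix a particular aggregation rule.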
Second embodiment
Corresponding to the application scenario embodiment of the training method of the code review model provided by the present application and the code review method provided by the first embodiment, a second embodiment of the present application further provides a code review device. Since the device embodiment is basically similar to the application scenario embodiment and the first embodiment, the description is relatively simple, and reference may be made to the application scenario embodiment and a part of the description of the first embodiment for relevant points. The device embodiments described below are merely illustrative.
Please refer to fig. 3, which is a schematic diagram of a code review device provided in a second embodiment of the present application.
The code review device comprises:
an obtaining unit 301, configured to obtain a code segment to be evaluated;
the code review generating unit 302 is configured to obtain a code review suggestion for the code segment to be reviewed according to the code review model;
the code review model is used to search, among the words included in the historical code review suggestions, for related words corresponding to the characters included in the code segment to be reviewed, where the association degree between those characters and the related words satisfies a preset association degree condition; the code review suggestion for the code segment to be reviewed is then generated from the related words.
Optionally, the obtaining, according to the code review model, of a code review suggestion for the code segment to be reviewed includes: acquiring the code character embedding vector to be reviewed corresponding to the code segment to be reviewed; and generating a code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies a preset association degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library.
Optionally, the acquiring of the code character embedding vector to be reviewed corresponding to the code segment to be reviewed includes: performing character splitting processing on the code segment to be reviewed to obtain a plurality of sub-code characters; obtaining the sub-code character embedding vectors corresponding to the plurality of sub-code characters; and combining the obtained plurality of sub-code character embedding vectors in a preset combination mode to form the code character embedding vector to be reviewed.
Optionally, the apparatus is further configured to: obtain the sub-code character vectors corresponding to the plurality of sub-code characters. The obtaining of the sub-code character embedding vectors corresponding to the plurality of sub-code characters includes: for each sub-code character vector, generating a dimension-reduced sub-code character embedding vector through a multi-layer embedding network model; the multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
Optionally, the generating of the code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies the preset association degree condition, through the correspondence between the historical code segment character embedding vectors and the historical code review suggested word embedding vectors pre-stored in the historical code review suggestion library, includes: matching each of the plurality of sub-code character embedding vectors included in the code character embedding vector to be reviewed against the sub-historical code character embedding vectors included in the historical code segment character embedding vectors in the historical code review suggestion library; obtaining the sub-historical code character embedding vector corresponding to each sub-code character embedding vector; obtaining the sub-historical code review suggested word embedding vector corresponding to that sub-historical code character embedding vector; taking the sub-historical code review suggested word embedding vector as the related word embedding vector corresponding to the sub-code character embedding vector; and generating, from the plurality of related word embedding vectors thus obtained, a code review suggested word embedding vector whose association degree with the code character embedding vector to be reviewed satisfies the preset association degree condition.
Optionally, the apparatus is further configured to: screen the plurality of sub-code characters according to a preset screening condition to obtain a plurality of target sub-code characters. The obtaining of the sub-code character vectors corresponding to the plurality of sub-code characters includes: obtaining the target sub-code character vectors corresponding to the plurality of target sub-code characters.
Optionally, the code segment to be reviewed is a code segment changed in the source code; and the obtaining of the code review suggestion for the code segment to be reviewed according to the code review model includes: obtaining a code review suggestion for the changed code segment according to the code review model.
Third embodiment
The first embodiment provides a code review method, which mainly uses a code review model to generate, for a code segment to be reviewed, a code review suggestion whose association degree satisfies a preset association degree condition. Correspondingly, the third embodiment of the present application provides a training method for the code review model. Since the code review method has been described in detail in the first embodiment, reference is made to that description for relevant points, and details are not repeated here.
A third embodiment of the present application provides a method for training a code review model, please refer to fig. 4, which is a flowchart of the method for training a code review model provided in the third embodiment of the present application.
Step S401: obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment.
This step is used to obtain the historical code segments and the historical code review suggestions as training data for training the code review model.
The historical code segments and their corresponding historical code review suggestions are used as training data because the association degree between a historical code segment and its historical code review suggestion satisfies the preset association degree condition. The code review model obtained by training in this embodiment can thus generate, for a code segment to be reviewed, a code review suggestion whose association degree with the code segment satisfies the preset association degree condition.
Therefore, in the embodiment of the present application, each historical code segment and its historical code review suggestion are used as a sample data pair. A first code review suggestion corresponding to the historical code segment is generated through the initial code review model; cosine similarity is calculated between the first code review suggestion and the historical code review suggestion stored in the historical code review suggestion library for that historical code segment; the difference degree between the first code review suggestion and the historical code review suggestion is determined according to the obtained cosine similarity value; and the code review model is further trained according to the difference degree.
The training process of the code review model is described in detail by the following steps.
Before the code review model is trained, data processing needs to be performed on the above historical code segments and the historical code review suggestions, please refer to step S402.
Step S402: and acquiring historical code segment character embedding vectors corresponding to the historical code segments, and historical code review suggestion word embedding vectors corresponding to the historical code review suggestions.
The step is used for carrying out data vectorization processing on the obtained historical code segments and the historical code review suggestions.
The obtaining of the historical code segment character embedding vector corresponding to the historical code segment and the historical code review suggestion word embedding vector corresponding to the historical code review suggestion includes:
performing character splitting processing on the historical code segments to obtain a plurality of sub-historical code characters; and performing word segmentation processing on the historical code review suggestion to obtain a plurality of sub-historical code review suggestion words.
A historical code segment is a changed historical code segment in the source code. A changed historical code segment usually includes special tokens that are specific to the code project and occur with low frequency, such as variable names, derived words, method names, and compound words. To avoid the situation in which such special tokens cannot be expressed by word vectors, the historical code segment is processed by character splitting and data vectorization to obtain the sub-historical code character embedding vectors corresponding to the plurality of sub-historical code characters.
Word segmentation processing and data vectorization processing are performed on the historical code review suggestion to obtain the sub-historical code review suggested word embedding vectors corresponding to the plurality of sub-historical code review suggested words.
Accumulating the obtained multiple sub-historical code character embedding vectors to obtain a vector which is used as the historical code segment character embedding vector; correspondingly, the vector obtained by accumulating the obtained multiple sub-historical code review suggested word embedding vectors is used as the historical code review suggested word embedding vector.
In addition, the obtained multiple sub-history code character embedding vectors can be accumulated, and the accumulated vectors are normalized to obtain the vectors as the history code segment character embedding vectors. And accumulating the obtained multiple sub-historical code review suggested word embedding vectors, and normalizing the accumulated vectors to obtain the vectors serving as the historical code review suggested word embedding vectors.
For example, the vector obtained by accumulating the obtained sub-historical code character embedding vectors is {3, 2, 0, 1, 3}; to improve calculation efficiency, this vector is normalized, and the normalized vector is used as the historical code segment character embedding vector. Correspondingly, the vector obtained by accumulating the obtained sub-historical code review suggested word embedding vectors is {3, 3, 6, 2, 3}; this vector is likewise normalized, and the normalized vector is used as the historical code review suggested word embedding vector.
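The accumulation-plus-normalization described above can be sketched with the example vector {3, 2, 0, 1, 3} from the text. L2 (unit-length) normalization is an assumption made here; the patent does not specify which normalization is used.

```python
# Sketch of accumulating sub-embedding vectors and normalizing the result.

def l2_normalize(vec):
    """Scale a vector to unit length (assumed normalization scheme)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec]

# sum of the sub-historical code character embedding vectors, per the example
accumulated = [3, 2, 0, 1, 3]
normalized = l2_normalize(accumulated)  # historical code segment embedding
```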
In addition, the historical code character embedding vector and the historical code review suggested word embedding vector are obtained by the following method:
the obtaining of the sub-historical code character embedding vectors corresponding to the plurality of sub-historical code characters and the sub-historical code review suggested word embedding vectors corresponding to the plurality of sub-historical code review suggested words includes: obtaining the sub-historical code character vectors corresponding to the sub-historical code characters and the sub-historical code review suggested word vectors corresponding to the sub-historical code review suggested words; and generating, through a multi-layer embedding network model, a plurality of dimension-reduced sub-historical code character embedding vectors and a plurality of dimension-reduced sub-historical code review suggested word embedding vectors. The multi-layer embedding network model obtains a dimension-reduced word vector for an input high-dimensional word vector, or a dimension-reduced character vector for an input high-dimensional character vector.
For example, suppose the historical code review suggestion includes 5 sub-historical code review suggested words, each corresponding to one sub-historical code review suggested word embedding vector. The first vector may be defined with its first element as 1 and the remaining 4 elements as 0, i.e. {1, 0, 0, 0, 0}. Accordingly, the second vector is {0, 1, 0, 0, 0}; the third is {0, 0, 1, 0, 0}; the fourth is {0, 0, 0, 1, 0}; and the fifth is {0, 0, 0, 0, 1}.
The historical code review suggested word embedding vector obtained by accumulating these five sub-historical code review suggested word embedding vectors can then be expressed as {1, 1, 1, 1, 1}.
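The worked example above can be reproduced in a few lines; the code below only restates the one-hot accumulation already described and introduces no new assumptions beyond the helper names.

```python
# Five one-hot sub-historical code review suggested word embedding vectors
# accumulate element-wise into {1, 1, 1, 1, 1}, as in the example above.

def one_hot(index, size=5):
    vec = [0] * size
    vec[index] = 1
    return vec

sub_vectors = [one_hot(i) for i in range(5)]
accumulated = [sum(col) for col in zip(*sub_vectors)]
```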
Accordingly, the historical code segment character embedding vector is obtained by a vector characterization method similar to that of the historical code review suggested word embedding vector.
Step S403: obtaining a first code review suggested word embedding vector aiming at the historical code segment character embedding vector according to an initial code review model;
in this step, the initial code review model analyzes the historical code segment character embedding vector and generates a first code review suggested word embedding vector for it.
The first code review suggested word embedding vector obtained in this step is compared with the historical code review suggestion corresponding to the historical code segment, and the initial code review model is further trained according to the difference degree obtained from the comparison.
Specifically, cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector is calculated, and the difference degree between the two is determined according to the cosine similarity, as specifically described in step S404.
Step S404: and calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, wherein the cosine similarity is used for representing the similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector.
The step is used for calculating the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector according to a formula, so as to provide a training basis for further training of the initial code review model in the step S405.
Specifically, the calculating the cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector includes:
calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector according to a first formula;
wherein the first formula is:
similarity(A, B) = (Σ_{i=1}^{n} A_i · B_i) / (√(Σ_{i=1}^{n} A_i²) · √(Σ_{i=1}^{n} B_i²))
wherein similarity(A, B) represents the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector; A represents the first code review suggested word embedding vector and A_i its i-th element; B represents the historical code review suggested word embedding vector and B_i its i-th element; n is the dimension of both embedding vectors, and n is not less than 2.
The cosine similarity value lies between 0 and 1. The larger the value, the smaller the angle between the two vectors and the more similar they are; the smaller the value, the larger the angle and the greater the degree of difference between them.
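The first formula can be implemented directly as follows; the function names are illustrative, but the computation is term-by-term the formula given above.

```python
# Direct implementation of the cosine similarity formula between the first
# code review suggested word embedding vector A and the historical one B.
import math

def similarity(A, B):
    assert len(A) == len(B) and len(A) >= 2   # n >= 2, per the text
    dot = sum(a * b for a, b in zip(A, B))
    norm_a = math.sqrt(sum(a * a for a in A))
    norm_b = math.sqrt(sum(b * b for b in B))
    return dot / (norm_a * norm_b)

# Identical vectors give similarity ~1; orthogonal non-negative vectors give 0.
s_same = similarity([1, 1, 1, 1, 1], [1, 1, 1, 1, 1])
s_orth = similarity([1, 0], [0, 1])
```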
Step S405: and training the initial code review model according to the cosine similarity.
This step is used to train the initial code review model according to the cosine similarity and to determine when to stop training it.
The training the initial code review model according to the cosine similarity comprises the following steps:
according to the cosine similarity, a first loss between the first code review suggested word embedding vector and the historical code review suggested word embedding vector is obtained; the first loss is propagated back into the code review model being trained; and the code review model adjusts its parameters according to the first loss.
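One common way to derive such a loss from the cosine similarity is 1 − similarity, so that the loss shrinks as the predicted suggestion embedding approaches the historical one. This choice is an assumption for illustration; the patent does not fix the exact form of the first loss.

```python
# Hedged sketch of deriving the first loss from the cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def first_loss(predicted, historical):
    """Loss approaches 0 as the two embedding vectors align (assumed form)."""
    return 1.0 - cosine(predicted, historical)

# Vectors pointing in the same direction yield a loss near 0.
loss = first_loss([0.5, 0.5], [1.0, 1.0])
```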
As can be seen from the above formula, a larger cosine similarity value means a smaller angle between the two vectors and a smaller degree of difference between the first code review suggested word embedding vector and the historical code review suggested word embedding vector, which indicates a better training effect of the initial code review model.
A smaller cosine similarity value means a larger angle between the two vectors and a larger degree of difference between the first code review suggested word embedding vector and the historical code review suggested word embedding vector, which indicates a poorer training effect; the initial code review model then needs to be trained further.
The training of the code review model is terminated if the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector is not less than a predetermined threshold; at this point, the association degree between the first code review suggestion corresponding to the first code review suggested word embedding vector and the historical code segment satisfies the preset association degree condition.
When the cosine similarity reaches the preset threshold, the degree of difference between the first code review suggested word embedding vector and the historical code review suggested word embedding vector falls within the preset difference degree threshold range, the initial code review model has been adjusted to a state that meets the requirement, and the training purpose is achieved.
In addition, the following condition may also be used to terminate the training of the initial code review model: as historical code segments are continuously provided to the initial code review model, the cosine similarity value between the first code review suggestion and the historical code review suggestion tends to become stable.
Such a training result indicates that the initial code review model currently in use has reached its capacity limit due to the constraints of its internal architecture; continued training is of little benefit, and training may also be stopped at this point.
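The "similarity has stabilized" stopping condition can be sketched as a plateau check over recent training rounds. The window size and tolerance below are assumptions; the patent only states that the similarity "tends to be stable".

```python
# Illustrative plateau detection for the alternative stopping condition.

def similarity_plateaued(history, window=5, tol=1e-3):
    """True when the last `window` cosine similarity values vary by less
    than `tol`, i.e. training has stopped improving."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tol

# Rising similarities: keep training. Flat similarities: stop.
still_improving = similarity_plateaued([0.2, 0.4, 0.6, 0.7, 0.8, 0.9])
stable = similarity_plateaued([0.90, 0.9003, 0.9001, 0.9002, 0.9001, 0.9002])
```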
The above is a specific description of the training method of the code review model provided in the embodiment of the present application.
The training method of the code review model provided by the embodiment of the application comprises the following steps: obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment; acquiring historical code segment character embedding vectors corresponding to the historical code segments and historical code review suggestion word embedding vectors corresponding to the historical code review suggestions; obtaining a first code review suggested word embedding vector aiming at the historical code segment character embedding vector according to the initial code review model; calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, wherein the cosine similarity is used for representing the similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector; and training the initial code review model according to the cosine similarity.
According to this method, the historical code review suggestions corresponding to the historical code segments are used as learning samples to train the initial code review model, so that the trained code review model generates, for a code segment to be reviewed, code review suggestions whose association degree with the code segment meets the association degree threshold. Specifically, for the first code review suggestion that the initial code review model produces for a historical code segment, the cosine similarity between the word embedding vector of the first code review suggestion and the embedding vector of the historical code review suggestion is calculated; the degree of difference between the two is determined according to the cosine similarity, so as to decide whether the initial code review model needs further training. As a result, the code review model has better analysis capability and obtains, for code segments to be reviewed, code review suggestions whose association degree satisfies the association degree condition.
Fourth embodiment
Corresponding to the application scenario embodiment of the training method for the code review model provided by the present application and the training method for the code review model provided by the third embodiment, the fourth embodiment of the present application further provides a training device for the code review model. Since the device embodiment is basically similar to the application scenario embodiment and the third embodiment, the description is relatively simple, and reference may be made to the partial description of the application scenario embodiment and the third embodiment for relevant points. The device embodiments described below are merely illustrative.
Please refer to fig. 5, which is a schematic diagram of a training apparatus for a code review model provided in a fourth embodiment of the present application.
The training device of the code review model comprises:
a first obtaining unit 501, configured to obtain a historical code segment and a historical code review suggestion corresponding to the historical code segment;
a second obtaining unit 502, configured to obtain a historical code segment character embedding vector corresponding to a historical code segment, and a historical code review suggestion word embedding vector corresponding to the historical code review suggestion;
a third obtaining unit 503, configured to obtain a first code review suggested word embedding vector for the historical code segment character embedding vector according to the initial code review model;
a calculating unit 504, configured to calculate cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector, where the cosine similarity is used to represent a similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector;
and a training unit 505, configured to train the initial code review model according to the cosine similarity.
Optionally, the obtaining of the historical code segment character embedding vector corresponding to the historical code segment and the historical code review suggestion word embedding vector corresponding to the historical code review suggestion includes: performing character splitting processing on the historical code segments to obtain a plurality of sub-historical code characters; performing word segmentation processing on the historical code review suggestion to obtain a plurality of sub-historical code review suggestion words; obtaining sub-history code character embedding vectors corresponding to the sub-history code characters and sub-history code review suggested word embedding vectors corresponding to the sub-history code review suggested words; accumulating the obtained multiple sub-historical code character embedding vectors to obtain a vector which is used as the historical code segment character embedding vector; correspondingly, the vector obtained by accumulating the obtained multiple sub-historical code review suggested word embedding vectors is used as the historical code review suggested word embedding vector.
Optionally, the obtaining of the sub-history code character embedding vectors corresponding to the plurality of sub-history code characters and the sub-history code review suggested word embedding vectors corresponding to the plurality of sub-history code review suggested words includes: obtaining sub-history code character vectors corresponding to the sub-history code characters and sub-history code review suggested word vectors corresponding to the sub-history code review suggested words; generating a plurality of dimensionality-reduced sub-history code character embedding vectors and a plurality of dimensionality-reduced sub-history code review suggested word embedding vectors through a multi-layer embedding network model; the multi-layer embedded network model is used for obtaining word vectors with reduced dimensionality aiming at input high-dimensional word vectors or obtaining character vectors with reduced dimensionality aiming at input high-dimensional character vectors.
Optionally, the calculating the cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector includes: calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector according to a first formula; wherein the first formula is:
$$\mathrm{similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \; \sqrt{\sum_{i=1}^{n} B_i^2}}$$

wherein similarity(A, B) represents the cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector; A represents the first code review suggested word embedding vector, and A_i represents the i-th element in the first code review suggested word embedding vector; B represents the historical code review suggested word embedding vector, and B_i represents the i-th element of the historical code review suggested word embedding vector.
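The first formula is the standard cosine similarity, which can be transcribed directly (the helper name is illustrative, not from the disclosure):

```python
import numpy as np

def cosine_similarity(a, b):
    """similarity(A, B) = (A · B) / (|A| * |B|): 1.0 for identical
    directions, 0.0 for orthogonal embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the measure depends only on direction, scaling an embedding vector leaves the similarity unchanged.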
Optionally, the training of the initial code review model according to the cosine similarity includes: obtaining, according to the cosine similarity, a first loss between the first code review suggested word embedding vector and the historical code review suggested word embedding vector; propagating the first loss back into the code review model being trained; and the code review model being trained adjusting its parameters according to the first loss.
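As a sketch, the first loss can be taken as 1 − cosine similarity, so that it vanishes when the generated and historical suggestion embeddings align. The gradient-descent loop below adjusts a predicted embedding vector directly; this is purely illustrative, whereas the text describes propagating the loss back through the full model:

```python
import numpy as np

def cosine_loss(pred, target):
    """First loss derived from cosine similarity: 0 when aligned."""
    cos = np.dot(pred, target) / (np.linalg.norm(pred) * np.linalg.norm(target))
    return 1.0 - cos

pred = np.array([1.0, 0.2, 0.0, 0.0])    # generated suggestion embedding
target = np.array([0.0, 1.0, 0.0, 0.0])  # historical suggestion embedding
lr = 0.1

for _ in range(500):
    n_p = np.linalg.norm(pred)
    n_t = np.linalg.norm(target)
    cos = np.dot(pred, target) / (n_p * n_t)
    # analytic gradient of (1 - cos) with respect to pred
    grad = -(target / (n_p * n_t) - cos * pred / n_p**2)
    pred -= lr * grad                     # parameter adjustment from the loss
```

After the loop, pred points in nearly the same direction as target, i.e. the first loss is close to zero.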
Optionally, the apparatus is further configured to: terminate the training of the code review model when the cosine similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector is less than a predetermined threshold; at this point, the association degree between the first code review suggestion corresponding to the first code review suggested word embedding vector and the historical code segment meets the preset association degree condition.
Fifth embodiment
Corresponding to the methods provided in the first and third embodiments of the present application, a fifth embodiment of the present application further provides an electronic device. Since the fifth embodiment is substantially similar to the method embodiments provided above, it is described relatively briefly; for relevant details, reference may be made to the description of the method embodiments above. The fifth embodiment described below is merely illustrative.
As shown in fig. 6, fig. 6 is a schematic view of an electronic device according to the fifth embodiment of the present application.
The electronic device includes:
at least one processor 601, at least one communication interface 602, at least one memory 603, and at least one communication bus 604;
optionally, the communication interface 602 may be an interface of a communication module, such as an interface of a WLAN (Wireless Local Area Network) module;
the processor 601 may be a processor CPU, or an application Specific Integrated circuit (asic), or one or more Integrated circuits configured to implement embodiments of the present application.
The memory 603 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The memory 603 stores a program, and the processor 601 calls the program stored in the memory 603 to execute the method provided by the embodiment of the present application.
Sixth embodiment
Corresponding to the above method embodiments provided by the present application, the sixth embodiment of the present application further provides a computer storage medium, and since the sixth embodiment is substantially similar to the above method embodiments provided by the present application, the description is relatively simple, and relevant points can be referred to the description of the above method embodiment section provided by the present application. The sixth embodiment described below is merely illustrative.
The computer storage medium stores a computer program that, when executed, implements the methods provided in the above-described method embodiments.
It should be noted that, for the detailed description of the electronic device and the computer storage medium provided in the embodiments of the present application, reference may be made to the related description of the foregoing method embodiments provided in the present application, and details are not repeated here.
Although the present application has been described with reference to preferred embodiments, these are not intended to limit the present application. Those skilled in the art can make possible variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the scope defined by the claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a system or an electronic device. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (12)

1. A code review method, comprising:
obtaining a code segment to be reviewed;

obtaining a code review suggestion for the code segment to be reviewed according to a code review model;
the code review model is used for searching for associated words corresponding to characters included in the code segment to be reviewed from words included in historical code review suggestions, the association degree between the characters included in the code segment to be reviewed and the associated words meets a preset association degree condition, and the code review suggestions are generated for the code segment to be reviewed according to the associated words.
2. The method of claim 1, wherein obtaining a code review suggestion for the code segment to be reviewed according to a code review model comprises:
obtaining a code character embedding vector to be reviewed corresponding to the code segment to be reviewed;
and generating a code review suggested word embedding vector of which the association degree with the code character embedding vector to be reviewed meets a preset association degree condition through the corresponding relation between the historical code segment character embedding vector and the historical code review suggested word embedding vector which are pre-stored in the historical code review suggested word library.
3. The method according to claim 2, wherein the obtaining of the code character embedding vector to be reviewed corresponding to the code segment to be reviewed comprises:

carrying out character splitting processing on the code segment to be reviewed to obtain a plurality of sub-code characters;

obtaining sub-code character embedding vectors corresponding to the plurality of sub-code characters;

and forming the code character embedding vector to be reviewed from the obtained plurality of sub-code character embedding vectors according to a preset combination mode.
4. The method of claim 2, wherein the generating of the code review suggestion embedding vector with the association degree with the code character embedding vector to be reviewed satisfying a preset association degree condition through the correspondence between the historical code segment character embedding vector and the historical code review suggestion embedding vector pre-stored in the historical code review suggestion library comprises:
matching a plurality of sub-code character embedding vectors included in the code character embedding vector to be reviewed with sub-historical code character embedding vectors included in historical code segment character embedding vectors in the historical code review suggestion library, respectively;

obtaining a sub-history code character embedding vector corresponding to each sub-code character embedding vector;
obtaining a sub-history code review suggested word embedding vector corresponding to the sub-history code character embedding vector;
taking the sub-history code review suggested word embedding vector as an associated word embedding vector corresponding to the sub-code character embedding vector;
and generating a code review suggested word embedding vector of which the association degree with the code character embedding vector to be reviewed meets a preset association degree condition according to the obtained multiple associated word embedding vectors.
5. The method of claim 1, wherein the code segment to be reviewed is a code segment changed in source code;

the obtaining of the code review suggestion for the code segment to be reviewed according to the code review model includes:

obtaining a code review suggestion for the changed code segment according to the code review model.
6. A method for training a code review model, comprising:
obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment;
acquiring historical code segment character embedding vectors corresponding to the historical code segments and historical code review suggestion word embedding vectors corresponding to the historical code review suggestions;
obtaining a first code review suggested word embedding vector aiming at the historical code segment character embedding vector according to an initial code review model;
calculating cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, wherein the cosine similarity is used for representing the similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector;
and training the initial code review model according to the cosine similarity.
7. The method of claim 6, wherein the obtaining of the historical code segment character embedding vectors corresponding to the historical code segments and the historical code review suggestion word embedding vectors corresponding to the historical code review suggestions comprises:
performing character splitting processing on the historical code segments to obtain a plurality of sub-historical code characters; performing word segmentation processing on the historical code review suggestion to obtain a plurality of sub-historical code review suggestion words;
obtaining sub-history code character embedding vectors corresponding to the sub-history code characters and sub-history code review suggested word embedding vectors corresponding to the sub-history code review suggested words;
accumulating the obtained multiple sub-historical code character embedding vectors to obtain a vector which is used as the historical code segment character embedding vector;
correspondingly, the vector obtained by accumulating the obtained multiple sub-historical code review suggested word embedding vectors is used as the historical code review suggested word embedding vector.
8. The method of claim 6, wherein the training the initial code review model according to the cosine similarity comprises:
according to the cosine similarity, obtaining a first loss between the first code review suggested word embedding vector and the historical code review suggested word embedding vector;
propagating the first loss back into the code review model being trained;

and the code review model being trained adjusts parameters according to the first loss.
9. A code review device, comprising:
the obtaining unit is used for obtaining the code segments to be evaluated;
the code review suggestion generation unit is used for obtaining a code review suggestion aiming at the code segment to be reviewed according to the code review model;
the code review model is used for searching for associated words corresponding to characters included in the code segment to be reviewed from words included in historical code review suggestions, the association degree between the characters included in the code segment to be reviewed and the associated words meets a preset association degree condition, and the code review suggestion is generated for the code segment to be reviewed according to the associated words.
10. A training apparatus for a code review model, comprising:
the device comprises a first obtaining unit, a second obtaining unit and a third obtaining unit, wherein the first obtaining unit is used for obtaining a historical code segment and a historical code review suggestion corresponding to the historical code segment;
the second obtaining unit is used for obtaining historical code segment character embedding vectors corresponding to the historical code segments and historical code review suggestion word embedding vectors corresponding to the historical code review suggestions;
the third obtaining unit is used for obtaining a first code review suggested word embedding vector aiming at the historical code segment character embedding vector according to the initial code review model;
the computing unit is used for computing cosine similarity of the first code review suggested word embedding vector and the historical code review suggested word embedding vector, and the cosine similarity is used for representing the similarity between the first code review suggested word embedding vector and the historical code review suggested word embedding vector;
and the training unit is used for training the initial code review model according to the cosine similarity.
11. An electronic device, comprising:
a processor;
a memory for storing a program of the method, which, when read and run by a processor, performs the method of any one of claims 1-8.
12. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed, performs the method of any one of claims 1-8.
CN202210025868.1A 2022-01-11 2022-01-11 Code review method and device Pending CN114443476A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210025868.1A CN114443476A (en) 2022-01-11 2022-01-11 Code review method and device


Publications (1)

Publication Number Publication Date
CN114443476A true CN114443476A (en) 2022-05-06

Family

ID=81368269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210025868.1A Pending CN114443476A (en) 2022-01-11 2022-01-11 Code review method and device

Country Status (1)

Country Link
CN (1) CN114443476A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563555A (en) * 2018-01-10 2018-09-21 江苏工程职业技术学院 Failure based on four objective optimizations changes code prediction method
CN110781072A (en) * 2019-09-10 2020-02-11 中国平安财产保险股份有限公司 Code auditing method, device and equipment based on machine learning and storage medium
CN111552969A (en) * 2020-04-21 2020-08-18 中国电力科学研究院有限公司 Embedded terminal software code vulnerability detection method and device based on neural network
CN111639344A (en) * 2020-07-31 2020-09-08 中国人民解放军国防科技大学 Vulnerability detection method and device based on neural network
CN111832028A (en) * 2020-07-27 2020-10-27 中国工商银行股份有限公司 Code auditing method and device, electronic equipment and medium
CN112416358A (en) * 2020-11-20 2021-02-26 武汉大学 Intelligent contract code defect detection method based on structured word embedded network
CN112416765A (en) * 2020-11-18 2021-02-26 平安普惠企业管理有限公司 Code review method, device, equipment and medium
CN112733137A (en) * 2020-12-24 2021-04-30 哈尔滨工业大学 Binary code similarity analysis method for vulnerability detection
KR102311644B1 (en) * 2021-01-29 2021-10-12 한국과학기술정보연구원 Data analysis apparatus, and control method thereof
CN113535229A (en) * 2021-06-30 2021-10-22 中国人民解放军战略支援部队信息工程大学 Anti-confusion binary code clone detection method based on software gene
CN113672931A (en) * 2021-07-13 2021-11-19 中国人民解放军军事科学院国防科技创新研究院 Software vulnerability automatic detection method and device based on pre-training


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HARSH LAL et al.: "Code review analysis of software system using machine learning techniques", Retrieved from the Internet <URL:2017 11th International Conference on Intelligent Systems and Control> *
程序员的那些事 (blog): "Is code review being automated too? Amazon launches a new tool", Retrieved from the Internet <URL:https://www.51cto.com/article/607337.html> *

Similar Documents

Publication Publication Date Title
CN111382255B (en) Method, apparatus, device and medium for question-answering processing
EP2657884B1 (en) Identifying multimedia objects based on multimedia fingerprint
US8612367B2 (en) Learning similarity function for rare queries
WO2017181866A1 (en) Making graph pattern queries bounded in big graphs
CN112131322B (en) Time sequence classification method and device
CN113326420B (en) Question retrieval method, device, electronic equipment and medium
US20200364216A1 (en) Method, apparatus and storage medium for updating model parameter
CN111144109A (en) Text similarity determination method and device
CN111177719A (en) Address category determination method, device, computer-readable storage medium and equipment
CN111144098B (en) Recall method and device for extended question
US11960846B2 (en) Embedding inference
CN108108379A (en) Keyword opens up the method and device of word
US20230274161A1 (en) Entity linking method, electronic device, and storage medium
US11755671B2 (en) Projecting queries into a content item embedding space
CN114443476A (en) Code review method and device
CN116028626A (en) Text matching method and device, storage medium and electronic equipment
US20230237076A1 (en) Automatically drawing infographics for statistical data based on a data model
CN113239149B (en) Entity processing method, device, electronic equipment and storage medium
US20170293863A1 (en) Data analysis system, and control method, program, and recording medium therefor
CN109815312B (en) Document query method and device, computing equipment and computer storage medium
CN112784600A (en) Information sorting method and device, electronic equipment and storage medium
CN110019771B (en) Text processing method and device
CN113157870B (en) Knowledge association method and device and electronic equipment
CN113283229B (en) Text similarity calculation method and device, electronic equipment and storage medium
US20240211698A1 (en) Embedding inference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination