WO2021169227A1 - Code processing method, apparatus, device, and medium - Google Patents

Code processing method, apparatus, device, and medium

Info

Publication number
WO2021169227A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
code change
group
submission
change
Prior art date
Application number
PCT/CN2020/112767
Other languages
English (en)
French (fr)
Inventor
魏昭
梁广泰
王千祥
申博
张伟
Original Assignee
华为技术有限公司
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司, 北京大学
Publication of WO2021169227A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/72 Code refactoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3624 Software debugging by performing operations on the source code, e.g. via a compiler
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management

Definitions

  • This application relates to the field of computer technology, and in particular to a code processing method, apparatus, device, and computer-readable storage medium.
  • When developing software, developers often work on multiple tasks at once. For example, while developing a new feature, a developer may also find a code defect (bug) in the existing code and fix it. After completing multiple tasks, developers tend to submit all the changed code through a single code submission operation, resulting in a composite commit.
  • The submission message of a composite commit covers code changes for multiple tasks, which makes it difficult for other users to understand the intention of the code changes. This inconveniences code review during the code development phase, as well as code reversion (revert), cherry-picking (cherry pick), and regression testing during the code maintenance phase, which greatly affects the efficiency of software development. The industry therefore urgently needs a code processing method that solves the problem of composite commits hindering subsequent code review and regression testing and thereby reducing software development efficiency.
  • This application provides a code processing method that analyzes code changes at the granularity of code change blocks, obtains the association relationships between the code change blocks, and groups the code intelligently based on those relationships. This solves the problem that composite commits hinder subsequent code review and regression testing and in turn reduce the efficiency of software development.
  • In a first aspect, this application provides a code processing method.
  • the method can be executed by a code processing device, and the code processing device can be deployed in a computer.
  • the computer may specifically be a terminal device such as a desktop computer, a notebook computer, or a tablet computer. In some cases, the computer may also include a server.
  • The code processing device may obtain the code change file of the current-version code file relative to the base-version code file.
  • The base version is the version before redevelopment in a continuous development process.
  • The current version is the version after redevelopment in a continuous development process.
  • the code change file may include multiple code change blocks.
  • The code processing device may analyze the multiple code change blocks from at least one dimension to obtain at least one association relationship between them, and then use the at least one association relationship to group the multiple code change blocks into N code change groups (N is a positive integer), so that a code submission operation can be performed separately for each of the N code change groups.
  • The above method uses the association relationships between code change blocks to group them intelligently and automatically, which avoids the false negatives and false positives that occur with manual grouping and improves the accuracy of the grouping.
  • Because the code submission operation is performed separately for each group, the submission message corresponding to each code change group covers far fewer code change tasks. This makes it easier for other users to understand the intention of the code changes, which benefits code review in the code development phase as well as code reversion, cherry-picking, and regression testing in the code maintenance phase, and improves the efficiency of software development.
  • The code processing device can also compile the code change groups obtained by grouping to check the accuracy of the grouping and, when the grouping is inaccurate, adjust the grouping result based on the compilation prompt information, which further improves the accuracy of the grouping.
  • Specifically, the code processing device can compile the current-version code fragments included in the code change blocks of a first code change group to obtain compilation prompt information, and then adjust the first code change group according to the compilation prompt information.
  • The code processing device can also compile multiple code change groups, for example each of the N code change groups, to obtain the compilation prompt information corresponding to each group, and adjust the code change groups based on that information.
  • For example, if the compilation prompt information of one code change group indicates that the group lacks a certain function definition, and the compilation prompt information of another code change group indicates that it defines that function but does not use it, the code processing device may move the code change block that includes the function definition from the other code change group into the group that lacks it.
  • After adjusting a code change group, the code processing device may re-execute the step of compiling the current-version code fragments included in the code change blocks of that group. If the compilation prompt information indicates that compilation passed, the process of adjusting the grouping result can end. If it indicates that compilation failed, that is, there is a compilation warning or compilation error, the device may adjust the code change group according to the warning type (for example, a static function xxx is not used) or the error type (for example, an undefined function or undefined variable).
  • The code processing device can cyclically execute the steps of "compiling the current-version code fragments included in the code change blocks of the first code change group to obtain compilation prompt information" and "adjusting the first code change group according to the compilation prompt information" until the current-version code fragments included in the code change blocks of the first code change group compile successfully.
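  • The compile-and-adjust loop described above can be sketched as follows. This is only an illustration under stated assumptions: `fake_compile` is a hypothetical stand-in for a real compiler that reports undefined and unused functions from per-block `defines`/`uses` sets, and the block dictionaries and the `max_rounds` limit are not part of the application.

```python
def fake_compile(group):
    """Hypothetical stand-in for a compiler: report undefined and unused functions."""
    defined = set().union(*(b["defines"] for b in group))
    used = set().union(*(b["uses"] for b in group))
    return {"undefined": used - defined, "unused": defined - used}


def adjust(groups, max_rounds=10):
    """Move defining blocks into the group that lacks them until nothing moves."""
    for _ in range(max_rounds):
        moved = False
        for target in groups:
            for name in fake_compile(target)["undefined"]:
                for source in groups:
                    if source is target:
                        continue
                    if name not in fake_compile(source)["unused"]:
                        continue
                    # The other group defines the function but does not use it.
                    block = next(b for b in source if name in b["defines"])
                    source.remove(block)
                    target.append(block)
                    moved = True
                    break
        if not moved:
            break  # every group "compiles", or no further move is possible
    return groups


groups = [
    [{"id": 1, "defines": set(), "uses": {"helper"}}],
    [
        {"id": 2, "defines": {"helper"}, "uses": set()},
        {"id": 3, "defines": {"log"}, "uses": set()},
        {"id": 4, "defines": set(), "uses": {"log"}},
    ],
]
adjust(groups)
group_ids = [[b["id"] for b in g] for g in groups]
```

  • In this sketch block 2, which defines `helper`, ends up in the first group, after which both groups report no undefined or unused functions. A real implementation would parse actual compiler diagnostics instead.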
  • The code processing device may also provide a grouping interface through which the grouping result can be adjusted manually, which further improves the accuracy of the grouping and improves compatibility.
  • the grouping interface may display the grouping result of the code change block by the code processing device.
  • the grouping result may be the grouping result obtained by the code processing device grouping based on the association relationship between the code change blocks, or may be the grouping result obtained by adjusting the above-mentioned grouping result in combination with the compilation prompt information.
  • the grouping result can be adjusted through the grouping interface.
  • The code processing device can receive, through the grouping interface, the user's adjustment setting for a second code change group.
  • the second code change group may be any one of the N code change groups.
  • the second code change group and the first code change group may be the same code change group or different code change groups.
  • the code processing device may adjust the code change block of the second code change group according to the adjustment setting.
  • the code processing device may adaptively adjust other code change blocks through self-learning based on the user's adjustment settings for a certain code change block or some code change blocks in the second code change group.
  • the adjustment efficiency can be improved, and the user's operation convenience can be improved.
  • The adjustment setting made by the user for the second code change group through the grouping interface may be an adjustment setting for a target code change block in the second code change group, and the target code change block may be any one or more code change blocks in the second code change group.
  • the code processing device may first adjust the target code change block according to the above adjustment setting, then determine the associated code change block of the target code change block from the second code change group, and adjust the associated code change block according to the above adjustment setting.
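  • A minimal sketch of this adaptive adjustment, assuming the adjustment setting is "move the target block to another group" and that associations between blocks are already known. The function name, the dictionary layout, and the group names are hypothetical.

```python
def apply_adjustment(groups, associations, target, src, dst):
    """Move the target block, then its associated blocks found in the same group."""
    linked = associations.get(target, set())
    # Only associated blocks that are in the source group follow the target.
    moving = [b for b in groups[src] if b == target or b in linked]
    groups[src] = [b for b in groups[src] if b not in moving]
    groups[dst] = groups[dst] + moving
    return groups


groups = {"g1": [1, 2, 3], "g2": [4]}
associations = {2: {3, 4}}  # block 2 is associated with blocks 3 and 4
apply_adjustment(groups, associations, target=2, src="g1", dst="g2")
```

  • Here the user moves block 2 only, and block 3 follows automatically because it is associated with block 2 and sits in the same group.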
  • The code processing device may obtain a submission template corresponding to at least one code change group, and then generate a submission message according to the at least one code change group and its corresponding submission template, so as to perform the code submission operation according to the submission message. On the one hand, this avoids a single submission message mixing multiple change tasks, which makes the change intent hard to understand, disrupts the follow-up process of software development, and thus reduces development efficiency. On the other hand, it mitigates the low accuracy and weak generalization ability of generating submission messages with long short-term memory (LSTM) networks.
  • the at least one association relationship between the code change blocks may specifically include any one or more of element definition and use relationship, element indirect use relationship, or similar change relationship.
  • the code change block includes code fragments, and the code fragments include statements that carry elements.
  • Elements are the components of the statements in the code fragments and can include variables and/or functions. The element definition and use relationship may therefore include a variable definition and use relationship and/or a function definition and use relationship.
  • the indirect use relationship of elements may specifically include the relationship between code change blocks that use the same element. Similar change relationship refers to the relationship between code change blocks with similar change behavior.
  • The code processing device can obtain the element definition and use relationship and the element indirect use relationship in the following manner. The code processing device obtains the code change action set of each of the multiple code change blocks, and parses the code change action set to obtain the element definition sets and element use sets of the current version and the base version corresponding to each code change block. The code processing device then obtains at least one of the element definition and use relationship and the element indirect use relationship among the multiple code change blocks according to those element definition sets and element use sets.
  • The code processing device can analyze code change blocks from multiple dimensions to obtain multiple association relationships between them, such as the element definition and use relationship, the indirect use relationship, and the similar change relationship. This provides richer and more comprehensive information for code grouping and thereby improves the accuracy of the grouping.
  • this application provides a code processing device.
  • the device specifically includes: a communication module, an analysis module and a grouping module.
  • the communication module is used to obtain a code change file of the code file of the current version relative to the code file of the basic version, and the code change file includes a plurality of code change blocks.
  • the analysis module is used to analyze the multiple code change blocks to obtain at least one association relationship between the multiple code change blocks.
  • the grouping module is configured to use the at least one association relationship to group the multiple code change blocks to obtain N code change groups, so as to perform code submission operations respectively according to the N code change groups, where N is a positive integer.
  • the device further includes:
  • the adjustment module is configured to compile code according to the code fragments of the current version included in the code change block of the first code change group to obtain compilation prompt information, and adjust the first code change group according to the compilation prompt information.
  • the communication module is also used to:
  • the device also includes:
  • the adjustment module is configured to adjust the code change block of the second code change group according to the adjustment setting.
  • the adjustment setting of the second code change group includes the adjustment setting of the target code change block in the second code change group
  • the adjustment module is specifically used for:
  • the communication module is also used to:
  • the device also includes:
  • the submission module is configured to generate a submission message according to the at least one code change group and its corresponding submission template, so as to perform the code submission operation according to the submission message.
  • the analysis module is specifically used for:
  • according to the element definition sets and element use sets of the current version and the base version corresponding to each code change block, obtain at least one of the element definition and use relationship and the element indirect use relationship among the multiple code change blocks.
  • the present application provides a computer including a processor and a memory.
  • the processor and the memory communicate with each other.
  • the processor is configured to execute instructions stored in the memory, so that the computer executes the code processing method in the first aspect or any one of the implementation manners of the first aspect.
  • the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the code processing method described in the first aspect or any one of the implementation manners of the first aspect.
  • the present application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the code processing method described in the first aspect or any one of the implementation manners of the first aspect.
  • FIG. 1 is a schematic diagram of a code change file and code change block provided by an embodiment of the application
  • FIG. 2 is a schematic diagram of a system architecture of a code processing method provided by an embodiment of the application
  • FIG. 3 is an interaction flowchart of a code processing method provided by an embodiment of this application.
  • FIG. 4 is a schematic diagram of a grouping interface provided by an embodiment of the application.
  • FIG. 5 is a schematic structural diagram of a computer provided by an embodiment of this application.
  • Code files refer to text files written in accordance with certain programming language specifications.
  • code files generally refer to source code files.
  • the code file includes a series of human-readable computer language instructions, which can be compiled by a compiler into binary instructions that can be executed by the computer. When the binary instruction is executed, the function corresponding to the binary instruction can be realized.
  • The software development process is often continuous. Developers can redevelop on the basis of a certain version of the code file to obtain a new version of the code file. For example, a developer can fix a code defect (bug) in a version of the code file, or add a new function or a new feature, to obtain a new version of the code file.
  • the difference between the code file of the current version and the code file of the base version can form a code change file.
  • The code change file includes multiple code change blocks, and each code change block includes the code fragment before the change and the code fragment after the change, that is, the base-version code fragment and the current-version code fragment corresponding to the above-mentioned difference.
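  • As a rough illustration of this structure, the sketch below derives change blocks from two versions of a file using Python's standard `difflib` module. The `ChangeBlock` class and the `change_blocks` function are hypothetical names, and line-level diffing is an assumption chosen for brevity rather than the method of the application.

```python
import difflib
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeBlock:
    """One code change block: the fragment before and after the change."""
    base_fragment: List[str]
    current_fragment: List[str]


def change_blocks(base_lines, current_lines):
    """Split the difference between two versions into change blocks."""
    matcher = difflib.SequenceMatcher(a=base_lines, b=current_lines, autojunk=False)
    blocks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            blocks.append(ChangeBlock(base_lines[i1:i2], current_lines[j1:j2]))
    return blocks


base = ["int a = 1;", "int b = 2;", "return a;"]
current = ["int a = 1;", "int b = 3;", "return a;", "// done"]
blocks = change_blocks(base, current)
```

  • Each resulting block keeps both sides of one difference, so an inserted line yields a block whose base-version fragment is empty.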
  • this application also provides an example of a code change block.
  • the code fragments before and after modification in the code change block are all loop statements.
  • The code fragment before modification uses while to loop, which is prone to infinite loops and results in a code defect.
  • The modified code fragment uses for to loop. Since int is a 32-bit integer, the variable sum overflows when it reaches the maximum value and becomes negative, which ends the loop and fixes the above code defect.
  • this application provides a code processing method.
  • This method analyzes the multiple code change blocks in a code change file at the code change block level, obtains at least one association relationship between them, and then uses the at least one association relationship to group the multiple code change blocks into N code change groups. Because the association relationships are considered at the granularity of code change blocks, false negatives and false positives in the grouping are reduced and the grouping accuracy is improved. When code submission operations are performed according to the N code change groups, the submission message corresponding to each code change group covers far fewer code change tasks. This makes it easier for other users to understand the intention of the code changes, which benefits code review in the code development phase as well as code reversion, cherry-picking, and regression testing in the code maintenance phase, and improves the efficiency of software development.
  • The method also supports compiling the current-version code fragments included in the code change blocks of at least one of the N code change groups and adjusting the code change group based on the compilation result, so that the current-version code fragments included in the code change blocks of the adjusted group compile successfully. Because compilation correctness after grouping is taken into account, the grouping accuracy is further improved and software development becomes more efficient.
  • The method also supports manual adjustment of a code change group by the user, and can adaptively adjust the code change group based on the user's adjustment settings to optimize the grouping result, so that the submission message corresponding to each code change group is easy to understand, which facilitates subsequent code development and code maintenance.
  • The code processing method provided in this application can be applied to application environments including, but not limited to, the one shown in FIG. 2.
  • the code processing device 200 includes a communication module 202, an analysis module 204 and a grouping module 206.
  • The communication module 202 is used to obtain the code change file of the current-version code file relative to the base-version code file.
  • the code change file includes multiple code change blocks.
  • The analysis module 204 can analyze the multiple code change blocks to obtain at least one association relationship between them, and the grouping module 206 uses the at least one association relationship to group the multiple code change blocks into N code change groups, where N is a positive integer.
  • the code processing apparatus 200 may further include an adjustment module 208.
  • the adjustment module 208 may include a compiler.
  • the adjustment module 208 may use a compiler to compile the code according to the code fragments of the current version included in the code change block of the first code change group (any one of the N code change groups) to obtain the compilation prompt information. Then, the adjustment module 208 may analyze the compilation prompt information, and adjust the first code change group according to the analysis result. For example, when the compilation prompt information prompts that the function definition is missing in the first code change group, code change blocks that include function definitions in other code change groups can be moved to the first code change group.
  • the adjustment module 208 may also provide a grouping interface through which the user can adjust the code change group.
  • The communication module 202 is also used to receive the adjustment setting made by the user through the grouping interface for the second code change group (any one of the N code change groups, which may be the same code change group as the first code change group or a different one).
  • The adjustment module 208 adjusts the code change blocks of the second code change group in response to the adjustment setting.
  • The adjustment module 208 can adjust the target code change block based on the adjustment setting made by the user through the grouping interface for the target code change block in the second code change group, determine the code change blocks associated with the target code change block from the second code change group, and adjust the associated code change blocks according to the same adjustment setting. In this way, at least one round of user interaction combined with adaptive adjustment optimizes the grouping result and improves the accuracy of the grouping.
  • the code processing apparatus 200 further includes a submission module 210.
  • the submission module 210 is configured to generate a submission message according to at least one code change group and a submission template corresponding to the code change group, and perform a code submission operation based on the submission message.
  • the code processing device 200 may send the code grouping result to the classification device 100, specifically N code change groups.
  • The communication module 102 of the classification device 100 receives the N code change groups. The preprocessing module 104 of the classification device 100 then generates corresponding feature vectors from the N code change groups, and the classification module 106 of the classification device 100 can call the submission type predictor to predict the submission type of each code change group.
  • the submission type predictor takes the feature vector as the input and the submission type as the output.
  • the submission type predictor can be obtained by training using training samples generated by a large number of submission examples.
  • the classification module 106 may also obtain a submission template corresponding to the submission type from the template library 108 according to the predicted submission type.
  • The communication module 102 may also send the submission template corresponding to each of the at least one code change group to the code processing device 200, so that the code processing device 200 generates a submission message according to the at least one code change group and its corresponding submission template, and performs intelligent code submission based on the submission message.
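  • The template-based generation of a submission message can be sketched as follows. The template texts, the submission type names ("fix", "feature"), and the `build_message` function are hypothetical; the application does not specify the content of the template library.

```python
from string import Template

# Hypothetical template library keyed by predicted submission type.
TEMPLATES = {
    "fix": Template("fix: $summary\n\nChanged blocks: $blocks"),
    "feature": Template("feat: $summary\n\nChanged blocks: $blocks"),
}


def build_message(submission_type, summary, block_ids):
    """Fill the template for the given submission type with group details."""
    template = TEMPLATES[submission_type]
    return template.substitute(
        summary=summary,
        blocks=", ".join(str(b) for b in block_ids),
    )


message = build_message("fix", "avoid endless loop", [1, 5])
```

  • Because each code change group gets its own message, the message describes a single change task rather than a mixture of tasks.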
  • the code processing apparatus 200 may be deployed locally, for example, in a local terminal device.
  • the terminal equipment includes, but is not limited to, desktop computers, laptop computers, and so on.
  • the code processing apparatus 200 may also be deployed in the cloud, for example, in a cloud server.
  • the user can access the cloud server through a browser to implement the code processing method provided in this application.
  • the classification apparatus 100 may be deployed in a terminal device or a server. It should be noted that the code processing device 200 and the classification device 100 can be deployed in different devices, which can reduce the demand for system resources of the code processing device 200 and improve the usability. Of course, the code processing device 200 and the classification device 100 can also be deployed in the same device, so that the code processing efficiency can be improved.
  • the code processing method of the present application will be introduced below in conjunction with embodiments.
  • the code processing apparatus 200 and the classification apparatus 100 are respectively deployed in different devices for illustration.
  • the method includes:
  • The code processing device 200 obtains the code change file of the current-version code file relative to the base-version code file.
  • Specifically, when developing software based on the code processing device 200, the user can make changes to the base-version code file, such as fixing code defects in it or adding new functions or new features to it.
  • The code processing device 200 can receive the code changed by each change behavior and obtain the code change file.
  • The code change file includes the code change block corresponding to each change behavior. Because users usually produce multiple change behaviors when developing software, often for different tasks, the code change file can include multiple code change blocks. Each code change block includes a current-version code fragment and a base-version code fragment.
  • S304: The code processing device 200 analyzes the multiple code change blocks to obtain at least one association relationship between the multiple code change blocks.
  • the code change block includes code snippets, and the code snippets include statements that carry elements.
  • an element is a component part of a statement in a code fragment, and it can be a variable and/or a function.
  • Variables and/or functions need to be defined before they are used.
  • The code processing device 200 can therefore analyze from the perspective of element definition and use to obtain the association relationships between multiple code change blocks. Specifically, the code processing device 200 can analyze variable definition and use to obtain the variable definition and use relationship among multiple code change blocks. Likewise, the code processing device 200 can analyze function definition and use to obtain the function definition and use relationship among multiple code change blocks.
  • the code processing device 200 may also analyze multiple code change blocks to obtain an indirect use relationship.
  • the so-called indirect usage relationship refers to the relationship between different code change blocks that use the same element such as the same variable or the same function. For example, if both code change block 1 and code change block 2 use function A, it is said that there is an indirect use relationship between code change block 1 and code change block 2.
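  • A small sketch of detecting this indirect use relationship, assuming the set of elements used by each block has already been extracted. The `indirect_use_pairs` function and the sample data are illustrative only.

```python
from itertools import combinations


def indirect_use_pairs(uses):
    """uses maps a block id to the set of element names the block uses."""
    pairs = set()
    for a, b in combinations(sorted(uses), 2):
        if uses[a] & uses[b]:  # the two blocks share at least one used element
            pairs.add((a, b))
    return pairs


uses = {1: {"A"}, 2: {"A", "x"}, 3: {"B"}}
pairs = indirect_use_pairs(uses)
```

  • Blocks 1 and 2 both use function A, so they form the only pair with an indirect use relationship in this sample.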
  • The code processing device 200 may first obtain the code change action set of each of the multiple code change blocks. Specifically, the code processing device 200 may use a syntax tree differencing algorithm, for example the GumTree algorithm or the ChangeDistiller algorithm, to determine the code change action set of each code change block.
  • the code change action set includes a set of all change actions included in the code change block.
  • the change action includes any one or more of add (add), delete (delete), modify (change) and move (move).
  • The code processing device 200 parses the code change action set to obtain the element definition sets and element use sets of the current version and the base version corresponding to each code change block. Next, the code processing device 200 obtains at least one of the element definition and use relationship and the element indirect use relationship among the multiple code change blocks according to those element definition sets and element use sets.
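  • How definition sets and use sets yield a definition and use relationship can be sketched as follows. For brevity the sketch handles one version only; the same function could be applied to the base-version sets and the current-version sets in turn. The function name and data layout are assumptions.

```python
def def_use_pairs(definitions, uses):
    """Return (defining block, using block, element) triples.

    definitions and uses both map a block id to a set of element names.
    """
    return {
        (d, u, name)
        for d, defined in definitions.items()
        for u, used in uses.items()
        if d != u
        for name in defined & used
    }


definitions = {1: {"A", "fA"}, 2: set(), 3: set()}
uses = {1: set(), 2: {"A"}, 3: {"fA"}}
relations = def_use_pairs(definitions, uses)
```

  • Block 1 defines variable `A` and function `fA`, so it is related to block 2 through `A` and to block 3 through `fA`.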
  • The code processing device 200 can also analyze the code change blocks from the dimension of change behavior similarity to obtain the similar change relationship between the code change blocks.
  • Specifically, the code processing device 200 can determine the change object and change operation of each code change block, determine the similarity of the code change behavior based on the similarity of the change objects and the similarity of the change operations, and determine code change blocks whose change behavior similarity is greater than a preset threshold as code change blocks with a similar change relationship.
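  • One possible way to score change behavior similarity is sketched below. Exact-match similarity for objects and operations, the equal weighting, and the threshold value 0.75 are all assumptions made for illustration; the application does not fix a formula.

```python
def behaviour_similarity(a, b, object_weight=0.5):
    """a and b are (change_object, change_operation) tuples."""
    object_sim = 1.0 if a[0] == b[0] else 0.0
    operation_sim = 1.0 if a[1] == b[1] else 0.0
    return object_weight * object_sim + (1 - object_weight) * operation_sim


THRESHOLD = 0.75  # assumed preset threshold

block_1 = ("function A", "change")
block_4 = ("function B", "add")
block_5 = ("function A", "change")
block_6 = ("function A", "delete")

similar_1_5 = behaviour_similarity(block_1, block_5) > THRESHOLD
similar_1_4 = behaviour_similarity(block_1, block_4) > THRESHOLD
similar_1_6 = behaviour_similarity(block_1, block_6) > THRESHOLD
```

  • With these assumptions only blocks that share both the change object and the change operation exceed the threshold; a finer-grained similarity measure could replace the exact-match comparison.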
  • the change object refers to the object for which the change operation is performed, and the change object can be a variable, a function, and so on.
  • the change operation can include adding, deleting, modifying or moving, etc.
  • the code processing device 200 may also calculate the similarity of the code fragments of the current version and the similarity of the code fragments of the basic version in multiple code change blocks, and determine a group based on the similarity of the code fragments of the current version. For similar code change blocks, another group of similar code change blocks is determined based on the similarity of the code fragments of the basic version. When the above two groups of similar code change blocks are the same code change block, it is determined that this group of code change blocks are code change blocks with similar change relationships.
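  • The two-sided fragment comparison can be sketched with `difflib.SequenceMatcher` from the Python standard library. Character-level similarity, the 0.8 threshold, and the function names are assumptions, not details given in the application.

```python
import difflib
from itertools import combinations


def similar_pairs(fragments, threshold):
    """Pairs of block ids whose fragments are textually similar."""
    return {
        (a, b)
        for a, b in combinations(sorted(fragments), 2)
        if difflib.SequenceMatcher(None, fragments[a], fragments[b]).ratio() >= threshold
    }


def similar_change_pairs(base, current, threshold=0.8):
    """Keep only pairs that are similar in both the base and the current version."""
    return similar_pairs(base, threshold) & similar_pairs(current, threshold)


base = {1: "while (i < n) i++;", 2: "while (j < n) j++;", 3: "return 0;"}
current = {1: "for (; i < n; i++);", 2: "for (; j < n; j++);", 3: "return 1;"}
result = similar_change_pairs(base, current)
```

  • Requiring similarity on both sides means that two blocks are related only when they start from similar code and end up as similar code.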
  • the code processing device 200 can analyze code change blocks from multiple dimensions and obtain multiple association relationships between the code change blocks, such as direct use relationships (including function definition and use relationships and variable definition and use relationships), indirect use relationships, and similar change relationships, which can improve the accuracy of grouping.
  • S306 The code processing device 200 uses the association relationship to group the multiple code change blocks to obtain N code change groups.
  • the code processing device 200 can use the association relationship to divide code change blocks with element definitions and usage relationships into the same code change group. Specifically, the code processing device 200 divides code change blocks with function definitions and use relationships into the same code change group, and divides code change blocks with variable definitions and use relationships into the same code change group.
  • code change block 1 defines variable A and function A
  • code change block 2 uses variable A
  • code change block 3 uses function A
  • code change block 4 uses function B
  • in this example, the code processing device 200 can divide code change block 1, code change block 2, and code change block 3 into the same code change group, and divide code change block 4 into a different code change group.
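  • the example above can be reproduced with a small union-find sketch. The function name and the dictionary layout are assumptions for illustration, and the sketch assumes each element is defined by only one code change block.

```python
def group_by_def_use(defines, uses):
    """Merge blocks that are linked by a definition and use relationship."""
    parent = {block: block for block in set(defines) | set(uses)}

    def find(block):
        while parent[block] != block:
            parent[block] = parent[parent[block]]  # path halving
            block = parent[block]
        return block

    def union(a, b):
        parent[find(a)] = find(b)

    definer_of = {}
    for block, elements in defines.items():
        for element in elements:
            definer_of[element] = block
    for block, elements in uses.items():
        for element in elements:
            if element in definer_of:  # elements defined outside the change set are ignored
                union(block, definer_of[element])

    groups = {}
    for block in parent:
        groups.setdefault(find(block), []).append(block)
    return sorted(sorted(group) for group in groups.values())


defines = {1: {"variable A", "function A"}, 2: set(), 3: set(), 4: set()}
uses = {1: set(), 2: {"variable A"}, 3: {"function A"}, 4: {"function B"}}
groups = group_by_def_use(defines, uses)
```

  • code change block 4 stays alone because function B is not defined by any block in the change set.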
  • the code processing apparatus 200 may also use the association relationship to divide code change blocks with similar change relationships into the same code change group.
  • code change block 1 modifies the expression of function A
  • code change block 5 also modifies the expression of function A
  • the change objects are all function A
  • the change operations are all modifications.
  • the code processing device 200 determines that code change block 1 and code change block 5 are code change blocks with a similar change relationship, so code change block 1 and code change block 5 are divided into the same code change group.
  • the code processing device 200 may use the association relationship to construct a change block relationship graph for the multiple code change blocks, and then use a clustering algorithm to cluster the nodes in the change block relationship graph, thereby grouping the code change blocks.
  • the code processing device 200 constructs the change block relationship graph specifically using the code change block as a node, and constructs the edges of the change block relationship graph according to the association relationship of the code change block. Further, the code processing device 200 can assign a corresponding weight to each edge according to the degree of association, so that the change block relationship graph can more accurately express the association relationship between the code change blocks.
  • the K-means clustering algorithm, the mean shift clustering algorithm, or a hierarchical clustering algorithm can be used to cluster the nodes in the change block relationship graph to obtain N code change groups.
  • each code change block belongs to a code change group.
  • the code processing device 200 may determine the code change group to which the code change block ultimately belongs based on the weights of the different association relationships.
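  • the following sketch shows one way a weighted change block relationship graph could be built and clustered. The per-relationship weights and the cut-off are hypothetical, and instead of K-means or mean shift the sketch keeps only edges at or above the cut-off and takes connected components, which corresponds to cutting a single-linkage hierarchical clustering at that weight.

```python
from collections import defaultdict

# Hypothetical weights per association relationship (not taken from the application).
RELATION_WEIGHTS = {"def_use": 1.0, "similar_change": 0.6, "indirect_use": 0.4}


def build_graph(relations):
    """Sum the weights of all relationships between each pair of distinct blocks."""
    edges = defaultdict(float)
    for a, b, kind in relations:
        edges[frozenset((a, b))] += RELATION_WEIGHTS[kind]
    return edges


def cluster(blocks, edges, min_weight=0.5):
    """Group blocks connected by edges whose weight is at least min_weight."""
    adjacency = defaultdict(set)
    for pair, weight in edges.items():
        if weight >= min_weight:
            a, b = tuple(pair)
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, groups = set(), []
    for start in blocks:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


relations = [
    ("b1", "b2", "def_use"),
    ("b1", "b5", "similar_change"),
    ("b3", "b4", "indirect_use"),
]
groups = cluster(["b1", "b2", "b3", "b4", "b5"], build_graph(relations))
```

  • because every block ends up in exactly one component, each code change block belongs to exactly one code change group, and weak relationships (here the indirect use between b3 and b4) do not force a merge.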
  • the code processing device 200 performs code compilation according to the code fragments of the current version included in the code change block of the first code change group, and obtains compilation prompt information.
  • the code processing device 200 may first compile the current-version code fragments included in the code change blocks of a code change group, for example any one of the N code change groups (that is, the first code change group), so as to detect in advance a code change group that has compilation problems and adjust it. This prevents a submitted code change group from containing compilation problems and further improves the accuracy of grouping.
  • the code processing device 200 may select a corresponding compiler according to the programming language of the code segment to respectively perform code compilation on the current version of the code segment included in the code change block in the N code change groups.
  • the code processing apparatus 200 may use Gradle or Maven (mvn) to compile.
  • the compiler can output compile prompt information after compiling.
  • the compiling prompt information may specifically include multiple types such as compiling errors or compiling warnings.
  • the compilation error indicates that there is an error in the code fragment and cannot be converted into an executable program.
  • the compilation warning indicates that the code segment has a risk point, which may cause the executable program converted by the code segment to be abnormal during execution.
  • this application also provides some specific examples of compilation errors and compilation warnings.
  • the compilation error may be an undefined function or undefined variable, etc.
  • the compilation warning may be a missing bracket, a function declared static that is never used, and so on.
  • the compilation prompt information also includes the location where each compilation error or compilation warning occurs. The location can be characterized by the line number in the code file.
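  • as a sketch of how such prompt information might be turned into structured records, the following parser assumes a line shape of `file:line: kind: message`. That shape and the sample lines are assumptions for illustration; real compiler output differs between tools and must be matched with its own pattern.

```python
import re

# Assumed line shape "file:line: kind: message"; adapt the pattern to the real compiler.
PROMPT_PATTERN = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+): (?P<kind>error|warning): (?P<message>.+)$"
)


def parse_compilation_prompts(output):
    """Extract the type, location, and message of each compilation prompt."""
    prompts = []
    for raw in output.splitlines():
        match = PROMPT_PATTERN.match(raw.strip())
        if match:
            prompts.append({
                "file": match["file"],
                "line": int(match["line"]),
                "kind": match["kind"],
                "message": match["message"],
            })
    return prompts


# Illustrative text only, not the output of an actual compiler run.
sample_output = """\
Parser.java:12: error: cannot find symbol
Parser.java:30: warning: unused static function
2 problems found
"""
prompts = parse_compilation_prompts(sample_output)
```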
  • S310 The code processing apparatus 200 adjusts the first code change group according to the compilation prompt information.
  • the code processing device 200 may analyze the compilation prompt information and adjust the first code change group according to the analysis result. Specifically, if the analysis result indicates that a variable definition is missing from the first code change group, the code change block containing the variable definition is added to the first code change group; if the analysis result indicates that a function definition is missing from the first code change group, the code change block containing the function definition is added to the first code change group.
  • the code processing apparatus 200 may re-execute the code compilation step after adjusting the first code change group according to the compilation prompt information. If the compiler still outputs a compilation error or compilation warning, the first code change group can be adjusted again, until the current-version code fragments included in the code change blocks of the adjusted first code change group compile successfully. Once the compiler reports that compilation succeeded, the next steps can be performed. That is, the code processing apparatus 200 may execute the foregoing S308 to S310 in a loop until the adjusted code change group compiles successfully.
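  • the compile-and-adjust loop can be sketched as follows. `fake_compile` is a hypothetical stand-in for a real compiler that only reports symbols used but not defined inside a group, and the policy of pulling the defining block in from another group is one possible adjustment assumed for this sketch.

```python
def fake_compile(group, blocks):
    """Hypothetical stand-in for a compiler: list symbols used but not defined in the group."""
    defined = set().union(*(blocks[b]["defines"] for b in group))
    used = set().union(*(blocks[b]["uses"] for b in group))
    return sorted(used - defined)  # an empty list plays the role of "compiled successfully"


def find_definer(symbol, other_groups, blocks):
    """Locate a block in another group that defines the missing symbol."""
    for other in other_groups:
        for block in sorted(other):
            if symbol in blocks[block]["defines"]:
                return other, block
    return None


def adjust_until_compiles(group, other_groups, blocks, max_rounds=10):
    """Repeat compile-and-adjust until nothing is missing or no progress is possible."""
    group = set(group)
    for _ in range(max_rounds):
        missing = fake_compile(group, blocks)
        if not missing:
            break
        progressed = False
        for symbol in missing:
            found = find_definer(symbol, other_groups, blocks)
            if found:
                source, block = found
                source.remove(block)  # move the defining block out of its old group
                group.add(block)
                progressed = True
        if not progressed:
            break  # remaining symbols are defined nowhere in the change set
    return group


blocks = {
    "b1": {"defines": {"helper"}, "uses": set()},
    "b2": {"defines": set(), "uses": {"helper", "limit"}},
    "b3": {"defines": {"limit"}, "uses": set()},
    "b4": {"defines": {"unrelated"}, "uses": set()},
}
others = [{"b1", "b4"}, {"b3"}]
first_group = adjust_until_compiles({"b2"}, others, blocks)
```

  • a real implementation would also have to distinguish symbols supplied by unchanged code or libraries, which this sketch would report as missing.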
  • the code processing device 200 receives the adjustment settings of the target code change block in the second code change group by the user through the grouping interface.
  • the code processing apparatus 200 also allows the user to group code manually or adjust existing grouping results. When the user believes that the grouping of one or more code change blocks is not accurate, the grouping result can be adjusted through the grouping interface.
  • the code processing device 200 is provided with a grouping interface, and the grouping interface can display N code change groups obtained by the code processing device 200 grouping multiple code change blocks.
  • the user can adjust any code change group (that is, the second code change group) among the N code change groups through the grouping interface, specifically by adjusting any one or more code change blocks (that is, target code change blocks) in the second code change group.
  • the user can move a certain code change block in one code change group to another code change group through the grouping interface, so as to realize the adjustment of the two code change groups.
  • the code processing device 200 receives the adjustment settings of the target code change block in the second code change group by the user through the grouping interface, so as to realize the manual adjustment of the second code change group.
  • the second code change group and the first code change group may be the same code change group or different code change groups.
  • Figure 4 shows a schematic diagram of adjusting the code change group through the grouping interface.
  • the window of the grouping interface includes N sub-windows, and each sub-window displays the code change blocks in one of the N code change groups obtained by the code processing device 200.
  • the two windows above the N sub-windows respectively display the basic-version code fragment and the current-version code fragment included in the code change block, that is, the code fragment before the modification and the code fragment after the modification. In this way, it is convenient to view the difference between the basic-version code fragment and the current-version code fragment in the code change block.
  • N is specifically 5.
  • the first code change group includes 3 code change blocks
  • the second to fifth code change groups each include 6 code change blocks. The user can move the code change blocks included in a code change group by dragging and dropping them.
  • the grouping interface may also provide a navigation bar, which displays a list of code files, and the user can quickly switch to a specified file through the navigation bar.
  • the grouping interface can also provide a change intent display column.
  • the change intent display column can display the change intent corresponding to the code change block, so that the user can quickly view the change intent and improve the adjustment efficiency.
  • the change intent display column may also provide an edit control to support the user to modify the change intent through the edit control. When the user thinks that the change intention is incorrect, they can modify it through the above-mentioned edit controls.
  • the N code change groups displayed on the grouping interface may be N code change groups adjusted after compilation.
  • the N code change groups displayed on the grouping interface may also be N code change groups divided by the code processing apparatus 200 according to at least one association relationship between multiple code change blocks.
  • S314 The code processing device 200 adjusts the target code change block according to the adjustment setting.
  • the code processing device 200 may adjust the target code change block of the second code change group according to the received adjustment setting. For example, the code processing device 200 moves the target code change block from the above-mentioned second code change group to another code change group according to the adjustment setting, or deletes the target code change block from the second code change group.
  • the code processing device 200 determines the associated code change block of the target code change block from the second code change group, and adjusts the associated code change block according to the adjustment setting.
  • the code processing device 200 records the change information of the code change block that performs the adjustment operation.
  • the change information may specifically include the source group and the destination group to which the code change block belongs, that is, the code change group to which the code change block belongs before the adjustment and the code change group to which the code change block belongs after the adjustment.
  • the code processing device 200 analyzes why the user adjusted the code change block, combines this with the at least one association relationship between the multiple code change blocks to determine the associated code change blocks that need to be adjusted accordingly, and adjusts those associated code change blocks based on the adjustment setting.
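  • a minimal sketch of applying an adjustment setting, recording the change information, and carrying associated code change blocks along is shown below. The `ChangeRecord` structure and the rule that associated blocks in the same source group follow the target block are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    """Change information: the source group and destination group of a moved block."""
    block: str
    source_group: int
    destination_group: int


def move_block(groups, block, destination, log):
    """Move one block to the destination group and record the move."""
    source = next(i for i, group in enumerate(groups) if block in group)
    if source != destination:
        groups[source].remove(block)
        groups[destination].add(block)
        log.append(ChangeRecord(block, source, destination))
    return source


def move_with_associated(groups, block, destination, associations):
    """Move the target block, then move its associated blocks from the same source group."""
    log = []
    source = move_block(groups, block, destination, log)
    for other in sorted(associations.get(block, ())):
        if other in groups[source]:
            move_block(groups, other, destination, log)
    return log


groups = [{"b1", "b2", "b3"}, {"b4"}]
associations = {"b1": {"b2"}}  # hypothetical: b2 uses something that b1 defines
log = move_with_associated(groups, "b1", 1, associations)
```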
  • the code processing apparatus 200 may not perform the above steps, and the user may manually adjust the associated code change block.
  • S318 The code processing device 200 sends a classification request message to the classification device 100.
  • the classification request message includes at least one code change group among the N code change groups.
  • the classification request message is specifically used to classify at least one code change group among the N code change groups to obtain the submission type of the at least one code change group.
  • the so-called commit type specifically refers to the type of commit message.
  • the submission type may include at least one of bug fix, refactor, new feature, and reformat.
  • the code processing device 200 sends a classification request message to the classification device 100 to obtain the submission type of at least one code change group among the N code change groups, and then generates a submission message for at least one code change group.
  • S320 The classification device 100 generates a feature vector according to at least one code change group.
  • the classification device 100 extracts the feature vector corresponding to the code change group according to the code fragments included in the code change block in each code change group. Specifically, the classification device 100 may extract the code change action set according to the code fragments included in the code change block in the code change group, and then generate the feature vector corresponding to the code change group according to the code change action set.
  • the classification device 100 can perform vectorization processing on the code change action set, thereby generating a feature vector corresponding to the code change group.
  • Each code change action in the code change action set can be identified by the change operation and the change object.
  • the change operation includes add, delete, change, or move
  • the change object can be a variable or a function, etc., specifically including variable declarations and function definitions.
  • the above variable declaration and function definition each correspond to an AST node type in an abstract syntax tree (AST).
  • the classification device 100 may associate n AST node types with each of the four change operations and lay the resulting entries out in sequence as a one-dimensional vector of 4n elements. If a node of a certain AST node type is added in this submission, the value of the corresponding element in the one-dimensional vector is increased by 1, and likewise for the other change operations. For AST node types that do not appear under any of the four operations, the corresponding elements are set to 0. In this way, the feature vector corresponding to the code change group can be obtained.
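  • the vectorization described above can be sketched as follows. The three node type names are a hypothetical subset chosen for illustration; a real parser exposes many more AST node types.

```python
OPERATIONS = ["add", "delete", "modify", "move"]
# Hypothetical subset of AST node types used only for this sketch.
NODE_TYPES = ["VariableDeclaration", "MethodDeclaration", "IfStatement"]


def feature_vector(actions):
    """Count (operation, node type) pairs in a flat vector of 4 * n elements."""
    index = {
        (operation, node_type): i * len(NODE_TYPES) + j
        for i, operation in enumerate(OPERATIONS)
        for j, node_type in enumerate(NODE_TYPES)
    }
    vector = [0] * len(index)  # pairs that never occur keep the value 0
    for operation, node_type in actions:
        vector[index[(operation, node_type)]] += 1
    return vector


actions = [
    ("add", "MethodDeclaration"),
    ("add", "MethodDeclaration"),
    ("modify", "IfStatement"),
]
vector = feature_vector(actions)
```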
  • when extracting feature vectors, the classification device 100 can consider not only fine-grained code change actions from a micro perspective but also statistical characteristics of code changes from a macro perspective, such as the number of code change blocks in a code change group, the number of code change blocks in the group that are involved in definition and use relationships, the number of code change blocks in the group that are involved in similar change relationships, and so on. Specifically, the classification device 100 may append several elements to the above-mentioned one-dimensional vector and assign these statistical values to them. In this way, a more comprehensive feature vector can be extracted, which provides more information to the classification model and improves its classification accuracy.
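  • appending the macro-level statistics could look like the sketch below. The function name, argument layout, and the order of the three appended elements are assumptions for illustration.

```python
def extend_with_statistics(action_vector, group, def_use_pairs, similar_pairs):
    """Append group-level counts to the fine-grained change action vector."""
    in_def_use = {b for pair in def_use_pairs for b in pair if b in group}
    in_similar = {b for pair in similar_pairs for b in pair if b in group}
    statistics = [
        len(group),       # number of code change blocks in the group
        len(in_def_use),  # blocks involved in definition and use relationships
        len(in_similar),  # blocks involved in similar change relationships
    ]
    return list(action_vector) + statistics


group = {"b1", "b2", "b5"}
def_use_pairs = [("b2", "b1")]
similar_pairs = [("b1", "b5"), ("b7", "b8")]
features = extend_with_statistics([0, 2, 1], group, def_use_pairs, similar_pairs)
```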
  • the classification device 100 predicts the submission type of the at least one code change group according to the feature vector.
  • the classification device 100 can use the submission type predictor to predict the submission type of at least one code change group.
  • the submission type predictor takes the feature vector as input and the submission type of the code change group as output. Therefore, the classification device 100 inputs the feature vector of at least one code change group into the submission type predictor, and the submission type predictor can predict the submission type of the at least one code change group and output the prediction result.
  • the submission type predictor can be based on pre-collected training samples and adopt a machine learning method for model training generation.
  • a training sample includes a feature vector extracted from a code fragment included in a historical submission message and a submission type label.
  • the submission type label can be determined based on the commit log.
  • the keywords of the submission log include the submission type.
  • the classification device 100 automatically generates a submission type label by analyzing the submission log.
  • the submission type label can be identified by different numbers. For example, bug fix can be identified by the number 00, and refactor can be identified by the number 01.
  • the classification device 100 splices the above-mentioned feature vector and the submission type label to generate a training sample.
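  • label generation and splicing can be sketched as follows. The keyword table is hypothetical, and only the numbers for bug fix and refactor follow the example in the text; the numbers for the other two submission types are assumptions.

```python
import re

# Hypothetical keyword table; the text only says the log's keywords include the type.
KEYWORDS = {
    "bug fix": ("fix", "bug"),
    "refactor": ("refactor",),
    "new feature": ("feat", "feature"),
    "reformat": ("reformat", "format"),
}
# 0 and 1 follow the "00" / "01" example in the text; 2 and 3 are assumed.
LABEL_IDS = {"bug fix": 0, "refactor": 1, "new feature": 2, "reformat": 3}


def label_from_log(log):
    """Return the numeric submission type label found in a submission log, if any."""
    words = set(re.findall(r"[a-z]+", log.lower()))
    for submission_type, keywords in KEYWORDS.items():
        if words & set(keywords):
            return LABEL_IDS[submission_type]
    return None


def training_sample(feature_vector, log):
    """Splice the feature vector and the label; skip logs without a recognizable type."""
    label = label_from_log(log)
    return None if label is None else list(feature_vector) + [label]


sample_a = training_sample([0, 2, 1], "Fix null pointer in parser")
sample_b = training_sample([1, 0, 0], "Refactor session cache")
sample_c = training_sample([1, 1, 1], "Update docs")
```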
  • the classification device 100 can use a dimensionality reduction algorithm to remove the sparsity of the matrix. Finally, the classification device 100 uses a machine learning algorithm, such as xgboost, to perform model training on the sparsity-removed matrix to obtain a submission type predictor.
  • the classification device 100 obtains a submission template corresponding to at least one code change group from the template library.
  • the classification device 100 can mine the historical submission messages of various submission types stored in the code warehouse, so as to obtain a general template of submission messages of various submission types, and the general template can be used as a submission template corresponding to the submission type.
  • Table 1 shows an example of a submission template:
  • XXX in Table 1 is the content to be filled in the submission template, which can be determined by the code features extracted from the code change group.
  • the classification device 100 may query a template library, and obtain a submission template matching the submission type of at least one code change group from the template library, as the submission template corresponding to the code change group.
  • S326 The classification device 100 sends a classification response message to the code processing device 200.
  • the classification response message includes the submission template corresponding to the at least one code change group.
  • the code processing apparatus 200 generates a submission message according to the at least one code change group and its corresponding submission template, so as to perform a code submission operation based on the submission message.
  • the code processing device 200 may extract code features for at least one code change group, determine keywords based on the code features, fill the keywords into the submission template corresponding to the code change group, and generate a submission message.
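  • template lookup and filling can be sketched as follows. The templates and placeholder names are hypothetical, since the submission templates of Table 1 are not reproduced in this text.

```python
# Hypothetical templates keyed by submission type; not the templates of Table 1.
TEMPLATE_LIBRARY = {
    "bug fix": "Fix {element} in {file}",
    "refactor": "Refactor {element} in {file}",
    "new feature": "Add {element} to {file}",
    "reformat": "Reformat {file}",
}


def generate_submission_message(submission_type, keywords):
    """Fill the keywords extracted from a code change group into the matching template."""
    template = TEMPLATE_LIBRARY[submission_type]
    return template.format(**keywords)


message = generate_submission_message(
    "bug fix", {"element": "parseHeader", "file": "HttpParser.java"}
)
```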
  • the code submission operation can be triggered based on the submission message.
  • the code processing apparatus 200 may also display the submission message, so that the user can confirm the correctness of the submission message.
  • the code processing device 200 supports the user to modify the submission message, such as keywords in the submission message, so as to ensure the correctness of the submission message through user interaction.
  • the foregoing S308 to S316 are optional steps implemented in the embodiment of the application to further improve the grouping accuracy. In some possible implementation manners, the foregoing steps may not be performed.
  • the foregoing S318 to S328 are optional steps implemented in the embodiment of the application to generate a submission message, and in some possible implementation manners, it may also be implemented in other manners.
  • an embodiment of the present application provides a code processing method.
  • This method uses code change blocks as the grouping granularity, rather than grouping by relatively coarse-grained file-level changes or by overly fine-grained changes to individual code features. This avoids the inaccurate grouping caused by too coarse a granularity and the excessive number of groups caused by too fine a granularity.
  • This method can also comprehensively consider multiple association relationships between code change blocks, such as the variable definition and use relationship, the function definition and call relationship, the indirect use relationship, or the similar change relationship, which provides richer and more comprehensive relationship information for code grouping and improves the accuracy of grouping.
  • the method also dynamically adjusts the grouping result in combination with the compilation prompt information to ensure that the adjusted code change group is compiled and passed, and the accuracy of code grouping is further improved.
  • the method also supports adjusting the grouping result through user interaction, and adaptively adjusting the grouping result according to the adjustment settings of the user, so as to obtain an optimized grouping result.
  • the method also supports determining the submission type based on the foregoing grouping results, and automatically generating a submission message in combination with the submission template corresponding to the submission type.
  • corresponding submission messages are generated for different code change groups, which avoids the problem that one submission message is mixed with multiple change tasks, and the change intention is difficult to understand, which affects the follow-up process of software development, and then affects the efficiency of software development.
  • it can alleviate the low accuracy and weak generalization ability that result from using long short-term memory (LSTM) to generate submission messages.
  • the code processing device 200 includes a communication module 202, an analysis module 204 and a grouping module 206.
  • the communication module 202 is configured to obtain a code change file of the code file of the current version relative to the code file of the basic version, and the code change file includes a plurality of code change blocks.
  • the analysis module 204 is configured to analyze the multiple code change blocks to obtain at least one association relationship between the multiple code change blocks.
  • the grouping module 206 is configured to use the at least one association relationship to group the multiple code change blocks to obtain N code change groups, so as to perform code submission operations respectively according to the N code change groups, where N is a positive integer.
  • the apparatus 200 further includes:
  • the adjustment module 208 is configured to compile code according to the code fragments of the current version included in the code change block of the first code change group to obtain compilation prompt information, and adjust the first code change group according to the compilation prompt information.
  • the communication module 202 is also used to:
  • the device 200 further includes:
  • the adjustment module 208 is configured to adjust the code change block of the second code change group according to the adjustment setting.
  • the adjustment setting of the second code change group includes the adjustment setting of the target code change block in the second code change group
  • the adjustment module 208 is specifically configured to:
  • the communication module 202 is also used to:
  • the device 200 further includes:
  • the submission module 210 is configured to generate a submission message according to the at least one code change group and its corresponding submission template, so as to perform the code submission operation according to the submission message.
  • the analysis module 204 is specifically configured to:
  • according to the element definition set and element use set of the current version and the basic version corresponding to each code change block, obtain at least one of the element definition and use relationship and the element indirect use relationship among the multiple code change blocks.
  • the code processing apparatus 200 may correspond to the methods described in the embodiments of the present application, and the foregoing and other operations and/or functions of the various modules of the code processing apparatus 200 are used to implement the corresponding methods in FIG. 3. For the sake of brevity, the process will not be repeated here.
  • the above-mentioned code processing apparatus 200 can be implemented by a computer.
  • Fig. 5 provides a schematic structural diagram of a computer. As shown in Fig. 5, the computer can be specifically used to implement the functions of the code processing apparatus 200 in the embodiment shown in Fig. 2 above.
  • the computer 500 includes a bus 501, a processor 502, a communication interface 503, and a memory 504.
  • the processor 502, the memory 504, and the communication interface 503 communicate through a bus 501.
  • the bus 501 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, the bus is represented by only one thick line in the figure.
  • the communication interface 503 is used to communicate with the outside.
  • the communication interface 503 may be used to obtain a code change file of the code file of the current version relative to the code file of the basic version.
  • the processor 502 may be a central processing unit (CPU).
  • the memory 504 may include a volatile memory, such as a random access memory (RAM).
  • the memory 504 may also include a non-volatile memory, such as a read-only memory (ROM), an HDD, or an SSD.
  • the memory 504 stores executable code, and the processor 502 executes the executable code to execute the aforementioned code processing method.
  • when each module of the code processing apparatus 200 described in the embodiment of FIG. 2 is implemented by software, the software or program code required to execute the functions of the analysis module 204, the grouping module 206, the adjustment module 208, and the submission module 210 in FIG. 2 is stored in the memory 504.
  • the functions of the communication module 202 can be implemented through the communication interface 503.
  • the communication interface 503 obtains the code change file and transmits it to the processor 502 through the bus 501.
  • the processor 502 executes the program code corresponding to each module stored in the memory 504 to execute the code processing method, thereby intelligently grouping the multiple code change blocks in the code change file.
  • the embodiment of the present application also provides a computer-readable storage medium, which includes instructions, which when run on a computer, cause the computer to execute the above-mentioned code processing method applied to the code processing apparatus 200.
  • the embodiments of the present application also provide a computer program product.
  • when the computer program product is executed by a computer, the computer executes any one of the foregoing code processing methods.
  • the computer program product may be a software installation package. In the case where any method of the foregoing code processing method is required, the computer program product may be downloaded and executed on the computer.
  • the device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by means of software plus necessary general hardware.
  • it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and so on.
  • all functions completed by computer programs can be easily implemented with corresponding hardware.
  • the specific hardware structures used to achieve the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits.
  • software program implementation is a better implementation in more cases.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which can be a personal computer, a training device, a network device, or the like) to execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, wireless, or microwave).
  • the computer-readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a training device or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).


Abstract

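A code commit method, comprising: obtaining a code change file of a current-version code file relative to a base-version code file; analyzing the multiple code change blocks included in the code change file to obtain at least one association relationship among them, such as a def-use relationship, a similar-change relationship, or an indirect-use relationship; and grouping the code change blocks according to the relationship, for example by a clustering algorithm. Further, each group is compiled and the grouping is dynamically adjusted according to the compilation prompt information; the grouping can also be adjusted according to user feedback. Code is then committed separately for each adjusted group, which addresses the problem that composite commits hinder subsequent code review and regression testing and thereby reduce software development efficiency.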

Description

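Code processing method, apparatus, device, and medium
This application claims priority to Chinese Patent Application No. 202010118173.9, filed with the China National Intellectual Property Administration on February 25, 2020 and entitled "Code processing method, apparatus, device, and medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technologies, and in particular to a code processing method, apparatus, device, and computer-readable storage medium.
Background
During software development, a developer often works on multiple tasks. For example, while developing a new feature, the developer may also find a code defect (bug) in existing code and fix it. After completing several tasks, developers tend to submit all changed code in a single commit operation, which produces a composite commit.
The commit message of a composite commit covers multiple change tasks, so other users find it hard to understand the intent of the change. This hampers code review in the development phase as well as revert, cherry-pick, and regression testing in the maintenance phase, and greatly reduces software development efficiency. The industry therefore needs a code processing method that addresses the problem that composite commits hinder subsequent code review and regression testing and reduce software development efficiency.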
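Summary
This application provides a code processing method that analyzes changes at the granularity of code change blocks, obtains the association relationships among the blocks, and groups code intelligently based on those relationships, thereby addressing the problem that composite commits hinder subsequent code review and regression testing and reduce software development efficiency.
According to a first aspect, this application provides a code processing method. The method may be performed by a code processing apparatus, which may be deployed in a computer. The computer may be a terminal device such as a desktop computer, a notebook computer, or a tablet computer; in some cases, the computer may also be a server.
Specifically, the code processing apparatus may obtain a code change file of a current-version code file relative to a base-version code file. The base version is the version before a round of further development in a continuous development process, and the current version is the version after that round. Because one round of development may involve multiple code change operations, the code change file may include multiple code change blocks.
The code processing apparatus may analyze the multiple code change blocks along at least one dimension to obtain at least one association relationship among them, and then use the at least one association relationship to group the blocks into N code change groups (N being a positive integer), so that a code commit operation is performed separately for each of the N groups.
The foregoing method uses the association relationships among code change blocks to group them intelligently and automatically, which avoids the missed or incorrect assignments that occur in manual grouping and improves grouping accuracy. Performing code commit operations separately according to the grouping result greatly reduces the number of change tasks covered by the commit message of each code change group. This makes it easier for other users to understand the intent of a change, benefits code review in the development phase as well as revert, cherry-pick, and regression testing in the maintenance phase, and improves software development efficiency.
In some possible implementations, because code usually needs to be compiled before it is committed, the code processing apparatus may also compile the code change groups obtained by grouping to check the grouping accuracy. When the grouping is inaccurate, the apparatus adjusts the grouping result based on the compilation prompt information, which optimizes the result and further improves grouping accuracy.
In a specific implementation, for any one of the N code change groups (referred to in this application as the first code change group), the code processing apparatus may compile the current-version code snippets included in the code change blocks of the first code change group to obtain compilation prompt information, and then adjust the first code change group according to the compilation prompt information.
Of course, the code processing apparatus may also compile multiple code change groups separately, for example each of the N code change groups, obtain the compilation prompt information for each group, and adjust the groups accordingly. For example, when the compilation prompt information of one code change group indicates that the group lacks a function definition, and that of another group indicates that the group defines the function but does not use it, the code processing apparatus may move the code change block containing the function definition from the latter group to the former.
For an adjusted code change group, the code processing apparatus may perform again the step of compiling the current-version code snippets included in the code change blocks of the group. If the compilation prompt information indicates that compilation passes or succeeds, the process of adjusting the grouping result according to the compilation prompt information may end. If it indicates that compilation fails, that is, a compilation warning or compilation error exists, the code change group may be adjusted according to the warning type (for example, a missing bracket, or a static function xxx that is never used) or the error type (an undefined function or an undefined variable).
In other words, the code processing apparatus may repeat the steps of "compiling the current-version code snippets included in the code change blocks of the first code change group to obtain compilation prompt information" and "adjusting the first code change group according to the compilation prompt information" until the current-version code snippets included in the code change blocks of the first code change group compile successfully.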
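In some possible implementations, the code processing apparatus may further provide a grouping interface so that the grouping result can be adjusted manually through the interface, which further improves grouping accuracy and compatibility.
Specifically, the grouping interface may display the result of grouping the code change blocks. The result may be the one obtained from the association relationships among the code change blocks, or the one obtained after that result is adjusted using the compilation prompt information. When the user considers the grouping result inaccurate, the user can adjust it through the grouping interface. The code processing apparatus may receive an adjustment setting made by the user for a second code change group through the grouping interface. The second code change group may be any one of the N code change groups, and may be the same group as the first code change group or a different one. The code processing apparatus may adjust the code change blocks of the second code change group according to the adjustment setting.
In some possible implementations, based on the user's adjustment setting for one or more code change blocks in the second code change group, the code processing apparatus may adaptively adjust other code change blocks through self-learning, which improves adjustment efficiency and makes operation more convenient for the user.
Specifically, the adjustment setting made by the user for the second code change group through the grouping interface may be an adjustment setting for a target code change block in the second code change group, where the target code change block may be any one or more code change blocks in that group. The code processing apparatus may first adjust the target code change block according to the adjustment setting, then determine, from the second code change group, the code change blocks associated with the target code change block, and adjust the associated code change blocks according to the same adjustment setting.
In some possible implementations, the code processing apparatus may obtain a commit template corresponding to at least one code change group, and then generate a commit message according to the at least one code change group and its corresponding commit template, so that the code commit operation is performed according to the commit message. On the one hand, this avoids a single commit message that mixes multiple change tasks, which makes the change intent hard to understand, affects later stages of software development, and reduces development efficiency. On the other hand, it mitigates the low accuracy and weak generalization that result from generating commit messages with a long short-term memory (LSTM) network.
In some possible implementations, the at least one association relationship among code change blocks may include any one or more of an element definition-use relationship, an element indirect-use relationship, or a similar-change relationship. A code change block includes code snippets, and a code snippet includes statements that carry elements. An element is a component of a statement in a code snippet, and may be a variable and/or a function. Accordingly, the element definition-use relationship may include a variable definition-use relationship and/or a function definition-use relationship. The element indirect-use relationship is the relationship between code change blocks that use the same element. The similar-change relationship is the relationship between code change blocks that exhibit similar change behavior.
The code processing apparatus may obtain the element definition-use relationship and the element indirect-use relationship as follows. The apparatus obtains the code change action set of each of the multiple code change blocks, and parses the code change action sets to obtain, for each code change block, the element definition set and the element use set of the current version and of the base version. The apparatus then derives, from these definition sets and use sets, at least one of the element definition-use relationship and the element indirect-use relationship among the multiple code change blocks.
It should be noted that the code processing apparatus may analyze the code change blocks along multiple dimensions to obtain multiple association relationships among them, for example the element definition-use relationship, the indirect-use relationship, and the similar-change relationship. This supplies richer and more comprehensive information for code grouping and thereby improves grouping accuracy.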
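According to a second aspect, this application provides a code processing apparatus. The apparatus includes a communication module, an analysis module, and a grouping module. The communication module is configured to obtain a code change file of a current-version code file relative to a base-version code file, where the code change file includes multiple code change blocks. The analysis module is configured to analyze the multiple code change blocks to obtain at least one association relationship among them. The grouping module is configured to use the at least one association relationship to group the multiple code change blocks into N code change groups, so that a code commit operation is performed separately for each of the N code change groups, where N is a positive integer.
In some possible implementations, the apparatus further includes:
an adjustment module, configured to compile the current-version code snippets included in the code change blocks of a first code change group to obtain compilation prompt information, and adjust the first code change group according to the compilation prompt information.
In some possible implementations, the communication module is further configured to:
receive an adjustment setting made by a user for a second code change group through a grouping interface;
the apparatus further includes:
an adjustment module, configured to adjust the code change blocks of the second code change group according to the adjustment setting.
In some possible implementations, the adjustment setting for the second code change group includes an adjustment setting for a target code change block in the second code change group;
the adjustment module is specifically configured to:
adjust the target code change block according to the adjustment setting;
determine, from the second code change group, the code change blocks associated with the target code change block;
adjust the associated code change blocks according to the adjustment setting.
In some possible implementations, the communication module is further configured to:
obtain a commit template corresponding to at least one code change group;
the apparatus further includes:
a commit module, configured to generate a commit message according to the at least one code change group and its corresponding commit template, so that the code commit operation is performed according to the commit message.
In some possible implementations, the analysis module is specifically configured to:
obtain the code change action set of each of the multiple code change blocks;
parse the code change action sets to obtain, for each code change block, the element definition set and the element use set of the current version and of the base version;
derive, from the element definition sets and element use sets of the current version and the base version for each code change block, at least one of the element definition-use relationship and the element indirect-use relationship among the multiple code change blocks.
According to a third aspect, this application provides a computer that includes a processor and a memory. The processor and the memory communicate with each other. The processor is configured to execute instructions stored in the memory, so that the computer performs the code processing method according to the first aspect or any implementation of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to perform the code processing method according to the first aspect or any implementation of the first aspect.
According to a fifth aspect, this application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the code processing method according to the first aspect or any implementation of the first aspect.
On the basis of the implementations provided in the foregoing aspects, this application may combine them further to provide additional implementations.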
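Brief Description of Drawings
To describe the technical methods of the embodiments of this application more clearly, the following briefly introduces the drawings used in the embodiments.
Figure 1 is a schematic diagram of a code change file and a code change block according to an embodiment of this application;
Figure 2 is a schematic diagram of the system architecture of a code processing method according to an embodiment of this application;
Figure 3 is an interaction flowchart of a code processing method according to an embodiment of this application;
Figure 4 is a schematic diagram of a grouping interface according to an embodiment of this application;
Figure 5 is a schematic structural diagram of a computer according to an embodiment of this application.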
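Detailed Description
The following describes the solutions in the embodiments of this application with reference to the drawings.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. Terms used in this way are interchangeable where appropriate; they are merely a way of distinguishing objects with the same attributes when describing the embodiments of this application.
To aid understanding of the technical solutions of this application, some technical terms used in this application are introduced below.
A code file is a text file written according to the specification of a programming language. In this application, a code file generally means a source code file. A code file contains a series of human-readable computer language instructions that a compiler can translate into binary instructions executable by a computer. When the binary instructions are executed, the corresponding functions are carried out.
Software development is usually continuous: a developer can carry out further development on one version of a code file to obtain a new version. For example, the developer may fix a code defect (bug) in one version of a code file, or add a new function or new feature, and thereby obtain a new version. For ease of description, the version before further development is called the base version, and the new version obtained by further development is called the current version.
The differences between the current-version code file and the base-version code file form a code change file. As shown in Figure 1, the code change file includes multiple code change blocks, and each code change block includes a code snippet before the change and a code snippet after the change, that is, the base-version code snippet and the current-version code snippet corresponding to the difference.
For ease of understanding, this application also provides an example of a code change block. As shown in Figure 1, both the pre-modification and post-modification code snippets in this code change block are loop statements. The pre-modification snippet loops with while, which easily leads to an infinite loop and thus introduces a code defect. The post-modification snippet loops with for. Because int is a 32-bit integer, the variable sum overflows when it reaches the maximum value and becomes negative, which ends the loop and fixes the defect.
During software development, a developer may work on multiple tasks and thereby produce multiple code change blocks. When the developer submits these code change blocks in a single commit operation, a composite commit results. Because the commit message of a composite commit covers multiple change tasks, other users find it harder to understand the intent of the change. This hampers code review in the development phase as well as revert, cherry-pick, and regression testing in the maintenance phase, and reduces software development efficiency.
In view of this, this application provides a code processing method. The method analyzes the multiple code change blocks in a code change file at the level of individual blocks, obtains at least one association relationship among them, and then uses the at least one association relationship to group the blocks into N code change groups. Because the association relationships are considered at the granularity of code change blocks, missed or incorrect assignments during grouping are reduced and grouping accuracy improves. Performing code commit operations separately for the N code change groups greatly reduces the number of change tasks covered by the commit message of each group. This makes it easier for other users to understand the intent of a change, benefits code review in the development phase as well as revert, cherry-pick, and regression testing in the maintenance phase, and improves software development efficiency.
Further, the method supports compiling the current-version code snippets included in the code change blocks of at least one of the N code change groups and adjusting the groups based on the compilation result, so that the current-version code snippets in the adjusted groups compile successfully. Because compilation correctness after grouping is taken into account, grouping accuracy and software development efficiency improve further. In addition, the method supports manual adjustment of the code change groups by the user and can adapt the grouping based on the user's adjustment settings, so as to optimize the grouping result, make the commit message of each code change group easy to understand, and facilitate subsequent code development and maintenance.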
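It can be understood that the code processing method provided in this application may be applied to, but is not limited to, the application environment shown in Figure 2.
As shown in Figure 2, the code processing apparatus 200 includes a communication module 202, an analysis module 204, and a grouping module 206. The communication module 202 is configured to obtain a code change file of a current-version code file relative to a base-version code file, where the code change file includes multiple code change blocks. The analysis module 204 may analyze the multiple code change blocks to obtain at least one association relationship among them. The grouping module 206 uses the at least one association relationship to group the multiple code change blocks into N code change groups, where N is a positive integer.
Further, the code processing apparatus 200 may include an adjustment module 208, which may contain a compiler. The adjustment module 208 may use the compiler to compile the current-version code snippets included in the code change blocks of a first code change group (any one of the N code change groups) and obtain compilation prompt information. The adjustment module 208 may then parse the compilation prompt information and adjust the first code change group according to the parsing result. For example, when the compilation prompt information indicates that the first code change group lacks a function definition, the code change block in another code change group that contains the function definition may be moved to the first code change group.
The adjustment module 208 may also provide a grouping interface through which the user can adjust the code change groups. Specifically, the communication module 202 is further configured to receive an adjustment setting made by the user through the grouping interface for a second code change group (any one of the N code change groups, which may be the same group as the first code change group or a different one), and the adjustment module 208 adjusts the code change blocks of the second code change group in response to the adjustment setting. In practice, the adjustment module 208 may adjust a target code change block based on the user's adjustment setting for that block in the second code change group, determine from the second code change group the code change blocks associated with the target code change block, and adjust the associated code change blocks according to the same adjustment setting. In this way, the grouping result is optimized and grouping accuracy improved through at least one round of user interaction and adaptive adjustment.
In some implementations, the code processing apparatus 200 further includes a commit module 210. The commit module 210 is configured to generate a commit message according to at least one code change group and the commit template corresponding to that group, and to perform the code commit operation based on the commit message. For example, the code processing apparatus 200 may send the grouping result, namely the N code change groups, to a classification apparatus 100. The communication module 102 of the classification apparatus 100 receives the N code change groups, the preprocessing module 104 of the classification apparatus 100 generates a feature vector for each of the N code change groups, and the classification module 106 of the classification apparatus 100 may call a commit type predictor to predict the commit type of each code change group. The commit type predictor takes a feature vector as input and outputs a commit type. It can be obtained by training on samples generated from a large number of commit instances.
The classification module 106 may also obtain, from a template library 108, the commit template corresponding to the predicted commit type. Accordingly, the communication module 102 may send the commit template corresponding to each of the at least one code change group to the code processing apparatus 200, so that the code processing apparatus 200 generates a commit message according to the at least one code change group and its commit template and carries out an intelligent code commit based on that message.
The code processing apparatus 200 may be deployed locally, for example in a local terminal device, including but not limited to a desktop computer or a notebook computer. In some possible implementations, the code processing apparatus 200 may also be deployed in the cloud, for example in a cloud server; the user can then use the code processing method provided in this application by accessing the cloud server through a browser. The classification apparatus 100 may be deployed in a terminal device or a server. It should be noted that the code processing apparatus 200 and the classification apparatus 100 may be deployed on different devices, which reduces the system resources required by the code processing apparatus 200 and improves availability. They may also be deployed on the same device, which improves code processing efficiency.
To make the technical solution of this application clearer and easier to understand, the code processing method is described below with reference to an embodiment. In this embodiment, the code processing apparatus 200 and the classification apparatus 100 are deployed on different devices.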
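Referring to the flowchart of the code processing method shown in Figure 3, the method includes:
S302: The code processing apparatus 200 obtains a code change file of a current-version code file relative to a base-version code file.
Specifically, when developing software with the code processing apparatus 200, the user may change the base-version code file, for example by fixing a code defect in it, adding code for a new function or new feature, or deleting code for an existing function or feature, and thereby obtain the current-version code file. The code processing apparatus 200 may receive the code changed by each change action and obtain the code change file.
The code change file includes the code change block corresponding to each change action. Because a user typically performs multiple change actions during software development, specifically for different tasks, the code change file may include multiple code change blocks. Each code change block includes a current-version code snippet and a base-version code snippet.
S304: The code processing apparatus 200 analyzes the multiple code change blocks to obtain at least one association relationship among them.
A code change block includes code snippets, and a code snippet includes statements that carry elements. In this embodiment, an element is a component of a statement in a code snippet and may be a variable and/or a function. A variable and/or function must be defined before it is used.
Based on this, the code processing apparatus 200 may analyze the blocks from the perspective of element definition and use to obtain association relationships among the multiple code change blocks. Specifically, the apparatus may analyze variable definition and use to obtain the variable definition-use relationships among the blocks, and may likewise analyze function definition and use to obtain the function definition-use relationships among the blocks.
The definition-use relationship above is a direct-use relationship. In some possible implementations, the code processing apparatus 200 may also analyze the multiple code change blocks to obtain indirect-use relationships. An indirect-use relationship is the relationship between different code change blocks that use the same element, such as the same variable or the same function. For example, if code change block 1 and code change block 2 both use function A, the two blocks are said to have an indirect-use relationship.
When analyzing element definition and use, the code processing apparatus 200 may first obtain the code change action set of each of the multiple code change blocks. Specifically, the apparatus may determine the code change action set of each block by an abstract syntax tree differencing algorithm such as the GumTree algorithm or the ChangeDistiller algorithm. The code change action set is the set of all change actions included in a code change block. A change action is any one or more of add, delete, change, and move.
The code processing apparatus 200 then parses the code change action sets to obtain, for each code change block, the element definition set and the element use set of the current version and of the base version. Next, from these definition sets and use sets, the apparatus derives at least one of the element definition-use relationship and the element indirect-use relationship among the multiple code change blocks.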
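The relationship derivation described above can be illustrated with a minimal Python sketch. It assumes that the definition set and use set of each code change block have already been extracted, and for brevity it considers only one version of each set. The function name `relate_by_elements` and the dictionary layout are illustrative assumptions, not part of the application.

```python
from itertools import combinations


def relate_by_elements(hunks):
    """Derive def-use and indirect-use pairs between code change blocks.

    hunks maps a block id to {"defs": set, "uses": set} (assumed layout).
    Returns (def_use, indirect): def_use holds (definer, user) pairs and
    indirect holds unordered pairs of blocks that use a common element.
    """
    def_use, indirect = set(), set()
    for a, b in combinations(sorted(hunks), 2):
        ha, hb = hunks[a], hunks[b]
        if ha["defs"] & hb["uses"]:
            def_use.add((a, b))      # block a defines what block b uses
        if hb["defs"] & ha["uses"]:
            def_use.add((b, a))      # block b defines what block a uses
        if ha["uses"] & hb["uses"]:
            indirect.add((a, b))     # both blocks use the same element
    return def_use, indirect


# Hypothetical blocks: block 1 defines var_a and func_a, the others use elements.
hunks = {
    1: {"defs": {"var_a", "func_a"}, "uses": set()},
    2: {"defs": set(), "uses": {"var_a"}},
    3: {"defs": set(), "uses": {"func_a"}},
    4: {"defs": set(), "uses": {"func_b"}},
    5: {"defs": set(), "uses": {"func_a"}},
}
def_use, indirect = relate_by_elements(hunks)
print(sorted(def_use))   # [(1, 2), (1, 3), (1, 5)]
print(sorted(indirect))  # [(3, 5)]
```

Blocks 3 and 5 are linked indirectly because both use `func_a`, while block 4 has no relationship with any other block.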
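Because similar code change behaviors have similar change intents, in some possible implementations the code processing apparatus 200 may also analyze the code change blocks along the dimension of change-behavior similarity to obtain association relationships among them.
Specifically, the code processing apparatus 200 may determine the change object and the change operation of each code change block, determine the similarity of the code change behaviors from the similarity of the change objects and the similarity of the change operations, and treat code change blocks whose change-behavior similarity exceeds a preset threshold as blocks that have a similar-change relationship. The change object is the object on which the change operation is performed, such as a variable or a function. The change operation may be add, delete, change, or move.
In practice, the code processing apparatus 200 may also compute the similarity of the current-version code snippets and the similarity of the base-version code snippets across the code change blocks separately, determine one set of similar code change blocks from the current-version similarity, and determine another set from the base-version similarity. When the two sets consist of the same code change blocks, that set of blocks is determined to have a similar-change relationship.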
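The snippet-similarity variant can be sketched as follows. The application does not specify a similarity measure, so this sketch assumes the character-level ratio from Python's standard `difflib.SequenceMatcher` and an illustrative threshold of 0.7. It also simplifies the set comparison in the text to a pairwise test that requires both the base-version and current-version snippets to be similar.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Character-level similarity in [0, 1] (an assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def similar_change_pairs(hunks, threshold=0.7):
    """Return pairs of blocks whose base AND current snippets are both similar."""
    pairs = []
    for a, b in combinations(sorted(hunks), 2):
        base_sim = similarity(hunks[a]["base"], hunks[b]["base"])
        cur_sim = similarity(hunks[a]["current"], hunks[b]["current"])
        if base_sim >= threshold and cur_sim >= threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical blocks: A and B both turn a while loop into a for loop.
hunks = {
    "A": {"base": "while (i < n) sum += a[i];",
          "current": "for (i = 0; i < n; i++) sum += a[i];"},
    "B": {"base": "while (j < m) sum += b[j];",
          "current": "for (j = 0; j < m; j++) sum += b[j];"},
    "C": {"base": "return null;",
          "current": "throw new IllegalStateException();"},
}
print(similar_change_pairs(hunks))  # [('A', 'B')]
```

A token-based or AST-based measure would be less sensitive to identifier renaming; the character ratio is used here only because it needs no extra dependencies.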
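The code processing apparatus 200 may analyze the code change blocks along multiple dimensions to obtain multiple association relationships among them, such as the direct-use relationship (including the function definition-use relationship and the variable definition-use relationship), the indirect-use relationship, and the similar-change relationship, which improves grouping accuracy.
S306: The code processing apparatus 200 uses the association relationships to group the multiple code change blocks into N code change groups.
In a specific implementation, the code processing apparatus 200 may use the association relationships to place code change blocks that have an element definition-use relationship in the same code change group. Specifically, the apparatus places blocks that have a function definition-use relationship in the same group, and places blocks that have a variable definition-use relationship in the same group.
For example, if code change block 1 defines variable A and function A, code change block 2 uses variable A, code change block 3 uses function A, and code change block 4 uses function B, the code processing apparatus 200 may place code change blocks 1, 2, and 3 in the same code change group and place code change block 4 in a different group.
In some possible implementations, the code processing apparatus 200 may also use the association relationships to place code change blocks that have a similar-change relationship in the same code change group. For example, if code change block 1 modifies the expression of function A and code change block 5 also modifies the expression of function A, the change object is function A in both cases and the change operation is change in both cases. The code processing apparatus 200 determines that blocks 1 and 5 have a similar-change relationship and places them in the same code change group.
In a specific implementation, the code processing apparatus 200 may use the association relationships to build a change block relationship graph over the multiple code change blocks, and then cluster the nodes of the graph with a clustering algorithm to group the blocks. To build the graph, the apparatus takes the code change blocks as nodes and creates edges according to the association relationships between blocks. Further, the apparatus may assign each edge a weight according to the degree of association, so that the graph expresses the relationships among the blocks more precisely. The nodes of the graph may be clustered by the K-means clustering algorithm, the mean shift clustering algorithm, or a hierarchical clustering algorithm to obtain the N code change groups.
It should be noted that, when the code change blocks are grouped, each block belongs to exactly one code change group. When a code change block has association relationships with blocks in different code change groups, the code processing apparatus 200 may decide the group to which the block finally belongs based on the weights of the different association relationships.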
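The graph-based grouping can be illustrated with a small Python sketch. The text names K-means, mean shift, and hierarchical clustering; this sketch substitutes a much simpler technique, namely connected components over edges whose weight reaches a threshold, implemented with union-find. The edge weights, the threshold `min_weight`, and the function name `group_hunks` are assumptions for illustration only.

```python
def group_hunks(hunk_ids, edges, min_weight=0.5):
    """Group blocks by connected components of the weighted relation graph.

    edges maps (block_a, block_b) to a weight; edges below min_weight
    are ignored. Returns a sorted list of sorted groups.
    """
    parent = {h: h for h in hunk_ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), weight in edges.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)   # merge the two components

    groups = {}
    for h in hunk_ids:
        groups.setdefault(find(h), []).append(h)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical weights: def-use edges 1.0, similar-change 0.8, weak link 0.2.
edges = {(1, 2): 1.0, (1, 3): 1.0, (1, 5): 0.8, (3, 4): 0.2}
print(group_hunks([1, 2, 3, 4, 5], edges))  # [[1, 2, 3, 5], [4]]
```

Because each block ends up in exactly one component, the result respects the rule that a code change block belongs to a single code change group. A real clustering algorithm would additionally use the weights to split loosely connected components.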
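S308: The code processing apparatus 200 compiles the current-version code snippets included in the code change blocks of the first code change group to obtain compilation prompt information.
To ensure compilation correctness, the code processing apparatus 200 may first compile the current-version code snippets included in the code change blocks of a code change group, for example any one of the N code change groups (the first code change group). This detects groups with compilation problems early so that they can be adjusted, prevents committed code change groups from containing compilation problems, and further improves grouping accuracy.
For ease of description, the following uses the example of compiling the current-version code snippets of the code change blocks of each of the N code change groups. In a specific implementation, the code processing apparatus 200 may select a compiler according to the programming language of the code snippets and compile the current-version code snippets of each of the N groups separately. For example, for code snippets written in Java, the code processing apparatus 200 may compile with Gradle or MVN. After compilation, the compiler outputs compilation prompt information.
The compilation prompt information may be of several types, such as a compilation error or a compilation warning. A compilation error indicates that the code snippet contains an error and cannot be turned into an executable program. A compilation warning indicates that the code snippet contains a risk that may cause the resulting executable program to behave abnormally at run time.
For ease of understanding, this application also provides examples of compilation errors and warnings. In one example, a compilation error may be an undefined function or an undefined variable, and a compilation warning may be a missing bracket or a static function xxx that is never used.
It should be noted that the compilation prompt information also includes the location of the compilation error or warning. The location may be represented by a line number in the code file.
S310: The code processing apparatus 200 adjusts the first code change group according to the compilation prompt information.
The code processing apparatus 200 may parse the compilation prompt information and adjust the first code change group according to the parsing result. Specifically, if the parsing result indicates that the first code change group lacks a variable definition, the code change block containing that variable definition is added to the first code change group; if the parsing result indicates that the first code change group lacks a function definition, the code change block containing that function definition is added to the first code change group.
Further, after adjusting the first code change group according to the compilation prompt information, the code processing apparatus 200 may perform the compilation step again. If the compiler still outputs a compilation error or warning, the apparatus may continue to adjust the first code change group until all current-version code snippets of the code change blocks in the adjusted group compile successfully. If the compiler reports success, the subsequent steps may be performed. In other words, the code processing apparatus 200 may repeat S308 to S310 until all current-version code snippets of the code change blocks in the adjusted code change group compile successfully.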
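The compile-and-adjust loop of S308 to S310 can be sketched in Python as follows. No real compiler is invoked here: the hypothetical helper `missing_symbols` stands in for "compile the group and parse the undefined-symbol errors" by comparing the symbols a group uses with the symbols it defines. The data layout, the `base_symbols` parameter, and the round limit are all assumptions for illustration.

```python
def missing_symbols(members, hunks, base_symbols=frozenset()):
    """Stand-in for compilation: symbols used by a group but not defined in it."""
    defined, used = set(base_symbols), set()
    for h in members:
        defined |= hunks[h]["defs"]
        used |= hunks[h]["uses"]
    return used - defined


def adjust_group(groups, hunks, first, max_rounds=10):
    """Repeat 'compile, then pull in defining blocks' until the group is clean."""
    for _ in range(max_rounds):
        missing = missing_symbols(groups[first], hunks)
        if not missing:
            return True                       # simulated compilation passes
        moved = False
        for name, members in groups.items():
            if name == first:
                continue
            for h in list(members):           # copy: members is mutated below
                if hunks[h]["defs"] & missing:
                    members.remove(h)
                    groups[first].append(h)
                    moved = True
        if not moved:
            return False                      # no block supplies the definition
    return False


# Hypothetical input: block 2 uses func_a, which block 1 (in g2) defines.
hunks = {
    1: {"defs": {"func_a"}, "uses": set()},
    2: {"defs": set(), "uses": {"func_a"}},
    3: {"defs": {"helper"}, "uses": {"helper"}},
}
groups = {"g1": [2], "g2": [1, 3]}
ok = adjust_group(groups, hunks, "g1")
print(ok, groups)  # True {'g1': [2, 1], 'g2': [3]}
```

In an actual implementation, the stand-in would be replaced by a call to the project's build tool and a parser for its diagnostics, and warnings such as unused static functions would drive moves in the opposite direction.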
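S312: The code processing apparatus 200 receives an adjustment setting made by the user through the grouping interface for a target code change block in the second code change group.
In some possible implementations, the code processing apparatus 200 also supports grouping by the user or adjustment of an existing grouping result by the user. When the user considers the grouping of one or more code change blocks inaccurate, the user can adjust the grouping result through the grouping interface.
Specifically, the code processing apparatus 200 provides a grouping interface that can display the N code change groups obtained by grouping the multiple code change blocks. Through the grouping interface, the user can adjust any one of the N code change groups (the second code change group), specifically any one or more code change blocks in that group (the target code change block). For example, the user can move a code change block from one code change group to another through the grouping interface, thereby adjusting both groups.
The code processing apparatus 200 receives the adjustment setting made by the user through the grouping interface for the target code change block in the second code change group, so that the second code change group is adjusted manually. The second code change group may be the same group as the first code change group or a different one.
Figure 4 shows a schematic diagram of adjusting code change groups through the grouping interface. As shown in Figure 4, the window of the grouping interface contains N sub-windows. Each sub-window displays the code change blocks of one code change group, and together the N sub-windows display the code change blocks of the N code change groups obtained by the code processing apparatus 200. When the user clicks any code change block, the two panes above the N sub-windows display the base-version code snippet and the current-version code snippet of that block, that is, the code before and after modification. This makes it easy to see the difference between the two snippets of a code change block.
In this example, N is 5. The first code change group includes 3 code change blocks, and the second to fifth code change groups each include 6 code change blocks. The user can move the code change blocks of a code change group by dragging or similar means.
In some possible implementations, as shown in Figure 4, the grouping interface may also provide a navigation bar that displays a list of code files, through which the user can quickly switch to a given file. The grouping interface may also provide a change intent panel. When the user clicks any code change block, the panel displays the change intent of that block, so that the user can review it quickly and adjust more efficiently. It should be noted that the change intent panel may also provide an edit control through which the user can modify the change intent when the user considers it incorrect.
It should be noted that the N code change groups displayed in the grouping interface may be the N groups obtained after compilation-based adjustment. When S308 to S310 are not performed, the N code change groups displayed may instead be the N groups that the code processing apparatus 200 formed according to the at least one association relationship among the multiple code change blocks.
S314: The code processing apparatus 200 adjusts the target code change block according to the adjustment setting.
The code processing apparatus 200 may adjust the target code change block of the second code change group according to the received adjustment setting. For example, according to the adjustment setting, the apparatus moves the target code change block from the second code change group to another code change group, or deletes the target code change block from the second code change group.
S316: The code processing apparatus 200 determines, from the second code change group, the code change blocks associated with the target code change block, and adjusts the associated code change blocks according to the adjustment setting.
In some possible implementations, the code processing apparatus 200 records change information for each code change block on which an adjustment operation is performed. The change information may include the source group and the destination group of the block, that is, the code change group to which the block belonged before the adjustment and the group to which it belongs afterward. The code processing apparatus 200 analyzes why the user adjusted the block and, using the at least one association relationship among the multiple code change blocks, determines the associated code change blocks that need to be adjusted along with it, and adjusts them based on the adjustment setting.
It should be noted that, in some possible implementations, the code processing apparatus 200 may skip this step and leave it to the user to adjust the associated code change blocks manually.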
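Steps S314 and S316 can be sketched as follows. The application does not detail how the apparatus "analyzes why the user adjusted the block", so this sketch makes the simplifying assumption that every block in the source group that has a recorded relationship with the target block follows it to the destination group. The function name `apply_user_move` and the data layout are hypothetical.

```python
def apply_user_move(groups, relations, target, src, dst):
    """Move target from src to dst, then move its related blocks as well.

    relations is a set of (block, block) pairs in either order.
    Returns the list of associated blocks that were moved automatically.
    """
    groups[src].remove(target)
    groups[dst].append(target)
    related = [
        h for h in groups[src]
        if (target, h) in relations or (h, target) in relations
    ]
    for h in related:
        groups[src].remove(h)
        groups[dst].append(h)
    return related


# Hypothetical state: the user drags block 1 from g1 to g2; block 2 is related.
groups = {"g1": [1, 2, 3], "g2": [4]}
relations = {(1, 2)}
moved = apply_user_move(groups, relations, target=1, src="g1", dst="g2")
print(moved, groups)  # [2] {'g1': [3], 'g2': [4, 1, 2]}
```

A fuller implementation might ask the user to confirm the automatic moves, since the text also allows the associated blocks to be adjusted manually.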
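S318: The code processing apparatus 200 sends a classification request message to the classification apparatus 100.
The classification request message includes at least one of the N code change groups. It requests that the at least one code change group be classified to obtain its commit type.
The commit type is the type of the commit message. In practice, the commit type may include at least one of bug fix, refactor, new feature, and reformat. By sending the classification request message to the classification apparatus 100, the code processing apparatus 200 obtains the commit type of at least one of the N code change groups and then generates the commit message of that group.
S320: The classification apparatus 100 generates a feature vector according to the at least one code change group.
The classification apparatus 100 extracts the feature vector of each code change group from the code snippets included in the code change blocks of the group. Specifically, the classification apparatus 100 may extract a code change action set from those code snippets and then generate the feature vector of the group from the code change action set.
In practice, the classification apparatus 100 may vectorize the code change action set to generate the feature vector of the code change group. Each code change action in the set can be identified by a change operation and a change object. The change operation is add, delete, change, or move. The change object may be a variable, a function, or the like, specifically a variable declaration or a function definition. The variable declaration and the function definition each correspond to an AST node type in the abstract syntax tree (AST).
The classification apparatus 100 may pair each of the four change operations with n AST node types and flatten the pairs, in order, into a one-dimensional vector. If the commit adds a node of a certain AST node type, the value of the corresponding element of the vector is incremented by 1. For AST node types that do not appear under any of the four operations, the corresponding elements are set to 0. The resulting vector is the feature vector of the code change group.
In some possible implementations, when extracting the feature vector, the classification apparatus 100 may consider not only the fine-grained code change actions at the micro level but also statistical features of the change at the macro level, such as the number of code change blocks in the group, the number of blocks in the group involved in definition-use relationships, and the number of blocks in the group involved in similar-change relationships. Specifically, the classification apparatus 100 may append several elements to the one-dimensional vector and assign these statistics to them. This yields a more comprehensive feature vector, gives the classification model more information, and improves classification accuracy.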
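The vectorization in S320 can be illustrated with a short Python sketch. The four change operations come from the text; the list of AST node types is a small assumed sample (a real AST has many more), and the three appended statistics follow the examples given above.

```python
OPS = ["add", "delete", "change", "move"]
# Assumed sample of AST node types; the real n is much larger.
NODE_TYPES = ["VariableDeclaration", "MethodDeclaration", "IfStatement"]


def feature_vector(actions, stats=()):
    """Flatten (operation, node type) counts into one vector, then append stats.

    actions is an iterable of (operation, node_type) pairs; stats holds the
    macro-level counts (for example block count, def-use count, similar count).
    """
    vec = [0] * (len(OPS) * len(NODE_TYPES))
    for op, node_type in actions:
        index = OPS.index(op) * len(NODE_TYPES) + NODE_TYPES.index(node_type)
        vec[index] += 1
    return vec + list(stats)


actions = [
    ("add", "MethodDeclaration"),
    ("add", "MethodDeclaration"),
    ("change", "IfStatement"),
]
print(feature_vector(actions, stats=(3, 2, 0)))
# [0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 2, 0]
```

The first 12 elements are the 4 x 3 operation and node-type counts; the last three are the appended group statistics.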
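S322: The classification apparatus 100 predicts the commit type of the at least one code change group according to the feature vector.
In practice, the classification apparatus 100 may use a commit type predictor to predict the commit type of the at least one code change group. The commit type predictor takes a feature vector as input and outputs the commit type of the code change group. The classification apparatus 100 therefore feeds the feature vector of the at least one code change group into the commit type predictor, which predicts the commit type of the group and outputs the prediction.
The commit type predictor may be generated by model training with a machine learning method on training samples collected in advance. A training sample includes a feature vector extracted from the code snippets of a historical commit and a commit type label. For how the feature vector is extracted from the code snippets of a historical commit, see the description of S320; details are not repeated here. The commit type label may be determined from the commit log.
Specifically, the keywords of the commit log include the commit type. The classification apparatus 100 analyzes the commit log and generates the commit type label automatically. Commit type labels may be represented by different numbers. For example, bug fix may be represented by 00 and refactor by 01. The classification apparatus 100 concatenates the feature vector and the commit type label to generate a training sample.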
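The label generation can be sketched in Python as follows. The text only states that bug fix and refactor may be labeled 00 and 01; the numeric labels for the other two types and all keyword lists below are assumptions made for illustration.

```python
import re

# Labels 0 and 1 follow the text (00 and 01); labels 2 and 3 are assumed.
LABELS = {"bug fix": 0, "refactor": 1, "new feature": 2, "reformat": 3}
# Assumed keyword lists; a real system would mine these from commit logs.
KEYWORDS = {
    "bug fix": {"fix", "fixes", "fixed", "bug"},
    "refactor": {"refactor", "refactored", "refactoring"},
    "new feature": {"add", "added", "feature"},
    "reformat": {"reformat", "format", "whitespace"},
}


def label_from_log(commit_log):
    """Return the numeric commit type label, or None if no keyword matches."""
    words = set(re.findall(r"[a-z]+", commit_log.lower()))
    for commit_type, keywords in KEYWORDS.items():
        if words & keywords:
            return LABELS[commit_type]
    return None


def training_sample(feature_vector, commit_log):
    """Concatenate a feature vector with its label, as described above."""
    label = label_from_log(commit_log)
    return None if label is None else list(feature_vector) + [label]


print(training_sample([0, 2, 1], "Fix NPE in parser"))      # [0, 2, 1, 0]
print(training_sample([1, 0, 0], "Refactor session cache"))  # [1, 0, 0, 1]
```

Commits whose logs match no keyword are skipped rather than guessed, which keeps noisy labels out of the training set.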
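Because the vectors of the training samples have high dimensionality, multiple training samples form a sparse matrix, so the classification apparatus 100 may apply a dimensionality reduction algorithm to remove the sparsity. Finally, the classification apparatus 100 trains a model on the resulting matrix with a machine learning algorithm such as XGBoost to obtain the commit type predictor.
S324: The classification apparatus 100 obtains, from the template library, the commit template corresponding to the at least one code change group.
The classification apparatus 100 may mine the historical commit messages of each commit type stored in the code repository to obtain a general template for the commit messages of each type. The general template can serve as the commit template for that commit type. Table 1 shows examples of commit templates:
Table 1. Commit template examples
bug fix: fix an XXX issue
refactor: change XXX
new feature: add/Wrap XXX for XXX
reformat: change XXX(eg.spaces)
In Table 1, XXX is the content to be filled into the commit template, which can be determined from the code features extracted from the code change group.
The classification apparatus 100 may query the template library and obtain from it the commit template that matches the commit type of the at least one code change group, which serves as the commit template of that group.
S326: The classification apparatus 100 sends a classification response message to the code processing apparatus 200.
The classification response message includes the commit template corresponding to the at least one code change group.
S328: The code processing apparatus 200 generates a commit message according to the at least one code change group and its corresponding commit template, so that the code commit operation is performed based on the commit message.
Specifically, the code processing apparatus 200 may extract code features from the at least one code change group, determine keywords from the code features, and fill the keywords into the commit template of the group to generate the commit message. The code commit operation can be triggered based on the commit message.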
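The template filling in S328 can be sketched with the templates of Table 1. How the keywords are extracted from code features is not specified in the text, so the keywords in this sketch are supplied by hand, and the function name `fill_template` is an illustrative assumption.

```python
# Templates copied from Table 1; XXX marks a slot to be filled.
TEMPLATES = {
    "bug fix": "fix an XXX issue",
    "refactor": "change XXX",
    "new feature": "add/Wrap XXX for XXX",
    "reformat": "change XXX(eg.spaces)",
}


def fill_template(commit_type, keywords):
    """Fill the XXX slots of a commit template from left to right."""
    message = TEMPLATES[commit_type]
    for keyword in keywords:
        message = message.replace("XXX", keyword, 1)  # fill one slot per keyword
    return message


print(fill_template("bug fix", ["infinite loop"]))
# fix an infinite loop issue
print(fill_template("new feature", ["retry logic", "HttpClient"]))
# add/Wrap retry logic for HttpClient
```

If fewer keywords than slots are supplied, the remaining XXX placeholders stay visible, which makes an incomplete message easy to spot when it is shown to the user for confirmation.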
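In some possible implementations, the code processing apparatus 200 may also display the commit message so that the user can confirm that it is correct. When the user finds the commit message incorrect, the code processing apparatus 200 allows the user to modify it, for example its keywords, so that the correctness of the commit message is ensured through user interaction.
S308 to S316 are optional steps that this embodiment performs to further improve grouping accuracy; in some possible implementations they may be skipped. S318 to S328 are optional steps that this embodiment performs to generate the commit message; in some possible implementations the commit message may be generated in other ways.
Based on the foregoing description, the embodiments of this application provide a code processing method. The method groups at the granularity of code change blocks, rather than at the coarse granularity of file-level changes or the overly fine granularity of changes to individual code features, which avoids both the inaccurate grouping caused by excessively coarse granularity and the excessive number of groups caused by excessively fine granularity. The method can also take into account multiple association relationships among code change blocks, such as the variable definition-use relationship, the function definition-call relationship, the indirect-use relationship, and the similar-change relationship, which supplies richer and more comprehensive information for code grouping and improves grouping accuracy.
Further, the method dynamically adjusts the grouping result using the compilation prompt information and ensures that the adjusted code change groups compile successfully, which further improves grouping accuracy. In addition, the method supports adjusting the grouping result through user interaction and adaptively adjusting it according to the user's adjustment settings, so as to obtain an optimized grouping result.
The method also supports determining the commit type from the grouping result and automatically generating the commit message with the commit template corresponding to the commit type. On the one hand, a separate commit message is generated for each code change group, which avoids a single commit message that mixes multiple change tasks, makes the change intent hard to understand, affects later stages of software development, and reduces development efficiency. On the other hand, it mitigates the low accuracy and weak generalization that result from generating commit messages with a long short-term memory (LSTM) network.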
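The code processing method provided in this application has been described in detail above with reference to Figures 1 to 4. The apparatus and device provided in this application are described below with reference to the drawings.
Referring to the schematic structural diagram of the code processing apparatus 200 shown in Figure 2, the code processing apparatus 200 includes a communication module 202, an analysis module 204, and a grouping module 206.
The communication module 202 is configured to obtain a code change file of a current-version code file relative to a base-version code file, where the code change file includes multiple code change blocks.
The analysis module 204 is configured to analyze the multiple code change blocks to obtain at least one association relationship among them.
The grouping module 206 is configured to use the at least one association relationship to group the multiple code change blocks into N code change groups, so that a code commit operation is performed separately for each of the N code change groups, where N is a positive integer.
In some possible implementations, the apparatus 200 further includes:
an adjustment module 208, configured to compile the current-version code snippets included in the code change blocks of a first code change group to obtain compilation prompt information, and adjust the first code change group according to the compilation prompt information.
In some possible implementations, the communication module 202 is further configured to:
receive an adjustment setting made by a user for a second code change group through a grouping interface;
the apparatus 200 further includes:
an adjustment module 208, configured to adjust the code change blocks of the second code change group according to the adjustment setting.
In some possible implementations, the adjustment setting for the second code change group includes an adjustment setting for a target code change block in the second code change group;
the adjustment module 208 is specifically configured to:
adjust the target code change block according to the adjustment setting;
determine, from the second code change group, the code change blocks associated with the target code change block;
adjust the associated code change blocks according to the adjustment setting.
In some possible implementations, the communication module 202 is further configured to:
obtain a commit template corresponding to at least one code change group;
the apparatus 200 further includes:
a commit module 210, configured to generate a commit message according to the at least one code change group and its corresponding commit template, so that the code commit operation is performed according to the commit message.
In some possible implementations, the analysis module 204 is specifically configured to:
obtain the code change action set of each of the multiple code change blocks;
parse the code change action sets to obtain, for each code change block, the element definition set and the element use set of the current version and of the base version;
derive, from the element definition sets and element use sets of the current version and the base version for each code change block, at least one of the element definition-use relationship and the element indirect-use relationship among the multiple code change blocks.
The code processing apparatus 200 according to the embodiments of this application may correspondingly perform the methods described in the embodiments of this application, and the foregoing and other operations and/or functions of the modules of the code processing apparatus 200 implement the corresponding procedures of the methods in Figure 3. For brevity, details are not repeated here.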
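The code processing apparatus 200 may be implemented by a computer. Figure 5 provides a schematic structural diagram of a computer that may be used to implement the functions of the code processing apparatus 200 in the embodiment shown in Figure 2. The computer 500 includes a bus 501, a processor 502, a communication interface 503, and a memory 504. The processor 502, the memory 504, and the communication interface 503 communicate through the bus 501. The bus 501 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, Figure 5 shows the bus as a single thick line, but this does not mean that there is only one bus or one type of bus. The communication interface 503 is used for external communication. For example, the communication interface 503 may be used to obtain the code change file of the current-version code file relative to the base-version code file.
The processor 502 may be a central processing unit (CPU). The memory 504 may include a volatile memory, such as a random access memory (RAM). The memory 504 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, an HDD, or an SSD.
The memory 504 stores executable code, and the processor 502 executes the executable code to perform the foregoing code processing method.
Specifically, when the embodiment shown in Figure 2 is implemented and the modules of the code processing apparatus 200 described in that embodiment are implemented in software, the software or program code required to perform the functions of the analysis module 204, the grouping module 206, the adjustment module 208, and the commit module 210 in Figure 2 is stored in the memory 504. The functions of the communication module 202 may be implemented by the communication interface 503. The communication interface 503 obtains the code change file and transfers it over the bus 501 to the processor 502. The processor 502 executes the program code of the modules stored in the memory 504 to perform the code processing method and intelligently group the multiple code change blocks in the code change file.
An embodiment of this application further provides a computer-readable storage medium that includes instructions which, when run on a computer, cause the computer to perform the foregoing code processing method applied to the code processing apparatus 200.
An embodiment of this application further provides a computer program product. When the computer program product is executed by a computer, the computer performs any of the foregoing code processing methods. The computer program product may be a software installation package; when any of the foregoing code processing methods is needed, the computer program product can be downloaded and executed on a computer.
It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in this application, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application can be implemented by software plus the necessary general-purpose hardware, and of course also by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, and dedicated components. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function can take many forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software implementation is in most cases the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store to, or a data storage device, such as a training device or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).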

Claims (14)

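  1. A code processing method, characterized in that the method comprises:
    obtaining a code change file of a current-version code file relative to a base-version code file, the code change file comprising multiple code change blocks;
    analyzing the multiple code change blocks to obtain at least one association relationship among the multiple code change blocks;
    using the at least one association relationship to group the multiple code change blocks into N code change groups, so that a code commit operation is performed separately for each of the N code change groups, N being a positive integer.
  2. The method according to claim 1, characterized in that the method further comprises:
    compiling the current-version code snippets included in the code change blocks of a first code change group to obtain compilation prompt information;
    adjusting the first code change group according to the compilation prompt information.
  3. The method according to claim 1 or 2, characterized in that the method further comprises:
    receiving an adjustment setting made by a user for a second code change group through a grouping interface;
    adjusting the code change blocks of the second code change group according to the adjustment setting.
  4. The method according to claim 3, characterized in that the adjustment setting for the second code change group comprises an adjustment setting for a target code change block in the second code change group;
    the adjusting the code change blocks of the second code change group according to the adjustment setting comprises:
    adjusting the target code change block according to the adjustment setting;
    determining, from the second code change group, the code change blocks associated with the target code change block;
    adjusting the associated code change blocks according to the adjustment setting.
  5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    obtaining a commit template corresponding to at least one code change group;
    generating a commit message according to the at least one code change group and its corresponding commit template, so that the code commit operation is performed according to the commit message.
  6. The method according to any one of claims 1 to 5, characterized in that the analyzing the multiple code change blocks to obtain at least one association relationship among the multiple code change blocks comprises:
    obtaining the code change action set of each of the multiple code change blocks;
    parsing the code change action sets to obtain, for each code change block, the element definition set and the element use set of the current version and of the base version;
    deriving, from the element definition sets and element use sets of the current version and the base version for each code change block, at least one of the element definition-use relationship and the element indirect-use relationship among the multiple code change blocks.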
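  7. A code processing apparatus, characterized in that the apparatus comprises:
    a communication module, configured to obtain a code change file of a current-version code file relative to a base-version code file, the code change file comprising multiple code change blocks;
    an analysis module, configured to analyze the multiple code change blocks to obtain at least one association relationship among the multiple code change blocks;
    a grouping module, configured to use the at least one association relationship to group the multiple code change blocks into N code change groups, so that a code commit operation is performed separately for each of the N code change groups, N being a positive integer.
  8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
    an adjustment module, configured to compile the current-version code snippets included in the code change blocks of a first code change group to obtain compilation prompt information, and adjust the first code change group according to the compilation prompt information.
  9. The apparatus according to claim 7 or 8, characterized in that the communication module is further configured to:
    receive an adjustment setting made by a user for a second code change group through a grouping interface;
    the apparatus further comprises:
    an adjustment module, configured to adjust the code change blocks of the second code change group according to the adjustment setting.
  10. The apparatus according to claim 9, characterized in that the adjustment setting for the second code change group comprises an adjustment setting for a target code change block in the second code change group;
    the adjustment module is specifically configured to:
    adjust the target code change block according to the adjustment setting;
    determine, from the second code change group, the code change blocks associated with the target code change block;
    adjust the associated code change blocks according to the adjustment setting.
  11. The apparatus according to any one of claims 7 to 10, characterized in that the communication module is further configured to:
    obtain a commit template corresponding to at least one code change group;
    the apparatus further comprises:
    a commit module, configured to generate a commit message according to the at least one code change group and its corresponding commit template, so that the code commit operation is performed according to the commit message.
  12. The apparatus according to any one of claims 7 to 11, characterized in that the analysis module is specifically configured to:
    obtain the code change action set of each of the multiple code change blocks;
    parse the code change action sets to obtain, for each code change block, the element definition set and the element use set of the current version and of the base version;
    derive, from the element definition sets and element use sets of the current version and the base version for each code change block, at least one of the element definition-use relationship and the element indirect-use relationship among the multiple code change blocks.
  13. A computer, characterized in that the computer comprises a processor and a memory;
    the processor of the computer is configured to execute instructions stored in the memory, so that the computer performs the code processing method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, comprising instructions which, when run on a computer, cause the computer to perform the code processing method according to any one of claims 1 to 6.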
PCT/CN2020/112767 2020-02-25 2020-09-01 一种代码处理方法、装置、设备及介质 WO2021169227A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010118173.9A CN113377431A (zh) 2020-02-25 2020-02-25 一种代码处理方法、装置、设备及介质
CN202010118173.9 2020-02-25

Publications (1)

Publication Number Publication Date
WO2021169227A1 true WO2021169227A1 (zh) 2021-09-02

Family

ID=77489853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112767 WO2021169227A1 (zh) 2020-02-25 2020-09-01 一种代码处理方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN113377431A (zh)
WO (1) WO2021169227A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113760300A (zh) * 2021-09-13 2021-12-07 武汉联影智融医疗科技有限公司 软件持续集成方法、装置、计算机设备和存储介质
CN114064472A (zh) * 2021-11-12 2022-02-18 天津大学 基于代码表示的软件缺陷自动修复加速方法
CN115904480A (zh) * 2023-01-09 2023-04-04 成方金融科技有限公司 代码重构方法、装置、电子设备及存储介质
CN117492822A (zh) * 2023-12-29 2024-02-02 杭州新中大科技股份有限公司 变更对比方法、装置、电子设备及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190026106A1 (en) * 2017-07-20 2019-01-24 Ca, Inc. Associating software issue reports with changes to code
CN109947462A (zh) * 2019-03-15 2019-06-28 武汉大学 一种面向软件代码变更集成的决策支持方法及装置
CN110262966A (zh) * 2019-06-03 2019-09-20 深圳前海微众银行股份有限公司 一种覆盖信息获取方法及装置

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113760300A (zh) * 2021-09-13 2021-12-07 武汉联影智融医疗科技有限公司 软件持续集成方法、装置、计算机设备和存储介质
CN113760300B (zh) * 2021-09-13 2023-10-27 武汉联影智融医疗科技有限公司 软件持续集成方法、装置、计算机设备和存储介质
CN114064472A (zh) * 2021-11-12 2022-02-18 天津大学 基于代码表示的软件缺陷自动修复加速方法
CN114064472B (zh) * 2021-11-12 2024-04-09 天津大学 基于代码表示的软件缺陷自动修复加速方法
CN115904480A (zh) * 2023-01-09 2023-04-04 成方金融科技有限公司 代码重构方法、装置、电子设备及存储介质
CN117492822A (zh) * 2023-12-29 2024-02-02 杭州新中大科技股份有限公司 变更对比方法、装置、电子设备及存储介质
CN117492822B (zh) * 2023-12-29 2024-03-29 杭州新中大科技股份有限公司 变更对比方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN113377431A (zh) 2021-09-10

Similar Documents

Publication Publication Date Title
WO2021169227A1 (zh) 一种代码处理方法、装置、设备及介质
US10120788B2 (en) Cloud connected automated testing in multiple operating environments using multiple parallel test threads
US9798648B2 (en) Transitive source code violation matching and attribution
US10649836B2 (en) Detecting an error message and automatically presenting links to relevant solution pages
US11429365B2 (en) Systems and methods for automated retrofitting of customized code objects
US11593342B2 (en) Systems and methods for database orientation transformation
JP6911059B2 (ja) Cpu利用およびコードリファクタリングのためのクエリオプティマイザー
US8898635B2 (en) System and method for automatic impact variable analysis and field expansion in mainframe systems
US8364696B2 (en) Efficient incremental parsing of context sensitive programming languages
US9311077B2 (en) Identification of code changes using language syntax and changeset data
US9342784B1 (en) Rule based module for analyzing computing environments
US11422783B2 (en) Auto-deployment of applications
US9851944B2 (en) Operation search method and operation search apparatus
US20180293160A1 (en) Comparing software projects having been analyzed using different criteria
CN106484389B (zh) 动作流分段管理
CN115543781A (zh) 汽车软件模型自动化验证的方法及交互系统
CN110716804A (zh) 无用资源的自动删除方法、装置、存储介质及电子设备
Lavoie et al. A case study of TTCN-3 test scripts clone analysis in an industrial telecommunication setting
WO2021011117A1 (en) Detecting misconfiguration and/or bug(s) in large service(s) using correlated change analysis
WO2024031983A1 (zh) 一种代码管理方法及相关设备
US20220374333A1 (en) Automated classification of defective code from bug tracking tool data
JP2023031614A (ja) 変更度計測装置、方法及びプログラム
CN111736840A (zh) 小程序应用的编译方法、运行方法、存储介质及电子设备
CN116339742A (zh) 获取函数的方法、装置及存储介质
CN117632224A (zh) 一种代码管理方法及相关设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20921794

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20921794

Country of ref document: EP

Kind code of ref document: A1