CN115098109A - Directed graph-based code warehouse code block level conflict sorting and grouping method

Info

Publication number: CN115098109A
Application number: CN202210726119.1A
Authority: CN
Inventors: 张卫丰; 张传忠; 周国强; 张迎周; 王子元
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2022-09-23

Abstract

A directed-graph-based method for sorting and grouping code-block-level conflicts in a code repository comprises the following steps. First, the list of conflicting files, the method names in the conflicting files, and the conflicting code segments are extracted and stored. Next, a dependency graph is constructed for all methods in the conflicting files on each branch. The nodes of the dependency graphs are then compared, and the graphs constructed on different branches are merged. The merged dependency graph is traversed; the full file path name, method name, and method start and end line numbers carried by each node are compared with the stored conflict information; and the full path names and method names of the conflicting files that meet the conditions are stored in descending order of node sequence number. Finally, conflicting methods in the same dependency graph are assigned to the same group. The method sorts and groups the code-block-level conflicts of the current code repository, thereby helping developers resolve conflicts in an orderly manner.

Description

Directed graph-based code warehouse code block level conflict sorting and grouping method

Technical Field

The invention belongs to the field of computer technology, in particular software technology, and specifically relates to a directed-graph-based method for sorting and grouping code-block-level conflicts in a code repository.

Background

In modern software development, developers rely on version control systems such as Git to collaborate through branch-based development. One drawback of this way of working is that conflicts can occur when contributions from different developers are merged, which reduces the efficiency of collaboration and can introduce potential vulnerabilities. The problem worsens because refactoring is prevalent in software development and gives rise to a large number of conflicts.

However, the conflict merging process has a common problem. Unstructured merging treats code as plain text, so a conflict is reported whenever developers modify the same line, and this produces a large number of false positives. Structured merging reduces some of these false positives. With either merging method, however, the developer is left to resolve a large number of conflicts all at once, which consumes a lot of time, especially when there are many conflicts and many lines of code. This is a major challenge when merging branches of a code repository.

An effective solution to these problems is to place code-block-level conflicts into groups, so that once the conflicts of one group have been resolved, the corresponding individual module can be compiled first, rather than compiling the whole project only after all conflicts are resolved. Sorting code-block-level conflicts before resolving them requires ordering the conflicts accurately and placing related conflicts in the same group, by analyzing the full path names and method information of the nodes in the graph and sorting by sequence number, thereby helping developers resolve conflicts in an orderly manner.

However, the methods in the files of a code repository depend on one another, and these dependency relationships are complex. The above solutions therefore do not perform well enough in practice, and the conflict resolution process remains inefficient.

Disclosure of Invention

The main work of the invention is to provide a directed-graph-based method for sorting and grouping code-block-level conflicts that occur inside files. The invention focuses on the order in which code-block-level conflicts in a code repository are resolved and on how those conflicts are grouped. First, the methods in all the files on the different branches of the code repository are converted into directed dependency graphs. Second, the graphs on different branches are merged. Then the nodes in the graph are traversed with a graph traversal algorithm, and the code-block-level conflicts are sorted according to the order in which the nodes are visited. After sorting is complete, code-block-level conflicts are assigned to groups depending on whether they occur in the same graph.

A code warehouse code block level conflict sorting grouping method based on a directed graph comprises the following steps:

step 1, extracting all methods in project conflict files in a code warehouse;

step 2, constructing a dependency graph according to the dependency among the methods, wherein a graph is constructed on each branch in the code warehouse;

step 3, constructing a new graph according to the dependency graph, wherein the nodes in the graph comprise out-degree, in-degree, full file path names, sequence numbers, method names and line numbers for starting and ending the methods;

step 4, acquiring conflict information aiming at the merging scene in the code warehouse;

step 5, assigning a sequence number to each node in the graph, traversing and screening the nodes in the graph, and ensuring that each node's sequence number is unique;

step 6, merging the graphs on different branches and then matching to obtain a merged and matched graph;

step 7, traversing the merged graph, and storing the full path names of the conflicting files and the names of the conflicting methods in descending order of node sequence number, thereby sorting the conflicts;

step 8, traversing the merged graphs, and placing the conflicting methods that appear in the same graph into the same group to complete the final sorting and grouping.

Further, in step 1, a list of files where a conflict occurs, names of all methods inside the conflicting files, and line numbers of the beginning and the end of the conflicting code segment are extracted.

Further, in step 2, a file list where a conflict occurs, all method names inside the conflict file, and line numbers of the beginning and the end of the code segment where the conflict occurs are extracted.

Further, in step 3, in the newly constructed graph, the sequence number is initially zero; the in-degree is the number of methods that depend on the method; the out-degree is the number of other methods that the method depends on; an edge in the graph indicates that a dependency relationship exists between two methods; and each directed edge points from a method to the method it depends on.
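The node contents listed for step 3 can be illustrated with a minimal Python sketch. The names `MethodNode` and `build_graph`, and the use of a (file path, method name) pair as the node key, are assumptions made for illustration and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class MethodNode:
    file_path: str        # full path name of the file
    method_name: str
    start_line: int       # first line of the method
    end_line: int         # last line of the method
    in_degree: int = 0    # number of methods that depend on this method
    out_degree: int = 0   # number of methods this method depends on
    seq: int = 0          # sequence number, initially zero


def build_graph(methods, dependencies):
    """Build nodes and adjacency lists.

    methods: list of (file_path, method_name, start_line, end_line).
    dependencies: list of (src_key, dst_key) pairs, meaning src depends on dst,
    where a key is the hypothetical pair (file_path, method_name).
    """
    nodes = {(m[0], m[1]): MethodNode(*m) for m in methods}
    edges = {key: [] for key in nodes}
    for src, dst in dependencies:
        edges[src].append(dst)        # edge points to the method depended on
        nodes[src].out_degree += 1
        nodes[dst].in_degree += 1
    return nodes, edges
```

In this sketch the degrees are derived from the edge list rather than stored separately, so they cannot drift out of step with the edges.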

Further, in step 5, the traversal starts from the nodes with zero in-degree, and each time a node is reached, its in-degree is incremented by one and its sequence number is incremented by one. If a node is reached from different zero-in-degree nodes along paths with different numbers of edges, so that the node would receive sequence numbers of different sizes, the larger value is kept and the smaller is discarded, ensuring that each node has a unique sequence number.
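One possible reading of step 5 is that a node's sequence number is the length of the longest path reaching it from any zero-in-degree node. The Python sketch below implements that reading with a topological traversal. It assumes the dependency graph is acyclic, which the text does not state, and the function name `assign_sequence_numbers` is hypothetical.

```python
from collections import deque


def assign_sequence_numbers(edges):
    """edges maps each node to the list of nodes its directed edges point to.

    Returns a dict node -> sequence number, where the number is the largest
    count of edges on any path from a zero-in-degree node (assumes a DAG).
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    remaining = {n: 0 for n in nodes}          # unprocessed incoming edges
    for targets in edges.values():
        for t in targets:
            remaining[t] += 1

    seq = {n: 0 for n in nodes}                # sequence numbers start at zero
    queue = deque(sorted(n for n in nodes if remaining[n] == 0))
    while queue:
        u = queue.popleft()
        for v in edges.get(u, []):
            # When paths of different lengths reach v, keep the larger value.
            seq[v] = max(seq[v], seq[u] + 1)
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    return seq
```

For example, with edges `a -> b`, `a -> c`, and `b -> c`, node `c` is reachable over one edge or two, and the sketch keeps the larger value 2.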

Further, in step 6, the dependency graphs on different branches are merged into one graph, and matching proceeds by traversing from the nodes with zero in-degree. A match succeeds when the full file path name and method name carried by a node are the same as a full file path name and method name in the conflict file list; when a match succeeds, the larger sequence number is retained in the node. A node that fails to match is inserted into the merged graph according to its position relative to the other nodes in the graph of the branch on which it is located.

Further, in step 7, traversing the merged graph, and when the file full path name and the method name in the node are the same as those in the conflict file list, and the line numbers of the beginning and the end of the conflict code segment are within the range of the line numbers of the beginning and the end of the method, saving the file full path name and the method name in the node according to the size of the serial number of the node from large to small.

The invention achieves the following beneficial effects: when a large number of conflicts exist, the method sorts the conflicts and recommends an effective resolution order to developers, which improves the efficiency of conflict resolution and avoids the repeated modification caused by handling two dependent conflict blocks in the wrong order, saving time. Grouping allows the conflicts of a given functional module to be resolved and that module to be compiled in advance, further improving the efficiency of problem solving.

Drawings

FIG. 1 is a schematic flow chart illustrating a method for constructing a dependency graph for a file according to an embodiment of the present invention.

FIG. 2 is a schematic diagram illustrating a process of sorting code block level conflicts according to the present invention.

FIG. 3 is a diagram illustrating a conflict sorting result generation based on a dependency graph according to an embodiment of the present invention.

Detailed Description

The technical scheme of the invention is further explained in detail by combining the drawings in the specification.

The invention provides a code warehouse code block level conflict sorting and grouping method based on a directed graph, which specifically comprises the following steps:

Step 1, extracting, from the project in the code repository, the list of files in which conflicts occur, all method names in the conflicting files, and the start and end line numbers of the conflicting code segments.

Step 2, analyzing the methods inside the conflicting files on each branch of the code repository according to the imports in the Java files and the dependencies in the pom file, where the files imported by a Java file are the files it depends on, and constructing a dependency graph in which each node represents a method and carries the file path and the method name.

Step 3, constructing a new graph from the dependency graph of step 2. A new graph is needed because dependency analysis and conflicting-code-block analysis are not performed together, and because it makes it easier to match the collected conflict information against the nodes. The new graph is built by traversing the old graph and creating a new node for each node visited, following the visiting order and the edges between the nodes in the old graph. The new graph has the same structure as the old one, except that its nodes carry more information: out-degree, in-degree, full file path name, sequence number, method name, the start and end line numbers of the method, and the line numbers of the conflicting code segment. The sequence number is initially zero; the in-degree is the number of methods that depend on the method; the out-degree is the number of other methods that the method depends on; and an edge indicates that a dependency exists between two methods. Because one method may depend on several methods, a node may have multiple directed edges pointing to it, and multiple directed edges may point from one node to different nodes.

Step 4, acquiring conflict information for the merge scenario in the code repository, where the conflict information comprises the paths of the conflicting files and information about the conflicting methods, and storing in a file the information about all conflicting methods together with the files in which they are located.

Step 5, assigning sequence numbers to the nodes in the graph. In a project, methods ultimately depend on some basic methods. A node with zero in-degree indicates that no method depends on that node, that is, these methods no longer depend on other methods, so the dependency graph formed by all the methods is certain to contain nodes with zero in-degree, and there may be more than one. The traversal starts from the zero-in-degree nodes; each time a node is reached, its degree is incremented by one and its sequence number is incremented by one. If a node is reached from different zero-in-degree nodes along different numbers of edges, so that it would receive sequence numbers of different sizes, the larger value is kept and the smaller is discarded to ensure that each node has a unique sequence number.

Step 6, merging the graphs of the branches in the code repository, matching nodes by the full file path name and method name they carry, and keeping the nodes that match successfully.

Step 7, traversing the merged graph to sort the conflicts: the full file path name, method name, and start and end line numbers of the conflicting code segment collected in step 1 are compared with the full file path name, method name, and method start and end line numbers carried by each node, and the full path names and method information of the successfully matched files are written to a file in descending order of node sequence number.

Step 8, grouping the conflicts at the code block level, wherein one project comprises a plurality of modules, each module has different functions, and files in the two modules appear in the same graph under the condition that the modules depend on each other; for the condition that no dependency exists between the modules, the files of the two modules exist in different graphs, so that one project has a plurality of graphs, each graph is traversed, and the conflict files existing in each graph are divided into the same group.

In step 1, the list of conflicting files, the full path names of the files, the conflicting code segments, and the start and end line numbers of the conflicting code segments are extracted from the project; this information is stored in a file and used as input to the later step that sorts the conflicting methods.
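The text does not say how the line numbers of a conflicting code segment are obtained. As one hedged illustration, the Python sketch below assumes the file is a Git working-tree file that still contains Git's standard conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) and reports the line range of each marked hunk. The function name `conflict_ranges` is hypothetical.

```python
def conflict_ranges(text):
    """Return 1-indexed (start_line, end_line) pairs, one per conflict hunk.

    A hunk runs from a line starting with '<<<<<<<' to the next line
    starting with '>>>>>>>', inclusive of both marker lines.
    """
    ranges = []
    start = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<<"):
            start = lineno
        elif line.startswith(">>>>>>>") and start is not None:
            ranges.append((start, lineno))
            start = None
    return ranges
```

Line numbers found this way refer to the file with markers inserted, so a real tool would still need to map them back to line numbers on each branch before comparing them with method boundaries.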

In step 2, each node of the dependency graph represents a method. An edge indicates that a dependency relationship exists between two methods and points from a node to the node it depends on. Depending on the dependency relationships, a node can have multiple edges pointing to it and multiple edges pointing from it to other nodes.

In step 6, the dependency graphs on different branches are merged, comparison is carried out from nodes with zero in-degree, the condition of node matching is that the full path names of files carried by the nodes are the same and the names of methods are the same, if the node matching is successful, the node with a larger sequence number is copied as a new node, and if the node matching is unsuccessful, the node is inserted into the merged graph according to the relative position of the node and other nodes in the graph on the branch where the node is located.
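The merge rule described for step 6 can be sketched in Python as follows. The graph representation (a sequence-number dictionary plus an edge dictionary, keyed by a (file path, method name) pair) and the name `merge_graphs` are assumptions. Carrying over an unmatched node's edges is only an approximation of inserting it "according to its relative position".

```python
def merge_graphs(graph_a, graph_b):
    """Merge two branch graphs.

    Each graph is a pair (seq, edges):
      seq   maps key -> sequence number
      edges maps key -> set of keys that the node depends on
    where key is the hypothetical pair (file_path, method_name).
    """
    seq_a, edges_a = graph_a
    seq_b, edges_b = graph_b
    merged_seq = dict(seq_a)
    merged_edges = {k: set(v) for k, v in edges_a.items()}

    for key, s in seq_b.items():
        if key in merged_seq:
            # Matched node: same file path and method name, keep larger number.
            merged_seq[key] = max(merged_seq[key], s)
        else:
            # Unmatched node: insert it with the number from its own branch.
            merged_seq[key] = s
        # Keep the node's edges so its position relative to others survives.
        merged_edges.setdefault(key, set()).update(edges_b.get(key, ()))
    return merged_seq, merged_edges
```

Keying nodes by file path and method name makes the matching test a dictionary lookup, at the cost of treating overloaded methods with the same name as one node.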

In step 7, the graph merged in step 6 is traversed starting from the nodes with zero in-degree. The information carried by each node is compared with the full file path name, method name, and conflicting code segment information collected in step 1 for each code-block-level conflict. If the full file path name and method name carried by the node exist in the conflict file list, and the start and end line numbers of the conflicting code segment stored in the conflict file list fall within the range of the start and end line numbers of the method carried by the node, the node is stored in a set. Nodes are stored in descending order of sequence number: the larger a node's sequence number, the earlier its code-block-level conflict should be processed, and the smaller the sequence number, the later it should be processed.
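The matching condition and ordering of step 7 can be sketched in Python as below. The dictionary field names (`file`, `method`, `start`, `end`, `seq`) and the function name `sort_conflicts` are assumptions made for illustration.

```python
def sort_conflicts(nodes, conflicts):
    """Return (file, method) pairs of conflicting methods, largest seq first.

    nodes:     list of dicts with keys file, method, start, end, seq
    conflicts: list of dicts with keys file, method, start, end, where
               start/end are the line numbers of the conflicting segment
    """
    matched = []
    for node in nodes:
        for c in conflicts:
            same_method = (node["file"] == c["file"]
                           and node["method"] == c["method"])
            # The conflicting segment must lie within the method's line range.
            inside = node["start"] <= c["start"] and c["end"] <= node["end"]
            if same_method and inside:
                matched.append(node)
                break
    # Larger sequence number means the conflict should be handled earlier.
    matched.sort(key=lambda n: n["seq"], reverse=True)
    return [(n["file"], n["method"]) for n in matched]
```

The text does not say how nodes with equal sequence numbers are ordered; this sketch leaves them in their original relative order.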

In step 8, the code-block-level conflicts are grouped. The purpose of grouping is that once the conflicts of one group are resolved, the corresponding individual module can be compiled first. A project comprises multiple modules, and two situations arise between modules: either a dependency relationship exists or it does not. A project therefore yields multiple dependency graphs. Multiple groups are created, each graph is traversed, and the code-block-level conflicts that appear in the same graph are added to the same group to complete the grouping.
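If "the same graph" is read as one connected piece of the overall dependency structure, the grouping of step 8 amounts to finding weakly connected components. The Python sketch below follows that reading, which is an interpretation rather than something the text states. The name `group_conflicts` is hypothetical.

```python
def group_conflicts(edges, conflict_keys):
    """Group conflicting methods that share a dependency graph.

    edges:         dict node -> list of nodes it depends on
    conflict_keys: list of nodes that carry a code-block-level conflict
    Returns a list of groups, each a list of conflicting nodes.
    """
    # Ignore edge direction: two nodes are in the same graph if any
    # chain of dependencies connects them.
    adj = {}
    for u, targets in edges.items():
        adj.setdefault(u, set())
        for v in targets:
            adj[u].add(v)
            adj.setdefault(v, set()).add(u)

    component = {}
    next_id = 0
    for start in adj:
        if start in component:
            continue
        component[start] = next_id
        stack = [start]
        while stack:                      # depth-first search over one graph
            u = stack.pop()
            for v in adj[u]:
                if v not in component:
                    component[v] = next_id
                    stack.append(v)
        next_id += 1

    groups = {}
    for key in conflict_keys:
        if key in component:
            groups.setdefault(component[key], []).append(key)
    return list(groups.values())
```

Conflicts in modules that do not depend on each other end up in different groups, so each group can be resolved and its module compiled independently.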

The work and contributions of the present invention are as follows:

1. Acquiring the conflict file list, all method names in the conflicting files, and the start and end line numbers of the code segments in which conflicts occur.

2. Constructing a dependency graph among the methods. In this step, a dependency graph must be constructed for each of the branches in the code repository; because the method bodies in the files differ between branches, the dependency graphs constructed for different branches differ. Each node in a graph represents one method in a file.

3. Constructing a new graph from the dependency graph, in which each node comprises the out-degree, in-degree, full file path name, sequence number, method name, start and end line numbers of the method, and line numbers of the conflicting code segment. The in-degree is the number of methods that depend on the method, the out-degree is the number of other methods that the method depends on, and an edge indicates that a dependency exists between two methods.

4. Merging the directed graphs on different branches. The graphs of the branches in the code repository are merged, nodes are matched by the full file path name and method name they carry, and nodes that match successfully are kept. A node that fails to match is inserted into the merged graph according to its position relative to the other nodes in its graph before merging.

5. Sorting the conflicts at the code block level. The directed graph is traversed, and the collected full file path names, method names, and start and end line numbers of the conflicting code segments are compared with the full file path names, method names, and method start and end line numbers carried by the nodes. If the full file path name and method name are the same, and the start and end line numbers of the conflicting code segment fall within the range of the start and end line numbers of the method in the node, the full path name and method information of the conflicting file are stored in descending order of node sequence number. The larger a node's sequence number, the earlier its code-block-level conflict is processed; the smaller the sequence number, the later it is processed.

6. Grouping the conflicts at the code block level. The purpose of grouping is that once the conflicts of one group are resolved, an individual module can be compiled first. A Java project comprises multiple modules, and two situations arise between modules: either a dependency relationship exists, in which case the modules jointly form one dependency graph, or no dependency relationship exists.

The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment, but equivalent modifications or changes made by those skilled in the art according to the disclosure of the present invention should be included in the scope of the present invention as set forth in the appended claims.

Claims

1. A code warehouse code block level conflict sorting grouping method based on a directed graph is characterized in that: the method comprises the following steps:

step 1, extracting all methods in project conflict files in a code warehouse;

2. The directed graph-based code warehouse code block level conflict sorting grouping method according to claim 1, wherein: in step 1, a list of files in which conflicts occur, all method names in the conflicting files, and the start and end line numbers of the conflicting code segments are extracted.

3. The directed graph-based code warehouse code block level conflict sorting grouping method according to claim 1, wherein: in step 2, a list of files in which conflicts occur, all method names in the conflicting files, and the start and end line numbers of the conflicting code segments are extracted.

4. The directed graph-based code warehouse code block level conflict sorting grouping method according to claim 1, wherein: in step 3, in the newly constructed graph, the sequence number is initially zero, the in-degree is the number of methods that depend on the method, the out-degree is the number of other methods that the method depends on, an edge in the graph indicates that a dependency relationship exists between two methods, and each directed edge points from a method to the method it depends on.

5. The directed graph-based code warehouse code block level conflict sorting grouping method of claim 1, wherein: in step 5, the traversal starts from the nodes with zero in-degree, and each time a node is reached, its in-degree is increased by one and its sequence number is also increased by one; if a node is reached from different zero-in-degree nodes along different numbers of edges, so that the node would receive sequence numbers of different sizes, the larger value is kept and the smaller is discarded to ensure the uniqueness of the sequence number of each node.

6. The directed graph-based code warehouse code block level conflict sorting grouping method according to claim 1, wherein: in step 6, the dependency graphs on different branches are merged into one graph, and matching proceeds by traversing from the nodes with zero in-degree; a match succeeds when the full file path name and method name carried by the node are the same as those in the conflict file list, and the larger sequence number is retained in the node when the match succeeds; a node fails to match when file content has been added or deleted on the different branches, and such a node is inserted into the merged graph according to its position relative to the other nodes in the graph of the branch on which it is located.

7. The directed graph-based code warehouse code block level conflict sorting grouping method of claim 1, wherein: in step 7, the merged graph is traversed, and when the full file path name and method name in a node are the same as a full file path name and method name in the conflict file list, and the start and end line numbers of the conflicting code segment fall within the range of the start and end line numbers of the method, the full file path name and method name in the node are saved in descending order of node sequence number.