CN115168085A

CN115168085A - Repetitive conflict scheme detection method based on diff code block matching

Info

Publication number: CN115168085A
Application number: CN202210719649.3A
Authority: CN
Inventors: 张卫丰; 李贺彬; 周国强; 张迎周; 王子元
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2022-06-23
Filing date: 2022-06-23
Publication date: 2022-10-11

Abstract

A repetitive conflict scheme detection method based on diff code block matching. First, for a database holding a large amount of historically merged code, for example in practical scenarios where code migrates from one code repository to another or where similar updates exist within one code repository, the relevant merge solutions are extracted. Second, the solutions to the acquired historical merge conflicts are stored in a database. Then, when a new conflict is encountered, a merge scheme is recommended using the saved conflict solutions. Finally, the database record of a conflict that was resolved using the historical information is updated.

Description

Repetitive conflict scheme detection method based on diff code block matching

Technical Field

The invention belongs to the technical field of computers, specifically software technology, and in particular relates to a repetitive conflict scheme detection method based on diff code block matching.

Background

As internet technology is applied ever more widely and the software industry changes ever faster, demand for remote collaborative work, software code repositories, and conflict resolution keeps growing. Collaborative development takes two basic forms, distributed and centralized. Git, a typical software code repository system, is currently the most advanced distributed version control system in the world. The biggest difference between the distributed and the centralized form is that developers can commit locally: each developer copies a complete Git repository onto the local machine by cloning. Git was created as a version control tool for Linux kernel development. Unlike common version control tools such as CVS and Subversion, it uses a distributed repository model that does not require server-side software, which makes publishing and exchanging source code very convenient. Git is fast, which matters for large projects such as the Linux kernel. Its most outstanding feature is its merge-tracking capability.

However, in collaborative work different people work on different branches, and when Git is used to manage a project's commit flow, merge errors, i.e. conflicts, may occur when different working branches are merged. Developers can resolve some simple merge conflicts with the commands Git provides, but complex conflicts can only be resolved manually. Moreover, when large open source projects are merged, the number of conflicts grows rapidly and greatly increases developers' workload, which is a main challenge in collaborative work with Git.

One effective solution to these problems is to provide a conflict-resolution prompt function in the software integrated development environment (IDE). The prompt function analyzes the historical conflicts a developer has previously resolved; for the conflict currently faced, it quickly matches the historical conflict information and suggests a resolution scheme to the developer, thereby improving development efficiency. However, because merge schemes are represented in only a single, limited form in Git version control, current open source frameworks and IDEs cannot help with resolving complex conflicts.

Disclosure of Invention

The main purpose of the invention is to study conflicts for which a past solution can be provided, in particular conflicts similar to historical conflict information, and to provide a repetitive conflict scheme detection method based on diff code block matching.

The method mainly comprises acquiring the conflict information of merge nodes from historical information and storing the historical merge conflict schemes in a database. When a similar conflict is encountered again, the conflict can be looked up quickly, a resolution scheme recommended, and the scheme saved. The model can be used to match and recommend existing solutions to merge conflict problems. First, the method focuses on recommending solutions to merge problems in continuous integration: for a database with a large amount of historical merge code, for example in practical scenarios where code migrates from one code repository to another or where a code repository contains similar updates, relevant merge solutions are extracted from the large amount of repository information in which merge problems have already been resolved. Second, the solutions to the acquired historical merge conflicts are stored in the database. Then, when a new conflict is encountered, a merge scheme is recommended using the saved merge conflict schemes. Finally, for a conflict resolved using the historical information, the update time of its record in the database is updated, which indicates that the resolved conflict is used frequently. If a conflict is resolved manually by a developer, the solution needs to be saved to the database.

The invention achieves the following beneficial effects: it effectively improves the efficiency of resolving similar conflicts, reduces the workload developers spend handling them, and lowers the difficulty of the work. In addition, the invention uses indexing to detect similar conflicts, which speeds up detection, improves matching speed, and also improves the accuracy of similar-conflict detection.

Drawings

Fig. 1 is a schematic diagram of acquiring conflict nodes based on historical information in an embodiment of the present invention.

Fig. 2 is a schematic diagram of saving historical merge conflict schemes based on the commit tree of a code repository in an embodiment of the present invention.

Fig. 3 is a schematic diagram of the merge conflict process based on scheme recommendation in an embodiment of the present invention.

Detailed Description

The technical scheme of the invention is explained in further detail below with reference to the drawings.

A repetitive conflict scheme detection method based on diff code block matching specifically comprises the following steps:

Step 1: for a code repository containing a large amount of historical merge information, first obtain the commit tree. For each merge node, take the node immediately preceding the merge node on each of its two branches, merge the two, and check whether a conflict occurs, thereby determining whether the merge node contains conflict history.

Step 2: using a graph traversal algorithm and node merging, when a conflict node is analyzed, record the current branch name of the conflict node, the names of the other branches merged into the current branch, the predecessor nodes of the other branches merged into the current branch, the predecessor of the conflict node's predecessor, and the ancestor node of the current branch and the branch to be merged. Also record the commit id of each node.

Step 3: acquire the reference code block of the ancestor node, the comparison code block of the predecessor node on the other branch merged into the current branch, the source code block of the node preceding the predecessor of the current node, and the merged code block of the conflict node.

Step 4: compare the code blocks obtained in step 3 using a code matching technique, apply a hash algorithm to the successfully matched code blocks to obtain hash values that serve as the record dimension of a conflict scheme, and store the current repository name, the commit id of each node from step 2, the hash value of each file's code block, and the update time in a database.

Step 5: match the current conflict against the conflict schemes stored in step 4 using a database index fast-matching technique. If matching succeeds, return the matched conflict scheme to the developer and update the stored time. If matching fails, the developer must resolve the merge conflict manually, after which the current merge conflict scheme is recorded in the database according to steps 2, 3, and 4.

In step 1, merge nodes are extracted as follows. For a code repository containing a large amount of historical merge information, the commit tree of the repository is first obtained with the SourceTree tool; the commit tree is the core of an entire Git repository, and a commit node is its basic unit. Then, for each branch merge, the node at which two branches merge is found, and all such nodes are collected into a set. Each node represents a commit on a branch. Then, for each node in the set, the commit id at the time of the commit is recorded. The commit id is the SHA-1 value that Git generates automatically when a developer successfully commits code to the repository, and it identifies that commit. The commit ids of all merge nodes are obtained by consulting the commit tree, and the current repository name is also stored.

SourceTree is a free GUI Git client for macOS and Windows. It simplifies the version control process so that the user can focus on what matters, coding. It has a professional UI for carrying out Git tasks and offers direct access to Git-flow, submodules, a remote repository manager, local commit search, visual repository management, support for Git large files, and so on. The commit tree is generated with the SourceTree tool, and the nodes at which two branches merge are located in it.
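The merge-node extraction of step 1 can also be sketched without a GUI. The following is a minimal illustration, not the patented implementation: it assumes the git command line is available and substitutes the output of git rev-list --parents --all (one line per commit, the commit id followed by its parent ids) for the SourceTree view. The function names read_commit_graph and find_merge_nodes are hypothetical.

```python
import subprocess


def read_commit_graph(repo_path: str) -> str:
    """Return `git rev-list --parents --all` output for a repository.

    Each output line has the form "commit parent1 parent2 ...".
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--parents", "--all"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_merge_nodes(rev_list_output: str) -> dict[str, list[str]]:
    """Map each merge commit id to the ids of its parents.

    A merge node is taken to be any commit with two or more parents.
    """
    merges: dict[str, list[str]] = {}
    for line in rev_list_output.splitlines():
        ids = line.split()
        if len(ids) >= 3:  # the commit id plus at least two parent ids
            merges[ids[0]] = ids[1:]
    return merges
```

For a history in which E merges D and C, find_merge_nodes returns a single entry mapping E to its two parents; the repository name would be stored alongside these ids as the text describes.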

In step 2, let the merge node obtained by commit id be E. The current branch of this node and the branch merged in need to be recorded; the node at which the two branches began to diverge is found and its commit id is stored; the predecessor node of the merge node on its own branch is denoted D, and the predecessor node on the branch merged in is denoted C. Fig. 1 shows how these nodes are obtained. For a code repository, the commit tree is obtained from the commit history and the set of merge nodes is derived from it; each node in the set is then visited, and the git merge command is used to check whether the merge produces a conflict, thereby determining whether the node contains historical conflict information.

The predecessor node B of D and node C of the other branch are merged; if a merge conflict occurs, the current node E is shown to be a conflict node. The commit ids obtained for these nodes are recorded as part of the merge conflict resolution information.
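The conflict check can be sketched as a replayed merge. The text uses the git merge command; the sketch below instead assumes Git 2.38 or later and uses git merge-tree --write-tree, which performs the merge without touching the working tree and exits with status 0 for a clean merge and 1 when conflicts occur. The function names and the injectable run parameter are illustrative assumptions, and the caller chooses which two commits to replay (for example B and C in Fig. 1).

```python
import subprocess
from typing import Callable


def merge_conflicts(repo_path: str, ours: str, theirs: str,
                    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run
                    ) -> bool:
    """Replay a merge of two commits and report whether it conflicts.

    Assumes Git 2.38+: `git merge-tree --write-tree` exits with 0 for a
    clean merge and 1 when the merge has conflicts.
    """
    proc = run(
        ["git", "-C", repo_path, "merge-tree", "--write-tree", ours, theirs],
        capture_output=True, text=True,
    )
    if proc.returncode not in (0, 1):
        raise RuntimeError(f"git merge-tree failed: {proc.stderr.strip()}")
    return proc.returncode == 1


def conflict_nodes(repo_path: str, merges: dict[str, list[str]],
                   run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run
                   ) -> list[str]:
    """Keep the merge nodes whose first two recorded commits conflict when re-merged."""
    return [node for node, parents in merges.items()
            if merge_conflicts(repo_path, parents[0], parents[1], run=run)]
```

Passing run explicitly makes it possible to exercise the logic with a stand-in for the git call; in normal use the default subprocess.run is kept.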

In step 3, after step 2 has determined whether the current node is a conflict node, the node set from step 1 is traversed to find all merge nodes that contain conflict history. To locate the specifics of each historical merge conflict node, refer to Fig. 2, a schematic diagram of saving historical merge conflict schemes based on the commit tree: the commit ids of the nodes holding the comparison diff code block, the source diff code block, the reference diff code block, and the merged file are saved via the commit tree, and the conflicting diff code blocks can be located from the saved commit ids. Node D in Fig. 2 is the parent node of the merge node and is used to obtain node B.
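The text does not say how the individual code blocks are cut out of a conflicting file. One possible approach, shown here purely as an assumption, is to replay the merge with Git's diff3 conflict style and parse the conflict markers, treating the "ours" side as the source block, the "theirs" side as the comparison (target) block, and the middle section as the reference (base) block. The function name split_conflict_blocks is hypothetical.

```python
def split_conflict_blocks(text: str) -> list[dict[str, str]]:
    """Split diff3-style conflict regions into source, base and target blocks.

    Expected marker order per region: <<<<<<<, |||||||, =======, >>>>>>>.
    Lines outside conflict regions are ignored.
    """
    blocks: list[dict[str, str]] = []
    current = None   # lines collected for the region being parsed
    section = None   # which part of the region the parser is in
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            current = {"source": [], "base": [], "target": []}
            section = "source"
        elif current is not None and line.startswith("|||||||"):
            section = "base"
        elif current is not None and line.startswith("======="):
            section = "target"
        elif current is not None and line.startswith(">>>>>>>"):
            blocks.append({k: "\n".join(v) for k, v in current.items()})
            current, section = None, None
        elif current is not None:
            current[section].append(line)
    return blocks
```

Each returned dictionary holds one conflict region, so a file with several conflicts yields several candidate blocks for hashing in step 4.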

In step 4, to further save the historical merge scheme, the code in the comparison diff code block, the source diff code block, and the reference diff code block is matched item by item using a code matching technique, yielding a list of code blocks that could not be matched successfully. A hash algorithm is applied to each code block in the list to obtain its hash value. The repository name and the commit ids of the source, comparison, reference, and merged diff code blocks are obtained from the preceding steps, together with the hash values of the source, comparison, merged, and reference diff code blocks. Finally, the merge solution is stored in the database in the form < id, projectName, merged commit Id, source commit Id, target commit Id, base commit Id, source code fragment hash, target code fragment hash, merged code fragment hash, base code fragment hash, update time >. Here id is the number of the data record; projectName is the name of the repository; merged commit Id is the commit id of the node holding the merged diff code block; source commit Id is the commit id of the node holding the source diff code block; target commit Id is the commit id of the node holding the comparison diff code block; base commit Id is the commit id of the node holding the reference diff code block; source code fragment hash is the hash value of the source code block; target code fragment hash is the hash value of the comparison code block; merged code fragment hash is the hash value of the merged code block; base code fragment hash is the hash value of the reference code block; and update time is the time at which the record was inserted or last updated.
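The matching and hashing of step 4 can be illustrated as follows. The text names neither the code matching technique nor the hash algorithm, so this sketch makes two assumptions: line-level matching with Python's difflib.SequenceMatcher, and SHA-256 over the block after trailing whitespace is trimmed. The function names unmatched_blocks and block_hash are hypothetical.

```python
import difflib
import hashlib


def unmatched_blocks(base: str, other: str) -> list[str]:
    """Return the runs of lines in `other` that have no match in `base`."""
    a, b = base.splitlines(), other.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    blocks = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" opcodes carry lines of `other` that differ.
        if tag != "equal" and j2 > j1:
            blocks.append("\n".join(b[j1:j2]))
    return blocks


def block_hash(block: str) -> str:
    """Hash a code block after trimming trailing whitespace on each line."""
    normalized = "\n".join(line.rstrip() for line in block.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Normalizing before hashing is a design choice: an exact hash only matches byte-identical blocks, so any tolerance for cosmetic differences has to be applied before the digest is computed.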

In step 5, refer to Fig. 3, a schematic diagram of the merge conflict process based on scheme recommendation. Using a database index fast-matching technique, the conflict blocks corresponding to the conflict a developer encounters in the current project are extracted and hashed with a hash algorithm, and the resulting hash values are matched against the data in the database through the database indexes. If matching succeeds, the historical merge conflict scheme is recommended to the developer and the time of the merge scheme in the database is updated. If matching fails, the developer resolves the conflict manually, and the current merge conflict scheme is stored in the database according to steps 2, 3, and 4.
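The storage and lookup flow can be sketched with an indexed table. This is a minimal illustration that assumes SQLite as the database and assumes that a match means equal source, comparison (target), and reference (base) hashes; the table name, column names, and function names are hypothetical and only mirror the record form given for step 4.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical schema following the record form <id, projectName, ...>.
SCHEMA = """
CREATE TABLE IF NOT EXISTS merge_solution (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    project_name TEXT NOT NULL,
    merged_commit_id TEXT, source_commit_id TEXT,
    target_commit_id TEXT, base_commit_id TEXT,
    source_hash TEXT, target_hash TEXT,
    merged_hash TEXT, base_hash TEXT,
    update_time REAL
);
CREATE INDEX IF NOT EXISTS idx_conflict_hashes
    ON merge_solution (source_hash, target_hash, base_hash);
"""

FIELDS = ("project_name", "merged_commit_id", "source_commit_id",
          "target_commit_id", "base_commit_id", "source_hash",
          "target_hash", "merged_hash", "base_hash")


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the table and index if needed."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_solution(conn: sqlite3.Connection, record: dict,
                  now: Optional[float] = None) -> None:
    """Insert one resolved conflict; `record` must contain every name in FIELDS."""
    values = [record[f] for f in FIELDS]
    values.append(time.time() if now is None else now)
    placeholders = ", ".join("?" for _ in values)
    conn.execute(
        f"INSERT INTO merge_solution ({', '.join(FIELDS)}, update_time) "
        f"VALUES ({placeholders})", values)
    conn.commit()


def recommend(conn: sqlite3.Connection, source_hash: str,
              target_hash: str, base_hash: str) -> Optional[dict]:
    """Return the stored resolution for matching hashes and refresh its update time."""
    row = conn.execute(
        "SELECT id, project_name, merged_commit_id, merged_hash "
        "FROM merge_solution "
        "WHERE source_hash = ? AND target_hash = ? AND base_hash = ? "
        "ORDER BY update_time DESC LIMIT 1",
        (source_hash, target_hash, base_hash)).fetchone()
    if row is None:
        return None  # no match: resolve manually, then call save_solution()
    conn.execute("UPDATE merge_solution SET update_time = ? WHERE id = ?",
                 (time.time(), row[0]))
    conn.commit()
    return {"project_name": row[1], "merged_commit_id": row[2],
            "merged_hash": row[3]}
```

A caller would hash the blocks of the current conflict, call recommend, and fall back to manual resolution followed by save_solution when it returns None.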

The work and contributions of the present invention are as follows:

1. Extraction and storage of merge conflict schemes. For a database with a large amount of historical merge code, for example in real-world scenarios where code migrates from one code repository to another or where similar updates exist within one code base, historical merge decisions may have reference value for new merge conflicts. First, the commit tree of the code repository is obtained with the SourceTree tool as a visualized tree diagram. The commit tree is the core of an entire Git repository, and a commit node is its basic unit. Second, each node at which two branches in the commit tree meet is found and defined as a merge node. All merge nodes are collected into a set, and for each node in the set the predecessor nodes on its two branches are obtained; on the branch to be merged, the grandparent node of the merge node must be found, that is, the predecessor of the merge node's predecessor. The nodes on the two branches are then merged. If no conflict occurs, the merge node is a normal merge. Otherwise a merge conflict occurs, which indicates that the merge node was produced by manual resolution after a conflict. These steps find the merge conflict nodes in a code repository.

2. Once conflict nodes have been found, preserving the historical merge scheme is crucial for scheme recommendation. The commit ids of the nodes are the main thing recorded; starting from the conflict node, they make it possible to find the file content of the node after merging, called the merged diff code block; the diff code block of the other branch to be merged into the current branch, called the comparison diff code block; and the diff code block of the node at which the two branches diverged, called the reference diff code block. The diff code block of the predecessor node of the merge node is called the source diff code block. A newly encountered merge problem must be matched against the merge problems saved in the database. Because a merge problem always involves code blocks in the code repository, the similarity between code blocks is the key to judging whether a merge scheme suits the merge problem; a code matching method is therefore used, and the hash values of the code blocks are saved as the dimension under which the conflict scheme is stored. Fast matching is performed with the database indexes so that a merge scheme can be recommended.

3. When a new conflict is encountered, the historical merge conflict schemes must be queried quickly; once a scheme is found, a resolution for the merge conflict can be recommended from the stored scheme information. The essence of this recommendation is to exploit the statistical regularity of frequently occurring merge decisions and apply it to new, similar code changes. If a similar merge scheme is matched, the update time of the saved scheme is updated in the database. If the current conflict matches no similar merge conflict in the database, the developer can only handle the merge conflict manually, and the developer's solution is then saved to the database. In this way, a human developer or code administrator can make the final merge decision with an understanding enhanced by the merge scheme recommendation.

In the software industry, when a developer commits and a conflict prevents the merge from completing, the method disclosed here extracts the conflict information, hashes it, and quickly matches it against the conflict hash values stored in the database. If the match succeeds, the matching data is recommended to the developer as a historical conflict solution for reference, which reduces the difficulty of the developer's merge work.

The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment, but equivalent modifications or changes made by those skilled in the art according to the present disclosure should be included in the scope of the present invention as set forth in the appended claims.

Claims

1. A repetitive conflict scheme detection method based on diff code block matching is characterized in that: the method comprises the following steps:

step 1, for a database with a large amount of historical merge code, extracting all merge nodes in the commit tree, and then judging whether each node contains historical merge conflict information;

step 2, using a graph traversal algorithm and node merging analysis, recording the current branch name of the conflict node, the names of the other branches merged into the current branch, the predecessor nodes of the other branches merged into the current branch, the predecessor of the conflict node's predecessor, and the ancestor node of the current branch and the branch to be merged, and recording the commit id of each node;

step 3, acquiring the reference code block of the ancestor node, the comparison code block of the predecessor node on the other branch merged into the current branch, the source code block of the node preceding the predecessor of the current node, and the merged code block of the conflict node;

step 4, comparing the extracted code blocks using a code matching technique, obtaining hash values of the successfully matched code blocks through a hash algorithm to serve as the record dimension of a conflict scheme, and storing the current repository name, the commit id of each node, the hash value of each code block, and the update time in a database;

step 5, using a database index fast-matching technique to match the current conflict against the stored conflict schemes; if the matching succeeds, returning the matched conflict scheme to the developer and updating the stored time; if the matching fails, the developer resolves the merge conflict manually, and the current merge conflict scheme is then recorded in the database.

2. The repetitive conflict scheme detection method based on diff code block matching as claimed in claim 1, wherein: in step 1, when merge nodes are judged, the commit ids of the nodes holding the diff code blocks, the source file, the reference file, and the merged file are saved via the commit tree, the conflicting file can be located from the saved commit ids, and the repository name is recorded.

3. The repetitive conflict scheme detection method based on diff code block matching as claimed in claim 1, wherein: in step 2, using a graph traversal algorithm and node merging analysis, the current branch name of the conflict node, the names of the other branches merged into the current branch, the predecessor nodes of the other branches merged into the current branch, the predecessor of the conflict node's predecessor, and the ancestor node of the current branch and the branch to be merged are recorded, and the commit id of each node is recorded.

4. The repetitive conflict scheme detection method based on diff code block matching as claimed in claim 1, wherein: in step 3, after step 2 has determined whether the current node is a conflict node, the node set from step 1 is traversed to find all merge nodes containing conflict history; then, to locate the specifics of each historical merge conflict node, the commit ids of the nodes holding the comparison diff code block, the source diff code block, the reference diff code block, and the merged file are saved via the commit tree, and the conflicting diff code blocks can be located from the saved commit ids.

5. The repetitive conflict scheme detection method based on diff code block matching as claimed in claim 1, wherein: in step 4, the code in the comparison diff code block, the source diff code block, and the reference diff code block is matched to obtain a list of code blocks that could not be matched successfully, and a hash algorithm is applied to each code block in the list to obtain its hash value; the repository name and the commit ids of the source code block, the comparison code block, the reference code block, and the merged code block are obtained from step 1 and step 3, the hash values of the source code block, the comparison code block, the merged code block, and the reference code block are obtained, and these are finally stored as a merge scheme in the database.

6. The repetitive conflict scheme detection method based on diff code block matching as claimed in claim 1, wherein: in step 5, the corresponding conflict blocks are extracted according to the conflict encountered, the code blocks are hashed with a hash algorithm to obtain hash values, and the data in the database is matched through the database indexes; if the matching succeeds, the historical merge conflict scheme is recommended to the developer and the merge scheme time in the database is updated; if the matching fails, the developer resolves the conflict manually and the merge conflict scheme is stored in the database.