Detailed Description
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture applied in the embodiment of the present application includes a server 101 and a client 102, where the client 102 is a device used for a user to interact with the server 101, and the server 101 is used for managing files and performing operations such as comparison and merging on the files. The file merging method provided below in the present application can be applied to code files of various programming languages, and can also be applied to files having a fixed syntax structure such as xml.
It should be noted that, in the embodiments of the present application, "a" or "an" means at least one, and "a plurality of" means two or more.
Based on the system architecture shown in fig. 1, the file merging method provided in the embodiment of the present application will be described in detail below. As shown in fig. 2, a flow of the file merging method provided in the embodiment of the present application is as follows. The execution subject of the file merging method provided by the embodiment of the present application may be considered as the server 101, and may also be considered as an integrated circuit or a chip integrated in the server 101.
Step 201, acquiring a first file and a second file; the first file and the second file are files to be merged.
Step 202, determining the content blocks contained in the first file according to the semantics of the content in the first file, and determining the content blocks contained in the second file according to the semantics of the content in the second file.
Specifically, one file contains a plurality of character strings, and the content of the file can be divided into a plurality of content blocks according to the semantics of the character strings. The way in which content blocks are determined according to semantics is different for different file types. Optionally, in this embodiment of the present application, content block division models corresponding to different file types may be stored in advance, where the content block division models include how to divide content blocks according to semantics. Specifically, file types of a first file and a second file are respectively determined, a first content block division model corresponding to the file type of the first file and a second content block division model corresponding to the file type of the second file are obtained according to a corresponding relation between a pre-stored file type and the content block division models, semantic recognition is carried out on the content in the first file according to the obtained first content block division model, and content blocks contained in the first file are determined according to the result of the semantic recognition; and performing semantic recognition on the content in the second file according to the acquired second content block division model, and determining content blocks contained in the second file according to the result of the semantic recognition. Optionally, the content block division model may further include a manner of determining an identifier of a content block, where the identifier of the content block included in the first file is determined according to the first content block division model, and the identifier of the content block included in the second file is determined according to the second content block division model.
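The correspondence between file types and content block division models described above can be sketched as a simple lookup table. This is only an illustrative sketch: the names `SUFFIX_TO_TYPE`, `MODELS`, `file_type_of`, and `divide` are hypothetical, the file type is assumed to be derived from the file name suffix, and the placeholder model splits on blank lines instead of performing real semantic recognition.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# A content block is modelled here as an (identifier, text) pair.
Block = Tuple[str, str]
DivisionModel = Callable[[str], List[Block]]


def split_on_blank_lines(text: str) -> List[Block]:
    """Placeholder model: every blank-line-separated chunk is one block,
    identified by its first line. A real model would parse the syntax."""
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    return [(c.splitlines()[0], c) for c in chunks]


# Hypothetical pre-stored correspondence: suffix -> file type -> model.
SUFFIX_TO_TYPE: Dict[str, str] = {".c": "c", ".h": "c", ".cpp": "c", ".xml": "xml"}
MODELS: Dict[str, DivisionModel] = {"c": split_on_blank_lines, "xml": split_on_blank_lines}


def file_type_of(filename: str) -> str:
    """Determine the file type from the file name suffix (assumption)."""
    return SUFFIX_TO_TYPE[Path(filename).suffix.lower()]


def divide(filename: str, text: str) -> List[Block]:
    """Look up the division model for the file type and apply it."""
    model = MODELS[file_type_of(filename)]
    return model(text)
```

Keeping the models in a table keyed by file type means that supporting a new file type only requires registering one more division function.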
For example, for a C language code file, all statements defining and declaring a function are divided into one content block based on the syntax of the C language. The function name is determined as an identification of the content block. The following is an example of the identification of content blocks and content blocks in several C language code files.
Example 1: function
Taking the above entire function as one content block, the function name is foo, and all the contents of the above entire content block starting from "void" to the end of the brace "}" can be identified by the function name foo.
Example 2: structure
Taking the above entire structure as one content block, the name of the structure is Point_s, and all the contents of the content block, from "typedef" to "Point_s", can be identified by the structure name Point_s.
Example 3: macro definition
#define max(a,b)(((a)>(b))?(a):(b))
Taking the above entire macro definition as one content block, the name of the macro definition is max, and the entire content "#define max(a,b)(((a)>(b))?(a):(b))" can be identified by the macro name max.
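The three examples above can be illustrated with a small divider for C source text. This is a rough sketch under stated assumptions, not the division model of the application: it uses regular expressions and naive brace counting instead of a real C parser, ignores braces inside strings and comments, and ignores multi-line macros. The function `divide_c` and the sample bodies of `foo` and `Point_s` are hypothetical.

```python
import re
from typing import List, Tuple

Block = Tuple[str, str, str]  # (category, identifier, text)

MACRO = re.compile(r"^[ \t]*#[ \t]*define[ \t]+(\w+).*$", re.M)
STRUCT = re.compile(r"^[ \t]*typedef\s+struct\b[^{;]*\{", re.M)
FUNC = re.compile(r"^[ \t]*[A-Za-z_][\w \t*]*?\b(\w+)[ \t]*\([^;{}]*\)\s*\{", re.M)
TYPEDEF_NAME = re.compile(r"\s*(\w+)\s*;")


def closing_brace(text: str, open_pos: int) -> int:
    """Index of the '}' matching the '{' at open_pos (naive brace counting)."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


def divide_c(text: str) -> List[Block]:
    """Split C source into macro, struct, and function content blocks."""
    blocks: List[Block] = []
    pos = 0
    patterns = (("macro", MACRO), ("struct", STRUCT), ("function", FUNC))
    while True:
        found = [(kind, m) for kind, rx in patterns
                 for m in [rx.search(text, pos)] if m]
        if not found:
            return blocks
        # Handle whichever construct starts first after the current position.
        kind, m = min(found, key=lambda km: km[1].start())
        if kind == "macro":
            blocks.append((kind, m.group(1), m.group(0).strip()))
            pos = m.end()
            continue
        end = closing_brace(text, m.end() - 1)
        if kind == "function":
            # The function name identifies everything up to the closing brace.
            blocks.append((kind, m.group(1), text[m.start():end + 1].strip()))
            pos = end + 1
        else:
            # The typedef name after the closing brace identifies the struct.
            tail = TYPEDEF_NAME.match(text, end + 1)
            if tail:
                blocks.append((kind, tail.group(1), text[m.start():tail.end()].strip()))
            pos = tail.end() if tail else end + 1


SAMPLE = """#define max(a,b)(((a)>(b))?(a):(b))

typedef struct {
    int x;
    int y;
} Point_s;

void foo(int n)
{
    if (n > 0) { n--; }
}
"""
```

Calling `divide_c(SAMPLE)` yields one block per construct, each carrying its category, its identifier, and its full text.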
Step 203, comparing the first file and the second file according to the determined content blocks, and merging according to the comparison result. The first file and the second file are compared by taking the content block as a comparison unit, but not by taking the character string as a comparison unit. Specifically, content blocks with different identifications in the first file and the second file are merged. The content block identifier may be determined according to the preset content block dividing manner, or may be determined in other manners.
For the content blocks with the same identification in the first file and the second file, the content blocks with the same identification in the first file and the second file can be compared based on the character strings, and automatic combination is performed according to the comparison result of the character strings, or the content blocks with the same identification in the first file and the second file can be prompted to a user for manual combination. The policy of automatic merging is not limited in the present application, and the same content in the first file and the second file may be directly placed in the merged file, and different content is selectively merged according to a certain rule. For example, compared with the second file, the first file has m rows of character strings more than the second file, and the rest rows of character strings have the same content, and if automatic merging is performed, the character strings of the same part in the two files are directly placed into the merged file, and the m rows of character strings can be selected to be merged into the merged file.
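Since the application does not limit the automatic merging policy, the following is only one possible rule, sketched with Python's standard `difflib` module: identical lines are copied, lines present in only one of the two content blocks are kept, and a region where both blocks have different lines is treated as a conflict that is left for manual merging. The function name `merge_lines` is hypothetical.

```python
import difflib
from typing import List, Optional


def merge_lines(first: List[str], second: List[str]) -> Optional[List[str]]:
    """Merge two versions of a content block line by line.

    Returns the merged lines, or None when the two versions contain
    conflicting lines and should be handed to the user instead.
    """
    merged: List[str] = []
    matcher = difflib.SequenceMatcher(a=first, b=second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(first[i1:i2])      # same content in both files
        elif tag == "delete":
            merged.extend(first[i1:i2])      # extra lines only in the first file
        elif tag == "insert":
            merged.extend(second[j1:j2])     # extra lines only in the second file
        else:                                # "replace": both sides differ here
            return None
    return merged
```

With a first block that has two more lines than the second and is otherwise identical, the merged result equals the first block; with two blocks that change the same line differently, the function returns `None`.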
In one implementation, one of the first file and the second file is selected as the basic comparison object, and, according to the identifiers of the content blocks, the content blocks in the other file are compared in turn with the content blocks in the basic comparison object to determine the merging manner. It is assumed that the basic comparison object is the file whose input time is earlier, and the other file is the file whose input time is closer to the current time. For example, with the first file as the basic comparison object, the content blocks in the second file are compared one by one with the content blocks in the first file. If the identifier of a content block in the second file is the same as the identifier of a content block in the first file, the two content blocks are compared based on character strings as described above, and are either merged automatically according to the comparison result or presented to the user for manual merging. If the identifier of a content block in the second file differs from the identifier of every content block in the first file, it is necessary to confirm whether that content block is a renamed version of a content block in the first file; if it is not, the content block is determined to be a newly added content block. If a content block in the first file does not exist in the second file, that content block is a deleted content block. Newly added content blocks and deleted content blocks are merged at zero risk.
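The identifier-based comparison against a basic comparison object can be sketched as follows. The sketch assumes each file has already been divided into a mapping from identifier to content, and it omits the rename check mentioned above; the names `Classification` and `classify` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Classification(NamedTuple):
    same_id: List[str]   # compare by character strings, or merge manually
    added: List[str]     # only in the other file: newly added blocks
    deleted: List[str]   # only in the basic comparison object: deleted blocks


def classify(base: Dict[str, str], other: Dict[str, str]) -> Classification:
    """Compare the other file's blocks with the basic comparison object."""
    same_id = [ident for ident in other if ident in base]
    added = [ident for ident in other if ident not in base]
    deleted = [ident for ident in base if ident not in other]
    return Classification(same_id, added, deleted)
```

For example, with `base = {"foo": "...", "bar": "..."}` and `other = {"foo": "...", "baz": "..."}`, `foo` needs a string-level comparison, `baz` is newly added, and `bar` is deleted.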
If the first file and the second file are both modifications of a baseline file, and the first file deletes a function relative to the baseline file while the second file retains it, then a direct comparison of the content blocks makes the second file appear to have added a function relative to the first file. In this case, the comparison is performed in combination with the baseline file, and it is determined that the first file deleted the function. Automatic merging can then be performed by either deleting or keeping the function, or the user can be prompted to merge manually according to their intention.
In another implementation, a comparison and merging strategy can be formulated according to the category of the content block. For example, the categories of content blocks in a C language code file include function, data structure, and enumeration type. For functions, functions with different names are merged directly, and functions with the same name are either merged automatically based on character string comparison or presented to the user for manual merging. For data structures and enumeration types, the members added by the first file and the members added by the second file are added to the merged file in sequence. For example, a member may be an integer or a character string in the C language.
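The member-wise strategy for data structures and enumeration types might look like the sketch below. It assumes that the members of the content block are available as ordered lists and that a common baseline list exists against which "added" members are determined; deletions are deliberately not handled. The function name `merge_members` is hypothetical.

```python
from typing import List


def merge_members(baseline: List[str], first: List[str], second: List[str]) -> List[str]:
    """Append members added by the first file, then by the second file."""
    merged = list(baseline)
    for member in list(first) + list(second):
        if member not in merged:   # keep each member once, in order of appearance
            merged.append(member)
    return merged
```

For an enumeration whose baseline members are `RED, GREEN`, where the first file adds `BLUE` and the second file adds `ALPHA`, the merged member list is `RED, GREEN, BLUE, ALPHA`.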
After the merging according to the comparison result in step 203, a risk assessment, that is, an accuracy assessment, may further be performed on the merged result, and the result of the assessment may be prompted to the user. Specifically, during merging, merging some content in the files carries no risk, while merging other content carries risk. For example, the risk of merging content blocks whose identifiers differ between the two files is zero or close to zero, whereas automatically merging content blocks with the same identifier according to a character string comparison carries risk. All content blocks are compared and merged, the risk value of each merge is recorded, and the risk values are prompted to the user. After the file merging is finished, at least one of the following is provided to the user: the number of content blocks whose merging carries risk, the identifiers of those content blocks, and the content of those content blocks. For example, if 100 content blocks are merged, of which 93 are merged at zero risk and 7 are merged with some risk, then after the file merging is finished the user is provided with at least one of: the number 7 of content blocks whose merging carries risk, the identifiers of the 7 content blocks, and the content of the 7 content blocks.
For content blocks whose merging carries a certain risk, automatic merging is not performed, and the user is prompted with the content blocks that cannot be merged automatically so that the user can merge them manually. In this way, the merging speed is improved while the merging accuracy is guaranteed.
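Recording a risk value per merged content block and summarizing it for the user can be sketched as follows. The application does not define a risk scale, so the numeric `risk` field (0.0 meaning zero risk) and the names `MergeRecord` and `risk_report` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class MergeRecord:
    identifier: str
    content: str
    risk: float  # hypothetical scale: 0.0 means a zero-risk merge


def risk_report(records: List[MergeRecord]) -> Dict[str, Union[int, List[str]]]:
    """Summarize which content blocks were merged with some risk."""
    risky = [r for r in records if r.risk > 0.0]
    return {
        "total": len(records),
        "risky_count": len(risky),
        "risky_identifiers": [r.identifier for r in risky],
    }
```

For 100 merged content blocks of which 7 carry a nonzero risk, the report contains a `risky_count` of 7 together with the identifiers of those 7 blocks.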
Of course, if the first file and the second file are both modifications of a baseline file, the content blocks contained in the first file, the second file, and the baseline file may all be compared, and merging is performed according to the comparison result. For example, suppose the first file deletes a function relative to the baseline file and the second file retains it. If the first file and the second file are compared directly, the function appears to have been added in the second file relative to the first file, so automatic merging is relatively risky and its accuracy may deviate. By comparing the first file and the second file together with the baseline file, it is determined that the first file deleted the function and the second file retained it. A targeted merging policy is then applied to this result: the content may be merged automatically, or the user may be prompted to handle it manually according to the risk value.
The file merging method provided in the embodiment of the present application will be further described in detail with reference to a specific application scenario. Based on the system architecture shown in fig. 1, file A and file B are assumed to be modified versions of the baseline file. As shown in fig. 3, the method for merging file A and file B is described as follows. The execution subject of the method may be the server 101 in fig. 1.
Step 301, receiving a file A, a file B and a baseline file X input by a user.
Step 302, obtaining the file types of file A, file B and baseline file X. Generally, the files being compared are of the same file type. The file type is first identified by the file name suffix: for example, a file with c, cpp, or h as the suffix is a C language code file, a file with a Java suffix is a Java language code file, and a file with a Python suffix is a Python language code file. The user may also specify in other ways that the file is a C language code file, a Java language code file, or a Python language code file.
Step 303, determining the content block division model corresponding to the acquired file type, and dividing file A, file B and baseline file X into content blocks according to the content block division model. Each file type corresponds to one content block division model, and the content block division model includes a manner of dividing content blocks according to semantics and a manner of determining content block identifiers. For example, for a C language code file, a function may be divided into one content block and a data structure may be divided into one content block.
Step 304, determining whether file A and file B contain content blocks with the same identifier that have not yet been compared. If so, steps 305 to 312 are executed; otherwise, steps 306 to 327 are executed.
Step 305, comparing whether the contents of the content blocks with the same identifier in file A and file B are identical. If so, step 307 is executed; otherwise, step 308 is executed.
Assume that the content blocks with the same identifier in file A and file B are both identified as Y.
Step 307, taking the identical content of the content block identified as Y in file A and file B as the merging result, and returning to step 304.
Step 308, determine whether the baseline file X also contains a content block denoted as Y, if yes, execute step 309, otherwise execute step 310.
Step 309, comparing file A with baseline file X and file B with baseline file X, and determining whether file A and file B have each modified the content block identified as Y in baseline file X. If only one of file A and file B has modified the content block identified as Y in baseline file X, step 311 is executed; if both have modified it, step 310 is executed.
Step 311, merging the modified content block into the merged file as the merged content.
Step 310, comparing the content blocks identified as Y in file A and file B by character strings, and merging them automatically according to the comparison result, or prompting the user to merge them manually.
Step 312, performing risk prediction on the merging of the content blocks identified as Y in file A and file B, storing the merging result, and returning to step 304.
If the determination result in step 304 is that file A and file B contain no uncompared content blocks with the same identifier, steps 306 to 326 are executed, and step 327 is executed last.
Step 306, determining whether there is a content block that is contained in file A but not in file B. If so, step 313 is executed; otherwise, step 314 is executed.
Step 313, assuming that the identifier of the content block is P, determining whether baseline file X contains a content block identified as P. If so, step 315 is executed; otherwise, file A has added the content block identified as P relative to baseline file X, and step 316 is executed.
Step 315, determining whether the content of the content block identified as P is the same in file A and baseline file X. If so, file B has deleted the content block identified as P in baseline file X, and step 317 is executed; otherwise, file A has modified the content block identified as P in baseline file X while file B has deleted it, and step 318 is executed.
Step 316, taking the addition of the content block identified as P as the merging result of the content blocks of file A and file B, and executing step 319.
Step 317, taking the deletion of the content block identified as P as the merging result of the content blocks of file A and file B, and executing step 319.
Step 318, prompting the user to perform manual combination, and executing step 319.
In the case of such a conflict, the automatic merging does not truly reflect the intention of the user, so the user is prompted to perform manual merging.
Step 319, performing risk prediction on the merging of the content blocks identified as P in file A and file B, storing the merging result, and returning to step 306.
Step 314, determining whether there is a content block that is contained in file B but not in file A. If so, step 320 is executed; otherwise, step 327 is executed.
Step 320, assuming that the identifier of the content block is Q, determining whether baseline file X contains a content block identified as Q. If so, step 321 is executed; otherwise, file B has added the content block identified as Q relative to baseline file X, and step 323 is executed.
Step 321, determining whether the content of the content block identified as Q is the same in file B and baseline file X. If so, file A has deleted the content block identified as Q in baseline file X, and step 324 is executed; otherwise, file B has modified the content block identified as Q in baseline file X while file A has deleted it, and step 325 is executed.
Step 323, taking the addition of the content block identified as Q as the merging result of the content blocks of file A and file B, and executing step 326.
Step 324, taking the deletion of the content block identified as Q as the merging result of the content blocks of file A and file B, and executing step 326.
Step 325, prompting the user to merge manually, and executing step 326. In the case of such a conflict, automatic merging cannot truly reflect the intention of the user, so the user is prompted to merge manually.
Step 326, performing risk prediction on the merging of the content blocks identified as Q in file A and file B, storing the merging result, and returning to step 314.
Step 327, saving the merging result of file A and file B, and outputting the risk value of the merged file. The risk value may be, for example, the number, content, or identifiers of the content blocks in file A and file B whose merging carries risk.
It should be noted that, whether content blocks included in different files are the same or not may be determined according to the identifier of the content block, or may be determined according to other manners, which is not limited in this application.
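The decision flow of steps 304 to 327 can be condensed into the following sketch. It assumes that file A, file B and baseline file X have already been divided into mappings from identifier to content, and it only records which decision applies to each content block; the string-level comparison of step 310 and the risk prediction steps are not implemented. The function name `three_way_merge` and the action labels are hypothetical.

```python
from typing import Dict, Tuple

Blocks = Dict[str, str]  # identifier -> content of the content block


def three_way_merge(a: Blocks, b: Blocks, x: Blocks) -> Tuple[Blocks, Dict[str, str]]:
    """Return the automatically merged blocks and the decision per identifier."""
    merged: Blocks = {}
    actions: Dict[str, str] = {}

    # Steps 304 to 312: identifiers present in both file A and file B.
    for y in a:
        if y not in b:
            continue
        if a[y] == b[y]:                      # step 307: identical content
            merged[y], actions[y] = a[y], "identical"
        elif y in x and a[y] == x[y]:         # step 311: only B modified the block
            merged[y], actions[y] = b[y], "take_b"
        elif y in x and b[y] == x[y]:         # step 311: only A modified the block
            merged[y], actions[y] = a[y], "take_a"
        else:                                 # step 310: both modified, or no baseline block
            actions[y] = "string_compare_or_manual"

    # Steps 306 to 326: identifiers present in only one of the two files.
    for side, other, label in ((a, b, "a"), (b, a, "b")):
        for p in side:
            if p in other:
                continue
            if p not in x:                    # added relative to the baseline
                merged[p], actions[p] = side[p], f"added_by_{label}"
            elif side[p] == x[p]:             # unchanged here, deleted by the other file
                actions[p] = "deleted"
            else:                             # modified here, deleted by the other file
                actions[p] = "manual"
    return merged, actions
```

Blocks labelled `string_compare_or_manual` or `manual` are intentionally left out of the merged result, which mirrors the idea that conflicting content is handed to the user or to a separate string-level merge.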
Based on the above embodiment, as shown in fig. 4, the server 101 in the embodiment of the present application may be divided into several functional modules, including an input/output module 401, an information model module 402, a comparison module 403, a merging module 404, and a scoring module 405. Wherein:
The input/output module 401 is configured to receive files input by a user, present the comparison and merging results to the user, and prompt the user with information such as requests for manual merging and the risk value of the merged file.
The information model module 402 is configured to store file types and the corresponding content block division models. Specifically, content block division models corresponding to different file types may be stored. For example, for a C language code file, the module includes a manner of decomposing a C language code file into a plurality of content blocks and a comparison and merging policy for each content block. By storing the content block division models corresponding to various file types in the module, semantics-based comparison of different file types can be realized.
The comparison module 403 is configured to compare the input files based on the content blocks to obtain a comparison result.
The merging module 404 is configured to perform merging according to the comparison result of the comparison module 403, or according to the operation of the user, so as to generate a new merged file.
The scoring module 405 is configured to perform risk prediction on the merged result and to quantify the risk of the merged result as a numerical value.
In summary, in the embodiment of the present application, dividing the files to be compared into content blocks allows the files to be merged based on semantics, which helps improve the efficiency and accuracy of file merging and helps realize automatic merging. After the method is integrated into a code change management system, the efficiency and accuracy of automatic file merging can be improved, and a quantitative risk assessment can be provided for the automatically merged files.
Based on the same inventive concept as the file merging method shown in fig. 2, as shown in fig. 5, the embodiment of the present application further provides a file merging apparatus 500, which can be used for executing the method shown in fig. 2. The file merging apparatus 500 includes a processor 501 and a memory 502. The processor 501 is configured to execute code in the memory 502, and when the code is executed, the processor 501 performs the file merging method shown in fig. 2.
The processor 501 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP.
The processor 501 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Memory 502 may include volatile memory (volatile memory), such as random-access memory (RAM); the memory 502 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a Hard Disk Drive (HDD) or a solid-state drive (SSD); the memory 502 may also comprise a combination of memories of the kind described above.
Based on the same inventive concept as the file merging method shown in fig. 2, as shown in fig. 6, the embodiment of the present application further provides a file merging apparatus 600, which can be used for executing the method shown in fig. 2. The file merging apparatus 600 includes an acquisition unit 601 and a processing unit 602. Wherein:
an acquisition unit 601 configured to acquire a first file and a second file;
a processing unit 602, configured to determine content blocks included in a first file according to semantics of content in the first file, and determine content blocks included in a second file according to semantics of content in the second file;
the processing unit 602 is further configured to compare content blocks included in the first file with content blocks included in the second file, and merge the first file and the second file according to a comparison result.
Optionally, when determining the content blocks included in the first file and the content blocks included in the second file, the processing unit 602 is specifically configured to: respectively determining the file types of the first file and the second file; acquiring a first content block division model corresponding to the file type of a first file and a second content block division model corresponding to the file type of a second file according to the corresponding relation between the pre-stored file type and the content block division model; performing semantic recognition on the content in the first file according to the acquired first content block division model, and determining content blocks contained in the first file according to the result of the semantic recognition; and performing semantic recognition on the content in the second file according to the acquired second content block division model, and determining the content blocks contained in the second file according to the result of the semantic recognition.
Optionally, when the processing unit 602 compares the content blocks included in the first file with the content blocks included in the second file, and merges the first file and the second file according to the comparison result, the processing unit is specifically configured to:
and combining the content blocks with different identifications in the first file and the second file according to the identifications of the content blocks contained in the first file and the second file respectively.
Optionally, when the processing unit 602 compares the content blocks included in the first file with the content blocks included in the second file, and merges the first file and the second file according to the comparison result, the processing unit is specifically configured to:
and comparing the content blocks with the same identifier in the first file and the second file based on the character strings according to the identifiers of the content blocks contained in the first file and the second file respectively, and merging according to the comparison result of the character strings.
Optionally, the processing unit 602 is further configured to:
and after the first file and the second file are merged according to the comparison result, carrying out accuracy evaluation on the merged result, and prompting the result of the accuracy evaluation to a user.
Optionally, when the processing unit 602 compares the content blocks included in the first file with the content blocks included in the second file, and merges the first file and the second file according to the comparison result, the processing unit is specifically configured to:
if the first file and the second file are both modified files based on the baseline file, comparing the content blocks contained in the first file with the content blocks contained in the second file according to the content blocks contained in the baseline file, and merging the first file and the second file according to the comparison result.
An embodiment of the present application provides a computer storage medium, which stores a computer program, where the computer program includes a program for executing the file merging method shown in fig. 2.
Embodiments of the present application provide a computer program product containing instructions that, when executed on a computer, cause the computer to perform the file merging method shown in fig. 2.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.