CN103729342B - File comparison method and device - Google Patents

File comparison method and device

Info

Publication number
CN103729342B
Authority
CN
China
Prior art keywords
file
data set
compared
comparison
swap data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210385557.2A
Other languages
Chinese (zh)
Other versions
CN103729342A (en)
Inventor
尹祥龙
万鑫明
吴金坛
吕苏
马军
杨惠娟
高伟东
周涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN201210385557.2A
Publication of CN103729342A
Application granted
Publication of CN103729342B
Active legal status
Anticipated expiration

Abstract

The present invention proposes a file comparison method, including: comparing the sizes of the files to be compared and, if the sizes are inconsistent, determining that the files to be compared are different; otherwise, encoding each file to be compared with a hash function to obtain a corresponding transformed data set; comparing the transformed data sets; and, if the transformed data sets are identical, comparing the files to be compared. By applying the invention, files with large information content can be compared quickly.

Description

File comparison method and device
Technical field
The present invention relates generally to the field of computer information processing, and in particular to techniques for the fast processing of files with large information content.
Background
A key feature of the information age is the rapid growth in the volume of information. In the financial field, for example, as the industry develops rapidly, both the number and the size of financial transaction files grow quickly. Financial transaction log files carry a huge amount of information, with line counts exceeding 100,000. They also have the following characteristics: 1) the line is the basic unit; 2) there is no constraint relationship between lines, which are inserted into the file in random order. In parallel testing of a financial transaction system, all differences between two batches of approximately matching log files must be identified in order to confirm the system's test results. With log file sets that easily exceed 30 GB, how to compare these files quickly and at low cost has become a difficult problem in current testing work.
The existing method for comparing financial transaction log files is to decompress the two batches of files to be approximately compared, sort them, and then compare them with a visual comparison tool (such as Beyond Compare or the diff command); using Beyond Compare additionally requires copying the massive log files to a Windows platform. To improve speed, space is traded for time: a multi-server cluster is used so that the parallel computation of several physical machines overcomes the computing bottleneck of a single machine. The existing method is simple and fast for files under 1 GB, but very slow for massive log files, with a total time of more than 7 hours. It also has the following drawbacks. Economically, a server cluster shortens the approximate comparison time, but its operation and maintenance costs are high and it consumes a large amount of energy. Technically, it requires an unnecessary file sorting step that consumes system memory, CPU, and I/O resources, involves too many steps, and needs considerable manual intervention. There is therefore currently no practicable technique that solves the key-field-based approximate comparison of massive files required by the financial industry, which makes research into fast approximate comparison algorithms and strategies particularly important.
Summary of the invention
To solve at least one aspect of the above problem, the present invention proposes a file comparison method, including: for the files to be compared, encoding each file with a hash function to obtain a corresponding transformed data set; and comparing the transformed data sets.
In the above file comparison method, the step of encoding with a hash function includes taking the data in each line of the file to be compared modulo a predetermined value to obtain remainder result data, and recording the number and positions of collisions in the transformed data set.
In the above file comparison method, the files to be compared include file one and file two, and the method further includes: scanning the transformed data set of file one and, if it contains remainder result data identical to remainder result data in the transformed data set of file two, decrementing the collision count in both file one and file two by one, and otherwise continuing to scan the transformed data set of file one until it has been scanned completely; if the collision count of the transformed data set of file one is finally not 0, file one is not a subset of file two, and the line numbers whose collision count in the transformed data set of file one is non-zero record the difference lines found when comparing file one against file two; outputting the difference lines of file one compared against file two; and scanning the transformed data set of file two and, if it contains remainder result data identical to remainder result data in the transformed data set of file one, decrementing the collision count in both file two and file one by one, and otherwise continuing to scan the transformed data set of file two until it has been scanned completely; if the collision count of the transformed data set of file two is finally not 0, file two is not a subset of file one, and the line numbers whose collision count in the transformed data set of file two is non-zero record the difference lines found when comparing file two against file one; and outputting the difference lines of file two compared against file one.
In the above file comparison method, the step of encoding with a hash function includes building a hash for every line of data in the file to be compared so that the hash values are uniformly distributed.
The present invention also proposes a file comparison method, including: comparing the sizes of the files to be compared and, if the sizes are inconsistent, immediately determining that the files to be compared are different; otherwise, encoding each file to be compared with a hash function to obtain a corresponding transformed data set; comparing the transformed data sets; and, if the transformed data sets are identical, comparing the files to be compared, and otherwise ending the current round of comparison.
In the above file comparison method, the step of encoding with a hash function includes taking the data in each line of the file to be compared modulo a predetermined value to obtain remainder result data, and recording the number and positions of collisions in the transformed data set.
In the above file comparison method, the files to be compared include file one and file two, and the method further includes: scanning the transformed data set of file one and, if it contains remainder result data identical to remainder result data in the transformed data set of file two, decrementing the collision count in both file one and file two by one, and otherwise continuing to scan the transformed data set of file one until it has been scanned completely; if the collision count of the transformed data set of file one is finally not 0, file one is not a subset of file two, and the line numbers whose collision count in the transformed data set of file one is non-zero record the difference lines found when comparing file one against file two; outputting the difference lines of file one compared against file two; and scanning the transformed data set of file two and, if it contains remainder result data identical to remainder result data in the transformed data set of file one, decrementing the collision count in both file two and file one by one, and otherwise continuing to scan the transformed data set of file two until it has been scanned completely; if the collision count of the transformed data set of file two is finally not 0, file two is not a subset of file one, and the line numbers whose collision count in the transformed data set of file two is non-zero record the difference lines found when comparing file two against file one; and outputting the difference lines of file two compared against file one.
In the above file comparison method, the step of encoding with a hash function includes building a hash for every line of data in the file to be compared so that the hash values are uniformly distributed.
The present invention also proposes a file comparison device, including: a file size comparison unit for comparing the sizes of the files to be compared and, if the sizes are inconsistent, determining that the files to be compared are different; a hash function encoding unit for encoding files to be compared that have consistent sizes with a hash function to obtain corresponding transformed data sets; a transformed data set comparison unit for comparing the transformed data sets; and a file comparison unit for comparing the files to be compared when the transformed data sets are identical.
In the above file comparison device, the hash function encoding unit includes: a remainder unit for taking the data in each line of the file to be compared modulo a predetermined value to obtain remainder result data; and a recording unit for recording the number and positions of collisions in the transformed data set.
By applying the present invention, files with large information content can be compared quickly.
Brief description of the drawings
For ease of understanding, embodiments of the invention are described below by way of non-limiting example with reference to the accompanying drawings, in which:
Fig. 1 shows the main flow of a file comparison;
Fig. 2 is a functional block diagram of a file comparison scheme;
Fig. 3 shows the flow of a file comparison;
Fig. 4 shows a fast file comparison.
Detailed description of the invention
Unless otherwise stated, and as will also be appreciated from the discussion below, throughout this specification discussions using terms such as "processing", "computing", or "determining" refer to actions or processes of a particular device such as a computer or similar electronic computing device. In the context of this specification, a computer or similar electronic computing device can manipulate or transform signals. These signals are typically represented as physical electronic or magnetic quantities within the memories, registers, or other information storage devices, transmission devices, or display devices of the computer or similar electronic computing device. For example, an electronic computing device may include one or more processors that perform one or more specific functions.
According to one aspect of the invention, a hash function is used to compare files. The files here are financial transaction log files, but they may also be files of other types. Each file to be compared is encoded with a hash function to obtain a corresponding transformed data set, and the transformed data sets are then compared. Hashing converts an input of arbitrary length, by means of a hash algorithm, into an output of fixed length. If a record whose key equals K exists in the structure, it must be at storage location f(K). The record being looked up can therefore be obtained directly, without comparisons. This correspondence f is the hash function.
Assume there are two files to be compared, file A and file B. According to the invention, A and B are each encoded with a hash function to obtain transformed data sets A1 and B1. The relation between A and B is then derived approximately by comparing A1 and B1.
According to one aspect of the invention, the step of encoding with a hash function includes taking the data in each line of the file to be compared modulo a predetermined value, and recording the number and positions of collisions in the transformed data set. A collision here, as those skilled in the art will understand, is the situation in which different keys map to the same hash address.
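The encoding step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the name `encode`, the default modulus, and the choice to interpret a line's UTF-8 bytes as an integer before taking the remainder are all hypothetical. The text only states that line data is taken modulo a predetermined value and that the number and positions of collisions are recorded.

```python
def encode(lines, modulus=1024):
    """Build a transformed data set: remainder -> collision count and positions.

    Assumption: a line is turned into a number by reading its UTF-8 bytes
    as a big-endian integer; the source does not say how this is done.
    """
    table = {}
    for line_no, line in enumerate(lines, start=1):
        remainder = int.from_bytes(line.encode("utf-8"), "big") % modulus
        entry = table.setdefault(remainder, {"count": 0, "positions": []})
        entry["count"] += 1                 # how many lines share this remainder
        entry["positions"].append(line_no)  # where those lines are in the file
    return table
```

Each entry keeps both the count and the line numbers, so a later scan can decrement the count and still report which lines remain unmatched.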
Assume the files to be compared include file one and file two. To obtain the relation between them, the transformed data set of file one can be scanned first; if it contains remainder result data identical to remainder result data in the transformed data set of file two, the collision count in both file one and file two is decremented by one.
If no identical remainder result data is found, scanning of the transformed data set of file one continues until it has been scanned completely. If the collision count of the transformed data set of file one is finally not 0, file one is not a subset of file two. The line numbers whose collision count in the transformed data set of file one is non-zero record the difference lines found when comparing file one against file two, and these difference lines can be output as part of the result.
The transformed data set of file two is then scanned in the same way. If it contains remainder result data identical to remainder result data in the transformed data set of file one, the collision count in both file two and file one is decremented by one. If no identical remainder result data is found, scanning of the transformed data set of file two continues until it has been scanned completely. If the collision count of the transformed data set of file two is finally not 0, file two is not a subset of file one. The line numbers whose collision count in the transformed data set of file two is non-zero record the difference lines found when comparing file two against file one, and these difference lines can be output as part of the result.
That is, whenever the transformed data set of file one contains remainder result data identical to that in the transformed data set of file two, the collision count in both files is decremented by one. If the final collision count in the transformed data set of file two is not zero, file two is not a subset of file one; if the final collision count in the transformed data set of file one is not zero, file one is not a subset of file two.
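The two scans described above might look like the following Python sketch. It works on remainder values only, so it is an approximate comparison; the function names and the use of `collections.Counter` for the collision counts are assumptions made for illustration.

```python
from collections import Counter

def one_way_leftover(source_keys, target_keys):
    """Positions in source whose remainder finds no unused match in target."""
    remaining = Counter(target_keys)   # collision counts still available in target
    leftover = []
    for position, key in enumerate(source_keys, start=1):
        if remaining[key] > 0:
            remaining[key] -= 1        # matched: consume one occurrence
        else:
            leftover.append(position)  # unmatched: a difference line
    return leftover

def compare_both_ways(keys_one, keys_two):
    """Return (difference lines of file one, difference lines of file two)."""
    return (one_way_leftover(keys_one, keys_two),
            one_way_leftover(keys_two, keys_one))
```

An empty first list means file one is a subset of file two at the level of remainders, and an empty second list means the reverse.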
The method above is an approximate comparison method. According to another aspect of the invention, a fast comparison method is also proposed, as shown in Fig. 1. File sizes are first checked for consistency; if the sizes are inconsistent, the result is returned directly: the files to be compared are different. Otherwise, if the sizes are consistent, each file is encoded with a hash function to obtain a corresponding transformed data set and the transformed data sets are compared; if they are inconsistent, the files to be compared are likewise different. Only if the transformed data sets are also consistent are the files themselves compared.
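A minimal Python sketch of this three-stage flow is given below. Several choices are assumptions rather than details from the text: `zlib.crc32` stands in for the unspecified hash function, the modulus is arbitrary, and the final stage is taken to be a byte-for-byte comparison via `filecmp.cmp`, which means two files with the same lines in a different order are reported as different at the last stage.

```python
import filecmp
import os
import zlib
from collections import Counter

def transformed(path, modulus=1 << 16):
    """Transformed data set: how many lines fall on each remainder."""
    with open(path, "rb") as handle:
        return Counter(zlib.crc32(line) % modulus for line in handle)

def quick_compare(path_a, path_b):
    # Stage 1: differing sizes settle the question immediately.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return "different (size)"
    # Stage 2: differing transformed data sets also settle it.
    if transformed(path_a) != transformed(path_b):
        return "different (hash)"
    # Stage 3: only now pay for a full content comparison.
    if filecmp.cmp(path_a, path_b, shallow=False):
        return "identical"
    return "different (content)"
```

The cheap checks run first so that the expensive full comparison is reached only by files that already agree in size and in hash distribution.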
According to another aspect of the invention, a file comparison device is also proposed. The device includes a file size comparison unit, a hash function encoding unit, a transformed data set comparison unit, and a file comparison unit. The file size comparison unit compares the sizes of the files to be compared and, if the sizes are inconsistent, determines that the files to be compared are different. The hash function encoding unit encodes files to be compared that have consistent sizes with a hash function to obtain corresponding transformed data sets. The transformed data set comparison unit compares the transformed data sets. The file comparison unit compares the files to be compared when the transformed data sets are identical.
The hash function encoding unit may include a remainder unit and a recording unit. The former takes the data in each line of the file to be compared modulo a predetermined value to obtain remainder result data. The latter records the number and positions of collisions in the transformed data set.
Fig. 2 shows, according to another aspect of the invention, a schematic diagram of file comparison using a device comprising five modules. The first module is the scheduling control module, which provides the scheduling control interface. It supports multi-process scheduling and the real-time start, pause, and termination of file comparison tasks. Using the quick-win strategy, it searches the entries under a directory: if an entry is a file, it first checks whether the names and sizes of the two files match and, if they do, calls the file comparison module to compute a hash value for each line, build an index, and compare; if an entry is not a file but a directory, the module calls itself recursively.
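The directory traversal performed by the scheduling control module can be sketched as follows. The function name `compare_tree`, the report format, and the `compare_lines` callback (standing in for the file comparison module) are hypothetical; multi-process scheduling and task control are left out.

```python
import os

def compare_tree(dir_a, dir_b, compare_lines):
    """Walk dir_a recursively and compare each file with its namesake in dir_b."""
    report = {}
    for name in sorted(os.listdir(dir_a)):
        path_a = os.path.join(dir_a, name)
        path_b = os.path.join(dir_b, name)
        if os.path.isdir(path_a):
            if os.path.isdir(path_b):
                # Not a file but a directory: recurse.
                report.update(compare_tree(path_a, path_b, compare_lines))
            else:
                report[path_a] = "missing"
        elif not os.path.isfile(path_b):
            report[path_a] = "missing"       # no file of the same name
        elif os.path.getsize(path_a) != os.path.getsize(path_b):
            report[path_a] = "size differs"  # quick win: skip hashing
        else:
            # Name and size match: hand over to the line-level comparison.
            report[path_a] = compare_lines(path_a, path_b)
    return report
```

The walk covers only entries present under `dir_a`; a full comparison would also need a pass in the other direction.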
The second module is the file comparison module. It first traverses all lines of file A and builds an index array over the keys of A's line data, using an efficient line compression technique and a low-coupling hashing technique to build an index table of file A in memory. It then traverses each line of file B and checks whether the hash value of each line of B exists in the index table of A; if not, the line is a difference between the two files and its information is output. If file A compared with file B is consistent, file B is then compared with file A in the same way, in the reverse direction.
The third module is the task configuration module. It is used to set the file whitelist, the file blacklist, the keys of file lines to be compared, and the comparison strategy; it determines which files are compared and which are not; configuring the keys of a file is what implements the approximate comparison function; and it determines whether the quick-win comparison strategy is selected.
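How configured keys turn an exact comparison into an approximate one can be illustrated with a short Python sketch. The pipe-delimited line layout, the column indices, and both function names are hypothetical; the text does not describe the file format or how keys are specified.

```python
def key_of(line, columns, sep="|"):
    """Extract the configured key fields of one line as a tuple."""
    fields = line.rstrip("\n").split(sep)
    return tuple(fields[i] for i in columns)

def lines_missing_from(reference, candidate, columns, sep="|"):
    """Lines of candidate whose key does not occur in reference."""
    # A Python set hashes the key tuples, playing the role of the index table.
    index = {key_of(line, columns, sep) for line in reference}
    return [(number, line)
            for number, line in enumerate(candidate, start=1)
            if key_of(line, columns, sep) not in index]
```

Fields outside the configured columns are ignored, so two lines that agree on the key fields are treated as the same record even if other fields differ.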
The fourth module is the result processing module. It presents the results output by the file comparison module to the tester according to personalized settings, improving the readability of the results. Specifically, all differences output by the file comparison module can be split according to the file format specification and recorded in a specific file, such as an Excel file, and the inconsistent parts can be marked in a conspicuous color.
The fifth module is the log output module. It writes the process information of all operations in the above method to a log, so that testers can analyze and look up detailed operation records of the comparison process.
Fig. 2 also shows the main flow of the file comparison method, which includes the following steps:
(1) Configuring the comparison task: according to the requirements of the comparison task, the comparison operator plans the program's input and output directories, and configures the whitelist and blacklist of files to compare, the keys used to match file lines, and similar settings. Configuring a comparison task on the control host does not affect comparison tasks already in progress, which gives a high degree of flexibility.
(2) Executing the comparison task: the operator executes the configured comparison task. The task scheduling module assesses the current system load, allocates a number of processes according to the size and time sensitivity of the task, schedules multiple processes to work concurrently, records the execution status of each process, and can start, pause, or terminate tasks in real time.
(3) Calling the submodules: the scheduling control interface calls the other submodules according to their interface parameters, monitors the progress of the comparison task, confirms its comparison result and merges it with the results from other machines, and feeds the process its next task.
The file comparison module may follow the technical implementation shown in Fig. 3. For example, 1) to compare file A against file B, proceed as follows:
Take file A and encode every line of A with the hash function, hashing it into a two-dimensional array of N rows, where A[i] = {index value i, collision mark cf_i, line number a_i}.
Take each line of file B and hash it in the same way into a two-dimensional array of N rows, where B[j] = {index value j, collision mark cf_j, line number b_j}.
For each B[j], look up index j in array A to obtain A[j] = {index value j, collision mark cf_j, line number a_j}, then compare line b_j of file B with line a_j of file A; this is a line-to-line comparison. When the two lines are fully identical, delete the corresponding index records in A and B, decrement the collision count by 1, and continue with the next comparison until the count reaches 0. Finally, read the end state of file B's index to obtain the inconsistent line numbers and write them to the log file of B.
Check the collision record count in the index table of file B. When cf_j = 0, every line of file B is fully consistent with file A; when cf_j != 0, file B contains lines inconsistent with file A, and the content of the corresponding lines of file B is written to the log file.
Four cases can be considered here:
a) B has no collisions, and A has no collisions.
Assume the hash function is f(x) = A[i] % 4; the comparison then proceeds as follows:
Input data:
A={1,2,3};
B={1};
Index state of files A and B during the comparison: [table not reproduced]
Comparison result:
The content of file B is a subset of file A.
b) B has no collisions, and A has collisions.
Assume the hash function is f(x) = A[i] % 4; the comparison then proceeds as follows:
Input data:
A={1,2,3,5};
B={1};
Index state of files A and B during the comparison: [table not reproduced]
Comparison result:
The content of file B is a subset of file A.
c) B has collisions, and A has no collisions.
As follows from a), in this case the colliding entries of B are simply not found in A and are ultimately left in the log of B.
Assume the hash function is f(x) = A[i] % 4; the comparison then proceeds as follows:
Input data:
A={1,2,3};
B={1,5};
Index state of files A and B during the comparison: [table not reproduced]
Comparison result:
File B contains a line inconsistent with file A: line 2. The second line of file B, i.e. 5, is written to the log file.
Another situation also needs to be considered here:
Assume the hash function is f(x) = A[i] % 4; the comparison then proceeds as follows:
Input data:
A={2,3,9};
B={1,5};
Index state of files A and B during the comparison: [table not reproduced]
Comparison result:
File B contains lines inconsistent with file A: lines 2 and 1. The second and first lines of file B, i.e. 5 and 1, are written to the log file.
d) B has collisions, and A has collisions.
As follows from b), in this case it is mainly the entries of B that are not found in A at the outset that are ultimately left in the log of B.
Assume the hash function is f(x) = A[i] % 4; the comparison then proceeds as follows:
Input data:
A={1,2,3,5};
B={1,5};
Index state of files A and B during the comparison: [table not reproduced]
Comparison result:
The content of file B is a subset of file A.
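The worked cases above can be reproduced with a short Python sketch. It follows the described procedure only loosely and rests on assumptions: the names `build_index` and `unmatched` are hypothetical, each file is modelled as a list of integers, and buckets are plain lists rather than the two-dimensional array with collision marks.

```python
from collections import defaultdict

def build_index(values, modulus):
    """Hash each line (an integer here) into a bucket keyed by value % modulus."""
    index = defaultdict(list)
    for line_no, value in enumerate(values, start=1):
        index[value % modulus].append((line_no, value))
    return index

def unmatched(a, b, modulus=4):
    """Lines of B, as (line number, value), that have no identical line in A."""
    index_a = build_index(a, modulus)
    leftovers = []
    for line_no, value in enumerate(b, start=1):
        bucket = index_a.get(value % modulus, [])
        for position, (_, candidate) in enumerate(bucket):
            if candidate == value:   # line-to-line comparison inside the bucket
                del bucket[position] # consume the matched record of A
                break
        else:
            leftovers.append((line_no, value))  # stays in the log of B
    return leftovers
```

With modulus 4, `unmatched([1, 2, 3], [1, 5])` reports line 2 (value 5), and `unmatched([2, 3, 9], [1, 5])` reports lines 1 and 2, in line order rather than the order listed in the text. The three subset cases return an empty list.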
2) Compare file B against file A.
Because comparing file A against file B ultimately yields only the lines of B that are inconsistent with A, it is also necessary to check whether file A contains lines that file B lacks. Following the steps shown in 1) above, file B is compared against file A and the inconsistent lines of A are written to the log of A.
Assume the hash function is f(x) = A[i] % 3; the comparison then proceeds as follows:
Input data:
A={4,7…3n+1};
B={1};
Index state of files A and B during the comparison: [table not reproduced]
Comparison result: file B contains content inconsistent with file A.
According to another aspect of the invention, a quick-win strategy is also proposed. As shown in Fig. 4, the method first checks file sizes for consistency; if the sizes are inconsistent, the result is returned directly. If the sizes are consistent, file A is first compared against file B by checking whether the corresponding contents in the hash index are consistent; as soon as one inconsistency is found, the result is returned immediately. If file A compared against file B is consistent, file B is then compared against file A in the same way, again returning immediately at the first inconsistency. This strategy quickly tells the tester which files differ.
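The early-exit behaviour of this strategy can be sketched in Python as follows. The function name and return format are hypothetical, files are modelled as lists of lines already in memory, the size check is omitted, and a Python `set` stands in for the hash index, so duplicate lines are not counted.

```python
def first_difference(lines_a, lines_b):
    """Return (file label, line number, line) for the first difference, or None.

    Pass 1 indexes A and scans B; pass 2 indexes B and scans A.
    The function returns at the first line that has no counterpart.
    """
    passes = (("B", lines_b, lines_a),   # A compared against B: find lines of B not in A
              ("A", lines_a, lines_b))   # B compared against A: find lines of A not in B
    for label, scanned, indexed in passes:
        index = set(indexed)
        for line_no, line in enumerate(scanned, start=1):
            if line not in index:
                return label, line_no, line  # quick win: stop immediately
    return None
```

Returning at the first mismatch answers only whether the files differ; listing every difference line requires the full two-way scan.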
The methods described here may be implemented in various ways depending, at least in part, on the application, according to particular features or examples. For example, a method may be implemented in hardware, firmware, software, or any combination thereof. In a hardware implementation, for example, a device may be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, electronic devices, other units designed to perform the functions described here, or any combination thereof.
Likewise, in some embodiments a method may be implemented with modules that perform the functions described here, or any combination thereof. For example, any machine-readable medium tangibly embodying instructions may be used in implementing such methods. In one embodiment, for example, software or code may be stored in a memory and executed by a processing unit. The memory may be implemented within the processing unit and/or external to it. The term "memory" as used here refers to any type of long-term, short-term, volatile, non-volatile, or other memory, and is not limited to any particular type of memory, number of memories, or type of storage medium.
A storage medium may include any available medium that can be accessed by a computer, computing platform, computing device, or the like. By way of example and not limitation, computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, computing platform, or computing device.
Although what are presently considered to be example features have been described above, those skilled in the art will understand that various modifications may be made to the specific embodiments described in the invention without departing from the claimed subject matter. Moreover, features of one embodiment that are identical to those of other embodiments may have been omitted for brevity, and those skilled in the art will recognize such omissions. The claimed subject matter is therefore not limited to the particular examples disclosed; rather, it includes everything that falls within the scope of the appended claims.

Claims (5)

1. A file comparison method, comprising:
for the files to be compared, encoding each file with a hash function to obtain a corresponding transformed data set; and
comparing the transformed data sets,
wherein the step of encoding with a hash function includes taking the data in each line of the file to be compared modulo a predetermined value to obtain remainder result data, and recording the number and positions of collisions in the transformed data set,
wherein the files to be compared include file one and file two, and the method further comprises:
scanning the transformed data set of file one and, if it contains remainder result data identical to remainder result data in the transformed data set of file two, decrementing the collision count in both file one and file two by one, and otherwise continuing to scan the transformed data set of file one until it has been scanned completely; if the collision count of the transformed data set of file one is finally not 0, file one is not a subset of file two, and the line numbers whose collision count in the transformed data set of file one is non-zero record the difference lines found when comparing file one against file two; outputting the difference lines of file one compared against file two; and
scanning the transformed data set of file two and, if it contains remainder result data identical to remainder result data in the transformed data set of file one, decrementing the collision count in both file two and file one by one, and otherwise continuing to scan the transformed data set of file two until it has been scanned completely; if the collision count of the transformed data set of file two is finally not 0, file two is not a subset of file one, and the line numbers whose collision count in the transformed data set of file two is non-zero record the difference lines found when comparing file two against file one; and outputting the difference lines of file two compared against file one.
2. The file comparison method of claim 1, wherein the step of encoding with a hash function includes building a hash for every line of data in the file to be compared so that the hash values are uniformly distributed.
3. A file comparison method, comprising:
comparing the sizes of the files to be compared and, if the sizes are inconsistent, immediately determining that the files to be compared are different;
otherwise, encoding each of the files to be compared with a hash function to obtain a corresponding transformed data set;
comparing the transformed data sets; and
if the transformed data sets are identical, comparing the files to be compared, and otherwise ending the current round of comparison,
wherein the step of encoding with a hash function includes taking each item of data in the files to be compared modulo a predetermined value to obtain remainder result data, and recording the number and positions of conflicts in the transformed data set,
wherein the files to be compared include file one and file two, and the method further comprises:
scanning the transformed data set of file one and, wherever an entry has the same remainder result data as an entry in the transformed data set of file two, decrementing the corresponding conflict count in both file one and file two by one, and otherwise continuing the scan, until the transformed data set of file one has been fully scanned and processed; if any conflict count in the transformed data set of file one is not 0 at the end, file one is not a subset of file two, and the line numbers attached to the non-zero conflict counts record the lines of file one that differ from file two; outputting the difference lines of file one compared against file two; and
scanning the transformed data set of file two and, wherever an entry has the same remainder result data as an entry in the transformed data set of file one, decrementing the corresponding conflict count in both file two and file one by one, and otherwise continuing the scan, until the transformed data set of file two has been fully scanned and processed; if any conflict count in the transformed data set of file two is not 0 at the end, file two is not a subset of file one, and the line numbers attached to the non-zero conflict counts record the lines of file two that differ from file one; outputting the difference lines of file two compared against file one.
4. The file comparison method as claimed in claim 3, wherein the step of encoding with a hash function includes building a hash for every line of data in the files to be compared, so that the hash values are uniformly distributed.
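The overall flow of claim 3 (size check, encoding, two-way scan, difference output) might look roughly like the following Python sketch. It simplifies the claimed bookkeeping: instead of decrementing counts in both data sets, `diff_lines` keeps a single counter of unmatched remainders for the other file. The names `remainder_of`, `diff_lines`, `compare_files`, the use of CRC32, and the value of `MODULUS` are assumptions for illustration only.

```python
import os
import zlib
from collections import Counter

# Hypothetical modulus; the claims only refer to "a predetermined value".
MODULUS = 1_000_003


def remainder_of(line, modulus=MODULUS):
    """Hash one line (bytes) and reduce it modulo the predetermined value."""
    return zlib.crc32(line) % modulus


def diff_lines(lines_a, lines_b, modulus=MODULUS):
    """Return the 1-based line numbers of lines_a with no unmatched partner in lines_b."""
    remaining = Counter(remainder_of(line, modulus) for line in lines_b)
    differing = []
    for line_no, line in enumerate(lines_a, start=1):
        remainder = remainder_of(line, modulus)
        if remaining[remainder] > 0:
            remaining[remainder] -= 1  # matched: one fewer outstanding entry
        else:
            differing.append(line_no)  # non-zero count left over: difference line
    return differing


def compare_files(path_one, path_two):
    """Return (identical, diff_of_one_vs_two, diff_of_two_vs_one)."""
    # Step 1: files of different size are different; no scan is performed.
    if os.path.getsize(path_one) != os.path.getsize(path_two):
        return False, None, None
    with open(path_one, "rb") as f_one, open(path_two, "rb") as f_two:
        raw_one, raw_two = f_one.read(), f_two.read()
    # Steps 2 and 3: encode both files and scan in both directions.
    diff_one = diff_lines(raw_one.splitlines(), raw_two.splitlines())
    diff_two = diff_lines(raw_two.splitlines(), raw_one.splitlines())
    if not diff_one and not diff_two:
        # Step 4: data sets match, so compare the files themselves.
        return raw_one == raw_two, [], []
    return False, diff_one, diff_two
```

Because two different lines can share a remainder, matching data sets do not prove equality. That is why the sketch, like the claim, still compares the files directly when the data sets agree.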
5. A file comparison device, comprising:
a file size comparison device for comparing the sizes of files to be compared and determining that the files are different if their sizes are inconsistent;
a hash function encoding device for encoding files to be compared that have the same size with a hash function to obtain corresponding transformed data sets;
a transformed data set comparison device for comparing the transformed data sets; and
a file comparison device for comparing the files to be compared when the transformed data sets are identical,
wherein the hash function encoding device includes:
a remainder device for taking each item of data in the files to be compared modulo a predetermined value to obtain remainder result data, and
a recording device for recording the number and positions of conflicts in the transformed data set, wherein the files to be compared include file one and file two,
the transformed data set of file one is scanned and, wherever an entry has the same remainder result data as an entry in the transformed data set of file two, the corresponding conflict count in both file one and file two is decremented by one, and otherwise the scan continues, until the transformed data set of file one has been fully scanned and processed; if any conflict count in the transformed data set of file one is not 0 at the end, file one is not a subset of file two, and the line numbers attached to the non-zero conflict counts record the lines of file one that differ from file two; the difference lines of file one compared against file two are output; and
the transformed data set of file two is scanned and, wherever an entry has the same remainder result data as an entry in the transformed data set of file one, the corresponding conflict count in both file two and file one is decremented by one, and otherwise the scan continues, until the transformed data set of file two has been fully scanned and processed; if any conflict count in the transformed data set of file two is not 0 at the end, file two is not a subset of file one, and the line numbers attached to the non-zero conflict counts record the lines of file two that differ from file one; the difference lines of file two compared against file one are output.
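One way to picture how the four components of claim 5 fit together is the following Python sketch, which models each claimed device as a method of a single class and works on in-memory bytes rather than files. The class name `FileComparisonDevice`, its method names, CRC32 as the hash, and the default modulus are all hypothetical. The claim does not prescribe a software structure.

```python
import zlib
from collections import Counter


class FileComparisonDevice:
    """Hypothetical composition of the four components named in claim 5."""

    def __init__(self, modulus=1_000_003):
        # The "predetermined value"; this particular number is an assumption.
        self.modulus = modulus

    def sizes_match(self, data_one, data_two):
        """File size comparison device."""
        return len(data_one) == len(data_two)

    def encode(self, data):
        """Hash function encoding device: remainder -> conflict count."""
        return Counter(
            zlib.crc32(line) % self.modulus for line in data.splitlines()
        )

    def data_sets_match(self, set_one, set_two):
        """Transformed data set comparison device."""
        return set_one == set_two

    def files_match(self, data_one, data_two):
        """File comparison device: direct comparison, used only as the last step."""
        return data_one == data_two

    def compare(self, data_one, data_two):
        """Run the components in the claimed order, stopping at the first mismatch."""
        if not self.sizes_match(data_one, data_two):
            return False
        if not self.data_sets_match(self.encode(data_one), self.encode(data_two)):
            return False
        return self.files_match(data_one, data_two)


device = FileComparisonDevice()
same = device.compare(b"a\nb\n", b"a\nb\n")
reordered = device.compare(b"a\nb\n", b"b\na\n")
```

In this sketch the reordered input passes both the size check and the data set check, since the data set ignores line order, and is only rejected by the final direct comparison.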
CN201210385557.2A 2012-10-12 2012-10-12 File comparison method and device Active CN103729342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210385557.2A CN103729342B (en) 2012-10-12 2012-10-12 File comparison method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210385557.2A CN103729342B (en) 2012-10-12 2012-10-12 File comparison method and device

Publications (2)

Publication Number Publication Date
CN103729342A CN103729342A (en) 2014-04-16
CN103729342B CN103729342B (en) 2016-09-28

Family

ID=50453421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210385557.2A Active CN103729342B (en) 2012-10-12 2012-10-12 File comparison method and device

Country Status (1)

Country Link
CN (1) CN103729342B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105391566B (en) * 2014-09-04 2018-12-07 中国移动通信集团黑龙江有限公司 A kind of method and device that dynamic network equipments configuration compares
CN104639629A (en) * 2015-01-30 2015-05-20 英华达(上海)科技有限公司 File comparing method and system at client and cloud
CN105787041B (en) * 2016-02-26 2019-08-13 中国银联股份有限公司 Big file comparison method and Compare System based on data characteristics code
CN106055692A (en) * 2016-06-12 2016-10-26 上海爱数信息技术股份有限公司 Automatic testing method and system for comparison files or folders
CN108446394B (en) * 2018-03-26 2021-02-19 网易(杭州)网络有限公司 File difference comparison method and device
CN112181479A (en) * 2020-09-23 2021-01-05 中国建设银行股份有限公司 Method and device for determining difference between code file versions and electronic equipment
CN113505137B (en) * 2021-07-27 2022-07-08 重庆市规划和自然资源信息中心 Real estate space graph updating method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2005201758B2 (en) * 2005-04-27 2008-12-18 Canon Kabushiki Kaisha Method of learning associations between documents and data sets
CN101146111B (en) * 2007-10-19 2012-03-07 深圳市迅雷网络技术有限公司 A file download method and device
CN101398837B (en) * 2008-10-23 2011-05-11 深圳市奇迹通讯有限公司 Method for rapidly matching sms text
CN101957858A (en) * 2010-09-27 2011-01-26 中兴通讯股份有限公司 Data comparison method and device
CN102467458B (en) * 2010-11-05 2014-08-06 英业达股份有限公司 Method for establishing index of data block

Also Published As

Publication number Publication date
CN103729342A (en) 2014-04-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant