CN103729342A - File comparison method and device - Google Patents


Publication number
CN103729342A
Authority
CN
China
Prior art keywords
file
data set
compared
comparison
swap data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210385557.2A
Other languages
Chinese (zh)
Other versions
CN103729342B (en)
Inventor
尹祥龙
万鑫明
吴金坛
吕苏
马军
杨惠娟
高伟东
周涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN201210385557.2A
Publication of CN103729342A
Application granted
Publication of CN103729342B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a file comparison method. The method includes: comparing the sizes of the files to be compared and, if the sizes differ, determining that the files are different; otherwise encoding the files with a hash function to obtain corresponding conversion data sets; comparing the conversion data sets; and, if the conversion data sets are the same, comparing the files themselves. The method allows files containing a large amount of information to be compared quickly.

Description

File comparison method and device
Technical field
The present invention relates generally to the field of computer information processing, and in particular to techniques for rapidly processing files that contain large amounts of information.
Background
A key characteristic of the information age is the rapid growth in the volume of information. In the financial field, for example, the number and size of financial transaction files have grown rapidly along with the industry. Financial chronological files carry a huge amount of information, with more than 100,000 lines each. They also have the following characteristics: 1) the line is the basic unit; 2) there are no constraints between lines, which are inserted into the file in random order. In parallel testing of a financial transaction system, all differences between two batches of similar chronological files must be found in order to confirm the system's test results. Such batches easily exceed 30 GB, and comparing them quickly at low cost has become a difficult problem in current testing work.
The existing method for comparing financial chronological files is to decompress the two batches of similar files, sort them, and then compare them with a visual comparison tool or command (for example Beyond Compare or the diff command). Using Beyond Compare additionally requires copying the massive financial chronological files to a Windows platform. To increase speed, space is traded for time: a multi-server cluster architecture is used so that parallel computation across physical machines relieves the single-machine computing bottleneck. The existing method is simple and fast for files within 1 GB, but it is very slow on massive chronological files, taking more than 7 hours in total. It also has the following drawbacks. Economically, a server cluster shortens the time of the approximate comparison, but its operating cost is high and it consumes a large amount of energy. Technically, the file sorting step is unnecessary and consumes system memory, CPU, and I/O resources; there are too many steps, and too much manual intervention is needed. As a result, there is as yet no practical technique for the key-based approximate comparison of massive files that the financial industry requires, and research into fast approximate comparison algorithms and strategies is therefore particularly important.
Summary of the invention
In order to address at least one aspect of the above problems, the present invention proposes a file comparison method, comprising: encoding the files to be compared with a hash function to obtain corresponding conversion data sets; and comparing the conversion data sets.
In the above file comparison method, the step of encoding with a hash function comprises taking the remainder of each item of data in the files to be compared with respect to a predetermined value to obtain remainder result data, and recording the number and positions of conflicts in the conversion data set.
In the above file comparison method, the files to be compared comprise file one and file two, and the method further comprises: scanning the conversion data set of file one and, if it contains remainder result data identical to that in the conversion data set of file two, decrementing the conflict counts of file one and of file two by one each, otherwise continuing to scan the conversion data set of file one until it has been scanned completely; if the conflict count of the conversion data set of file one is finally not 0, this indicates that file one is not a subset of file two, and the line numbers whose conflict count in the conversion data set of file one is non-zero record the difference lines found when comparing file one against file two; outputting the difference lines of file one compared against file two; and scanning the conversion data set of file two and, if it contains remainder result data identical to that in the conversion data set of file one, decrementing the conflict counts of file two and of file one by one each, otherwise continuing to scan the conversion data set of file two until it has been scanned completely; if the conflict count of the conversion data set of file two is finally not 0, this indicates that file two is not a subset of file one, and the line numbers whose conflict count in the conversion data set of file two is non-zero record the difference lines found when comparing file two against file one; and outputting the difference lines of file two compared against file one.
In the above file comparison method, the step of encoding with a hash function comprises hashing every line of data in the files to be compared so that the hash values are evenly distributed.
The present invention also proposes a file comparison method, comprising: comparing the sizes of the files to be compared and, if the sizes are inconsistent, immediately determining that the files are different; otherwise, encoding the files with a hash function to obtain corresponding conversion data sets; comparing the conversion data sets; and, if the conversion data sets are identical, comparing the files themselves, otherwise ending the current round of comparison.
In the above file comparison method, the step of encoding with a hash function comprises taking the remainder of each item of data in the files to be compared with respect to a predetermined value to obtain remainder result data, and recording the number and positions of conflicts in the conversion data set.
In the above file comparison method, the files to be compared comprise file one and file two, and the method further comprises: scanning the conversion data set of file one and, if it contains remainder result data identical to that in the conversion data set of file two, decrementing the conflict counts of file one and of file two by one each, otherwise continuing to scan the conversion data set of file one until it has been scanned completely; if the conflict count of the conversion data set of file one is finally not 0, this indicates that file one is not a subset of file two, and the line numbers whose conflict count in the conversion data set of file one is non-zero record the difference lines found when comparing file one against file two; outputting the difference lines of file one compared against file two; and scanning the conversion data set of file two and, if it contains remainder result data identical to that in the conversion data set of file one, decrementing the conflict counts of file two and of file one by one each, otherwise continuing to scan the conversion data set of file two until it has been scanned completely; if the conflict count of the conversion data set of file two is finally not 0, this indicates that file two is not a subset of file one, and the line numbers whose conflict count in the conversion data set of file two is non-zero record the difference lines found when comparing file two against file one; and outputting the difference lines of file two compared against file one.
In the above file comparison method, the step of encoding with a hash function comprises hashing every line of data in the files to be compared so that the hash values are evenly distributed.
The present invention also proposes a file comparison device, comprising: a file size comparison device for comparing the sizes of the files to be compared and, if the sizes are inconsistent, determining that the files are different; a hash function encoding device for encoding files of consistent size with a hash function to obtain corresponding conversion data sets; a conversion data set comparison device for comparing the conversion data sets; and a file comparison device for comparing the files themselves when the conversion data sets are identical.
In the above file comparison device, the hash function encoding device comprises: a remainder device for taking the remainder of each item of data in the files to be compared with respect to a predetermined value to obtain remainder result data; and a recording device for recording the number and positions of conflicts in the conversion data set.
By using the present invention, files containing a large amount of information can be compared quickly.
Brief description of the drawings
To aid understanding, embodiments of the invention are described by way of non-limiting example with reference to the accompanying drawings, in which:
Fig. 1 shows a main flow of file comparison;
Fig. 2 is a module structure diagram of a file comparison scheme;
Fig. 3 shows a flow of file comparison;
Fig. 4 shows a fast file comparison method.
Detailed description
Unless otherwise stated, and as will be appreciated from the following discussion, throughout this specification discussions using terms such as "processing", "calculating", or "determining" refer to actions or processes of a particular apparatus such as a computer or similar electronic computing device. In the context of this specification, a computer or similar electronic computing device can manipulate or transform signals. These signals are typically represented as physical electronic or magnetic quantities within the memories, registers, or other information storage, transmission, or display devices of the computer or similar electronic computing device. For example, an electronic computing device may comprise one or more processors that perform one or more specific functions.
According to one aspect of the invention, files are compared using a hash function. The files here are financial chronological files, but they may also be files of other types. The files to be compared are encoded with a hash function to obtain corresponding conversion data sets, and the conversion data sets are then compared. Hashing transforms an input of arbitrary length into an output of fixed length by means of a hashing algorithm. If a record whose key equals K exists in the structure, it must be at storage location f(K). The record sought can therefore be obtained directly, without comparisons. This correspondence f is the hash function.
Suppose there are two files to be compared, file A and file B. According to the invention, A and B are each encoded with a hash function to obtain conversion data sets A1 and B1. The relationship between A and B is then derived approximately by comparing A1 and B1.
According to one aspect of the invention, the step of encoding with a hash function comprises taking the remainder of each item of data in the files to be compared with respect to a predetermined value, and recording the number and positions of conflicts in the conversion data set. As those skilled in the art will understand, a conflict (collision) here means that different keys map to the same hash address.
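The encoding step can be illustrated with a minimal sketch. It is not the patent's implementation: the function name `encode_lines`, the use of `zlib.crc32` to turn a text line into an integer before taking the remainder, and the default modulus of 1024 are all assumptions made for illustration. The "number of conflicts" is interpreted here as the number of lines that landed on the same remainder.

```python
import zlib


def encode_lines(lines, modulus=1024):
    """Build a conversion data set: remainder -> conflict count and positions.

    Assumption: each line is first reduced to an integer with CRC-32, then
    the remainder with respect to `modulus` (the predetermined value) is taken.
    """
    table = {}
    for position, line in enumerate(lines, start=1):
        remainder = zlib.crc32(line.encode("utf-8")) % modulus
        entry = table.setdefault(remainder, {"count": 0, "positions": []})
        entry["count"] += 1                  # lines sharing this remainder
        entry["positions"].append(position)  # 1-based line numbers
    return table
```

Two identical lines always share a remainder, so their positions end up in the same entry; different lines may also share one, which is the conflict the text describes.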
Suppose the files to be compared are file one and file two. To determine the relationship between them, the conversion data set of file one is scanned first. If it contains remainder result data identical to that in the conversion data set of file two, the conflict counts of file one and of file two are each decremented by one.
If no identical remainder result data is found, scanning of the conversion data set of file one continues until it has been scanned completely. If the conflict count of the conversion data set of file one is finally not 0, file one is not a subset of file two. The line numbers whose conflict count in the conversion data set of file one is non-zero record the difference lines found when comparing file one against file two, and these difference lines can be output.
The conversion data set of file two is then scanned in the same way. If it contains remainder result data identical to that in the conversion data set of file one, the conflict counts of file two and of file one are each decremented by one. If no identical remainder result data is found, scanning of the conversion data set of file two continues until it has been scanned completely. If the conflict count of the conversion data set of file two is finally not 0, file two is not a subset of file one. The line numbers whose conflict count in the conversion data set of file two is non-zero record the difference lines found when comparing file two against file one, and these difference lines can be output.
In other words, whenever the conversion data set of file one contains remainder result data identical to that in the conversion data set of file two, the conflict counts of file one and file two are each decremented by one. If the conflict count in the conversion data set of file two is finally non-zero, file two is not a subset of file one; if the conflict count in the conversion data set of file one is finally non-zero, file one is not a subset of file two.
The method above is an approximate file comparison method. According to another aspect of the invention, a fast comparison method is also proposed, as shown in Fig. 1. File size consistency is checked first; if the sizes are inconsistent, the result is returned directly: the files to be compared are different. Otherwise, if the sizes are consistent, the files are encoded with a hash function to obtain corresponding conversion data sets, and the conversion data sets are compared; if they are inconsistent, the files to be compared are likewise different. If the conversion data sets are also consistent, the files themselves are then compared.
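The three stages of this fast comparison can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: `files_match`, `read_lines`, and `conversion_set` are hypothetical names; CRC-32 modulo a predetermined value stands in for the hash function; both files are read fully into memory; and the final comparison ignores line order, on the assumption (taken from the background section) that lines are inserted in random order.

```python
import os
import zlib
from collections import Counter


def read_lines(path):
    """Read a file as a list of byte-string lines (whole file in memory)."""
    with open(path, "rb") as handle:
        return handle.read().splitlines()


def conversion_set(lines, modulus=1024):
    """Multiset of per-line remainders; stands in for the conversion data set."""
    return Counter(zlib.crc32(line) % modulus for line in lines)


def files_match(path_a, path_b, modulus=1024):
    # Stage 1: different sizes mean different files, with no further work.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    lines_a, lines_b = read_lines(path_a), read_lines(path_b)
    # Stage 2: differing conversion data sets also mean different files.
    if conversion_set(lines_a, modulus) != conversion_set(lines_b, modulus):
        return False
    # Stage 3: only now compare the actual line contents (order-insensitive).
    return Counter(lines_a) == Counter(lines_b)
```

The cheap checks run first, so the expensive content comparison is reached only by files that already agree in size and in their conversion data sets.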
According to another aspect of the invention, a file comparison device is proposed. The device comprises a file size comparison device, a hash function encoding device, a conversion data set comparison device, and a file comparison device. The file size comparison device compares the sizes of the files to be compared and, if the sizes are inconsistent, determines that the files are different. The hash function encoding device encodes files of consistent size with a hash function to obtain corresponding conversion data sets. The conversion data set comparison device compares the conversion data sets. The file comparison device compares the files themselves when the conversion data sets are identical.
The hash function encoding device can comprise a remainder device and a recording device. The former takes the remainder of each item of data in the files to be compared with respect to a predetermined value to obtain remainder result data. The latter records the number and positions of conflicts in the conversion data set.
Fig. 2 is a schematic diagram, according to another aspect of the invention, of file comparison performed by a device comprising five modules. The first module is the scheduling control module, which provides the scheduling control interface. It supports multi-process scheduling and the real-time start, pause, and termination of file comparison tasks. Using the fast-win strategy, it searches the entries under a directory. If an entry is a file, it first checks whether the names and sizes of the two files match; if they do, it calls the file comparison module, which computes hash values for the line data and builds an index for comparison. If an entry is a directory rather than a file, the module calls itself recursively.
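The directory traversal performed by the scheduling control module might look like the sketch below. The function name `pairs_to_compare` and the two-directory layout (one directory per batch) are assumptions for illustration; the sketch only selects candidate pairs and leaves the hash-based comparison to a separate step.

```python
import os


def pairs_to_compare(dir_a, dir_b):
    """Yield (path_a, path_b) for same-named files of equal size.

    Subdirectories present on both sides are handled by recursion. Entries
    that fail the name or size check are not yielded: under the fast-win
    strategy they are already known to differ.
    """
    for entry in sorted(os.listdir(dir_a)):
        path_a = os.path.join(dir_a, entry)
        path_b = os.path.join(dir_b, entry)
        if os.path.isdir(path_a):
            if os.path.isdir(path_b):
                yield from pairs_to_compare(path_a, path_b)
        elif os.path.isfile(path_b) and os.path.getsize(path_a) == os.path.getsize(path_b):
            yield path_a, path_b
```

A fuller version would also report the entries it skips, since a missing file or a size mismatch is itself a comparison result.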
The second module is the file comparison module. It first traverses all lines of file A and builds an index array from the key of each line, using efficient line compression and a low-coupling hashing technique to build the index table of file A in memory. It then traverses each line of file B and checks whether the hash value of that line's data exists in the index table of A; if it does not, the line is a difference between the two files and its information is output. If this comparison finds the two files consistent, the same method is then applied in the opposite direction.
The third module is the task configuration module. It sets the file whitelist, the file blacklist, the key of the file lines to be compared, and the comparison strategy. It determines which files are compared and which are not; configuring the key of a file enables approximate comparison of files; and it determines whether the fast-win comparison strategy is selected.
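A task configuration of this kind could be modelled as below. Everything in the sketch is hypothetical: the class name `CompareTask`, the glob-style patterns matched with `fnmatch`, the `|` field delimiter, and the idea that a key is a tuple of selected fields are assumptions, since the patent does not specify a configuration format.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class CompareTask:
    """Hypothetical settings for one comparison task."""
    whitelist: list = field(default_factory=lambda: ["*"])  # patterns to compare
    blacklist: list = field(default_factory=list)           # patterns to skip
    key_fields: tuple = (0,)   # indexes of the fields that form the line key
    delimiter: str = "|"       # assumed field separator within a line
    fast_win: bool = True      # stop at the first difference

    def selects(self, filename):
        """A file is compared if it is whitelisted and not blacklisted."""
        allowed = any(fnmatch(filename, p) for p in self.whitelist)
        blocked = any(fnmatch(filename, p) for p in self.blacklist)
        return allowed and not blocked

    def key_of(self, line):
        """Extract the key; lines with equal keys count as approximately equal."""
        parts = line.split(self.delimiter)
        return tuple(parts[i] for i in self.key_fields)
```

Comparing only the key fields, rather than whole lines, is one way to read the statement that configuring the key enables approximate comparison.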
The fourth module is the result processing module. It presents the output of the file comparison module to the tester according to personalized settings, improving the readability of the results. Specifically, all differences output by the file comparison module can be split according to the file format specification and recorded in a specific file, for example an Excel file, and the inconsistent parts can be marked in a conspicuous color.
The fifth module is the log output module. It writes procedural information for all operations in the above method to a log, so that testers can analyze and look up the detailed operation records of the comparison process.
Fig. 2 also shows the main flow of the file comparison method, which comprises the following steps:
(1) Configuring the comparison task: according to the requirements of the comparison task, the operator plans the program's input and output directories and configures settings such as the whitelist and blacklist of files to be compared and the keys used to match file lines. Configuring a comparison task on the control host does not affect comparison tasks already in progress, which gives a high degree of flexibility.
(2) Executing the comparison task: the operator runs the configured comparison task. The task scheduling module evaluates the current system load, allocates a number of processes according to the size and time sensitivity of the task, schedules the processes to work concurrently, and records the execution status of each process. It can start, pause, and terminate tasks in real time.
(3) Calling submodules: the scheduling control interface calls the other submodules according to their interface parameters, monitors the progress of the comparison task, confirms its results and merges them with the results from other machines, and reports back so that the next task can be processed.
The file comparison module can be implemented with reference to the flow shown in Fig. 3. For example, 1) file A compared with file B. The following steps are carried out:
Take file A and encode the content of each line with the hash function, hashing into a two-dimensional array of N rows: A[i] = {index value i, conflict flag cfi, line number ai}.
Hash each line of file B in the same way into a two-dimensional array of N rows: B[j] = {index value j, conflict flag cfj, line number bj}.
For each entry B[j], look up index j in array A to obtain A[j] = {index value j, conflict flag cfj, line number aj}, and then compare line bj of file B with line aj of file A; this is a line-to-line comparison. When the two are completely identical, the corresponding index records in A and B are deleted, the conflict count is decremented by 1 (down to a minimum of 0), and the comparison continues with the next entry. Finally, the end state of the index of file B is read, and the inconsistent line numbers are output to the log file of B.
Check the conflict record counts in the index table of file B. When cfj = 0, every line of file B is fully consistent with file A; when cfj ≠ 0, file B contains lines inconsistent with file A, and the content of the corresponding lines of file B is output to the log file.
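The steps above can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: each line is represented by an integer value and hashed with `value % modulus`, as in the worked cases that follow, and `build_index` and `unmatched_lines` are hypothetical names. Each bucket holds the line numbers that share an index value, so its length plays the role of the conflict count.

```python
def build_index(values, modulus):
    """Return `modulus` buckets; bucket i lists the 1-based line numbers
    whose value hashes to index i."""
    buckets = [[] for _ in range(modulus)]
    for line_no, value in enumerate(values, start=1):
        buckets[value % modulus].append(line_no)
    return buckets


def unmatched_lines(a, b, modulus=4):
    """Line numbers of B that have no remaining identical line in A."""
    index_a = build_index(a, modulus)
    index_b = build_index(b, modulus)
    leftover = []
    for slot in range(modulus):
        candidates = index_a[slot]  # A lines in this slot not yet matched
        for line_b in index_b[slot]:
            for line_a in candidates:
                if a[line_a - 1] == b[line_b - 1]:  # line-to-line comparison
                    candidates.remove(line_a)       # delete the matched record
                    break
            else:
                leftover.append(line_b)  # this B entry stays in the index
    return sorted(leftover)
```

For A = [1, 2, 3] and B = [1, 5] with modulus 4, `unmatched_lines(A, B)` returns `[2]`: values 1 and 5 share index 1, the line-to-line check matches only the first, and the second line of B is left to be logged. Swapping the arguments gives the comparison in the opposite direction.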
Four cases can be considered here:
a) B has no conflict and A has no conflict.
Suppose the hash function is f(x) = x % 4, applied to each line value A[i]. The comparison then proceeds as follows:
Input data:
A={1,2,3};
B={1};
Index state of files A and B during the comparison: (index tables shown as figures in the original publication)
Comparison result:
The content of file B is a subset of file A.
b) B has no conflict and A has a conflict.
Suppose the hash function is f(x) = x % 4, applied to each line value A[i]. The comparison then proceeds as follows:
Input data:
A={1,2,3,5};
B={1};
Index state of files A and B during the comparison: (index tables shown as figures in the original publication)
Comparison result:
The content of file B is a subset of file A.
c) B has a conflict and A has no conflict.
Derived from a), this is the case in which the conflicting item of B is never found in A and is finally left in the log of B.
Suppose the hash function is f(x) = x % 4, applied to each line value A[i]. The comparison then proceeds as follows:
Input data:
A={1,2,3};
B={1,5};
Index state of files A and B during the comparison: (index tables shown as figures in the original publication)
Comparison result:
File B contains a line inconsistent with file A: line 2. The second line of file B, with content 5, is output to the log file.
Another situation needs to be considered here:
Suppose the hash function is f(x) = x % 4, applied to each line value A[i]. The comparison then proceeds as follows:
Input data:
A={2,3,9};
B={1,5};
Index state of files A and B during the comparison: (index tables shown as figures in the original publication)
Comparison result:
File B contains lines inconsistent with file A: lines 2 and 1. The second and first lines of file B, with contents 5 and 1, are output to the log file.
d) B has a conflict and A has a conflict.
Derived from b), this case mainly concerns items of B that are at first not found in A and are finally left in the log of B.
Suppose the hash function is f(x) = x % 4, applied to each line value A[i]. The comparison then proceeds as follows:
Input data:
A={1,2,3,5};
B={1,5};
Index state of files A and B during the comparison: (index tables shown as figures in the original publication)
Comparison result:
The content of file B is a subset of file A.
2) File B compared with file A.
Because comparing file A with file B ultimately yields only the lines in B that are inconsistent with A, a second pass is needed to check whether file A contains lines that file B does not. Following the steps in 1) above, file B is compared with file A, and the inconsistencies found in A are written to the log of A.
Suppose the hash function is f(x) = x % 3, applied to each line value A[i]. The comparison then proceeds as follows:
Input data:
A={4,7,…,3n+1};
B={1};
Index state of files A and B during the comparison: (index tables shown as figures in the original publication)
Comparison result: file B contains content inconsistent with file A.
According to another aspect of the invention, a fast-win strategy is also proposed. As shown in Fig. 4, the method first checks file size consistency; if the sizes are inconsistent, the result is returned directly. If the sizes are consistent, file A is first compared with file B by checking whether the corresponding contents in the hash index are consistent; as soon as one inconsistency is found, the result is returned immediately. If file A compared with file B is consistent, file B is then compared with file A in the same way, again returning immediately on the first inconsistency. With this strategy the tester can quickly be told which files differ.
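The early-exit behaviour of this strategy can be sketched as follows. The function name `first_difference` is hypothetical, Python's built-in hashing (through `collections.Counter`) stands in for the hash index, and the file size check is assumed to have passed before the function is called.

```python
from collections import Counter


def first_difference(lines_a, lines_b):
    """Return (file_name, line_no, content) for the first line that has no
    counterpart in the other file, or None if the files are consistent."""
    passes = (("A", lines_a, lines_b), ("B", lines_b, lines_a))
    for name, source, other in passes:
        remaining = Counter(other)  # hash index of the other file's lines
        for line_no, line in enumerate(source, start=1):
            if remaining[line] == 0:
                return name, line_no, line  # stop at the first inconsistency
            remaining[line] -= 1            # consume the matched line
    return None
```

The second pass runs only when the first one finds nothing, which mirrors the order described above: one direction first, then the other.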
The method described here can be implemented in various ways, depending at least in part on the application and on particular features or examples. For example, it can be implemented in hardware, firmware, software, or any combination thereof. In a hardware implementation, for example, a device can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, electronic devices, other units designed to perform the functions described herein, or any combination thereof.
Likewise, in some embodiments the method can be implemented with modules that perform the functions described here, or any combination thereof. For example, any machine-readable medium that tangibly embodies instructions can be used to implement such methods. In one embodiment, for example, software or code can be stored in a memory and executed by a processing unit. The memory can be implemented within the processing unit and/or external to it. The term "memory" as used here refers to any type of long-term, short-term, volatile, non-volatile, or other memory, and is not limited to any particular type or number of memories or to the type of medium on which the memory is stored.
A storage medium can be any available medium that can be accessed by a computer, computing platform, computing device, or the like. By way of example and not limitation, computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, computing platform, or computing device.
Although what are presently considered to be example features have been shown above, those skilled in the art will understand that various modifications can be made to the specific embodiments described without departing from the claimed subject matter. Features of one embodiment that are the same as in other embodiments may be omitted for brevity of explanation, and those skilled in the art will recognize these omissions. Therefore, the claimed subject matter is not limited to the particular examples disclosed; rather, it includes everything falling within the scope of the claims.

Claims (10)

1. a file comparison method, comprising:
For file to be compared, use hash function to encode and obtain corresponding swap data set; And
Compare each swap data set.
2. file comparison method as claimed in claim 1, the step that wherein said use hash function is encoded comprises the data in each of described file to be compared is obtained to remainder result data to a predetermined value remainder number, and at described transform data, concentrates number and the position of record conflict.
3. file comparison method as claimed in claim 2, wherein said file to be compared comprises file one and file two, described method also comprises:
The swap data set of scanning document one, if wherein the transform data of existence and file two is concentrated identical remainder result data, so the number of the conflict in described file one and described file two is subtracted respectively to one, otherwise, continue the swap data set of scanning document one, until the swap data set of file one is scanned and is disposed; If the number of collisions of the swap data set of last file one is not 0, supporting paper one is not the subset of file two, and it is that non-zero corresponding line number row have recorded the difference row in file one comparison document two processes that file one transform data is concentrated number of collisions; The difference row of output file one comparison document two; And
The swap data set of scanning document two, if wherein the transform data of existence and file one is concentrated identical remainder result data, so the number of the conflict in described file two and described file one is subtracted respectively to one, otherwise, continue the swap data set of scanning document two, until the swap data set of file two is scanned and is disposed; If the number of collisions of the swap data set of last file two is not 0, supporting paper two is not the subset of file one, and it is that non-zero corresponding line number row have recorded the difference row in file two comparison document one processes, the difference row of output file two comparison documents one that file two transform datas are concentrated number of collisions.
4. file comparison method as claimed in claim 1, the step that wherein said use hash function is encoded comprises sets up Hash hash by the every data line in file to be compared, allows cryptographic hash be uniformly distributed.
5. A file comparison method, comprising:
Comparing the sizes of the files to be compared, and if the sizes of the files to be compared are inconsistent, immediately determining that the files to be compared are different;
Otherwise, encoding the files to be compared with a hash function to obtain corresponding conversion data sets;
Comparing the conversion data sets; and
If the conversion data sets are identical, comparing the files to be compared, and otherwise ending the current round of comparison.
6. The file comparison method as claimed in claim 5, wherein the step of encoding with the hash function comprises taking the remainder of the data in each of the files to be compared modulo a predetermined value to obtain remainder result data, and recording the number and positions of collisions in the conversion data set.
7. The file comparison method as claimed in claim 6, wherein the files to be compared comprise file one and file two, and the method further comprises:
Scanning the conversion data set of file one, wherein if remainder result data identical to that in the conversion data set of file two exists, the collision counts in file one and file two are each decremented by one, and otherwise scanning of the conversion data set of file one continues, until the conversion data set of file one has been fully scanned; if the collision count of the conversion data set of file one is finally not 0, this indicates that file one is not a subset of file two, and the line numbers whose collision count in the conversion data set of file one is non-zero record the difference lines found when comparing file one against file two; outputting the difference lines of file one relative to file two; and
Scanning the conversion data set of file two, wherein if remainder result data identical to that in the conversion data set of file one exists, the collision counts in file two and file one are each decremented by one, and otherwise scanning of the conversion data set of file two continues, until the conversion data set of file two has been fully scanned; if the collision count of the conversion data set of file two is finally not 0, this indicates that file two is not a subset of file one, and the line numbers whose collision count in the conversion data set of file two is non-zero record the difference lines found when comparing file two against file one; outputting the difference lines of file two relative to file one.
8. The file comparison method as claimed in claim 5, wherein the step of encoding with the hash function comprises building a hash over each line of data in the files to be compared, such that the hash values are uniformly distributed.
9. A file comparison device, comprising:
A file size comparison device, configured to compare the sizes of the files to be compared and, if the sizes of the files to be compared are inconsistent, to determine that the files to be compared are different;
A hash function encoding device, configured to encode files to be compared that have the same size with a hash function to obtain corresponding conversion data sets;
A conversion data set comparison device, configured to compare the conversion data sets; and
A file comparison device, configured to compare the files to be compared when the conversion data sets are identical.
10. The file comparison device as claimed in claim 9, wherein the hash function encoding device comprises:
A remainder device, configured to take the remainder of each piece of data in the files to be compared modulo a predetermined value to obtain remainder result data; and
A recording device, configured to record the number and positions of collisions in the conversion data set.
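
The method of claims 5 to 7 can be illustrated with a minimal sketch. It is not the patented implementation: the claims do not name a hash function or a modulus, so the use of CRC-32, the value of MODULUS, and the function names build_conversion_set, difference_lines and compare_files are all assumptions made for illustration. Rather than decrementing counters in place, the sketch computes how many positions per remainder are left unmatched, and it reports the later positions as the difference lines, which is one possible reading of the claim.

```python
import os
import zlib
from collections import defaultdict

# Hypothetical "predetermined value"; the claims do not specify one.
MODULUS = 1_000_003


def build_conversion_set(lines, modulus=MODULUS):
    """Map each remainder to the line numbers (positions) that produced it.

    The length of each list is the collision count for that remainder.
    CRC-32 stands in for the unspecified hash function (an assumption).
    """
    table = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        remainder = zlib.crc32(line.encode("utf-8")) % modulus
        table[remainder].append(line_no)
    return dict(table)


def difference_lines(set_a, set_b):
    """Return line numbers of A that have no one-for-one match in B.

    Each remainder shared with B cancels one position on each side, which
    mirrors decrementing both collision counts by one. Positions left over
    (non-zero count) are reported as difference lines.
    """
    diff = []
    for remainder, positions in set_a.items():
        matched = min(len(positions), len(set_b.get(remainder, [])))
        diff.extend(positions[matched:])
    return sorted(diff)


def compare_files(path_a, path_b):
    """Return True only if the two text files are identical."""
    # Step 1: files of different size are different; stop immediately.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        lines_a = fa.read().splitlines()
        lines_b = fb.read().splitlines()
    # Steps 2 and 3: encode each file and compare the conversion data sets.
    if build_conversion_set(lines_a) != build_conversion_set(lines_b):
        return False
    # Step 4: equal data sets may still hide hash collisions, so compare content.
    return lines_a == lines_b
```

Calling difference_lines(set_one, set_two) and then difference_lines(set_two, set_one) gives the two directions of the scan described in claims 3 and 7. An empty result in one direction corresponds to that file being a subset of the other at the level of remainders.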
CN201210385557.2A 2012-10-12 2012-10-12 File comparison method and device Active CN103729342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210385557.2A CN103729342B (en) 2012-10-12 2012-10-12 File comparison method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210385557.2A CN103729342B (en) 2012-10-12 2012-10-12 File comparison method and device

Publications (2)

Publication Number Publication Date
CN103729342A (en) 2014-04-16
CN103729342B (en) 2016-09-28

Family

ID=50453421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210385557.2A Active CN103729342B (en) 2012-10-12 2012-10-12 File comparison method and device

Country Status (1)

Country Link
CN (1) CN103729342B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104639629A (en) * 2015-01-30 2015-05-20 英华达(上海)科技有限公司 File comparing method and system at client and cloud
CN105391566A (en) * 2014-09-04 2016-03-09 中国移动通信集团黑龙江有限公司 Dynamic network equipment configuration comparison method and device
CN105787041A (en) * 2016-02-26 2016-07-20 中国银联股份有限公司 Large file comparison method and comparison system based on data characteristic codes
CN106055692A (en) * 2016-06-12 2016-10-26 上海爱数信息技术股份有限公司 Automatic testing method and system for comparison files or folders
CN108446394A (en) * 2018-03-26 2018-08-24 网易(杭州)网络有限公司 File difference comparison method and device
CN112181479A (en) * 2020-09-23 2021-01-05 中国建设银行股份有限公司 Method and device for determining difference between code file versions and electronic equipment
CN113505137A (en) * 2021-07-27 2021-10-15 重庆市规划和自然资源信息中心 Real estate space graph updating method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2005201758A1 (en) * 2005-04-27 2006-11-16 Canon Kabushiki Kaisha Method of learning associations between documents and data sets
CN101146111A (en) * 2007-10-19 2008-03-19 深圳市迅雷网络技术有限公司 A file download method and device
CN101398837A (en) * 2008-10-23 2009-04-01 深圳市奇迹通讯有限公司 Method for rapidly matching sms text
CN101957858A (en) * 2010-09-27 2011-01-26 中兴通讯股份有限公司 Data comparison method and device
CN102467458A (en) * 2010-11-05 2012-05-23 英业达股份有限公司 Method for establishing index of data block

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105391566A (en) * 2014-09-04 2016-03-09 中国移动通信集团黑龙江有限公司 Dynamic network equipment configuration comparison method and device
CN105391566B (en) * 2014-09-04 2018-12-07 中国移动通信集团黑龙江有限公司 Dynamic network equipment configuration comparison method and device
CN104639629A (en) * 2015-01-30 2015-05-20 英华达(上海)科技有限公司 File comparing method and system at client and cloud
CN105787041A (en) * 2016-02-26 2016-07-20 中国银联股份有限公司 Large file comparison method and comparison system based on data characteristic codes
CN105787041B (en) * 2016-02-26 2019-08-13 中国银联股份有限公司 Large file comparison method and comparison system based on data characteristic codes
CN106055692A (en) * 2016-06-12 2016-10-26 上海爱数信息技术股份有限公司 Automatic testing method and system for comparison files or folders
CN108446394A (en) * 2018-03-26 2018-08-24 网易(杭州)网络有限公司 File difference comparison method and device
CN108446394B (en) * 2018-03-26 2021-02-19 网易(杭州)网络有限公司 File difference comparison method and device
CN112181479A (en) * 2020-09-23 2021-01-05 中国建设银行股份有限公司 Method and device for determining difference between code file versions and electronic equipment
CN113505137A (en) * 2021-07-27 2021-10-15 重庆市规划和自然资源信息中心 Real estate space graph updating method

Also Published As

Publication number Publication date
CN103729342B (en) 2016-09-28

Similar Documents

Publication Publication Date Title
CN103729342A (en) File comparison method and device
US9886418B2 (en) Matrix operands for linear algebra operations
CN103902702A (en) Data storage system and data storage method
US10572463B2 (en) Efficient handling of sort payload in a column organized relational database
CN102129425B (en) Method and device for accessing large object set tables in a data warehouse
CN111258966A (en) Data deduplication method, device, equipment and storage medium
CN105260464B (en) Method and device for converting a data storage structure
CN103902698A (en) Data storage system and data storage method
CN101499065B (en) Table item compression method and device based on FA, table item matching method and device
US10496659B2 (en) Database grouping set query
CN103988174A (en) A data processing apparatus and method for performing register renaming without additional registers
CN107977504B (en) Asymmetric reactor core fuel management calculation method and device and terminal equipment
CN114398346A (en) Data migration method, device, equipment and storage medium
CN107368281B (en) Data processing method and device
Yin et al. Content‐Based Image Retrial Based on Hadoop
CN113887201A (en) Text fixed-length error correction method, device, equipment and storage medium
CN113434660A (en) Product recommendation method, device, equipment and storage medium based on multi-domain classification
US10250278B2 (en) Compression of a set of integers
Li et al. Application and performance optimization of MapReduce model in image segmentation
CN114945902A (en) Shuffle reduction task with reduced I/O overhead
Szalay et al. Gpu-based interactive visualization of billion point cosmological simulations
US20240004954A1 (en) Computer-implemented accumulation method for sparse matrix multiplication applications
CN116049184A (en) Method and device for creating ranking table, electronic equipment and storage medium
CN114840613A (en) Data binning and visual display method, device, equipment and storage medium
Jylhä-Ollila GPU-accelerated k-mer counting

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant