CN102945241A - Hash data structure used for file comparison, hash comparison system and method - Google Patents
- Publication number
- CN102945241A (application CN2012103330235A)
- Authority
- CN
- China
- Prior art keywords
- hash
- data
- source file
- fileinfo
- file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/71—Version control; Configuration management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a hash data structure used for file comparison, and to a hash comparison system and method. A hash data comparison system according to the embodiments compares source files using hash data that comprise file information and a hash value. The system comprises a file information generation unit, a hash generation unit, and a control unit. The file information generation unit examines the attributes of each source file and generates file information related to that source file. The hash generation unit calculates a hash value by applying a hash function algorithm to at least a part of the source file. The control unit generates hash data for the corresponding source file, the hash data comprising the file information and the hash value. Consequently, hash values do not have to be compared for every pair of different files, and files can be compared more quickly.
Description
Technical field
The present invention relates generally to hashing technology for data files and, more specifically, to a hash data structure and to a hash comparison system and method using that structure, which combine a hash value with unique characteristic information of the source file so that files can be compared more quickly.
Background technology
Comparison between multiple pieces of data, particularly data files, is used in many operations. For example, file comparison is used to check for changes between files in an operating system (OS), or to compare a patch file with a source file in order to apply a predetermined patch.
Conventional file comparison techniques include comparing entire files, assigning version information to files and checking files based on that version information, and applying a hash function to files and then comparing the results.
Comparing entire files is rarely used because a large amount of data must be compared and the comparison is slow. The drawback of assigning version information to files and comparing files by version is that, if a file's content changes but its version information is not updated, the file still matches the old version information, so the comparison does not produce a correct result.
Therefore, in most cases, a hash value is calculated by applying a hash function to each file, and file contents are compared by comparing the calculated hash values. However, this conventional approach, which relies only on hash values, has the problem that larger files require more computing resources to generate hash values, and the time needed to perform the corresponding operation increases.
Summary of the invention
Accordingly, the present invention has been made to solve the above problems occurring in the prior art, and an object of the present invention is to provide a hash data structure that allows files to be compared with one another easily using fewer resources.
Another object of the present invention is to provide a method of generating the hash data structure and a method of comparing hash data structures, which allow files to be compared with one another more rapidly using the hash data structure required for file comparison.
A further object of the present invention is to provide a hash comparison system that can compare files with one another efficiently using the hash data structure required for file comparison.
According to an aspect of the present invention for achieving the above objects, there is provided a hash data structure comprising: file information that consists of predetermined data bits and relates to attributes of a source file; and a hash value that consists of specific data bits and relates to the source file, wherein the data bits corresponding to the hash value follow the data bits corresponding to the file information.
In an embodiment, the file information may include at least one of a size value of the source file, first-part data containing the data at the beginning of the source file, and second-part data containing the data at the end of the source file.
In an embodiment, the hash data structure may further include a structure header containing structural information about each of the hash value and the file information included in the hash data structure.
In an embodiment, the hash data structure may further include parity information for the hash data structure, the parity information comprising first parity bits for the file information and second parity bits for the hash value.
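The layout described in this aspect can be sketched in Python as follows. This is an illustrative sketch only: the 8-byte size field, the 4-byte first and second parts, the use of SHA-1, and the names `make_file_info` and `HashData` are hypothetical choices, not taken from the patent. It shows the one ordering rule the text does fix: the hash value bits follow the file information bits, and because the hash value has a fixed, algorithm-defined length it can be split back off the end.

```python
import hashlib
import struct
from dataclasses import dataclass

# Hypothetical field sizes; the patent leaves these to the system designer.
SIZE_FIELD_FMT = ">Q"  # source file size value: 8-byte big-endian unsigned int
PART_LEN = 4           # length of the first-part and second-part data


def make_file_info(data: bytes) -> bytes:
    """Build file information: size value, first-part data, second-part data."""
    head = data[:PART_LEN].ljust(PART_LEN, b"\0")   # pad files shorter than PART_LEN
    tail = data[-PART_LEN:].rjust(PART_LEN, b"\0")
    return struct.pack(SIZE_FIELD_FMT, len(data)) + head + tail


@dataclass(frozen=True)
class HashData:
    file_info: bytes   # length chosen by the system
    hash_value: bytes  # fixed length, defined by the hash algorithm

    def to_bytes(self) -> bytes:
        # File information bits come first; the hash value bits follow them.
        return self.file_info + self.hash_value

    @classmethod
    def from_bytes(cls, raw: bytes, digest_len: int) -> "HashData":
        # The hash value has a fixed length, so it can be split off the end;
        # everything before it is the file information.
        return cls(raw[:-digest_len], raw[-digest_len:])


src = b"hello, hash data structure"
hd = HashData(make_file_info(src), hashlib.sha1(src).digest())
raw = hd.to_bytes()
print(len(raw))  # 36: 8 + 4 + 4 bytes of file information, 20 bytes of SHA-1
```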
According to another aspect of the present invention for achieving the above objects, there is provided a hash data generation method for generating hash data for each source file to be compared, the method comprising the steps of: (a) inspecting attributes of each source file and generating, based on the inspected attributes, file information consisting of predetermined data bits; (b) calculating a hash value by applying a hashing algorithm to at least a part of the source file; and (c) generating hash data by appending the hash value directly after the file information.
In an embodiment, step (a) may include: inspecting at least one of the size, name, and format of the source file, first-part data containing the data at the beginning of the source file, and second-part data containing the data at the end of the source file; and generating the file information containing at least one of these items.
In an embodiment, the hash data generation method may further include a step (d) of generating hash parity bits for the hash data.
In an embodiment, step (d) may include: generating first parity bits for the file information; generating second parity bits for the hash value; and generating the hash parity bits by concatenating the first parity bits and the second parity bits.
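Steps (a) to (d) might look like this in a minimal Python sketch. The patent does not specify how the parity bits are computed; here each part gets a single even-parity bit, and the two bits are concatenated with the file-information bit first. The field sizes, the default SHA-256 algorithm, and the function names are assumptions for illustration.

```python
import hashlib
import struct

PART_LEN = 4  # hypothetical length of the first-part / second-part data


def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if `data` contains an odd number of 1 bits."""
    return bin(int.from_bytes(data, "big")).count("1") % 2


def generate_hash_data(data: bytes, algorithm: str = "sha256") -> tuple[bytes, int]:
    # (a) inspect attributes and build fixed-layout file information
    file_info = (struct.pack(">Q", len(data))
                 + data[:PART_LEN].ljust(PART_LEN, b"\0")
                 + data[-PART_LEN:].rjust(PART_LEN, b"\0"))
    # (b) apply the hashing algorithm to the source data
    hash_value = hashlib.new(algorithm, data).digest()
    # (c) append the hash value directly after the file information
    hash_data = file_info + hash_value
    # (d) first parity bit covers the file information, second covers the
    #     hash value; concatenated into a 2-bit value (first bit high)
    hash_parity = (parity_bit(file_info) << 1) | parity_bit(hash_value)
    return hash_data, hash_parity


hash_data, hash_parity = generate_hash_data(b"example source file")
print(len(hash_data), format(hash_parity, "02b"))
```

Keeping the two parity bits separate means a receiver can check the file information on its own, which matters because a comparison may finish without ever reading the hash value.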
According to another aspect of the present invention for achieving the above objects, there is provided a hash data generation method for generating hash data for each source file to be compared, the method comprising the steps of: (a) generating a structure header containing structural information about each of the hash value and the file information included in the hash data structure; (b) inspecting attributes of each source file and generating, based on the inspected attributes, file information consisting of predetermined data bits; (c) generating a hash value by applying a hashing algorithm to at least a part of the source file; and (d) generating hash data by appending the hash value directly after the file information.
According to another aspect of the present invention for achieving the above objects, there is provided a hash data comparison method for comparing two source files with each other using hash data that comprise file information and a hash value, the method comprising the steps of: (a) checking the two pieces of hash data associated respectively with the two source files; (b) comparing the two pieces of file information contained in the two pieces of hash data; and (c) if the two pieces of file information are identical, comparing the two hash values contained in the two pieces of hash data and, if the two hash values are identical, determining that the two source files are the same file.
In an embodiment, the file information may include at least one of the size, name, and format of the corresponding source file, first-part data containing the data at the beginning of the source file, and second-part data containing the data at the end of the source file.
In an embodiment, step (b) may include comparing, one by one, the data bits that make up the two pieces of file information.
In an embodiment, step (b) may include: for each of the two pieces of file information, identifying at least one of the size, name, and format of the source file, the first-part data, and the second-part data contained in that file information; and comparing the two pieces of file information on the basis of the identified items.
According to another aspect of the present invention for achieving the above objects, there is provided a hash data comparison method for comparing two source files using hash data that comprise file information, a hash value, and a structure header containing structural information about each of the file information and the hash value, the method comprising the steps of: (a) comparing the structure headers of the two source files and determining whether the hash data have the same structure; (b) if the hash data are determined to have the same structure, comparing the two pieces of file information associated respectively with the two source files; and (c) if the two pieces of file information are identical, comparing the hash values associated respectively with the two source files and, if the hash values are identical, determining that the two source files are the same file.
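The staged comparison of the two comparison aspects above can be sketched as one Python function, assuming the hash data are held in memory as byte strings. The optional `header` field and the names `HashData` and `same_file` are hypothetical. The property the sketch illustrates is ordering: the short file information is checked before the hash values, and the structure header (when present) before both.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HashData:
    file_info: bytes
    hash_value: bytes
    header: Optional[bytes] = None  # structure header, if the embodiment uses one


def same_file(a: HashData, b: HashData) -> bool:
    """Staged comparison: structure header, then file information, then hash value."""
    # Header variant, step (a): hash data with different structures
    # (different element layouts or hash algorithms) are not compared further.
    if a.header != b.header:
        return False
    # Step (b): cheap comparison of the short file information.
    if a.file_info != b.file_info:
        return False  # the hash values are never examined
    # Step (c): only now compare the hash values.
    return a.hash_value == b.hash_value


x = HashData(b"\x00\x10ABCD", b"\x11" * 20)
y = HashData(b"\x00\x10ABCD", b"\x22" * 20)   # same file information, new content
z = HashData(b"\x00\x11ABCE", b"\x11" * 20)   # different file information
print(same_file(x, x), same_file(x, y), same_file(x, z))  # True False False
```

Returning `False` on a header mismatch only means the two files were not shown to be identical; a real system might instead regenerate one of the hash data with the other's structure.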
According to another aspect of the present invention for achieving the above objects, there is provided a hash data comparison system for comparing source files with one another using hash data that comprise file information and a hash value, the system comprising: a file information generation unit configured to inspect attributes of each source file and generate file information related to the source file; a hash generation unit configured to calculate a hash value by applying a hash function algorithm to at least a part of the source file; and a control unit configured to generate, for the corresponding source file, hash data comprising the file information and the hash value.
In an embodiment, the hash data comparison system may further include a hash file management unit configured to store the generated hash data and to maintain information about the source file associated with each stored hash value.
In an embodiment, the control unit may determine whether a first source file and a second source file are identical by sequentially comparing their file information and then their hash values.
In an embodiment, the control unit may generate a structure header containing identification information about the file information and the hash value, and may generate the hash data comprising the structure header, the file information, and the hash value.
In an embodiment, the control unit may sequentially compare the structure headers, the file information, and the hash values of the first source file and the second source file, and determine that the first and second source files are the same file if their structure headers, file information, and hash values are all identical.
In an embodiment, the control unit may generate parity bits for the hash data, the parity bits comprising parity bits calculated separately for the file information and for the hash value.
Description of drawings
The above and other objects and features of the present invention will be more readily understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a reference diagram illustrating an embodiment of a hash data structure according to the present invention;
Fig. 2 is a reference diagram illustrating another embodiment of a hash data structure according to the present invention;
Fig. 3 is a reference diagram illustrating yet another embodiment of a hash data structure according to the present invention;
Fig. 4 is a block diagram illustrating an embodiment of a hash comparison system according to the present invention;
Fig. 5 is a flowchart illustrating an embodiment of a hash data generation method that can be performed by the hash comparison system of Fig. 4;
Fig. 6 is a flowchart illustrating another embodiment of a hash data generation method that can be performed by the hash comparison system of Fig. 4;
Fig. 7 is a flowchart illustrating an embodiment of a hash data comparison method that can be performed by the hash comparison system of Fig. 4;
Fig. 8 is a flowchart illustrating another embodiment of a hash data comparison method that can be performed by the hash comparison system of Fig. 4; and
Fig. 9 is a block diagram illustrating another embodiment of a hash comparison system according to the present invention.
Embodiment
The technology disclosed in the present invention is merely an embodiment for structural or functional description, so the scope of the disclosed technology should not be construed as limited by the embodiments described in this specification. That is, the embodiments may be modified in various ways and may take various forms, and the scope of the disclosed technology should be understood to include equivalents capable of realizing the technical spirit of the present invention.
Meanwhile, the terms used in this specification should be understood as follows.
Terms such as "first" and "second" are used only to distinguish one component from another, and the scope of the present invention should not be limited by these terms. For example, a first component may be designated a second component and, similarly, a second component may be designated a first component.
Throughout the specification, a statement that a first component is "connected" to a second component should be understood to cover both the case where another component is interposed between them and the case where the first component is "directly connected" to the second component. In contrast, a statement that a first component is "directly connected" to a second component means that no component is interposed between them. Other expressions describing relationships between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same manner.
Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprising" or "having" are intended only to indicate the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not exclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
Reference symbols for steps (such as a, b, c) are used for convenience of description and do not indicate the order of the steps; the steps may occur in an order different from that described in the specification unless a specific order is clearly stated in the context. That is, the steps may occur in the described order, substantially simultaneously, or in the reverse order.
Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by those of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.
In the following description, the term "source file" denotes a file to which the hash data structure is applied. Like a typical hash value, the hash data structure provided by the present invention has a distinct value for each source file.
Fig. 1 is a reference diagram illustrating an embodiment of a hash data structure according to the present invention.
Referring to Fig. 1, the hash data structure 100 includes file information 110 and a hash value 120. More specifically, the hash data structure 100 may be constructed so that the bits corresponding to the hash value follow the data bits of the file information 110 about the source file.
In one or more systems according to the embodiments described below, the file information 110 may have different lengths. That is, the file information 110 need not consist of a specific number of data bits, but may consist of data bits corresponding to a predetermined size chosen according to the system settings or the requirements of the environment.
The file information 110 may include a source file size value 111, first-part data 112, and second-part data 113. The source file size value 111 is data indicating the size of the source file.
The first-part data 112 is the portion of the source file of a predetermined length starting from its first bit, and the second-part data 113 is the portion of the source file of a predetermined length ending at its last bit. The lengths of the first-part data 112 and the second-part data 113 may be determined differently depending on the file comparison system, so the present invention is not limited by these lengths.
The hash value 120 is data obtained by applying a hashing algorithm to the source file. In an embodiment, the hash value 120 may have a specific number of bits. That is, whereas the file information 110 is constructed so that the elements it contains and their sizes can vary, the hash value 120 may be restricted to a specific size (number of data bits), such as a standardized size. For example, the hash value 120 has 160 bits for the SHA-0 or SHA-1 algorithm, 256 or 224 bits for the SHA-256/224 algorithms, and 512 or 384 bits for the SHA-512/384 algorithms. In other words, even when the hash value 120 is used in one or more systems according to the embodiments, it consists of data bits of a specific length. Because the hash value is preferably determined according to a standard, it can be restricted to data bits of a specific size.
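The fixed digest lengths quoted above can be checked with Python's standard `hashlib` module. SHA-0 is omitted because it is not among the algorithms `hashlib` guarantees; the dictionary name `DIGEST_BITS` is illustrative.

```python
import hashlib

# Output length in bits of the SHA family members named in the text.
DIGEST_BITS = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha1", "sha224", "sha256", "sha384", "sha512")
}

for name, bits in DIGEST_BITS.items():
    print(f"{name}: {bits} bits")
```

Because the length depends only on the algorithm, a reader that knows which algorithm was used (for example from a structure header) can locate the hash value without any length field of its own.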
When files are compared, the file information 110 is compared before the hash value 120. For example, consider searching for file A among files A to C. Files A to C are compared using the file information relevant to file A, so that file A can be identified. Because non-matching files can be ruled out using the file information 110 alone, their hash values need not be compared, and the desired file can be found more quickly using fewer resources.
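One way to exploit this ordering when searching among many files, sketched here under our own assumptions rather than as the patent's method, is to index stored hash data by file information, so that hash values are compared only for entries whose file information already matches. The entry tuples, the placeholder byte strings, and the function names `build_index` and `find` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stored entries: (file name, file information, hash value).
Entry = tuple[str, bytes, bytes]


def build_index(entries: list[Entry]) -> dict[bytes, list[tuple[str, bytes]]]:
    """Group stored hash data by their file information."""
    index: dict[bytes, list[tuple[str, bytes]]] = defaultdict(list)
    for name, file_info, hash_value in entries:
        index[file_info].append((name, hash_value))
    return index


def find(index, file_info: bytes, hash_value: bytes) -> tuple[list[str], int]:
    """Return names identical to the target and the number of hash comparisons made."""
    candidates = index.get(file_info, [])  # .get does not insert into a defaultdict
    matches = [name for name, h in candidates if h == hash_value]
    return matches, len(candidates)


stored = [
    ("A", b"size=10|head=PK", b"hash-A"),
    ("B", b"size=99|head=MZ", b"hash-B"),
    ("C", b"size=42|head=GI", b"hash-C"),
]
index = build_index(stored)
print(find(index, b"size=10|head=PK", b"hash-A"))  # (['A'], 1)
```

Only one hash comparison is made for three stored files, because B and C are excluded by their file information.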
Fig. 2 is a reference diagram illustrating another embodiment of a hash data structure according to the present invention. Compared with the embodiment of Fig. 1, the hash data structure shown in Fig. 2 further includes a structure header 130.
In an embodiment, the structure header 130 may include information about the hash function used to calculate the hash value 120, for example, SHA-0 or SHA-1.
In an embodiment, the file information 110 may include any one or more of the three kinds of data 111 to 113 shown in the drawing, and the structure header 130 may provide information about the data included in the file information 110.
For example, suppose that the file size information, the first-part data, and the second-part data are denoted A, B, and C, respectively, that the file size information has a fixed size of two bytes, and that the structure header 130 is "6AB". In this case, "6" in the structure header 130 indicates the total number of bytes of the file information 110, and "AB" indicates that the file information 110 consists of the file size information 111 and the first-part data 112, so the first-part data 112 occupies the remaining four bytes.
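A minimal Python parser for the header format in this example is sketched below. The example fixes only the size field (two bytes) and the meaning of the digit and the letters; the regular expression, the function names, the A-then-B-then-C byte order, and the rule that B and C split any remaining bytes equally are assumptions.

```python
import re

SIZE_LEN = 2  # the example fixes the file size information (A) at two bytes


def parse_header(header: str) -> tuple[int, str]:
    """Split a header such as "6AB" into (total file-information bytes, element codes)."""
    m = re.fullmatch(r"(\d+)([ABC]+)", header)
    if m is None:
        raise ValueError(f"malformed structure header: {header!r}")
    return int(m.group(1)), m.group(2)


def split_file_info(header: str, file_info: bytes) -> dict[str, bytes]:
    """Cut the file information into the elements named by the header."""
    total, codes = parse_header(header)
    if len(file_info) != total:
        raise ValueError("file information length does not match the header")
    fields: dict[str, bytes] = {}
    pos = 0
    if "A" in codes:
        fields["A"] = file_info[:SIZE_LEN]
        pos = SIZE_LEN
    parts = [c for c in "BC" if c in codes]
    if parts:
        # Assumption: B and C share the remaining bytes equally.
        each = (total - pos) // len(parts)
        for code in parts:
            fields[code] = file_info[pos:pos + each]
            pos += each
    return fields


fields = split_file_info("6AB", b"\x00\x1aPK\x03\x04")
print(fields)                              # {'A': b'\x00\x1a', 'B': b'PK\x03\x04'}
print(int.from_bytes(fields["A"], "big"))  # 26
```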
The hash data structure 100 of the embodiment of Fig. 2 can also be applied where file information 110 of different lengths is used within a single system, because the structure header 130 makes it possible to determine the bits of each element of the hash data structure 100 individually.
Fig. 3 is a reference diagram illustrating yet another embodiment of a hash data structure according to the present invention. Compared with the embodiment of Fig. 1, the hash data structure shown in Fig. 3 further includes parity information 140.
In an embodiment, the parity information 140 may include (i) parity bits for the file information 110 and (ii) parity bits for the hash value 120. A separate parity value is determined for each part because, when files are compared, the present invention may complete the comparison using only the file information 110.
When files or the like are transmitted, the embodiment of Fig. 3 allows error checking to be performed and files to be compared with each other more efficiently.
Fig. 4 is a block diagram illustrating an embodiment of a hash comparison system according to the present invention.
In an embodiment, the file information generation unit 210 may generate the above-described first-part data and second-part data by reading data bits of a preset length starting from the first bit of the source file and ending at the last bit of the source file, respectively. The preset length may correspond to the sizes of the first-part data and second-part data of the corresponding hash data structure.
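Reading the first-part and second-part data can be sketched in Python at byte rather than bit granularity, which is an assumption made for simplicity. The function name `read_head_tail` is hypothetical; seeking near the end avoids reading the whole file just to obtain its last bytes.

```python
import os
import tempfile


def read_head_tail(path: str, n: int) -> tuple[bytes, bytes]:
    """Return the first n bytes and the last n bytes of the file at `path`."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(n)            # first-part data
        f.seek(max(size - n, 0))    # jump to n bytes before the end
        tail = f.read(n)            # second-part data
    return head, tail


# Usage with a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
head, tail = read_head_tail(tmp.name, 3)
os.remove(tmp.name)
print(head, tail)  # b'012' b'789'
```

For files shorter than 2n bytes the two parts overlap, which does no harm when they are used only for comparison.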
In an embodiment, the hash generation unit 220 has a plurality of hash functions and, in response to a request from the control unit 250, may generate a hash value for a source file using a specific hash function.
In an embodiment, the hash generation unit 220 may generate a hash value for only part of the source file. For example, when the size of the source file is equal to or greater than a predetermined value, the hash generation unit 220 may generate the hash value for the part of the source file corresponding to a preset size. In another embodiment, the hash generation unit 220 may generate the hash value only for the remainder of the source file, excluding the first-part data and the second-part data.
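The two partial-hashing variants can be sketched as follows. Which portion is hashed in the first variant is not specified in the text; hashing the leading `preset_size` bytes is an assumption, as are the function names `hash_prefix` and `hash_middle` and the SHA-256 default.

```python
import hashlib


def hash_prefix(data: bytes, threshold: int, preset_size: int,
                algorithm: str = "sha256") -> bytes:
    """If data is at least `threshold` bytes long, hash only its first
    `preset_size` bytes (the choice of the leading portion is an assumption)."""
    portion = data[:preset_size] if len(data) >= threshold else data
    return hashlib.new(algorithm, portion).digest()


def hash_middle(data: bytes, part_len: int, algorithm: str = "sha256") -> bytes:
    """Hash only the bytes between the first-part and second-part data,
    since those two parts are already carried in the file information."""
    middle = data[part_len:len(data) - part_len] if len(data) > 2 * part_len else b""
    return hashlib.new(algorithm, middle).digest()


big = bytes(range(256)) * 4  # 1024-byte sample
print(hash_prefix(big, threshold=512, preset_size=64).hex())
print(hash_middle(big, part_len=16).hex())
```

The trade-off is coverage: with `hash_prefix`, a change beyond the first `preset_size` bytes of a large file leaves the hash value unchanged, whereas `hash_middle` relies on the file information to cover the head and tail it skips.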
The hash file management unit 230 may manage source files and the hash files (structures) corresponding to them. For example, the hash file management unit 230 stores hash files and maintains information (e.g., link information) about the source file matched to each hash file.
The source file management unit 240 may store source files and maintain a history record for each source file. For example, if a hash comparison of file A determines that file A has changed, file A and its hash history may be stored in the source file management unit 240.
In an embodiment, the control unit 250 may generate a hash data structure (file) for each source file. More specifically, the control unit 250 may provide a given source file to both the file information generation unit 210 and the hash generation unit 220, and generate the hash data structure using the hash value and file information received in response. Embodiments relating to generation of the hash data structure are described in further detail with reference to Figs. 5 and 6.
In an embodiment, the control unit 250 may compare two source files using their hash data structures. The hash data structure according to the present invention is divided into file information and a hash value, and source files are compared using this structural feature. More specifically, the control unit 250 analyzes the hash data structures of the source files to be compared and determines, using the file information of the hash data structures, whether the source files are the same file. If the source files are determined to be the same file, the control unit 250 checks, using the hash values of the hash data structures, whether the source files have the same content. Because the present invention first uses the file information to determine whether the files are the same file and compares hash values only if they are, the comparison is faster.
In an embodiment, when comparing file information, the control unit 250 may compare the data bits making up each piece of file information one by one. In another embodiment, the control unit 250 may identify each element making up the file information and compare the file information by comparing the identified elements with each other. That is, for each piece of file information, at least one of the size, name, format, first-part data, and second-part data of the source file contained in that file information is identified, and the identified elements may be compared with the corresponding elements of the other file information.
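The two comparison embodiments differ in what they can report. In the hedged Python sketch below, with hypothetical element names, sample values, and function names, bitwise comparison only says whether the encodings match, while element-wise comparison identifies which attribute differs.

```python
def compare_bitwise(a: bytes, b: bytes) -> bool:
    """First embodiment: compare the encoded file information bit for bit."""
    return a == b


def compare_elements(a: dict[str, bytes], b: dict[str, bytes]) -> list[str]:
    """Second embodiment: compare identified elements and list those that differ."""
    names = sorted(set(a) | set(b))
    return [name for name in names if a.get(name) != b.get(name)]


old = {"size": b"\x00\x1a", "name": b"setup.cfg", "head": b"[met"}
new = {"size": b"\x00\x1a", "name": b"setup.ini", "head": b"[met"}
print(compare_bitwise(b"".join(old.values()), b"".join(new.values())))  # False
print(compare_elements(old, new))  # ['name']
```

Element-wise comparison also tolerates file information whose elements were encoded in a different order, at the cost of first decoding each element.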
In an embodiment, the control unit 250 may provide a generated hash file, together with information about the source file associated with it, to the hash file management unit 230 so that the hash file can be managed. The control unit 250 provides the generated hash file to the hash file management unit 230, where it is stored. When a request for another operation, such as a hash comparison, is received, the hash file corresponding to a particular source file may be provided from the hash file management unit 230 to the control unit 250 so that the requested operation can be performed.
In an embodiment, the control unit 250 may control the source file management unit 240 to generate a history record for a source file. For example, when a patch or the like is applied to the same source file, a patch history may be required. In this example, the control unit 250 (i) determines from the file information, as a result of comparing the source files, whether they are the same source file, and (ii) if the hash values show that the file content has changed, provides information about the corresponding source file and the associated hash data structure to the source file management unit 240 so that a history record can be generated.
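A history record of this kind can be sketched as a per-file list of hash values. The class `SourceFileHistory`, the string `file_key` (standing for the identity established from the file information), the sample file name, and the choice to store only hash values are assumptions; in the text the source file management unit also stores the source file itself.

```python
from dataclasses import dataclass, field


@dataclass
class SourceFileHistory:
    """Hypothetical stand-in for the source file management unit 240."""
    records: dict[str, list[bytes]] = field(default_factory=dict)

    def observe(self, file_key: str, hash_value: bytes) -> bool:
        """Store hash_value as a new version of file_key unless it matches the
        latest stored version; return True if a version was stored."""
        versions = self.records.setdefault(file_key, [])
        if versions and versions[-1] == hash_value:
            return False  # same file, same content: nothing to record
        versions.append(hash_value)
        return True


history = SourceFileHistory()
history.observe("app.bin", b"\x01" * 20)            # initial version
changed = history.observe("app.bin", b"\x02" * 20)  # content changed by a patch
print(changed, len(history.records["app.bin"]))     # True 2
```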
In embodiment, control module 250 can be for each hash data structural generation structure head.More specifically, when providing fileinfo and hashed value by fileinfo generation unit 210 and hash generation unit 220 respectively, control module 250 can be for hash data structural generation structure head, thereby can identify fileinfo and hashed value.For example, control module 250 can generate the structure head that comprises the information that the element that comprises, the data length of each element, the length of hashed value etc. are indicated in fileinfo 110.In this embodiment, when mutual comparison of hashed data structure, then control module 250 at first analytical structure head and determines whether identical file of two source files to be compared based on fileinfo with identification fileinfo and hashed value.If determine that source file is identical file, then control module 250 can be by coming relatively mutually to determine the hashed value of file whether the content of file changes.
In an embodiment, the control module 250 may generate parity information and append it to each hash data structure. More specifically, the control module 250 may generate a parity bit for the file information and a parity bit for the hash value, and then generate parity information comprising these two parity bits. This embodiment is applicable, for example, when hash data structures are transmitted between different systems. Because the parity bits are calculated separately for the file information and for the hash value of a hash data structure, parity checks can be performed more quickly when hash data structures are compared with each other.
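The per-part parity described above can be sketched as follows, assuming even parity computed over all bits of each part. The function names and the two-bit packing are hypothetical.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if data contains an odd number of set bits."""
    return sum(bin(byte).count("1") for byte in data) % 2


def make_parity_info(file_info: bytes, hash_value: bytes) -> int:
    """Pack two parity bits: bit 1 covers the file information, bit 0 the hash value."""
    return (parity_bit(file_info) << 1) | parity_bit(hash_value)


def check_parity(file_info: bytes, hash_value: bytes, parity_info: int) -> tuple[bool, bool]:
    """Recompute each parity bit separately; returns (file_info_ok, hash_value_ok)."""
    return (
        ((parity_info >> 1) & 1) == parity_bit(file_info),
        (parity_info & 1) == parity_bit(hash_value),
    )
```

Because each bit is checked separately, a mismatch indicates which part was corrupted. A single parity bit detects only an odd number of flipped bits, so an even number of bit errors within the same part goes unnoticed.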
FIG. 5 is a flowchart showing an embodiment of a hash data generation method that may be performed by the hash comparison system of FIG. 4.
Referring to FIG. 5, in step S510, the file information generation unit 210 may check the attributes of each source file under the control of the control module 250. Here, the attributes are the data collected to generate the file information; as described above, they may include the file size, file name, file format, first partial data, second partial data, and so on.
In step S520, the file information generation unit 210 may generate file information based on the checked attributes of the source file. When hash data are compared with each other, the file information is used to determine whether the two source files being compared are the same file. As described above, the file information may include at least one of the file size, the first partial data, and the second partial data. Alternatively, the file information may include the file name or the file format. The file information generation unit 210 provides the generated file information to the control module 250.
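Steps S510 and S520 might look like the sketch below. It assumes file information made of the file size plus fixed-length first and second partial data (the leading and trailing bytes of the file). The 16-byte `PART_LEN`, the 8-byte size field, and zero padding for short files are hypothetical choices that give every record the same predetermined length.

```python
import os
import struct

PART_LEN = 16  # hypothetical length of the first/second partial data, in bytes


def make_file_info(path: str) -> bytes:
    """Build fixed-length file information for a source file.

    Layout (hypothetical): 8-byte big-endian file size, the first PART_LEN
    bytes, then the last PART_LEN bytes. Short files are zero-padded so every
    record is 8 + 2 * PART_LEN bytes long.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        first_part = f.read(PART_LEN)
        f.seek(max(size - PART_LEN, 0))
        second_part = f.read(PART_LEN)
    return (
        struct.pack(">Q", size)
        + first_part.ljust(PART_LEN, b"\0")
        + second_part.ljust(PART_LEN, b"\0")
    )
```

Only the two ends of the file are read, so this step stays cheap even for large files.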
In step S530, the hash generation unit 220 may generate a hash value for each source file under the control of the control module 250. In an embodiment, the hash generation unit 220 may support various hashing algorithms and may generate the hash value for a source file using the algorithm requested by the control module 250. In an embodiment, the hash generation unit 220 may generate the hash value from only a part of the source file under the control of the control module 250. The hash generation unit 220 provides the generated hash value to the control module 250.
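Step S530 can be sketched with Python's `hashlib`, where `hashlib.new` selects an algorithm by name. This mirrors a hash generation unit that supports several algorithms. The optional `limit` parameter, which hashes only the first `limit` bytes, is a hypothetical way of hashing "only a part of the source file".

```python
import hashlib
from typing import Optional


def make_hash(path: str, algorithm: str = "sha256", limit: Optional[int] = None) -> bytes:
    """Hash a source file with the algorithm named by `algorithm`.

    If `limit` is given, only the first `limit` bytes are hashed
    (a hypothetical way to hash only part of the file).
    """
    h = hashlib.new(algorithm)
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = 65536 if remaining is None else min(65536, remaining)
            chunk = f.read(size)
            if not chunk:
                break  # end of file
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.digest()
```

Hashing only part of a file is faster, but changes outside the hashed range are not reflected in the hash value.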
In step S540, the control module 250 may generate hash data from the file information and the hash value. The control module 250 can generate the hash data by appending the data bits of the hash value directly after the data bits of the file information. In this embodiment, the control module 250 knows in advance at which bit, counting from the first, the file information ends. To make this possible, when the control module 250 directs the file information generation unit 210 and the hash generation unit 220 to generate the file information and the hash value, it can include information about the data sizes in its request.
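In the FIG. 5 layout there is no header, so the boundary between the file information and the hash value must be agreed in advance. A minimal sketch, assuming a hypothetical fixed file-information length of 40 bytes:

```python
FILE_INFO_LEN = 40  # hypothetical fixed file-information length, agreed in advance


def make_hash_data(file_info: bytes, hash_value: bytes) -> bytes:
    """Step S540 (FIG. 5 layout): append the hash value directly after the file information."""
    if len(file_info) != FILE_INFO_LEN:
        raise ValueError(f"file information must be {FILE_INFO_LEN} bytes")
    return file_info + hash_value


def split_hash_data(hash_data: bytes) -> tuple[bytes, bytes]:
    """Recover (file_info, hash_value) using the boundary known in advance."""
    return hash_data[:FILE_INFO_LEN], hash_data[FILE_INFO_LEN:]
```

The length check enforces the sizes that the control module specified when it requested generation. Without that check, the split would silently misplace the boundary.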
FIG. 6 is a flowchart showing another embodiment of the hash data generation method that may be performed by the hash comparison system of FIG. 4. In the embodiment of FIG. 6, hash data are generated using the structure header described above. The embodiment of FIG. 6 is obtained by adding predetermined processing to the embodiment of FIG. 5, so steps that are the same as or similar to those of FIG. 5 are described only briefly.
Referring to FIG. 6, in step S610, the control module 250 may determine in advance the elements to be included in the file information. That is, the control module 250 may determine in advance the types and sizes of the elements to be included in the file information and keep the information about the structure of the file information. The control module 250 may then request the file information generation unit 210 to generate the file information, including the information about the determined elements in the request.
In step S640, the hash generation unit 220 may generate a hash value for the source file under the control of the control module 250 and provide this hash value to the control module 250.
In step S650, the control module 250 may generate a structure header for the file information and the hash value. As described above, the structure header may contain information about the structure of the hash data. The structure header is used so that the file information and the hash value within the hash data can be separated and each of them then compared individually. In an embodiment, the control module 250 may generate the structure header before the file information and the hash value are generated. This is possible because the structure of the file information and the hash value (for example, the sizes of the elements of the file information and the sizes of the file information and the hash value) is specified when their generation is requested, so the structure header can be generated even before the file information and the hash value have been received. In another embodiment, the control module 250 may receive the file information and the hash value individually and generate the structure header for them afterwards. That is, when the file information generation unit 210 and the hash generation unit 220 generate the file information and the hash value independently of each other, the control module 250 may receive them separately and then generate the structure header.
Once the structure header has been generated, the control module 250 may generate hash data based on the structure header, the file information, and the hash value in step S660.
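The variant in which the structure header is generated before the file information and the hash value arrive can be sketched as follows. The header layout (a 1-byte element count, a 2-byte size per file-information element, and a 2-byte hash size) is hypothetical. The names `header_for` and `assemble` are illustrative and do not come from the source.

```python
import struct


def header_for(element_sizes: list[int], hash_size: int) -> bytes:
    """S610/S650: build the structure header from the sizes requested up front.

    Hypothetical layout (big-endian): 1-byte element count, 2-byte size per
    file-information element, 2-byte hash size.
    """
    return struct.pack(f">B{len(element_sizes)}HH", len(element_sizes), *element_sizes, hash_size)


def assemble(header: bytes, elements: list[bytes], hash_value: bytes) -> bytes:
    """S660: hash data = structure header + file-information elements + hash value.

    Rejects data whose sizes differ from those the header was built from.
    """
    count = header[0]
    *sizes, hash_size = struct.unpack_from(f">{count}HH", header, 1)
    if [len(e) for e in elements] != sizes or len(hash_value) != hash_size:
        raise ValueError("generated data do not match the structure header")
    return header + b"".join(elements) + hash_value
```

The header depends only on the requested sizes, which is why it can exist before either generation unit has returned any data.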
FIG. 7 is a flowchart showing an embodiment of a hash data comparison method that may be performed by the hash comparison system of FIG. 4. The hash data comparison method of FIG. 7 corresponds to the hash data generation method of FIG. 5.
Referring to FIG. 7, in step S710, the control module 250 may select the two pieces of hash data associated with the two source files to be compared. In an embodiment that includes the hash file management unit 230, the control module 250 may request the hash data for the two source files from the hash file management unit 230 and obtain the hash data from it.
In step S720, the control module 250 may check the structure of the two selected pieces of hash data. That is, the control module 250 may check which part of each piece of hash data corresponds to the file information and which part corresponds to the hash value.
In step S730, the control module 250 may compare the file information included in the two pieces of hash data and thereby first determine whether the two source files are the same file. For example, when the file name, file size, and so on are included in the file information, the file information can be used first to determine whether the two source files are the same file, and the file content can be examined afterwards. The present invention is thus designed first to determine the identity of the objects, that is, whether the two files being compared are the same file. If they are determined to be the same object, it then determines the identity of their content, that is, whether the content of the two objects is the same, which completes the comparison.
If the two pieces of file information are identical in step S740 ("Yes"), the control module 250 may compare the hash values associated with the two source files in step S750.
If the hash values are also identical in step S760 ("Yes"), the two source files are determined to be the same file in step S770.
If the two pieces of file information differ in step S740 ("No"), or the hash values differ in step S760 ("No"), the two source files may be determined to be different files in step S771.
In the steps above, the control module 250 compares file information or hash values by checking the data bits of the objects being compared. Consequently, when the file information alone shows that the source files are different, the number of data bits that must be compared is greatly reduced. The invention can therefore compare files efficiently when they must be compared in a 1:N relationship, for example when searching a large set of files for files identical to a particular source file.
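The FIG. 7 flow and its use in a 1:N search can be sketched as follows. For illustration, each piece of hash data is assumed to be already split into a `(file_info, hash_value)` pair. The function names are hypothetical.

```python
Pair = tuple[bytes, bytes]  # (file_info, hash_value)


def same_file(a: Pair, b: Pair) -> bool:
    """FIG. 7: compare file information first, hash values only if it matches."""
    info_a, hash_a = a
    info_b, hash_b = b
    if info_a != info_b:  # S740 "No": different files, hashes never examined
        return False
    return hash_a == hash_b  # S750/S760


def find_identical(target: Pair, candidates: list[Pair]) -> list[int]:
    """1:N search: indices of candidates identical to target."""
    return [i for i, c in enumerate(candidates) if same_file(target, c)]
```

Because `same_file` returns as soon as the file information differs, candidates with different file information never have their hash values examined.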
FIG. 8 is a flowchart showing another embodiment of the hash data comparison method that may be performed by the hash comparison system of FIG. 4. The hash data comparison method of FIG. 8 corresponds to the hash data generation method of FIG. 6, so the hash data in FIG. 8 further include a structure header. Steps that are the same as or similar to those of the embodiment of FIG. 7 are therefore described only briefly.
Referring to FIG. 8, in step S810, the control module 250 may select the two pieces of hash data associated with the two source files to be compared.
In step S820, the control module 250 may check the structure headers of the two selected pieces of hash data and then analyze them. As described above, each structure header contains the content and length of the file information included in the corresponding hash data, the length of the hash value, and so on. The control module 250 can therefore identify each element of the hash data by analyzing the structure header.
In step S850, the control module 250 may compare the file information included in the two pieces of hash data and thereby first determine whether the two source files are the same file.
If the two pieces of file information are identical in step S860 ("Yes"), the control module 250 may compare the hash values associated with the two source files in step S870.
If the hash values are identical in step S880 ("Yes"), the two source files are determined to be the same file in step S890.
If the structure headers differ in step S830 ("No"), the file information differs in step S860 ("No"), or the hash values differ in step S880 ("No"), the two source files may be determined to be different files in step S891.
In the embodiment of FIG. 8, the structure header can be used to identify the file information and the hash value that make up the hash data. This embodiment may be more effective in systems where the file information and the hash value are used in different ways. In addition, because the identity of the files can be judged from the structure headers alone in step S830, it can be determined more quickly and accurately, which makes the comparison efficient.
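The FIG. 8 sequence can be sketched as below: analyze the headers, compare the headers, then compare the file information, then the hash values. The header layout (a 1-byte element count followed by 2-byte element lengths and a 2-byte hash length) is a hypothetical choice for illustration.

```python
import struct


def split_hash_data(hash_data: bytes) -> tuple[bytes, bytes, bytes]:
    """S820: analyze the header and split hash data into (header, file_info, hash_value).

    Hypothetical header layout (big-endian): 1-byte element count, 2-byte length
    per file-information element, 2-byte hash length.
    """
    count = hash_data[0]
    header_len = 1 + 2 * (count + 1)
    *sizes, hash_len = struct.unpack_from(f">{count}HH", hash_data, 1)
    info_end = header_len + sum(sizes)
    return (
        hash_data[:header_len],
        hash_data[header_len:info_end],
        hash_data[info_end:info_end + hash_len],
    )


def same_file(a: bytes, b: bytes) -> bool:
    """FIG. 8: compare structure headers, then file information, then hash values."""
    header_a, info_a, hash_a = split_hash_data(a)
    header_b, info_b, hash_b = split_hash_data(b)
    if header_a != header_b:  # S830 "No"
        return False
    if info_a != info_b:  # S850/S860 "No"
        return False
    return hash_a == hash_b  # S870/S880
```

With this layout, two files whose file-information elements have different lengths are rejected at the header comparison, before any file information is read.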
FIG. 9 is a block diagram showing another embodiment of the hash comparison system according to the present invention. The hash comparison system shown in FIG. 9 can be applied where files are compared in a 1:N relationship. The system is configured to first compare only the file information, form a first comparison group from the files that have the same file information, and then compare only the hash values of the files belonging to the first comparison group.
Referring to FIG. 9, the hash comparison system 200 includes the file information generation unit 210, the hash generation unit 220, the control module 250, and a hash comparing unit 260. In an embodiment, the hash comparison system 200 may further include at least one of the hash file management unit 230 and the source file management unit 240. In the description of the embodiment of FIG. 9, components that are the same as or similar to those of the embodiment of FIG. 4 are not described again or are described only briefly.
The hash comparing unit 260 may compare only hash values with each other under the control of the control module 250. In the disclosed embodiment, a separate hash comparing unit 260 is provided to compare hash values only, so comparisons can be performed more efficiently where a search in a 1:N relationship is required.
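The two-stage comparison of FIG. 9 can be sketched for a batch of files. Files are first bucketed by file information, and hash values are compared only inside buckets that have more than one member. The input format (a mapping from a file name to a `(file_info, hash_value)` pair) and the function name are hypothetical.

```python
from collections import defaultdict


def find_duplicates(records: dict[str, tuple[bytes, bytes]]) -> list[list[str]]:
    """Return groups of file names whose file information and hash values both match.

    Pass 1 buckets files by file information (the "first comparison group");
    pass 2 compares hash values only inside each bucket.
    """
    by_info = defaultdict(list)
    for name, (info, _) in records.items():
        by_info[info].append(name)

    duplicates = []
    for names in by_info.values():
        if len(names) < 2:
            continue  # unique file information: no hash comparison needed
        by_hash = defaultdict(list)
        for name in names:
            by_hash[records[name][1]].append(name)
        duplicates.extend(group for group in by_hash.values() if len(group) > 1)
    return duplicates
```

Hash values of files with unique file information are never compared. This is the saving the separate hash comparing stage is meant to provide.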
According to the technology disclosed in the present invention, it can be determined whether files differ before their hash values are compared. The entire hash data of different files therefore need not be compared, which has the advantage that files can be compared more quickly.
A further advantage of the disclosed technology is that parity information, consisting of a parity bit for the file information and a parity bit for the hash value, can be used to check whether each of the file information and the hash data structure has been constructed correctly.
Although preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions, and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims.
Claims (14)
1. A hash data generation method for generating, for each source file, hash data to be used for comparing source files, the method comprising the steps of:
(a) checking attributes of each source file and, based on the checked attributes, generating file information consisting of predetermined data bits;
(b) calculating a hash value by applying a hashing algorithm to at least a portion of the source file; and
(c) generating the hash data by contiguously appending the hash value to the file information.
2. The hash data generation method according to claim 1, wherein step (a) comprises:
checking at least one of the size, name, and format of the source file, first partial data comprising the beginning data of the source file, and second partial data comprising the last data of the source file; and
generating the file information comprising at least one of the size, name, and format of the source file, the first partial data comprising the beginning data of the source file, and the second partial data comprising the last data of the source file.
3. The hash data generation method according to claim 1, further comprising step (d): generating hash parity bits for the hash data.
4. The hash data generation method according to claim 3, wherein step (d) comprises:
generating a first parity bit for the file information;
generating a second parity bit for the hash value; and
generating the hash parity bits by contiguously concatenating the first parity bit and the second parity bit.
5. A hash data comparison method for comparing two source files with each other using hash data comprising file information and a hash value, the method comprising the steps of:
(a) checking two pieces of hash data associated with the two source files, respectively;
(b) comparing the two pieces of file information included in the two pieces of hash data with each other; and
(c) if the two pieces of file information are identical, comparing the two hash values included in the two pieces of hash data with each other, and, if the two hash values are identical, determining that the two source files are the same file.
6. The hash data comparison method according to claim 5, wherein the file information comprises at least one of the size, name, and format of the respective source file, first partial data comprising the beginning data of the source file, and second partial data comprising the last data of the source file.
7. The hash data comparison method according to claim 6, wherein step (b) comprises comparing with each other the data bits constituting the two pieces of file information.
8. The hash data comparison method according to claim 6, wherein step (b) comprises:
for each of the two pieces of file information, identifying at least one of the size, name, and format of the source file, the first partial data comprising the beginning data of the source file, and the second partial data comprising the last data of the source file, as included in that file information; and
comparing the two pieces of file information with each other with respect to the identified at least one of the size, name, and format of the source file, the first partial data comprising the beginning data of the source file, and the second partial data comprising the last data of the source file.
9. A hash data comparison system for comparing source files with each other using hash data comprising file information and a hash value, the system comprising:
a file information generation unit configured to check attributes of each source file and to generate file information relating to the source file;
a hash generation unit configured to calculate a hash value by applying a hash function algorithm to at least a portion of the source file; and
a control module configured to generate hash data for the respective source file, the hash data comprising the file information and the hash value.
10. The hash data comparison system according to claim 9, further comprising a hash file management unit configured to store the generated hash data and to maintain information about the source files associated with the stored hash values.
11. The hash data comparison system according to claim 9, wherein the control module determines whether a first source file and a second source file are identical by sequentially comparing the file information and then the hash values of the first source file and the second source file.
12. The hash data comparison system according to claim 9, wherein the control module generates a structure header comprising identifying information about the file information and the hash value, and generates the hash data comprising the structure header, the file information, and the hash value.
13. The hash data comparison system according to claim 12, wherein the control module sequentially compares the structure headers, the file information, and the hash values of a first source file and a second source file, and determines that the first source file and the second source file are the same file if their structure headers, file information, and hash values are all identical.
14. The hash data comparison system according to claim 9, wherein the control module generates parity bits for the hash data, the parity bits comprising parity bits calculated separately for the file information and for the hash value.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2011-0111296 | 2011-10-28 | ||
KR1020110111296A KR101310253B1 (en) | 2011-10-28 | 2011-10-28 | Hash data creation method and hash data comparison system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102945241A true CN102945241A (en) | 2013-02-27 |
Family ID: 47728187
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012103330235A Pending CN102945241A (en) | 2011-10-28 | 2012-09-10 | Hash data structure used for file comparison,hash comparison system and method |
Country Status (4)
Country | Link |
---|---|
KR (1) | KR101310253B1 (en) |
CN (1) | CN102945241A (en) |
TW (1) | TW201319929A (en) |
WO (1) | WO2013062223A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103699610A (en) * | 2013-12-13 | 2014-04-02 | 乐视网信息技术(北京)股份有限公司 | Method for generating file verification information, file verifying method and file verifying equipment |
WO2014206223A1 (en) * | 2013-06-27 | 2014-12-31 | 华为终端有限公司 | Method, server, and client for securely accessing web application |
US20170017798A1 (en) * | 2015-07-17 | 2017-01-19 | International Business Machines Corporation | Source authentication of a software product |
CN106471767A (en) * | 2014-07-04 | 2017-03-01 | 国立大学法人名古屋大学 | Communication system and key information sharing method |
CN107133120A (en) * | 2016-02-29 | 2017-09-05 | 阿里巴巴集团控股有限公司 | A kind of method of calibration of file data, device |
CN110197005A (en) * | 2019-05-07 | 2019-09-03 | 珠海格力电器股份有限公司 | Automatic identification method and device for CAE model of air conditioner |
CN110990897A (en) * | 2019-12-16 | 2020-04-10 | 北京无忧创想信息技术有限公司 | File fingerprint generation method and device |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015060494A1 (en) * | 2013-10-21 | 2015-04-30 | 주식회사 리얼타임테크 | Apparatus for automatically updating record id of navigation network data and method for same |
US9811333B2 (en) | 2015-06-23 | 2017-11-07 | Microsoft Technology Licensing, Llc | Using a version-specific resource catalog for resource management |
KR20220041394A (en) * | 2020-09-25 | 2022-04-01 | 삼성전자주식회사 | Electronic device and method for managing non-destructive editing contents |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050091497A1 (en) * | 2002-07-01 | 2005-04-28 | Canon Kabushiki Kaisha | Imaging apparatus |
CN101354708A (en) * | 2008-07-29 | 2009-01-28 | 四川大学 | Remote file rapid synchronization method |
CN101582076A (en) * | 2009-06-24 | 2009-11-18 | 浪潮电子信息产业股份有限公司 | Data de-duplication method based on data base |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4049498B2 (en) * | 1999-11-18 | 2008-02-20 | 株式会社リコー | Originality assurance electronic storage method, apparatus, and computer-readable recording medium |
JP2000357115A (en) | 1999-06-15 | 2000-12-26 | Nec Corp | Device and method for file retrieval |
JP2006053836A (en) | 2004-08-13 | 2006-02-23 | Fuji Electric Systems Co Ltd | Authenticity determination apparatus, and system for storing and utilizing electronic file |
US20110145259A1 (en) | 2009-12-11 | 2011-06-16 | Pitney Bowes Inc. | System and method for identifying data fields for remote address cleansing |
2011
- 2011-10-28 KR KR1020110111296A (KR101310253B1): active IP Right Grant
2012
- 2012-08-21 WO PCT/KR2012/006614 (WO2013062223A1): active Application Filing
- 2012-09-10 CN CN2012103330235A (CN102945241A): active Pending
- 2012-09-24 TW TW101134886A (TW201319929A): unknown
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014206223A1 (en) * | 2013-06-27 | 2014-12-31 | 华为终端有限公司 | Method, server, and client for securely accessing web application |
US9830454B2 (en) | 2013-06-27 | 2017-11-28 | Huawei Device (Dongguan) Co., Ltd. | Web application security access method, server, and client |
CN103699610A (en) * | 2013-12-13 | 2014-04-02 | 乐视网信息技术(北京)股份有限公司 | Method for generating file verification information, file verifying method and file verifying equipment |
CN106471767A (en) * | 2014-07-04 | 2017-03-01 | 国立大学法人名古屋大学 | Communication system and key information sharing method |
CN106471767B (en) * | 2014-07-04 | 2019-12-24 | 国立大学法人名古屋大学 | Communication system and key information sharing method |
US20170017798A1 (en) * | 2015-07-17 | 2017-01-19 | International Business Machines Corporation | Source authentication of a software product |
US9965639B2 (en) * | 2015-07-17 | 2018-05-08 | International Business Machines Corporation | Source authentication of a software product |
US20180225470A1 (en) * | 2015-07-17 | 2018-08-09 | International Business Machines Corporation | Source authentication of a software product |
US10558816B2 (en) * | 2015-07-17 | 2020-02-11 | International Business Machines Corporation | Source authentication of a software product |
CN107133120A (en) * | 2016-02-29 | 2017-09-05 | 阿里巴巴集团控股有限公司 | A kind of method of calibration of file data, device |
CN110197005A (en) * | 2019-05-07 | 2019-09-03 | 珠海格力电器股份有限公司 | Automatic identification method and device for CAE model of air conditioner |
CN110990897A (en) * | 2019-12-16 | 2020-04-10 | 北京无忧创想信息技术有限公司 | File fingerprint generation method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2013062223A1 (en) | 2013-05-02 |
TW201319929A (en) | 2013-05-16 |
KR101310253B1 (en) | 2013-09-24 |
KR20130046746A (en) | 2013-05-08 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20130227 |