WO2013062223A1 - Hash data structure for file comparison and hash comparison system and method using the same - Google Patents


Info

Publication number
WO2013062223A1
Authority
WO
WIPO (PCT)
Prior art keywords
hash
file
data
source
file information
Application number
PCT/KR2012/006614
Other languages
French (fr)
Inventor
Sung Gook Jang
Kwang Hee Yoo
Joo Hyun Sung
Hye Jin Jin
Yoon Hyung Lee
Original Assignee
Neowiz Games Co., Ltd.
Application filed by Neowiz Games Co., Ltd.
Publication of WO2013062223A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs

Abstract

The present invention relates to a hash technology related to data files. A hash data comparison system according to an embodiment compares source files with each other using hash data including file information and a hash value. The hash data comparison system includes a file information generation unit, a hash generation unit, and a control unit. The file information generation unit checks an attribute of each source file and generates file information about the source file. The hash generation unit calculates a hash value by applying a hash function algorithm to at least part of the source file. The control unit generates hash data for the corresponding source file including the file information and the hash value.

Description

HASH DATA STRUCTURE FOR FILE COMPARISON AND HASH COMPARISON SYSTEM AND METHOD USING THE SAME
The present invention relates, in general, to a hash technology for data files and, more particularly, to a hash data structure and to a hash comparison system and method using the hash data structure, which use the unique characteristic information of source files together with hash values, thereby enabling faster file comparison.
Comparisons between pieces of data, and between files in particular, are used in many operations. For example, file comparison is essential for checking changes between files in an Operating System (OS), or for comparing a patch file with a source file in order to apply a predetermined patch.
Conventional file comparison technologies include comparing entire files, assigning version information to files and checking the files against that version information, and applying a hash function to files and comparing the results.
Comparing entire files is rarely used because a large amount of data must be compared and the comparison is slow. Assigning version information to files has the drawback that, if the contents of a file change but its version information is not updated, the version information no longer matches the contents, and the file comparison gives an incorrect result.
Therefore, in most cases, hash values are calculated by applying a hash function to files, and the contents of the files are compared by comparing the calculated hash values. However, this conventional method, which relies only on hash values, is problematic in that generating hash values for large files requires more computing resources and takes longer.
Accordingly, the present invention has been made keeping in mind the above problems occurring in the prior art, and an object of the present invention is to provide a hash data structure, which can easily compare files with one another using a smaller amount of resources.
Another object of the present invention is to provide a hash data structure generation method and a hash data structure comparison method therefor, which can more rapidly compare files with one another using a hash data structure required for the file comparison.
A further object of the present invention is to provide a hash comparison system, which can efficiently compare files with one another using a hash data structure required for file comparison.
In accordance with an aspect of the present invention to accomplish the above objects, there is provided a hash data structure including file information composed of predetermined data bits and related to an attribute of a source file, and a hash value composed of specific data bits and related to the source file, wherein the hash data structure includes the data bits corresponding to the hash value, subsequent to the data bits corresponding to the file information.
In an embodiment, the file information may include at least one of a size value of the source file, first partial data including first data of the source file, and second partial data including last data of the source file.
In an embodiment, the hash data structure may further include a structure header including structure information about each of the file information and the hash value included in the hash data structure.
In an embodiment, the hash data structure may further include parity information about the hash data structure, wherein the parity information includes a first parity bit for the file information and a second parity bit for the hash value.
In accordance with another aspect of the present invention to accomplish the above objects, there is provided a hash data generation method for generating respective pieces of hash data to be used to compare source files, including (a) checking an attribute of each source file and generating file information composed of predetermined data bits based on the checked attribute, (b) calculating a hash value by applying a hash algorithm to at least part of the source file, and (c) generating hash data by successively connecting the hash value to the file information.
In an embodiment, (a) may include checking at least one of a size, a name, and a format of the source file, first partial data including first data of the source file, and second partial data including last data of the source file, and generating the file information including at least one of the size, the name, and the format of the source file, the first partial data including first data of the source file, and the second partial data including last data of the source file.
In an embodiment, the hash data generation method may further include (d) generating hash parity bits for the hash data.
In an embodiment, (d) may include generating a first parity bit for the file information, generating a second parity bit for the hash value, and generating the hash parity bits by successively connecting the first and second parity bits.
In accordance with a further aspect of the present invention to accomplish the above objects, there is provided a hash data generation method for generating respective pieces of hash data to be used to compare source files, including (a) generating a structure header including structure information about each of file information and a hash value included in a hash data structure, (b) checking an attribute of each source file and generating file information composed of predetermined data bits based on the checked attribute, (c) generating a hash value by applying a hash algorithm to at least part of the source file, and (d) generating hash data by successively connecting the hash value to the file information.
In accordance with yet another aspect of the present invention to accomplish the above objects, there is provided a hash data comparison method for comparing two source files with each other using hash data including file information and a hash value, including (a) checking two pieces of hash data respectively associated with the two source files, (b) comparing two pieces of file information included in the two pieces of hash data with each other, and (c) if the two pieces of file information are identical, comparing two hash values included in the two pieces of hash data with each other, and if the two hash values are identical, determining that the two source files are an identical file.
In an embodiment, the file information may include at least one of a size, a name and a format of a corresponding source file, first partial data including first data of the source file, and second partial data including last data of the source file.
In an embodiment, (b) may include comparing individual data bits constituting the two pieces of file information with each other.
In an embodiment, (b) may include identifying at least one of a size, a name, and a format of a source file included in corresponding file information, first partial data including first data of the source file, and second partial data including last data of the source file, for each of the two pieces of file information, and comparing the two pieces of file information with each other in terms of at least one of the size, the name, and the format of the source file, the first partial data including first data of the source file, and the second partial data including last data of the source file, which have been identified.
In accordance with still another aspect of the present invention to accomplish the above objects, there is provided a hash data comparison method for comparing two source files with each other using hash data including file information, a hash value, and a structure header including structure information about each of the file information and the hash value, including (a) comparing structure headers of the two source files with each other, and determining whether pieces of hash data have an identical structure, (b) if it is determined that the pieces of hash data have the identical structure, comparing pieces of file information respectively associated with the two source files with each other, and (c) if the pieces of file information are identical, comparing hash values respectively associated with the source files with each other, and determining that the two source files are an identical file if the hash values are identical.
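The three-stage comparison of this aspect can be sketched in Python as follows. This is a minimal illustration under assumptions: the `HashData` tuple, the use of a short string as the structure header, and the verdict strings are hypothetical, and each stage runs only when the previous one found a match.

```python
from typing import NamedTuple


class HashData(NamedTuple):
    """Hypothetical in-memory form of hash data: structure header, file information, hash value."""
    header: str
    file_info: bytes
    hash_value: bytes


def compare_hash_data(a: HashData, b: HashData) -> str:
    # (a) structure headers first: if the layouts differ, the other fields are not comparable
    if a.header != b.header:
        return "different structure"
    # (b) file information next: a mismatch settles the comparison without touching the hashes
    if a.file_info != b.file_info:
        return "different file"
    # (c) hash values last, only when headers and file information both match
    if a.hash_value != b.hash_value:
        return "same file, changed contents"
    return "identical"


h = HashData("6AB", b"\x00\x0bhell", bytes(20))
print(compare_hash_data(h, h._replace(hash_value=b"\x01" * 20)))  # same file, changed contents
```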
In accordance with a still further aspect of the present invention to accomplish the above objects, there is provided a hash data comparison system for comparing source files with each other using hash data including file information and a hash value, including a file information generation unit configured to check an attribute of each source file and generate file information about the source file, a hash generation unit configured to calculate a hash value by applying a hash function algorithm to at least part of the source file, and a control unit configured to generate hash data for the corresponding source file, including the file information and the hash value.
In an embodiment, the hash data comparison system may further include a hash file management unit configured to store the generated hash data and keep information about source files associated with stored hash values.
In an embodiment, the control unit may determine whether first and second source files are identical by sequentially comparing their pieces of file information and then their hash values.
In an embodiment, the control unit may generate a structure header including identification information about the file information and the hash value, and generate the hash data, including the structure header, the file information, and the hash value.
In an embodiment, the control unit may sequentially compare structure headers, pieces of file information, and hash values between the first and second source files, and then determine that the first and second source files are an identical file if the structure headers, the file information, and hash values of the first and second source files are identical.
In an embodiment, the control unit may generate parity bits for the hash data, including parity bits respectively calculated for the file information and the hash value.
The above and other objects, features and advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a reference diagram showing an embodiment of a hash data structure according to the present invention;
FIG. 2 is a reference diagram showing another embodiment of a hash data structure according to the present invention;
FIG. 3 is a reference diagram showing a further embodiment of a hash data structure according to the present invention;
FIG. 4 is a configuration diagram showing an embodiment of a hash comparison system according to the present invention;
FIG. 5 is a flowchart showing an embodiment of a hash data generation method that can be performed by the hash comparison system of FIG. 4;
FIG. 6 is a flowchart showing another embodiment of a hash data generation method that can be performed by the hash comparison system of FIG. 4;
FIG. 7 is a flowchart showing an embodiment of a hash data comparison method that can be performed by the hash comparison system of FIG. 4;
FIG. 8 is a flowchart showing another embodiment of a hash data comparison method that can be performed by the hash comparison system of FIG. 4; and
FIG. 9 is a configuration diagram showing another embodiment of a hash comparison system according to the present invention.
Technologies disclosed in the present invention are merely embodiments provided for structural or functional description, so the scope of the disclosed technologies should not be interpreted as being limited to the embodiments described in the present specification. That is, since the embodiments can be modified in various ways and can take various forms, the scope of the disclosed technologies should be understood as including equivalents capable of realizing the technical spirit of the present invention.
Meanwhile, the meanings of terms described in the present specification should be understood as follows.
Terms such as "first" and "second" are merely used to distinguish one component from other components, and the scope of the present invention should not be limited by these terms. For example, a first component may be designated as a second component, and a second component may be designated as a first component in a similar manner.
Throughout the entire specification, it should be understood that a statement that a first component is connected to a second component includes the case where the first component is connected to the second component with another component interposed therebetween, as well as the case where the first component is directly connected to the second component. In contrast, a statement that a first component is directly connected to a second component means that no component is interposed between the first and second components. Other expressions describing relationships among components, such as "between" versus "directly between" or "adjacent to" versus "directly adjacent to", should be interpreted in a similar manner.
It should be understood that a singular expression includes a plural expression unless a description to the contrary is specifically pointed out in context.
In the present specification, it should be understood that terms such as "include" or "have" are merely intended to indicate that features, numbers, steps, operations, components, parts, or combinations thereof are present, and are not intended to exclude the possibility that one or more other features, numbers, steps, operations, components, parts, or combinations thereof will be present or added.
Reference symbols in individual steps (for example, a, b, c, etc.) are used for the sake of convenience of description, and do not indicate the sequence of individual steps, and the individual steps may occur in a sequence differing from that described in the present specification unless the specific sequence of the steps is definitely defined in context. That is, the steps can occur in the same sequence as that described in the present specification, or substantially simultaneously, or in a reverse sequence.
Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meanings as generally understood by those skilled in the art to which the present invention pertains. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their contextual meanings in the related art, and should not be interpreted in an idealized or excessively formal sense unless they are expressly so defined in the present specification.
In the following description, the term "source file" denotes a file to which a hash data structure is to be applied. The present invention provides a hash data structure that, like a typical hash value, takes an independent value for each source file.
FIG. 1 is a reference diagram showing an embodiment of a hash data structure according to the present invention.
Referring to FIG. 1, a hash data structure 100 includes file information 110 and a hash value 120. In greater detail, the hash data structure 100 may be configured to include bits corresponding to the hash value, subsequent to data bits of the file information 110 for a source file.
The file information 110 may include the size value 111 of the source file, partial data 112 including the first data of the source file (hereinafter referred to as "first partial data"), and partial data 113 including the last data of the source file (hereinafter referred to as "second partial data"). According to an embodiment, the file information 110 may be composed of at least one of the above-described three types of data 111 to 113.
The file information 110 may have different lengths in different systems, according to embodiments described later. That is, the file information 110 is not restricted to a specific number of data bits, but may be composed of data bits of a predetermined size set by the system or as required.
The source file size value 111 is data indicative of the size of the source file.
The first partial data 112 is part of the source file corresponding to a predetermined length from the first bit of the source file, and the second partial data 113 is part of the source file corresponding to a predetermined length from the last bit of the source file. In this case, the lengths of the first and second partial data 112 and 113 can be determined differently depending on a corresponding file comparison system, so that the present invention is not limited by these lengths.
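A minimal Python sketch of assembling the file information 110 from the three elements above follows. The 8-byte big-endian size field, the 4-byte partial-data length, zero-padding for short files, and the name `build_file_info` are all assumptions for illustration; the specification leaves these lengths to the comparison system.

```python
import struct

PART_LEN = 4  # hypothetical length of the first/second partial data, in bytes


def build_file_info(data: bytes, part_len: int = PART_LEN) -> bytes:
    """Concatenate size value 111, first partial data 112 and second partial data 113."""
    size_value = struct.pack(">Q", len(data))  # 8-byte big-endian size (assumed width)
    # Leading bytes, zero-padded if the file is shorter than part_len
    first_part = data[:part_len].ljust(part_len, b"\x00")
    # Trailing bytes, zero-padded if the file is shorter than part_len
    second_part = data[max(len(data) - part_len, 0):].ljust(part_len, b"\x00")
    return size_value + first_part + second_part


info = build_file_info(b"hello world")
print(len(info), info[8:12], info[12:])  # 16 b'hell' b'orld'
```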
The hash value 120 is data obtained by applying a hash algorithm to the source file. In an embodiment, the hash value 120 may have a fixed number of bits. That is, whereas the elements of the file information 110 and their sizes can vary, the hash value 120 can be limited to a specific, standardized size (number of data bits). For example, the hash value 120 has 160 bits for the SHA-0 or SHA-1 algorithm, 256 or 224 bits for SHA-256 or SHA-224, and 512 or 384 bits for SHA-512 or SHA-384. In other words, even when the hash data structure is applied to different systems according to embodiments, the hash value 120 can be composed of data bits of a specific length, since its size is preferably determined by the applicable standard.
When files are compared, the file information 110 is compared before the hash value 120. For example, when searching for file A among files A to C, the file information of file A is compared with that of each of files A to C, which makes it possible to identify file A. In this case, since the corresponding file can be found using only the file information 110, there is no need to compare hash values with one another, so the desired file can be found more rapidly using fewer resources.
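The fixed digest lengths quoted above can be checked with Python's standard `hashlib` module. SHA-0 was withdrawn and is not offered by `hashlib`, so it is omitted; the dictionary name `DIGEST_BITS` is illustrative.

```python
import hashlib

# Digest length in bits for each standard algorithm (SHA-0 is not offered by hashlib)
DIGEST_BITS = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha1", "sha224", "sha256", "sha384", "sha512")
}
print(DIGEST_BITS)
# {'sha1': 160, 'sha224': 224, 'sha256': 256, 'sha384': 384, 'sha512': 512}

# The digest length does not depend on the input length
assert len(hashlib.sha256(b"x" * 1_000_000).digest()) * 8 == DIGEST_BITS["sha256"]
```

Because the digest length is independent of the input size, the hash value 120 can occupy a fixed-width field while the file information 110 varies in length.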
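The search described above can be sketched as follows, assuming (hypothetically) that the file information consists of the file size plus the first and last four bytes, and that SHA-1 is the hash algorithm. Candidates whose file information differs are rejected without any hashing; a hash is computed only for candidates whose file information matches.

```python
import hashlib

PART_LEN = 4  # hypothetical partial-data length


def file_info(data: bytes) -> tuple:
    # Cheap attributes: size value, first partial data, second partial data
    return (len(data), data[:PART_LEN], data[max(len(data) - PART_LEN, 0):])


def find_file(target: bytes, candidates: dict) -> list:
    """Return names of candidates identical to target, hashing only when file info matches."""
    target_info = file_info(target)
    target_hash = None
    matches = []
    for name, data in candidates.items():
        if file_info(data) != target_info:
            continue  # rejected by file information alone; no hash computed
        if target_hash is None:
            target_hash = hashlib.sha1(target).digest()  # computed lazily, at most once
        if hashlib.sha1(data).digest() == target_hash:
            matches.append(name)
    return matches


candidates = {"A": b"patch v1 contents", "B": b"other", "C": b"patch v2 contents"}
print(find_file(b"patch v1 contents", candidates))  # ['A']
```

Here B is rejected on size alone, while C has the same size and the same first and last four bytes as A, so only the hash comparison tells them apart.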
FIG. 2 is a reference diagram showing another embodiment of a hash data structure according to the present invention. The hash data structure shown in FIG. 2 further includes a structure header 130 compared to the embodiment of FIG. 1.
The structure header 130 includes information about the structures of file information 110 and a hash value 120. For example, the structure header 130 may include information about the total number of bits of the file information 110 and the total number of bits of the hash value 120.
In an embodiment, the structure header 130 may include information about a hash function used to calculate the hash value 120. For example, the structure header 130 may include information about a hash function, such as SHA-0 or SHA-1, used to calculate the corresponding hash value 120.
In an embodiment, the file information 110 may include only at least one of the three types of data 111 to 113 shown in the drawing, and the structure header 130 may provide information about data included in the file information 110.
For example, assume that the file size information and the first and second partial data are identified as A, B, and C, respectively, that the file size information has a fixed size of two bytes, and that the structure header 130 is "6AB". In this case, "6" in the structure header 130 indicates the total number of bytes of the file information 110, and "AB" indicates that the file information 110 is composed of the file size information 111 and the first partial data 112.
In the embodiment of FIG. 2, the disclosed hash data structure 100 can also be applied to the case where file information 110 of different lengths is used within a single system. This is because the bits belonging to each element of the hash data structure 100 can be individually identified using the structure header 130.
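The "6AB" example can be decoded with a short Python sketch. The header grammar (leading decimal digits for the byte count, followed by element letters), the rule that partial-data elements share the remaining bytes equally, and the function names are assumptions; the specification only fixes the meaning of this single example.

```python
import re

ELEMENTS = {"A": "file size", "B": "first partial data", "C": "second partial data"}
SIZE_FIELD_LEN = 2  # the example fixes the file size information at two bytes


def parse_header(header: str):
    """Split a header such as "6AB" into (total byte count, element codes)."""
    m = re.fullmatch(r"(\d+)([ABC]+)", header)  # assumed grammar: digits, then letters
    if m is None:
        raise ValueError(f"unrecognized structure header: {header!r}")
    return int(m.group(1)), list(m.group(2))


def split_file_info(header: str, info: bytes) -> dict:
    total, codes = parse_header(header)
    if len(info) != total:
        raise ValueError("file information length does not match the header")
    partial_codes = [c for c in codes if c != "A"]
    remaining = total - (SIZE_FIELD_LEN if "A" in codes else 0)
    # Assumption: partial-data elements share the remaining bytes equally
    part_len = remaining // len(partial_codes) if partial_codes else 0
    fields, offset = {}, 0
    for code in codes:
        length = SIZE_FIELD_LEN if code == "A" else part_len
        fields[ELEMENTS[code]] = info[offset:offset + length]
        offset += length
    return fields


print(split_file_info("6AB", b"\x00\x0bhell"))
# {'file size': b'\x00\x0b', 'first partial data': b'hell'}
```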
FIG. 3 is a reference diagram showing a further embodiment of a hash data structure according to the present invention. The hash data structure shown in FIG. 3 further includes parity information 140 compared to the embodiment of FIG. 1.
The parity information 140 includes a parity value for the hash data structure 100.
In an embodiment, the parity information 140 may include (i) a parity bit for the file information 110 and (ii) a parity bit for the hash value 120. The parity values are kept separate because, in the present invention, a file comparison can be completed using only the file information 110.
The embodiment of FIG. 3 allows error checking and file comparison to be performed more efficiently when files or similar data are transmitted.
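Per-section parity can be sketched as below, using even parity over each section; the parity scheme and function names are assumptions, since the specification does not name one. Keeping a separate bit for the file information 110 lets a receiver verify that section on its own before deciding whether the hash value 120 needs to be examined.

```python
def parity_bit(data: bytes) -> int:
    """Even parity: 1 when the section has an odd number of set bits."""
    return bin(int.from_bytes(data, "big")).count("1") % 2


def parity_info(file_info: bytes, hash_value: bytes) -> tuple:
    # (i) parity bit for the file information, (ii) parity bit for the hash value
    return parity_bit(file_info), parity_bit(hash_value)


print(parity_info(b"\x07", b"\xff"))  # (1, 0)
```

A single parity bit detects only an odd number of flipped bits in its section; a stronger check, such as a CRC, would be needed to catch more error patterns.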
FIG. 4 is a configuration diagram showing an embodiment of a hash comparison system according to the present invention.
A hash comparison system 200 includes a file information generation unit 210, a hash generation unit 220, a hash file management unit 230, and a control unit 250. In an embodiment, the hash comparison system 200 may further include a source file management unit 240.
The file information generation unit 210 may check the attributes of source files and generate file information about the source files. Here, the attributes of each source file may include the size, name, format, and partial data bits of the source file (for example, a predetermined length from the first data bit or the last data bit), etc.
In an embodiment, the file information generation unit 210 can generate the above-described first and second partial data by reading data bits of the source file by preset lengths from the first and last bits of the data bits of the source file. In this case, the preset lengths may correspond to the sizes of the first and second partial data of a corresponding hash data structure.
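Reading the partial data directly from disk avoids loading the whole source file. The sketch below assumes that the preset lengths are measured in bytes rather than bits, and that a file shorter than the preset length yields the whole file for both parts; `read_partial_data` is a hypothetical name.

```python
import os


def read_partial_data(path: str, part_len: int):
    """Read part_len bytes from the start and from the end of a file without reading the middle."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        first = f.read(part_len)           # first partial data
        f.seek(max(size - part_len, 0))    # jump to the start of the trailing block
        last = f.read(part_len)            # second partial data
    return first, last
```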
The hash generation unit 220 may generate a hash value by applying a hash function to each source file. The hash generation unit 220 may use a hash function utilized by individual systems or a hash function based on standards, for example, based on a Secure Hash Algorithm (SHA).
In an embodiment, the hash generation unit 220 has a plurality of hash functions and is capable of generating a hash value for the source file using a specific hash function in response to the request of the control unit 250.
In an embodiment, the hash generation unit 220 may generate a hash value only for part of the source file. For example, when the size of the source file is equal to or greater than a predetermined value, the hash generation unit 220 may generate a hash value for part of the source file corresponding to a preset size. In another embodiment, the hash generation unit 220 may also generate a hash value only for the remaining part of the source file other than the first and second partial data.
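Selectable algorithms and partial hashing might look like the following. The size threshold, the preset hashed length, the `skip_partials` flag, and the function name are hypothetical; `hashlib.new` is used so that the control unit can request an algorithm by name.

```python
import hashlib

LARGE_FILE_THRESHOLD = 64 * 1024 * 1024  # hypothetical size at which a file counts as "large"
PRESET_HASH_SIZE = 1024 * 1024           # hypothetical number of bytes hashed for large files


def generate_hash_value(data: bytes, algorithm: str = "sha1",
                        part_len: int = 0, skip_partials: bool = False) -> bytes:
    h = hashlib.new(algorithm)  # the requested algorithm is selected by name
    if skip_partials and part_len > 0:
        # Hash only the bytes between the first and second partial data
        h.update(data[part_len:max(len(data) - part_len, part_len)])
    elif len(data) >= LARGE_FILE_THRESHOLD:
        # Hash only a preset leading portion of a large file
        h.update(data[:PRESET_HASH_SIZE])
    else:
        h.update(data)
    return h.digest()
```

Skipping the first and second partial data loses little, because those bytes are already carried in the file information; hashing only a preset prefix of a large file, by contrast, cannot detect changes beyond that prefix.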
The hash file management unit 230 may manage source files and hash files (structures) corresponding thereto. For example, the hash file management unit 230 stores hash files and keeps information about source files matching the corresponding hash files (e.g., link information or the like).
The source file management unit 240 may store the source files and keep the history of each source file. For example, if it is determined that file A has changed as a result of making a hash comparison on file A, the corresponding file A and the hash history thereof can be stored in the source file management unit 240.
The control unit 250 may generate a hash data structure or compare source files with one another by controlling the overall operation of the hash comparison system 200.
In an embodiment, the control unit 250 may generate a hash data structure (file) for each source file. In greater detail, the control unit 250 may provide a specific source file both to the file information generation unit 210 and to the hash generation unit 220, and generate a hash data structure using file information and a hash value that have been received in response to the specific source file. An embodiment related to the generation of the hash data structure will be described in detail with reference to FIGS. 5 and 6.
In an embodiment, the control unit 250 may compare two source files with each other using the hash data structure. The hash data structure according to the present invention is divided into file information and a hash value, and the source files are compared with each other by exploiting this structure. In greater detail, the control unit 250 analyzes the hash data structures of the source files to be compared and determines whether the source files are the same file by using the file information of the hash data structures. If they are determined to be the same file, the control unit 250 checks whether the source files have the same contents by using the hash values of the hash data structures. Because the present invention first determines whether the files are the same file using the file information, and compares hash values only if they are, the comparison is performed more rapidly.
In an embodiment, when pieces of file information are compared with each other, the control unit 250 can mutually compare the individual data bits that constitute the pieces of file information. In another embodiment, the control unit 250 may identify individual elements constituting the pieces of file information and may compare the pieces of file information by mutually comparing the identified elements with each other. That is, for each piece of file information, at least one of the size, name, format, first partial data, and second partial data of the source file, included in the corresponding file information, is identified, and the identified element can be compared to that of the other piece of file information.
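The element-wise comparison can be sketched with a dataclass whose fields mirror the elements listed above. The class name `FileInfo`, its field names, and the choice to report the first differing element are assumptions; the bit-wise alternative amounts to comparing the serialized file information with `==`.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical element set of the file information."""
    size: int
    name: str
    fmt: str
    first_part: bytes
    second_part: bytes


def first_difference(a: FileInfo, b: FileInfo) -> Optional[str]:
    """Compare element by element; return the first differing element's name, or None."""
    for f in fields(FileInfo):
        if getattr(a, f.name) != getattr(b, f.name):
            return f.name
    return None
```

Unlike a bit-wise comparison, this reports which attribute differs.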
In an embodiment, the control unit 250 can provide generated hash files and the associated source file information to the hash file management unit 230, which stores and manages the hash files. When a request for another operation, such as a hash comparison, is received, the control unit 250 can obtain the hash file corresponding to a specific source file from the hash file management unit 230 and perform the requested operation.
In an embodiment, the control unit 250 can control the source file management unit 240 so that a history of source files is generated. For example, when a patch or the like occurs for the same source file, a history of the patch may be required. In the case of this example, the control unit 250 (i) determines that the source files are the same source file using file information as a result of making a comparison between the source files, and (ii) if it is determined using a hash value that there is variation in the contents of the file, provides information about the corresponding source file and information about the hash data structure to the source file management unit 240, thus enabling the history to be generated.
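The history-generation flow can be sketched as follows. So that a patched file whose size has changed is still recognized as the same file, the sketch assumes (hypothetically) that file identity is judged from the file name and format only; the module-level `history` dictionary stands in for the source file management unit 240, and SHA-1 is an arbitrary choice.

```python
import hashlib


def file_info(name: str) -> tuple:
    # Assumption: identity is judged from name and format only, so an edited file still matches
    stem, _, fmt = name.rpartition(".")
    return (stem, fmt)


history = {}  # hypothetical stand-in for unit 240: file information -> list of hash values


def compare_and_record(name_a: str, data_a: bytes, name_b: str, data_b: bytes) -> str:
    # (i) same source file according to the file information?
    if file_info(name_a) != file_info(name_b):
        return "different file"
    # (ii) variation in contents according to the hash values?
    hash_a, hash_b = hashlib.sha1(data_a).digest(), hashlib.sha1(data_b).digest()
    if hash_a == hash_b:
        return "unchanged"
    entries = history.setdefault(file_info(name_a), [])
    if not entries:
        entries.append(hash_a)  # keep the earlier version's hash as the starting point
    entries.append(hash_b)
    return "changed"
```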
In an embodiment, the control unit 250 can generate a structure header for each hash data structure. In greater detail, when file information and a hash value are provided by the file information generation unit 210 and the hash generation unit 220, respectively, the control unit 250 can generate a structure header for the hash data structure so that the file information and the hash value can be identified. For example, the control unit 250 can generate a structure header including information indicating elements included in the file information 110, the data lengths of the respective elements, the length of a hash value, etc. In this embodiment, when hash data structures are compared with other, the control unit 250 first analyzes structure headers to identify file information and hash values, and then determines based on the file information whether the two source files to be compared are the same file. If it is determined that the source files are the same file, the control unit 250 can determine whether the contents of the files have changed by comparing the hash values thereof with each other.
In an embodiment, the control unit 250 may generate parity information and add it to each hash data structure. More specifically, the control unit 250 may generate one parity bit for the file information and another for the hash value, and combine the two into the parity information. This embodiment is applicable where hash data structures are transmitted between different systems. Because a separate parity bit is calculated for each of the file information and the hash value, a parity check can be performed more rapidly when hash data structures are compared with each other.
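A minimal sketch of the two-bit parity information follows, assuming even parity over all bits of each part (the text does not say whether even or odd parity is used). The names `parity_bit`, `parity_info` and `verify` are hypothetical.

```python
def parity_bit(data: bytes) -> int:
    """Even parity: 1 if the number of set bits in data is odd, else 0."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def parity_info(file_info: bytes, hash_value: bytes) -> int:
    """Concatenate the file-info parity bit and the hash-value parity bit into 2 bits."""
    return (parity_bit(file_info) << 1) | parity_bit(hash_value)


def verify(file_info: bytes, hash_value: bytes, received: int) -> tuple[bool, bool]:
    """Report separately whether each part still matches its transmitted parity bit."""
    expected = parity_info(file_info, hash_value)
    info_ok = (expected >> 1) == (received >> 1)
    hash_ok = (expected & 1) == (received & 1)
    return info_ok, hash_ok
```

Keeping one bit per part means a failure can be attributed to the file information or to the hash value. Note that a single parity bit only detects an odd number of flipped bits in its part.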
FIG. 5 is a flowchart showing an embodiment of a hash data generation method that can be performed by the hash comparison system of FIG. 4.
Referring to FIG. 5, the file information generation unit 210 can check the attributes of each source file under the control of the control unit 250 at step S510. In this case, the attributes are pieces of data collected to generate file information, and may be a file size, a file name, a file format, first or second partial data, or the like, as described above.
At step S520, the file information generation unit 210 can generate file information based on the checked attributes of the source file. When pieces of hash data are compared with each other, the file information is used to determine whether the two source files being compared are the same file. As described above, the file information may include at least one of a file size, first partial data, and second partial data, and may alternatively include a file name or a file format. The file information generation unit 210 provides the generated file information to the control unit 250.
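The sketch below builds file information from the file size plus first and second partial data (the first and last bytes of the file). The fixed layout, an 8-byte big-endian size followed by two 16-byte partial-data fields zero-padded for short files, is a hypothetical choice, as is the name `make_file_info`.

```python
import struct

PART_LEN = 16  # hypothetical length of the first/second partial data, in bytes


def make_file_info(content: bytes, part_len: int = PART_LEN) -> bytes:
    """File information = 8-byte size || first part_len bytes || last part_len bytes.

    Short files are zero-padded so that every file information has the same length.
    """
    if part_len <= 0:
        raise ValueError("part_len must be positive")  # content[-0:] would be the whole file
    size = struct.pack(">Q", len(content))
    first = content[:part_len].ljust(part_len, b"\x00")
    last = content[-part_len:].ljust(part_len, b"\x00")
    return size + first + last
```

For large files, a real implementation would read only the head and tail (for example by seeking) rather than loading the whole file into memory.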
At step S530, the hash generation unit 220 can generate a hash value for each source file under the control of the control unit 250. In an embodiment, the hash generation unit 220 can support various hash algorithms and can generate the hash value for the source file using the algorithm requested by the control unit 250. In an embodiment, the hash generation unit 220 can generate a hash value from only part of the source file under the control of the control unit 250. The hash generation unit 220 provides the generated hash value to the control unit 250.
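A selectable-algorithm hash over all or part of a source file might look like this, using Python's standard `hashlib`. SHA-256 as the default and the `start`/`end` byte range used for partial hashing are assumptions; the text names neither an algorithm nor how the partial range is chosen.

```python
import hashlib
from typing import Optional


def make_hash(content: bytes, algorithm: str = "sha256",
              start: int = 0, end: Optional[int] = None) -> bytes:
    """Hash content[start:end] with the algorithm requested by the control unit."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return hashlib.new(algorithm, content[start:end]).digest()
```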
At step S540, the control unit 250 can generate hash data from the file information and the hash value by appending the data bits of the hash value directly after the data bits of the file information. In this embodiment, the control unit 250 knows in advance how many bits, counting from the first bit, belong to the file information. To ensure this, when the control unit 250 requests the file information generation unit 210 and the hash generation unit 220 to generate the file information and the hash value, it can include the required data sizes in the request.
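The FIG. 5 layout, in which the hash value simply follows the file information, can be sketched as below. The 40-byte file information length and the SHA-256 hash length are hypothetical values standing in for the sizes the control unit 250 specifies in its requests.

```python
import hashlib

INFO_LEN = 40  # hypothetical fixed file-information length known to the control unit
HASH_LEN = hashlib.sha256().digest_size  # 32 bytes, assuming SHA-256


def make_hash_data(file_info: bytes, hash_value: bytes) -> bytes:
    """FIG. 5 style: hash data = file information followed directly by the hash value."""
    if len(file_info) != INFO_LEN or len(hash_value) != HASH_LEN:
        raise ValueError("file info and hash value must have the agreed lengths")
    return file_info + hash_value


def split_hash_data(hash_data: bytes) -> tuple[bytes, bytes]:
    """Recover the two parts, relying only on the pre-agreed file-info length."""
    return hash_data[:INFO_LEN], hash_data[INFO_LEN:INFO_LEN + HASH_LEN]
```

Without a header, the split point is known only because both lengths were fixed in advance; this is the limitation the structure header of FIG. 6 addresses.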
FIG. 6 is a flowchart showing another embodiment of a hash data generation method that can be performed by the hash comparison system of FIG. 4. The embodiment of FIG. 6 relates to an embodiment in which hash data is generated using the above-described structure header. The embodiment of FIG. 6 is obtained by adding predetermined steps to the embodiment of FIG. 5, so that steps identical or similar to those of the embodiment of FIG. 5 will be briefly described.
Referring to FIG. 6, at step S610 the control unit 250 can determine in advance the elements to be included in the file information. That is, the control unit 250 can determine beforehand the types and sizes of the elements to be included in the file information, and keep information about the configuration of the file information. Thereafter, the control unit 250 can request the file information generation unit 210 to generate the file information, passing along information about the determined elements.
The file information generation unit 210 can generate file information under the control of the control unit 250. That is, the file information generation unit 210 can check the attributes of each source file at step S620, generate file information using the checked attributes at step S630, and provide the file information to the control unit 250.
The hash generation unit 220 can generate a hash value for the source file under the control of the control unit 250 at step S640, and provide the hash value to the control unit 250.
At step S650, the control unit 250 can generate a structure header for the file information and the hash value. As described above, the structure header contains information about the structure of the hash data. The structure header is used because the present invention separates the file information and the hash value from the hash data and compares each of them separately. In one embodiment, the control unit 250 can generate the structure header before the file information and the hash value have been generated. This is possible because the control unit 250 specifies their configuration (for example, the elements of the file information, the sizes of those elements, and the size of the hash value) when it requests their generation, so the header can be built without waiting to receive them. In another embodiment, the control unit 250 can receive the file information and the hash value individually and then generate a structure header for them. That is, when the file information generation unit 210 and the hash generation unit 220 independently generate the file information and the hash value, respectively, the control unit 250 can receive each of them separately and then generate the structure header.
Once the structure header has been generated, the control unit 250 can generate hash data based on the structure header, file information, and the hash value at step S660.
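The first variant of step S650, building the header from the configuration alone, and the assembly at step S660 can be sketched as follows. The `Layout` class and its simplified header (a 1-byte element count, a 2-byte length per element, and a 2-byte hash length, with no element type codes) are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    # Hypothetical configuration decided at step S610: byte length of each
    # file-info element, in order, plus the hash length.
    element_lengths: tuple[int, ...]
    hash_len: int

    def header(self) -> bytes:
        """Step S650: the header depends only on the layout, so it can be built
        before the file information and hash value have been received."""
        n = len(self.element_lengths)
        return struct.pack(f">B{n}HH", n, *self.element_lengths, self.hash_len)


def assemble(layout: Layout, elements: list[bytes], hash_value: bytes) -> bytes:
    """Step S660: hash data = structure header || file information || hash value."""
    if [len(e) for e in elements] != list(layout.element_lengths):
        raise ValueError("file information does not match the declared layout")
    if len(hash_value) != layout.hash_len:
        raise ValueError("hash value does not match the declared length")
    return layout.header() + b"".join(elements) + hash_value
```

The length checks in `assemble` catch the case where a generation unit returns data that does not match the configuration the header already promised.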
FIG. 7 is a flowchart showing an embodiment of a hash data comparison method that can be performed by the hash comparison system of FIG. 4. The hash data comparison method shown in FIG. 7 is an embodiment corresponding to the hash data generation method shown in FIG. 5.
Referring to FIG. 7, the control unit 250 can select pieces of hash data respectively associated with two source files to be compared at step S710. In the case of an embodiment including the hash file management unit 230, the control unit 250 may request hash data for two source files to be compared from the hash file management unit 230 and obtain the hash data from the hash file management unit 230.
The control unit 250 can verify the configurations of the two pieces of selected hash data at step S720. That is, the control unit 250 can verify which parts of each piece of hash data correspond to file information and the hash value, respectively.
At step S730, the control unit 250 compares the pieces of file information included in the two pieces of hash data and thereby first determines whether the two source files are the same file. For example, when a file name, a file length, and the like are included in the file information, the file information is used first to determine whether the two source files are the same file, and the file contents are compared afterwards. In other words, the present invention first determines the sameness of objects, that is, whether the two files being compared are the same object, and only if they are does it determine the identicalness of their contents, that is, whether the contents of the two objects are identical, thereby completing the comparison.
If the two pieces of file information are identical at step S740 (in the case of YES), the control unit 250 can compare hash values associated with the two source files with each other at step S750.
If the hash values are also identical at step S760 (in the case of YES), it can be determined that the two source files are the same file at step S770.
If the pieces of file information are different from each other at step S740 (in the case of NO), or if hash values are different from each other at step S760 (in the case of NO), it can be determined that the two source files are different files at step S771.
In the steps described above, the control unit 250 can compare file information or hash values by checking the corresponding data bits of the two objects being compared. Therefore, when the source files can be determined to be different using the file information alone, the number of data bits that must be compared is greatly reduced. As a result, the present invention can compare files efficiently in a 1:N relation, for example when searching a plurality of files for a file identical to a specific source file.
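The FIG. 7 flow, including its early exit on mismatched file information, might be sketched as below for the headerless FIG. 5 layout. The fixed lengths are hypothetical, and the function also reports at which stage the decision was made, to show that files with different file information never reach the hash comparison.

```python
INFO_LEN = 40  # hypothetical fixed file-information length, in bytes
HASH_LEN = 32  # hypothetical hash length (32 bytes, as for SHA-256)


def compare(a: bytes, b: bytes) -> tuple[bool, str]:
    """Return (identical?, stage at which the decision was made)."""
    # S730/S740: compare the file information first.
    if a[:INFO_LEN] != b[:INFO_LEN]:
        return False, "file info"  # S771 without touching the hash values
    # S750/S760: only files with identical file information reach this point.
    same_hash = a[INFO_LEN:INFO_LEN + HASH_LEN] == b[INFO_LEN:INFO_LEN + HASH_LEN]
    return same_hash, "hash"  # S770 if equal, otherwise S771
```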
FIG. 8 is a flowchart showing another embodiment of a hash data comparison method that can be performed by the hash comparison system of FIG. 4. The hash data comparison method shown in FIG. 8 is an embodiment corresponding to the hash data generation method shown in FIG. 6, wherein the hash data shown in FIG. 8 further includes a structure header. Therefore, in the present embodiment, steps identical or similar to those of the embodiment shown in FIG. 7 will be briefly described.
Referring to FIG. 8, the control unit 250 can select pieces of hash data respectively associated with two source files to be compared at step S810.
The control unit 250 may check the structure headers of the two pieces of selected hash data and then analyze the structure headers at step S820. As described above, since each structure header includes the contents and length of file information included in corresponding hash data, the length of a hash value, etc., the control unit 250 can identify the individual elements of the hash data by analyzing the structure header.
At step S830, the control unit 250 compares the structure headers of the two pieces of hash data with each other. If the structure headers are identical (YES), the control unit 250 can identify the file information and the hash value included in each piece of hash data at step S840.
The control unit 250 can compare the pieces of file information included in the two pieces of hash data with each other, and then first determine whether the two source files are the same file at step S850.
If the pieces of file information are identical at step S860 (in the case of YES), the control unit 250 can compare hash values respectively associated with the two source files with each other at step S870.
If the hash values are identical at step S880 (in the case of YES), it can be determined that the two source files are the same file at step S890.
If the structure headers are different from each other at step S830 (in the case of NO), if the pieces of file information are different from each other at step S860 (in the case of NO) or if hash values are different from each other at step S880 (in the case of NO), the two source files can be determined to be different files at step S891.
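The FIG. 8 comparison can be sketched as follows, assuming a hypothetical simplified header: a 1-byte element count, one 2-byte length per file-information element, and a 2-byte hash length, all big-endian. `parse` reads each header (S820) and locates the file information and hash value (S840); `same_file` then applies the three checks in order.

```python
import struct


def parse(hash_data: bytes) -> tuple[bytes, bytes, bytes]:
    """Split hash data into (structure header, file information, hash value)."""
    (n,) = struct.unpack_from(">B", hash_data, 0)
    *lengths, hash_len = struct.unpack_from(f">{n}HH", hash_data, 1)
    header_len = 1 + 2 * n + 2
    info_len = sum(lengths)
    header = hash_data[:header_len]
    info = hash_data[header_len:header_len + info_len]
    value = hash_data[header_len + info_len:header_len + info_len + hash_len]
    return header, info, value


def same_file(a: bytes, b: bytes) -> bool:
    header_a, info_a, hash_a = parse(a)
    header_b, info_b, hash_b = parse(b)
    if header_a != header_b:  # S830: different structures, so different files (S891)
        return False
    if info_a != info_b:      # S860: different file information (S891)
        return False
    return hash_a == hash_b   # S880: identical (S890) or different (S891)
```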
In the embodiment shown in FIG. 8, the file information and hash values constituting the hash data are identified using the structure headers. This embodiment can be more efficient in systems in which the configuration of the file information and the hash value differs from one piece of hash data to another. Further, because files whose structure headers differ can be determined to be different at step S830 on the basis of the headers alone, the comparison can be made more rapidly and accurately.
FIG. 9 is a configuration diagram showing another embodiment of a hash comparison system according to the present invention. The hash comparison system shown in FIG. 9 is an embodiment that can be applied to the case where files are compared with each other in a 1:N relation. This system is configured to first compare only pieces of file information with one another, generate a first comparison group using files having the same file information, and compare hash values of only files belonging to the first comparison group with one another.
Referring to FIG. 9, the hash comparison system 200 includes a file information generation unit 210, a hash generation unit 220, a control unit 250, and a hash comparison unit 260. In an embodiment, the hash comparison system 200 may further include at least one of a hash file management unit 230 and a source file management unit 240. In the description of the embodiment shown in FIG. 9, a description of components identical or similar to those of the embodiment of FIG. 4 will be omitted or briefly made.
The control unit 250 can select a file identical to source file A from target file group B. To do so, the control unit 250 can select the pieces of hash data associated with all files included in target file group B and the hash data of source file A, and compare them with one another. In this comparison, the control unit 250 can separate each piece of hash data into file information and a hash value and first compare only the file information. That is, the control unit 250 can compare the file information of source file A with that of each target file included in target file group B, classify the target files having the same file information, and thereby generate a first comparison group. Thereafter, the control unit 250 can use the hash comparison unit 260 to compare the hash values of the target files included in the first comparison group with the hash value of source file A, and thereby identify the identical files.
The hash comparison unit 260 can compare only hash values with one another under the control of the control unit 250. In the disclosed embodiment, the hash comparison unit 260 is provided to separately make a comparison only on hash values, thus more efficiently performing comparison in the case where a search is required in a 1:N relation.
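The two-stage 1:N search can be sketched as follows. It assumes each piece of hash data has already been split into a `(file_info, hash_value)` pair; the function name `find_identical` and the dictionary of named targets are hypothetical.

```python
def find_identical(source: tuple[bytes, bytes],
                   targets: dict[str, tuple[bytes, bytes]]) -> list[str]:
    """Return the names of target files identical to the source file."""
    src_info, src_hash = source
    # Stage 1: build the first comparison group from file information alone.
    group = [name for name, (info, _) in targets.items() if info == src_info]
    # Stage 2: compare hash values only inside the first comparison group.
    return [name for name in group if targets[name][1] == src_hash]
```

When the same target group is searched repeatedly, indexing the targets by file information in a dictionary would turn stage 1 into a single lookup; that is an optimization beyond what the text describes.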
According to the technology disclosed in the present invention, files can be determined to be different before their hash values are compared, so hash values need not be compared for files already found to differ. This allows files to be compared with one another more rapidly.
Further, the technology disclosed in the present invention is advantageous in that parity information, composed of a parity bit for the file information and a parity bit for the hash value, can be used to verify whether each of the file information and the hash value in a hash data structure has been correctly configured.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

Claims (14)

  1. A hash data generation method for generating respective pieces of hash data to be used to compare source files, comprising:
    (a) checking an attribute of each source file and generating file information composed of predetermined data bits based on the checked attribute;
    (b) calculating a hash value by applying a hash algorithm to at least part of the source file; and
    (c) generating hash data by successively connecting the hash value to the file information.
  2. The hash data generation method of claim 1, wherein (a) comprises:
    checking at least one of a size, a name, and a format of the source file, first partial data including first data of the source file, and second partial data including last data of the source file; and
    generating the file information including at least one of the size, the name, and the format of the source file, the first partial data including first data of the source file, and the second partial data including last data of the source file.
  3. The hash data generation method of claim 1, further comprising (d) generating hash parity bits for the hash data.
  4. The hash data generation method of claim 3, wherein (d) comprises:
    generating a first parity bit for the file information;
    generating a second parity bit for the hash value; and
    generating the hash parity bits by successively connecting the first and second parity bits.
  5. A hash data comparison method for comparing two source files with each other using hash data including file information and a hash value, comprising:
    (a) checking two pieces of hash data respectively associated with the two source files;
    (b) comparing two pieces of file information included in the two pieces of hash data with each other; and
    (c) if the two pieces of file information are identical, comparing two hash values included in the two pieces of hash data with each other, and if the two hash values are identical, determining that the two source files are an identical file.
  6. The hash data comparison method of claim 5, wherein the file information comprises at least one of a size, a name and a format of a corresponding source file, first partial data including first data of the source file, and second partial data including last data of the source file.
  7. The hash data comparison method of claim 6, wherein (b) comprises comparing individual data bits constituting the two pieces of file information with each other.
  8. The hash data comparison method of claim 6, wherein (b) comprises:
    identifying at least one of a size, a name, and a format of a source file included in corresponding file information, first partial data including first data of the source file, and second partial data including last data of the source file, for each of the two pieces of file information; and
    comparing the two pieces of file information with each other in terms of at least one of the size, the name, and the format of the source file, the first partial data including first data of the source file, and the second partial data including last data of the source file, which have been identified.
  9. A hash data comparison system for comparing source files with each other using hash data including file information and a hash value, comprising:
    a file information generation unit configured to check an attribute of each source file and generate file information about the source file;
    a hash generation unit configured to calculate a hash value by applying a hash function algorithm to at least part of the source file; and
    a control unit configured to generate hash data for the corresponding source file, including the file information and the hash value.
  10. The hash data comparison system of claim 9, further comprising a hash file management unit configured to store the generated hash data and keep information about source files associated with stored hash values.
  11. The hash data comparison system of claim 9, wherein the control unit determines identicalness of first and second source files by sequentially comparing pieces of file information and hash values between the first and second source files.
  12. The hash data comparison system of claim 9, wherein the control unit generates a structure header including identification information about the file information and the hash value, and generates the hash data, including the structure header, the file information, and the hash value.
  13. The hash data comparison system of claim 12, wherein the control unit sequentially compares structure headers, pieces of file information, and hash values between the first and second source files, and then determines that the first and second source files are an identical file if the structure headers, the file information, and hash values of the first and second source files are identical.
  14. The hash data comparison system of claim 9, wherein the control unit generates parity bits for the hash data, including parity bits respectively calculated for the file information and the hash value.
PCT/KR2012/006614 2011-10-28 2012-08-21 Hash data structure for file comparison and hash comparison system and method using the same WO2013062223A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2011-0111296 2011-10-28
KR1020110111296A KR101310253B1 (en) 2011-10-28 2011-10-28 Hash data creation method and hash data comparison system and method

Publications (1)

Publication Number Publication Date
WO2013062223A1 (en) 2013-05-02

Family

ID=47728187

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2012/006614 WO2013062223A1 (en) 2011-10-28 2012-08-21 Hash data structure for file comparison and hash comparison system and method using the same

Country Status (4)

Country Link
KR (1) KR101310253B1 (en)
CN (1) CN102945241A (en)
TW (1) TW201319929A (en)
WO (1) WO2013062223A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9811333B2 (en) 2015-06-23 2017-11-07 Microsoft Technology Licensing, Llc Using a version-specific resource catalog for resource management
US9830454B2 (en) 2013-06-27 2017-11-28 Huawei Device (Dongguan) Co., Ltd. Web application security access method, server, and client

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015060494A1 (en) * 2013-10-21 2015-04-30 주식회사 리얼타임테크 Apparatus for automatically updating record id of navigation network data and method for same
CN103699610A (en) * 2013-12-13 2014-04-02 乐视网信息技术(北京)股份有限公司 Method for generating file verification information, file verifying method and file verifying equipment
JP6338949B2 (en) * 2014-07-04 2018-06-06 国立大学法人名古屋大学 Communication system and key information sharing method
US9965639B2 (en) * 2015-07-17 2018-05-08 International Business Machines Corporation Source authentication of a software product
CN107133120A (en) * 2016-02-29 2017-09-05 阿里巴巴集团控股有限公司 A kind of method of calibration of file data, device
CN110197005A (en) * 2019-05-07 2019-09-03 珠海格力电器股份有限公司 A kind of air-conditioning CAE model automatic identifying method and device
CN110990897A (en) * 2019-12-16 2020-04-10 北京无忧创想信息技术有限公司 File fingerprint generation method and device
KR20220041394A (en) * 2020-09-25 2022-04-01 삼성전자주식회사 Electronic device and method for managing non-destructive editing contents

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000357115A (en) * 1999-06-15 2000-12-26 Nec Corp Device and method for file retrieval
JP2001147898A (en) * 1999-11-18 2001-05-29 Ricoh Co Ltd Electronic preserving method and device for guaranteeing originality and computer readable recording medium
JP2006053836A (en) * 2004-08-13 2006-02-23 Fuji Electric Systems Co Ltd Authenticity determination apparatus, and system for storing and utilizing electronic file
US20110145259A1 (en) * 2009-12-11 2011-06-16 Pitney Bowes Inc. System and method for identifying data fields for remote address cleansing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004040307A (en) * 2002-07-01 2004-02-05 Canon Inc Image forming apparatus
CN101354708B (en) * 2008-07-29 2010-08-18 四川大学 Remote file rapid synchronization method
CN101582076A (en) * 2009-06-24 2009-11-18 浪潮电子信息产业股份有限公司 Data de-duplication method based on data base

Also Published As

Publication number Publication date
KR20130046746A (en) 2013-05-08
TW201319929A (en) 2013-05-16
CN102945241A (en) 2013-02-27
KR101310253B1 (en) 2013-09-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12843232

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12843232

Country of ref document: EP

Kind code of ref document: A1