TW201319929A - Hash data structure for file comparison and hash comparison system and method using the same

Info

Publication number: TW201319929A
Application number: TW101134886A
Authority: TW
Inventors: Sung-Gook Jang; Kwang-Hee Yoo; Joo-Hyun Sung; Hye-Jin Jin; Yoon-Hyung Lee
Original assignee: Neowiz Games Co Ltd
Priority date: 2011-10-28
Filing date: 2012-09-24
Publication date: 2013-05-16
Also published as: KR101310253B1; WO2013062223A1; CN102945241A; KR20130046746A

Abstract

The present invention relates to a hash technology for data files. A hash data comparison system according to an embodiment compares source files with each other using hash data that includes file information and a hash value. The hash data comparison system includes a file information generation unit, a hash generation unit, and a control unit. The file information generation unit checks an attribute of each source file and generates file information about the source file. The hash generation unit calculates a hash value by applying a hash function algorithm to at least part of the source file. The control unit generates hash data for the corresponding source file, including the file information and the hash value. Accordingly, the present invention is advantageous in that hash values need not be compared for every pair of files, so files can be compared more rapidly.

Description

Hash data structure for file comparison and hash comparison system and method using same

In general, the present invention relates to a hash technique for data files, and more particularly to a hash data structure and to a hash comparison system and method using the hash data structure, which combine unique feature information of a source file with a hash value so that files can be compared more quickly.

Comparison between pieces of data, particularly between files, is already used in a variety of tasks. For example, file comparison is used to check for differences between files in an operating system (OS), or to compare a patch file with an original file in order to perform a predetermined patch.

Conventional file comparison techniques include comparing the files in their entirety, assigning version information to files and checking the files against that version information, and applying a hash function to the files and then comparing the results.

Comparing files in their entirety is seldom used because the amount of data to be compared is large and the comparison is slow. Assigning version information to files and comparing on that basis is disadvantageous in that, if the content of a file changes and the version information is not updated accordingly, the content no longer matches the version information, and the comparison cannot be performed correctly.

Therefore, in most cases a hash value is calculated by applying a hash function to each file, and the contents of the files are compared by comparing the calculated hash values. However, this conventional comparison method that uses only hash values is problematic in that, when a file is very large, more computing resources are needed to generate the hash value and the time needed to perform the corresponding operation increases.

Accordingly, the present invention has been made keeping in mind the above problems of the prior art, and an object of the present invention is to provide a hash data structure that enables files to be compared easily using a small amount of resources.

Another object of the present invention is to provide a method of generating a hash data structure and a method of comparing such hash data structures, which use the hash data structure required for file comparison to compare files rapidly.

A further object of the present invention is to provide a hash comparison system capable of comparing files efficiently by using the hash data structure required for file comparison.

In accordance with one aspect of the present invention to accomplish the above objects, there is provided a hash data structure comprising file information that is composed of predetermined data bits and relates to an attribute of a source file, and a hash value that is composed of predetermined data bits and relates to the source file, wherein the data bits corresponding to the hash value are located after the data bits corresponding to the file information.

In an embodiment, the file information may include at least one of a size value of the source file, first partial data containing the leading data of the source file, and second partial data containing the trailing data of the source file.

In an embodiment, the hash data structure may further include a structure header containing structural information about each of the file information and the hash value included in the hash data structure.

In an embodiment, the hash data structure may further include parity information about the hash data structure, wherein the parity information includes a first parity bit for the file information and a second parity bit for the hash value.

In accordance with another aspect of the present invention to accomplish the above objects, there is provided a hash data generation method for generating pieces of hash data to be used to compare source files, the method comprising (a) checking an attribute of each source file and generating, based on the checked attribute, file information composed of predetermined data bits; (b) calculating a hash value by applying a hash algorithm to at least part of the source file; and (c) generating hash data by sequentially concatenating the hash value to the file information.
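Steps (a) to (c) above can be illustrated with a minimal Python sketch. It is not the implementation described in the specification: the field layout (an 8-byte big-endian size followed by 4 bytes of leading data and 4 bytes of trailing data), the choice of SHA-256, and the names `EDGE_LEN`, `make_file_info`, and `make_hash_data` are all assumptions made for illustration.

```python
import hashlib
import struct

EDGE_LEN = 4  # assumed length of the first/second partial data, in bytes


def make_file_info(data: bytes) -> bytes:
    """Step (a): size value + first partial data + second partial data."""
    size = struct.pack(">Q", len(data))              # assumed 8-byte big-endian size
    head = data[:EDGE_LEN].ljust(EDGE_LEN, b"\0")    # zero-pad very short files
    tail = data[-EDGE_LEN:].rjust(EDGE_LEN, b"\0")
    return size + head + tail


def make_hash_data(data: bytes) -> bytes:
    """Steps (a)-(c): file information first, hash value appended after it."""
    file_info = make_file_info(data)                 # step (a)
    digest = hashlib.sha256(data).digest()           # step (b), assumed SHA-256
    return file_info + digest                        # step (c)


hash_data = make_hash_data(b"hello world")
```

With these assumed sizes the result is 16 bytes of file information followed by a 32-byte digest, so a reader of the hash data can inspect the cheap attributes before touching the digest.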

In an embodiment, (a) may include checking at least one of a size, a name, and a format of the source file, first partial data containing the leading data of the source file, and second partial data containing the trailing data of the source file, and generating file information that includes at least one of the size, the name, the format, the first partial data, and the second partial data.

In an embodiment, the hash data generation method may further include (d) generating hash parity bits for the hash data.

In an embodiment, (d) may include generating a first parity bit for the file information, generating a second parity bit for the hash value, and generating the hash parity bits by sequentially concatenating the first and second parity bits.

In accordance with a further aspect of the present invention to accomplish the above objects, there is provided a hash data generation method for generating pieces of hash data to be used to compare source files, the method comprising (a) generating a structure header containing structural information about each of the file information and the hash value included in a hash data structure; (b) checking an attribute of each source file and generating, based on the checked attribute, file information composed of predetermined data bits; (c) calculating a hash value by applying a hash algorithm to at least part of the source file; and (d) generating hash data by sequentially concatenating the hash value to the file information.

In accordance with yet another aspect of the present invention to accomplish the above objects, there is provided a hash data comparison method for comparing two source files with each other using hash data that includes file information and a hash value, the method comprising (a) checking the two pieces of hash data respectively associated with the two source files; (b) comparing the two pieces of file information included in the two pieces of hash data with each other; and (c) if the two pieces of file information are identical, comparing the two hash values included in the two pieces of hash data with each other, and, if the two hash values are identical, determining that the two source files are the same file.
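The two-stage comparison in steps (b) and (c) can be sketched as follows. This is only an illustration under stated assumptions: the `HashData` container and the `same_file` function are hypothetical names, and both fields are treated as opaque byte strings.

```python
from typing import NamedTuple


class HashData(NamedTuple):
    file_info: bytes   # attribute-derived bits (size, partial data, ...)
    hash_value: bytes  # digest of the source file


def same_file(a: HashData, b: HashData) -> bool:
    # Step (b): compare the cheap file information first.
    if a.file_info != b.file_info:
        return False   # attributes differ, so the hash values are never compared
    # Step (c): only when the file information matches, compare the hash values.
    return a.hash_value == b.hash_value


x = HashData(b"\x00\x0bhell", b"\xaa" * 20)
y = HashData(b"\x00\x0chell", b"\xaa" * 20)
result = same_file(x, y)  # False: the file information already differs
```

The early return is the point of the design: most non-matching pairs are rejected on a few attribute bytes.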

In an embodiment, the file information may include at least one of a size, a name, and a format of the corresponding source file, first partial data containing the leading data of the source file, and second partial data containing the trailing data of the source file.

In an embodiment, (b) may include comparing the individual data bits that constitute the two pieces of file information with each other.

In an embodiment, (b) may include, for each of the two pieces of file information, identifying at least one of a size, a name, and a format of the source file, first partial data containing the leading data of the source file, and second partial data containing the trailing data of the source file, as included in the corresponding file information, and comparing the two pieces of file information with each other with respect to the identified items.

In accordance with still another aspect of the present invention to accomplish the above objects, there is provided a hash data comparison method for comparing two source files with each other using hash data that includes file information, a hash value, and a structure header containing structural information about the file information and the hash value, the method comprising (a) comparing the structure headers of the two source files with each other and determining whether the pieces of hash data have the same structure; (b) if it is determined that the pieces of hash data have the same structure, comparing the pieces of file information respectively associated with the two source files with each other; and (c) if the pieces of file information are identical, comparing the hash values respectively associated with the two source files with each other, and, if the hash values are identical, determining that the two source files are the same file.

In accordance with still another aspect of the present invention to accomplish the above objects, there is provided a hash data comparison system for comparing source files with each other using hash data that includes file information and a hash value, the system comprising a file information generation unit configured to check an attribute of each source file and generate file information about the source file; a hash generation unit configured to calculate a hash value by applying a hash function algorithm to at least part of the source file; and a control unit configured to generate hash data for the corresponding source file, the hash data including the file information and the hash value.

In an embodiment, the hash data comparison system may further include a hash file management unit configured to store the generated hash data and to maintain information about the source file associated with each stored hash value.

In an embodiment, the control unit may determine whether first and second source files are the same by sequentially comparing the pieces of file information and the hash values of the first and second source files.

In an embodiment, the control unit may generate a structure header containing identification information about the file information and the hash value, and generate hash data that includes the structure header, the file information, and the hash value.

In an embodiment, the control unit may sequentially compare the structure headers, the pieces of file information, and the hash values of the first and second source files, and, if the structure headers, the file information, and the hash values of the first and second source files are identical, determine that the first and second source files are the same file.

In an embodiment, the control unit may generate parity bits for the hash data, the parity bits including parity bits calculated separately for the file information and for the hash value.

The technology disclosed herein is described by way of embodiments that serve only for structural or functional explanation, so the scope of the disclosed technology should not be construed as being limited to the embodiments described in this specification. In other words, the embodiments can be modified in various ways and can take various forms, so the scope of the disclosed technology should be understood to include equivalents capable of realizing the technical spirit of the present invention.

The meanings of the terms used in this specification should be understood as follows.

Terms such as "first" and "second" are used only to distinguish one component from another, and the scope of the present invention should not be limited by these terms. For example, a first component may be referred to as a second component, and a second component may similarly be referred to as a first component.

Throughout the specification, it should be understood that a statement that a first component is "connected" to a second component covers both the case where the first component is connected to the second component through some other component interposed between them and the case where the first component is "directly connected" to the second component. In contrast, a statement that a first component is "directly connected" to a second component means that no component is interposed between the first and second components. Other expressions describing relationships between components, namely "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in a similar manner.

It should be understood that, unless the context specifically indicates otherwise, a singular expression includes the plural. In this specification, terms such as "include" or "have" are used only to indicate the presence of features, numbers, steps, operations, components, parts, or combinations thereof, and do not exclude the possibility that one or more other features, numbers, steps, operations, components, parts, or combinations thereof are present or may be added.

Reference symbols for individual steps (e.g., a, b, c, etc.) are used only for convenience of description and do not indicate the order of the steps. Unless a specific order is clearly defined in the context, the individual steps may be performed in an order different from that described in the specification. In other words, the steps may be performed in the order described, substantially simultaneously, or in the reverse order.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by those skilled in the art to which the present invention pertains. Terms that are defined in commonly used dictionaries should be interpreted, depending on the context, as having the same meanings as in the related art, and should not be interpreted as having ideal or excessively formal meanings unless they are explicitly defined in this specification.

In the following description, the term "source file" denotes a file that is the target to which a hash data structure is applied. The present invention proposes a hash data structure that, like a typical hash value, has an independent value for each source file.

FIG. 1 is a reference diagram showing an embodiment of a hash data structure according to the present invention.

Referring to FIG. 1, a hash data structure 100 includes file information 110 and a hash value 120. In more detail, the hash data structure 100 may be configured so that the bits corresponding to the hash value are located after the data bits of the file information 110 of a source file.

The file information 110 may include a file size value 111 of the source file, partial data 112 containing the leading data of the source file (hereinafter referred to as "first partial data"), and partial data 113 containing the trailing data of the source file (hereinafter referred to as "second partial data"). According to an embodiment, the file information 110 may be composed of at least one of the three types of data 111 to 113.

Depending on the embodiment, as described below, the file information 110 may be configured to have different lengths in one or more systems. In other words, the file information 110 need not be composed of a fixed number of data bits, but may be composed of data bits of a predetermined length that depends on the settings of the system or on the needs of the application.

The source file size value 111 is data representing the size of the source file.

The first partial data 112 corresponds to the portion of the source file that spans a predetermined length from the first bit of the source file, and the second partial data 113 corresponds to the portion of the source file that spans a predetermined length back from the last bit of the source file. The lengths of the first and second partial data 112 and 113 may be determined differently depending on the file comparison system concerned, and the present invention is not limited to particular lengths.
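As an illustration of how the first and second partial data might be extracted, the following Python sketch reads a fixed number of bytes from each end of a seekable binary stream. The function name `read_partial_data` and the byte-level granularity are assumptions; the specification speaks of bits and leaves the length to the system.

```python
import io
import os


def read_partial_data(f, length: int) -> tuple[bytes, bytes]:
    """Return (first partial data, second partial data) of a binary stream."""
    f.seek(0, os.SEEK_END)
    size = f.tell()
    n = min(length, size)      # a short file yields fewer bytes on both ends
    f.seek(0)
    first = f.read(n)          # leading portion of the source file
    f.seek(size - n)
    second = f.read(n)         # trailing portion of the source file
    return first, second


first, second = read_partial_data(io.BytesIO(b"0123456789"), 3)
```

Only two short reads are needed regardless of file size, which is why these fields are cheap compared with hashing the whole file.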

The hash value 120 is data obtained by applying a hash algorithm to the source file. In an embodiment, the hash value 120 may be set to a specific number of bits. In other words, whereas the file information 110 is configured so that the elements it contains and their lengths can vary, the hash value 120 may be restricted to a specific length (number of data bits), such as a standardized length. For example, the hash value 120 may have 160 bits for the SHA-0 or SHA-1 algorithm, 256 or 224 bits for the SHA-256/224 algorithms, and 512 or 384 bits for the SHA-512/384 algorithms. Thus, even when the hash value 120 is used in one or more systems depending on the embodiment, it may be composed of data bits of a specific length. That is, because the hash value is preferably determined according to a standard, it can be restricted to a specific number of data bits.
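The digest lengths listed above can be checked with Python's standard `hashlib` module. This is only a quick verification sketch; SHA-0 is omitted because it is not among the algorithms that `hashlib` guarantees to provide, and the name `DIGEST_BITS` is introduced here for illustration.

```python
import hashlib

# Digest length in bits for each SHA variant named in the text (except SHA-0).
DIGEST_BITS = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha1", "sha224", "sha256", "sha384", "sha512")
}

for name, bits in DIGEST_BITS.items():
    print(f"{name}: {bits} bits")
```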

When files are compared, the file information 110 must be compared before the hash value 120. For example, suppose that file A is to be searched for among files A to C. Files A to C are compared using the file information about file A, so that file A can be identified. In this case, because the corresponding file can be found using only the file information 110, the hash values need not be compared, and the desired file can be found quickly using a small amount of resources.
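The search for file A among files A to C can be sketched as below. The function `find_file`, the dictionary layout, and the sample byte strings are hypothetical; each entry is assumed to be a `(file_info, hash_value)` pair.

```python
def find_file(
    target: tuple[bytes, bytes],
    stored: dict[str, tuple[bytes, bytes]],
) -> list[str]:
    """Return names of stored entries that match the target hash data."""
    # First pass: file information only.
    matches = [name for name, (info, _) in stored.items() if info == target[0]]
    if len(matches) <= 1:
        return matches  # file information alone settled the search
    # Several candidates share the same file information: fall back to hashes.
    return [name for name in matches if stored[name][1] == target[1]]


stored = {
    "A": (b"\x00\x05ab", b"h1"),
    "B": (b"\x00\x07cd", b"h2"),
    "C": (b"\x00\x09ef", b"h3"),
}
found = find_file((b"\x00\x05ab", b"h1"), stored)
```

In this sketch a single remaining candidate is returned without a hash check, mirroring the search example in the text; a system that needs stronger assurance could still verify the hash value of that one candidate.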

FIG. 2 is a reference diagram showing another embodiment of a hash data structure according to the present invention. Compared with the embodiment of FIG. 1, the hash data structure shown in FIG. 2 further includes a structure header 130.

The structure header 130 contains information about the structure of the file information 110 and the hash value 120. For example, the structure header 130 may contain information about the total number of bits of the file information 110 and the total number of bits of the hash value 120.

In an embodiment, the structure header 130 may contain information about the hash function used to calculate the hash value 120. For example, the structure header 130 may indicate that the corresponding hash value 120 was calculated using a hash function such as SHA-0 or SHA-1.

In an embodiment, the file information 110 may include only at least one of the three types of data 111 to 113 shown in the drawing, and the structure header 130 may provide information about which data are included in the file information 110.

For example, assume that the file size information and the first and second partial data are identified by A, B, and C, respectively, that the file size information has a fixed size of two bytes, and that the structure header 130 is "6AB". In this case, the "6" in the structure header 130 is a value indicating the total number of bytes of the file information 110, and "AB" indicates that the file information 110 is composed of the file size information 111 and the first partial data 112.
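A structure header such as "6AB" could be decoded along the following lines. This is a sketch only: the textual encoding, the function name `parse_header`, and in particular the rule for dividing the remaining bytes when both B and C are present (an equal split) are assumptions, since the specification gives only the single "6AB" example.

```python
import re

SIZE_FIELD_LEN = 2  # from the example: file size information is fixed at 2 bytes


def parse_header(header: str) -> dict[str, int]:
    """Map each field code in the header to its length in bytes."""
    m = re.fullmatch(r"(\d+)([ABC]+)", header)
    if m is None:
        raise ValueError(f"unrecognized structure header: {header!r}")
    total, fields = int(m.group(1)), m.group(2)
    lengths: dict[str, int] = {}
    remaining = total
    if "A" in fields:
        lengths["A"] = SIZE_FIELD_LEN
        remaining -= SIZE_FIELD_LEN
    partial = [c for c in fields if c in "BC"]
    for c in partial:
        lengths[c] = remaining // len(partial)  # assumption: equal split
    return lengths


layout = parse_header("6AB")  # 2 bytes of size, 4 bytes of first partial data
```

For "6AB" the result follows directly from the text: six bytes in total, two of which are the size field, leaving four bytes for the first partial data.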

In the embodiment of FIG. 2, the disclosed hash data structure 100 can also be applied where pieces of file information 110 having different lengths are used within a single system, because the bits of each element of the hash data structure 100 can be individually identified using the structure header 130.

FIG. 3 is a reference diagram showing a further embodiment of a hash data structure according to the present invention. Compared with the embodiment of FIG. 1, the hash data structure shown in FIG. 3 further includes parity information 140.

The parity information 140 includes a parity value for the hash data structure 100.

In an embodiment, the parity information 140 may include (i) a parity bit for the file information 110 and (ii) a parity bit for the hash value 120. The purpose is to allow the individual parity values to be identified separately, because in the present invention a file comparison can be completed using only the file information 110.
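Separate parity bits for the file information and the hash value might be computed as in the sketch below. The specification does not state the parity scheme; a single even-parity bit per field is assumed here, and `parity_bit` and `parity_info` are hypothetical names.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if the number of set bits in data is odd, else 0."""
    return sum(bin(byte).count("1") for byte in data) % 2


def parity_info(file_info: bytes, hash_value: bytes) -> tuple[int, int]:
    """First parity bit covers the file information, second the hash value."""
    return parity_bit(file_info), parity_bit(hash_value)


bits = parity_info(b"\x07", b"\xff")  # 0x07 has 3 set bits, 0xff has 8
```

Keeping the two bits separate means the file information can be checked for corruption on its own, without reading or validating the hash value.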

The embodiment of FIG. 3 allows error checking and file comparison to be performed more efficiently when a file transfer or similar event occurs.

FIG. 4 is a configuration diagram showing an embodiment of a hash comparison system according to the present invention.

A hash comparison system 200 includes a file information generation unit 210, a hash generation unit 220, a hash file management unit 230, and a control unit 250. In an embodiment, the hash comparison system 200 may further include a source file management unit 240.

The file information generation unit 210 may check the attributes of a source file and generate file information about the source file. Here, the attributes of each source file may include its size, name, and format, partial data bits of the source file (e.g., the data within a specific length from the first data bit or the last data bit), and the like.

In an embodiment, the file information generation unit 210 may generate the first and second partial data by reading data bits of the source file over preset lengths measured from the first and last bits of the source file. In this case, the preset lengths may correspond to the lengths of the first and second partial data of the corresponding hash data structure.

The hash generation unit 220 may generate a hash value by applying a hash function to each source file. The hash generation unit 220 may use a hash function specific to the individual system or a hash function defined by a standard, for example a hash function according to the Secure Hash Algorithm (SHA).

In an embodiment, the hash generation unit 220 has a plurality of hash functions and can generate a hash value for a source file using a specific hash function in response to a request from the control unit 250.

In an embodiment, the hash generation unit 220 may generate a hash value for only part of a source file. For example, when the size of the source file is equal to or greater than a predetermined value, the hash generation unit 220 may generate a hash value for a portion of the source file corresponding to a preset size. In another embodiment, the hash generation unit 220 may generate a hash value only for the remainder of the source file, excluding the first and second partial data.
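Both partial-hashing variants can be sketched in Python as follows. The threshold and portion size, the use of SHA-1, the choice of the leading portion as the part to hash, and the names `partial_hash` and `middle_hash` are all assumptions for illustration; the specification fixes none of them.

```python
import hashlib

THRESHOLD = 1 << 20    # assumed "predetermined value": 1 MiB
PORTION_LEN = 1 << 16  # assumed "preset size" of the hashed portion: 64 KiB


def partial_hash(
    data: bytes, threshold: int = THRESHOLD, portion_len: int = PORTION_LEN
) -> bytes:
    """Hash the whole file, or only its leading portion when it is large."""
    if len(data) >= threshold:
        data = data[:portion_len]  # assumption: the leading portion is used
    return hashlib.sha1(data).digest()


def middle_hash(data: bytes, edge_len: int) -> bytes:
    """Hash everything except the first and last edge_len bytes."""
    return hashlib.sha1(data[edge_len:len(data) - edge_len]).digest()


digest = middle_hash(b"HEADbodyTAIL", 4)  # hashes only b"body"
```

The second variant pairs naturally with the file information: the head and tail are already compared byte for byte there, so the hash only needs to cover what lies between them.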

The hash file management unit 230 may manage source files and the hash files (structures) corresponding to them. For example, the hash file management unit 230 stores hash files and maintains information (e.g., link information or the like) about the source file that matches each hash file.

The source file management unit 240 may store source files and keep a history of each source file. For example, if it is determined, as a result of a hash comparison performed for file A, that file A has been changed, the corresponding file A and its history may be stored in the source file management unit 240.

The control unit 250 may control the overall operation of the hash comparison system 200 so as to generate a hash data structure or to compare source files with each other.

In one embodiment, the control unit 250 may generate a hash data structure (file) for each original file. More specifically, the control unit 250 may provide a specific original file to both the file information generation unit 210 and the hash generation unit 220, and generate a hash data structure from the file information and the hash value received for that original file. An embodiment of the generation of a hash data structure is described in detail with reference to FIGS. 5 and 6.

In one embodiment, the control unit 250 may compare two original files with each other using their hash data structures. The hash data structure according to the present invention is divided into file information and a hash value, and original files are compared using this structural property. More specifically, the control unit 250 analyzes the hash data structures of the original files to be compared and uses the file information in those structures to determine whether the original files are the same file. If it determines that they are the same file, the control unit 250 uses the hash values in the hash data structures to check whether the original files have the same content. Because the present invention first uses the file information to determine whether the files are the same file, and compares the hash values only when they are, the comparison can be performed more quickly.

In one embodiment, when pieces of file information are compared with each other, the control unit 250 may compare the individual data bits that make up the pieces of file information. In another embodiment, the control unit 250 may identify the individual elements that make up each piece of file information and compare the pieces of file information by comparing the identified elements. In other words, for each piece of file information, at least one of the size, name, format, first partial data, and second partial data of the original file contained in that file information is identified, and the identified elements are compared with those of the other piece of file information.
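The element-by-element comparison can be illustrated with a short Python sketch. The `FileInfo` class, its field names, and the helper `differing_elements` are hypothetical; they simply model the five elements listed above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical model of the elements a piece of file information may hold."""
    size: int
    name: str
    fmt: str
    first_part: bytes  # first partial data (leading bytes of the file)
    last_part: bytes   # second partial data (trailing bytes of the file)


def differing_elements(a: FileInfo, b: FileInfo) -> List[str]:
    """Return the names of the elements whose values differ between a and b."""
    return [f.name for f in fields(FileInfo)
            if getattr(a, f.name) != getattr(b, f.name)]
```

An empty result means the two pieces of file information match on every identified element, so the hash values would be compared next.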

In one embodiment, the control unit 250 may provide a generated hash file, together with information about the original file associated with it, to the hash file management unit 230 so that the hash file can be managed. The control unit 250 provides the generated hash file to the hash file management unit 230, where it is stored. On receiving a request for another operation, such as a hash comparison, the control unit 250 may obtain the hash file corresponding to a specific original file from the hash file management unit 230 so that a predetermined operation can be performed.

In one embodiment, the control unit 250 may control the original file management unit 240 so that a history of an original file is generated. For example, when a patch or the like is applied to the same original file, a history of that patch may be needed. In this case, the control unit 250 (i) compares the original files using the file information and determines that they are the same original file, and (ii) if it determines from the hash values that the file contents differ, provides information about the original file and about the hash data structure to the original file management unit 240 so that the history can be generated.
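One way to picture conditions (i) and (ii) is the Python sketch below. The class `OriginalFileHistory`, its method, and the idea of storing old and new hash values as the history entry are assumptions for illustration; the specification does not say what a history record contains.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OriginalFileHistory:
    """Hypothetical history store: one (old_hash, new_hash) pair per detected change."""
    entries: List[Tuple[bytes, bytes]] = field(default_factory=list)

    def record_if_changed(self, old_info: bytes, old_hash: bytes,
                          new_info: bytes, new_hash: bytes) -> bool:
        if old_info != new_info:
            return False  # (i) fails: not the same original file
        if old_hash == new_hash:
            return False  # same file and same content, nothing to record
        self.entries.append((old_hash, new_hash))  # (ii) content has changed
        return True
```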

In one embodiment, the control unit 250 may generate a structure header for each hash data structure. More specifically, when the file information generation unit 210 and the hash generation unit 220 provide the file information and the hash value, respectively, the control unit 250 may generate a structure header for the hash data structure so that the file information and the hash value can be identified. For example, the control unit 250 may generate a structure header containing information that indicates the elements included in the file information 110, the data length of each element, the length of the hash value, and so on. In this embodiment, when hash data structures are compared with each other, the control unit 250 first analyzes the structure headers to identify the file information and the hash values, and then determines from the file information whether the two original files being compared are the same file. If it determines that they are the same file, the control unit 250 can determine whether the file content has changed by comparing the hash values with each other.
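A structure header that records element lengths could be laid out as in the following Python sketch. The binary layout (four big-endian 16-bit lengths), the 8-byte size field, and the function names `build_hash_data` and `split_hash_data` are assumptions chosen for the example, not a format taken from the specification.

```python
import struct

# Assumed header layout: byte lengths of the size field, the first partial data,
# the second partial data, and the hash value, each as a big-endian uint16.
HEADER = struct.Struct(">4H")


def build_hash_data(size: int, first: bytes, last: bytes, hash_value: bytes) -> bytes:
    """Concatenate header, file information (size + partial data), and hash value."""
    size_field = size.to_bytes(8, "big")
    header = HEADER.pack(len(size_field), len(first), len(last), len(hash_value))
    return header + size_field + first + last + hash_value


def split_hash_data(blob: bytes):
    """Use the header to recover (header, file_info, hash_value) from a blob."""
    n_size, n_first, n_last, n_hash = HEADER.unpack_from(blob, 0)
    info_end = HEADER.size + n_size + n_first + n_last
    header = blob[:HEADER.size]
    file_info = blob[HEADER.size:info_end]
    hash_value = blob[info_end:info_end + n_hash]
    return header, file_info, hash_value
```

Because the lengths travel with the data, a reader does not need to know in advance where the file information ends and the hash value begins.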

In one embodiment, the control unit 250 may generate parity information and add it to each hash data structure. More specifically, the control unit 250 may generate one parity bit for the file information and one parity bit for the hash value, and may generate parity information that includes both parity bits. This embodiment is applicable where a hash data structure is transmitted between different systems, or in similar situations. Because parity bits are calculated separately for the file information and for the hash value of the hash data structure, a parity check can be performed more quickly when hash data structures are compared with each other.
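The two-bit parity information can be sketched in Python as follows. Even parity over all bits, and packing the file-information bit ahead of the hash-value bit, are assumptions; the specification states only that one parity bit is produced for each part and that the two are combined.

```python
def parity_bit(data: bytes) -> int:
    """Return 1 if `data` contains an odd number of 1 bits, otherwise 0."""
    return sum(bin(byte).count("1") for byte in data) & 1


def parity_info(file_info: bytes, hash_value: bytes) -> int:
    """Join the two parity bits: file-information bit first, hash-value bit second."""
    return (parity_bit(file_info) << 1) | parity_bit(hash_value)
```

A receiver can recompute `parity_info` over the received parts and compare it with the transmitted value before comparing any hash data.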

FIG. 5 is a flowchart showing an embodiment of a hash data generation method that can be performed by the hash comparison system of FIG. 4.

Referring to FIG. 5, at step S510 the file information generation unit 210 may, under the control of the control unit 250, check the attributes of each original file. Here, the attributes are the pieces of data collected in order to generate the file information, and may be a file size, a file name, a file format, first or second partial data, or the like, as described above.

At step S520, the file information generation unit 210 may generate file information from the checked attributes of the original file. When pieces of hash data are compared with each other, this file information is used to determine whether the two original files being compared are the same file. As described above, the file information may include at least one of a file size, first partial data, and second partial data. Alternatively, the file information may include a file name or a file format. The file information generation unit 210 provides the generated file information to the control unit 250.

At step S530, the hash generation unit 220 may, under the control of the control unit 250, generate a hash value corresponding to each original file. In one embodiment, the hash generation unit 220 may have various hash algorithms and may generate a hash value for the original file using the hash algorithm requested by the control unit 250. In one embodiment, the hash generation unit 220 can, under the control of the control unit 250, generate a hash value using only part of the original file. The hash generation unit 220 provides the generated hash value to the control unit 250.

At step S540, the control unit 250 may generate hash data from the file information and the hash value. The control unit 250 may generate the hash data by sequentially concatenating the data bits of the hash value to the data bits of the file information. In this embodiment, the control unit 250 may know in advance which bit, counting from the first bit, the file information extends to. Accordingly, when the control unit 250 directs the file information generation unit 210 and the hash generation unit 220 to generate the file information and the hash value, its request may include information about the data lengths.
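Steps S510 to S540 can be sketched in Python as below. The fixed 40-byte file information (an 8-byte size followed by 16 leading and 16 trailing bytes, zero-padded), the use of SHA-256, and the function names are assumptions that stand in for the lengths the control unit is said to know in advance.

```python
import hashlib

PART_LEN = 16                 # assumed length of each partial-data field
INFO_LEN = 8 + 2 * PART_LEN   # size field + first part + last part = 40 bytes


def make_file_info(data: bytes) -> bytes:
    """S510-S520: build fixed-length file information from the file's attributes."""
    first = data[:PART_LEN].ljust(PART_LEN, b"\0")   # first partial data, zero-padded
    last = data[-PART_LEN:].rjust(PART_LEN, b"\0")   # second partial data, zero-padded
    return len(data).to_bytes(8, "big") + first + last


def make_hash_data(data: bytes) -> bytes:
    """S530-S540: append the hash value directly after the file information."""
    return make_file_info(data) + hashlib.sha256(data).digest()
```

Since `INFO_LEN` is fixed, a later comparison can separate the two parts again by slicing at that offset.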

FIG. 6 is a flowchart showing another embodiment of a hash data generation method that can be performed by the hash comparison system of FIG. 4. The embodiment of FIG. 6 generates hash data using the structure header described above. It is obtained by adding specific steps to the embodiment of FIG. 5, so steps identical or similar to those of FIG. 5 are described only briefly.

Referring to FIG. 6, at step S610 the control unit 250 may determine in advance the elements to be included in the file information. In other words, the control unit 250 may determine in advance the types of elements included in the file information, the lengths of the elements, and so on, and retain information about the configuration of the file information. The control unit 250 may then request the file information generation unit 210 to generate file information that includes the determined elements.

The file information generation unit 210 may generate file information under the control of the control unit 250. In other words, the file information generation unit 210 may check the attributes of each original file at step S620, generate file information from the checked attributes at step S630, and provide the file information to the control unit 250.

At step S640, the hash generation unit 220 may, under the control of the control unit 250, generate a hash value for the original file and provide the hash value to the control unit 250.

At step S650, the control unit 250 may generate a structure header for the file information and the hash value. As described above, the structure header may contain information about the structure of the hash data. The structure header is used because the present invention separates the file information and the hash value from the hash data and then compares each of them individually. In one embodiment, the control unit 250 may generate the structure header before the file information and the hash value are generated. This is possible because, when the control unit 250 requests generation of the file information and the hash value, it also specifies their configuration (for example, the elements of the file information, the lengths of the elements, the length of the hash value, and so on), so the structure header can be generated even before the file information and the hash value have been received. In another embodiment, the control unit 250 may receive the file information and the hash value individually and then generate a structure header for them. In other words, when the file information generation unit 210 and the hash generation unit 220 generate the file information and the hash value independently, the control unit 250 can receive them separately and then generate the structure header.

Once the structure header has been generated, at step S660 the control unit 250 may generate hash data from the structure header, the file information, and the hash value.

FIG. 7 is a flowchart showing an embodiment of a hash data comparison method that can be performed by the hash comparison system of FIG. 4. The hash data comparison method shown in FIG. 7 corresponds to the hash data generation method shown in FIG. 5.

Referring to FIG. 7, at step S710 the control unit 250 may select the pieces of hash data respectively associated with the two original files to be compared. In an embodiment that includes the hash file management unit 230, the control unit 250 may request the hash data for the two original files to be compared from the hash file management unit 230 and obtain the hash data from it.

At step S720, the control unit 250 may check the configuration of the two selected pieces of hash data. In other words, the control unit 250 may determine which part of each piece of hash data corresponds to the file information and which part corresponds to the hash value.

At step S730, the control unit 250 can compare the pieces of file information included in the two pieces of hash data with each other and first determine whether the two original files are the same file. For example, when the file information includes a file name, a file length, and so on, the file information can be used first to determine whether the two original files are the same file, after which the file contents can be checked. The present invention first determines object identity, that is, whether the two files to be compared are the same object; if they are, it then determines content identity, that is, whether the contents of the two objects are exactly the same, and thereby completes the comparison.

At step S740, if the two pieces of file information are identical ("yes"), then at step S750 the control unit 250 may compare the hash values associated with the two original files with each other.

At step S760, if the hash values are also identical ("yes"), then at step S770 the two original files can be determined to be the same file.

If the pieces of file information differ from each other at step S740 ("no"), or if the hash values differ from each other at step S760 ("no"), then at step S771 the two original files can be determined to be different files.
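The decision flow of steps S720 to S771 can be sketched in Python as follows. The function name `compare_hash_data` and the fixed boundary `INFO_LEN` are assumptions; the sketch presumes hash data in which the file information occupies a known number of leading bytes and the hash value follows it.

```python
INFO_LEN = 40  # assumed fixed length of the file information, in bytes


def compare_hash_data(a: bytes, b: bytes, info_len: int = INFO_LEN) -> bool:
    """Return True only if both the file information and the hash values match."""
    # S720: locate the file information and the hash value in each piece of hash data.
    info_a, hash_a = a[:info_len], a[info_len:]
    info_b, hash_b = b[:info_len], b[info_len:]
    # S730/S740: compare the file information first.
    if info_a != info_b:
        return False  # S771: different files, the hash values are never examined
    # S750/S760: compare the hash values only when the file information matched.
    if hash_a != hash_b:
        return False  # S771: same file information, different content
    return True       # S770: same file
```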

In the steps above, when pieces of file information or hash values are compared with each other, the control unit 250 may perform the comparison by examining the data bits of the objects being compared. Consequently, when the original files can be determined to be different files from the file information alone, the number of data bits that must be examined is significantly reduced. The present invention can therefore compare efficiently when a comparison must be made in a 1:N relationship, such as finding, among a plurality of files, the file identical to a specific original file.

FIG. 8 is a flowchart showing another embodiment of a hash data comparison method that can be performed by the hash comparison system of FIG. 4. The hash data comparison method shown in FIG. 8 corresponds to the hash data generation method shown in FIG. 6, and the hash data in FIG. 8 additionally includes a structure header. In this embodiment, steps identical or similar to those of the embodiment shown in FIG. 7 are therefore described only briefly.

Referring to FIG. 8, at step S810 the control unit 250 may select the pieces of hash data respectively associated with the two original files to be compared.

At step S820, the control unit 250 may check the structure headers of the two selected pieces of hash data and then analyze them. As described above, each structure header contains the contents and lengths of the file information included in the corresponding hash data, the length of the hash value, and so on, so the control unit 250 can identify the individual elements of the hash data by analyzing the structure header.

The control unit 250 compares the structure headers of the two pieces of hash data with each other and, if the structure headers are identical ("yes" at step S830), may identify at step S840 the file information and the hash value included in each piece of hash data.

At step S850, the control unit 250 can compare the pieces of file information included in the two pieces of hash data with each other and first determine whether the two original files are the same file.

At step S860, if the pieces of file information are identical ("yes"), then at step S870 the control unit 250 may compare the hash values respectively associated with the two original files with each other.

At step S880, if the hash values are identical ("yes"), then at step S890 the two original files can be determined to be the same file.

If the structure headers differ from each other at step S830 ("no"), if the pieces of file information differ from each other at step S860 ("no"), or if the hash values differ from each other at step S880 ("no"), then at step S891 the two original files can be determined to be different files.
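The three-stage check of FIG. 8 can be sketched in Python as below. The `HashData` tuple and the function `first_mismatch` are hypothetical; the sketch assumes the structure header has already been parsed so that each piece of hash data is available as three separate byte strings.

```python
from typing import NamedTuple, Optional


class HashData(NamedTuple):
    """Hypothetical parsed form of one piece of hash data."""
    header: bytes
    file_info: bytes
    hash_value: bytes


def first_mismatch(a: HashData, b: HashData) -> Optional[str]:
    """Return the stage at which a and b first differ, or None if they are the same file."""
    if a.header != b.header:
        return "header"      # "no" at S830, decided at S891
    if a.file_info != b.file_info:
        return "file_info"   # "no" at S860, decided at S891
    if a.hash_value != b.hash_value:
        return "hash_value"  # "no" at S880, decided at S891
    return None              # S890: same file
```

Returning the stage name rather than a plain boolean makes it visible how early a comparison was decided.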

The embodiment shown in FIG. 8 can use the structure header to identify the file information and the hash value that make up the hash data. This embodiment can be more efficient in systems in which file information and hash values are applied in different ways. In addition, because file identity can already be assessed at step S830 from the structure headers themselves, the determination of file identity can be faster and more accurate, making the comparison efficient.

FIG. 9 is a configuration diagram showing another embodiment of a hash comparison system according to the present invention. The hash comparison system shown in FIG. 9 is an embodiment applicable to cases in which files are compared in a 1:N relationship. The system is configured to first compare only the pieces of file information with each other, form a first comparison group from the files having the same file information, and compare hash values only for the files belonging to the first comparison group.

Referring to FIG. 9, the hash comparison system 200 includes a file information generation unit 210, a hash generation unit 220, a control unit 250, and a hash comparison unit 260. In one embodiment, the hash comparison system 200 may further include at least one of a hash file management unit 230 and an original file management unit 240. In the description of the embodiment shown in FIG. 9, components identical or similar to those of the embodiment of FIG. 4 are omitted or described only briefly.

The control unit 250 can select, from a target file group B, a file identical to an original file A. To do so, the control unit 250 may select the pieces of hash data associated with all the files in the target file group B, select the hash data of the original file A, and compare the selected pieces of hash data with each other. In the comparison, the control unit 250 may divide each piece of hash data into file information and a hash value and first compare only the pieces of file information. In other words, the control unit 250 may compare the file information of the original file A with the file information of each target file in the target file group B, collect the target files that have the same file information, and form a first comparison group from them. The control unit 250 may then use the hash comparison unit 260 to compare the hash values of the target files in the first comparison group with the hash value of the original file A and thereby determine the identical file.

The hash comparison unit 260 can, under the control of the control unit 250, compare only hash values with each other. In the disclosed embodiment, the hash comparison unit 260 is used to compare only the hash values separately, so that the comparison is performed more efficiently when a search is required in a 1:N relationship.
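The 1:N search with a first comparison group can be sketched in Python as follows. The function `find_identical`, the dictionary of targets keyed by file name, and the representation of hash data as a `(file_info, hash_value)` pair are assumptions made for the example.

```python
from typing import Dict, List, Tuple

HashData = Tuple[bytes, bytes]  # assumed form: (file_info, hash_value)


def find_identical(source: HashData, targets: Dict[str, HashData]) -> List[str]:
    """Return the names of the target files identical to the source file."""
    src_info, src_hash = source
    # First pass: keep only the targets whose file information matches the source.
    first_group = {name: hd for name, hd in targets.items() if hd[0] == src_info}
    # Second pass: compare hash values only within the first comparison group.
    return [name for name, hd in first_group.items() if hd[1] == src_hash]
```

Targets that fail the first pass never reach the hash comparison, which is where the saving described above comes from.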

According to the technique disclosed in the present invention, whether files differ from each other can be determined before their hash values are compared, so there is no need to compare all the hash values of different files, which has the advantage that files can be compared with each other more quickly.

In addition, the technique disclosed in the present invention has the advantage that whether the file information and the hash data structure have each been correctly constructed can be verified using parity information made up of the parity of the file information and the parity of the hash value.

Although preferred embodiments of the present invention have been disclosed above for illustrative purposes, those skilled in the art will appreciate that various modifications, additions, and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the claims.

100‧‧‧Hash data structure

110‧‧‧File information

111‧‧‧File size value

112‧‧‧First partial data

113‧‧‧Second partial data

120‧‧‧Hash value

130‧‧‧Structure header

140‧‧‧Parity information

200‧‧‧Hash comparison system

210‧‧‧File information generation unit

220‧‧‧Hash generation unit

230‧‧‧Hash file management unit

240‧‧‧Original file management unit

250‧‧‧Control unit

260‧‧‧Hash comparison unit

S510-S540‧‧‧Steps

S610-S660‧‧‧Steps

S710-S771‧‧‧Steps

S810-S891‧‧‧Steps

The foregoing and other objects, features, and advantages of the present invention will be more readily understood from the following detailed description taken in conjunction with the accompanying drawings, in which: FIG. 1 is a reference diagram showing an embodiment of a hash data structure according to the present invention; FIG. 2 is a reference diagram showing another embodiment of a hash data structure according to the present invention; FIG. 3 is a reference diagram showing a further embodiment of a hash data structure according to the present invention; FIG. 4 is a configuration diagram showing an embodiment of a hash comparison system according to the present invention; FIG. 5 is a flowchart showing an embodiment of a hash data generation method that can be performed by the hash comparison system of FIG. 4; FIG. 6 is a flowchart showing another embodiment of a hash data generation method that can be performed by the hash comparison system of FIG. 4; FIG. 7 is a flowchart showing an embodiment of a hash data comparison method that can be performed by the hash comparison system of FIG. 4; FIG. 8 is a flowchart showing another embodiment of a hash data comparison method that can be performed by the hash comparison system of FIG. 4; and FIG. 9 is a configuration diagram showing another embodiment of a hash comparison system according to the present invention.


Claims

A method for generating hash data, for generating each piece of hash data to be used in comparing original files, comprising: (a) checking an attribute of each original file and generating file information of predetermined data bits according to the checked attribute; (b) calculating a hash value by applying a hash algorithm to at least part of the original file; and (c) generating hash data by sequentially concatenating the hash value to the file information.

The method for generating hash data of claim 1, wherein (a) comprises: checking at least one of a size, a name, and a format of the original file, first partial data including the first data of the original file, and second partial data including the last data of the original file; and generating file information that includes at least one of the size, the name, and the format of the original file, the first partial data including the first data of the original file, and the second partial data including the last data of the original file.

The method for generating hash data of claim 1, further comprising (d) generating a hash parity bit for the hash data.

The method for generating hash data of claim 3, wherein (d) comprises: generating a first parity bit for the file information; generating a second parity bit for the hash value; and generating the hash parity bit by sequentially concatenating the first and second parity bits.

A method for comparing hash data, which compares two original files with each other using hash data that includes file information and a hash value, comprising: (a) checking two pieces of hash data respectively associated with the two original files; (b) comparing the two pieces of file information included in the two pieces of hash data with each other; and (c) if the two pieces of file information are identical, comparing the two hash values included in the two pieces of hash data with each other and, if the two hash values are identical, determining that the two original files are the same file.

The method for comparing hash data of claim 5, wherein the file information includes at least one of a size, a name, and a format of the corresponding original file, first partial data including the first data of the original file, and second partial data including the last data of the original file.

The method for comparing hash data of claim 6, wherein (b) comprises comparing the individual data bits that constitute the two pieces of file information with each other.

The method for comparing hash data of claim 6, wherein (b) comprises: identifying, for each of the two pieces of file information, at least one of a size, a name, and a format of the original file, first partial data including the first data of the original file, and second partial data including the last data of the original file, as included in the corresponding file information; and comparing the two pieces of file information with each other using the identified at least one of the size, the name, and the format of the original file, the first partial data including the first data of the original file, and the second partial data including the last data of the original file.

A hash data comparison system that compares original files with each other using hash data including file information and a hash value, the system comprising: a file information generation unit configured to check an attribute of each original file and generate file information about the original file; a hash generation unit configured to calculate a hash value by applying a hash function algorithm to at least part of the original file; and a control unit configured to generate, for the corresponding original file, hash data including the file information and the hash value.
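
The three units recited above can be sketched in a single function. This is a minimal sketch under assumptions: SHA-256 stands in for the unspecified hash function algorithm, the `max_bytes` limit stands in for "at least part of the original file", and the `HashData` record and `generate_hash_data` name are hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HashData:
    size: int          # file information: size in bytes
    name: str          # file information: base name
    file_format: str   # file information: extension, lower-cased
    hash_value: bytes  # digest over the hashed part of the file


def generate_hash_data(path: str, max_bytes: int = 1 << 20) -> HashData:
    # File information generation unit: check attributes of the original file.
    size = os.path.getsize(path)
    name = os.path.basename(path)
    file_format = os.path.splitext(name)[1].lstrip(".").lower()
    # Hash generation unit: hash at most the first max_bytes of the file
    # (SHA-256 is an assumed choice of hash function).
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read(max_bytes)).digest()
    # Control unit: bundle the file information and the hash value.
    return HashData(size, name, file_format, digest)
```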

The hash data comparison system according to claim 9, further comprising a hash file management unit configured to store the generated hash data and to maintain information about the original file associated with the stored hash value.

The hash data comparison system according to claim 9, wherein the control unit determines the similarity of first and second original files by sequentially comparing the pieces of file information and the hash values of the first and second original files.

The hash data comparison system according to claim 9, wherein the control unit generates a structure header including identification information about the file information and the hash value, and generates the hash data including the structure header, the file information, and the hash value.

The hash data comparison system according to claim 12, wherein the control unit sequentially compares the structure headers, the pieces of file information, and the hash values of first and second original files, and, if the structure headers, the file information, and the hash values of the first and second original files are identical, determines that the first and second original files are the same file.
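
Hash data carrying a structure header, and the sequential comparison over it, can be sketched as follows. The header layout is an assumption made for illustration: the claims say only that the header holds identification information about the file information and the hash value, so two big-endian 16-bit length fields are used here as a hypothetical stand-in.

```python
import struct

HEADER_FORMAT = ">HH"  # hypothetical header: file-info length, hash-value length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_hash_data(file_info: bytes, hash_value: bytes) -> bytes:
    """Lay out hash data as: structure header, file information, hash value."""
    header = struct.pack(HEADER_FORMAT, len(file_info), len(hash_value))
    return header + file_info + hash_value


def unpack_hash_data(blob: bytes):
    """Split hash data back into (header, file_info, hash_value)."""
    info_len, hash_len = struct.unpack_from(HEADER_FORMAT, blob)
    header = blob[:HEADER_SIZE]
    file_info = blob[HEADER_SIZE:HEADER_SIZE + info_len]
    hash_value = blob[HEADER_SIZE + info_len:HEADER_SIZE + info_len + hash_len]
    return header, file_info, hash_value


def same_file(blob_a: bytes, blob_b: bytes) -> bool:
    """Compare header, then file information, then hash value, in that order."""
    parts_a = unpack_hash_data(blob_a)
    parts_b = unpack_hash_data(blob_b)
    # all() over a generator stops at the first part that differs.
    return all(x == y for x, y in zip(parts_a, parts_b))
```

With this layout, two pieces of hash data whose headers differ are rejected before any file information or hash value is examined.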

The hash data comparison system according to claim 9, wherein the control unit generates a parity bit for the hash data, the parity bit including parity bits calculated separately for the file information and the hash value.