CN102946379A - Multi-layer digest file generation method and file correctness verification method for massively parallel system - Google Patents


Info

Publication number: CN102946379A
Authority: CN (China)
Prior art keywords: file, multilayer, level, interlayer, massively parallel
Legal status: Pending
Application number: CN2012103947659A
Other languages: Chinese (zh)
Inventors: 何王全, 方燕飞, 权建校, 刘勇, 文延华, 魏迪, 毛兴权, 王珊珊
Current assignee: Wuxi Jiangnan Computing Technology Institute
Original assignee: Wuxi Jiangnan Computing Technology Institute
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN2012103947659A
Publication of CN102946379A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multi-layer digest file generation method and a file correctness verification method for a massively parallel system. The generation method comprises the following steps: a target file is split into multiple first-level files, using the original hash block size as the unit; a first-level interlayer digest is generated for each first-level file with a message digest algorithm; and a total digest is generated from the first-level interlayer digests by applying the message digest algorithm at least once. The verification method comprises the following steps: the multi-layer digest file corresponding to a file is read to obtain the original hash block size; the file is split into multiple first-level files, using the original hash block size as the unit; and a first-level interlayer digest is generated for each first-level file with the message digest algorithm and compared with the corresponding first-level interlayer digest in the multi-layer digest file; if the two are inconsistent, an error message is output. With these two methods, the correctness of large numbers of files can be checked rapidly on a massively parallel system.

Description

Multi-layer digest file generation method and file correctness verification method
Technical field
The present invention relates to the field of network security, and in particular to a multi-layer digest file generation method for a massively parallel system and a file correctness verification method, based on the multi-layer digest file, for a massively parallel system.
Background art
On a massively parallel system built from thousands of CPUs, applications must process enormous numbers of files with huge total capacity. When file contents on the disk array are damaged, or when data becomes corrupted by sporadic I/O errors while a running program reads it from the disk array into memory or writes it from memory to a file, the error is difficult to detect quickly.
At present, the traditional way to verify file correctness is to compute a digest of the file with an algorithm such as MD5, SHA-1 or DES; comparing digests then reveals whether the file contents have been corrupted.
MD5 (Message-Digest Algorithm 5) processes a message of arbitrary length and produces a 128-bit "hash value" (also called a "digest"); the probability that different files produce the same hash value is very small. Fig. 1 is a flowchart of computing the hash value with the MD5 algorithm. As shown in Fig. 1, MD5 works in 512-bit groups: the (padded) data to be digested is divided into n groups. Starting with the first group, a 128-bit hash value is computed. A 128-bit value is then computed for the second group, starting from the hash value of the first group, and added to that hash value. This continues until the value of the n-th group has been added to the value accumulated over the first (n-1) groups. The final 128-bit hash value (i.e., the digest) is output. To verify file correctness, the 128-bit digest is recomputed with the same algorithm and compared with the previously computed digest. If they match, the file is correct; if they do not, an error occurred in the file during I/O.
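The group-by-group chaining described above can be observed with Python's standard `hashlib` module. The sketch below does not reimplement MD5's compression function; it simply feeds one MD5 object the input 64 bytes (512 bits) at a time and checks that the result equals the one-shot digest, which holds because the 128-bit state carries over from group to group. The helper names `md5_in_groups` and `verify` are illustrative, not part of any tool mentioned here.

```python
import hashlib

GROUP_BYTES = 64  # MD5 consumes its input in 512-bit (64-byte) groups


def md5_in_groups(data: bytes) -> bytes:
    """Feed data to one MD5 object a group at a time; the 128-bit state chains across groups."""
    h = hashlib.md5()
    for start in range(0, len(data), GROUP_BYTES):
        h.update(data[start:start + GROUP_BYTES])
    return h.digest()  # padding is applied internally; the result is 16 bytes (128 bits)


def verify(data: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest with the same algorithm and compare it with the stored one."""
    return md5_in_groups(data) == stored_digest


payload = b"example payload " * 100
reference = hashlib.md5(payload).digest()       # digest computed "earlier", in one call
print(md5_in_groups(payload) == reference)      # same result when fed group by group
print(verify(b"X" + payload[1:], reference))    # a single changed byte is detected
```

Because every `update` call depends on the state left by the previous one, the groups cannot be hashed independently, which is the serial dependency discussed next.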
The file verification tool commonly used under Linux is md5sum, which implements the flow of computing a digest and verifying a file; Windows also has digest software with similar functions. As shown in Fig. 1, when hash values are computed, the value of each group builds on the value of the preceding group, so successive hash values are dependent on one another. Computing a digest with MD5 is therefore inherently a serial computation. As a result, on a massively parallel system with very large volumes of file data, generating digests or verifying files with MD5 becomes very time-consuming.
In addition, the existing digest utilities under Linux and Windows are all static: they do not support real-time checking of file I/O operations inside an application, so I/O errors cannot be found in time and stopped from propagating as early as possible.
Summary of the invention
The technical problem solved by this invention is how to quickly verify the integrity of large numbers of files, and the correctness of large numbers of I/O operations, on a massively parallel system.
To solve this problem, the invention provides a multi-layer digest file generation method for a massively parallel system, comprising:
Splitting the target file into multiple first-level files, using a predefined original hash block size as the unit;
Generating, with a message digest algorithm, a first-level interlayer digest for each first-level file, and storing the first-level interlayer digests in the multi-layer digest file;
Generating a total digest from the first-level interlayer digests by applying the message digest algorithm at least once, and storing the total digest in the multi-layer digest file.
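The three generation steps can be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: the block size is a parameter, a short final block is hashed as-is, and the total digest is taken as MD5 over the 16-byte interlayer digests concatenated in block order (the text does not specify how the digests are combined). The function names are hypothetical.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Step 1: split the target data into first-level 'files' of block_size bytes.
    The last block may be shorter when the size is not an exact multiple (assumption)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def first_level_digests(blocks: list[bytes]) -> list[bytes]:
    """Step 2: one 16-byte MD5 interlayer digest per first-level block."""
    return [hashlib.md5(block).digest() for block in blocks]


def total_digest(digests: list[bytes]) -> bytes:
    """Step 3: MD5 over the interlayer digests concatenated in block order (assumption)."""
    return hashlib.md5(b"".join(digests)).digest()


data = bytes(range(256)) * 40                   # 10240 bytes of sample data
layer1 = first_level_digests(split_into_blocks(data, 4096))
print(len(layer1), total_digest(layer1).hex())
```

Keeping the interlayer digests in binary (16 bytes each) rather than as hex text is a choice made for this sketch; either works as long as generation and verification agree.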
Optionally, generating the total digest from the first-level interlayer digests by applying the message digest algorithm at least once comprises:
Generating the total digest from all first-level interlayer digests with the message digest algorithm.
Optionally, generating the total digest from the first-level interlayer digests by applying the message digest algorithm at least once comprises:
Obtaining the predefined number of digest layers m and the hash block counts of each layer: $n_1$ for the first layer, $n_2$ for the second layer, ..., $n_m$ for the m-th layer;
When m is greater than 1, repeating the following steps, starting from i = 1, until all m-th-level interlayer digests have been formed:
1) Splitting all i-th-level interlayer digests into $n_{i+1}$ (i+1)-th-level files, where i is a positive integer with $1 \le i \le m-1$;
2) Generating, with the message digest algorithm, an (i+1)-th-level interlayer digest for each (i+1)-th-level file, and storing the (i+1)-th-level interlayer digests in the multi-layer digest file;
3) Incrementing i and returning to step 1);
Generating the total digest from all m-th-level interlayer digests with the message digest algorithm.
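The layered loop can be sketched as below. The text fixes only the number of (i+1)-th-level files, $n_{i+1}$; how the i-th-level digests are distributed among them is not stated, so this sketch assumes contiguous groups whose sizes differ by at most one, with each (i+1)-th-level file being the concatenation of its group's 16-byte digests. `build_layers` and `split_evenly` are hypothetical helpers.

```python
import hashlib


def md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def split_evenly(items: list[bytes], parts: int) -> list[list[bytes]]:
    """Split items into `parts` contiguous groups whose sizes differ by at most one.
    The grouping rule is an assumption; the method only fixes the group count n_(i+1)."""
    if not 1 <= parts <= len(items):
        raise ValueError("need 1 <= parts <= number of digests")
    q, r = divmod(len(items), parts)
    groups, start = [], 0
    for k in range(parts):
        end = start + q + (1 if k < r else 0)
        groups.append(items[start:end])
        start = end
    return groups


def build_layers(layer1: list[bytes], counts: list[int]) -> tuple[list[list[bytes]], bytes]:
    """counts = [n_1, ..., n_m]. Returns the interlayer digests of levels 1..m and the total digest."""
    if len(layer1) != counts[0]:
        raise ValueError("layer1 must hold n_1 digests")
    layers = [layer1]
    for i in range(1, len(counts)):                       # i = 1 .. m-1
        groups = split_evenly(layers[-1], counts[i])      # counts[i] is n_(i+1)
        layers.append([md5(b"".join(g)) for g in groups])  # level i+1 digests
    return layers, md5(b"".join(layers[-1]))              # total digest from level m


first = [md5(bytes([k])) for k in range(16)]              # stand-in first-level digests
layers, total = build_layers(first, [16, 4, 2])           # m = 3
print([len(level) for level in layers])                   # [16, 4, 2]
```

With m = 1 the loop body never runs and the total digest is computed directly from the first-level digests, which is the single-level case described earlier.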
Optionally, before the target file is split into multiple first-level files using the predefined original hash block size as the unit, the method further comprises: defining the structure of the multi-layer digest file;
The structure of the multi-layer digest file comprises: the target file name, the target file size, the original hash block size, the number of original hash blocks n, the n first-level interlayer digests, and the total digest.
Optionally, before the target file is split into multiple first-level files using the predefined original hash block size as the unit, the method further comprises: defining the structure of the multi-layer digest file;
The structure of the multi-layer digest file comprises: the target file name, the target file size, the number of digest layers m, the original hash block size, the first-layer hash block count $n_1$, the $n_1$ first-level interlayer digests, the second-layer hash block count $n_2$, the $n_2$ second-level interlayer digests, ..., the m-th-layer hash block count $n_m$, the $n_m$ m-th-level interlayer digests, and the total digest.
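An in-memory model of these fields may help when reading the embodiments. The sketch below is hypothetical: the field names, the use of a Python dataclass, and the 16-byte (MD5) total-digest length are assumptions, and the on-disk encoding of the digest file is not specified in the text. The single-level layout corresponds to `block_counts=[n]`.

```python
from dataclasses import dataclass, field


@dataclass
class MultiLayerDigestFile:
    """Hypothetical in-memory view of the multi-layer digest file fields."""
    target_name: str
    target_size: int                 # bytes
    block_size: int                  # original hash block size, bytes
    block_counts: list[int]          # [n_1, n_2, ..., n_m]; m = len(block_counts)
    layer_digests: list[list[bytes]] = field(default_factory=list)  # levels 1..m
    total_digest: bytes = b""

    @property
    def layers(self) -> int:
        """Number of digest layers m."""
        return len(self.block_counts)

    def is_complete(self) -> bool:
        """True once every level holds exactly n_k digests and a 16-byte total digest is set."""
        return (len(self.layer_digests) == self.layers
                and all(len(d) == n for d, n in zip(self.layer_digests, self.block_counts))
                and len(self.total_digest) == 16)


# Header fields are assigned first; the digests are filled in later.
header = MultiLayerDigestFile("File", 2**30, 2**20, [1024])
print(header.layers, header.is_complete())   # 1 False
```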
Optionally, before the target file is split into multiple first-level files using the predefined original hash block size as the unit, the method further comprises: assigning values to the target file name, target file size, original hash block size and number of original hash blocks n in the multi-layer digest file;
Storing the first-level interlayer digests in the multi-layer digest file comprises: assigning the first-level interlayer digests to the n first-level interlayer digest fields of the multi-layer digest file;
Storing the total digest in the multi-layer digest file comprises: assigning the total digest to the total digest field of the multi-layer digest file.
Optionally, before the target file is split into multiple first-level files using the predefined original hash block size as the unit, the method further comprises: assigning values to the target file name, target file size, number of digest layers m, original hash block size, and hash block counts $n_1$, $n_2$, ..., $n_m$ of the multi-layer digest file;
Storing the first-level interlayer digests in the multi-layer digest file comprises: assigning the first-level interlayer digests to the $n_1$ first-level interlayer digest fields of the multi-layer digest file;
Storing the (i+1)-th-level interlayer digests in the multi-layer digest file comprises: assigning the (i+1)-th-level interlayer digests to the $n_{i+1}$ (i+1)-th-level interlayer digest fields of the multi-layer digest file;
Storing the total digest in the multi-layer digest file comprises: assigning the total digest to the total digest field of the multi-layer digest file.
Optionally, the interlayer digests are generated in parallel by different tasks of the massively parallel system.
Optionally, the total digest is generated by a single process.
Optionally, the message digest algorithm is MD5.
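The division of labour (interlayer digests computed by many tasks, total digest by one process) can be imitated on a single machine. This sketch uses `concurrent.futures.ThreadPoolExecutor` purely to stay self-contained; on an actual massively parallel system the tasks would presumably be separate processes on separate CPUs. CPython's `hashlib` releases the GIL while hashing buffers larger than 2047 bytes, so the threads can overlap here. The function names and worker count are illustrative.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest_block(job: tuple[bytes, int, int]) -> bytes:
    """Work done by one task: hash one first-level block."""
    data, start, size = job
    return hashlib.md5(data[start:start + size]).digest()


def parallel_first_level(data: bytes, block_size: int, workers: int = 8) -> list[bytes]:
    """Interlayer digests computed concurrently; map() keeps results in block order."""
    jobs = [(data, s, block_size) for s in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest_block, jobs))


def total_digest_single_process(digests: list[bytes]) -> bytes:
    """Total digest produced by one process after all interlayer digests are collected."""
    return hashlib.md5(b"".join(digests)).digest()


data = bytes(1 << 20)                     # 1 MiB of zeros as sample input
digests = parallel_first_level(data, 64 * 1024)
print(len(digests), total_digest_single_process(digests).hex())
```

`Executor.map` returns results in input order, so digest k always corresponds to block k even though the blocks may finish in any order.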
The present invention also provides a file correctness verification method based on the multi-layer digest file for a massively parallel system, comprising:
Reading the multi-layer digest file corresponding to the file, and obtaining the original hash block size;
Splitting the file into multiple first-level files, using the original hash block size as the unit;
Generating, with the message digest algorithm, a first-level interlayer digest for each first-level file, and comparing it with the corresponding first-level interlayer digest in the multi-layer digest file; if the two are inconsistent, outputting an error message.
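A minimal sketch of this check, assuming the stored first-level digests have already been read from the multi-layer digest file into a list (parsing is omitted). It reports the 1-based sequence numbers of failing first-level files, as in the optional step that follows; unlike the embodiment of Fig. 3, which ends verification at the first mismatch, it collects every mismatch, which is closer to what independent parallel tasks would each report. `verify_first_level` is a hypothetical name.

```python
import hashlib


def verify_first_level(data: bytes, block_size: int, stored: list[bytes]) -> list[int]:
    """Re-split the file with the stored block size, re-hash each block and return
    the 1-based sequence numbers of blocks whose digest differs from the stored one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if len(blocks) != len(stored):
        raise ValueError("file size does not match the digest file")
    bad = []
    for seq, (block, expected) in enumerate(zip(blocks, stored), start=1):
        if hashlib.md5(block).digest() != expected:
            print(f"error: first-level file {seq} failed verification")
            bad.append(seq)
    return bad


original = bytes(range(256)) * 64        # 16 KiB: four 4 KiB first-level files
stored = [hashlib.md5(original[i:i + 4096]).digest() for i in range(0, len(original), 4096)]
damaged = bytearray(original)
damaged[5000] ^= 0xFF                    # flip one byte inside the 2nd block
print(verify_first_level(bytes(damaged), 4096, stored))   # [2], after the error line
```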
Optionally, when the error message is output, the sequence number of the first-level file to which the error corresponds is also output.
Optionally, the method further comprises:
If all first-level interlayer digests are consistent with the corresponding first-level interlayer digests in the multi-layer digest file, generating a total digest from all first-level interlayer digests with the message digest algorithm;
Comparing the total digest with the corresponding total digest in the multi-layer digest file; if they are consistent, outputting a correct message; if they are inconsistent, outputting an error message.
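Combining the first-level comparison with the total-digest comparison gives the following sketch. It assumes the total digest is MD5 over the concatenated 16-byte first-level digests (the text does not specify how they are combined), and the returned strings merely stand in for the "correct" and "error" messages; their wording is made up. `verify_file` is a hypothetical name.

```python
import hashlib


def md5(b: bytes) -> bytes:
    return hashlib.md5(b).digest()


def verify_file(data: bytes, block_size: int,
                stored_layer1: list[bytes], stored_total: bytes) -> str:
    """First-level comparison, then (only if every block matches) the total-digest comparison."""
    fresh = [md5(data[i:i + block_size]) for i in range(0, len(data), block_size)]
    if len(fresh) != len(stored_layer1):
        return "error: block count differs from the digest file"
    bad = [seq for seq, (a, b) in enumerate(zip(fresh, stored_layer1), start=1) if a != b]
    if bad:
        return f"error: first-level files {bad} are inconsistent"
    if md5(b"".join(fresh)) != stored_total:
        return "error: total digest is inconsistent"
    return "ok"


data = bytes(range(256)) * 32                                   # 8 KiB, eight 1 KiB blocks
layer1 = [md5(data[i:i + 1024]) for i in range(0, len(data), 1024)]
total = md5(b"".join(layer1))
print(verify_file(data, 1024, layer1, total))                   # everything matches
print(verify_file(data, 1024, layer1, bytes(16)))               # stored total digest damaged
```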
Optionally, the method further comprises:
If all first-level interlayer digests are consistent with the corresponding first-level interlayer digests in the multi-layer digest file, obtaining from the multi-layer digest file the number of digest layers m and the hash block counts $n_1$, $n_2$, ..., $n_m$ of layers 1 to m;
When m is greater than 1, repeating the following steps, starting from i = 1, until all m-th-level interlayer digests have been generated:
1) Splitting all i-th-level interlayer digests into $n_{i+1}$ (i+1)-th-level files according to the (i+1)-th-layer hash block count $n_{i+1}$, where i is a positive integer with $1 \le i \le m-1$;
2) Generating, with the message digest algorithm, an (i+1)-th-level interlayer digest for each (i+1)-th-level file;
3) Comparing the (i+1)-th-level interlayer digests with the corresponding (i+1)-th-level interlayer digests in the multi-layer digest file; if they are consistent, 4) incrementing i and returning to step 1); if they are inconsistent, outputting an error message;
If all m-th-level interlayer digests are consistent with the corresponding m-th-level interlayer digests in the multi-layer digest file, generating a total digest from all m-th-level interlayer digests with the message digest algorithm;
Comparing the total digest with the corresponding total digest in the multi-layer digest file; if they are consistent, outputting a correct message; if they are inconsistent, outputting an error message.
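The multi-level check can be sketched as follows, again as a hypothetical illustration. It assumes the first-level digests have already passed, and it must use exactly the same grouping rule as generation; here contiguous groups of near-equal size are assumed, since the text only fixes the group count $n_{i+1}$. A small reference digest file is built in memory so the example runs on its own. `verify_upper_levels` and `split_evenly` are illustrative names.

```python
import hashlib


def md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def split_evenly(items: list[bytes], parts: int) -> list[list[bytes]]:
    """Contiguous groups whose sizes differ by at most one; must match the generator (assumption)."""
    q, r = divmod(len(items), parts)
    groups, start = [], 0
    for k in range(parts):
        end = start + q + (1 if k < r else 0)
        groups.append(items[start:end])
        start = end
    return groups


def verify_upper_levels(layer1: list[bytes], counts: list[int],
                        stored_layers: list[list[bytes]], stored_total: bytes) -> str:
    """Called after all first-level digests matched. Regenerates levels 2..m, compares each
    level with the stored one, and finally compares the total digest."""
    current = layer1
    for i in range(1, len(counts)):                        # i = 1 .. m-1
        current = [md5(b"".join(g)) for g in split_evenly(current, counts[i])]
        expected = stored_layers[i]                        # stored level i+1 digests
        bad = [k for k, (a, b) in enumerate(zip(current, expected), start=1) if a != b]
        if bad or len(current) != len(expected):
            return f"error: level {i + 1} digests {bad} are inconsistent"
    if md5(b"".join(current)) != stored_total:
        return "error: total digest is inconsistent"
    return "ok"


# Build a reference digest file in memory with the same grouping rule, then verify it.
first = [md5(bytes([k])) for k in range(8)]
counts = [8, 4, 2]                                         # m = 3
stored = [first]
for n in counts[1:]:
    stored.append([md5(b"".join(g)) for g in split_evenly(stored[-1], n)])
stored_total = md5(b"".join(stored[-1]))
print(verify_upper_levels(first, counts, stored, stored_total))   # ok
```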
Optionally, the interlayer digests are generated in parallel by different tasks of the massively parallel system.
Optionally, the total digest is generated by a single process.
Optionally, the message digest algorithm is MD5.
Compared with the prior art, the technical solution of the present invention has the following advantages:
1. The invention generates interlayer digests level by level, and each interlayer digest can be computed in parallel by a different task; the total digest is then generated from the last layer of interlayer digests. This makes full use of the processing capacity of the massively parallel system and greatly increases digest generation speed. Correspondingly, when file correctness is verified, interlayer digests are generated level by level and compared with the interlayer digests stored in the precomputed digest file, and any inconsistency is reported immediately; this likewise makes full use of the system's processing capacity and greatly increases file verification speed.
2. In an optional solution, because the first-level interlayer digests are computed over blocks of the file, a mismatched first-level digest during verification not only allows the error to be reported but also locates it precisely in a specific file block, so I/O errors are found promptly and prevented from propagating.
Description of the drawings
Fig. 1 is a flowchart of the existing MD5 algorithm computing a hash value;
Fig. 2 is a flowchart of an embodiment of the multi-layer digest file generation method for a massively parallel system according to the present invention;
Fig. 3 is a flowchart of an embodiment of the file correctness verification method based on the multi-layer digest file for a massively parallel system according to the present invention;
Fig. 4 is a schematic diagram of the digest file structure used in the first embodiment of the multi-layer digest file generation method and the first embodiment of the file correctness verification method based on the multi-layer digest file, both for a massively parallel system according to the present invention;
Fig. 5 is a flowchart of the first embodiment of the multi-layer digest file generation method for a massively parallel system according to the present invention;
Fig. 6 is a flowchart of the first embodiment of the file correctness verification method based on the multi-layer digest file for a massively parallel system;
Fig. 7 is a flowchart of another embodiment of the multi-layer digest file generation method for a massively parallel system according to the present invention;
Fig. 8 is a flowchart of another embodiment of the file correctness verification method based on the multi-layer digest file for a massively parallel system according to the present invention;
Fig. 9 is a schematic diagram of the digest file structure used in the second embodiment of the multi-layer digest file generation method and the second embodiment of the file correctness verification method based on the multi-layer digest file, both for a massively parallel system according to the present invention;
Fig. 10 is a flowchart of the second embodiment of the multi-layer digest file generation method for a massively parallel system according to the present invention;
Fig. 11 is a flowchart of the second embodiment of the file correctness verification method based on the multi-layer digest file for a massively parallel system according to the present invention.
Detailed description of the embodiments
Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the invention can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the invention is therefore not limited by the specific implementations disclosed below.
Furthermore, the invention is described in detail with reference to schematic diagrams. These diagrams are examples provided for ease of explanation when describing the embodiments, and they should not limit the scope of protection of the invention.
To solve the technical problem described in the background, the invention provides a multi-layer digest file generation method for a massively parallel system. Fig. 2 is a flowchart of an embodiment of this method, in which only first-level interlayer digests are formed. As shown in Fig. 2, this embodiment comprises the following steps:
Step S21: split the target file into multiple first-level files, using the predefined original hash block size as the unit.
Note that before the target file is split, the method may also include defining the structure of the multi-layer digest file, which comprises: the target file name, the target file size, the original hash block size, the number of original hash blocks n, the n first-level interlayer digests, and the total digest. After the structure is defined and before the target file is split, the method may also include assigning values to the target file name, target file size, original hash block size and number of original hash blocks n in the multi-layer digest file.
Step S22: generate, with the message digest algorithm, a first-level interlayer digest for each first-level file, and store the first-level interlayer digests in the multi-layer digest file; specifically, they are assigned to the n first-level interlayer digest fields of the multi-layer digest file. The first-level interlayer digests are generated in parallel by different tasks of the massively parallel system.
Step S23: generate a total digest from all first-level interlayer digests with the message digest algorithm, and store it in the multi-layer digest file; specifically, it is assigned to the total digest field of the multi-layer digest file. The total digest is generated by a single process.
Note that although this embodiment uses MD5 as the message digest algorithm, this should not be construed as a specific limitation of the invention. Those skilled in the art will understand that other existing message digest algorithms, such as SHA-1 or DES, can also be applied to the invention.
Correspondingly, the invention also provides a file correctness verification method based on the multi-layer digest file for a massively parallel system. Fig. 3 is a flowchart of an embodiment of this method, in which only first-level interlayer digests are formed. As shown in Fig. 3, this embodiment comprises the following steps:
Step S31: read the multi-layer digest file corresponding to the file and obtain the original hash block size.
Step S32: split the file into multiple first-level files, using the original hash block size as the unit.
Step S33: generate, with the message digest algorithm, a first-level interlayer digest for each first-level file and compare it with the corresponding first-level interlayer digest in the multi-layer digest file; if they are inconsistent, output an error message and end verification. The sequence number of the first-level file to which the error corresponds can also be output. The first-level interlayer digests are generated in parallel by different tasks of the massively parallel system.
This embodiment may further comprise:
Step S34: if all first-level interlayer digests are consistent with the corresponding first-level interlayer digests in the multi-layer digest file, a single process generates a total digest from all first-level interlayer digests with the message digest algorithm.
Step S35: compare this total digest with the corresponding total digest in the multi-layer digest file; if they are consistent, output a correct message; if they are inconsistent, output an error message.
Note that although this embodiment uses MD5 as the message digest algorithm, this should not be construed as a specific limitation of the invention. Those skilled in the art will understand that other existing message digest algorithms, such as SHA-1 or DES, can also be applied to the invention.
The technical solution of the invention is further described below with reference to the drawings and specific embodiments.
Fig. 4 is a schematic diagram of the digest file structure used in the first embodiment of the generation method and the first embodiment of the verification method described above. As shown in Fig. 4, the digest file of this embodiment comprises: the file name, the file size, the hash block size, the number of hash blocks, the hash values Hash1 to Hash1024 (1024 values of 16 bytes each, i.e., the first-level interlayer digests), and a total hash value (i.e., the total digest). Before the multi-layer digest file is generated, some of its fields are assigned values: in this embodiment the file is named File, its size is 1 GB, each hash block is 1 MB, and the number of hash blocks is 1024.
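The figures in this embodiment are consistent with each other: with 1 GB and 1 MB read as binary units (2^30 and 2^20 bytes, an assumption), the file splits into exactly 1024 hash blocks, and 1024 MD5 digests of 16 bytes each occupy 16 KiB of the digest file. The ceiling division in this sketch (`block_count` is a hypothetical helper) also covers file sizes that are not an exact multiple of the block size, a case the embodiment does not discuss.

```python
def block_count(file_size: int, block_size: int) -> int:
    """Number of hash blocks, rounding up so a short final block is still hashed (assumption)."""
    return -(-file_size // block_size)


GiB, MiB = 2**30, 2**20          # 1 GB and 1 MB read as binary units (assumption)
n = block_count(GiB, MiB)
print(n, n * 16)                 # 1024 16384: 1024 digests of 16 bytes each
```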
Fig. 5 is a flowchart of the first embodiment of the generation method. As shown in Fig. 5, the file File is first split into 1024 first-level files according to the 1 MB hash block size recorded in the multi-layer digest file.
Next, 1024 mutually independent CPUs each read one first-level file, compute its first-level interlayer digest with MD5, and store it in the corresponding first-level interlayer digest field of the multi-layer digest file. For example, CPU1 reads the 1st first-level file, computes its 128-bit first-level interlayer digest Hash1 with the MD5 algorithm, and stores it at the Hash1 position of the multi-layer digest file.
Then, once all 1024 first-level interlayer digests Hash1 to Hash1024 have been generated, CPU1 collects them, generates the total digest (the total hash value) from them with MD5, and stores it in the total hash value field of the multi-layer digest file. The total digest is computed only by a single process on CPU1. At this point, generation of the multi-layer digest file for File is complete.
Correspondingly, Fig. 6 is a flowchart of the first embodiment of the verification method. As shown in Fig. 6, the multi-layer digest file corresponding to File is read first, yielding an original hash block size of 1 MB.
Then File is split into 1024 first-level files of 1 MB each.
Next, 1024 mutually independent CPUs each read one first-level file, compute its first-level interlayer digest with MD5, and compare it with the corresponding first-level interlayer digest in the multi-layer digest file. If they are inconsistent, an I/O error has occurred for that first-level file; an error message is output, optionally together with the sequence number of the first-level file. For example, CPU2 reads the 2nd first-level file, computes its 128-bit first-level interlayer digest Hash2' with the MD5 algorithm, and compares Hash2' with the corresponding stored digest Hash2. If they differ, the 2nd first-level file was corrupted during I/O, and an error message is output together with the sequence number 2 of the faulty first-level file.
If all 1024 first-level interlayer digests match the corresponding digests in the multi-layer digest file, every first-level file passed through I/O correctly. In theory, once all first-level interlayer digests are correct, the total digest should also match the total digest stored in the multi-layer digest file; any inconsistency could only come from an error in computing the total digest. To ensure the verification is complete, CPU1 collects the 1024 first-level interlayer digests Hash1' to Hash1024', generates a total digest (total hash value') from them with MD5, and compares it with the total hash value stored in the multi-layer digest file. If they match, a message confirming successful verification is output; otherwise, an error message is output. This completes the correctness verification of File.
Fig. 7 is a flowchart of another embodiment of the multi-layer digest file generation method for a massively parallel system according to the present invention. Parts identical to the previous embodiment are not repeated here. Unlike the previous embodiment, in addition to the first-level interlayer digests, this embodiment generates interlayer digests for layers 2 to m according to the predefined number of digest layers m and stores them in the multi-layer digest file. Correspondingly, the digest file structure of this embodiment also differs, with additional fields such as the number of digest layers m and the numbers of blocks in layers 2 to m.
As shown in Fig. 7, this embodiment comprises the following steps:
Execution in step S71 take the original Hash block size of predefined as unit, splits into a plurality of first grade files with file destination.
Need to prove,, as unit file destination is split into before a plurality of first grade files at described original Hash block size take predefined, also can comprise: the structure that defines described multilayer Summary file.The structure of described multilayer Summary file comprises: file destination name, file destination size, summary number of stories m, original Hash block size, ground floor Hash piece are counted n 1, n 1Individual one-level interlayer summary, second layer Hash piece are counted n 2, n 2Individual secondary interlayer summary ..., m layer Hash piece counts n m, n mIndividual m level interlayer summary, total summary.At described original Hash block size take predefined as unit, file destination is split into before a plurality of first grade files, after the structure of the described multilayer Summary file of definition, also can comprise: for file destination name, file destination size, summary number of stories m, original Hash block size, the ground floor Hash piece of described multilayer Summary file are counted n 1, second layer Hash piece counts n 2..., m layer Hash piece counts n mAssignment.
Execution in step S72 is each first grade file with message digest algorithm, generates one-level interlayer summary, and described one-level interlayer summary is stored in the described multilayer Summary file.Particularly, described one-level interlayer summary is stored in the multilayer Summary file comprises: the n in take described one-level interlayer summary as described multilayer Summary file 1Individual one-level interlayer summary assignment.Described is each first grade file, generates one-level interlayer summary and is finished by task parallelisms different under the described massively parallel system.
Step S73: obtain the predefined number of digest layers m and the hash block counts n_1, n_2, ..., n_m of layers 1 through m.
Step S74: initialize the local variable i to 1.
Step S75: determine whether i ≤ m - 1.
If i ≤ m - 1, some interlayer digests have not yet been generated. Perform step S76: split all i-th-level interlayer digests into n_(i+1) files of level (i+1).
Step S77: using the message digest algorithm, generate one (i+1)-th-level interlayer digest for each (i+1)-th-level file. Store each digest in the corresponding one of the n_(i+1) (i+1)-th-level interlayer digest fields of the multi-layer digest file.
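One possible reading of steps S76 and S77 is sketched below. The patent does not state how the level-i digests are divided among the n_(i+1) files, or how a file's digests are fed to the digest algorithm. This sketch therefore makes two assumptions. The digests are split into contiguous groups whose sizes differ by at most one. Each group's digests are concatenated in order before hashing. The function names are hypothetical.

```python
import hashlib
from typing import List


def split_digests(digests: List[bytes], n_files: int) -> List[List[bytes]]:
    """Step S76 (assumed rule): divide the level-i digests, in order, into
    n_files contiguous groups whose sizes differ by at most one."""
    if not 1 <= n_files <= len(digests):
        raise ValueError("need 1 <= n_(i+1) <= number of level-i digests")
    q, r = divmod(len(digests), n_files)
    groups, start = [], 0
    for k in range(n_files):
        end = start + q + (1 if k < r else 0)  # the first r groups get one extra
        groups.append(digests[start:end])
        start = end
    return groups


def next_level_digests(digests: List[bytes], n_files: int) -> List[bytes]:
    """Step S77: hash the concatenated digests of each level-(i+1) file."""
    return [hashlib.md5(b"".join(g)).digest() for g in split_digests(digests, n_files)]
```

With 1024 first-level digests and n_2 = 4, each second-level file holds 256 digests (4096 bytes).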
Step S78: increment i and return to step S75.
If i > m - 1, all m levels of interlayer digests have been generated. Perform step S79: using the message digest algorithm, generate a single total digest from all m-th-level interlayer digests. Assign it to the total digest field of the multi-layer digest file. The total digest is generated by a single process. At this point, the multi-layer digest file is complete.
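Steps S71 to S79 can be combined into a single sequential sketch of the whole generation loop, shown below. It makes three assumptions: MD5 is the digest algorithm, each level is split into contiguous groups whose sizes differ by at most one, and a plain dict holds the result. The function and key names are hypothetical. Parallelism is omitted for brevity. The comments mark where the work would be spread across tasks.

```python
import hashlib
from typing import Dict, List


def _md5(data: bytes) -> bytes:
    # 128-bit MD5 digest, returned as 16 raw bytes
    return hashlib.md5(data).digest()


def _split_even(items: List[bytes], n: int) -> List[List[bytes]]:
    # Assumed grouping rule: n contiguous groups whose sizes differ by at most one
    q, r = divmod(len(items), n)
    groups, start = [], 0
    for k in range(n):
        end = start + q + (1 if k < r else 0)
        groups.append(items[start:end])
        start = end
    return groups


def generate_multilayer_digest(name: str, data: bytes, block_size: int,
                               block_counts: List[int]) -> Dict:
    """Steps S71-S79. block_counts = [n_1, ..., n_m]."""
    m = len(block_counts)
    # S71-S72: one first-level digest per original hash block
    # (on the parallel system each block would go to a different task)
    level = [_md5(data[o:o + block_size]) for o in range(0, len(data), block_size)]
    if len(level) != block_counts[0]:
        raise ValueError("n_1 does not match file size / hash block size")
    levels = [level]
    # S74-S78: for i = 1 .. m-1, split the level-i digests into n_(i+1) files
    i = 1
    while i <= m - 1:
        groups = _split_even(levels[-1], block_counts[i])  # block_counts[i] is n_(i+1)
        levels.append([_md5(b"".join(g)) for g in groups])
        i += 1
    # S79: a single process hashes all level-m digests into the total digest
    total = _md5(b"".join(levels[-1]))
    return {"file_name": name, "file_size": len(data), "m": m,
            "block_size": block_size, "block_counts": list(block_counts),
            "levels": levels, "total": total}
```

For example, `generate_multilayer_digest("File", data, 1024, [16, 4])` on 16 KiB of data yields 16 first-level digests, 4 second-level digests, and one total digest. With m = 1, the loop body never runs, and the total digest is computed directly from the first-level digests. This matches the single-layer case of claim 2.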
It should be noted that this embodiment uses the MD5 algorithm as the message digest algorithm, but this should not be construed as a specific limitation of the present invention. Those skilled in the art will understand that other existing algorithms, such as SHA-1 and DES, can also be applied to the present invention.
Correspondingly, Fig. 8 is a flow chart of another embodiment of the file correctness verification method based on the multi-layer digest file for a massively parallel system according to the present invention. Parts identical to the previous embodiment are not repeated here. Unlike the previous embodiment, the multi-layer digest file used in this embodiment contains two or more levels of interlayer digests. As shown in Fig. 8, this embodiment comprises the following steps:
Step S801: read the multi-layer digest file corresponding to the file and obtain the original hash block size.
Step S802: using the original hash block size as the unit, split the file into n first-level files.
Step S803: using the message digest algorithm, generate a first-level interlayer digest for each first-level file. Compare it with the corresponding first-level interlayer digest in the multi-layer digest file. If the two are inconsistent, output error information.
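The sketch below covers steps S801 to S803 only. It assumes the stored first-level digests have already been read from the multi-layer digest file into a list of 16-byte MD5 values. The function name is hypothetical. Raising an error when the block count does not match is an added assumption. Sequence numbers are reported 1-based, matching the numbering of first-level files in the embodiment.

```python
import hashlib
from typing import List


def verify_first_level(data: bytes, block_size: int,
                       stored_level1: List[bytes]) -> List[int]:
    """Steps S801-S803: recompute each first-level digest and compare it with
    the stored one. Returns the 1-based sequence numbers of mismatching
    first-level files; an empty list means all blocks are consistent."""
    blocks = [data[o:o + block_size] for o in range(0, len(data), block_size)]
    if len(blocks) != len(stored_level1):
        # Assumption: a changed file size is reported as an error up front
        raise ValueError("file size does not match the digest file")
    bad = []
    for seq, (block, stored) in enumerate(zip(blocks, stored_level1), start=1):
        if hashlib.md5(block).digest() != stored:
            print(f"error: first-level file {seq} is inconsistent")
            bad.append(seq)
    return bad
```

Each iteration is independent of the others, so on the parallel system every comparison can run as a separate task. Only the list of failing sequence numbers needs to be gathered.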
This embodiment may further comprise the following steps:
Step S804: proceed only if every recomputed first-level interlayer digest matches the corresponding one in the multi-layer digest file, that is, no first-level file is in error. In that case, obtain the number of digest layers m and the hash block counts n_1, n_2, ..., n_m from the multi-layer digest file.
Step S805: initialize the local variable i to 1.
Step S806: determine whether i ≤ m - 1.
If i ≤ m - 1, perform step S807: according to the (i+1)-th-layer hash block count n_(i+1), split all i-th-level interlayer digests into n_(i+1) files of level (i+1). Step S808: using the message digest algorithm, generate one (i+1)-th-level interlayer digest for each (i+1)-th-level file. Step S809: compare each (i+1)-th-level interlayer digest with the corresponding one in the multi-layer digest file and determine whether they are consistent. If they are, perform step S811: increment i and return to step S806. If they are not, perform step S810: output error information and end the verification.
If i > m - 1, all m levels of interlayer digests have been generated. If every m-th-level interlayer digest also matches the corresponding one in the multi-layer digest file, perform step S812: using the message digest algorithm, generate a total digest from all m-th-level interlayer digests.
Step S813: compare the total digest with the corresponding total digest in the multi-layer digest file. If they are consistent, output information indicating that the file is correct. Otherwise, output error information.
It should be noted that in this embodiment every interlayer digest may be generated in parallel by different tasks of the massively parallel system. The total digest is generated by a single process.
It should be noted that this embodiment uses the MD5 algorithm as the message digest algorithm, but this should not be construed as a specific limitation of the present invention. Those skilled in the art will understand that other existing algorithms, such as SHA-1 and DES, can also be applied to the present invention.
The technical solution of the present invention is further described below with reference to the drawings and a specific embodiment.
Fig. 9 is a schematic diagram of the digest file structure used by two of the embodiments described above. These are the second embodiment of the multi-layer digest file generation method and the second embodiment of the file correctness verification method based on the multi-layer digest file, both for a massively parallel system. As shown in Fig. 9, the digest file of this embodiment comprises:
- the file name;
- the file size;
- the number of layers;
- the hash block size;
- the first-layer hash block count;
- Hash1 to Hash1024, which are 1024 hash values of 16 bytes each (the first-level interlayer digests);
- the second-layer hash block count;
- Hash1025 to Hash1028, which are 4 hash values of 16 bytes each (the second-level interlayer digests);
- the total hash value (the total digest).

Before generating the multi-layer digest file, this embodiment assigns the following values to some of its fields:
- file name: File;
- file size: 1 GB;
- number of interlayer digest layers: 2;
- hash block size: 1 MB;
- first-layer hash block count: 1024;
- second-layer hash block count: 4.
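The sizes in this embodiment follow from simple arithmetic. The sketch below assumes binary units (1 GB = 2^30 bytes, 1 MB = 2^20 bytes), which is what makes 1 GB divided by 1 MB come out to exactly 1024 blocks. It counts only the digest payload and ignores the header fields (name, sizes, counts), whose encoding the patent does not specify.

```python
# Fig. 9 parameters, assuming binary units (1 GB = 2**30 bytes, 1 MB = 2**20 bytes)
FILE_SIZE = 1 << 30
HASH_BLOCK_SIZE = 1 << 20
DIGEST_SIZE = 16                 # MD5 output: 128 bits

n1 = -(-FILE_SIZE // HASH_BLOCK_SIZE)   # ceiling division: first-layer block count
n2 = 4                                  # predefined second-layer block count
digests_per_second_level_file = n1 // n2
second_level_file_bytes = digests_per_second_level_file * DIGEST_SIZE
# Hash1..Hash1028 plus the total hash value; header fields are not counted
digest_bytes = (n1 + n2 + 1) * DIGEST_SIZE

print(n1, digests_per_second_level_file, second_level_file_bytes, digest_bytes)
# prints: 1024 256 4096 16464
```

The digest payload for a 1 GB file is therefore about 16 KB. Each second-level file that a CPU hashes is only 4 KB.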
It should be noted that this embodiment uses 2 levels of interlayer digests, but the present invention does not limit the number of levels. The operator can choose how many levels to use based on the file size and the required computation speed.
Fig. 10 is a flow chart of the second embodiment of the multi-layer digest file generation method for a massively parallel system according to the present invention.
As shown in Fig. 10, the file File is first split into 1024 first-level files, according to the 1 MB hash block size in the multi-layer digest file.
Next, 1024 mutually independent CPUs each use the message digest algorithm to generate a first-level interlayer digest for one first-level file. Each CPU stores its digest in the corresponding first-level interlayer digest field of the multi-layer digest file. For example, CPU_1 reads the 1st first-level file and computes its 128-bit first-level interlayer digest Hash1 with the MD5 algorithm. It then stores Hash1 at the corresponding position in the multi-layer digest file.
After the 1024 first-level interlayer digests Hash1 to Hash1024 have been generated, the predefined parameters are obtained: 2 digest layers, a first-layer hash block count of 1024, and a second-layer hash block count of 4.
The 1024 first-level interlayer digests are then split into 4 second-level files (4 being the second-layer hash block count).
Four mutually independent CPUs each use the message digest algorithm to generate a second-level interlayer digest for one second-level file. Each CPU stores its digest in the corresponding second-level interlayer digest field of the multi-layer digest file. For example, CPU_2 reads the 2nd second-level file and computes its 128-bit second-level interlayer digest Hash1026 with the MD5 algorithm. It then stores Hash1026 at the corresponding position in the multi-layer digest file.
Finally, CPU_1 uses MD5 to generate the total digest (the total hash value) from the 4 second-level interlayer digests. It stores the result in the total hash value field of the multi-layer digest file. At this point, the multi-layer digest file for File is complete.
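The CPU numbering in this example implies a fixed mapping from each task to its input data and output slot. The sketch below makes that mapping explicit under three assumptions, since the patent gives only the CPU_1 and CPU_2 examples:
- CPU_k handles the k-th first-level file.
- CPU_k handles the k-th second-level file.
- Each second-level file is a contiguous run of 256 first-level digests.

The function names are hypothetical.

```python
from typing import Tuple

BLOCK_SIZE = 1 << 20   # 1 MB original hash block
N1, N2 = 1024, 4       # first- and second-layer hash block counts (Fig. 9)
PER_GROUP = N1 // N2   # assumed: 256 first-level digests per second-level file


def level1_byte_range(cpu: int) -> Tuple[int, int]:
    """Half-open byte range [start, end) of File read by CPU_cpu (1-based)."""
    start = (cpu - 1) * BLOCK_SIZE
    return start, start + BLOCK_SIZE


def level2_input_hashes(cpu: int) -> Tuple[int, int]:
    """First and last Hash index (1-based, inclusive) hashed by CPU_cpu at level 2."""
    return (cpu - 1) * PER_GROUP + 1, cpu * PER_GROUP


def output_hash_index(level: int, cpu: int) -> int:
    """Hash slot written by CPU_cpu: Hash1..Hash1024 at level 1,
    Hash1025..Hash1028 at level 2."""
    if level == 1:
        return cpu
    if level == 2:
        return N1 + cpu
    raise ValueError("the Fig. 9 layout has only two interlayer levels")
```

Under these assumptions, CPU_2 at level 2 reads Hash257 to Hash512 and writes Hash1026, which is consistent with the example in the text.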
Fig. 11 is a flow chart of the second embodiment of the file correctness verification method based on the multi-layer digest file for a massively parallel system according to the present invention.
As shown in Fig. 11, the multi-layer digest file corresponding to File is read first, and the original hash block size of 1 MB is obtained.
Using 1 MB as the unit, File is then split into 1024 first-level files.
Next, 1024 mutually independent CPUs each read one first-level file and compute its first-level interlayer digest with MD5. Each CPU compares its digest with the corresponding first-level interlayer digest in the multi-layer digest file. If the two are inconsistent, an I/O error has occurred for that first-level file. Error information is then output together with the sequence number of that first-level file. For example, CPU_2 reads the 2nd first-level file and computes its 128-bit first-level interlayer digest Hash2' with the MD5 algorithm. It then compares Hash2' with the corresponding stored digest Hash2. If they differ, the 2nd first-level file was corrupted during I/O. Error information is output together with the sequence number of the erroneous first-level file, 2.
If all 1024 recomputed first-level interlayer digests match the corresponding 1024 digests in the multi-layer digest file, no first-level file was corrupted during I/O. Next, three values are obtained from the multi-layer digest file: the number of digest layers (2), the first-layer hash block count (1024), and the second-layer hash block count (4).
The 1024 first-level interlayer digests are split into 4 second-level files (4 being the second-layer hash block count).
Four mutually independent CPUs each read one second-level file and generate its second-level interlayer digest with the message digest algorithm. Each CPU compares its digest with the corresponding second-level interlayer digest in the multi-layer digest file. If the two are inconsistent, an I/O error has occurred for that second-level file, and error information is output. For example, CPU_2 reads the 2nd second-level file and computes its 128-bit second-level interlayer digest Hash1026' with the MD5 algorithm. It then compares Hash1026' with the corresponding stored digest Hash1026. If they differ, the 2nd second-level file was corrupted during I/O, and error information is output.
If all 4 recomputed second-level interlayer digests match the corresponding 4 digests in the multi-layer digest file, CPU_1 uses MD5 to generate a total hash value' from the 4 second-level interlayer digests. It compares this value with the corresponding total hash value in the multi-layer digest file. If they match, information indicating successful verification is output. Otherwise, error information is output. At this point, the correctness verification of File is complete.
It should be noted that, from the above description of the embodiments, those skilled in the art will clearly understand that part or all of the present invention can be implemented by software in combination with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied as a software product, whether in its essence or in the part that contributes to the prior art. This computer software product may include one or more machine-readable media storing machine-executable instructions. When executed by one or more machines, such as a computer, a computer network, or other electronic devices, these instructions cause the one or more machines to perform operations according to embodiments of the present invention. Machine-readable media may include, but are not limited to:
- floppy disks;
- optical discs;
- CD-ROMs (compact disc read-only memory);
- magneto-optical disks;
- ROMs (read-only memory);
- RAMs (random access memory);
- EPROMs (erasable programmable read-only memory);
- EEPROMs (electrically erasable programmable read-only memory);
- magnetic or optical cards;
- flash memory;
- other types of media or machine-readable media suitable for storing machine-executable instructions.

The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example:
- personal computers;
- server computers;
- handheld or portable devices;
- tablet devices;
- multiprocessor systems;
- microprocessor-based systems;
- set-top boxes;
- programmable consumer electronics;
- network PCs;
- minicomputers;
- mainframe computers;
- distributed computing environments that include any of the above systems or devices.

The present invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Although the present invention has been disclosed above with preferred embodiments, they are not intended to limit it. Any person skilled in the art may make possible changes and modifications to the technical solution of the present invention, using the methods and technical content disclosed above, without departing from its spirit and scope. Therefore, the protection scope of the technical solution of the present invention covers any simple modification, equivalent change, or variation made to the above embodiments according to its technical essence, provided it does not depart from the content of that technical solution.

Claims (17)

1. A multi-layer digest file generation method for a massively parallel system, characterized by comprising:
splitting a target file into a plurality of first-level files, using a predefined original hash block size as the unit;
using a message digest algorithm, generating a first-level interlayer digest for each first-level file, and storing the first-level interlayer digests in the multi-layer digest file;
based on the first-level interlayer digests, applying the message digest algorithm at least once to generate a total digest, and storing the total digest in the multi-layer digest file.
2. The multi-layer digest file generation method for a massively parallel system according to claim 1, characterized in that applying the message digest algorithm at least once, based on the first-level interlayer digests, to generate the total digest comprises:
using the message digest algorithm to generate the total digest from all first-level interlayer digests.
3. The multi-layer digest file generation method for a massively parallel system according to claim 1, characterized in that applying the message digest algorithm at least once, based on the first-level interlayer digests, to generate the total digest comprises:
obtaining the predefined number of digest layers m and the hash block counts n_1, n_2, ..., n_m of layers 1 through m;
when m is greater than 1, repeating the following steps until all m-th-level interlayer digests have been formed:
1) splitting all i-th-level interlayer digests into n_(i+1) files of level (i+1), where i is a positive integer and 1 ≤ i ≤ m - 1;
2) using the message digest algorithm, generating an (i+1)-th-level interlayer digest for each (i+1)-th-level file, and storing the (i+1)-th-level interlayer digests in the multi-layer digest file;
3) incrementing i and returning to step 1);
using the message digest algorithm to generate the total digest from all m-th-level interlayer digests.
4. The multi-layer digest file generation method for a massively parallel system according to claim 2, characterized by further comprising, before splitting the target file into a plurality of first-level files using the predefined original hash block size as the unit: defining the structure of the multi-layer digest file;
the structure of the multi-layer digest file comprising: the target file name, the target file size, the original hash block size, the original hash block count n, n first-level interlayer digests, and the total digest.
5. The multi-layer digest file generation method for a massively parallel system according to claim 3, characterized by further comprising, before splitting the target file into a plurality of first-level files using the predefined original hash block size as the unit: defining the structure of the multi-layer digest file;
the structure of the multi-layer digest file comprising: the target file name, the target file size, the number of digest layers m, the original hash block size, the first-layer hash block count n_1, n_1 first-level interlayer digests, the second-layer hash block count n_2, n_2 second-level interlayer digests, ..., the m-th-layer hash block count n_m, n_m m-th-level interlayer digests, and the total digest.
6. The multi-layer digest file generation method for a massively parallel system according to claim 4, characterized in that:
before splitting the target file into a plurality of first-level files using the predefined original hash block size as the unit, the method further comprises: assigning values to the target file name, target file size, original hash block size, and original hash block count n in the multi-layer digest file;
storing the first-level interlayer digests in the multi-layer digest file comprises: assigning the first-level interlayer digests to the n first-level interlayer digest fields of the multi-layer digest file;
storing the total digest in the multi-layer digest file comprises: assigning the total digest to the total digest field of the multi-layer digest file.
7. The multi-layer digest file generation method for a massively parallel system according to claim 5, characterized in that:
before splitting the target file into a plurality of first-level files using the predefined original hash block size as the unit, the method further comprises: assigning values to the target file name, target file size, number of digest layers m, original hash block size, and hash block counts n_1, n_2, ..., n_m of the multi-layer digest file;
storing the first-level interlayer digests in the multi-layer digest file comprises: assigning the first-level interlayer digests to the n_1 first-level interlayer digest fields of the multi-layer digest file;
storing the (i+1)-th-level interlayer digests in the multi-layer digest file comprises: assigning the (i+1)-th-level interlayer digests to the n_(i+1) (i+1)-th-level interlayer digest fields of the multi-layer digest file;
storing the total digest in the multi-layer digest file comprises: assigning the total digest to the total digest field of the multi-layer digest file.
8. The multi-layer digest file generation method for a massively parallel system according to claim 2 or claim 3, characterized in that the interlayer digests are generated in parallel by different tasks of the massively parallel system.
9. The multi-layer digest file generation method for a massively parallel system according to claim 2 or claim 3, characterized in that the total digest is generated by a single process.
10. The multi-layer digest file generation method for a massively parallel system according to claim 2 or claim 3, characterized in that the message digest algorithm is the MD5 algorithm.
11. A file correctness verification method based on a multi-layer digest file for a massively parallel system, characterized by comprising:
reading the multi-layer digest file corresponding to the file, and obtaining the original hash block size;
splitting the file into a plurality of first-level files, using the original hash block size as the unit;
using a message digest algorithm, generating a first-level interlayer digest for each first-level file, and comparing it with the corresponding first-level interlayer digest in the multi-layer digest file; if the two are inconsistent, outputting error information.
12. The file correctness verification method based on a multi-layer digest file for a massively parallel system according to claim 11, characterized in that, when error information is output, the sequence number of the first-level file corresponding to the error is also output.
13. The file correctness verification method based on a multi-layer digest file for a massively parallel system according to claim 11, characterized in that the method further comprises:
if all first-level interlayer digests match the corresponding first-level interlayer digests in the multi-layer digest file, using the message digest algorithm to generate a total digest from all first-level interlayer digests;
comparing the total digest with the corresponding total digest in the multi-layer digest file; if they are consistent, outputting correct information; if they are inconsistent, outputting error information.
14. The file correctness verification method based on a multi-layer digest file for a massively parallel system according to claim 11, characterized in that the method further comprises:
if all first-level interlayer digests match the corresponding first-level interlayer digests in the multi-layer digest file, obtaining from the multi-layer digest file the number of digest layers m and the hash block counts n_1, n_2, ..., n_m;
when m is greater than 1, repeating the following steps until all m-th-level interlayer digests have been generated:
1) according to the (i+1)-th-layer hash block count n_(i+1), splitting all i-th-level interlayer digests into n_(i+1) files of level (i+1), where i is a positive integer and 1 ≤ i ≤ m - 1;
2) using the message digest algorithm, generating an (i+1)-th-level interlayer digest for each (i+1)-th-level file;
3) comparing each (i+1)-th-level interlayer digest with the corresponding (i+1)-th-level interlayer digest in the multi-layer digest file; if they are consistent, proceeding to step 4); if they are inconsistent, outputting error information;
4) incrementing i and returning to step 1);
if all m-th-level interlayer digests match the corresponding m-th-level interlayer digests in the multi-layer digest file, using the message digest algorithm to generate a total digest from all m-th-level interlayer digests;
comparing the total digest with the corresponding total digest in the multi-layer digest file; if they are consistent, outputting correct information; if they are inconsistent, outputting error information.
15. The file correctness verification method based on a multi-layer digest file for a massively parallel system according to claim 13 or 14, characterized in that the interlayer digests are generated in parallel by different tasks of the massively parallel system.
16. The file correctness verification method based on a multi-layer digest file for a massively parallel system according to claim 13 or 14, characterized in that the total digest is generated by a single process.
17. The file correctness verification method based on a multi-layer digest file for a massively parallel system according to claim 13 or 14, characterized in that the message digest algorithm is the MD5 algorithm.
CN2012103947659A 2012-10-16 2012-10-16 Multi-layer digest file generation method and file correctness verification method for massively parallel system Pending CN102946379A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012103947659A CN102946379A (en) 2012-10-16 2012-10-16 Multi-layer digest file generation method and file correctness verification method for massively parallel system

Publications (1)

Publication Number Publication Date
CN102946379A true CN102946379A (en) 2013-02-27

Family

ID=47729286

Country Status (1)

Country Link
CN (1) CN102946379A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050188216A1 (en) * 2003-04-18 2005-08-25 Via Technologies, Inc. Apparatus and method for employing cyrptographic functions to generate a message digest
US20060010327A1 (en) * 2004-06-25 2006-01-12 Koshy Kamal J Apparatus and method for performing MD5 digesting
CN101090320A (en) * 2007-07-13 2007-12-19 王少波 Indentify authorization method for dectronic signature
CN101458618A (en) * 2007-12-11 2009-06-17 刘勇 Parallel hash function mode

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109560931A (en) * 2018-11-30 2019-04-02 江苏恒宝智能系统技术有限公司 A kind of equipment remote upgrade method based on no Certification system
CN109560931B (en) * 2018-11-30 2020-11-24 江苏恒宝智能系统技术有限公司 Equipment remote upgrading method based on certificate-free system
CN109981291A (en) * 2019-03-27 2019-07-05 国家电网有限公司 A kind of mixing packet signature method
CN114970564A (en) * 2022-06-16 2022-08-30 北京汉端科技有限公司 Cloud platform based micro-service civil aviation safety management SOP system
CN114970564B (en) * 2022-06-16 2023-02-03 北京汉端科技有限公司 Cloud platform based micro-service civil aviation safety management SOP system
CN115509800A (en) * 2022-11-21 2022-12-23 苏州浪潮智能科技有限公司 Metadata verification method, system, computer equipment and storage medium
WO2024109236A1 (en) * 2022-11-21 2024-05-30 苏州元脑智能科技有限公司 Metadata check method and system, and computer device and non-volatile readable storage medium
CN116702225A (en) * 2023-06-08 2023-09-05 重庆傲雄在线信息技术有限公司 Method, system, equipment and medium for fast verifying electronic archive file based on hash parallel computing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130227