CN103916483A - Self-adaptation data storage and reconstruction method for coding redundancy storage system - Google Patents

Self-adaptation data storage and reconstruction method for coding redundancy storage system

Publication number
CN103916483A
Authority
CN
China
Prior art keywords
file
cryptographic hash
client
file block
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410175898.6A
Other languages
Chinese (zh)
Inventor
蒋海波
李娜
周星梅
陈建中
王晓京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Institute of Biology of CAS
Original Assignee
Chengdu Institute of Biology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Institute of Biology of CAS filed Critical Chengdu Institute of Biology of CAS
Priority to CN201410175898.6A priority Critical patent/CN103916483A/en
Publication of CN103916483A publication Critical patent/CN103916483A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides a self-adaptation data storage and reconstruction method for a coding redundancy storage system. The self-adaptation data storage and reconstruction method includes the following steps: (1) a client terminal calculates a Hash value of a file to be stored and uploads the Hash value to a server terminal; (2) the Hash value of the file to be stored is compared with Hash values of files already stored in the server terminal; (3) if the Hash value equal to the Hash value of the file exists, the server terminal does not accept uploading of the file to be stored, if the Hash value equal to the Hash value of the file does not exist, the server terminal accepts uploading of the file to be stored, the uploaded file is partitioned, Hash values of the file partitions are calculated and stored, and the file partitions are encoded to generate verification data partitions. Compared with the prior art, the self-adaptation data storage and reconstruction method has the advantages that the Hash value of the file to be stored and the Hash values of the file partitions are recorded, the corresponding file storage and reconstruction method is selected according to conditions of the system and the client terminal, and therefore network bandwidth pressure and calculation pressure, caused by data reconstruction, on a data center are reduced.

Description

A self-adaptive data storage and reconstruction method for a coding redundancy storage system
Technical field
The present invention relates to the field of information technology, and in particular to a self-adaptive data storage method and a lost-data reconstruction method for data storage systems whose basic storage architecture is a coding redundancy strategy.
Background art
Compared with replication-based redundancy, reliability techniques based on coding redundancy offer lower data redundancy and storage overhead for the same fault tolerance. However, when a node is damaged or a data block is lost, a replication-based storage strategy only needs to download an amount of data equal to the lost data to complete the repair, whereas a coding-based redundancy strategy such as erasure coding must download at least k times the amount of lost data before the lost data can be decoded and reconstructed. Erasure-code redundancy therefore consumes more network bandwidth during file recovery than replication, which puts further pressure on the already scarce network bandwidth inside the data center and in turn degrades data read performance. Because erasure-code redundancy needs more maintenance bandwidth during data repair and also requires more complex management strategies, the application and adoption of erasure-code-based fault-tolerance techniques have been greatly limited.
Clearly, how to use architectural design to avoid this reconstruction-bandwidth disadvantage of coding redundancy, or to give erasure-coded storage systems better performance, is likewise a focus of industry attention. In recent years, with the rapid development of the foundations of mass data storage systems (such as server performance, network bandwidth and transmission technology), the focus of system architecture research has gradually shifted toward the client.
In a coding redundancy storage system, when file blocks are lost the system must read far more data than was lost in order to reconstruct them. To address this shortcoming, and in view of the characteristics of erasure-coded distributed storage, the present invention proposes a distributed storage and reconstruction scheme based on a peer-to-peer structure.
Summary of the invention
In view of the problems of the prior art, the main purpose of the present invention is to provide a self-adaptive data storage and reconstruction method that reduces the network bandwidth pressure and the server-side computing pressure of a coding redundancy storage system.
A self-adaptive data storage method for a coding redundancy storage system, in which the coding redundancy storage system comprises a server end and a client and the client submits a file storage request to the server end, comprises the following steps:
(1) the client calculates the hash value of the file to be stored and uploads this hash value to the server end;
(2) the hash value of the file to be stored is compared with the hash values of the files already stored on the server end;
(3) if an identical hash value exists, the server end does not accept the upload of the file to be stored, but it does accept the file information that the client uploads about the file and associates this file information with the stored file that has the same hash value; if no identical hash value exists, the server end accepts the upload of the file to be stored, partitions the uploaded file into blocks, calculates and stores the hash value of each file block, and encodes the file blocks to generate check data blocks.
Further, before the client calculates the hash value of the file to be stored in step (1), the server end first asks the client whether it is willing to calculate the hash value. If the client is unwilling, the file to be stored is uploaded directly to the server end and the server end calculates its hash value. In that case, if an identical hash value is found in step (3), the file that has already been uploaded is deleted.
A self-adaptive data reconstruction method for a coding redundancy storage system comprises the following steps:
(1) detect in real time whether any storage node at the server end of the coding redundancy storage system is damaged;
(2) when a storage node is damaged, the system marks the hash values corresponding to the file blocks on the damaged storage node as lost and judges whether the number of damaged storage nodes is greater than the system's set value, the set value lying within the fault-tolerance range that the code allows;
(3) when the number of damaged storage nodes is greater than the set value, the server end of the coding redundancy storage system uses its internal computing resources to reconstruct the lost file blocks; when the number of damaged storage nodes is not greater than the set value, go to step (4);
(4) judge in real time whether a client has issued a file read request;
(5) if a client has issued a file read request, judge whether the file to be read has lost file blocks;
(6) if the file to be read has lost file blocks, judge whether the client's hardware resources meet the set hardware performance requirement;
(7) if the client's hardware resources do not meet the set hardware performance requirement, the server end of the coding redundancy storage system uses its internal computing resources to reconstruct the lost file blocks and provides the recovered file blocks to the client; if they do meet the requirement, go to step (8);
(8) ask the client whether it is willing to take part in recovering the lost file blocks;
(9) if the client is willing to take part, the client downloads the relevant file blocks and enough check data blocks to recover the lost file blocks, and recovers them; if the client is unwilling, the server end of the coding redundancy storage system uses its internal computing resources to reconstruct the lost file blocks.
Further, after the lost file blocks have been recovered, the self-adaptive data reconstruction method for a coding redundancy storage system also comprises the following steps:
calculate the hash value of the reconstructed file block;
upload the hash value of the reconstructed file block to the server end;
the server end compares whether the hash value of the reconstructed file block is identical to the hash value of the corresponding lost original block;
if the two hash values are identical, the client is allowed to upload the reconstructed file block;
the client uploads the reconstructed file block;
the server end calculates the hash value of the uploaded file block and compares it with the hash value of the corresponding lost original block;
if the hash value of the uploaded file block is identical to that of the corresponding lost original block, the server end saves the uploaded file block and sets the hash value corresponding to it to available.
Compared with the prior art, the present invention first proposes, at the level of storage system architecture, a distributed storage and reconstruction scheme based on a peer-to-peer structure. It exploits the computing power of large numbers of scattered clients so that the system uses as little as possible of the limited network bandwidth and computing resources inside the data center's server end to decode and reconstruct lost data; instead, according to the clients' demand for files, part of the lost-block recovery function is moved to the clients. This weakens the impact on the cluster's internal network of many nodes sending concurrently to one node during data recovery, and addresses, from the standpoint of architecture design, the maintenance-bandwidth problem that erasure-code redundancy has when recovering files. Specifically, when a node is damaged or data is lost, the system decides, according to the client's needs and its hardware performance, whether the data is recovered inside the system or by the client. When a client reconstructs data, it calculates the hash value of the reconstructed data and sends it to the server end; after the server end confirms this hash value, the client uploads the reconstructed data block. Second, by recording the hash values of stored files and of each file block, and by selecting the file storage method and the lost-block reconstruction method according to the operating conditions of the system and the client, the invention reduces the impact of block reconstruction on the cluster's internal transmission and on the I/O performance of foreground applications, and thus reduces the network bandwidth pressure and computing pressure that data reconstruction places on the data center.
Brief description of the drawings
Fig. 1 is a flow chart of the self-adaptive data reconstruction method of the present invention for a coding redundancy storage system.
Fig. 2 is a schematic diagram of client-side data recovery in the self-adaptive data reconstruction method of the present invention for a coding redundancy storage system.
Embodiments
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The present invention is aimed at data storage centers that use a coding redundancy strategy as their basic storage architecture, and designs a data reconstruction scheme in which the data storage center system and widely distributed clients cooperate. The coding redundancy storage system consists of a server end and widely distributed clients. The basic storage architecture of the data storage system is implemented with a coding strategy, which is not limited to any particular code; RS codes, EVENODD codes and others may all be used.
The present invention proposes a self-adaptive data storage method for a coding redundancy storage system. When a client submits a file storage request to the server end, the method comprises the following steps (1) to (3):
(1) The client calculates the hash value of the file to be stored and uploads it to the server end. An existing hash algorithm maps a binary value of arbitrary length to a fixed-length (125 or 250 bits) string of letters and digits; this fixed-length string is the hash value. A hash value is a practically unique and extremely compact numerical representation of a piece of data. Each file corresponds to one hash value, and different data yield different hash values after the mapping, even when two files differ by only a single character. Before the client calculates the hash value of the file to be stored, the server end may first ask the client whether it is willing to do so. If the client is unwilling, the server end accepts the upload of the file to be stored and calculates the file's hash value itself. If the client is willing, the client calculates the hash value of the file to be stored.
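Step (1) can be illustrated with a short sketch. The patent does not name a hash algorithm, so SHA-256 from Python's standard `hashlib` is an assumption here, and the function name `file_hash` and the chunk size are hypothetical. The file is read in chunks so that large files need not fit in memory.

```python
import hashlib

def file_hash(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    SHA-256 is an assumed choice; the patent only says "an existing hash algorithm".
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns the empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The client would send only the returned string to the server end before any file data is uploaded.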
(2) The hash value of the file to be stored is compared with the hash values of the files already stored on the server end.
(3) If an identical hash value exists, the server end does not accept the upload of the file to be stored. Where the file to be stored was uploaded directly to the server end and the server end calculated its hash value, the file that has already been uploaded is deleted if an identical hash value exists. Checking the hash value before a file is uploaded prevents different clients from uploading the same file repeatedly and so reduces data redundancy inside the storage system. Although the server end does not accept the upload of a file whose hash value already exists, it does accept the file information that the client uploads about the file, that is, file metadata such as the file's name, size and type. It associates this file information with the stored file that has the same hash value, so that the client can identify the file by the attributes it specified. For example, suppose user A and user B hold the same video under different file names. The hash values obtained with the hash algorithm are first used to determine whether the two videos are identical. If they are, the video file uploaded by the second user is not accepted, but the file information of the second user's upload is accepted, and the file can also be labelled according to that information. The storage system then holds only one copy of the video file but two sets of file information associated with it;
If the server end finds no identical hash value after querying its table of stored hash values, it accepts the upload of the file to be stored and stores the file's hash value. It also partitions the uploaded file into blocks and calculates and stores the hash value of each file block, one hash value per block. The block hash values are used mainly to check whether a reconstructed file block later uploaded by a client is indeed the lost file block, and they also serve as a backup for the hash value of the whole file. In addition, the file blocks are encoded to generate check data blocks, for which existing STAR codes, EVENODD codes or other coding methods can be used.
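Steps (1) to (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: SHA-256 is an assumed hash, the `Server` class, `client_store` function and tiny `BLOCK_SIZE` are hypothetical, and a single XOR check block stands in for the STAR, EVENODD or RS coding that the text mentions.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow

def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()

def xor_parity(blocks):
    """XOR equal-length blocks into one check block (stand-in for a real code)."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

class Server:
    def __init__(self):
        self.files = {}      # file hash -> blocks, block hashes, check block
        self.file_info = {}  # file hash -> list of per-client file information

    def has_file(self, file_hash):
        return file_hash in self.files

    def register_info(self, file_hash, info):
        # Associate client-supplied metadata with the stored file.
        self.file_info.setdefault(file_hash, []).append(info)

    def accept_upload(self, data, info):
        file_hash = sha256_hex(data)
        padded = data + b"\0" * (-len(data) % BLOCK_SIZE)
        blocks = [padded[i:i + BLOCK_SIZE]
                  for i in range(0, len(padded), BLOCK_SIZE)] or [b"\0" * BLOCK_SIZE]
        self.files[file_hash] = {
            "blocks": blocks,
            "block_hashes": [sha256_hex(b) for b in blocks],
            "parity": xor_parity(blocks),
            "size": len(data),
        }
        self.register_info(file_hash, info)
        return file_hash

def client_store(server, data, info):
    """Send the hash first; upload the data only if the server lacks the file."""
    file_hash = sha256_hex(data)
    if server.has_file(file_hash):
        server.register_info(file_hash, info)
        return "deduplicated"
    server.accept_upload(data, info)
    return "uploaded"
```

With this sketch, two clients storing the same bytes under different names leave one stored file and two associated information records, matching the video example in the text.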
The present invention also proposes a self-adaptive data reconstruction method for a coding redundancy storage system which, as shown in Fig. 1, comprises the following steps S1 to S15:
S1: the system detection module starts and the system begins running.
S2: while the system is running, detect in real time whether any storage node at the server end of the coding redundancy storage system is damaged. When a storage node is damaged, the system marks the hash values corresponding to the file blocks on the damaged storage node as lost and goes to step S3; if no storage node is damaged, go to step S15 and exit recovery mode.
S3: judge whether the number of damaged storage nodes is greater than the system's set value. The set value is chosen according to the coding method that the coding redundancy storage system uses and must lie within the fault-tolerance range that the code allows. Every coding method has a fault-tolerance limit; an RS code, for example, can tolerate damage to 50% of the storage nodes, so the set value can be chosen below that limit according to the limit of the erasure code. When the number of damaged storage nodes is greater than the set value, go to step S4; when it is less than or equal to the set value, go to step S6.
S4: enter whole-cluster node recovery mode, in which the server end of the coding redundancy storage system uses its internal computing resources to reconstruct the lost file blocks.
S5: judge whether recovery is complete. If it is, go to step S15 and exit recovery mode; if not, return to step S3.
S6: judge in real time whether a client has issued a file read request. If so, go to step S7; if not, return to step S2.
S7: judge whether the file to be read is complete, that is, whether any of its file blocks are lost. If file blocks are lost, go to step S8; if none are lost, go to step S15 and exit recovery mode, and the client reads the file directly.
S8: judge whether the client's hardware resources meet the set hardware performance requirement. The main consideration is whether, given the client's hardware configuration, recovering the lost data would interfere with programs running normally on the client or significantly affect the machine's performance. If the client's hardware resources do not meet the requirement, go to step S9; if they do, go to step S11.
S9: enter cluster single-file recovery mode, in which the server end of the coding redundancy storage system uses its internal computing resources to reconstruct the lost file blocks and provides the recovered file blocks to the client.
S10: judge whether recovery is complete. If it is, go to step S15 and exit recovery mode; if not, return to step S9.
S11: ask the client whether it is willing to take part in recovering the lost file blocks. If it is, go to step S12; if not, go to step S9 and the server end recovers the lost file blocks.
S12: the client downloads the relevant file blocks and enough check data blocks to recover the lost file blocks, and recovers the lost file blocks. After recovery, the reconstructed file block is verified as follows: the client calculates the hash value of the reconstructed file block and uploads this hash value to the server end; the server end checks whether the hash value is correct by comparing it with the hash value of the corresponding lost original block. If the two are identical, the reconstructed file block that the client intends to upload is fully consistent with the lost original block, so the client is allowed to upload it and the method goes to step S13. If they differ, the reconstructed file block that the client intends to upload is not the lost original block, and the client is not allowed to upload it.
S13: the client uploads the reconstructed file block.
S14: judge whether the upload of the reconstructed file block is complete. The server end calculates the hash value of the uploaded file block and compares it with the hash value of the corresponding lost original block. If the two are identical, the reconstructed file block uploaded by the client is fully consistent with the lost original block; the server end saves the uploaded file block and sets the hash value corresponding to it to available. If the upload is complete, go to step S15 and exit recovery mode; if not, return to step S13.
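The branching in steps S2 to S11 can be summarized as one decision function. The sketch below is illustrative only: the function name, its boolean parameters and the returned mode strings are hypothetical, and the loops in Fig. 1 (for example returning from S6 to S2) are represented by a returned state rather than by actual repetition.

```python
def choose_recovery_mode(damaged_nodes, threshold, read_requested,
                         file_has_lost_blocks, client_hw_ok, client_willing):
    """Return the recovery mode selected by steps S2 to S11."""
    if damaged_nodes == 0:
        return "exit"                          # S2 -> S15
    if damaged_nodes > threshold:
        return "cluster_full_recovery"         # S3 -> S4
    if not read_requested:
        return "wait"                          # S6 -> back to S2
    if not file_has_lost_blocks:
        return "direct_read"                   # S7 -> S15
    if not client_hw_ok:
        return "cluster_single_file_recovery"  # S8 -> S9
    if not client_willing:
        return "cluster_single_file_recovery"  # S11 -> S9
    return "client_recovery"                   # S11 -> S12
```

Client recovery is chosen only when every earlier check passes, which reflects the text's point that the system never forces a client to reconstruct blocks.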
Fig. 2 is a schematic diagram of data recovery by the client in the self-adaptive data reconstruction method of the present invention for a coding redundancy storage system. When file blocks are lost in the coding redundancy storage system, the system can reconstruct them either with the server end's internal computing resources or with the computing resources of a client. When a client is used, the system does not force the client to recover the lost file blocks. Instead, when a client needs to read a file that has lost file blocks, it downloads the remaining intact blocks of that file together with a certain number of check data blocks, reconstructs the lost file blocks, and uses the result itself. In addition, the client calculates the hash value of each reconstructed file block and uploads it to the server end. The server end compares hash values to judge whether the reconstructed file block that the client intends to upload is fully consistent with the lost original block; if it is, the upload is allowed and the client uploads the reconstructed file block to the server end.
The server end compares hash values both before and after the client uploads a file block. The comparison before upload mainly ensures that the reconstructed file block is fully consistent with the lost original block. The comparison after upload, made before the server end saves the block, guards against data loss during upload or malicious tampering arising from other circumstances; to ensure the accuracy of the uploaded file, the hash values are therefore compared once more before the file is saved.
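The two hash comparisons described above can be sketched as follows. The class and method names are hypothetical, SHA-256 is an assumed hash algorithm, and an in-memory dictionary stands in for the server end's hash table and block storage.

```python
import hashlib

def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()

class BlockVerifier:
    """Server-side hash table for file blocks, with a lost/available state."""

    def __init__(self, block_hashes):
        # block id -> recorded hash and current state
        self.table = {bid: {"hash": h, "state": "available"}
                      for bid, h in block_hashes.items()}
        self.stored = {}

    def mark_lost(self, block_id):
        self.table[block_id]["state"] = "lost"

    def pre_upload_check(self, block_id, claimed_hash):
        """First comparison: the client sends only the hash of its reconstruction."""
        entry = self.table[block_id]
        return entry["state"] == "lost" and claimed_hash == entry["hash"]

    def accept_block(self, block_id, data):
        """Second comparison: re-hash the bytes actually received before saving."""
        entry = self.table[block_id]
        if entry["state"] != "lost" or sha256_hex(data) != entry["hash"]:
            return False
        self.stored[block_id] = data
        entry["state"] = "available"
        return True
```

The second check matters because the hash sent in the first step is only a claim; the server end trusts nothing about the uploaded bytes until it has hashed them itself.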
Embodiment 1
In this embodiment, the coding redundancy storage system is built from ordinary PCs, so the system treats node failure as the norm. When lost file blocks are reconstructed with an erasure code, the erasure-code principle requires at least m arbitrary data blocks, that is, m file blocks and check data blocks must take part in the computation to recover the lost file blocks. The accompanying incast problem on the internal network can seriously reduce the concurrent transmission capacity of the cluster's internal network: because the reconstructing node must fetch multiple file blocks, those blocks converge on it at the same time, and data is delayed when the network card of the converging node performs poorly. However, an erasure-coded storage scheme that tolerates multiple errors offers more optional strategies for recovering lost file blocks. To reduce the impact of block reconstruction on the cluster's internal transmission and on the I/O performance of foreground applications, this embodiment combines reconstruction by the large pool of computing resources scattered among the clients with centralized reconstruction inside the cluster, that is, at the server end, to reconstruct the file blocks on damaged nodes.
The clients of the coding redundancy storage system may be PCs with good hardware or mobile terminals with weaker performance. Although all client nodes are equal in the network, their network bandwidth, computing power and memory size may differ considerably. For a node with little network bandwidth, weak computing power and little memory, reading files by client-side reconstruction may give a poor user experience. The system must therefore test the relevant hardware performance and computing power of each client. Nodes with good hardware performance and computing power are selected as computing nodes for client recovery mode. Such nodes should have the following characteristics. First, the node has ample network bandwidth and low network latency, as well as sufficient computing power and memory. Second, the node stays online stably and for relatively long periods and does not join or leave the system too frequently. Third, the node chooses of its own accord to obtain files by client-side recovery and is willing to upload the reconstructed file blocks to the server.
The erasure code selected in this embodiment is an (n, k, n-k+1) MDS code, whose minimum distance is n-k+1. That is, if the original file is divided into k blocks and encoding generates n-k check data blocks, the original file can be reconstructed from any k of the n data blocks (file blocks and check data blocks).
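The "any k of n" property can be demonstrated with the simplest MDS code, a single XOR check block, for which n = k + 1 and the minimum distance is n-k+1 = 2. This is only an illustrative stand-in: the embodiment does not specify this code, a practical system would use a code with more check blocks such as RS, and the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(original_blocks):
    """Return n = k + 1 blocks: the k originals followed by one XOR check block."""
    return list(original_blocks) + [xor_blocks(original_blocks)]

def reconstruct(available):
    """Rebuild the k original blocks from any k of the n coded blocks.

    `available` maps block index (0..n-1) to block bytes and holds exactly k entries.
    """
    n = len(available) + 1
    k = n - 1
    missing = next(i for i in range(n) if i not in available)
    if missing < k:
        # A lost original equals the XOR of the other k blocks (originals + check).
        available = dict(available)
        available[missing] = xor_blocks(list(available.values()))
    return [available[i] for i in range(k)]
```

Losing any one of the n blocks, whether an original or the check block, still lets `reconstruct` return the k originals.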
While the cluster storage system is running, a reconstruction threshold parameter k (1 ≤ k ≤ n-m), the system's set value, is configured. When the number of damaged storage nodes in the cluster, k_f, satisfies k_f ≤ k, the cluster manager (server end) does not organize internal nodes to recover the data blocks on the damaged storage nodes but instead uses clients to recover the required file blocks. Because check data blocks cannot directly provide usable information to users, consider the extreme case in which all damaged nodes are nodes holding original blocks. When reading a file, the client then needs to download the n-k_f original blocks remaining in the cluster together with k_f check data blocks, download the codeword information, reconstruct the k_f damaged file blocks, and splice the reconstructed file blocks with the n-k_f downloaded file blocks into the original file. The client also re-encapsulates the k_f recovered file blocks in the block format used in the cluster system and uploads them to the server cluster. The file server calculates the hash value of each such file block and compares it with the stored hash value of the original block; if they are identical it stores the block, and if they differ it rejects the upload request for that block. When the number of damaged nodes k_f in the computer cluster exceeds the configured reconstruction threshold k, the server determines the in-cluster recovery strategy according to the amount of file block data not yet recovered and the operating state of the nodes in the cluster. The file management server calculates the amount of computation needed to reconstruct all remaining file blocks, the system distributes tasks to the recovery nodes on the principle of balancing the computational load, and the recovered file blocks are redeployed on storage nodes in the cluster.
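The text does not say how the load-balanced task distribution is computed. One common heuristic, assumed here purely for illustration, is to hand the most expensive remaining block to the currently least-loaded recovery node. The function names, cost units and node names below are all hypothetical.

```python
import heapq

def choose_recovery(k_f, threshold):
    """Client recovery while damaged nodes stay within the threshold, else cluster."""
    return "client" if k_f <= threshold else "cluster"

def assign_reconstruction_tasks(block_costs, nodes):
    """Greedy load balancing of block reconstruction tasks across recovery nodes.

    block_costs: dict of block id -> estimated reconstruction cost
    nodes: list of recovery node names
    Returns a dict of node name -> list of assigned block ids.
    """
    heap = [(0, node) for node in nodes]  # (current load, node name)
    heapq.heapify(heap)
    plan = {node: [] for node in nodes}
    # Place the most expensive blocks first, each on the least-loaded node.
    for block_id, cost in sorted(block_costs.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        plan[node].append(block_id)
        heapq.heappush(heap, (load + cost, node))
    return plan
```

For example, `assign_reconstruction_tasks({"b1": 5, "b2": 3, "b3": 2, "b4": 2}, ["n1", "n2"])` spreads the four blocks so that the two nodes carry loads of 7 and 5.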
Because the recovery mode of the present invention uploads the data reconstructed by the client back to the server end, the privacy, integrity, and security of the uploaded data naturally become key issues. In addition, malicious nodes in the network may sabotage the network operation protocol and disrupt the normal routing of nodes, paralyzing the network, or they may upload virus-infected files that make the data on storage-center nodes unavailable. Guaranteeing the privacy, integrity, and security of the reconstructed data, detecting and isolating unsafe nodes, and refusing client uploads of file blocks that differ from the original blocks are therefore also part of this recovery mode. In implementation, the client first calculates the hash value of the file block it has reconstructed and uploads that hash value to the server end. Because the server end stored a hash value library when the file was first stored, it compares the uploaded hash value with the hash value of the corresponding lost file block in the library. If the hash value of the file block reconstructed by the client is identical to the hash value of the lost file block, the client is allowed to upload the file block. If it differs from the hash value of the corresponding lost original block, either the reconstructed file block is incorrect or the data has been maliciously tampered with, and the file block is not accepted.
For important data, once the file block has been uploaded, the system performs a second check on the uploaded file block: the server end calculates the hash value of the uploaded file block itself and compares it again with the hash value of the corresponding lost original block, to detect whether the block was attacked or tampered with during upload and thus ensure the reliability and security of the client's data.
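The two-stage check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `Server`, its method names, and the choice of SHA-256 are assumptions, since the text does not name a hash algorithm or an interface.

```python
import hashlib


def block_hash(data: bytes) -> str:
    """Hash of one file block. SHA-256 is an assumption; the source names no algorithm."""
    return hashlib.sha256(data).hexdigest()


class Server:
    """Hypothetical server-side view of the hash value library."""

    def __init__(self):
        self.hash_library = {}  # block_id -> hash recorded at initial storage
        self.status = {}        # block_id -> "available" or "lost"
        self.blocks = {}        # block_id -> stored bytes

    def store_initial(self, block_id, data):
        self.hash_library[block_id] = block_hash(data)
        self.status[block_id] = "available"
        self.blocks[block_id] = data

    def mark_lost(self, block_id):
        self.status[block_id] = "lost"
        self.blocks.pop(block_id, None)

    def precheck(self, block_id, claimed_hash) -> bool:
        """First check: compare the hash sent by the client before any block upload."""
        return (self.status.get(block_id) == "lost"
                and self.hash_library[block_id] == claimed_hash)

    def accept_upload(self, block_id, data) -> bool:
        """Second check: rehash the received bytes and compare with the library again."""
        if self.status.get(block_id) != "lost":
            return False
        if block_hash(data) != self.hash_library[block_id]:
            return False  # wrong reconstruction or tampering in transit
        self.blocks[block_id] = data
        self.status[block_id] = "available"
        return True
```

A client would call `precheck` with the hash of its reconstructed block and upload the block bytes only if that returns `True`; `accept_upload` then guards against a block that was altered after the first check.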
Because reconstruction is performed at the client, the incast phenomenon inside the cluster is avoided. On the cluster's egress link, the system needs only traffic equal in size to the lost file block in order to regain the complete original data. This avoids the severe reduction in concurrent transmission capability that incast causes in the cluster's internal network, and saves network and computing resources in cloud storage.
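The traffic argument above can be made concrete with a small calculation. This is a rough sketch under assumptions not stated in the source: an (n, k) erasure code in which any k blocks rebuild a lost block, equal block sizes, and a client that is already downloading k blocks in order to read the file. The function names are hypothetical.

```python
def server_side_incast_bytes(k: int, block_size: int) -> int:
    """Bytes converging on one repair node inside the cluster when the
    server end rebuilds a lost block from k surviving blocks."""
    return k * block_size


def client_side_extra_egress_bytes(lost_blocks: int, block_size: int) -> int:
    """Extra bytes crossing the cluster egress link when the client rebuilds:
    only the reconstructed blocks are uploaded back to the server end."""
    return lost_blocks * block_size


# Example: k = 6, blocks of 64 MiB, one lost block.
MIB = 1024 * 1024
internal = server_side_incast_bytes(6, 64 * MIB)      # 384 MiB into one node
external = client_side_extra_egress_bytes(1, 64 * MIB)  # 64 MiB over the egress link
```

Under these assumptions, server-side repair sends six blocks to a single node at once, which is the many-to-one pattern behind incast (written IN CAST in the text), while client-side repair adds only one block of upload traffic.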
The above describes a self-adaptive data storage and reconstruction method for a coding redundancy storage system, which has good application value in building large data storage centers and in sensor networks. The present invention is not limited to the above embodiments. Any improvement or change that is known to those of ordinary skill in the art and does not depart from the technical solution of the present invention falls within the scope of protection of the present invention.

Claims (8)

1. A self-adaptive data storage method for a coding redundancy storage system, the coding redundancy storage system comprising a server end and a client, the client submitting file storage requests to the server end, characterized in that the self-adaptive data storage method for a coding redundancy storage system comprises the following steps:
(1) the client calculates the hash value of the file to be stored and uploads the hash value of the file to be stored to the server end;
(2) the hash value of the file to be stored is compared with the hash values of the files already stored at the server end;
(3) if an identical hash value exists, the server end does not accept the upload of the file to be stored; if no identical hash value exists, the server end accepts the upload of the file to be stored, divides the uploaded file into blocks, calculates and stores the hash value of each file block, and encodes the file blocks to produce check data blocks.
2. The self-adaptive data storage method for a coding redundancy storage system as claimed in claim 1, characterized in that: before the client calculates the hash value of the file to be stored in step (1), the server end queries whether the client is willing to calculate the hash value of the file to be stored; if the client is unwilling to calculate the hash value of the file to be stored, the file to be stored is uploaded directly to the server end, and the server end calculates the hash value of the file to be stored.
3. The self-adaptive data storage method for a coding redundancy storage system as claimed in claim 2, characterized in that: in step (3), if an identical hash value exists, the file to be stored that has already been uploaded is deleted.
4. The self-adaptive data storage method for a coding redundancy storage system as claimed in claim 1, characterized in that: when an identical hash value exists and the server end does not accept the upload of the file to be stored, the server end accepts the file information about the file to be stored that is uploaded by the client, and establishes an association between that file information and the stored file having the identical hash value.
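The storage flow of claims 1 and 4 can be sketched as below. This is an illustrative sketch only: `StorageServer`, `client_store`, the tiny block size, SHA-256, and the single XOR parity block are all assumptions. In particular, XOR parity merely stands in for whatever redundancy code the system actually uses, which the claims do not specify.

```python
import hashlib
from functools import reduce

BLOCK_SIZE = 4  # deliberately tiny, for illustration only


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if blocks and len(blocks[-1]) < size:
        blocks[-1] = blocks[-1].ljust(size, b"\0")
    return blocks


def xor_parity(blocks: list) -> bytes:
    """One XOR parity block over equal-length blocks (stand-in for the real code)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class StorageServer:
    def __init__(self):
        self.files = {}      # file hash -> blocks, block hashes, parity
        self.file_info = {}  # file hash -> client-supplied info records

    def has_file(self, file_hash: str) -> bool:
        return file_hash in self.files

    def link_info(self, file_hash: str, info: dict) -> None:
        """Claim 4: associate file information with the stored file."""
        self.file_info.setdefault(file_hash, []).append(info)

    def upload(self, data: bytes, info: dict) -> None:
        """Step (3), no match: split, hash each block, and encode."""
        blocks = split_blocks(data)
        file_hash = sha256_hex(data)
        self.files[file_hash] = {
            "blocks": blocks,
            "block_hashes": [sha256_hex(b) for b in blocks],
            "parity": xor_parity(blocks),
        }
        self.link_info(file_hash, info)


def client_store(server: StorageServer, data: bytes, info: dict) -> bool:
    """Return True if the file bytes were uploaded, False if deduplicated."""
    file_hash = sha256_hex(data)            # step (1)
    if server.has_file(file_hash):          # steps (2) and (3), match found
        server.link_info(file_hash, info)   # only the file information is sent
        return False
    server.upload(data, info)
    return True
```

Storing the same bytes twice uploads them once; the second call only records the extra file information against the existing hash, so duplicate content costs no upload bandwidth.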
5. A self-adaptive data reconstruction method for a coding redundancy storage system, characterized in that the self-adaptive data reconstruction method for a coding redundancy storage system comprises the following steps:
(1) detecting in real time whether any storage node at the server end of the coding redundancy storage system is damaged;
(2) when a storage node is damaged, the system marks the hash values corresponding to the file blocks of the damaged storage node as lost, and judges whether the number of damaged storage nodes is greater than the set value of the system;
(3) when the number of damaged storage nodes is greater than the set value of the system, the server end of the coding redundancy storage system uses its internal computing resources to reconstruct the lost file blocks; when the number of damaged storage nodes is not greater than the set value of the system, proceeding to step (4);
(4) judging in real time whether a client has issued a file read request;
(5) if a client has issued a file read request, judging whether the file to be read has any lost file block;
(6) if the file to be read has a lost file block, judging whether the hardware resources of the client meet the set hardware performance requirement;
(7) if the hardware resources of the client do not meet the set hardware performance requirement, the server end of the coding redundancy storage system uses its internal computing resources to reconstruct the lost file block and provides the recovered file block to the client; if the hardware resources of the client meet the set hardware performance requirement, proceeding to step (8);
(8) the client downloads the associated file blocks and check data blocks sufficient to recover the lost file block, and recovers the lost file block.
6. The self-adaptive data reconstruction method for a coding redundancy storage system as claimed in claim 5, characterized in that, in step (7), if the hardware resources of the client meet the set hardware performance requirement, the client is asked whether it is willing to participate in recovering the lost file block; if the client is willing to participate in recovering the lost file block, proceeding to step (8); if the client is unwilling to participate in recovering the lost file block, the server end of the coding redundancy storage system uses its internal computing resources to reconstruct the lost file block.
7. The self-adaptive data reconstruction method for a coding redundancy storage system as claimed in claim 5, characterized in that the self-adaptive data reconstruction method for a coding redundancy storage system further comprises the following steps:
after the lost file block has been recovered, calculating the hash value of the reconstructed file block;
uploading the hash value of the reconstructed file block to the server end;
the server end compares whether the hash value of the reconstructed file block is identical to the hash value of the corresponding lost original block;
if the hash value of the reconstructed file block is identical to the hash value of the corresponding lost original block, allowing the client to upload the reconstructed file block;
the client uploads the reconstructed file block;
the server end calculates the hash value of the uploaded file block and compares it with the hash value of the corresponding lost original block;
if the hash value of the uploaded file block is identical to the hash value of the corresponding lost original block, the server end saves the uploaded file block and sets the hash value corresponding to the uploaded file block to available.
8. The self-adaptive data reconstruction method for a coding redundancy storage system as claimed in any one of claims 5 to 7, characterized in that the set value of the system lies within the fault tolerance range permitted by the coding.
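The decision flow of claims 5, 6, and 8 can be sketched as a pair of pure functions. This is a hedged illustration: `Action`, `ClientProfile`, and all function names are hypothetical, and reading the "fault tolerance range permitted by the coding" as at most n - k lost blocks assumes an (n, k) MDS code such as Reed-Solomon, which the claims do not name.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    SERVER_REBUILD_NOW = auto()      # step (3): too many nodes damaged
    WAIT_FOR_READ = auto()           # step (4): defer until a client reads
    NO_REBUILD_NEEDED = auto()       # step (5): file has no lost block
    SERVER_REBUILD_ON_READ = auto()  # step (7), or claim 6 when the client declines
    CLIENT_REBUILD = auto()          # step (8)


@dataclass
class ClientProfile:
    meets_hardware_requirement: bool
    willing_to_help: bool


def validate_set_value(n: int, k: int, set_value: int) -> None:
    """Claim 8 (assumed MDS reading): the set value must not exceed n - k."""
    if not 0 <= set_value <= n - k:
        raise ValueError(f"set value {set_value} outside tolerance 0..{n - k}")


def on_node_damage(damaged_nodes: int, set_value: int) -> Action:
    """Steps (2) and (3): rebuild eagerly only above the set value."""
    if damaged_nodes > set_value:
        return Action.SERVER_REBUILD_NOW
    return Action.WAIT_FOR_READ


def on_read_request(file_has_lost_block: bool, client: ClientProfile) -> Action:
    """Steps (5) to (8), with the willingness query of claim 6."""
    if not file_has_lost_block:
        return Action.NO_REBUILD_NEEDED
    if not client.meets_hardware_requirement:
        return Action.SERVER_REBUILD_ON_READ
    if not client.willing_to_help:
        return Action.SERVER_REBUILD_ON_READ
    return Action.CLIENT_REBUILD
```

Keeping the set value below the code's tolerance leaves a margin: the server end rebuilds eagerly before losses approach the point where data would be unrecoverable, and defers to capable, willing clients only while the system is still safely within that margin.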
CN201410175898.6A 2014-04-28 2014-04-28 Self-adaptation data storage and reconstruction method for coding redundancy storage system Pending CN103916483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410175898.6A CN103916483A (en) 2014-04-28 2014-04-28 Self-adaptation data storage and reconstruction method for coding redundancy storage system

Publications (1)

Publication Number Publication Date
CN103916483A true CN103916483A (en) 2014-07-09

Family

ID=51041881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410175898.6A Pending CN103916483A (en) 2014-04-28 2014-04-28 Self-adaptation data storage and reconstruction method for coding redundancy storage system

Country Status (1)

Country Link
CN (1) CN103916483A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706825A (en) * 2009-12-10 2010-05-12 华中科技大学 Replicated data deleting method based on file content types
CN103098035A (en) * 2010-08-31 2013-05-08 日本电气株式会社 Storage system
CN103246730A (en) * 2013-05-08 2013-08-14 网易(杭州)网络有限公司 File storage method and device and file sensing method and device

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105306570A (en) * 2015-10-27 2016-02-03 创新科软件技术(深圳)有限公司 Storage method of cluster data
CN105306570B (en) * 2015-10-27 2018-07-20 创新科软件技术(深圳)有限公司 A kind of storage method of company-data
CN105630418A (en) * 2015-12-24 2016-06-01 曙光信息产业(北京)有限公司 Data storage method and device
CN105787107A (en) * 2016-03-22 2016-07-20 南京工程学院 Big data redundancy detection method
CN105787107B (en) * 2016-03-22 2018-10-30 南京工程学院 A kind of big data redundant detecting method
CN107026912A (en) * 2017-05-12 2017-08-08 成都优孚达信息技术有限公司 Embedded communication equipment data transmission method
CN107153588A (en) * 2017-05-12 2017-09-12 成都优孚达信息技术有限公司 data encoding storage method
CN107278366A (en) * 2017-05-27 2017-10-20 福建联迪商用设备有限公司 A kind of method for down loading and its download system for improving download efficiency
CN107357677A (en) * 2017-06-24 2017-11-17 山东超越数控电子有限公司 A kind of data redundancy storage methods of GlusterFS based on correcting and eleting codes
CN107357677B (en) * 2017-06-24 2020-09-08 山东超越数控电子股份有限公司 Data redundancy storage method of GlusterFS based on erasure codes
CN110019053A (en) * 2017-11-02 2019-07-16 福建天晴数码有限公司 A kind of Unity3D resource redundancy packet data detection method and terminal
CN110019053B (en) * 2017-11-02 2022-04-01 福建天晴数码有限公司 Method and terminal for detecting redundant data of Unity3D resource packet
CN108958973B (en) * 2018-06-27 2020-07-07 清华大学 Distributed file system erasure code data reconstruction storage node selection method and device
CN108958973A (en) * 2018-06-27 2018-12-07 清华大学 Distributed file system correcting and eleting codes data reconstruction memory node selection method and device
CN109241023A (en) * 2018-09-21 2019-01-18 郑州云海信息技术有限公司 Distributed memory system date storage method, device, system and storage medium
CN109213637A (en) * 2018-11-09 2019-01-15 浪潮电子信息产业股份有限公司 Data reconstruction method, device and the medium of distributed file system clustered node
CN109213637B (en) * 2018-11-09 2022-03-04 浪潮电子信息产业股份有限公司 Data recovery method, device and medium for cluster nodes of distributed file system
CN110727640A (en) * 2019-09-11 2020-01-24 国云科技股份有限公司 Lightweight non-master-slave distributed routing file query storage system and method
CN111367876B (en) * 2020-03-04 2023-09-19 中国科学院成都生物研究所 Distributed file management method based on memory metadata
CN111367876A (en) * 2020-03-04 2020-07-03 中国科学院成都生物研究所 Distributed file management method based on memory metadata
CN114817230A (en) * 2022-06-29 2022-07-29 深圳市乐易网络股份有限公司 Data stream filtering method and system
CN114915624B (en) * 2022-07-13 2022-12-13 飞狐信息技术(天津)有限公司 File processing method and system and electronic equipment
CN114915624A (en) * 2022-07-13 2022-08-16 飞狐信息技术(天津)有限公司 File processing method and system and electronic equipment
CN117056149A (en) * 2023-10-08 2023-11-14 飞腾信息技术有限公司 Memory testing method and device, computing equipment and storage medium
CN117056149B (en) * 2023-10-08 2024-02-02 飞腾信息技术有限公司 Memory testing method and device, computing equipment and storage medium
CN117667834A (en) * 2024-01-31 2024-03-08 苏州元脑智能科技有限公司 Memory decoupling system, data processing method and storage medium
CN117667834B (en) * 2024-01-31 2024-04-30 苏州元脑智能科技有限公司 Memory decoupling system, data processing method and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140709
