Disclosure of Invention
The present invention provides a method and a storage structure for rapidly checking the consistency of multiple copies in distributed storage, with the aim of increasing the speed of consistency checking, reducing bandwidth consumption on the storage host, and accelerating data verification.
The technical scheme adopted by the invention for solving the technical problem is as follows:
A method for rapidly checking the consistency of multiple copies in distributed storage, wherein the distributed storage adopts a processing architecture of a control host and a storage host, and the method comprises the following steps:
A. uniformly dividing a stored file into a plurality of data segments in advance, wherein each data segment is provided with its own first hash value and with a flag bit indicating whether that first hash value has expired;
B. when a write request is received, determining the corresponding flag bits according to the offset and the length of the write request, and setting those flag bits to expired;
C. screening out the expired flag bits, updating the first hash values corresponding to those flag bits, and then calculating a second hash value of the whole file from the first hash values of all data segments.
In the method for rapidly checking the consistency of multiple copies in distributed storage, the first hash values and the flag bits are saved in an additional new file.
In the method for rapidly checking the consistency of multiple copies in distributed storage, during initialization, both the first hash values and the flag bits are set to 0; the first hash value corresponding to an unwritten data segment is set to 0.
In the method for rapidly checking the consistency of multiple copies in distributed storage, step A specifically comprises:
A1, dividing a stored file into a plurality of data segments in advance, wherein the size of each data segment is 4M, and performing initialization;
A2, providing each data segment with its own first hash value and with a flag bit indicating whether that first hash value has expired.
In the method for rapidly checking the consistency of multiple copies in distributed storage, step B specifically comprises:
B1, when a write request is received, determining the corresponding flag bits according to the offset and the length of the write request;
B2, setting each such flag bit from 0 to 1, indicating that the corresponding first hash value has expired.
In the method for rapidly checking the consistency of multiple copies in distributed storage, step C specifically comprises:
C1, screening out the expired flag bits, and calculating a new first hash value for each data segment whose flag bit has expired;
C2, judging whether any flag bit was set to expired during the calculation of the new first hash value; if so, executing step C1 again; if not, writing the new first hash value to the storage host;
C3, calculating a second hash value of the whole file from the first hash values of all data segments.
In the method for rapidly checking the consistency of multiple copies in distributed storage, step C2 specifically comprises: setting the flag bit to 0 in the memory of the control host, and determining whether any flag bit was set to 1 during the calculation of the new first hash value; if so, executing step C1 again; otherwise, writing the new first hash value to the storage host.
A storage structure, wherein the storage structure adopts a processing architecture of a control host and a storage host;
the control host is provided with a virtual disk, manages the life cycle of the virtual disk, and performs the functions of receiving, caching and forwarding data;
the storage host consists of a plurality of storage media and is used for storing redundant data;
the storage structure stores a computer program which, when executed by the control host, implements the steps of the method for rapidly checking the consistency of multiple copies in distributed storage.
The beneficial effects of the invention are as follows. The invention provides a method and a storage structure for rapidly checking the consistency of multiple copies in distributed storage, in which a large file is divided into a plurality of data segments, a hash value is calculated for each data segment, and the hash value of the whole file is calculated from the hash values of the data segments. With this method, it is only necessary to record which data segments have been modified and to update the hash values of those data segments, so the data of the whole file does not have to be read during a consistency check; this greatly increases the speed of consistency checking and reduces bandwidth consumption on the storage host. In addition, because hash values are calculated per data segment, the calculation can more easily be performed concurrently while the system is idle, which greatly accelerates data verification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
An embodiment of the present invention provides a method for rapidly checking the consistency of multiple copies in distributed storage. Referring to FIGS. 1 to 4, the method adopts a processing architecture of a control host and a storage host.
The method specifically comprises the following steps:
S100, uniformly dividing a stored file into a plurality of data segments in advance, wherein each data segment is provided with its own first hash value and with a flag bit indicating whether that first hash value has expired.
S101, dividing a stored file into a plurality of data segments in advance, wherein the size of each data segment is 4M, and performing initialization.
S102, providing each data segment with its own first hash value and with a flag bit indicating whether that first hash value has expired.
In this embodiment, the large file is assumed to be 100G in size. With each data segment being 4M, the file is divided into 25600 data segments in total. If the crc32 hash algorithm is used, each data segment needs 4B to store its first hash value, so the first hash values of the whole file need 100K of storage. Each data segment also needs a 1-bit flag bit indicating whether its first hash value has expired, so the flag bits of the whole file need 3200B. The storage overhead of the first hash values and the flag bits is therefore (100K + 3200B)/100G ≈ 0.0001%.
The flag bits need to be loaded into memory to speed up the check; as can be seen from the above, the flag bits for a 100G file occupy less than 4K of memory.
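The figures above can be reproduced with a short calculation. The following is only an illustrative sketch; it assumes binary units (1K = 1024B), which is the reading under which 100G / 4M gives exactly 25600 segments, and the variable names are chosen here for illustration.

```python
# Assumption: binary units (1K = 1024B), consistent with the 25600-segment figure.
KIB, MIB, GIB = 1024, 1024**2, 1024**3

file_size = 100 * GIB      # example file size from the embodiment
segment_size = 4 * MIB     # size of each data segment
hash_bytes = 4             # a crc32 value occupies 4 bytes

segment_count = file_size // segment_size       # 25600 data segments
hash_file_bytes = segment_count * hash_bytes    # 102400 B = 100K
flag_file_bytes = segment_count // 8            # 3200 B, one bit per segment
overhead = (hash_file_bytes + flag_file_bytes) / file_size

print(segment_count, hash_file_bytes, flag_file_bytes)
print(f"overhead: {overhead * 100:.6f}%")
```

The flag bits alone (3200B) fit within 4K, which is why they can be kept resident in memory.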
S200, when a write request is received, determining the corresponding flag bits according to the offset and the length of the write request, and setting those flag bits to expired.
S201, when a write request is received, determining the corresponding flag bits according to the offset and the length of the write request.
S202, setting each such flag bit from 0 to 1, indicating that the corresponding first hash value has expired.
In this embodiment, during initialization, both the first hash values and the flag bits are set to 0; the first hash value corresponding to an unwritten data segment is set to 0.
When a write request arrives, the flag bits involved in the write are determined from the offset and the length of the request. Any such flag bit that is 0 is set to 1, indicating that the first hash value of the corresponding data segment has expired and must be updated at the next calculation. Whenever a flag bit is modified, it must be written to the storage host so that the latest flag bits are not lost in abnormal situations such as a power failure.
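The mapping from a write request to flag bits can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the bitmap layout (bit i of the bitmap belongs to segment i, least significant bit first) and the use of a Python bytearray are assumptions made here.

```python
SEGMENT_SIZE = 4 * 1024 * 1024  # 4M data segments, as in the embodiment

def touched_segments(offset: int, length: int) -> range:
    """Return the indexes of the data segments overlapped by a write."""
    if length <= 0:
        return range(0)
    first = offset // SEGMENT_SIZE
    last = (offset + length - 1) // SEGMENT_SIZE
    return range(first, last + 1)

def mark_expired(flags: bytearray, offset: int, length: int) -> bool:
    """Set the flag bit of every touched segment.

    Returns True if any bit changed from 0 to 1, that is, if the
    bitmap would have to be written to the storage host again.
    """
    changed = False
    for index in touched_segments(offset, length):
        byte, bit = divmod(index, 8)
        mask = 1 << bit
        if not flags[byte] & mask:
            flags[byte] |= mask
            changed = True
    return changed

flags = bytearray(3200)  # one bit per segment of a 100G file
print(mark_expired(flags, 6 * 1024 * 1024, 4 * 1024 * 1024))  # True: segments 1 and 2 newly marked
print(mark_expired(flags, 6 * 1024 * 1024, 4 * 1024 * 1024))  # False: nothing changed
```

Returning whether any bit changed lets the caller skip the write to the storage host when a request only touches segments that are already marked as expired.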
S300, screening out the expired flag bits, updating the first hash values corresponding to those flag bits, and then calculating a second hash value of the whole file from the first hash values of all data segments.
S301, screening out the expired flag bits, and calculating a new first hash value for each data segment whose flag bit has expired.
S302, judging whether any flag bit was set to expired during the calculation of the new first hash value; if so, executing step S301 again; otherwise, writing the new first hash value to the storage host.
S303, calculating a second hash value of the whole file from the first hash values of all data segments.
Step S302 specifically includes:
setting the flag bit to 0 in the memory of the control host, and determining whether any flag bit was set to 1 during the calculation of the new first hash value; if so, executing step S301 again; otherwise, writing the new first hash value to the storage host.
In this embodiment, when the hash value needs to be calculated, the expired flag bits are identified first, and the first hash values of the corresponding data segments are updated. For data segments whose flag bits are not set, the first hash value is guaranteed to be up to date and does not need to be recalculated. The hash value of the whole file is then calculated from the first hash values of all data segments.
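The update of expired first hash values followed by the whole-file calculation can be sketched as below. The text does not specify how the second hash value is derived from the first hash values; the sketch assumes crc32 over the concatenated 4-byte first hash values. The callback `read_segment` and the representation of flag bits as a list of 0/1 integers are likewise hypothetical, and the check for writes arriving during the calculation (step S302) is omitted here.

```python
import zlib

def refresh_and_hash(read_segment, hashes, flags):
    """Recompute expired first hash values, then derive the whole-file hash.

    read_segment(index) is a hypothetical callback returning the bytes of
    one data segment; only segments whose flag is 1 are actually read.
    """
    for index, expired in enumerate(flags):
        if expired:
            hashes[index] = zlib.crc32(read_segment(index))
            flags[index] = 0
    # Assumed combining rule: crc32 over the packed first hash values.
    packed = b"".join(h.to_bytes(4, "big") for h in hashes)
    return zlib.crc32(packed)
```

Because two copies that hold the same data produce the same list of first hash values, they also produce the same second hash value, and only the modified segments have to be read to obtain it.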
Before a new first hash value is written to the storage host, it must be determined whether any write request modified the data segment while its first hash value was being calculated.
Specifically, the flag bit is first set to 0 in memory only, without updating the flag bit on the storage host; the first hash value is then calculated; and the in-memory flag bit is then checked. If the in-memory flag bit has been set again, a write request modified the data segment during the calculation of the new first hash value, so that value is already expired and need not be written to the storage host.
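The clear-compute-recheck sequence described above can be sketched as follows. This is a single-threaded illustration under stated assumptions: `read_segment` and `persist` are hypothetical callbacks, the in-memory flag bits are modelled as a list of 0/1 integers, and a real system would need atomic operations or locking around the flag accesses. How and when the cleared flag bit itself is persisted is not specified in the text and is left to the `persist` callback.

```python
import zlib

def try_refresh_segment(index, read_segment, mem_flags, hashes, persist):
    """Recompute one expired first hash value; keep it only if no write raced."""
    mem_flags[index] = 0                        # clear in memory only
    new_hash = zlib.crc32(read_segment(index))  # may take a while
    if mem_flags[index]:                        # a write set the flag again
        return False                            # result is stale; retry later
    hashes[index] = new_hash
    persist(index, new_hash)                    # hypothetical write to the storage host
    return True
```

A caller would repeat the call for any segment that returns False, which corresponds to going back to step S301.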
Further, the first hash values and the flag bits are saved in an additional new file.
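One possible on-disk encoding for such an additional file is sketched below. The text does not define a layout; the little-endian 4-byte hash values and the packed bitmap (bit i for segment i, least significant bit first) are assumptions made purely for illustration, as are the function names.

```python
import struct

def pack_metadata(hashes, flags):
    """Serialize first hash values and flag bits into two byte strings."""
    hash_blob = struct.pack(f"<{len(hashes)}I", *hashes)  # 4 bytes per segment
    flag_blob = bytearray((len(flags) + 7) // 8)           # 1 bit per segment
    for index, expired in enumerate(flags):
        if expired:
            flag_blob[index // 8] |= 1 << (index % 8)
    return hash_blob, bytes(flag_blob)

def unpack_metadata(hash_blob, flag_blob, count):
    """Inverse of pack_metadata for a file with `count` data segments."""
    hashes = list(struct.unpack(f"<{count}I", hash_blob))
    flags = [(flag_blob[i // 8] >> (i % 8)) & 1 for i in range(count)]
    return hashes, flags
```

With this layout, the metadata for 25600 segments occupies 102400 bytes of hash values and 3200 bytes of flag bits, matching the sizes given in the embodiment.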
In addition, based on the above method for rapidly checking the consistency of multiple copies in distributed storage, the invention also provides a storage structure, wherein the storage structure adopts a processing architecture of a control host and a storage host.
The control host is provided with a virtual disk, manages the life cycle of the virtual disk, and performs the functions of receiving, caching and forwarding data;
the storage host consists of a plurality of storage media and is used for storing redundant data; the storage host is the final storage location of the data in the distributed storage system, and its storage resources are abstracted into a plurality of storage components, each of which consists of a chain of large sparse files.
The storage structure stores a computer program which, when executed by the control host, implements the steps of the method for rapidly checking the consistency of multiple copies in distributed storage.
In summary, the present invention discloses a method and a storage structure for rapidly checking the consistency of multiple copies in distributed storage, which adopt a processing architecture of a control host and a storage host. The method includes: uniformly dividing a stored file into a plurality of data segments in advance, wherein each data segment is provided with its own first hash value and with a flag bit indicating whether that first hash value has expired; when a write request is received, determining the corresponding flag bits according to the offset and the length of the write request, and setting those flag bits to expired; and screening out the expired flag bits, updating the first hash values corresponding to those flag bits, and then calculating a second hash value of the whole file from the first hash values of all data segments. A large file is thus divided into a plurality of data segments, a hash value is calculated for each data segment, and the hash value of the whole file is calculated from the hash values of the data segments. With this method, it is only necessary to record which data segments have been modified and to update the hash values of those data segments, so the data of the whole file does not have to be read during a consistency check; this greatly increases the speed of consistency checking and reduces bandwidth consumption on the storage host. In addition, because hash values are calculated per data segment, the calculation can more easily be performed concurrently while the system is idle, which greatly accelerates data verification.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.