CN107632781A - Method and storage architecture for rapidly checking the consistency of multiple copies in distributed storage - Google Patents

Info

Publication number
CN107632781A
Authority
CN
China
Prior art keywords
cryptographic hash
flag bit
storage
expired
data segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710748653.1A
Other languages
Chinese (zh)
Other versions
CN107632781B (en)
Inventor
陈仲涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Lianyungang Technology Co ltd
Original Assignee
SHENZHEN YUNSHU NETWORK TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN YUNSHU NETWORK TECHNOLOGY Co Ltd
Priority to CN201710748653.1A
Publication of CN107632781A
Application granted
Publication of CN107632781B
Legal status: Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a method and storage architecture for rapidly checking the consistency of multiple copies in distributed storage, using a control host-storage host processing architecture. The method includes: dividing the stored file evenly into several data segments in advance, each data segment being provided with its own corresponding first hash value and with a flag bit indicating whether the corresponding first hash value is expired; when a write request is received, calculating the corresponding flag bits according to the offset and length of the write request and setting those flag bits to expired; and selecting the expired flag bits, updating the first hash values corresponding to them, and then calculating a second hash value of the whole file according to the first hash values of the data segments. The invention divides a large file into multiple data segments, computes the hash value of the file segment by segment, and then computes the hash value of the whole file from the per-segment hash values. This avoids reading the data of the whole file, which improves consistency detection speed and reduces the bandwidth consumption of the storage host.

Description

Method and storage architecture for rapidly checking the consistency of multiple copies in distributed storage
Technical field
The present invention relates to the technical field of information storage, and in particular to a method and storage architecture for rapidly checking the consistency of multiple copies in distributed storage.
Background
With the arrival of the information age, the global volume of data is growing explosively. Improving the reliability of storage systems and ensuring data availability have become research priorities for enterprises. The vast majority of existing distributed storage systems rely on multi-copy technology to improve reliability, availability, performance, and scalability. However, distributed storage systems communicate over a network, and network instability easily leaves back-end data inconsistent. A distributed storage system also generally comprises many server hosts and disks, so the probability of hardware damage is higher as well.
If the consistency of copies cannot be checked quickly, the data integrity and high availability of a distributed storage system are greatly reduced. Existing consistency checks mainly compute the hash value of a file and compare the hash values of the multiple copies of the file to judge whether the file data is consistent.
For very large files, however, computing the hash value consumes a great deal of CPU and storage host bandwidth and seriously affects system performance. The inconsistent portion of a file is usually small, yet computing the file hash value requires reading the content of the entire file, which wastes a large amount of resources.
Therefore, the prior art still needs to be improved and developed.
Summary of the invention
The technical problem to be solved by the present invention, in view of the above defects of the prior art, is to provide a method and storage architecture for rapidly checking the consistency of multiple copies in distributed storage, so as to improve consistency detection speed, reduce storage host bandwidth consumption, and accelerate data verification.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A method for rapidly checking the consistency of multiple copies in distributed storage, wherein the distributed storage uses a control host-storage host processing architecture, the method comprising the steps of:
A. dividing the stored file evenly into several data segments in advance, each data segment being provided with its own corresponding first hash value and with a flag bit indicating whether the corresponding first hash value is expired;
B. when a write request is received, calculating the corresponding flag bits according to the offset and length of the write request, and setting those flag bits to expired;
C. selecting the expired flag bits, updating the first hash values corresponding to those flag bits, and then calculating a second hash value of the whole file according to the first hash values of the data segments.
In the method, the first hash values and the flag bits are stored in additional newly created files.
In the method, during initialization the first hash values and the flag bits are all set to 0, and the first hash value corresponding to a data segment that has never been written is set to 0.
In the method, step A specifically comprises:
A1. dividing the stored file into several data segments in advance, each data segment being 4M in size, and performing initialization;
A2. providing each data segment with its own corresponding first hash value and with a flag bit indicating whether the corresponding first hash value is expired.
In the method, step B specifically comprises:
B1. when a write request is received, calculating the corresponding flag bit according to the offset and length of the write request;
B2. setting the flag bit from 0 to 1, marking it as expired.
In the method, step C specifically comprises:
C1. selecting the expired flag bits and calculating new first hash values for them;
C2. judging whether any flag bit was set to expired while the new first hash values were being calculated; if so, performing step C1; otherwise, writing the new first hash values to the storage host;
C3. calculating the second hash value of the whole file according to the first hash values of the data segments.
In the method, step C2 is specifically: initializing the flag bit to 0 in the memory of the control host, and judging whether any flag bit was set to 1 while the new first hash value was being calculated; if so, performing step C1; otherwise, writing the new first hash value to the storage host.
A storage architecture, wherein the storage architecture uses a control host-storage host processing architecture;
a virtual disk is built in the control host, and the control host manages the life cycle of the virtual disk and performs data reception, caching, and forwarding;
the storage host is composed of multiple storage media and is used for storing redundant data;
a computer program is stored in the storage architecture, and when executed by the control host the computer program implements the steps of any of the methods described above for rapidly checking the consistency of multiple copies in distributed storage.
Beneficial effects of the present invention: the present invention provides a method and storage architecture for rapidly checking the consistency of multiple copies in distributed storage. A large file is divided into several data segments, the hash value of the file is computed segment by segment, and the hash value of the whole file is then computed from the hash values of the data segments. With this method it is only necessary to record which data segments have been modified and then update the hash values of those data segments. This avoids reading the data of the whole file during a consistency check, which greatly improves consistency check speed and reduces storage host bandwidth consumption. Because hash values are computed per data segment, the computation is also easier to run concurrently when the system is idle, which greatly accelerates data verification.
Brief description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the method of the present invention for rapidly checking the consistency of multiple copies in distributed storage.
Fig. 2 is a functional block diagram of a preferred embodiment of the storage architecture of the present invention.
Fig. 3 is a schematic diagram of the per-data-segment first hash values in a preferred embodiment of the method of the present invention.
Fig. 4 is a flow chart of updating expired first hash values in a preferred embodiment of the method of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
An embodiment of the present invention provides a method for rapidly checking the consistency of multiple copies in distributed storage; see Figs. 1-4. As shown in the figures, the method uses a control host-storage host processing architecture.
It specifically comprises the following steps:
S100. The stored file is divided evenly into several data segments in advance. Each data segment is provided with its own corresponding first hash value and with a flag bit indicating whether the corresponding first hash value is expired.
S101. The stored file is divided into several data segments in advance, each data segment being 4M in size, and initialization is performed.
S102. Each data segment is provided with its own corresponding first hash value and with a flag bit indicating whether the corresponding first hash value is expired.
In this embodiment, assume the large file is 100G in size. With a data segment size of 4M, the file is divided into 25600 data segments. If the crc32 hash algorithm is used, each data segment needs 4B to store its first hash value, so the file holding the first hash values needs 100K in total. Each data segment also needs a 1-bit flag to indicate whether its first hash value is expired, so the file holding the flag bits needs 3200B. The storage overhead of the first hash values and flag bits is therefore (100K+3200B)/100G ≈ 0.0001%.
The flag bits of the first hash values need to be loaded into memory to speed up the check. As shown above, the flag bits for a 100G file occupy less than 4K of memory.
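The overhead figures above can be reproduced with a short calculation. This is only an illustrative sketch: it assumes binary units (1K = 1024 B, 1M = 1024K, 1G = 1024M), and the variable names are hypothetical rather than taken from the patent.

```python
# Sketch of the storage-overhead arithmetic in the embodiment.
# Assumption: binary units (1K = 1024 B); names are illustrative only.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

file_size = 100 * GIB      # size of the large file
segment_size = 4 * MIB     # size of each data segment
hash_bytes = 4             # a crc32 value is 32 bits, i.e. 4 B

segments = file_size // segment_size        # number of data segments
hash_file_bytes = segments * hash_bytes     # file holding the first hash values
flag_file_bytes = segments // 8             # one flag bit per data segment
overhead = (hash_file_bytes + flag_file_bytes) / file_size

print(segments)                   # number of data segments
print(hash_file_bytes // KIB)     # size of the hash file in K
print(flag_file_bytes)            # size of the flag file in B
print(f"{overhead * 100:.4f}%")   # relative storage overhead
```

The flag file is small enough to stay resident in memory, which is why the embodiment loads it there.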
S200. When a write request is received, the corresponding flag bits are calculated according to the offset and length of the write request, and those flag bits are set to expired.
S201. When a write request is received, the corresponding flag bit is calculated according to the offset and length of the write request.
S202. The flag bit is set from 0 to 1, marking it as expired.
In this embodiment, during initialization the first hash values and flag bits are all set to 0, and the first hash value of any data segment that has never been written is set to 0.
When a write request arrives, the offset and length of the write request are used to calculate which flag bits the request affects. If a flag bit is 0, it must be set to 1 to indicate that the first hash value of the corresponding data segment is expired and needs to be updated the next time hash values are calculated. If a flag bit was modified, it must be written to the storage host so that the latest flag bits are not lost in abnormal situations such as a power failure.
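The mapping from a write request to the affected flag bits can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and for readability each flag occupies one element of a list or `bytearray` rather than a single bit.

```python
SEGMENT_SIZE = 4 * 1024 * 1024  # 4M data segments, as in the embodiment


def segments_touched(offset, length):
    """Return the indices of the data segments covered by a write."""
    if length <= 0:
        return range(0)
    first = offset // SEGMENT_SIZE
    last = (offset + length - 1) // SEGMENT_SIZE
    return range(first, last + 1)


def mark_expired(flags, offset, length):
    """Set the flag of every touched segment to 1.

    Returns the indices whose flag changed from 0 to 1. Only these
    need to be written back to the storage host (assumed behaviour,
    following the description of persisting modified flag bits).
    """
    changed = []
    for i in segments_touched(offset, length):
        if flags[i] == 0:
            flags[i] = 1
            changed.append(i)
    return changed
```

A write that straddles a segment boundary marks both segments, and repeating the same write changes nothing, so no further flag write to the storage host is needed in that case.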
S300. The expired flag bits are selected, the first hash values corresponding to those flag bits are updated, and the second hash value of the whole file is then calculated according to the first hash values of the data segments.
S301. The expired flag bits are selected, and new first hash values are calculated for them.
S302. It is judged whether any flag bit was set to expired while the new first hash values were being calculated; if so, step S301 is performed; otherwise, the new first hash values are written to the storage host.
S303. The second hash value of the whole file is calculated according to the first hash values of the data segments.
Step S302 is specifically:
The flag bit is initialized to 0 in the memory of the control host, and it is judged whether any flag bit was set to 1 while the new first hash value was being calculated; if so, step S301 is performed; otherwise, the new first hash value is written to the storage host.
In this embodiment, when the first hash values of data segments need to be calculated, it is first determined which data segments have their expired flag bit set, and the first hash values of those data segments are then updated. For data segments whose flag bit is not set, the first hash value is guaranteed to be current and does not need to be updated. The hash value of the whole file is then calculated from the first hash values of all data segments.
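A rough sketch of this update-then-combine step is shown below. The patent does not state how the second hash value is derived from the first hash values, so the combination used here (crc32 over the packed little-endian segment hashes) is an assumption, as are the function and parameter names. The sketch also ignores writes that arrive while hashing is in progress.

```python
import struct
import zlib


def refresh_and_combine(read_segment, hashes, flags):
    """Update expired first hash values, then derive a whole-file hash.

    read_segment(i) returns the bytes of data segment i (hypothetical
    callback). Only segments whose flag is set are read.
    """
    for i, expired in enumerate(flags):
        if expired:
            hashes[i] = zlib.crc32(read_segment(i))  # new first hash value
            flags[i] = 0
    # Assumed combination rule: crc32 over all first hash values,
    # packed as unsigned 32-bit little-endian integers.
    packed = struct.pack(f"<{len(hashes)}I", *hashes)
    return zlib.crc32(packed)
```

Because unflagged segments are never read, the cost of a check scales with the number of modified segments rather than with the file size.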
Before a new first hash value is written to the storage host, it must first be determined whether any write request modified the data segment while its first hash value was being calculated.
Specifically, the flag bit is first set to 0 in memory without updating the storage host, and the first hash value is then calculated. The flag bit in memory is then checked again. If it has been set, a write request modified the data segment while the new first hash value was being calculated, so that first hash value is still expired and does not need to be written to the storage host.
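The clear-compute-recheck sequence can be sketched as follows. This is a single-threaded illustration under stated assumptions: the function and callback names are hypothetical, and a real system would need the flag accesses to be synchronized with the write path, which the patent text does not detail.

```python
import zlib


def try_refresh_segment(index, read_segment, flags_mem, persist_hash):
    """Recompute one first hash value, discarding it if a write raced in.

    flags_mem is the in-memory flag array; persist_hash(index, value)
    stands in for writing the hash to the storage host (hypothetical).
    Returns True if the new hash was persisted, False if it is stale.
    """
    flags_mem[index] = 0                        # clear in memory only
    new_hash = zlib.crc32(read_segment(index))  # may take a long time
    if flags_mem[index] == 1:
        # A write request touched the segment during hashing, so the
        # computed value is already expired and is not written out.
        return False
    persist_hash(index, new_hash)
    return True
```

When the function returns False, the flag remains set, so the segment is picked up again on the next pass through the expired flag bits.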
Further, the first hash values and the flag bits are stored in additional newly created files.
In addition, based on the method described above for rapidly checking the consistency of multiple copies in distributed storage, the present invention also provides a storage architecture that uses a control host-storage host processing architecture.
A virtual disk is built in the control host, and the control host manages the life cycle of the virtual disk and performs data reception, caching, and forwarding.
The storage host is composed of multiple storage media and is used for storing redundant data. In the distributed storage system it is where data is ultimately stored: storage resources are abstracted into multiple storage components, and each component consists of a chain of large sparse files.
A computer program is stored in the storage architecture, and when executed by the control host the computer program implements the steps of any of the methods described above for rapidly checking the consistency of multiple copies in distributed storage.
In summary, the invention discloses a method and storage architecture for rapidly checking the consistency of multiple copies in distributed storage, using a control host-storage host processing architecture. The method includes: dividing the stored file evenly into several data segments in advance, each data segment being provided with its own corresponding first hash value and with a flag bit indicating whether the corresponding first hash value is expired; when a write request is received, calculating the corresponding flag bits according to the offset and length of the write request and setting those flag bits to expired; and selecting the expired flag bits, updating the first hash values corresponding to them, and then calculating the second hash value of the whole file according to the first hash values of the data segments. A large file is divided into several data segments, the hash value of the file is computed segment by segment, and the hash value of the whole file is then computed from the hash values of the data segments. With this method it is only necessary to record which data segments have been modified and then update the hash values of those data segments. This avoids reading the data of the whole file during a consistency check, which greatly improves consistency check speed and reduces storage host bandwidth consumption. Because hash values are computed per data segment, the computation is also easier to run concurrently when the system is idle, which greatly accelerates data verification.
It should be understood that the application of the present invention is not limited to the above examples. Those of ordinary skill in the art can make improvements or changes based on the above description, and all such improvements and changes shall fall within the scope of protection of the appended claims of the present invention.

Claims (8)

1. A method for rapidly checking the consistency of multiple copies in distributed storage, the distributed storage using a control host-storage host processing architecture, characterized in that the method comprises the steps of:
A. dividing the stored file evenly into several data segments in advance, each data segment being provided with its own corresponding first hash value and with a flag bit indicating whether the corresponding first hash value is expired;
B. when a write request is received, calculating the corresponding flag bits according to the offset and length of the write request, and setting those flag bits to expired;
C. selecting the expired flag bits, updating the first hash values corresponding to those flag bits, and then calculating a second hash value of the whole file according to the first hash values of the data segments.
2. The method according to claim 1, characterized in that the first hash values and the flag bits are stored in additional newly created files.
3. The method according to claim 1, characterized in that during initialization the first hash values and the flag bits are all set to 0, and the first hash value corresponding to a data segment that has never been written is set to 0.
4. The method according to claim 3, characterized in that step A specifically comprises:
A1. dividing the stored file into several data segments in advance, each data segment being 4M in size, and performing initialization;
A2. providing each data segment with its own corresponding first hash value and with a flag bit indicating whether the corresponding first hash value is expired.
5. The method according to claim 4, characterized in that step B specifically comprises:
B1. when a write request is received, calculating the corresponding flag bit according to the offset and length of the write request;
B2. setting the flag bit from 0 to 1, marking it as expired.
6. The method according to claim 5, characterized in that step C specifically comprises:
C1. selecting the expired flag bits and calculating new first hash values for them;
C2. judging whether any flag bit was set to expired while the new first hash values were being calculated; if so, performing step C1; otherwise, writing the new first hash values to the storage host;
C3. calculating the second hash value of the whole file according to the first hash values of the data segments.
7. The method according to claim 6, characterized in that step C2 is specifically: initializing the flag bit to 0 in the memory of the control host, and judging whether any flag bit was set to 1 while the new first hash value was being calculated; if so, performing step C1; otherwise, writing the new first hash value to the storage host.
8. A storage architecture, characterized in that the storage architecture uses a control host-storage host processing architecture;
a virtual disk is built in the control host, and the control host manages the life cycle of the virtual disk and performs data reception, caching, and forwarding;
the storage host is composed of multiple storage media and is used for storing redundant data;
a computer program is stored in the storage architecture, and when executed by the control host the computer program implements the steps of the method for rapidly checking the consistency of multiple copies in distributed storage according to any one of claims 1-7.
CN201710748653.1A 2017-08-28 2017-08-28 Method for rapidly checking consistency of distributed storage multi-copy and storage structure Expired - Fee Related CN107632781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710748653.1A CN107632781B (en) 2017-08-28 2017-08-28 Method for rapidly checking consistency of distributed storage multi-copy and storage structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710748653.1A CN107632781B (en) 2017-08-28 2017-08-28 Method for rapidly checking consistency of distributed storage multi-copy and storage structure

Publications (2)

Publication Number Publication Date
CN107632781A 2018-01-26
CN107632781B 2020-05-05

Family

ID=61100574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710748653.1A Expired - Fee Related CN107632781B (en) 2017-08-28 2017-08-28 Method for rapidly checking consistency of distributed storage multi-copy and storage structure

Country Status (1)

Country Link
CN (1) CN107632781B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6970987B1 (en) * 2003-01-27 2005-11-29 Hewlett-Packard Development Company, L.P. Method for storing data in a geographically-diverse data-storing system providing cross-site redundancy
CN103546580A (en) * 2013-11-08 2014-01-29 北京邮电大学 File copy asynchronous writing method applied to distributed file system
CN104731792A (en) * 2013-12-19 2015-06-24 中国银联股份有限公司 Method and system for verifying database consistency and method and system for positioning data difference
CN103761162A (en) * 2014-01-11 2014-04-30 深圳清华大学研究院 Data backup method of distributed file system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271399A (en) * 2018-11-19 2019-01-25 武汉达梦数据库有限公司 A kind of method of calibration of database write-in log consistency
CN111382463A (en) * 2020-04-02 2020-07-07 中国工商银行股份有限公司 Block chain system and method based on stream data
CN112559547A (en) * 2020-12-24 2021-03-26 北京百度网讯科技有限公司 Method and device for determining consistency among multiple storage object copies
CN112559547B (en) * 2020-12-24 2023-09-19 北京百度网讯科技有限公司 Method and device for determining consistency among multiple storage object copies
CN113779558A (en) * 2021-09-10 2021-12-10 中国电信集团系统集成有限责任公司 Construction method, installation method and device of application program installation package

Also Published As

Publication number Publication date
CN107632781B (en) 2020-05-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200512

Address after: 812, block B, phase I, Tianan Innovation Technology Plaza, No. 25, Tairan 4th Road, Tianan community, Shatou street, Futian District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen Lianyungang Technology Co.,Ltd.

Address before: 518000, A902, room nine, building A, building 006, Industrial Research Institute, Nanshan New South Road, Nanshan District, Shenzhen, Guangdong

Patentee before: CLOUDSOAR NETWORKS Inc.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200505

Termination date: 20210828