CN105183399A - Data writing and reading method and device based on elastic block storage - Google Patents

Publication number
CN105183399A
Authority
CN
China
Prior art keywords
data
cryptographic hash
module
virtual block
storage medium
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510639347.5A
Other languages
Chinese (zh)
Inventor
刘俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201510639347.5A
Publication of CN105183399A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

An embodiment of the invention discloses a data writing and reading method and device based on elastic block storage. The method comprises: obtaining data to be processed from a virtual block device; segmenting the data to be processed and calculating the hash value of each data segment; deduplicating the data segments corresponding to the calculated hash values; writing the first data corresponding to the hash values remaining after deduplication into a storage medium; and, each time the amount of first data in the storage medium reaches a second preset value, writing that first data into a storage server through a service process. Because the data in the virtual block device is segmented, a hash value is calculated for each segment, the data is deduplicated according to the hash values, and the remaining data is stored together with its hash values, logical addresses and physical addresses and accessed through the hash values, redundant data is removed, storage space is saved, and data lookup efficiency is improved.

Description

A data writing and reading method and device based on elastic block storage
Technical field
The present invention relates to the field of data storage, and in particular to a data writing and reading method and device based on elastic block storage.
Background
Elastic block storage is a raw block storage volume device that is based on a storage network, has elastically expandable capacity, and is managed and used by cloud hosts. It exists independently of the life cycle of any host and can be attached to any running host to provide that host with a persistent block storage service. It can also be detached at any time and mounted on another host, and it supports cloning and snapshots of the disk. As a result, data security is high, data loss caused by disk failure almost never occurs, and the storage offers on-demand capacity expansion, low cost, stability and reliability.
At present, when data in elastic block storage is written to a storage server, every data segment is stored together with its logical address and physical address. Data segments with identical content therefore exist in the storage server; that is, the data is redundant, which wastes storage space.
Summary of the invention
The object of the embodiments of the present invention is to provide a data writing and reading method and device based on elastic block storage, so as to remove redundant data and avoid wasting storage space.
To achieve the above object, an embodiment of the present invention discloses a data writing method based on elastic block storage. At least one virtual block device is created in advance; the virtual block device is elastic block storage and stores data to be processed. The method comprises:
obtaining data to be processed from the at least one virtual block device;
segmenting the data to be processed according to a first preset value, and calculating the hash value of each data segment;
deduplicating the data segments corresponding to the calculated hash values, so that the data segment corresponding to each calculated hash value exists only once;
writing the first data corresponding to each hash value remaining after deduplication into a preset storage medium, where the first data comprises metadata and segment data, and the metadata comprises the hash value, the logical address and physical address of the data segment corresponding to that hash value, and the correspondence between them; and determining each hash value remaining after deduplication as the index of the data segment corresponding to that hash value;
each time the amount of first data written into the storage medium reaches a second preset value, notifying a service process so that the service process writes the first data in the storage medium corresponding to that amount of data into a storage server.
Preferably, the storage medium may be a solid-state drive.
Notifying the service process, so that it writes the first data in the storage medium corresponding to each amount of data that reaches the second preset value into the storage server, may comprise:
writing, into a preset file, the list data corresponding to the first data that is stored in the storage medium and corresponds to each amount of data that reaches the second preset value;
when list data exists in the preset file, invoking the service process so that it writes the data corresponding to the list data into the storage server, where the service process is in a sleep state while the preset file contains no list data.
Preferably, described from least one Virtual Block Device described, before obtaining pending data, can also comprise:
Create the queue with different priorities, wherein, the corresponding queue of each Virtual Block Device;
Described the first corresponding for remaining each cryptographic hash after duplicate removal data are written in the storage medium pre-set, can comprise:
Write request corresponding in the storage medium pre-set described in each described first data being written to, is stored into and obtains in queue corresponding to Virtual Block Device corresponding to corresponding pending data;
According to queue priority order from high to low, successively described first data corresponding for the write request stored in queue are written in the storage medium pre-set.
Preferably, described from least one Virtual Block Device described, before obtaining pending data, can also comprise:
By at least one Virtual Block Device be pre-created, be divided into M Virtual Block Device group, in each Virtual Block Device group, comprise different Virtual Block Device;
Described the first corresponding for remaining each cryptographic hash after duplicate removal data are written in the storage medium pre-set before, can also comprise:
For each Virtual Block Device group, the segment data corresponding to the remaining cryptographic hash after the duplicate removal in this Virtual Block Device group carries out duplicate removal again, uniquely exists to make segment data corresponding to each cryptographic hash in each Virtual Block Device group.
Preferably, the writing mode used to write the first data corresponding to each hash value remaining after deduplication into the preset storage medium, and/or the writing mode used by the service process to write the first data in the storage medium corresponding to each amount of data that reaches the second preset value into the storage server, may comprise:
a content-based storage writing mode.
An embodiment of the present invention also discloses a data reading method based on elastic block storage, comprising:
receiving data read requests from users for different virtual block devices;
for each data read request, searching the storage medium for the physical address corresponding to the data read request, using the hash value of the data segment as the index;
if the physical address is found, feeding the segment data at that physical address back to the user;
if it is not found, searching the storage server for the physical address corresponding to the data read request, using the hash value of the data segment as the index, and, if it is found there, feeding the segment data at that physical address back to the user.
The data read request may contain the logical address of the data to be read.
Searching the storage medium for the physical address corresponding to the data read request, using the hash value of the data segment as the index, may comprise:
determining, from the logical address of the data to be read contained in the data read request, the hash value of the data segment corresponding to the data read request;
searching the storage medium for the physical address corresponding to the data read request, using the determined hash value as the index.
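The two-level lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionaries logical_to_hash, medium_index and server_index, the function name read_segment, and the use of plain dictionaries to stand in for the storage medium and the storage server are all hypothetical and do not come from the patent text.

```python
from typing import Optional


def read_segment(
    logical_address: int,
    logical_to_hash: dict[int, str],   # assumed mapping: logical address -> hash value
    medium_index: dict[str, int],      # hash value -> physical address in the storage medium
    medium: dict[int, bytes],          # stand-in for the storage medium (cache)
    server_index: dict[str, int],      # hash value -> physical address in the storage server
    server: dict[int, bytes],          # stand-in for the storage server
) -> Optional[bytes]:
    """Resolve a read request: storage medium first, storage server second."""
    hash_value = logical_to_hash.get(logical_address)
    if hash_value is None:
        return None                    # no segment is known for this logical address

    physical = medium_index.get(hash_value)
    if physical is not None:
        return medium[physical]        # found in the storage medium

    physical = server_index.get(hash_value)
    if physical is not None:
        return server[physical]        # fall back to the storage server

    return None                        # not found in either place


logical_to_hash = {0: "h1", 4096: "h2", 8192: "h3"}
medium_index = {"h1": 0}
medium = {0: b"cached"}
server_index = {"h1": 10, "h2": 11}
server = {10: b"cached", 11: b"remote"}

hit = read_segment(0, logical_to_hash, medium_index, medium, server_index, server)
fallback = read_segment(4096, logical_to_hash, medium_index, medium, server_index, server)
missing = read_segment(8192, logical_to_hash, medium_index, medium, server_index, server)
```

In this sketch the hash value is the only key used against either store, which mirrors the text's use of the hash value as the index of a data segment.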
An embodiment of the present invention also discloses a data writing device based on elastic block storage. At least one virtual block device is created in advance; the virtual block device is elastic block storage and stores data to be processed. The device comprises an obtaining module, a segmentation module, a first deduplication module, a first storage module and a second storage module, where:
the obtaining module is configured to obtain data to be processed from the at least one virtual block device;
the segmentation module is configured to segment the data to be processed obtained by the obtaining module according to a first preset value, and to calculate the hash value of each data segment;
the first deduplication module is configured to deduplicate the data segments corresponding to the hash values calculated by the segmentation module, so that the data segment corresponding to each calculated hash value exists only once;
the first storage module is configured to write the first data corresponding to each hash value remaining after deduplication into a preset storage medium, where the first data comprises metadata and segment data, and the metadata comprises the hash value, the logical address and physical address of the data segment corresponding to that hash value, and the correspondence between them; and to determine each hash value remaining after deduplication as the index of the data segment corresponding to that hash value;
the second storage module is configured to notify a service process each time the amount of first data written into the storage medium reaches a second preset value, so that the service process writes the first data in the storage medium corresponding to that amount of data into a storage server.
Preferably, the storage medium may be a solid-state drive.
Preferably, the second storage module may comprise a writing submodule and a wake-up submodule, where:
the writing submodule is configured to write, into a preset file, the list data corresponding to the first data that is stored in the storage medium and corresponds to each amount of data that reaches the second preset value;
the wake-up submodule is configured to invoke the service process when list data exists in the preset file, so that the service process writes the data corresponding to the list data into the storage server, where the service process is in a sleep state while the preset file contains no list data.
Preferably, the data writing device based on elastic block storage may further comprise:
a queue creation module, configured to create queues with different priorities, where each virtual block device corresponds to one queue.
The first storage module may comprise a first storage submodule and a second storage submodule, where:
the first storage submodule is configured to store the write request for writing each piece of first data into the preset storage medium in the queue, created by the queue creation module, that corresponds to the virtual block device from which the corresponding data to be processed was obtained;
the second storage submodule is configured to write the first data corresponding to the write requests stored in the queues created by the queue creation module into the preset storage medium, queue by queue in descending order of queue priority.
Preferably, the data writing device based on elastic block storage may further comprise a grouping module and a second deduplication module, where:
the grouping module is configured to divide the at least one pre-created virtual block device into M virtual block device groups, each group containing different virtual block devices;
the second deduplication module is configured, for each virtual block device group, to deduplicate again the data segments corresponding to the hash values remaining after deduplication within that group, so that the data segment corresponding to each hash value exists only once within each virtual block device group.
Preferably, the writing mode used to write the first data corresponding to each hash value remaining after deduplication into the preset storage medium, and/or the writing mode used by the service process to write the first data in the storage medium corresponding to each amount of data that reaches the second preset value into the storage server, may comprise:
a content-based storage writing mode.
An embodiment of the present invention also discloses a data reading device based on elastic block storage, comprising a receiving module, a first search module, a first feedback module, a second search module and a second feedback module, where:
the receiving module is configured to receive data read requests from users for different virtual block devices;
the first search module is configured, for each data read request received by the receiving module, to search the storage medium for the physical address corresponding to the data read request, using the hash value of the data segment as the index;
the first feedback module is configured to feed the segment data at the found physical address back to the user when the first search module finds the physical address corresponding to the data read request in the storage medium;
the second search module is configured to search the storage server for the physical address corresponding to the data read request, using the hash value of the data segment as the index, when the first search module does not find that physical address in the storage medium;
the second feedback module is configured to feed the segment data at the found physical address back to the user when the second search module finds the physical address corresponding to the data read request in the storage server.
The data read request may contain the logical address of the data to be read.
The first search module may specifically be configured to:
determine, for each data read request received by the receiving module, the hash value of the data segment corresponding to the data read request from the logical address of the data to be read contained in the request; and search the storage medium for the physical address corresponding to the data read request, using the determined hash value as the index.
As can be seen from the above technical solutions, the embodiments of the present invention segment the data in a virtual block device, calculate the hash value of each data segment, deduplicate the data according to the hash values, store the deduplicated data together with the corresponding hash values, logical addresses and physical addresses, and access the data through the hash values. This removes redundant data, saves storage space, and also improves data lookup efficiency.
Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic flowchart of the data writing method based on elastic block storage provided by an embodiment of the present invention;
Fig. 2 is a second schematic flowchart of the data writing method based on elastic block storage provided by an embodiment of the present invention;
Fig. 3 is a third schematic flowchart of the data writing method based on elastic block storage provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a data reading method based on elastic block storage provided by an embodiment of the present invention;
Fig. 5 is a first schematic structural diagram of the data writing device based on elastic block storage provided by an embodiment of the present invention;
Fig. 6 is a second schematic structural diagram of the data writing device based on elastic block storage provided by an embodiment of the present invention;
Fig. 7 is a third schematic structural diagram of the data writing device based on elastic block storage provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a data reading device based on elastic block storage provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
To solve the problem in the prior art, the embodiments of the present invention provide a data writing and reading method and device based on elastic block storage. The data writing method based on elastic block storage is described in detail first.
At least one virtual block device is created in advance; the virtual block device is elastic block storage and stores data to be processed. Creating a virtual block device in advance is prior art and is not described further here.
Fig. 1 is a first schematic flowchart of the data writing method based on elastic block storage provided by an embodiment of the present invention. The method may comprise:
Step 101: obtain data to be processed from at least one virtual block device.
Here, the virtual block device is the pre-created virtual block device described above.
Step 102: segment the data to be processed according to the first preset value, and calculate the hash value of each data segment.
Step 103: deduplicate the data segments corresponding to the calculated hash values.
Specifically, suppose the data to be processed is 430K of TXT-format data. It can be segmented in units of 4K into 108 segments (107 full 4K segments plus a final 2K segment), and the hash value of each segment is calculated, giving 108 hash values. Some of these 108 hash values may be identical. Segments with identical hash values are deleted so that only one data segment remains for each distinct hash value, which saves storage space.
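The segmentation and deduplication in steps 102 and 103 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the text does not name a hash function, so SHA-256 is an assumption, and the names SEGMENT_SIZE, split_segments and deduplicate are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 4 * 1024  # the "first preset value"; 4K as in the example


def split_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split data into fixed-size segments; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(segments: list[bytes]) -> dict[str, bytes]:
    """Keep exactly one segment per hash value, in first-seen order."""
    unique: dict[str, bytes] = {}
    for segment in segments:
        # SHA-256 is an assumption; the source only says "hash value".
        hash_value = hashlib.sha256(segment).hexdigest()
        unique.setdefault(hash_value, segment)
    return unique


# 430K of deliberately repetitive sample data.
data = b"A" * (430 * 1024)
segments = split_segments(data)
unique = deduplicate(segments)
```

With this sample input, the 430K buffer yields 108 segments, and because every full segment is identical, only two distinct segments survive (the repeated 4K segment and the shorter final segment). Real data would deduplicate far less aggressively.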
Step 104: write the first data corresponding to each hash value remaining after deduplication into the preset storage medium, and determine each hash value remaining after deduplication as the index of the data segment corresponding to that hash value.
Here, the first data comprises metadata and segment data, and the metadata comprises the hash value, the logical address and physical address of the data segment corresponding to that hash value, and the correspondence between them.
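One possible in-memory layout for the first data is sketched below in Python. The layout is an assumption made for illustration: the text only says that the metadata holds the hash value, the logical and physical addresses and their correspondence, so the SegmentMetadata class, the list of logical addresses per hash value, the build_index function and the use of a bytearray as the storage medium are all hypothetical, as is the choice of SHA-256.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SegmentMetadata:
    hash_value: str                 # also serves as the index of the segment
    physical_address: int           # offset of the segment in the storage medium
    logical_addresses: list[int] = field(default_factory=list)


def build_index(data: bytes, segment_size: int) -> tuple[dict[str, SegmentMetadata], bytearray]:
    """Store each distinct segment once and record which logical addresses map to it."""
    index: dict[str, SegmentMetadata] = {}
    medium = bytearray()            # stand-in for the storage medium
    for logical in range(0, len(data), segment_size):
        segment = data[logical:logical + segment_size]
        hash_value = hashlib.sha256(segment).hexdigest()  # assumed hash function
        meta = index.get(hash_value)
        if meta is None:
            # First time this content is seen: append it to the medium.
            meta = SegmentMetadata(hash_value, physical_address=len(medium))
            index[hash_value] = meta
            medium += segment
        # A duplicate segment only adds another logical address.
        meta.logical_addresses.append(logical)
    return index, medium


index, medium = build_index(b"abcdabcdwxyz", segment_size=4)
abcd = index[hashlib.sha256(b"abcd").hexdigest()]
```

Here the two identical "abcd" segments share one physical copy, while both of their logical addresses remain resolvable through the metadata.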
In practical applications, the storage medium may be a solid-state drive (SSD). Because SSDs read and write quickly, using an SSD as a cache greatly improves the read and write performance of elastic block storage. The SSD may be flash-based or DRAM-based.
Step 105: each time the amount of first data written into the storage medium reaches the second preset value, notify the service process so that it writes the first data in the storage medium corresponding to that amount of data into the storage server.
Specifically, the second preset value may be 4M: each time the amount of data in the storage medium reaches 4M, the service process is notified to write the first data stored in the storage medium into the storage server.
The storage medium has limited storage space and serves only as a cache. The data stored in it may be cleared periodically, or the earliest portion of the data may be cleared when its space is full. The storage server can be regarded as having very large storage space, and all first data is eventually written to the storage server.
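The "clear the earliest data when the space is full" behaviour can be sketched as a first-in, first-out cache. This Python sketch rests on assumptions not stated in the text: the class name FifoCache, the byte-based capacity, and the premise that evicted segments have already been written to the storage server.

```python
from collections import OrderedDict


class FifoCache:
    """Bounded cache keyed by hash value; evicts the oldest entries first."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: OrderedDict[str, bytes] = OrderedDict()

    def put(self, hash_value: str, segment: bytes) -> None:
        if hash_value in self.entries:
            return                                   # already cached, nothing to do
        # Evict from the oldest end until the new segment fits.
        while self.entries and self.used + len(segment) > self.capacity:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        self.entries[hash_value] = segment
        self.used += len(segment)


cache = FifoCache(capacity_bytes=8)
cache.put("h1", b"aaaa")
cache.put("h2", b"bbbb")
cache.put("h3", b"cccc")   # the cache is full, so the earliest entry (h1) is evicted
```

A periodic cleanup, the other option the text mentions, could reuse the same structure by calling popitem(last=False) from a timer instead of from put.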
Specifically, in practical applications, the service process may be notified as follows. The list data corresponding to the first data that is stored in the storage medium and corresponds to each amount of data that reaches the second preset value is written into a preset file. When list data exists in the preset file, the service process is invoked so that it writes the data corresponding to the list data into the storage server. When the preset file contains no list data, the service process is in a sleep state and occupies no system resources; it is woken to process data only when list data appears in the preset file.
In practical applications, the preset file may be a socket file.
When the amount of data in the storage medium reaches 4M, the list data corresponding to that 4M of first data stored in the storage medium is written into the socket file. When list data exists in the socket file, the service process is woken and writes the data corresponding to the list data into the storage server; when the file contains no list data, the service process stays in the sleep state. In effect, the socket file sends the service process a task list, and the service process writes the data in the storage medium into the storage server according to that list.
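The notify-and-wake behaviour of step 105 can be sketched in Python as follows. This is an illustration under assumptions: a queue.Queue stands in for the socket file, a thread stands in for the service process, plain dictionaries stand in for the storage medium and storage server, the threshold is shrunk from 4M to 8 bytes, and the names WriteBuffer and service_process are hypothetical.

```python
import queue
import threading

FLUSH_THRESHOLD = 8  # the "second preset value"; 4M in the text, tiny here


def service_process(manifests: queue.Queue, medium: dict, server: dict) -> None:
    """Sleep until a task list arrives, then copy the listed segments to the server."""
    while True:
        manifest = manifests.get()       # blocks (sleeps) while no list data exists
        if manifest is None:             # sentinel, used only to end this demo
            break
        for hash_value in manifest:
            server[hash_value] = medium[hash_value]


class WriteBuffer:
    """Stand-in for the storage medium plus the threshold-based notification."""

    def __init__(self, manifests: queue.Queue, threshold: int = FLUSH_THRESHOLD) -> None:
        self.medium: dict[str, bytes] = {}
        self.pending: list[str] = []     # hash values not yet listed in a manifest
        self.pending_bytes = 0
        self.manifests = manifests
        self.threshold = threshold

    def write(self, hash_value: str, segment: bytes) -> None:
        self.medium[hash_value] = segment
        self.pending.append(hash_value)
        self.pending_bytes += len(segment)
        if self.pending_bytes >= self.threshold:
            # Threshold reached: hand the task list to the service process.
            self.manifests.put(self.pending)
            self.pending, self.pending_bytes = [], 0


manifests: queue.Queue = queue.Queue()
server: dict[str, bytes] = {}
buffer = WriteBuffer(manifests)
worker = threading.Thread(target=service_process, args=(manifests, buffer.medium, server))
worker.start()
for i in range(5):
    buffer.write(f"h{i}", b"data")       # 4 bytes each, so every second write flushes
manifests.put(None)
worker.join()
```

After the run, the first four segments have reached the server stand-in and the fifth is still waiting in the buffer for the next threshold crossing.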
Of course, the service process may also remain running at all times and write the data in the storage medium into the storage server in real time.
In practical applications, a content-based storage writing mode may be used to write the first data corresponding to each hash value remaining after deduplication into the preset storage medium, and/or by the service process to write the first data in the storage medium corresponding to each amount of data that reaches the second preset value into the storage server.
Content-based storage is a typical example of object-oriented storage. It uses the content itself as the basis for storage and offers good scalability, data compression, resource savings and assured data integrity.
In this embodiment, the data in a virtual block device is segmented, the hash value of each data segment is calculated, the data is deduplicated according to the hash values, the deduplicated data is stored together with the corresponding hash values, logical addresses and physical addresses, and the data is accessed through the hash values. This removes redundant data, saves storage space, and also improves data lookup efficiency.
Fig. 2 is a second schematic flowchart of the data writing method based on elastic block storage provided by an embodiment of the present invention. On the basis of the embodiment shown in Fig. 1, the embodiment shown in Fig. 2 adds step 106 before step 101: create queues with different priorities.
Here, each virtual block device corresponds to one queue.
Step 104 of this embodiment may comprise:
Step 1041: store the write request for writing each piece of first data into the preset storage medium in the queue corresponding to the virtual block device from which the corresponding data to be processed was obtained.
Step 1042: write the first data corresponding to the write requests stored in the queues into the preset storage medium, queue by queue in descending order of queue priority, and determine each hash value remaining after deduplication as the index of the data segment corresponding to that hash value.
Queues with different priorities are set up in the kernel driver, and each virtual block device corresponds to one queue. For example, suppose there are three virtual block devices X, Y and Z, and three queues A, B and C are set up in the kernel driver, with priority decreasing from A to C. Virtual block devices X and Y correspond to queue A, and virtual block device Z corresponds to queue B. Write requests that write data to the storage medium after the data to be processed in virtual block devices X and Y has been segmented, hashed and deduplicated are placed in queue A; the corresponding write requests for virtual block device Z are placed in queue B. The write requests in queue A are processed first, and the write requests in queue B are processed only after all of those in queue A have been handled.
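The queue arrangement in this example can be sketched in user-space Python as follows. The patent places the queues in a kernel driver, so this is only an illustration of the ordering logic: the names QUEUE_PRIORITY, DEVICE_QUEUE, enqueue and drain, and the string placeholders for write requests, are hypothetical.

```python
from collections import deque

# Lower number means higher priority. The device-to-queue mapping mirrors the example.
QUEUE_PRIORITY = {"A": 0, "B": 1, "C": 2}
DEVICE_QUEUE = {"X": "A", "Y": "A", "Z": "B"}

queues: dict[str, deque] = {name: deque() for name in QUEUE_PRIORITY}


def enqueue(device: str, request: str) -> None:
    """Place a write request in the queue assigned to its virtual block device."""
    queues[DEVICE_QUEUE[device]].append((device, request))


def drain() -> list[tuple[str, str]]:
    """Process every queue completely, from highest to lowest priority."""
    processed = []
    for name in sorted(queues, key=QUEUE_PRIORITY.get):
        pending = queues[name]
        while pending:
            processed.append(pending.popleft())
    return processed


enqueue("Z", "write-1")   # arrives first but sits in the lower-priority queue B
enqueue("X", "write-2")
enqueue("Y", "write-3")
processed = drain()
```

Although the request for device Z arrives first, it is handled last, because queue A is emptied before queue B is touched. Changing DEVICE_QUEUE is enough to reassign a device's priority.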
In this embodiment, queues with different priorities are created and the write requests for different virtual block devices are stored in queues of different priorities, so the priority of write requests for different virtual block devices can be configured flexibly.
Fig. 3 is a third schematic flowchart of the data writing method based on elastic block storage provided by an embodiment of the present invention. On the basis of the embodiment shown in Fig. 1, the embodiment shown in Fig. 3 adds step 107 before step 101: divide the at least one pre-created virtual block device into M virtual block device groups, where each group contains different virtual block devices.
Before step 104, step 108 is added: for each virtual block device group, deduplicate again the data segments corresponding to the hash values remaining after deduplication within that group.
In actual applications, can according to actual conditions, related multiple Virtual Block Device is divided into a Virtual Block Device group, then in Virtual Block Device duplicate removal basis on, again duplicate removal is carried out in group, delete the segment data that cryptographic hash is identical, uniquely exist to make segment data corresponding to cryptographic hash identical in group.Under guarantee does not cause the prerequisite of shortage of data, remove redundant data further, save taking of storage space.
Suppose there are four virtual block devices W, X, Y and Z. Devices W and X store data in TXT format, and devices Y and Z store data in RMVB format. Devices W and X can be placed in virtual block device group 1, and devices Y and Z in virtual block device group 2.
From virtual block device W, 400K of TXT-format data to be processed is obtained and split into 4K segments, giving 100 segments. A hash value is calculated for each segment, giving 100 hash values. Some of these 100 hash values may be identical; segments with duplicate hash values are deleted so that only one segment remains for each hash value. Suppose the segments corresponding to 60 hash values remain.
From virtual block device X, 600K of TXT-format data to be processed is obtained and split into 4K segments, giving 150 segments. A hash value is calculated for each segment, giving 150 hash values. Some of these 150 hash values may be identical; segments with duplicate hash values are deleted so that only one segment remains for each hash value. Suppose the segments corresponding to 80 hash values remain.
From virtual block device Y, 800K of RMVB-format data to be processed is obtained and split into 4K segments, giving 200 segments. A hash value is calculated for each segment, giving 200 hash values. Some of these 200 hash values may be identical; segments with duplicate hash values are deleted so that only one segment remains for each hash value. Suppose the segments corresponding to 100 hash values remain.
From virtual block device Z, 800K of RMVB-format data to be processed is obtained and split into 4K segments, giving 200 segments. A hash value is calculated for each segment, giving 200 hash values. Some of these 200 hash values may be identical; segments with duplicate hash values are deleted so that only one segment remains for each hash value. Suppose the segments corresponding to 100 hash values remain.
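The per-device step in this example (split into 4K segments, hash each segment, keep one segment per hash value) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and SHA-256 is an assumption, since the text does not name a hash algorithm.

```python
import hashlib

SEGMENT_SIZE = 4 * 1024  # the "first preset value"; 4K as in the example


def split_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Cut the data to be processed into fixed-size segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_segments(segments: list[bytes]) -> dict[str, bytes]:
    """Keep one segment per hash value; later duplicates are dropped."""
    unique: dict[str, bytes] = {}
    for seg in segments:
        digest = hashlib.sha256(seg).hexdigest()  # SHA-256 is an assumption
        unique.setdefault(digest, seg)
    return unique


# Synthetic stand-in for device W: 400K of data built from 60 distinct 4K segments.
distinct = [bytes([i]) * SEGMENT_SIZE for i in range(60)]
data_w = b"".join(distinct[i % 60] for i in range(100))

segments_w = split_segments(data_w)
unique_w = dedup_segments(segments_w)
print(len(segments_w), len(unique_w))  # 100 segments, 60 remain
```

The dictionary keyed by hash value doubles as the index that later embodiments use to locate a segment.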
On this basis, virtual block device group 1, which comprises devices W and X, holds the segments corresponding to the 60 hash values remaining in device W and the 80 hash values remaining in device X. Some of these 140 hash values may still be identical; segments with duplicate hash values are deleted so that each hash value corresponds to exactly one segment within the group, which further reduces storage usage.
Similarly, virtual block device group 2, which comprises devices Y and Z, holds the segments corresponding to the 100 hash values remaining in device Y and the 100 hash values remaining in device Z. Some of these 200 hash values may still be identical; segments with duplicate hash values are deleted so that each hash value corresponds to exactly one segment within the group, which further reduces storage usage.
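The group-level pass can be pictured as merging the per-device results of one group and again keeping one segment per hash value. The sketch below is illustrative only: `dedup_group` is a hypothetical name, the hash values are placeholder strings, and the overlap of 30 shared hash values between W and X is an invented figure, since the text does not say how many duplicates the group contains.

```python
def dedup_group(per_device: list[dict[str, bytes]]) -> dict[str, bytes]:
    """Merge the per-device results of one group, keeping one segment per hash value."""
    merged: dict[str, bytes] = {}
    for unique in per_device:
        for digest, segment in unique.items():
            merged.setdefault(digest, segment)
    return merged


# Placeholder hash values: W keeps 60 segments, X keeps 80, and 30 are
# assumed to be common to both devices (h30..h59).
unique_w = {f"h{i}": f"segment-{i}".encode() for i in range(60)}
unique_x = {f"h{i}": f"segment-{i}".encode() for i in range(30, 110)}

group_1 = dedup_group([unique_w, unique_x])
print(len(unique_w) + len(unique_x), len(group_1))  # 140 before, 110 after
```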
In this embodiment, the virtual block devices are divided into M virtual block device groups. After the segments of the data to be processed obtained from each virtual block device have been deduplicated by hash value, the segments corresponding to the remaining hash values are deduplicated again within each virtual block device group, which further reduces storage usage.
Fig. 4 is a schematic flowchart of a data reading method based on elastic block storage provided by an embodiment of the present invention. The method may comprise:
Step 201: receive data read requests from a user for different virtual block devices.
Step 202: for each data read request, using the hash value of the data segment as an index, determine whether the physical address corresponding to the request can be found in the storage medium; if so, perform step 204; if not, perform step 203.
Step 203: using the hash value of the data segment as an index, determine whether the physical address corresponding to the request can be found in the storage server; if so, perform step 204.
Step 204: feed back to the user the segment data at the physical address that was found.
The storage medium and the storage server hold only the first data corresponding to each hash value remaining after deduplication, which improves search efficiency.
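Steps 202 to 204 amount to a two-tier lookup: the storage medium is tried first, and the storage server only on a miss. The following sketch models each tier as a pair of in-memory dictionaries; the `Tier` class, `read_segment` and the sample addresses are hypothetical, and the behaviour on a miss in both tiers is left as `None` because the text does not specify it.

```python
from typing import Optional


class Tier:
    """One storage tier: a hash-value index plus the segments it holds."""

    def __init__(self) -> None:
        self.index: dict[str, int] = {}     # hash value -> physical address
        self.blocks: dict[int, bytes] = {}  # physical address -> segment data

    def put(self, digest: str, physical: int, segment: bytes) -> None:
        self.index[digest] = physical
        self.blocks[physical] = segment


def read_segment(digest: str, medium: Tier, server: Tier) -> Optional[bytes]:
    # Step 202 checks the storage medium; step 203 falls back to the server.
    for tier in (medium, server):
        physical = tier.index.get(digest)
        if physical is not None:
            return tier.blocks[physical]  # step 204: feed back the segment
    return None  # not found in either tier; not covered by the text


medium, server = Tier(), Tier()
medium.put("h1", 0, b"recent segment")
server.put("h2", 7, b"flushed segment")

print(read_segment("h1", medium, server))
print(read_segment("h2", medium, server))
print(read_segment("h3", medium, server))
```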
Specifically, the data read request may contain the logical address of the data to be read.
To search the storage medium for the physical address corresponding to the data read request using the hash value of the data segment as an index, the hash value of the data segment corresponding to the request can first be determined from the logical address of the data to be read contained in the request; the determined hash value is then used as an index to search the storage medium for the physical address corresponding to the request.
The first data stored in the storage medium and the storage server comprises metadata and segment data, where the metadata comprises the hash value, the logical address and physical address of the data segment corresponding to that hash value, and the correspondence between them. The logical address of the data to be read in the data read request is matched to the corresponding hash value in the storage medium or the storage server. With that hash value as an index, the physical address corresponding to the request can then be found in the storage medium or the storage server.
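One way to picture the metadata described here is as two mappings: logical address to hash value, and hash value to physical address. The sketch below assumes that layout; `MetadataIndex`, its method names and the sample addresses are hypothetical and are not taken from the patent.

```python
from typing import Optional


class MetadataIndex:
    """Metadata kept alongside the segment data (assumed two-map layout)."""

    def __init__(self) -> None:
        self.logical_to_hash: dict[int, str] = {}
        self.hash_to_physical: dict[str, int] = {}

    def record(self, logical: int, digest: str, physical: int) -> None:
        """Register a written segment; a duplicate hash keeps its first physical address."""
        self.logical_to_hash[logical] = digest
        self.hash_to_physical.setdefault(digest, physical)

    def resolve(self, logical: int) -> Optional[int]:
        """Logical address -> hash value -> physical address."""
        digest = self.logical_to_hash.get(logical)
        if digest is None:
            return None
        return self.hash_to_physical.get(digest)


index = MetadataIndex()
index.record(logical=0, digest="h1", physical=100)
index.record(logical=4096, digest="h1", physical=200)  # duplicate content
index.record(logical=8192, digest="h2", physical=300)

print(index.resolve(4096))  # resolves to the single stored copy
```

Under this layout, two logical addresses holding identical content resolve to the same physical address, which is what lets a deduplicated segment serve several reads.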
In this embodiment, the data in the virtual block device is segmented, a hash value is calculated for each segment, and the data is deduplicated according to the hash values. The deduplicated data is then stored together with its hash values, logical addresses and physical addresses, and the data is accessed through the hash values. This removes redundant data, reduces storage usage, and improves data search efficiency.
Corresponding to the method embodiments above, embodiments of the present invention also provide data writing and reading devices based on elastic block storage.
For the data writing device based on elastic block storage, at least one virtual block device is created in advance; the virtual block device is elastic block storage and stores the data to be processed.
Fig. 5 is a first schematic structural diagram of the data writing device based on elastic block storage provided by an embodiment of the present invention. The device may comprise an obtaining module 301, a segmentation module 302, a first deduplication module 303, a first storage module 304 and a second storage module 305, wherein:
The obtaining module 301 is configured to obtain data to be processed from the at least one virtual block device.
The segmentation module 302 is configured to segment the data to be processed obtained by the obtaining module 301 according to a first preset value, and to calculate a hash value for each segment.
The first deduplication module 303 is configured to deduplicate the segment data corresponding to the hash values calculated by the segmentation module 302, so that the segment data corresponding to each calculated hash value exists only once.
The first storage module 304 is configured to write the first data corresponding to each hash value remaining after deduplication to a preset storage medium, the first data comprising metadata and segment data, where the metadata comprises the hash value, the logical address and physical address of the data segment corresponding to that hash value, and the correspondence between them; and to designate each hash value remaining after deduplication as the index of the data segment corresponding to that hash value.
The second storage module 305 is configured to, each time the amount of first data written to the storage medium reaches a second preset value, notify a service process so that the service process writes the first data in the storage medium corresponding to that amount of data to a storage server.
The storage medium may be a solid-state drive.
The second storage module 305 may comprise a writing submodule and a wake-up submodule (not shown in the figure), wherein:
The writing submodule is configured to write, to a preset file, the list data corresponding to the first data stored in the storage medium each time its amount reaches the second preset value.
The wake-up submodule is configured to invoke the service process when list data is present in the preset file, so that the service process writes the data corresponding to the list data to the storage server; when the preset file contains no list data, the service process is dormant.
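The interplay between the writing submodule and the wake-up submodule resembles a producer and a sleeping consumer. The sketch below is a loose, in-memory analogy rather than the described design: a `queue.Queue` stands in for the preset file, a thread stands in for the service process, the threshold is counted in segments rather than bytes, and all names are hypothetical.

```python
import queue
import threading

FLUSH_THRESHOLD = 3  # the "second preset value", counted in segments for brevity


class WriteBuffer:
    def __init__(self, pending: queue.Queue) -> None:
        self.medium: list[bytes] = []  # stand-in for the storage medium (e.g. an SSD)
        self.pending = pending         # stand-in for the preset file of list data
        self.batch_start = 0

    def write(self, segment: bytes) -> None:
        self.medium.append(segment)
        if len(self.medium) - self.batch_start >= FLUSH_THRESHOLD:
            # Record which segments form the batch; this wakes the service thread.
            self.pending.put((self.batch_start, len(self.medium)))
            self.batch_start = len(self.medium)


def service(buffer: WriteBuffer, pending: queue.Queue, server: list[bytes]) -> None:
    while True:
        entry = pending.get()  # blocks (dormant) while there is no list data
        if entry is None:      # sentinel used only to end this demo
            break
        start, end = entry
        server.extend(buffer.medium[start:end])  # copy the batch to the "storage server"


pending: queue.Queue = queue.Queue()
buffer = WriteBuffer(pending)
server: list[bytes] = []
worker = threading.Thread(target=service, args=(buffer, pending, server))
worker.start()

for i in range(7):
    buffer.write(bytes([i]))
pending.put(None)
worker.join()

print(len(server), len(buffer.medium) - buffer.batch_start)  # 6 flushed, 1 still buffered
```

Batching the flush this way keeps the write path on the fast medium and leaves the slower transfer to the server to a background worker that consumes no CPU while idle.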
In practice, a content-based storage writing mode can be used for writing the first data corresponding to each hash value remaining after deduplication to the preset storage medium, and/or for the service process writing to the storage server the first data in the storage medium corresponding to each amount of data that reaches the second preset value.
In this embodiment, the data in the virtual block device is segmented, a hash value is calculated for each segment, and the data is deduplicated according to the hash values. The deduplicated data is then stored together with its hash values, logical addresses and physical addresses, and the data is accessed through the hash values. This removes redundant data, reduces storage usage, and improves data search efficiency.
Fig. 6 is a second schematic structural diagram of the data writing device based on elastic block storage provided by an embodiment of the present invention. The embodiment shown in Fig. 6 builds on the embodiment shown in Fig. 5 by adding a queue creation module 306, configured to create queues with different priorities, where each virtual block device corresponds to one queue.
The first storage module 304 may comprise a first storage submodule and a second storage submodule (not shown in the figure), wherein:
The first storage submodule is configured to store the write request for writing each piece of first data to the preset storage medium in the queue, created by the queue creation module 306, that corresponds to the virtual block device from which the corresponding data to be processed was obtained.
The second storage submodule is configured to write the first data corresponding to the write requests stored in the queues created by the queue creation module 306 to the preset storage medium in turn, in descending order of queue priority.
In this embodiment, queues with different priorities are created and the write requests for different virtual block devices are stored in queues of different priorities, so that the priority of write requests for different virtual block devices can be configured flexibly.
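A minimal sketch of the per-device priority queues follows. It assumes that a larger number means a higher priority and that each queue is drained completely before the next one is served; neither detail is stated in the text, and the class and method names are hypothetical.

```python
from collections import deque


class PriorityWriteQueues:
    def __init__(self, priorities: dict[str, int]) -> None:
        # One queue per virtual block device; larger number = higher priority (assumption).
        self.priorities = priorities
        self.queues: dict[str, deque] = {device: deque() for device in priorities}

    def submit(self, device: str, request: bytes) -> None:
        """Store a write request in the queue of the device it came from."""
        self.queues[device].append(request)

    def drain(self) -> list[tuple[str, bytes]]:
        """Return the queued requests in the order they would be written."""
        order = sorted(self.queues, key=lambda d: self.priorities[d], reverse=True)
        written = []
        for device in order:
            while self.queues[device]:
                written.append((device, self.queues[device].popleft()))
        return written


queues = PriorityWriteQueues({"W": 1, "X": 3, "Y": 2})
queues.submit("W", b"w1")
queues.submit("X", b"x1")
queues.submit("Y", b"y1")
queues.submit("X", b"x2")

order = queues.drain()
print(order)
```

Changing a device's entry in the `priorities` mapping is all it takes to reorder its writes relative to the other devices, which is the flexibility the embodiment refers to.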
Fig. 7 is a third schematic structural diagram of the data writing device based on elastic block storage provided by an embodiment of the present invention. The embodiment shown in Fig. 7 builds on the embodiment shown in Fig. 5 by adding a grouping module 307 and a second deduplication module 308, wherein:
The grouping module 307 is configured to divide the at least one virtual block device created in advance into M virtual block device groups, each group containing different virtual block devices.
The second deduplication module 308 is configured to, for each virtual block device group, deduplicate again the segment data corresponding to the hash values remaining after deduplication in that group, so that the segment data corresponding to each hash value exists only once within each virtual block device group.
In this embodiment, the virtual block devices are divided into M virtual block device groups. After the segments of the data to be processed obtained from each virtual block device have been deduplicated by hash value, the segments corresponding to the remaining hash values are deduplicated again within each virtual block device group, which further reduces storage usage.
Fig. 8 is a schematic structural diagram of a data reading device based on elastic block storage provided by an embodiment of the present invention. The device may comprise a receiving module 401, a first search module 402, a first feedback module 403, a second search module 404 and a second feedback module 405, wherein:
The receiving module 401 is configured to receive data read requests from a user for different virtual block devices.
The first search module 402 is configured to, for each data read request received by the receiving module 401, search the storage medium for the physical address corresponding to the request, using the hash value of the data segment as an index.
The first feedback module 403 is configured to, when the first search module 402 finds the physical address corresponding to the data read request in the storage medium, feed back to the user the segment data at the physical address that was found.
The second search module 404 is configured to, when the first search module 402 does not find the physical address corresponding to the data read request in the storage medium, search the storage server for that physical address, using the hash value of the data segment as an index.
The second feedback module 405 is configured to, when the second search module 404 finds the physical address corresponding to the data read request in the storage server, feed back to the user the segment data at the physical address that was found.
Specifically, the data read request may contain the logical address of the data to be read.
The first search module 402 may specifically be configured to:
For each data read request received by the receiving module 401, determine the hash value corresponding to the request from the logical address of the data to be read contained in the request, and then use the determined hash value as an index to search the storage medium for the physical address corresponding to the request.
In this embodiment, the data in the virtual block device is segmented, a hash value is calculated for each segment, and the data is deduplicated according to the hash values. The deduplicated data is then stored together with its hash values, logical addresses and physical addresses, and the data is accessed through the hash values. This removes redundant data, reduces storage usage, and improves data search efficiency.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the description of the method embodiments.
A person of ordinary skill in the art will understand that all or some of the steps of the method embodiments above can be carried out by hardware instructed by a program. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (16)

1. A data writing method based on elastic block storage, characterized in that at least one virtual block device is created in advance, the virtual block device being elastic block storage and storing data to be processed; the method comprises:
obtaining data to be processed from the at least one virtual block device;
segmenting the data to be processed according to a first preset value, and calculating a hash value for each segment;
deduplicating the segment data corresponding to the calculated hash values, so that the segment data corresponding to each calculated hash value exists only once;
writing the first data corresponding to each hash value remaining after deduplication to a preset storage medium, the first data comprising metadata and segment data, wherein the metadata comprises the hash value, the logical address and physical address of the data segment corresponding to that hash value, and the correspondence between them; and designating each hash value remaining after deduplication as the index of the data segment corresponding to that hash value;
each time the amount of first data written to the storage medium reaches a second preset value, notifying a service process so that the service process writes the first data in the storage medium corresponding to that amount of data to a storage server.
2. The method according to claim 1, characterized in that the storage medium is a solid-state drive.
3. The method according to claim 1, characterized in that notifying the service process so that the service process writes the first data in the storage medium corresponding to each amount of data that reaches the second preset value to the storage server comprises:
writing, to a preset file, the list data corresponding to the first data stored in the storage medium each time its amount reaches the second preset value;
when list data is present in the preset file, invoking the service process so that the service process writes the data corresponding to the list data to the storage server, wherein the service process is dormant when the preset file contains no list data.
4. The method according to claim 1, characterized in that, before obtaining the data to be processed from the at least one virtual block device, the method further comprises:
creating queues with different priorities, wherein each virtual block device corresponds to one queue;
and writing the first data corresponding to each hash value remaining after deduplication to the preset storage medium comprises:
storing the write request for writing each piece of first data to the preset storage medium in the queue corresponding to the virtual block device from which the corresponding data to be processed was obtained;
writing the first data corresponding to the write requests stored in the queues to the preset storage medium in turn, in descending order of queue priority.
5. The method according to claim 1, characterized in that, before obtaining the data to be processed from the at least one virtual block device, the method further comprises:
dividing the at least one virtual block device created in advance into M virtual block device groups, each virtual block device group containing different virtual block devices;
and, before writing the first data corresponding to each hash value remaining after deduplication to the preset storage medium, the method further comprises:
for each virtual block device group, deduplicating again the segment data corresponding to the hash values remaining after deduplication in that group, so that the segment data corresponding to each hash value exists only once within each virtual block device group.
6. The method according to any one of claims 1 to 5, characterized in that the writing mode used for writing the first data corresponding to each hash value remaining after deduplication to the preset storage medium, and/or for the service process writing to the storage server the first data in the storage medium corresponding to each amount of data that reaches the second preset value, comprises:
a content-based storage writing mode.
7. A data reading method based on elastic block storage, characterized by comprising:
receiving data read requests from a user for different virtual block devices;
for each data read request, searching a storage medium for the physical address corresponding to the request, using the hash value of the data segment as an index;
if the physical address is found, feeding back to the user the segment data at the physical address found;
if the physical address is not found, searching a storage server for the physical address corresponding to the request, using the hash value of the data segment as an index, and, if it is found, feeding back to the user the segment data at the physical address found.
8. The method according to claim 7, characterized in that the data read request contains the logical address of the data to be read;
and searching the storage medium for the physical address corresponding to the data read request, using the hash value of the data segment as an index, comprises:
determining the hash value of the data segment corresponding to the data read request from the logical address of the data to be read contained in the request;
using the determined hash value of the data segment as an index to search the storage medium for the physical address corresponding to the request.
9. A data writing device based on elastic block storage, characterized in that at least one virtual block device is created in advance, the virtual block device being elastic block storage and storing data to be processed; the device comprises an obtaining module, a segmentation module, a first deduplication module, a first storage module and a second storage module, wherein:
the obtaining module is configured to obtain data to be processed from the at least one virtual block device;
the segmentation module is configured to segment the data to be processed obtained by the obtaining module according to a first preset value, and to calculate a hash value for each segment;
the first deduplication module is configured to deduplicate the segment data corresponding to the hash values calculated by the segmentation module, so that the segment data corresponding to each calculated hash value exists only once;
the first storage module is configured to write the first data corresponding to each hash value remaining after deduplication to a preset storage medium, the first data comprising metadata and segment data, wherein the metadata comprises the hash value, the logical address and physical address of the data segment corresponding to that hash value, and the correspondence between them; and to designate each hash value remaining after deduplication as the index of the data segment corresponding to that hash value;
the second storage module is configured to, each time the amount of first data written to the storage medium reaches a second preset value, notify a service process so that the service process writes the first data in the storage medium corresponding to that amount of data to a storage server.
10. The device according to claim 9, characterized in that the storage medium is a solid-state drive.
11. The device according to claim 9, characterized in that the second storage module comprises a writing submodule and a wake-up submodule, wherein:
the writing submodule is configured to write, to a preset file, the list data corresponding to the first data stored in the storage medium each time its amount reaches the second preset value;
the wake-up submodule is configured to invoke the service process when list data is present in the preset file, so that the service process writes the data corresponding to the list data to the storage server, wherein the service process is dormant when the preset file contains no list data.
12. The device according to claim 9, characterized by further comprising:
a queue creation module, configured to create queues with different priorities, wherein each virtual block device corresponds to one queue;
the first storage module comprises a first storage submodule and a second storage submodule, wherein:
the first storage submodule is configured to store the write request for writing each piece of first data to the preset storage medium in the queue, created by the queue creation module, that corresponds to the virtual block device from which the corresponding data to be processed was obtained;
the second storage submodule is configured to write the first data corresponding to the write requests stored in the queues created by the queue creation module to the preset storage medium in turn, in descending order of queue priority.
13. The device according to claim 9, characterized by further comprising a grouping module and a second deduplication module, wherein:
the grouping module is configured to divide the at least one virtual block device created in advance into M virtual block device groups, each virtual block device group containing different virtual block devices;
the second deduplication module is configured to, for each virtual block device group, deduplicate again the segment data corresponding to the hash values remaining after deduplication in that group, so that the segment data corresponding to each hash value exists only once within each virtual block device group.
14. The device according to any one of claims 9 to 13, characterized in that the writing mode used for writing the first data corresponding to each hash value remaining after deduplication to the preset storage medium, and/or for the service process writing to the storage server the first data in the storage medium corresponding to each amount of data that reaches the second preset value, comprises:
a content-based storage writing mode.
15. A data reading device based on elastic block storage, characterized by comprising a receiving module, a first search module, a first feedback module, a second search module and a second feedback module, wherein:
the receiving module is configured to receive data read requests from a user for different virtual block devices;
the first search module is configured to, for each data read request received by the receiving module, search a storage medium for the physical address corresponding to the request, using the hash value of the data segment as an index;
the first feedback module is configured to, when the first search module finds the physical address corresponding to the data read request in the storage medium, feed back to the user the segment data at the physical address found;
the second search module is configured to, when the first search module does not find the physical address corresponding to the data read request in the storage medium, search a storage server for that physical address, using the hash value of the data segment as an index;
the second feedback module is configured to, when the second search module finds the physical address corresponding to the data read request in the storage server, feed back to the user the segment data at the physical address found.
16. The device according to claim 15, characterized in that the data read request contains the logical address of the data to be read;
the first search module is specifically configured to:
for each data read request received by the receiving module, determine the hash value of the data segment corresponding to the request from the logical address of the data to be read contained in the request, and use the determined hash value of the data segment as an index to search the storage medium for the physical address corresponding to the request.
CN201510639347.5A 2015-09-30 2015-09-30 Data writing and reading method and device based on elastic block storage Pending CN105183399A (en)

Publications (1)

Publication Number Publication Date
CN105183399A true CN105183399A (en) 2015-12-23

Family

ID=54905508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510639347.5A Pending CN105183399A (en) 2015-09-30 2015-09-30 Data writing and reading method and device based on elastic block storage

Country Status (1)

Country Link
CN (1) CN105183399A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106371764A (en) * 2016-08-23 2017-02-01 浪潮(北京)电子信息产业有限公司 Virtual block device-based data processing method and apparatus
CN106406759A (en) * 2016-09-13 2017-02-15 郑州云海信息技术有限公司 Data storage method and device
CN108021513A (en) * 2016-11-02 2018-05-11 杭州海康威视数字技术股份有限公司 A kind of date storage method and device
CN108427539A (en) * 2018-03-15 2018-08-21 深信服科技股份有限公司 Offline duplicate removal compression method, device and the readable storage medium storing program for executing of buffer memory device data
CN113867627A (en) * 2021-08-29 2021-12-31 苏州浪潮智能科技有限公司 Method and system for optimizing performance of storage system
CN114442961A (en) * 2022-02-07 2022-05-06 苏州浪潮智能科技有限公司 Data processing method and device, computer equipment and storage medium
CN115576956A (en) * 2022-12-07 2023-01-06 苏州浪潮智能科技有限公司 Data processing method, system, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1716215A (en) * 2004-06-30 2006-01-04 深圳市朗科科技有限公司 Method for reducing data redundance in storage medium
CN102323958A (en) * 2011-10-27 2012-01-18 上海文广互动电视有限公司 Data de-duplication method
CN102591668A (en) * 2011-01-05 2012-07-18 阿里巴巴集团控股有限公司 Device, method and system for updating elastic computing cloud system
WO2013083085A1 (en) * 2011-12-08 2013-06-13 中兴通讯股份有限公司 Data acquisition method and device
CN103873504A (en) * 2012-12-12 2014-06-18 鸿富锦精密工业(深圳)有限公司 System enabling data blocks to be stored in distributed server and method thereof

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106371764A (en) * 2016-08-23 2017-02-01 浪潮(北京)电子信息产业有限公司 Virtual block device-based data processing method and apparatus
CN106406759A (en) * 2016-09-13 2017-02-15 郑州云海信息技术有限公司 Data storage method and device
CN106406759B (en) * 2016-09-13 2019-12-31 苏州浪潮智能科技有限公司 Data storage method and device
CN108021513B (en) * 2016-11-02 2021-09-10 杭州海康威视数字技术股份有限公司 Data storage method and device
CN108021513A (en) * 2016-11-02 2018-05-11 杭州海康威视数字技术股份有限公司 Data storage method and device
CN108427539A (en) * 2018-03-15 2018-08-21 深信服科技股份有限公司 Offline de-duplication compression method and device for cache device data and readable storage medium
CN108427539B (en) * 2018-03-15 2021-06-04 深信服科技股份有限公司 Offline de-duplication compression method and device for cache device data and readable storage medium
CN113867627A (en) * 2021-08-29 2021-12-31 苏州浪潮智能科技有限公司 Method and system for optimizing performance of storage system
CN113867627B (en) * 2021-08-29 2023-08-22 苏州浪潮智能科技有限公司 Storage system performance optimization method and system
CN114442961A (en) * 2022-02-07 2022-05-06 苏州浪潮智能科技有限公司 Data processing method and device, computer equipment and storage medium
CN114442961B (en) * 2022-02-07 2023-08-08 苏州浪潮智能科技有限公司 Data processing method, device, computer equipment and storage medium
CN115576956A (en) * 2022-12-07 2023-01-06 苏州浪潮智能科技有限公司 Data processing method, system, equipment and storage medium
CN115576956B (en) * 2022-12-07 2023-03-10 苏州浪潮智能科技有限公司 Data processing method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105183399A (en) Data writing and reading method and device based on elastic block storage
JP5732536B2 (en) System, method and non-transitory computer-readable storage medium for scalable reference management in a deduplication-based storage system
US10747618B2 (en) Checkpointing of metadata into user data area of a content addressable storage system
EP3678015B1 (en) Metadata query method and device
US10013317B1 (en) Restoring a volume in a storage system
CN103116661B (en) Data processing method for a database
US20160132541A1 (en) Efficient implementations for mapreduce systems
CN109445702B (en) Block-level data deduplication storage system
CN108268344B (en) Data processing method and device
CN105117351A (en) Method and apparatus for writing data into cache
CN108282522B (en) Data storage access method and system based on dynamic routing
CN104239518A (en) Repeated data deleting method and device
WO2017020576A1 (en) Method and apparatus for file compaction in key-value storage system
CN107544869B (en) Data recovery method and device
US20170083406A1 (en) Method and apparatus for incremental backup
CN104735110A (en) Metadata management method and system
WO2015150978A1 (en) Scanning memory for de-duplication using rdma
CN102142032A (en) Method and system for reading and writing data of distributed file system
CN107798063B (en) Snapshot processing method and snapshot processing device
CN105095495A (en) Distributed file system cache management method and system
CN111241088A (en) Data writing method, data query method, device and equipment
CN111831691B (en) Data reading and writing method and device, electronic equipment and storage medium
CN105493080A (en) Method and apparatus for context aware based data de-duplication
US11086558B2 (en) Storage system with storage volume undelete functionality
CN104484132A (en) Data reduction method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151223
