CN106201771B - Data-storage system and data read-write method - Google Patents
- Publication number
- CN106201771B (application CN201510226830.0A)
- Authority
- CN
- China
- Prior art keywords
- bucket
- fingerprint
- deduplication node
- data block
- fingerprint information
- Legal status
- Active
Abstract
The invention discloses a data storage system comprising a central node and deduplication nodes. The central node assigns each Bucket to a corresponding deduplication node according to a preset strategy, creates a routing table from the correspondence between Buckets and deduplication nodes, and synchronizes the routing table to every deduplication node. Each deduplication node stores, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks that this fingerprint information represents. The system achieves global deduplicated storage management of raw data at the 100 PB scale and above and of fingerprint information at the 100 TB scale and above.
Description
Technical field
The invention belongs to the field of Internet technology, and specifically relates to a data storage method, a data storage system, a data read-write method, a client for reading and writing data, and a system for reading and writing data.
Background
In recent years, the volume of data that Internet companies need to back up and archive has grown explosively. For cost reasons, tape has long been the primary storage medium for backup and archiving systems and for virtual-machine primary storage systems. However, tape has demanding storage-environment requirements and a short service life: data generally has to be copied onto new tape every 4 to 5 years. Once the number of tapes reaches tens of thousands or even hundreds of thousands, this migration work becomes a nightmare.
As magnetic disks have developed, their capacity has reached 6 TB and even 8 TB, and their capacity-to-price ratio is steadily approaching that of tape. Unlike tape, disks support random access, which makes data deduplication practical. Combining disks with data deduplication can greatly reduce the cost of backup and archiving.
Existing commercial deduplication products, such as the EMC DD990, the HP StoreOnce B6200 and SEPATON DeltaStor software, are essentially single-machine designs with very limited scalability: at most 1.6 PB of usable capacity and at most 31 TB/h (8.8 GB/s) of throughput. In terms of both capacity and performance, they cannot meet the storage needs of Internet companies.
The Institute of Computing Technology of the Chinese Academy of Sciences proposed a "scalable distributed data deduplication system supporting mass data backup". To address the shortcomings of the single-machine mode in both system scalability and deduplication efficiency, it uses a distributed Bloom filter (bloomfilter) to route data to deduplication nodes in the distributed deduplication system, and proposes sampling-based fingerprint queries to speed up fingerprint lookup, implementing the distributed deduplication system 3D-deduper. This is hereinafter referred to as Scheme 1.
On the basis of its single-machine (single-node) mode, EMC also developed a clustered deduplication storage system (cluster deduplication storage system). It adds several backup servers that split the data stream into chunks, compute the fingerprint of each data block, pack the blocks into super chunks (super chunk), and route each super chunk to a deduplication node for processing according to a certain strategy. This is hereinafter referred to as Scheme 2.
Strictly speaking, neither scheme is a distributed system; both are cluster systems. The basic idea of a cluster system is to balance task load across multiple reliable single nodes. The basic idea of a distributed system is to distribute data across multiple unreliable single nodes (when the data is evenly distributed, task load balancing follows naturally) and to ensure reliability through means such as multiple replicas or check codes.
In both schemes, the fingerprint store of the cluster is decentralized. Although measures are taken to route a previously seen data block, as far as possible, to the deduplication node that handled that block and its fingerprint before, it is hard to avoid routing such a block to a deduplication node that has never processed it, where it is mistaken for a new data block and stored again. Scheme 1 goes further and uses a sampling-based sparse index of fingerprints, which raises the probability of misjudging a data block even more. Even a 2% misjudgment rate is unacceptable for a system at the 100 PB scale or larger.
Summary of the invention
In view of this, the present application provides a data storage method, a data storage system, a data read-write method, a client for reading and writing data, and a system for reading and writing data, to solve the technical problem that, in large-scale deduplication systems, decentralized fingerprint information leads to a high probability of misjudgment.
To solve the above technical problem, the present application discloses a data storage method applied to a data storage system that includes a central node and deduplication nodes. The method comprises: the central node assigns each bucket (Bucket) to a corresponding deduplication node according to a preset strategy; the central node creates a routing table according to the correspondence between Buckets and deduplication nodes, and synchronizes the routing table to each deduplication node; and each deduplication node stores, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks represented by that fingerprint information.
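The three steps above (assign Buckets, build the routing table, let each node derive its own Buckets from the synchronized table) can be sketched as follows. This is a minimal illustration under stated assumptions: round-robin placement stands in for the unspecified "preset strategy", the table layout `routes[bucket][copy_id]` is modeled on the routing table described for Fig. 2, and every name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    version: int
    # routes[bucket][copy_id] -> deduplication node name
    routes: dict = field(default_factory=dict)


def build_routing_table(num_buckets, nodes, copies=2, version=1):
    """Place each Bucket on `copies` distinct nodes.

    Round-robin placement is a hypothetical stand-in for the "preset strategy".
    """
    if copies > len(nodes):
        raise ValueError("need at least as many deduplication nodes as copies")
    routes = {b: [nodes[(b + c) % len(nodes)] for c in range(copies)]
              for b in range(num_buckets)}
    return RoutingTable(version, routes)


def local_buckets(table, node):
    """What a deduplication node derives from the synchronized table: the Buckets it stores."""
    return {b for b, owners in table.routes.items() if node in owners}


nodes = ["A", "B", "C", "D"]
table = build_routing_table(num_buckets=8, nodes=nodes)
# "Synchronizing" is modeled here as handing every node the same table object.
synced = {n: table for n in nodes}
```

With 8 Buckets, 4 nodes and 2 copies, node A ends up storing Buckets 0, 3, 4 and 7; a real central node would push a serialized copy of the table to each node rather than share an object.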
Storing, by the deduplication node according to the routing table, the fingerprint information corresponding to each assigned Bucket and the data blocks that the fingerprint information represents comprises: the deduplication node creates a corresponding container (Container) file for each Bucket assigned to it; the deduplication node saves the corresponding fingerprint information in each assigned Bucket, and saves the data blocks represented by that fingerprint information in the Container file corresponding to that Bucket.
The deduplication node determines whether the size of a Container file exceeds a preset threshold; when it does, the deduplication node archives the Container file to a background server.
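A minimal sketch of a per-Bucket Container file with the size-threshold archive step. Assumptions, all hypothetical: the file naming scheme, a second local directory standing in for the background server, and returning `(name, offset, size)` as the block's storage information (the description later names exactly these three fields).

```python
import os
import shutil
import tempfile


class Container:
    """Append-only Container file for one Bucket, archived once it exceeds a size threshold."""

    def __init__(self, local_dir, archive_dir, bucket, threshold):
        self.name = f"bucket-{bucket:06d}.ctr"     # hypothetical naming scheme
        self.local_path = os.path.join(local_dir, self.name)
        self.archive_path = os.path.join(archive_dir, self.name)  # stands in for the background server
        self.threshold = threshold
        self.archived = False

    def append(self, block):
        """Append one data block and return its storage information: (name, offset, size)."""
        path = self.archive_path if self.archived else self.local_path
        with open(path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(block)
        if not self.archived and os.path.getsize(self.local_path) > self.threshold:
            shutil.move(self.local_path, self.archive_path)
            self.archived = True
        return self.name, offset, len(block)


# Usage: a 10-byte threshold makes the archive step visible after the second block.
local_dir, archive_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
container = Container(local_dir, archive_dir, bucket=7, threshold=10)
```

Because offsets keep counting after the move, storage information recorded before archiving remains valid against the archived copy.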
Assigning, by the central node, each Bucket to a corresponding deduplication node according to the preset strategy comprises: the central node assigns each Bucket to multiple corresponding deduplication nodes, and among them determines one primary node and at least one standby node.
The central node determines whether each deduplication node is available, or whether a new deduplication node has been added. When it determines that a deduplication node is unavailable, or that a new deduplication node has been added, the central node reassigns the Buckets, updates the routing table and synchronizes it to each deduplication node, and the deduplication nodes migrate data according to the updated routing table.
Migrating data by the deduplication nodes according to the updated routing table comprises: the primary node initiates the data migration according to the updated routing table.
When a deduplication node is determined to be unavailable, the central node's reassignment of the Buckets comprises: when the primary node is determined to be unavailable, the central node selects a new primary node from the at least one standby node. Migrating data according to the updated routing table then comprises: the newly determined primary node initiates the data migration according to the updated routing table.
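The failover path above (promote a standby when the primary fails, pick a replacement node, let the primary initiate migration) can be sketched as below. This is an illustrative sketch, not the patented procedure: it assumes copy 0 in each routing-table entry is the primary, and "first live node that does not already hold the Bucket" is a hypothetical placement rule.

```python
def fail_over(routes, failed, live_nodes):
    """Reassign the Buckets of an unavailable node and list the migrations to run.

    routes: bucket -> [node for copy 0 (primary), copy 1, ...]  (mutated in place)
    Returns (bucket, initiating primary, target node) tuples. Picking the first live
    node that does not already hold the Bucket is a hypothetical placement rule.
    """
    tasks = []
    for bucket, owners in routes.items():
        if failed not in owners:
            continue
        survivors = [n for n in owners if n != failed]
        if not survivors:
            raise RuntimeError(f"bucket {bucket} has no surviving copy")
        # If the primary failed, the first standby moves into copy 0 and becomes primary.
        target = next((n for n in live_nodes if n not in owners), None)
        routes[bucket] = survivors + ([target] if target is not None else [])
        if target is not None:
            # The (possibly newly promoted) primary initiates copying the Bucket to the target.
            tasks.append((bucket, survivors[0], target))
    return tasks


routes = {0: ["D", "A"], 1: ["A", "B"], 2: ["B", "C"]}
tasks = fail_over(routes, failed="A", live_nodes=["B", "C", "D"])
```

Here node A was the primary of Bucket 1, so B is promoted and initiates the copy to C, while Bucket 0's primary D initiates the copy to B.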
Each deduplication node includes a fingerprint information store, which is a cuckoo hash map stored on a solid-state drive. It contains the fingerprint information corresponding to each Bucket of that deduplication node and the storage information of the data blocks that this fingerprint information represents.
M cuckoo hash maps run concurrently on the solid-state drive, each using N cuckoo hash functions, where M × N = 128.
For example, 32 cuckoo hash maps run concurrently on the solid-state drive, each using 4-way cuckoo hashing.
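An in-memory sketch of the fingerprint store's structure: M independent cuckoo hash maps, each using N hash functions (d-ary cuckoo hashing), with the defaults M = 32 and N = 4 so that M × N = 128. Assumptions: the maps live in RAM rather than on an SSD, the N hash functions are derived by salting BLAKE2b, the sharding rule (last fingerprint byte picks the map) is hypothetical, and a small stash absorbs the rare insert that exhausts the eviction budget. Splitting into independent maps is what would let them be served concurrently; this sketch does not add threading.

```python
import hashlib


class CuckooMap:
    """d-ary cuckoo hash map: each key has `ways` candidate slots, one per hash function."""

    def __init__(self, capacity, ways=4, max_kicks=500):
        self.slots = [None] * capacity   # each slot holds (key, value) or None
        self.ways = ways
        self.max_kicks = max_kicks
        self.stash = {}                  # overflow for the rare key that cannot be placed

    def _positions(self, key):
        # `ways` hash functions, derived by salting BLAKE2b with the function index.
        return [int.from_bytes(hashlib.blake2b(key, digest_size=8, salt=bytes([i]) * 16).digest(),
                               "big") % len(self.slots) for i in range(self.ways)]

    def get(self, key):
        for p in self._positions(key):
            entry = self.slots[p]
            if entry is not None and entry[0] == key:
                return entry[1]
        return self.stash.get(key)

    def put(self, key, value):
        positions = self._positions(key)
        for p in positions:                       # existing key: update in place
            if self.slots[p] is not None and self.slots[p][0] == key:
                self.slots[p] = (key, value)
                return
        if key in self.stash:
            self.stash[key] = value
            return
        for p in positions:                       # a free candidate slot
            if self.slots[p] is None:
                self.slots[p] = (key, value)
                return
        entry, pos = (key, value), positions[0]
        for _ in range(self.max_kicks):           # evict the occupant and re-home it
            entry, self.slots[pos] = self.slots[pos], entry
            alts = self._positions(entry[0])
            for p in alts:
                if self.slots[p] is None:
                    self.slots[p] = entry
                    return
            pos = alts[(alts.index(pos) + 1) % self.ways]
        self.stash[entry[0]] = entry[1]           # a real store would resize or rehash instead


class FingerprintStore:
    """M cuckoo maps with N ways each; the text pairs M x N = 128, e.g. 32 maps x 4 ways."""

    def __init__(self, maps=32, ways=4, capacity_per_map=1024):
        self.maps = [CuckooMap(capacity_per_map, ways) for _ in range(maps)]

    def _shard(self, fp):
        # Assumed sharding rule: the last fingerprint byte picks the map.
        return self.maps[fp[-1] % len(self.maps)]

    def put(self, fp, storage_info):
        self._shard(fp).put(fp, storage_info)

    def get(self, fp):
        return self._shard(fp).get(fp)
```

Four hash functions per key is the usual reason to prefer 4-way over 2-way cuckoo hashing: it tolerates much higher table occupancy before inserts start failing.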
To solve the above technical problem, the present application further discloses a data read-write method, comprising: splitting data into multiple data blocks and computing the fingerprint information of each data block; determining the Bucket corresponding to the fingerprint information of each data block; determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket; sending a fingerprint query request, which includes the fingerprint information of the data blocks, to the deduplication node corresponding to the Bucket; receiving the fingerprint information that was not found, returned by that deduplication node; and uploading the not-found fingerprint information and the data blocks it represents to the deduplication node corresponding to the Bucket.
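The client write path above can be sketched end to end with in-memory stand-ins. Assumptions, all hypothetical: fixed 4-byte chunks (the text does not fix a chunking scheme), SHA-1 as the fingerprint function, writes going to copy 0 of each routing-table entry, and `DedupNode` as a local object in place of a network service. The Bucket is the fingerprint modulo the total number of Buckets, as the method specifies.

```python
import hashlib


def split(data, size=4):
    """Fixed-size chunking with tiny blocks for illustration; the text does not fix a scheme."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def bucket_of(fp, total_buckets):
    """Fingerprint (read as a big-endian integer) modulo the total number of Buckets."""
    return int.from_bytes(fp, "big") % total_buckets


class DedupNode:
    """In-memory stand-in for a deduplication node's query and upload endpoints."""

    def __init__(self):
        self.blocks = {}

    def query(self, fps):
        return [fp for fp in fps if fp not in self.blocks]   # fingerprints not found

    def upload(self, items):
        self.blocks.update(items)


def write(data, routes, nodes, total_buckets):
    """Return the data's fingerprints in chunk order and how many blocks were uploaded."""
    blocks = split(data)
    fps = [hashlib.sha1(b).digest() for b in blocks]         # SHA-1 is an assumed fingerprint
    pending = {}                                             # node -> {fp: block}
    for fp, block in zip(fps, blocks):
        node = routes[bucket_of(fp, total_buckets)][0]       # copy 0 (primary) serves writes
        pending.setdefault(node, {})[fp] = block
    uploaded = 0
    for node, batch in pending.items():
        missing = nodes[node].query(list(batch))             # one fingerprint query per node
        nodes[node].upload({fp: batch[fp] for fp in missing})
        uploaded += len(missing)
    return fps, uploaded


routes = {0: ["A", "B"], 1: ["B", "A"]}
nodes = {"A": DedupNode(), "B": DedupNode()}
```

Writing `b"abcdabcdxyz!"` produces three fingerprints but uploads only two blocks, and writing the same data again uploads nothing, because every fingerprint is found on its node.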
Determining the Bucket corresponding to the fingerprint information of each data block comprises: computing the fingerprint information modulo the total number of Buckets, and determining the Bucket corresponding to the fingerprint information from the result of the modulo operation.
The method further comprises: when all of the not-found fingerprint information and the data blocks it represents have been uploaded, uploading the mapping file of the data to the deduplication nodes. The mapping file contains the fingerprint information of each data block of the data, arranged in the order in which the data was split into blocks.
Uploading the mapping file of the data to the deduplication nodes comprises: splitting the mapping file into multiple data blocks and computing the hash value of each of them; determining the Bucket corresponding to the hash value of each mapping-file data block; determining, according to the routing table, the deduplication node corresponding to that Bucket; and uploading each mapping-file data block and its hash value to the deduplication node corresponding to the Bucket of that hash value.
Splitting the mapping file into multiple data blocks comprises: making the header information of the mapping file the first of those data blocks. The header information of the mapping file includes information such as the total size of the mapping file and the total number of data blocks.
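A sketch of building a mapping file and splitting it with the header as block 0. Assumptions: fingerprints are concatenated as raw bytes, the header is JSON with the hypothetical field names `total_size` and `body_blocks` (here counting the blocks after the header), and SHA-1 stands in for the unspecified hash of each mapping-file block.

```python
import hashlib
import json


def build_mapping_file(fingerprints):
    """The data's block fingerprints, concatenated in chunking order."""
    return b"".join(fingerprints)


def split_mapping_file(mapping, block_size):
    """Split a mapping file into blocks, with the header as block 0.

    The JSON header and its field names are assumptions; the text only says the header
    carries information such as the total size and the number of blocks.
    """
    body = [mapping[i:i + block_size] for i in range(0, len(mapping), block_size)]
    header = json.dumps({"total_size": len(mapping), "body_blocks": len(body)}).encode()
    # Each block is uploaded with its hash; the hash modulo the Bucket count picks the Bucket.
    return [(hashlib.sha1(b).digest(), b) for b in [header] + body]
```

Each `(hash, block)` pair is then routed exactly like an ordinary data block: hash modulo the Bucket count, then the routing table.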
The method further comprises: obtaining the mapping file of the data from the deduplication nodes; obtaining each data block of the data from the deduplication nodes according to the fingerprint information of each data block in the mapping file; and concatenating the data blocks in the order of their fingerprint information in the mapping file to reconstruct the data.
Obtaining the mapping file of the data from the deduplication nodes comprises: obtaining each data block of the mapping file from the deduplication nodes according to the name of the mapping file and the data block serial numbers; and concatenating the data blocks of the mapping file into the mapping file of the data.
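The read path above can be sketched as follows, assuming the same hypothetical JSON header as in the mapping-file split (block 0 carries `total_size` and `body_blocks`) and fixed 20-byte fingerprints. `fetch_mapping_block` and `fetch_block` are hypothetical callables standing in for requests to the deduplication nodes.

```python
import json


def read_mapping_file(fetch_mapping_block):
    """Fetch block 0 (the header), then body blocks 1..n, and join them.

    `fetch_mapping_block(serial)` is a hypothetical call that asks the deduplication
    nodes for block `serial` of a mapping file identified by its name.
    """
    header = json.loads(fetch_mapping_block(0))
    mapping = b"".join(fetch_mapping_block(i) for i in range(1, header["body_blocks"] + 1))
    if len(mapping) != header["total_size"]:
        raise ValueError("mapping file is incomplete")
    return mapping


def restore(fetch_mapping_block, fetch_block, fp_len=20):
    """Rebuild the data: fetch each block by fingerprint, in mapping-file order."""
    mapping = read_mapping_file(fetch_mapping_block)
    fps = [mapping[i:i + fp_len] for i in range(0, len(mapping), fp_len)]
    return b"".join(fetch_block(fp) for fp in fps)
```

Reading the header first tells the client how many mapping-file blocks to request and lets it check the reassembled size before trusting the fingerprint list.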
Determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket comprises: when storing data for the first time, obtaining the routing table from the central node; and determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket.
Determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket further comprises: sending a request packet to the deduplication node corresponding to the Bucket; receiving the response packet returned by that deduplication node, the response packet including the version information of the routing table; determining whether the version information of the routing table in the response packet is the same as the version information of the routing table obtained from the central node; when they are the same, determining the deduplication node corresponding to the Bucket according to the routing table obtained from the central node; and when they differ, obtaining the updated routing table from the central node and redetermining the deduplication node corresponding to the Bucket according to the updated routing table.
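A minimal sketch of the client-side version check, assuming copy 0 of each routing-table entry is the node to contact. `ping` is a hypothetical stand-in for the request/response packet exchange that returns the routing-table version a deduplication node holds, and `CentralNode` is an in-memory stub.

```python
class CentralNode:
    def __init__(self, version, routes):
        self.version, self.routes = version, routes

    def get_routing_table(self):
        return self.version, dict(self.routes)


class Client:
    def __init__(self, central):
        self.central = central
        self.version, self.routes = None, None

    def _refresh(self):
        self.version, self.routes = self.central.get_routing_table()

    def node_for(self, bucket, ping):
        """Resolve a Bucket's node, refreshing the table when the node reports another version.

        `ping(node)` is a hypothetical request/response that returns the routing-table
        version the node holds.
        """
        if self.routes is None:            # first store: fetch the table from the central node
            self._refresh()
        node = self.routes[bucket][0]
        if ping(node) != self.version:     # stale table: fetch the updated one and re-resolve
            self._refresh()
            node = self.routes[bucket][0]
        return node


central = CentralNode(1, {0: ["A", "B"]})
client = Client(central)
```

The design keeps the central node off the hot path: the client only goes back to it on first use or when a deduplication node reports a different table version.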
To solve the above technical problem, the present application further discloses a data read-write method, comprising: the central node sends a routing table to the client, the routing table including the correspondence between Buckets and deduplication nodes; a deduplication node receives the client's fingerprint query request, which includes fingerprint information corresponding to the Buckets assigned to that deduplication node; the deduplication node looks up the fingerprint information and returns the fingerprint information that was not found to the client; and the deduplication node receives the not-found fingerprint information and the data blocks it represents, uploaded by the client.
The method further comprises: the deduplication node saves the not-found fingerprint information in the assigned Bucket and saves the data block in the Container file corresponding to the assigned Bucket, and then returns to the client a message indicating that the data block was saved successfully.
Before the deduplication node returns the success message to the client, the method further comprises: the deduplication node backs up the not-found fingerprint information and the data blocks it represents to the standby node.
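The server-side ordering above (save locally, back up to standby, only then acknowledge) is sketched below with in-memory stand-ins. Assumptions, all hypothetical: one standby list per node rather than per Bucket, a `bytearray` in place of each Container file, and the reply dictionary format.

```python
class DedupNode:
    def __init__(self, name, standbys=()):
        self.name = name
        self.standbys = list(standbys)   # simplified: one standby list for all of this node's Buckets
        self.fingerprints = {}           # bucket -> {fingerprint: (container, offset, size)}
        self.containers = {}             # bucket -> bytearray standing in for the Container file

    def store_local(self, bucket, fp, block):
        container = self.containers.setdefault(bucket, bytearray())
        info = (f"bucket-{bucket}", len(container), len(block))
        container += block
        self.fingerprints.setdefault(bucket, {})[fp] = info
        return info

    def handle_upload(self, bucket, fp, block):
        """Save the block, copy it to every standby, and only then report success."""
        self.store_local(bucket, fp, block)
        for standby in self.standbys:
            standby.store_local(bucket, fp, block)
        return {"status": "saved", "fingerprint": fp.hex()}


standby = DedupNode("B")
primary = DedupNode("A", standbys=[standby])
reply = primary.handle_upload(3, b"\x01" * 20, b"data")
```

Acknowledging only after the standby copy exists means a client never sees "saved" for a block that lives on a single node.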
The method further comprises: the deduplication node saves the mapping-file data blocks uploaded by the client and their hash values.
Saving the mapping-file data blocks uploaded by the client and their hash values comprises: saving each mapping-file data block in the Container file corresponding to the corresponding Bucket; and saving, in that Bucket, the hash value of the mapping-file data block and its first storage information.
The first storage information includes: the name of the Container file that holds the mapping-file data block, the offset of the mapping-file data block within that Container file, and the size of the mapping-file data block.
The method further comprises: the deduplication node receives the client's request for the data blocks of the mapping file; the deduplication node sends the data blocks of the mapping file to the client; the deduplication node receives the client's request for the data block represented by each piece of fingerprint information in the mapping file; and the deduplication node sends the data block represented by each piece of fingerprint information to the client.
Sending the data block represented by each piece of fingerprint information to the client comprises: the deduplication node determines the second storage information of the data block from the fingerprint information, the second storage information including the name of the Container file that holds the data block, the offset of the data block within the Container file, and the size of the data block; the deduplication node determines from the name of the Container file whether the Container file has been archived to the background server; when it has, the deduplication node retrieves the data block from the background server using the offset of the data block within the Container file and the size of the data block, and sends it to the client; and when the Container file is still stored locally, the deduplication node reads the data block locally using the same offset and size, and sends it to the client.
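A sketch of serving one block from its second storage information (Container name, offset, size). Assumptions: the archived check is a lookup in a set of archived Container names (the text only says the decision is made from the name), and `fetch_from_backend` is a hypothetical callable for the background server.

```python
import os
import tempfile


def read_block(fp, fingerprint_index, local_dir, archived_names, fetch_from_backend):
    """Serve one data block using its second storage information.

    fingerprint_index: fingerprint -> (Container file name, offset, size)
    archived_names: names of Container files already archived (an assumed bookkeeping set)
    fetch_from_backend(name, offset, size): hypothetical call to the background server
    """
    name, offset, size = fingerprint_index[fp]
    if name in archived_names:
        return fetch_from_backend(name, offset, size)
    with open(os.path.join(local_dir, name), "rb") as f:
        f.seek(offset)
        return f.read(size)


# Usage: one local Container file holding two 5-byte blocks.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "bucket-1.ctr"), "wb") as f:
    f.write(b"helloworld")
index = {b"x": ("bucket-1.ctr", 5, 5), b"y": ("bucket-2.ctr", 0, 3)}
```

Because offset and size are stored alongside the Container name, the same lookup works whether the bytes are local or archived; only the transport differs.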
The central node sending the routing table to the client comprises: when the client stores data for the first time, the central node receives the client's routing table request and sends the routing table to the client.
The central node sending the routing table to the client further comprises: the deduplication node receives a request packet from the client; the deduplication node sends the client a response packet that includes the version information of the routing table saved by the deduplication node; when the version information of the routing table saved by the client differs from that of the routing table saved by the deduplication node, the central node receives the client's routing table request and sends the updated routing table to the client.
The deduplication node looking up the fingerprint information and returning the not-found fingerprint information to the client comprises: the deduplication node uses a Bloom filter to determine whether the fingerprint information exists; when the Bloom filter indicates that it does not exist, the fingerprint information is determined to be not found; when the Bloom filter indicates that it exists, the fingerprint information is looked up in the fingerprint information store; when it is found there, the fingerprint information is determined to already exist; and when it is not found there, the fingerprint information is determined to be not found.
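The two-stage lookup above can be sketched with a small Bloom filter in front of an authoritative store. Assumptions: the filter uses double hashing over a SHA-256 digest (one common construction, not necessarily the one in the patent), and a plain dict stands in for the SSD-resident fingerprint store.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.bits = bytearray((num_bits + 7) // 8)
        self.num_bits, self.num_hashes = num_bits, num_hashes

    def _positions(self, key):
        # Double hashing: position_i = h1 + i * h2 (mod m), both taken from one SHA-256 digest.
        d = hashlib.sha256(key).digest()
        h1, h2 = int.from_bytes(d[:8], "big"), int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def not_found(fps, bloom, fingerprint_store):
    """Return the fingerprints the client must upload (the "not found" list)."""
    missing = []
    for fp in fps:
        if not bloom.might_contain(fp):
            missing.append(fp)          # definitely new: no store lookup needed
        elif fp not in fingerprint_store:
            missing.append(fp)          # Bloom false positive: the store is authoritative
    return missing
```

A Bloom filter never reports a false negative, so a "not present" answer can skip the SSD lookup entirely; only "maybe present" answers pay for a lookup in the fingerprint store.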
To solve the above technical problem, the present application further discloses a data storage system, comprising a central node and one or more deduplication nodes. The central node is configured to assign each bucket (Bucket) to a corresponding deduplication node according to a preset strategy, create a routing table according to the correspondence between Buckets and deduplication nodes, and synchronize the routing table to each deduplication node. Each deduplication node is configured to store, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks represented by that fingerprint information.
To solve the above technical problem, the present application further discloses a client for reading and writing data, comprising: a splitting and computing module, configured to split data into multiple data blocks and compute the fingerprint information of each data block; a Bucket determining module, configured to determine the Bucket corresponding to the fingerprint information of each data block; a node determining module, configured to determine, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket; a request sending module, configured to send a fingerprint query request, which includes the fingerprint information of the data blocks, to the deduplication node corresponding to the Bucket; an information receiving module, configured to receive the not-found fingerprint information returned by the deduplication node corresponding to the Bucket; and a data uploading module, configured to upload the not-found fingerprint information and the data blocks it represents to the deduplication node corresponding to the Bucket.
To solve the above technical problem, the present application further discloses a system for reading and writing data, comprising a central node and deduplication nodes. The central node is configured to send a routing table to a client, the routing table including the correspondence between Buckets and deduplication nodes. Each deduplication node is configured to receive the client's fingerprint query request, which includes fingerprint information corresponding to the Buckets assigned to that deduplication node; to look up the fingerprint information and return the not-found fingerprint information to the client; and to receive the not-found fingerprint information and the data blocks it represents, uploaded by the client.
Compared with the prior art, the present application can achieve the following technical effects: it achieves global deduplicated storage management of raw data at the 100 PB scale and above and of fingerprint information at the 100 TB scale and above, and it is highly scalable. After a new deduplication node is added to the system, the central node can redistribute data according to the preset strategy and the deduplication nodes complete the data migration automatically, so the performance and capacity of the system can be expanded easily.
Of course, a product implementing the present application does not necessarily need to achieve all of the above technical effects at the same time.
Brief description of the drawings
The drawings described here provide a further understanding of the present application and form part of it. The illustrative embodiments of the present application and their descriptions serve to explain the application and do not constitute an undue limitation on it. In the drawings:
Fig. 1 is a schematic structural diagram of a data storage system (system for reading and writing data) according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a routing table according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a data read-write method according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of a data read-write method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a client for reading and writing data according to an embodiment of the present application.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below with reference to the drawings and examples, so that the process by which the invention applies technical means to solve the technical problem and achieve the technical effects can be fully understood and implemented.
Fig. 1 shows the data storage system provided by an embodiment of the present application (hereinafter "the system"), which includes a central node 10 and deduplication nodes 11; the central node 10 is coupled to the deduplication nodes 11. In the system, the central node 10 is responsible for the distributed management of the multiple deduplication nodes 11 and for data distribution and replica management within the system. The deduplication nodes 11 manage and save the data blocks together with their fingerprint information and storage information, and perform data replication and migration under the distributed management of the central node 10. The deduplication node 11 has an abstract storage engine layer, so new storage engines can be added very easily.
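The description does not define the storage engine layer's interface. As an illustration only, an abstract engine could be a small abstract base class that the rest of the deduplication node codes against; the method names and the in-memory engine below are hypothetical.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical interface for the deduplication node's abstract storage-engine layer."""

    @abstractmethod
    def append(self, container: str, data: bytes) -> int:
        """Append data to a Container and return the offset it was written at."""

    @abstractmethod
    def read(self, container: str, offset: int, size: int) -> bytes:
        """Read `size` bytes starting at `offset` from a Container."""


class MemoryEngine(StorageEngine):
    """Simplest possible engine: Containers are in-memory byte arrays."""

    def __init__(self):
        self.files = {}

    def append(self, container, data):
        buf = self.files.setdefault(container, bytearray())
        offset = len(buf)
        buf += data
        return offset

    def read(self, container, offset, size):
        return bytes(self.files[container][offset:offset + size])
```

Adding a new engine (local files, an object store, and so on) would then mean writing one more subclass, without touching the fingerprint or routing logic.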
The system manages data blocks in units of buckets (Buckets). A Bucket is a logical concept in the system. Each Bucket is assigned a Bucket number, and a correspondence between Bucket numbers and the fingerprint information of each data block is established through a preset hash algorithm, so that data blocks are stored separately by Bucket number and a correspondence between data blocks and storage files is established. The central node of the system manages, globally and per Bucket, the data blocks saved in the system and their fingerprint information.
The central node assigns each Bucket to a corresponding deduplication node according to a preset strategy, which may be load balancing. For example, the central node obtains load data for each deduplication node, determines each node's current load from real-time changes in that data, and preferentially assigns Buckets to deduplication nodes with a lower current load, so that balanced data distribution yields balanced load across the deduplication nodes. The preset strategy may also be a security strategy. For example, the central node assigns different Buckets to different deduplication nodes according to the confidentiality level of the data or the permissions of the client, so that client data of different confidentiality levels or permissions is stored on different deduplication nodes.
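Both example strategies can be combined in one greedy placement sketch. Assumptions, all hypothetical: load is a single number per node, each placed Bucket copy adds one unit of load, ties break by node name, and an optional tier label restricts a Bucket to nodes of the same confidentiality tier.

```python
import heapq


def assign_buckets(buckets, node_load, node_tier=None, bucket_tier=None, copies=2):
    """Greedy load balancing: place each Bucket's copies on the currently least-loaded nodes.

    node_load: node -> current load (a hypothetical single-number load metric)
    node_tier / bucket_tier: optional security tiers; a Bucket only goes to nodes of its tier
    """
    routes = {}
    load = dict(node_load)
    for b in buckets:
        tier = bucket_tier.get(b) if bucket_tier else None
        eligible = [n for n in load if tier is None or node_tier[n] == tier]
        if len(eligible) < copies:
            raise ValueError(f"not enough eligible nodes for bucket {b}")
        chosen = heapq.nsmallest(copies, eligible, key=lambda n: (load[n], n))
        routes[b] = chosen               # copy 0 = primary, copies 1.. = standbys
        for n in chosen:
            load[n] += 1                 # assume one Bucket copy adds one unit of load
    return routes
```

A heavily loaded node receives no new Buckets until the others catch up, and a Bucket tagged with a tier never lands on a node of a different tier.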
When the central node assigns each Bucket to a corresponding deduplication node, it establishes the correspondence between Buckets and deduplication nodes from the Bucket numbers and the deduplication node identifiers, and creates the routing table from that correspondence. The routing table can be viewed as a mapping table that records the mapping between Buckets and deduplication nodes. Fig. 2 is an example routing table in an embodiment of the present application: the numbers in the horizontal header are Bucket numbers, the numbers in the vertical header are the copy identifiers of the Buckets, and the letters in the table denote different deduplication nodes. As shown in Fig. 2, for the Bucket numbered 0, copy 0 is assigned to deduplication node D and copy 1 to deduplication node A; for the Bucket numbered 1, copy 0 is assigned to deduplication node A and copy 1 to deduplication node B. Fig. 2 only illustrates the routing table in the embodiments of the present application and does not limit the scope of protection: any number of deduplication nodes may be deployed in the system, each deduplication node may be assigned multiple Buckets, and each Bucket may have one or more backups placed on different deduplication nodes.
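The part of Fig. 2 described above can be written down as data. This sketch encodes only the two Buckets spelled out in the text (the full figure may contain more), treats copy 0 as the primary copy as the description specifies, and uses hypothetical helper names.

```python
# Fig. 2 as data: ROUTES[bucket][copy_id] -> deduplication node (copy 0 is the primary copy).
ROUTES = {
    0: ["D", "A"],
    1: ["A", "B"],
}


def node_for(bucket, copy_id=0):
    """Look up the deduplication node holding a given copy of a Bucket."""
    return ROUTES[bucket][copy_id]


def roles(node):
    """For one node: the Buckets it is primary for and the Buckets it is standby for."""
    primary = sorted(b for b, owners in ROUTES.items() if owners[0] == node)
    standby = sorted(b for b, owners in ROUTES.items() if node in owners[1:])
    return primary, standby
```

For example, node A is the primary for Bucket 1 and a standby for Bucket 0, while node D is the primary for Bucket 0 only.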
After creating the routing table, the central node synchronizes it to each deduplication node. Each deduplication node in the system determines from the routing table which Buckets are assigned to it, and stores the fingerprint information corresponding to those Buckets and the data blocks that the fingerprint information represents. To save the fingerprint information of each Bucket and the data blocks it represents, the deduplication node creates a corresponding container (Container) file for each assigned Bucket, saves the corresponding fingerprint information in each Bucket, and saves the data blocks represented by that fingerprint information in the Container file corresponding to the Bucket. The correspondence between fingerprint information and Buckets is determined by computing the fingerprint information modulo the total number of Buckets in the system; the result gives the Bucket number corresponding to the fingerprint information. This computation is usually performed by the client that stores data in the system. As more fingerprint information maps to a Bucket, the data blocks saved in the corresponding Container file increase, and so does the storage space that the Container file occupies. To ensure that each deduplication node can store copies of multiple Buckets, and to control the load on the deduplication nodes, when the Container file corresponding to a Bucket exceeds a preset threshold, the deduplication node archives the Container file to the background server 12. As shown in Fig. 1, each deduplication node is coupled to the background server 12, and when the deduplication node later receives further data blocks for that Bucket, they are stored in the Container file located on the background server 12.
When the central node assigns each Bucket to deduplication nodes according to the preset strategy, it may assign each Bucket to multiple deduplication nodes, so that each Bucket has multiple copies in the system, and it gives each copy a different copy identifier. For example, in the routing table shown in Fig. 2, the central node assigns each Bucket to two deduplication nodes, and the Bucket's copies on those nodes have copy identifiers 0 and 1 respectively.
When the central node assigns a Bucket to multiple dedup nodes, it designates one of them as the primary node and at least one as a standby node. The central node can make this choice by replica ID: among each Bucket's replica IDs, one is the primary replica ID and the others are backup replica IDs. For example, the replica with ID 0 serves as each Bucket's primary replica, and replicas with other IDs are backup replicas. Across all dedup nodes, the node holding replica 0 of a Bucket is that Bucket's primary node, and the nodes holding its other replicas are its standby nodes. This arrangement prevents a single unavailable dedup node from halting data storage, reads and writes for the Buckets it hosts. Each replica of a Bucket includes the Bucket's fingerprints and the Container file that saves the data blocks those fingerprints represent.
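To make the replica scheme concrete, the following Python sketch models a routing table like the one in Fig. 2. It is hypothetical in several respects. The class and function names are illustrative. Replica ID 0 is taken as the primary replica, following the example above. Round-robin placement stands in for the "preset strategy", which the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Hypothetical model of the routing table in Fig. 2.

    replicas[bucket] lists dedup node names indexed by replica ID;
    replica ID 0 is the primary replica, the others are backup replicas.
    """
    version: int
    replicas: dict = field(default_factory=dict)

    def primary(self, bucket: int) -> str:
        # The node holding replica 0 is the Bucket's primary node.
        return self.replicas[bucket][0]

    def standbys(self, bucket: int) -> list:
        # Nodes holding any other replica are standby nodes.
        return self.replicas[bucket][1:]


def assign_round_robin(num_buckets: int, nodes: list, copies: int = 2) -> dict:
    """Assumed placement strategy: replica r of Bucket b goes to node (b + r) mod N."""
    if copies > len(nodes):
        raise ValueError("each replica of a Bucket needs a distinct node")
    return {b: [nodes[(b + r) % len(nodes)] for r in range(copies)]
            for b in range(num_buckets)}


table = RoutingTable(version=1, replicas=assign_round_robin(4, ["A", "B", "C"]))
print(table.primary(1), table.standbys(1))  # B ['C']
```

Because `copies` never exceeds the number of nodes, the offsets `(b + r) mod N` are distinct, so no node holds two replicas of the same Bucket.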
The central node uses the heartbeat messages exchanged with the dedup nodes to determine whether each dedup node is available and whether a new dedup node has been added to the system. When it finds that a dedup node is unavailable, or that a new dedup node has joined, the central node redistributes the Buckets. This changes the mapping between Buckets and dedup nodes inside the system. If a dedup node is unavailable, the central node reassigns that node's Buckets to other dedup nodes according to the preset strategy. If a new dedup node joins, the central node redistributes the Buckets in the system according to the preset strategy. In either case, the central node updates the routing table according to the changed Bucket-to-node mapping and synchronizes the updated routing table to every dedup node. Because the dedup nodes to which some Buckets are assigned have changed, the dedup nodes then migrate data according to the updated routing table.
The data migration is initiated by the primary node of each Bucket whose assigned dedup nodes have changed. For example, suppose that in the routing table, replica 0 (the primary replica) of Bucket 1 moves from dedup node A to dedup node B. Dedup node A then initiates the migration of replica 0 of Bucket 1 to dedup node B. Dedup node B next checks the routing table to see whether Bucket 1's other standby nodes have changed. If one has, for example from dedup node D to dedup node E, dedup node B backs up the data of Bucket 1 to dedup node E, and the replica of Bucket 1 on dedup node E carries a backup replica ID. After the migration completes, each dedup node uses the updated routing table to find local Bucket data that no longer has a mapping, and deletes it.

When the dedup node acting as a Bucket's primary node becomes unavailable, the central node selects a new primary node from that Bucket's standby nodes. The new primary node then initiates the data migration for the Bucket according to the updated routing table. For example, suppose dedup node A, the primary node of Bucket 1, becomes unavailable. The central node selects dedup node B from Bucket 1's standby nodes (dedup nodes B and C) as the new primary node, and the replica ID of Bucket 1's replica on dedup node B becomes the primary replica ID (e.g., 0). In the updated routing table, Bucket 1's standby nodes are dedup nodes C and D, so dedup node B backs up the data of Bucket 1 to dedup node D.
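Both migration examples above can be expressed as a diff between the old and new routing tables. The sketch below is a simplified, hypothetical planner, and its names and data shapes are assumptions. It follows the rules described in the text:

- The old primary ships the primary replica to a new primary node. If the old primary is down, the first surviving holder does this instead.
- The new primary then backs the Bucket up to any new standby nodes.
- Live nodes that lost a mapping delete their local copy afterwards.

```python
def plan_migration(old: dict, new: dict, alive: set):
    """Hypothetical sketch: derive migration tasks from two routing tables.

    old/new map bucket -> [node for replica 0, replica 1, ...]; replica 0
    is the primary. Returns (copies, deletions): copies are
    (bucket, source, target) tuples; deletions are (bucket, node) pairs
    that the node drops once migration has completed.
    """
    copies, deletions = [], []
    for bucket, new_nodes in new.items():
        old_nodes = old.get(bucket, [])
        survivors = [n for n in old_nodes if n in alive]
        if not survivors:
            continue  # no live copy left to migrate from
        new_primary = new_nodes[0]
        if new_primary not in old_nodes:
            # Primary moved: old primary (or first surviving holder) sends replica 0.
            copies.append((bucket, survivors[0], new_primary))
        for node in new_nodes[1:]:
            if node not in old_nodes:
                # The (possibly new) primary backs the Bucket up to new standbys.
                copies.append((bucket, new_primary, node))
        deletions += [(bucket, n) for n in old_nodes
                      if n in alive and n not in new_nodes]
    return copies, deletions


# Example 1: primary of Bucket 1 moves A -> B, a standby moves D -> E.
example1 = plan_migration({1: ["A", "D"]}, {1: ["B", "E"]}, {"A", "B", "D", "E"})
print(example1)  # ([(1, 'A', 'B'), (1, 'B', 'E')], [(1, 'A'), (1, 'D')])
# Example 2: primary A fails; B is promoted and D becomes a new standby.
example2 = plan_migration({1: ["A", "B", "C"]}, {1: ["B", "C", "D"]}, {"B", "C", "D"})
print(example2)  # ([(1, 'B', 'D')], [])
```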
Each dedup node maintains a fingerprint store. The fingerprint store contains the fingerprints corresponding to each Bucket on the dedup node, together with the storage information of the data blocks those fingerprints represent. It can take the form of a Key-Value Store, with the fingerprint as the Key and the storage information of the data block it represents as the Value.

Reading and writing data involves a large volume of fingerprint lookups and comparisons. Within each dedup node a Bloom filter absorbs part of the query load. Because a Bloom filter can report false positives, however, many requests must still be resolved by the fingerprint store, so the read performance of the fingerprint store (the Key-Value Store) is critical. For a small Key-Value Store, the Key-Value Pairs can be kept on an ordinary hard disk with an index built in memory, so that the Key-Value Pairs on disk can be accessed quickly. This system, however, stores data at the 100 PB scale and beyond. The fingerprints and storage information are correspondingly large, about 50 TB for 100 PB of raw data, so the index cannot be built in device memory.

The inventors of the present application therefore found that a hash table holding all Key-Value Pairs of the fingerprint store can be implemented entirely on a solid-state drive (SSD). The hash table stored on the SSD is a cuckoo hash table. The dedup node first checks fingerprints against the Bloom filter, which can misjudge because of hash collisions. Cuckoo hashing, used for the table itself, is a scheme for resolving hash collisions. Its basic idea is to compute two storage positions for a Key with two different hash functions:

(1) If both positions are free, the Key is inserted into one of them.
(2) If only one position is free, the Key is inserted there.
(3) If neither position is free, one of the two is chosen at random and the Key in it is kicked out. The kicked-out Key's other hash value gives its alternative position. If that position is empty, the Key is inserted there; otherwise the Key there is kicked out in turn, and so on until a free position is found.

This procedure can loop indefinitely, so a maximum number of kicks is normally set; when it is reached, the hash table is considered full.
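The insertion rules (1) to (3) and the kick limit can be sketched in Python as follows. This is a textbook two-choice cuckoo table, not the system's SSD-resident implementation. Two choices are illustrative: the SHA-256-based hash functions, and the small stash that keeps the homeless Key after a failed insert. A production table would instead rehash or grow when it reports "full". The sketch also assumes that each Key is inserted at most once, because the write path only inserts fingerprints that were not found.

```python
import hashlib
import random


class CuckooTable:
    """Two-choice cuckoo hash table with one Key per position (illustrative)."""

    def __init__(self, capacity: int, max_kicks: int = 32, seed: int = 0):
        self.slots = [None] * capacity
        self.stash = []                  # entries left homeless by failed inserts
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)

    def _positions(self, key: bytes):
        # Two hash functions: SHA-256 with different one-byte prefixes.
        n = len(self.slots)
        return tuple(
            int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big") % n
            for i in (1, 2)
        )

    def get(self, key: bytes):
        for p in self._positions(key):
            entry = self.slots[p]
            if entry is not None and entry[0] == key:
                return entry[1]
        for k, v in self.stash:
            if k == key:
                return v
        return None

    def insert(self, key: bytes, value) -> bool:
        """Insert a new Key; return False once max_kicks is hit (table "full")."""
        p1, p2 = self._positions(key)
        for p in (p1, p2):               # rules (1) and (2): take a free position
            if self.slots[p] is None:
                self.slots[p] = (key, value)
                return True
        entry = (key, value)
        pos = self.rng.choice((p1, p2))  # rule (3): pick one position at random
        for _ in range(self.max_kicks):
            entry, self.slots[pos] = self.slots[pos], entry  # kick out the occupant
            a, b = self._positions(entry[0])
            pos = b if pos == a else a   # the kicked-out Key's other position
            if self.slots[pos] is None:
                self.slots[pos] = entry
                return True
        self.stash.append(entry)         # keep it so no Key is silently lost
        return False


table = CuckooTable(capacity=64)
keys = [f"fp-{i}".encode() for i in range(20)]
for i, k in enumerate(keys):
    table.insert(k, i)
print(all(table.get(k) == i for i, k in enumerate(keys)))  # True
```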
The inventors chose cuckoo hashing because a lookup only has to probe a fixed set of candidate positions, so the number of I/O operations the system performs when looking up a Key is constant.
A basic cuckoo hash table reaches only about 49% space utilization. Two main variants of cuckoo hashing are therefore commonly used: 1) increasing the number of hash functions; 2) increasing the number of Keys each position can hold. Both variants improve the utilization of the cuckoo hash table. The inventors chose murmur2 as the base hash function; setting different seeds makes the same Key yield different hash values.
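Deriving several hash functions from one base function by varying the seed can be sketched as below. `murmur2` is a pure-Python transcription of the 32-bit MurmurHash2 algorithm; it reads 4-byte words little-endian, as the reference code does on x86. `candidate_positions` is a hypothetical helper showing seed i acting as cuckoo hash function i.

```python
def murmur2(data: bytes, seed: int) -> int:
    """32-bit MurmurHash2, little-endian word reads."""
    m, r, mask = 0x5BD1E995, 24, 0xFFFFFFFF
    length = len(data)
    h = (seed ^ length) & mask
    i = 0
    while length - i >= 4:               # mix 4 bytes at a time
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = ((h * m) & mask) ^ k
        i += 4
    tail = length - i                    # handle the last 1 to 3 bytes
    if tail == 3:
        h ^= data[i + 2] << 16
    if tail >= 2:
        h ^= data[i + 1] << 8
    if tail >= 1:
        h ^= data[i]
        h = (h * m) & mask
    h ^= h >> 13                         # final avalanche
    h = (h * m) & mask
    h ^= h >> 15
    return h


def candidate_positions(key: bytes, num_positions: int, ways: int = 4) -> list:
    """Hypothetical helper: seed i plays the role of cuckoo hash function i."""
    return [murmur2(key, seed) % num_positions for seed in range(ways)]


print(len(candidate_positions(b"fingerprint", 1 << 20)))  # 4
```

For a fixed Key, every step of MurmurHash2 is a bijection of the running state, so two different seeds always give two different 32-bit hash values. The positions taken modulo the table size can still coincide.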
Solid-state drives based on the NVMe (Non-Volatile Memory Express) protocol use the 4 KB page as their basic unit, so the fingerprint store is read and written in 4 KB units. Each Key-Value Pair in the fingerprint store is 256 bytes, so one 4 KB page holds 16 fingerprints. Each position of the cuckoo hash table therefore stores 16 Key-Value Pairs. The pairs are written into the page in insertion order rather than sorted by Key, and this unordered layout avoids the cost of sorting on the SSD.

In the inventors' tests, an asynchronous mode with 128 concurrent requests (queue depth × number of jobs = 128) fully exploits the NVMe drive's IOPS (input/output operations per second) capability of about 450K. To generate that much concurrency, the inventors optimized two aspects:

1. The Key-Value Store runs as multiple cuckoo hash tables on one NVMe drive.
2. Multiple cuckoo hash functions are used on one NVMe drive, with asynchronous reads.

Together these must satisfy: number of cuckoo hash tables per SSD × number of cuckoo hash functions = 128.

The inventors found that as the number of cuckoo hash functions grows, the QPS (queries per second) of a single cuckoo hash table drops markedly: moving from 4-way to 8-way cuckoo hashing halves the QPS. With too few hash functions, on the other hand, the space utilization of the table drops markedly. Weighing performance against space utilization, the inventors chose 4-way cuckoo hashing, whose space utilization reaches 98.66%. Consequently 128 / 4 = 32 cuckoo hash tables must run on each NVMe drive. A further benefit of splitting the store into multiple hash tables is that it reduces the locking granularity of the fingerprint store.
The process by which a client reads and writes data with the above data storage system is described further below. When the client writes data to the system, the process, as shown in Fig. 3, includes the following steps.
In step S201, the client splits the data into multiple data blocks and computes the fingerprint of each data block.
The fingerprint of each data block is its hash value, computed with a hash algorithm that has a low collision rate, such as SHA-1 or MD5.
In step S202, the client determines the Bucket corresponding to the fingerprint of each data block.
The client takes the data block's fingerprint modulo the total number of Buckets in the system and matches the result against the Bucket numbers to find the fingerprint's Bucket. For example, suppose the hash value of a data block is a and the system has p Buckets in total. The client computes a % p; if the result is 2, the data block's fingerprint corresponds to Bucket 2.
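The modulo mapping in step S202 amounts to a one-line function. Reading the digest as a big-endian unsigned integer is an assumption; any fixed convention works as long as every client uses the same one.

```python
def bucket_of(fingerprint: bytes, total_buckets: int) -> int:
    """Bucket number = fingerprint (read as a big-endian integer) mod total Buckets."""
    return int.from_bytes(fingerprint, "big") % total_buckets


print(bucket_of((1026).to_bytes(20, "big"), 1024))  # 2, i.e. a % p with a = 1026, p = 1024
```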
In step S203, the client determines the dedup node corresponding to the Bucket, using the routing table obtained from the central node.
The client looks up its saved routing table to find the dedup node hosting the fingerprint's Bucket. The first time the client writes data to the system, it first requests the routing table from the central node. For example, suppose a data block's fingerprint corresponds to Bucket 2, and the routing table assigns Bucket 2 to dedup nodes A and B, with A as the primary node and B as the standby node. The client must then send the data block's fingerprint to dedup node A for the fingerprint query.
In step S204, the client sends a fingerprint query request, which includes the fingerprints of the data blocks, to the dedup node corresponding to the Bucket.
The client has read threads, a send thread and a logic-processing thread. Multiple read threads each split a different part of the data into blocks, compute the fingerprints of the blocks, and place the fingerprints in query request queues. Each read thread holds multiple query request queues, one per Bucket number, so the client keeps fingerprints belonging to the same Bucket in the same queue. When a queue holds more than a certain amount of data, or its delay expires, the read thread moves the query request into the send thread's buffer.
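The per-Bucket query queues of step S204 can be sketched as a small batching class. The flush thresholds (`max_items`, `max_delay`) and all names are assumptions, because the text only speaks of "a certain amount" of data and an expiring delay. The clock is injectable so the behaviour can be exercised deterministically.

```python
import time
from collections import defaultdict


class QueryBatcher:
    """One query queue per Bucket inside a read thread (hypothetical sketch)."""

    def __init__(self, send_buffer: list, max_items: int = 64,
                 max_delay: float = 0.01, clock=time.monotonic):
        self.send_buffer = send_buffer   # stands in for the send thread's buffer
        self.max_items = max_items
        self.max_delay = max_delay
        self.clock = clock
        self.queues = defaultdict(list)  # bucket -> pending fingerprints
        self.opened_at = {}              # bucket -> time the queue became non-empty

    def add(self, bucket: int, fingerprint: bytes):
        if not self.queues[bucket]:
            self.opened_at[bucket] = self.clock()
        self.queues[bucket].append(fingerprint)
        if len(self.queues[bucket]) >= self.max_items:
            self._flush(bucket)          # size threshold reached

    def poll(self):
        """Flush every non-empty queue whose delay has expired."""
        now = self.clock()
        for bucket in [b for b, q in self.queues.items() if q]:
            if now - self.opened_at[bucket] >= self.max_delay:
                self._flush(bucket)

    def _flush(self, bucket: int):
        self.send_buffer.append((bucket, self.queues.pop(bucket)))
        self.opened_at.pop(bucket, None)


now = [0.0]
sent = []
batcher = QueryBatcher(sent, max_items=2, max_delay=1.0, clock=lambda: now[0])
batcher.add(3, b"f1")
batcher.add(5, b"g1")
batcher.add(3, b"f2")        # Bucket 3's queue is full: flushed immediately
now[0] = 2.0
batcher.poll()               # Bucket 5's delay has expired: flushed by poll()
print(sent)  # [(3, [b'f1', b'f2']), (5, [b'g1'])]
```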
The send thread sends each query request queue, as a request packet, to the dedup node hosting the queue's Bucket (the Bucket's primary node). In one embodiment, the send thread has four buffers:

- Two hold requests currently being transmitted to the system: one for fingerprint query requests and one for data block upload requests.
- The other two receive new requests sent by other threads inside the client, again one for fingerprint query requests and one for data block upload requests.

Keeping requests in transit separate from newly submitted requests prevents other threads from being blocked for long periods while writing new requests.

When the send thread receives a response packet from a dedup node, it passes the packet to the logic-processing thread. The logic-processing thread uses the fingerprint query result returned by the dedup node to build upload requests for the fingerprints that were not found, together with the data blocks they represent. It passes these to the send thread, which sends them to the corresponding dedup node. This division of work among threads keeps request transmission continuous and smooth.
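The four-buffer arrangement of the send thread can be approximated with a lock-protected swap. Producer threads append to an incoming buffer while the send thread drains the in-flight one. This is a hypothetical sketch of the idea, not the client's actual code; the buffer names and the two request kinds ("query", "upload") are illustrative.

```python
import threading


class SendBuffers:
    """Incoming and in-flight buffers per request kind: 2 kinds x 2 = 4 buffers."""

    KINDS = ("query", "upload")

    def __init__(self):
        self._lock = threading.Lock()
        self.incoming = {k: [] for k in self.KINDS}   # filled by other client threads
        self.in_flight = {k: [] for k in self.KINDS}  # being transmitted by the send thread

    def submit(self, kind: str, request):
        # Producers hold the lock only for an append, never during network I/O.
        with self._lock:
            self.incoming[kind].append(request)

    def take_batch(self, kind: str) -> list:
        # The send thread swaps in everything submitted since its last batch.
        with self._lock:
            self.in_flight[kind], self.incoming[kind] = self.incoming[kind], []
        return self.in_flight[kind]


buffers = SendBuffers()
buffers.submit("query", "q1")
buffers.submit("upload", "u1")
buffers.submit("query", "q2")
batch = buffers.take_batch("query")
buffers.submit("query", "q3")   # lands in the fresh incoming buffer
print(batch, buffers.incoming["query"])  # ['q1', 'q2'] ['q3']
```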
The response packet received by the send thread includes the version of the routing table currently stored on the dedup node. The client checks whether this version matches the version of the routing table it obtained from the central node.

- If the versions differ, the central node has updated the routing table and synchronized it to the dedup nodes in the system. The client then obtains the updated routing table from the central node through the send thread and uses it to redetermine the dedup node for each Bucket. This also redetermines the dedup node to which the not-found fingerprints and their data blocks should be uploaded.
- If the versions match, the client keeps using the routing table it obtained from the central node, and the upload target for the not-found fingerprints and their data blocks is unchanged.
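The version check above reduces to a compare-and-refresh before choosing the upload target. In this hypothetical sketch, `local` holds the client's routing table and `fetch_table` stands in for the request to the central node; both names and data shapes are assumptions.

```python
def upload_target(bucket: int, response_version: int, local: dict, fetch_table) -> str:
    """Choose the dedup node that should receive the not-found fingerprints.

    local = {"version": int, "replicas": {bucket: [primary, standby, ...]}};
    fetch_table() returns (version, replicas) from the central node.
    """
    if response_version != local["version"]:
        # The central node has published a different table: refresh before routing.
        local["version"], local["replicas"] = fetch_table()
    return local["replicas"][bucket][0]   # the Bucket's primary node


def newer_table():
    return 2, {1: ["B", "E"]}


local = {"version": 1, "replicas": {1: ["A", "D"]}}
same = upload_target(1, 1, local, newer_table)      # versions match: no refresh
changed = upload_target(1, 2, local, newer_table)   # mismatch: table refreshed first
print(same, changed, local["version"])  # A B 2
```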
In step S205, the dedup node looks up the fingerprints and returns those not found to the client.
Each dedup node contains a Bloom filter and a fingerprint store. The Bloom filter holds a hash index of all fingerprints currently stored on the dedup node. The fingerprint store saves, as Key-Value Pairs, all fingerprints and the storage information of the data blocks they represent. For every fingerprint in a fingerprint query request, the dedup node consults the Bloom filter and then, if needed, the fingerprint store.

The Bloom filter computes the fingerprint's hash index and checks it against the indexes it holds:

- If there is no match, the dedup node holds neither an identical fingerprint nor the data block it represents.
- If there is a match, the fingerprint may exist. Because hash collisions can make the Bloom filter produce false positives, the fingerprint store must then be queried. If the fingerprint is present in the fingerprint store, it already exists; otherwise it does not.

Screening first with the Bloom filter makes the dedup node's fingerprint queries more efficient. Confirming hits against the fingerprint store compensates for the Bloom filter's false positives and keeps the queries accurate. The dedup node puts every fingerprint from the request that was not found into a response packet and returns it to the client. The response packet also carries the version of the routing table currently stored on the dedup node, so that the client can decide whether it needs to update its routing table.
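The two-stage lookup of step S205 can be sketched as follows. The Bloom filter parameters and its SHA-256-derived hash functions are illustrative, and a Python dict stands in for the SSD-resident fingerprint store. The point is the control flow: a filter miss is final, and a filter hit is confirmed against the store.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over a bit array (illustrative parameters)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, key: bytes):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for ix in self._indexes(key):
            self.bits[ix // 8] |= 1 << (ix % 8)

    def might_contain(self, key: bytes) -> bool:
        # False is definitive; True may be a false positive.
        return all(self.bits[ix // 8] & (1 << (ix % 8)) for ix in self._indexes(key))


def not_found(fingerprints, bloom: BloomFilter, store: dict) -> list:
    """Return the fingerprints this dedup node does not hold."""
    missing = []
    for fp in fingerprints:
        if not bloom.might_contain(fp) or fp not in store:  # confirm filter hits
            missing.append(fp)
    return missing


bloom, store = BloomFilter(), {}
for fp in (b"fp-a", b"fp-b"):
    bloom.add(fp)
    store[fp] = ("2_abcd234_010515", 0, 4096)   # illustrative storage information
print(not_found([b"fp-a", b"fp-c", b"fp-b"], bloom, store))  # [b'fp-c']
```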
In step S206, the client uploads the fingerprints that were not found, and the data blocks they represent, to the dedup node corresponding to the Bucket.
The client uploads the not-found fingerprints in the response packet, with the data blocks they represent, to the dedup nodes hosting the Buckets of those fingerprints. If the routing table version has not changed, this is the same dedup node that performed the fingerprint query in step S205. The other fingerprints in the query request already exist on the dedup node and need not be uploaded again, which prevents the system from storing the same data block twice.
In step S207, the dedup node saves the not-found fingerprints in the Bucket assigned to it and saves the data blocks in the Container file corresponding to that Bucket.
The dedup node receives the not-found fingerprints and the data blocks they represent. It saves each fingerprint in its corresponding Bucket, and saves the data block it represents in the Container file corresponding to that Bucket. A Container file is named by the number of its Bucket + a system-internal Universally Unique Identifier (UUID) + the date (Date), for example 2_abcd234_010515.

To guarantee that data blocks reach the disk, each block is written to its Container file with the O_SYNC flag set, so every write returns only after it has completed. After pwrite returns, the block's fingerprint is written to the fingerprint store. The fingerprint (the Key) and the block's second storage information (the Value) form a Key-Value Pair that is saved in the fingerprint store. The second storage information includes:

- the name of the Container file that saves the data block;
- the block's offset (Offset) within the Container file;
- the block's size (Chunksize).

The hash index of the newly saved Key-Value Pair is added to the Bloom filter for use in subsequent deduplication queries.
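The durable write order of step S207 can be sketched with Python's POSIX wrappers `os.open`, `os.pwrite` and `os.O_SYNC`, which are Unix-only. The data block is written first, with O_SYNC; only then are the fingerprint entry and the Bloom filter updated. Appending at the current end of file and the helper names are illustrative, and the sketch assumes a single writer per Container file. `container_name` simply joins the three fields of the example 2_abcd234_010515, without interpreting the date format.

```python
import os
import tempfile


def container_name(bucket: int, system_uuid: str, date_tag: str) -> str:
    # Bucket number + system UUID + date, joined as in the example 2_abcd234_010515.
    return f"{bucket}_{system_uuid}_{date_tag}"


def save_block(dir_path: str, name: str, fingerprint: bytes, block: bytes,
               store: dict, bloom_add) -> tuple:
    """Append a block durably, then record its fingerprint (hypothetical helper)."""
    path = os.path.join(dir_path, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
    try:
        offset = os.fstat(fd).st_size              # append at the current end of file
        if os.pwrite(fd, block, offset) != len(block):
            raise OSError("short write to Container file")
    finally:
        os.close(fd)
    # Only after pwrite has returned is the block indexed.
    info = (name, offset, len(block))              # second storage information
    store[fingerprint] = info                      # Key = fingerprint, Value = info
    bloom_add(fingerprint)                         # visible to later dedup queries
    return info


workdir = tempfile.mkdtemp()
store, bloom_keys = {}, []
name = container_name(2, "abcd234", "010515")
first = save_block(workdir, name, b"fp-1", b"hello", store, bloom_keys.append)
second = save_block(workdir, name, b"fp-2", b"world!", store, bloom_keys.append)
print(first, second)  # ('2_abcd234_010515', 0, 5) ('2_abcd234_010515', 5, 6)
```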
In step S208, the dedup node returns a message to the client indicating that the data blocks were saved successfully.
After the not-found fingerprints and their data blocks have been saved, the dedup node returns a save-success message to the client. In one embodiment, the Bucket corresponding to the not-found fingerprints has standby nodes in the system. In that case the primary node first saves the fingerprints and data blocks uploaded by the client, then backs them up to the Bucket's standby nodes, and returns the save-success message to the client only after the backup has finished.
In step S209, once all not-found fingerprints and the data blocks they represent have been uploaded, the client uploads the mapping file of the data to the dedup nodes.
The mapping file contains the fingerprint of each data block of the data. The fingerprints are arranged in the order in which the client cut the data into blocks, which guarantees that the data can be mapped back correctly.

The client also uploads the mapping file in blocks. It splits the mapping file into multiple data blocks and computes the hash value of each, for example with the murmur2 hash function. For each mapping-file block, the client determines the Bucket corresponding to the block's hash value and uses the routing table to find the dedup node hosting that Bucket. It then uploads the block and its hash value to that dedup node. As with ordinary data blocks, the client first runs a fingerprint query on each mapping-file block's hash value and uploads only the blocks whose hash values were not found, which avoids uploading duplicate mapping-file blocks. When splitting the mapping file, the client places the mapping file's header in the first block. The header includes information such as the total size of the mapping file and the total number of its data blocks.
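Step S209's mapping file can be sketched as an ordered list of fingerprints split into blocks, with a header block first. Several details are assumptions made to keep the example short and self-contained:

- the header layout (two little-endian 64-bit integers);
- the block size;
- `zlib.crc32` as the per-block hash, in place of murmur2.

```python
import hashlib
import struct
import zlib

HEADER = struct.Struct("<QQ")   # assumed header: total mapping-file size, block count


def build_mapping_blocks(fingerprints: list, chunk_size: int = 4096) -> list:
    """Serialize fingerprints in cut order and split them into mapping-file blocks.

    Block 0 carries only the header; the remaining blocks carry the
    concatenated fingerprints.
    """
    body = b"".join(fingerprints)
    chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]
    header = HEADER.pack(HEADER.size + len(body), 1 + len(chunks))
    return [header] + chunks


def block_hashes(blocks: list) -> list:
    # zlib.crc32 stands in for murmur2; deterministic but not collision-resistant.
    return [zlib.crc32(b) for b in blocks]


fps = [hashlib.sha1(bytes([i])).digest() for i in range(5)]   # 5 x 20-byte fingerprints
blocks = build_mapping_blocks(fps, chunk_size=64)
total_size, count = HEADER.unpack(blocks[0])
print(total_size, count, [len(b) for b in blocks])  # 116 3 [16, 64, 36]
```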
In step S210, the dedup node saves the mapping-file data blocks and hash values uploaded by the client.
In the Bucket corresponding to a mapping-file block's hash value, the dedup node saves the hash value and the first storage information. It saves the block itself in that Bucket's Container file. The first storage information includes the name of the Container file that saves the mapping-file block, the block's offset within the Container file, and the block's size. The dedup node then updates the fingerprint store with a Key-Value Pair: the Key is the mapping-file name + block sequence number, and the Value is the block's first storage information. This completes the process by which the client writes data to the system.
As shown in Fig. 4, in an embodiment of the present application, the client reads data from the system through the following steps.
In step S301, the client requests the mapping file from the dedup node by the data's mapping-file name and block sequence numbers.
The client first requests the first block of the mapping file, which contains the mapping file's header. The header includes the size of the mapping file and the total number of its data blocks. Based on the header, the client then requests the mapping file's other blocks.
In step S302, the dedup node sends the mapping-file data blocks to the client.
The dedup node matches the mapping-file name and block sequence number sent by the client against the Keys in the fingerprint store. This locates the Key-Value Pair of the mapping-file block and hence the first storage information for that Key. The Container file name in the first storage information identifies the Container file that holds the block. The dedup node then uses the block's offset within that file and its size to read the block from the Container file.
In step S303, the client assembles the mapping file from its data blocks and requests data blocks from the dedup nodes according to the fingerprints in the mapping file.
The client concatenates the mapping-file blocks in sequence-number order to obtain the complete mapping file. The mapping file contains the fingerprints of all data blocks, arranged in cut order. For each fingerprint, the client determines the corresponding Bucket and uses the routing table to find the dedup node hosting that Bucket. It then sends that dedup node a request for the corresponding data block.
In step S304, the dedup node sends the data blocks represented by the fingerprints in the mapping file to the client.
The dedup node queries the fingerprint store with the fingerprint in the data block request and finds the corresponding second storage information. The Container file name in the second storage information identifies the Container file that holds the data block the fingerprint represents. The dedup node then uses the block's offset within that file and its size to read the block. In one embodiment, after identifying the Container file from the second storage information, the dedup node checks whether that Container file has been archived to the background server. If it has, the dedup node reads the data block from the Container file stored on the background server and sends it to the client.
In step S305, the client concatenates the data blocks in the order of their fingerprints in the mapping file to reconstruct the data.
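Steps S301 to S305 chain together as follows. In this hypothetical sketch, `get_mapping_block` and `get_data_block` stand in for requests to the dedup nodes, and routing is omitted. The header layout and the 20-byte SHA-1 fingerprints are assumptions that must match whatever the writer used.

```python
import hashlib
import struct

HEADER = struct.Struct("<QQ")   # assumed: total mapping-file size, block count
FP_SIZE = 20                    # SHA-1 fingerprint length


def read_data(name: str, get_mapping_block, get_data_block) -> bytes:
    """Client read path, steps S301 to S305 (hypothetical sketch)."""
    first = get_mapping_block(name, 0)                       # S301: header block first
    _total_size, count = HEADER.unpack(first[:HEADER.size])
    body = b"".join(get_mapping_block(name, seq) for seq in range(1, count))  # S302
    fps = [body[i:i + FP_SIZE] for i in range(0, len(body), FP_SIZE)]        # S303
    return b"".join(get_data_block(fp) for fp in fps)        # S304/S305: in cut order


# Fake dedup nodes: one stored copy per distinct block, plus a 3-block mapping file.
chunks = [b"hello ", b"dedup ", b"hello "]
store = {hashlib.sha1(c).digest(): c for c in chunks}
body = b"".join(hashlib.sha1(c).digest() for c in chunks)   # 60 bytes
mapping = [HEADER.pack(HEADER.size + len(body), 3), body[:40], body[40:]]

result = read_data("file-1", lambda n, s: mapping[s], store.__getitem__)
print(result)  # b'hello dedup hello '
```

The repeated block is stored once but appears twice in the mapping file, so reassembly restores the original data.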
As shown in Fig. 5, the client for reading and writing data in an embodiment of the present application includes:
Cutting and computing module 501, configured to split data into multiple data blocks and compute the fingerprint of each data block;
Bucket determining module 502, configured to determine the Bucket corresponding to the fingerprint of each data block;
Node determining module 503, configured to determine the dedup node corresponding to the Bucket according to the routing table obtained from the central node;
Request sending module 504, configured to send a fingerprint query request, which includes the fingerprints of the data blocks, to the dedup node corresponding to the Bucket;
Information receiving module 505, configured to receive the not-found fingerprints returned by the dedup node corresponding to the Bucket;
Data uploading module 506, configured to upload the not-found fingerprints and the data blocks they represent to the dedup node corresponding to the Bucket. Once all of them have been uploaded, the module also uploads the mapping file of the data to the dedup nodes; the mapping file includes the fingerprint of each data block of the data, arranged in the order in which the data blocks were cut.
In addition, an embodiment of the present application discloses a system for reading and writing data which, with reference to Fig. 1, includes a central node 10 and one or more dedup nodes 11, wherein:
the central node 10 is configured to send a routing table to the client, the routing table including the correspondence between Buckets and dedup nodes;
the dedup node 11 is configured to:
- receive a fingerprint query request from the client, the request including fingerprints corresponding to the Buckets assigned to the dedup node;
- look up the fingerprints and return those not found to the client;
- receive the not-found fingerprints, and the data blocks they represent, uploaded by the client.
It should be noted that the features of the system for reading and writing data shown in Fig. 1 correspond to those of the embodiments shown in Figs. 3 and 4. The features of the client for reading and writing data shown in Fig. 5 likewise correspond to those embodiments. For details not given in the embodiments of Figs. 1 and 5, refer to the description of the embodiments shown in Figs. 3 and 4, which is not repeated here.
The embodiments of the present application provide a data storage method, a data storage system, a data read-write method, a client for reading and writing data, and a system for reading and writing data. Together they achieve globally deduplicated storage management of raw data at the 100 PB scale and above, and of fingerprints at the 100 TB scale and above. The design is highly scalable: after a new dedup node joins the system, the central node redistributes data according to the preset strategy and the dedup nodes complete the data migration automatically, so the system's performance and capacity can be extended easily. Each dedup node implements a high-performance, SSD-based fingerprint store by building a large-capacity cuckoo hash table on the SSD. This overcomes a key technical difficulty: when the volume of fingerprints is very large, no index can be built in memory, so deduplication queries cannot otherwise be performed. At the same time, the design preserves the efficiency of fingerprint queries and improves their accuracy.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
Memory may include non-persistent memory, random access memory (RAM) and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Certain terms are used throughout the specification and claims to refer to particular components. Those skilled in the art will appreciate that hardware manufacturers may refer to the same component by different names. This specification and the claims do not distinguish between components by differences in name, but by differences in function. The word "comprising" as used throughout the specification and claims is an open-ended term and should be interpreted as "including but not limited to". "Substantially" means within an acceptable error range, within which a person skilled in the art can solve the stated technical problem and basically achieve the stated technical effect. In addition, the term "coupled" herein includes any direct or indirect means of electrical coupling. Therefore, if a first device is described herein as being coupled to a second device, the first device may be electrically coupled to the second device directly, or indirectly through other devices or coupling means. The following description sets out preferred embodiments for implementing the invention; it is intended to illustrate the general principles of the invention and not to limit its scope. The scope of protection of the invention is defined by the appended claims.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a commodity or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a commodity or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the commodity or system that includes the element.
The foregoing description shows and describes several preferred embodiments of the invention. As stated above, however, it should be understood that the invention is not limited to the forms disclosed herein, should not be regarded as excluding other embodiments, can be used in various other combinations, modifications, and environments, and can be modified within the scope of the inventive concept described herein through the above teachings or the technology or knowledge of related fields. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.
Claims (33)
1. A data storage method, characterized in that it is applied to a data storage system comprising a central node and deduplication nodes, the data storage method comprising:
the central node assigning each Bucket to a corresponding deduplication node according to a preset strategy;
the central node creating a routing table according to the correspondence between Buckets and deduplication nodes, and synchronizing the routing table to each deduplication node;
the deduplication node storing, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks represented by that fingerprint information.
2. The data storage method according to claim 1, characterized in that the deduplication node storing, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks represented by that fingerprint information comprises:
the deduplication node creating a corresponding Container file for each Bucket assigned to it;
the deduplication node saving the corresponding fingerprint information in each assigned Bucket, and saving the data blocks represented by that fingerprint information in the Container file corresponding to that Bucket.
3. The data storage method according to claim 2, characterized in that:
the deduplication node determines whether the size of the Container file is greater than a preset threshold;
when the size of the Container file is greater than the preset threshold, the deduplication node archives the Container file to a background server.
4. The data storage method according to claim 1, characterized in that the central node assigning each Bucket to a corresponding deduplication node according to the preset strategy comprises:
the central node assigning each Bucket to a plurality of corresponding deduplication nodes, and determining one primary node and at least one standby node among the plurality of corresponding deduplication nodes.
5. The data storage method according to claim 4, characterized in that:
the central node determines whether each deduplication node is available, or whether a new deduplication node has been added;
when it determines that a deduplication node is unavailable, or when a new deduplication node has been added, the central node reassigns each Bucket;
the central node updates the routing table and synchronizes it to each deduplication node;
the deduplication nodes perform data migration according to the updated routing table.
6. The data storage method according to claim 5, characterized in that the deduplication nodes performing data migration according to the updated routing table comprises:
the primary node initiating the data migration according to the updated routing table.
7. The data storage method according to claim 5, characterized in that, when it is determined that a deduplication node is unavailable, the central node reassigning each Bucket comprises:
when it is determined that the primary node is unavailable, the central node selecting a new primary node from the at least one standby node;
and the deduplication nodes performing data migration according to the updated routing table comprises:
the newly selected primary node initiating the data migration according to the updated routing table.
8. The data storage method according to claim 1, characterized in that each deduplication node comprises a fingerprint information database, the fingerprint information database being a cuckoo hash table stored on a solid-state drive and containing, for each Bucket of the deduplication node, the fingerprint information and the storage information of the data blocks represented by that fingerprint information.
9. The data storage method according to claim 8, characterized in that M cuckoo hash tables run simultaneously on the solid-state drive and N cuckoo hash functions are used simultaneously, where M × N = 128.
10. The data storage method according to claim 9, characterized in that 32 cuckoo hash tables run simultaneously on the solid-state drive and 4-way cuckoo hash functions are used simultaneously.
11. A data read-write method, characterized by comprising:
cutting data into a plurality of data blocks and calculating the fingerprint information of each data block;
determining the Bucket corresponding to the fingerprint information of each data block;
determining, according to a routing table obtained from a central node, the deduplication node corresponding to the Bucket;
sending a fingerprint query request to the deduplication node corresponding to the Bucket, the fingerprint query request including the fingerprint information of the data blocks;
receiving the not-found fingerprint information returned by the deduplication node corresponding to the Bucket;
uploading the not-found fingerprint information and the data blocks it represents to the deduplication node corresponding to the Bucket.
12. The method according to claim 11, characterized in that determining the Bucket corresponding to the fingerprint information of each data block comprises:
performing a modulo operation on the fingerprint information by the total number of Buckets, and determining the Bucket corresponding to the fingerprint information according to the result of the modulo operation.
13. The method according to claim 11, characterized in that the method further comprises:
when all of the not-found fingerprint information and the data blocks it represents have been uploaded, uploading a mapping file of the data to the deduplication nodes, the mapping file including the fingerprint information of each data block of the data, arranged in the order in which the data blocks were cut.
14. The method according to claim 13, characterized in that uploading the mapping file of the data to the deduplication nodes comprises:
cutting the mapping file into a plurality of data blocks and calculating the hash value of each data block of the mapping file;
determining the Bucket corresponding to the hash value of each data block of the mapping file;
determining, according to the routing table, the deduplication node corresponding to the Bucket that corresponds to the hash value of each data block of the mapping file;
uploading the data blocks of the mapping file and their hash values to the deduplication node corresponding to the Bucket that corresponds to those hash values.
15. The method according to claim 14, characterized in that cutting the mapping file into a plurality of data blocks comprises:
cutting the header information of the mapping file into the first of the plurality of data blocks, the header information of the mapping file including the size of the mapping file and the total number of the plurality of data blocks.
16. The method according to claim 13, characterized in that the method further comprises:
obtaining the mapping file of the data from the deduplication nodes;
obtaining each data block of the data from the deduplication nodes according to the fingerprint information in the mapping file;
splicing the data blocks together in the order of their fingerprint information in the mapping file to reconstruct the data.
17. The method according to claim 16, characterized in that obtaining the mapping file of the data from the deduplication nodes comprises:
obtaining each data block of the mapping file from the deduplication nodes according to the name of the mapping file and the data block sequence numbers;
splicing the data blocks of the mapping file into the mapping file of the data.
18. The method according to claim 11, characterized in that determining the deduplication node corresponding to the Bucket according to the routing table obtained from the central node comprises:
obtaining the routing table from the central node when data is stored for the first time;
determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket.
19. The method according to claim 18, characterized in that determining the deduplication node corresponding to the Bucket according to the routing table obtained from the central node further comprises:
sending a request packet to the deduplication node corresponding to the Bucket;
receiving a response packet returned by the deduplication node corresponding to the Bucket, the response packet including version information of the routing table;
determining whether the version information of the routing table in the response packet is identical to the version information of the routing table obtained from the central node;
when the two versions are identical, determining the deduplication node corresponding to the Bucket according to the routing table obtained from the central node;
when the two versions differ, obtaining an updated routing table from the central node and redetermining the deduplication node corresponding to the Bucket according to the updated routing table.
20. A data read-write method, characterized by comprising:
a central node sending a routing table to a client, the routing table including the correspondence between Buckets and deduplication nodes;
a deduplication node receiving a fingerprint query request from the client, the fingerprint query request including fingerprint information corresponding to the Buckets assigned to the deduplication node;
the deduplication node looking up the fingerprint information and returning the fingerprint information that was not found to the client;
the deduplication node receiving, from the client, the not-found fingerprint information and the data blocks it represents.
21. The method according to claim 20, characterized in that the method further comprises:
the deduplication node saving the not-found fingerprint information in the assigned Bucket, and saving the data blocks in the Container file corresponding to the assigned Bucket;
the deduplication node returning a message to the client indicating that the data blocks were saved successfully.
22. The method according to claim 21, characterized in that, before the deduplication node returns the message indicating that the data blocks were saved successfully to the client, the method further comprises:
the deduplication node backing up the not-found fingerprint information and the data blocks it represents to a standby node.
23. The method according to claim 21, characterized in that the method further comprises:
the deduplication node saving the data blocks of the mapping file uploaded by the client and their corresponding hash values.
24. The method according to claim 23, characterized in that saving the data blocks of the mapping file uploaded by the client and their corresponding hash values comprises:
saving the data blocks of the mapping file in the Container file corresponding to the corresponding Bucket;
saving, in the corresponding Bucket, the hash values of the data blocks of the mapping file and first storage information.
25. The method according to claim 24, characterized in that the first storage information includes: the name of the Container file that stores the data blocks of the mapping file, the offset of each data block of the mapping file within the Container file, and the size of each data block of the mapping file.
26. The method according to claim 23, characterized in that the method further comprises:
the deduplication node receiving a request from the client to obtain the data blocks of the mapping file;
the deduplication node sending the data blocks of the mapping file to the client;
the deduplication node receiving a request from the client to obtain the data block represented by each piece of fingerprint information in the mapping file;
the deduplication node sending the data block represented by each piece of fingerprint information to the client.
27. The method according to claim 26, characterized in that the deduplication node sending the data block represented by each piece of fingerprint information to the client comprises:
the deduplication node determining second storage information of the data block according to the fingerprint information, the second storage information including the name of the Container file that stores the data block, the offset of the data block within the Container file, and the size of the data block;
the deduplication node determining, according to the name of the Container file, whether the Container file has been archived to a background server;
when the Container file has been archived to the background server, the deduplication node obtaining the data block from the background server according to the offset of the data block within the Container file and the size of the data block, and sending it to the client;
when the Container file is still stored locally, the deduplication node obtaining the data block locally according to the offset of the data block within the Container file and the size of the data block, and sending it to the client.
28. The method according to claim 20, characterized in that the central node sending the routing table to the client comprises:
when the client stores data for the first time, the central node receiving a request from the client to obtain the routing table;
the central node sending the routing table to the client.
29. The method according to claim 28, characterized in that the central node sending the routing table to the client further comprises:
the deduplication node receiving a request packet from the client;
the deduplication node sending a response packet to the client, the response packet including version information of the routing table saved by the deduplication node;
when the version information of the routing table saved by the client is inconsistent with the version information of the routing table saved by the deduplication node, the central node receiving a routing table request from the client;
the central node sending the updated routing table to the client.
30. The method according to claim 20, characterized in that the deduplication node looking up the fingerprint information and returning the fingerprint information that was not found to the client comprises:
the deduplication node determining, through a Bloom filter, whether the fingerprint information exists;
when the Bloom filter indicates that the fingerprint information does not exist, determining that the fingerprint information is not-found fingerprint information;
when the Bloom filter indicates that the fingerprint information exists, querying the fingerprint information database to check whether the fingerprint information exists;
when the fingerprint information is found in the fingerprint information database, determining that the fingerprint information already exists;
when the fingerprint information is not found in the fingerprint information database, determining that the fingerprint information is not-found fingerprint information.
31. A data storage system, characterized by comprising a central node and one or more deduplication nodes, wherein:
the central node is configured to assign each Bucket to a corresponding deduplication node according to a preset strategy, create a routing table according to the correspondence between Buckets and deduplication nodes, and synchronize the routing table to each deduplication node;
the deduplication node is configured to store, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks represented by that fingerprint information.
32. A client for reading and writing data, characterized by comprising:
a cutting and computing module, configured to cut data into a plurality of data blocks and calculate the fingerprint information of each data block;
a Bucket determining module, configured to determine the Bucket corresponding to the fingerprint information of each data block;
a node determining module, configured to determine, according to a routing table obtained from a central node, the deduplication node corresponding to the Bucket;
a request sending module, configured to send a fingerprint query request to the deduplication node corresponding to the Bucket, the fingerprint query request including the fingerprint information of the data blocks;
an information receiving module, configured to receive the not-found fingerprint information returned by the deduplication node corresponding to the Bucket;
a data uploading module, configured to upload the not-found fingerprint information and the data blocks it represents to the deduplication node corresponding to the Bucket.
33. A system for reading and writing data, characterized by comprising a central node and deduplication nodes, wherein:
the central node is configured to send a routing table to a client, the routing table including the correspondence between Buckets and deduplication nodes;
the deduplication node is configured to receive a fingerprint query request from the client, the fingerprint query request including fingerprint information corresponding to the Buckets assigned to the deduplication node; to look up the fingerprint information and return the fingerprint information that was not found to the client; and to receive, from the client, the not-found fingerprint information and the data blocks it represents.
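The client-side method of claims 11, 12 and 16 and the node-side lookup of claims 20 and 30 can be sketched end to end in a single Python process. Everything concrete here is an assumption rather than something the claims fix: fixed 4 KiB chunking, SHA-1 fingerprints, 16 Buckets, a two-node routing table, a Bloom filter with four SHA-256-derived bit positions, and plain dicts standing in for the SSD fingerprint database and Container files. The names `DedupNode`, `bucket_of`, `write`, and `read` are hypothetical. Network transport, mapping-file upload (claims 13 to 15), standby-node backup, and routing-table version checks are omitted.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 16     # hypothetical total number of Buckets
CHUNK_SIZE = 4096    # hypothetical fixed chunk size; the claims do not fix a cutting method


class BloomFilter:
    """Minimal Bloom filter: k bit positions per item in an m-bit array."""

    def __init__(self, m_bits: int = 1 << 16, k: int = 4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _indexes(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for i in self._indexes(item):
            self.bits[i // 8] |= 1 << (i % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(item))


class DedupNode:
    """Hypothetical dedup node; dicts stand in for the SSD database and Container files."""

    def __init__(self) -> None:
        self.bloom = BloomFilter()
        self.fingerprint_db: dict = {}

    def query(self, fingerprints: list) -> list:
        # Claim 30 flow: a Bloom miss means "definitely new"; a hit is confirmed in the database.
        return [fp for fp in fingerprints
                if not self.bloom.might_contain(fp) or fp not in self.fingerprint_db]

    def store(self, fp: bytes, block: bytes) -> None:
        self.fingerprint_db[fp] = block
        self.bloom.add(fp)


def bucket_of(fp: bytes) -> int:
    # Claim 12: map a fingerprint to a Bucket with a modulo operation.
    return int.from_bytes(fp, "big") % NUM_BUCKETS


def write(data: bytes, routing_table: dict, nodes: dict):
    """Client write path; returns (ordered fingerprints, number of blocks uploaded)."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    fps = [hashlib.sha1(c).digest() for c in chunks]
    per_node = defaultdict(list)               # group (fp, chunk) pairs by owning node
    for fp, chunk in zip(fps, chunks):
        per_node[routing_table[bucket_of(fp)]].append((fp, chunk))
    uploaded = 0
    for name, items in per_node.items():
        node = nodes[name]
        missing = set(node.query([fp for fp, _ in items]))
        for fp, chunk in items:
            if fp in missing:                  # upload only blocks the node lacks
                node.store(fp, chunk)
                missing.discard(fp)            # repeated chunks in this write go up once
                uploaded += 1
    return fps, uploaded


def read(fps: list, routing_table: dict, nodes: dict) -> bytes:
    # Claim 16: fetch each block by fingerprint and splice them in mapping-file order.
    return b"".join(nodes[routing_table[bucket_of(fp)]].fingerprint_db[fp] for fp in fps)
```

With `routing = {b: ("node-a" if b % 2 == 0 else "node-b") for b in range(NUM_BUCKETS)}` and two `DedupNode` instances, writing three identical 4 KiB blocks of `A` followed by one block of `B` uploads two blocks; writing the same data again uploads none, and `read` splices the blocks back into the original bytes. The Bloom filter only short-circuits lookups for new fingerprints: a positive answer is always confirmed against the database, so a false positive costs one extra lookup but never suppresses an upload.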
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510226830.0A CN106201771B (en) | 2015-05-06 | 2015-05-06 | Data-storage system and data read-write method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510226830.0A CN106201771B (en) | 2015-05-06 | 2015-05-06 | Data-storage system and data read-write method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106201771A CN106201771A (en) | 2016-12-07 |
CN106201771B true CN106201771B (en) | 2019-07-05 |
Family ID=57459493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510226830.0A Active CN106201771B (en) | 2015-05-06 | 2015-05-06 | Data-storage system and data read-write method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106201771B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766478A (en) * | 2017-10-11 | 2018-03-06 | 复旦大学 | A kind of design method of concurrent index structure towards high competition scene |
CN107832341B (en) * | 2017-10-12 | 2022-01-28 | 千寻位置网络有限公司 | AGNSS user duplicate removal statistical method |
CN109725842B (en) * | 2017-10-30 | 2022-10-11 | 伊姆西Ip控股有限责任公司 | System and method for accelerating random write placement for bucket allocation within a hybrid storage system |
CN108093024B (en) * | 2017-11-14 | 2020-08-04 | 西北工业大学 | Classified routing method and device based on data frequency |
CN108509616B (en) * | 2018-03-30 | 2022-03-08 | 北京怡生乐居信息服务有限公司 | Data processing method and system |
CN109740037B (en) * | 2019-01-02 | 2023-11-24 | 山东省科学院情报研究所 | Multi-source heterogeneous flow state big data distributed online real-time processing method and system |
CN110071964B (en) * | 2019-03-26 | 2022-03-15 | 罗克佳华科技集团股份有限公司 | File synchronization method, device, file sharing network, file sharing system and storage medium |
CN110209727B (en) * | 2019-04-04 | 2020-08-11 | 特斯联(北京)科技有限公司 | Data storage method, terminal equipment and medium |
CN110134331B (en) * | 2019-04-26 | 2020-06-05 | 重庆大学 | Routing path planning method, system and readable storage medium |
CN110674116B (en) * | 2019-09-25 | 2022-05-03 | 四川长虹电器股份有限公司 | System and method for checking and inserting data repetition of database based on swoole |
CN111158948B (en) * | 2019-12-30 | 2024-04-09 | 深信服科技股份有限公司 | Data storage and verification method and device based on deduplication and storage medium |
CN112148928B (en) * | 2020-09-18 | 2024-02-20 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
CN111966649B (en) * | 2020-10-21 | 2021-01-01 | 中国人民解放军国防科技大学 | Lightweight online file storage method and device capable of efficiently removing weight |
CN113420400B (en) * | 2021-07-06 | 2023-06-30 | 北京字跳网络技术有限公司 | Routing relation establishment method, request processing method, device and equipment |
CN113625968B (en) * | 2021-08-12 | 2024-03-01 | 网易(杭州)网络有限公司 | File authority management method and device, computer equipment and storage medium |
CN115988002B (en) * | 2023-02-16 | 2023-08-15 | 荣耀终端有限公司 | Data transmission method and electronic equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101539950A (en) * | 2009-05-08 | 2009-09-23 | 成都市华为赛门铁克科技有限公司 | Data storage method and device |
US9292530B2 (en) * | 2011-06-14 | 2016-03-22 | Netapp, Inc. | Object-level identification of duplicate data in a storage system |
CN102968498B (en) * | 2012-12-05 | 2016-08-10 | 华为技术有限公司 | Data processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106201771A (en) | 2016-12-07 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
GR01 | Patent grant | |