CN104008152B - Architecture method for a distributed file system supporting mass data access
- Publication number
- CN104008152B (application number CN201410216506.6A)
- Authority
- CN
- China
- Prior art keywords
- node
- file
- motion
- distributed
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/134—Distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Abstract
The invention discloses an architecture method for a distributed file system that supports mass data access. The method is based on a distributed hash table: the node responsible for a file is obtained by hashing the file path. The system uses a fully distributed, decentralized architecture, and a new node can join the cluster after a few rounds of communication. Nodes address one another with the Kademlia algorithm, which partitions the routing table and computes the distance between nodes with the XOR operation so that a request can be redirected to the nearest neighbor. Among the nodes a file maps to, a leader is elected with the PaxosLease algorithm to handle operations, which resolves consistency problems. The actual file data is stored in fixed-size blocks that are redundantly backed up on several nodes, meeting the needs of data safety and distributed computing. A system built this way can significantly improve processing efficiency when handling massive numbers of files, and it also performs well in environments that require low latency.
Description
Technical field
The present invention relates to the field of distributed file system research, and in particular to an architecture method for a distributed file system that supports mass data access.
Background
With the development of Internet technology, "cloud computing" is receiving growing attention. It fuses traditional technologies such as distributed computing, parallel computing, utility computing, network storage, virtualization, and load balancing into a new, user-oriented service concept. "Cloud storage" is among the cloud services closest to ordinary Internet users.
In early distributed file systems, files and their metadata had no redundant backup, so once a server failed, the files stored on it became unavailable. As the number of files grew, such systems also became larger, harder to scale, and harder to manage. Modern distributed file systems pay more attention to the metadata distribution strategy: separating file metadata from data storage improves concurrency and service availability and makes full use of the disk I/O of the machines in the cluster that actually store the data.
Common distributed file systems today include GFS, HDFS, Lustre, and MogileFS, each suited to a different field. The most active is HDFS on Hadoop, whose architecture is shown in Fig. 8. It targets distributed computing and uses a single metadata server. The system is simple and suits large files: files written by appending often reach hundreds or thousands of GB and are stored in blocks. HDFS copes well with distributed data processing and computation, and there are many successful cases. However, its single master node easily becomes a bottleneck and is a single point of failure. MogileFS supports reading and writing large numbers of small files and replicates files automatically, but it does not support random reads and writes, depends heavily on a database, and also has a single point of failure. Lustre uses object storage technology and suits reading and writing large files. It splits large files into stripes and relies on RAID on the storage nodes for reliability, so the system itself does not provide redundant backup with multiple replicas.
Summary of the invention
The main object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing an architecture method for a distributed file system that supports mass data access. The system draws on the strengths of various distributed file systems, and its decentralized architecture and redundant backup mechanism give upper layers a safe, reliable, and efficient distributed file access service as well as access to massive data.
The object of the invention is achieved by the following technical scheme. The architecture method for a distributed file system supporting mass data access comprises the following steps:
(1) Use a non-blocking network communication framework; on Linux, use the epoll selector, so that the system still performs well under a large number of connections and high I/O load;
(2) Use simple and efficient remote procedure calls (RPC, Remote Procedure Call) based on dynamic proxies to reduce system complexity;
(3) As in a traditional C/S architecture, the client accesses the file system through an API. The nodes in the cluster communicate with one another over Ethernet, and each node maintains a routing table, metadata, and file data. The client can operate on files by connecting to any node that has registered the service;
(4) Files are mapped to nodes by a consistent hashing algorithm, which makes file distribution independent of the number of nodes and minimizes both the impact of node joins and departures on the system and the amount of data migrated. The distributed hash table uses the Kademlia algorithm, which minimizes the time spent locating a file;
(5) Large files are split into blocks, and both the data and the metadata of a file are backed up on 3 different nodes, so that the system can switch over quickly after a node goes down, keeping data safe and available;
(6) In a fully distributed structure, copies of a file exist on multiple nodes, so an operation on a file must first determine which copy is actually operable. The system uses PaxosLease, an algorithm that can quickly elect a leader among multiple nodes; the leader performs the operation and then synchronizes it to the other copies.
The non-blocking network communication framework in step (1) is MINA, a Java-based NIO library that provides an abstract, event-driven API over TCP/UDP. It also has a well-designed filter chain and a multithreaded handler model: packets are quickly encoded and decoded and then handed to the multithreaded handlers. In a complete RPC call, MINA accounts for about 0.5 milliseconds.
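The readiness-based, non-blocking style described in step (1) can be illustrated with a small sketch. It is written in Python rather than with MINA, purely for brevity: the standard `selectors` module picks an epoll-backed selector on Linux when one is available, which is analogous to, but not the same as, the Java NIO selector the text relies on. The function name `echo_once` and the in-process socket pair are assumptions made for illustration.

```python
import selectors
import socket


def echo_once(payload: bytes) -> bytes:
    """Send payload over a non-blocking socket pair and return the upper-cased echo."""
    sel = selectors.DefaultSelector()  # epoll-backed on Linux when available
    server_side, client_side = socket.socketpair()
    for sock, role in ((server_side, "server"), (client_side, "client")):
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=role)

    client_side.sendall(payload)
    reply = b""
    try:
        for _ in range(10):  # bounded loop so the sketch cannot spin forever
            if reply:
                break
            # select() returns only the sockets that are ready to be read.
            for key, _events in sel.select(timeout=1.0):
                chunk = key.fileobj.recv(4096)
                if key.data == "server":
                    key.fileobj.sendall(chunk.upper())  # stand-in for a request handler
                else:
                    reply = chunk
    finally:
        sel.close()
        server_side.close()
        client_side.close()
    return reply
```

A single thread serves both endpoints here because it only touches sockets that the selector reports as ready, which is the property that keeps such a design responsive under many connections.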
The traditional model of the remote procedure call in step (2) has three layers:
(2-1) Stub/Skeleton layer: the client-side stub (proxy) and the server-side skeleton.
(2-2) Remote Reference layer: handles remote reference behavior.
(2-3) Transport layer: establishes and manages connections and tracks remote objects.
Java's built-in RMI performs excessive exception checking internally and carries unnecessary information during transmission, and stub generation complicates code management (see Fig. 11). The dynamic proxy approach (see Fig. 6) instead generates proxy objects dynamically at run time as needed. The name and parameters of the method to be called are packaged and sent to the server; on receiving the request, the server looks up the registered service entity, invokes the entity's method, packages the return value and any exception, and sends them back to the client.
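The dynamic proxy flow above can be sketched in a few lines. This is an illustration in Python, not the Java implementation the text describes: `ServiceRegistry`, `RemoteProxy`, and `Calculator` are hypothetical names, `pickle` stands in for the packaging step, and the transport is an in-process function call instead of a network connection.

```python
import pickle


class ServiceRegistry:
    """Server side: finds the registered service object and invokes the named method."""

    def __init__(self):
        self._services = {}

    def register(self, name, service):
        self._services[name] = service

    def handle(self, request: bytes) -> bytes:
        # Unpack service name, method name and arguments from the request.
        # (pickle is unsafe on untrusted input; it is used here only for brevity.)
        name, method, args, kwargs = pickle.loads(request)
        try:
            result = getattr(self._services[name], method)(*args, **kwargs)
            return pickle.dumps(("ok", result))
        except Exception as exc:  # the exception is packaged and sent back too
            return pickle.dumps(("error", exc))


class RemoteProxy:
    """Client side: turns any method call into a packaged remote request at run time."""

    def __init__(self, service_name, transport):
        self._service_name = service_name
        self._transport = transport  # callable: request bytes -> response bytes

    def __getattr__(self, method):
        def call(*args, **kwargs):
            request = pickle.dumps((self._service_name, method, args, kwargs))
            status, value = pickle.loads(self._transport(request))
            if status == "error":
                raise value
            return value
        return call


class Calculator:
    def add(self, a, b):
        return a + b

    def divide(self, a, b):
        return a / b


registry = ServiceRegistry()
registry.register("calc", Calculator())
calc = RemoteProxy("calc", registry.handle)  # in-process transport stands in for the network
total = calc.add(2, 3)
```

No stub class is generated ahead of time: the proxy resolves `add` only when it is called, which is the management burden the text says the dynamic approach avoids.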
The metadata in step (3) draws on the ideas of Linux file system inodes and GFS. It includes a file's child node information, file size, permission mode, data block information, and so on, and forms a tree structure. Each file block in the system is 64 MB; large files are stored in blocks, which are tracked by the block information list in the file metadata. The file operation API includes creating a node, checking existence, creating a directory, deleting, listing directory files, and other operations similar to those of other operating systems.
The Kademlia algorithm in step (4) works as follows:
(4-1) Machine characteristics (such as the IP address) and file paths are each hashed to obtain an ID. The system uses the fast and robust 64-bit CityHash algorithm, which is uniform and has a low collision rate.
(4-2) IDs are distributed on a ring of size 2^64. To find the node closest to the ID that the current key maps to, the distances to known nodes must be computed. In the Kademlia algorithm, the distance between two IDs is obtained by the XOR operation:
d(x, y) = x ⊕ y
XOR is unidirectional: for any given node x and distance D, there is always exactly one node y such that d(x, y) = D;
(4-3) The Kad routing table is made up of data structures called K-buckets, which actually store <K,V> mappings. Each K-bucket has an ID and the distance range of the ID values it contains. When enough <K,V> pairs have been inserted, a K-bucket splits; in the final state the Kad routing table of one machine has 64 K-buckets. If a K-bucket is full, entries are replaced using the LRU algorithm, which helps manage the currently active nodes.
(4-4) The Kad routing table is an unbalanced binary segment tree, but the routing table of a single node does not grow very large. The average time complexity of a query is O(log N). Its operations are insertion, deletion, and finding the entry closest to a given ID value.
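Steps (4-1) and (4-2) can be sketched as follows. Several details are assumptions: CityHash is not in the Python standard library, so `hash64` uses 8-byte BLAKE2b as a stand-in 64-bit hash; `bucket_index` follows the common Kademlia convention of indexing buckets by the highest differing bit, which the text does not spell out; and the IP addresses and file path are made up.

```python
import hashlib


def hash64(text: str) -> int:
    """Map a string to a 64-bit ID (BLAKE2b stand-in for the 64-bit CityHash)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def xor_distance(x: int, y: int) -> int:
    """Kademlia distance d(x, y) = x XOR y."""
    return x ^ y


def bucket_index(own_id: int, other_id: int) -> int:
    """Index (0..63 for 64-bit IDs) of the highest bit in which the two IDs differ."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not store itself in a bucket")
    return distance.bit_length() - 1


def closest_nodes(known_ids, target_id, count):
    """Return the `count` known IDs with the smallest XOR distance to target_id."""
    return sorted(known_ids, key=lambda node_id: node_id ^ target_id)[:count]


# Hypothetical cluster: node IDs come from IP addresses, keys from file paths.
nodes = {hash64(ip): ip for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")}
key = hash64("/data/logs/app.log")
owner_ip = nodes[closest_nodes(nodes, key, 1)[0]]
```

The unidirectional property in (4-2) follows directly from XOR being its own inverse: for a node `x` and a distance `D`, the only `y` with `x ^ y == D` is `y = x ^ D`.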
The large-file blocking strategy in step (5) follows the blocking strategy of HDFS: files larger than 64 MB are split into blocks, each 64 MB by default. Each block is backed up on 3 neighboring nodes. File writes use append mode by default, and the write process is chained: each node forwards the data to the next node after receiving it. A write is considered successful once the data has been verified on at least the first node; if a block fails to be written, the thread that checks data backups initiates synchronization.
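The blocking rule can be made concrete with a short sketch. The function name `split_into_blocks`, the tuple layout, and reading "64M" as 64 × 1024 × 1024 bytes are assumptions made for illustration.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed meaning of the default "64M" block size


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return (sequence number, offset, length) for each block of a file."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((len(blocks), offset, length))
        offset += length
    return blocks
```

Under these assumptions a 150 MiB file yields three blocks of 64, 64, and 22 MiB, while a file no larger than one block stays in a single piece.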
The PaxosLease algorithm in step (6) is as follows:
When a proposer (Proposer) puts forward a proposal, the proposal is passed only if it is approved by more than half of the acceptors (Acceptor); only then is it synchronized to all the learners (Learner) that carry out proposals. Acceptors and the messengers that deliver messages do not work full time (which corresponds to node and network failures in a distributed system), so a proposal is considered passed as long as more than half of the acceptors (1+n/2) have approved it.
Its constraints include:
P1: An acceptor must accept the first proposal it receives;
P2: Once a proposal with value v is approved (every proposal must carry a value; for a real-world tax proposal, for example, the value could be the tax rate), every proposal approved afterwards must have value v.
Approving a value v means that multiple acceptors have accepted it, so P2 can be strengthened to:
P2a: Once a proposal with value v is approved, every proposal that any acceptor accepts afterwards must have value v.
Because communication is asynchronous, constraint P2a can conflict with constraint P1. Suppose that after a value v has been approved, a proposer and an acceptor wake from dormancy and the former issues a proposal with a new value. By P1 the latter should accept it, but by P2a it should not, so P2a and P1 contradict each other in this scenario. The behavior of proposers therefore needs to be constrained:
P2b: Once a proposal with value v is approved, every proposal that any proposer issues afterwards must have value v.
Constraint P2b implies constraint P2a and is stronger, but it is hard to implement directly. A constraint P2c that implies P2b can be found:
P2c: If a proposal numbered n has value v, then there exists a majority such that either none of its members has accepted any proposal numbered less than n, or the highest-numbered proposal among all proposals numbered less than n that its members have accepted has value v.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) The invention draws on the strengths of various distributed file systems, such as the file blocking of HDFS, and on that basis proposes a distributed file system based on hash mapping. Each node serves both as a data access node and as a metadata storage node, and can also act as a relay node for routing queries and redirection. This removes the traditional single point of failure and the pressure of having a single node maintain all metadata, which greatly improves system stability.
(2) The invention is a fully distributed architecture in which every node is an inexpensive PC whose computing and I/O capacity is fully exploited. The impact of a node's departure on data migration and on the system is minimized, and nodes can be added flexibly, so the system is highly scalable.
(3) In the method, the lookup for a file operation uses the Kademlia algorithm, whose time complexity is logarithmic, which minimizes the time spent on lookup. Operations among the 3 specific replicas go through a leader elected by PaxosLease, which is highly reliable. Operations in both stages have very low latency.
Brief description of the drawings
Fig. 1 is a diagram of the two-layer system architecture of the present embodiment.
Fig. 2 is the RPC communication model of the present embodiment.
Fig. 3 is a schematic diagram of the transmission channel during a file data write in the present embodiment.
Fig. 4 illustrates one K-bucket insertion in the Kademlia algorithm of the present embodiment.
Fig. 5 is a schematic diagram of one ID lookup in the Kademlia algorithm of the present embodiment.
Fig. 6 is a schematic diagram of the dynamic proxy.
Fig. 7 is a schematic diagram of one round of competition in the PaxosLease algorithm of the present embodiment.
Fig. 8 is the architecture diagram of HDFS, the Hadoop file system, in the prior art.
Fig. 9 is the MINA network architecture diagram.
Fig. 10 is the RPC structural framework diagram.
Fig. 11 is a schematic diagram of a call through Java's own RPC framework (RMI).
Embodiment
The present invention is described in further detail below with reference to an embodiment and the accompanying drawings, but embodiments of the invention are not limited thereto.
Embodiment 1
The hardware network structure used by the distributed file system supporting mass data access in this embodiment is shown in Fig. 1. It is a two-layer architecture comprising clients and several servers, where each server contains a name node (NameNode) and a data node (DataNode). As in a traditional C/S architecture, the client accesses the file system through an API, the nodes in the cluster communicate with one another over Ethernet, and each node maintains a routing table, metadata, and file data. A client can: a. connect to an arbitrary node; b. connect to a specific server. The client can operate on files by connecting to any node that has registered the service.
The architecture method of this embodiment is based on a distributed hash table: the node responsible for a file is obtained by hashing the file path. The system uses a fully distributed, decentralized architecture, and a new node can join the cluster after a few rounds of communication. Nodes address one another with the Kademlia algorithm, which partitions the routing table and obtains the distance between nodes by the XOR operation so that a request can be redirected to the nearest neighbor. Among the nodes a file maps to, a leader is elected with the PaxosLease algorithm to handle operations, which resolves consistency problems. File metadata is mapped to a node by hashing the file's absolute path and is stored on that node. Metadata objects are kept directly in memory to serve accesses, and an image is saved on disk for failure recovery. The actual file data is stored in fixed-size blocks that are redundantly backed up on several nodes, meeting the needs of data safety and distributed computing. The system can significantly improve processing efficiency when handling massive numbers of files, and it also performs well in environments that require low latency. The specific method steps are described below with reference to the drawings.
First, this embodiment uses MINA, a Java-based NIO library; on Linux, the epoll selector is used. The MINA network architecture is shown in Fig. 9.
Second, simple and efficient remote procedure calls (RPC, Remote Procedure Call) based on dynamic proxies are used to reduce system complexity. The RPC communication model is shown in Fig. 2.
The traditional RPC model has three layers, as shown in Fig. 10: the Stub/Skeleton layer, for the client-side stub (proxy) and the server-side skeleton; the Remote Reference layer, for remote reference behavior; and the Transport layer, for establishing and managing connections and tracking remote objects.
Third, as in a traditional C/S architecture, the client accesses the file system through an API, the nodes in the cluster communicate with one another over Ethernet, and each node maintains a routing table, metadata, and file data. The client can operate on files by connecting to any node that has registered the service.
The metadata draws on the ideas of Linux file system inodes and GFS. It includes a file's child node information, file size, permission mode, data block information, and so on, and forms a tree structure. Each file block in the system is 64 MB; large files are stored in blocks, which are tracked by the block information list in the file metadata. The file operation API includes creating a node, checking existence, creating a directory, deleting, listing directory files, and other operations similar to those of other operating systems. For example, the two metadata structures defined in this embodiment are INode and BlockInfo. INode contains fsVersion, path, type, mode, createTime, modifyTime, children, size, blockInfos, and other information; BlockInfo contains path, blocklength, offset, seqNum, replica, and other information.
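The two structures can be sketched as plain data classes. The field names come from the text, rewritten in snake_case (`node_type` corresponds to `type`); the field types, the default values, and the helper `append_block` are assumptions, since the text lists only the names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockInfo:
    path: str            # path of the file the block belongs to
    block_length: int    # "blocklength": bytes stored in this block
    offset: int          # byte offset of the block within the file
    seq_num: int         # "seqNum": position of the block in the file
    replica: List[str] = field(default_factory=list)  # assumed: nodes holding a copy


@dataclass
class INode:
    fs_version: int
    path: str
    node_type: str       # "type" in the text, e.g. file or directory (assumed values)
    mode: int            # permission mode
    create_time: float
    modify_time: float
    children: List[str] = field(default_factory=list)          # child node information
    size: int = 0
    block_infos: List[BlockInfo] = field(default_factory=list)  # the block information list

    def append_block(self, block: BlockInfo) -> None:
        """Hypothetical helper: record a newly written block and grow the file size."""
        self.block_infos.append(block)
        self.size += block.block_length
```

Keeping the block list inside the inode means that one metadata lookup by path is enough to locate every block of a file.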
Fourth, files are mapped to nodes by a consistent hashing algorithm; the distributed hash table uses the Kademlia algorithm, which minimizes the time spent locating a file.
The Kademlia algorithm works as follows:
(4-1) Machine characteristics (such as the IP address) and file paths are each hashed to obtain an ID. The system uses the fast and robust 64-bit CityHash algorithm, which is uniform and has a low collision rate.
(4-2) IDs are distributed on a ring of size 2^64. To find the node closest to the ID that the current key maps to, the distances to known nodes must be computed. In the Kademlia algorithm, the distance between two IDs is obtained by the XOR operation:
d(x, y) = x ⊕ y
XOR is unidirectional: for any given node x and distance D, there is always exactly one node y such that d(x, y) = D;
(4-3) The Kad routing table is made up of data structures called K-buckets, which actually store <K,V> mappings. Each K-bucket has an ID and the distance range of the ID values it contains. When enough <K,V> pairs have been inserted, a K-bucket splits; in the final state the Kad routing table of one machine has 64 K-buckets. If a K-bucket is full, entries are replaced using the LRU algorithm, which helps manage the currently active nodes.
(4-4) The Kad routing table is an unbalanced binary segment tree, but the routing table of a single node does not grow very large. The average time complexity of a query is O(log N). Its operations are insertion, deletion, and finding the entry closest to a given ID value.
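The LRU replacement rule in (4-3) can be sketched with an ordered dictionary. The class name `KBucket`, its `touch` method, and the capacity parameter are hypothetical. The sketch follows the text literally and evicts the least recently seen entry when the bucket is full; bucket splitting is left out.

```python
from collections import OrderedDict


class KBucket:
    """Fixed-capacity bucket of node ID -> address, ordered from least to most recently seen."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def touch(self, node_id: int, address: str):
        """Record contact with a node; return the evicted node ID, or None."""
        if node_id in self.entries:
            self.entries.move_to_end(node_id)  # mark as most recently seen
            self.entries[node_id] = address
            return None
        evicted = None
        if len(self.entries) >= self.capacity:
            evicted, _ = self.entries.popitem(last=False)  # drop least recently seen
        self.entries[node_id] = address
        return evicted
```

Nodes that keep responding are moved to the fresh end of the bucket on every contact, so the entries that fall out are the ones that have been silent longest.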
A lookup with the Kademlia algorithm is described below with reference to Figs. 4 and 5:
(1) Splitting of K-buckets
Every machine has a Kad routing table, whose K-buckets actually store <K,V> mappings. Each K-bucket has an ID and the distance range of the ID values it contains. When enough <K,V> pairs have been inserted, a K-bucket splits. See Fig. 4.
(2) ID lookup
Setup:
Node ID | Routing information |
0 | 0,1,11,15 |
1 | 1,2,10,15 |
2 | 2,3,11,13 |
3 | 3,4,12,14 |
4 | 4,5,12,13 |
5 | 5,6,13,15 |
6 | 6,7,12,14 |
7 | 7,8,10,12 |
8 | 8,9,11,13 |
9 | 3,9,10,15 |
10 | 0,6,10,11 |
11 | 0,7,11,12 |
12 | 0,9,12,13 |
13 | 1,8,13,14 |
14 | 2,7,14,15 |
15 | 0,9,12,15 |
In this embodiment, node 13 is looked up starting from node 0. Using the table above and Fig. 5, the lookup proceeds as follows:
a) At node 0, the findNear (find nearest nodes) operation returns nodes 0, 11, and 15 as the nodes that may know node 13. Node 0 has already been visited and is not visited again.
b) Nodes 0, 11, and 12 are obtained from node 11, and nodes 0, 12, and 15 are obtained from node 15. Nodes 0, 11, and 15 have already been visited and are not visited again. The remaining node, 12, is the next hop.
c) Node 12 returns a hit for node 13, together with node 13's IP address.
Thus an ID lookup in a Kad network requires no more than log N RPC requests. As running time increases, Kad routing information becomes richer: neighboring nodes know one another's state better, and popular remote nodes also come to be trusted and retained. In the ideal case a node lookup completes in 1 to 2 communications, an advantage that other DHT technologies do not have.
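The worked example can be replayed with a small simulation over the routing table given above. Two points are inferred from the example rather than stated in the text: `find_near` returns the three closest routing entries, and closeness is measured as circular distance on a 16-ID ring, which is the measure that reproduces the candidate sets in steps a) to c). The names `ROUTES`, `ring_distance`, `find_near`, and `lookup` are hypothetical.

```python
# Routing information per node, copied from the table in the example.
ROUTES = {
    0: [0, 1, 11, 15],   1: [1, 2, 10, 15],   2: [2, 3, 11, 13],   3: [3, 4, 12, 14],
    4: [4, 5, 12, 13],   5: [5, 6, 13, 15],   6: [6, 7, 12, 14],   7: [7, 8, 10, 12],
    8: [8, 9, 11, 13],   9: [3, 9, 10, 15],   10: [0, 6, 10, 11],  11: [0, 7, 11, 12],
    12: [0, 9, 12, 13],  13: [1, 8, 13, 14],  14: [2, 7, 14, 15],  15: [0, 9, 12, 15],
}


def ring_distance(a: int, b: int, size: int = 16) -> int:
    """Circular distance between two IDs (assumed measure for this example)."""
    d = abs(a - b) % size
    return min(d, size - d)


def find_near(node: int, target: int, k: int = 3):
    """The k routing entries of `node` that lie closest to `target`."""
    return sorted(ROUTES[node], key=lambda n: (ring_distance(n, target), n))[:k]


def lookup(start: int, target: int, k: int = 3):
    """Return the nodes queried in each round until one of them knows `target`."""
    visited, frontier, rounds = set(), [start], []
    while frontier:
        rounds.append(list(frontier))
        visited.update(frontier)
        next_frontier = []
        for node in frontier:
            near = find_near(node, target, k)
            if target in near:
                return rounds  # hit: this node can supply the target's address
            for candidate in near:
                if candidate not in visited and candidate not in next_frontier:
                    next_frontier.append(candidate)
        frontier = next_frontier
    return None
```

Under these assumptions `lookup(0, 13)` queries node 0, then nodes 11 and 15, then node 12, matching steps a) to c).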
Fifth, large files are split into blocks, and both the data and the metadata of a file are backed up on 3 different nodes, so that the system can switch over quickly after a node goes down, keeping data safe and available.
The large-file blocking strategy follows the blocking strategy of HDFS: files larger than 64 MB are split into blocks, each 64 MB by default. Each block is backed up on 3 neighboring nodes, as shown in Fig. 3. File writes use append mode by default, and the write process is chained: each node forwards the data to the next node after receiving it. A write is considered successful once the data has been verified on at least the first node; if a block fails to be written, the thread that checks data backups initiates synchronization.
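The chained write can be sketched as below. The names `StorageNode` and `write_block`, the CRC32 checksum, and the `healthy` flag used to simulate a failed node are assumptions; the text specifies only that data is verified and forwarded node by node, and that failed copies are repaired later by a checking thread.

```python
import zlib
from typing import Dict, List, Tuple


class StorageNode:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy  # False simulates a node that is down
        self.blocks: Dict[str, bytes] = {}

    def receive(self, block_id: str, data: bytes, checksum: int,
                downstream: List["StorageNode"]) -> List[str]:
        """Verify and store the block, forward it down the chain, return names that stored it."""
        if not self.healthy or zlib.crc32(data) != checksum:
            return []  # the chain stops here
        self.blocks[block_id] = data
        acked = [self.name]
        if downstream:
            acked += downstream[0].receive(block_id, data, checksum, downstream[1:])
        return acked


def write_block(block_id: str, data: bytes,
                chain: List[StorageNode]) -> Tuple[bool, List[str]]:
    """Write through the chain; succeed if at least the first node verified the data."""
    acked = chain[0].receive(block_id, data, zlib.crc32(data), chain[1:])
    succeeded = len(acked) >= 1
    # Nodes that missed the block are left for the backup-checking thread to synchronize.
    pending_sync = [node.name for node in chain if node.name not in acked]
    return succeeded, pending_sync
```

With a chain of three nodes in which the second is down, the write still succeeds on the first node and the other two are reported as pending synchronization.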
Sixth, the system uses PaxosLease, an algorithm that can quickly elect a leader among multiple nodes; the leader performs the operation and then synchronizes it to the other copies. The algorithm is as follows:
When a proposer (Proposer) puts forward a proposal, the proposal is passed only if it is approved by more than half of the acceptors (Acceptor); only then is it synchronized to all the learners (Learner) that carry out proposals. Acceptors and the messengers that deliver messages do not work full time (which corresponds to node and network failures in a distributed system), so a proposal is considered passed as long as more than half of the acceptors (1+n/2) have approved it.
The PaxosLease procedure is described below with reference to Fig. 7:
1) A proposer wants to obtain a lease of T seconds (T<M). It first prepares a proposal number [request.ballotNumber] and sends it to a majority of the acceptors.
2) On receiving a request, an acceptor checks whether the request's proposal number [request.ballotNumber] is greater than the highest proposal number it has promised in its local state [state.highestPromised]. If it is lower, the acceptor may ignore the request or send back a rejection response. If it is equal or greater, the acceptor constructs a prepare response containing the currently accepted proposal [state.acceptedProposal], which is either empty or the current leader. The acceptor also sets its local highest promised number to the request's proposal number, and sends the highest promised number back to the proposer together with the currently accepted proposal.
3) The proposer examines the prepare responses returned by the acceptors. If a majority of acceptors reply with an empty accepted proposal, they can accept a new proposal: the proposer takes itself as the lease owner, that is, the leader, and starts a countdown T, after which the lease expires. The proposer then sends a propose request consisting of the countdown T, the proposal number, and the proposal value to all acceptors.
4) On receiving a propose request, an acceptor checks whether the number [request.ballotNumber] is greater than the highest number promised in its local state. If it is lower, the acceptor ignores the request or sends back a rejection response. If it is equal or greater, the acceptor accepts the proposal: it sets the highest proposal number, starts a countdown T, and sets the lease owner (leader). It then constructs and sends back a propose response containing the proposal number. When the countdown times out, the lease owner in the local state is set to empty. Acceptors never reset their highest promised number unless the system restarts.
5) The proposer examines the propose responses it receives. If a majority of acceptors reply that they accepted the proposal, the proposer holds the lease until the countdown set in step 3 times out. On receiving the majority of replies, the proposer changes its own state to "lease held".
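Steps 1) to 5) can be condensed into an in-memory sketch. It is a simplification under stated assumptions: messages are direct method calls, time is passed in explicitly as `now` instead of using real timers, and the names `Acceptor`, `on_prepare`, `on_propose`, and `try_acquire_lease` are hypothetical. The comparison rule (reject only a lower number) follows the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    highest_promised: int = 0          # state.highestPromised
    lease_owner: Optional[str] = None  # state.acceptedProposal: None or the current leader
    lease_expires_at: float = 0.0

    def _expire(self, now: float) -> None:
        # Countdown timeout: the lease owner in the local state becomes empty.
        if self.lease_owner is not None and now >= self.lease_expires_at:
            self.lease_owner = None

    def on_prepare(self, ballot: int, now: float) -> Optional[Tuple[int, Optional[str]]]:
        self._expire(now)
        if ballot < self.highest_promised:
            return None  # reject (or ignore) a lower proposal number
        self.highest_promised = ballot
        return ballot, self.lease_owner

    def on_propose(self, ballot: int, owner: str, duration: float, now: float) -> bool:
        self._expire(now)
        if ballot < self.highest_promised:
            return False
        self.highest_promised = ballot
        self.lease_owner = owner
        self.lease_expires_at = now + duration  # acceptor starts its own countdown T
        return True


def try_acquire_lease(proposer: str, ballot: int, acceptors: List[Acceptor],
                      duration: float, now: float) -> bool:
    """Run the prepare and propose phases; True if the proposer now holds the lease."""
    majority = len(acceptors) // 2 + 1
    promises = [a.on_prepare(ballot, now) for a in acceptors]
    free = [p for p in promises if p is not None and p[1] is None]
    if len(free) < majority:
        return False  # a majority did not report an empty accepted proposal
    accepted = [a.on_propose(ballot, proposer, duration, now) for a in acceptors]
    return sum(accepted) >= majority
```

With three acceptors, proposer A obtains a 10-second lease at time 0, proposer B is refused at time 5 while the lease is still held, and B succeeds at time 11 once the acceptors' countdowns have expired.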
The embodiment above is a preferred embodiment of the invention, but embodiments of the invention are not limited by it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the invention is an equivalent replacement and falls within the scope of protection of the invention.
Claims (8)
1. An architecture method for a distributed file system supporting mass data access, characterized by comprising the following steps:
(1) using a non-blocking network communication framework, with the epoll selector on Linux;
(2) using remote procedure calls based on dynamic proxies; the traditional model of the remote procedure call has three layers:
(2-1) stub/skeleton layer: for the client-side stub (proxy) and the server-side skeleton;
(2-2) remote reference layer: for remote reference behavior;
(2-3) transport layer: for establishing and managing connections and tracking remote objects;
the dynamic proxy mode in step (2) is: proxy objects are generated dynamically at run time as needed; the name and parameters of the method to be called are packaged and sent to the server; on receiving the request, the server looks up the registered service entity, invokes the entity's method, packages the return value and any exception, and sends them to the client;
(3) the client accesses the file system through an API, the nodes in the cluster communicate with one another over Ethernet, and each node maintains a routing table, metadata, and file data; the client operates on files by connecting to any node that has registered the service;
(4) files are mapped to nodes by a consistent hashing algorithm, and the distributed hash table uses the Kademlia algorithm; nodes address one another with the Kademlia algorithm, which partitions the routing table and obtains the distance between nodes by the XOR operation so as to redirect to the nearest neighbor;
(5) large files are split into blocks, and both the data and the metadata of a file are backed up on several different nodes; the large-file blocking step is: files larger than 64 MB are split into blocks, each 64 MB by default; each block is backed up on 3 neighboring nodes; file writes use append mode by default, and the write process is chained: each node forwards the data to the next node after receiving it; a write is considered successful once the data has been verified on at least the first node; if a block fails to be written, the thread that checks data backups initiates synchronization;
(6) in the fully distributed structure, copies of a file exist on multiple nodes; when a file is operated on, the PaxosLease algorithm elects a leader among the multiple nodes, and the leader performs the operation and then synchronizes it to the other copies.
2. The architecture method for a distributed file system supporting mass data access according to claim 1, characterized in that the non-blocking network communication framework in step (1) is MINA, a Java-based NIO library that provides an abstract, event-driven API over TCP/UDP, encodes and decodes packets, and hands them to the multithreaded handlers.
3. The architecture method for a distributed file system supporting mass data access according to claim 1, characterized in that the metadata in step (3) draws on the ideas of Linux file system inodes and GFS, includes a file's child node information, file size, permission mode, and data block information, and forms a tree structure; each file block in the system is 64 MB; large files are stored in blocks, which are tracked by the block information list in the file metadata; the file operation API includes creating a node, checking existence, creating a directory, deleting, and listing directory files.
4. The architecture method for a distributed file system supporting mass data access according to claim 1, characterized in
that the steps of the Kademlia algorithm described in step (4) are as follows:
(4-1) the machine characteristics and the file path each yield an ID through a hash operation;
(4-2) the IDs are distributed on a ring of size 2^64; to find the node closest to the ID that the current key maps to, the
distance to each known node must be calculated; in the Kademlia algorithm, the distance between two IDs is obtained by XOR:
d(x, y) = x ⊕ y;
XOR is unidirectional: for any given node x and distance D, there is always exactly one node y such
that d(x, y) = D;
(4-3) the Kad routing table is made up of data structures called K-buckets; what a K-bucket actually stores is <K,V> pair mappings; each K-bucket
has an ID and the distance range of the ID values it contains; when enough <K,V> pairs have been inserted, the K-bucket splits;
in its final state, the Kad routing table of a machine contains 64 K-buckets; if a K-bucket is full, entries are replaced using the LRU algorithm;
(4-4) the Kad routing table is an unbalanced segment binary tree, whose operations are insertion, deletion, and lookup of the entry closest to a given ID
value.
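Steps (4-1) to (4-3) can be sketched as follows. This is an illustrative outline, not the patented routing table: `make_id` uses BLAKE2b truncated to 64 bits as a stand-in hash (an assumption, chosen because it is in the Python standard library), and the `bucket_index`, `closest`, and `KBucket` names are hypothetical. Bucket splitting and the segment-tree layout of step (4-4) are not modeled.

```python
import hashlib
from collections import OrderedDict

ID_BITS = 64  # IDs live in a space of size 2^64


def make_id(text: str) -> int:
    """Map a machine characteristic or file path to a 64-bit ID (stand-in hash)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def distance(x: int, y: int) -> int:
    """XOR distance d(x, y) = x XOR y."""
    return x ^ y


def bucket_index(own_id: int, other_id: int) -> int:
    """Index of the highest differing bit (0..63); -1 if the IDs are equal."""
    return distance(own_id, other_id).bit_length() - 1


def closest(known_ids, key_id):
    """Return the known node ID with the smallest XOR distance to key_id."""
    return min(known_ids, key=lambda node_id: distance(node_id, key_id))


class KBucket:
    """Fixed-capacity bucket of <K,V> pairs with LRU replacement when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first

    def put(self, node_id, address):
        if node_id in self.entries:
            self.entries.move_to_end(node_id)  # refresh as most recently used
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        self.entries[node_id] = address
```

The unidirectional property follows directly from XOR: for a node x and distance D, the only y with d(x, y) = D is y = x XOR D.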
5. The architecture method for a distributed file system supporting mass data access according to claim 4, characterized in
that in step (4-1) the machine characteristics and the file path each yield an ID through the 64-bit CityHash algorithm.
6. The architecture method for a distributed file system supporting mass data access according to claim 1, characterized in
that the PaxosLease algorithm described in step (6) is specifically as follows:
When a proposer puts forward a proposal, the proposal must be approved by more than half of the acceptors before it is
passed; only then can it be synchronized to the records of all parties that execute the proposal; its constraints include:
P1: an acceptor must accept the first proposal it receives;
P2: once a proposal with value v has been chosen, every proposal chosen afterwards must have value v.
7. The architecture method for a distributed file system supporting mass data access according to claim 6, characterized in
that the above constraint P2 is further strengthened:
P2a: once a proposal with value v has been chosen, every proposal subsequently accepted by any acceptor must
have value v;
meanwhile, the behavior of the proposer is constrained:
P2b: once a proposal with value v has been chosen, every proposal subsequently issued by any proposer must have
value v.
8. The architecture method for a distributed file system supporting mass data access according to claim 7, characterized in
that the constraints further include:
P2c: if a proposal numbered n has value v, then there exists a majority of acceptors such that either none of them
has accepted any proposal numbered less than n, or v is the value of the highest-numbered proposal among all
proposals numbered less than n that they have accepted.
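Constraint P2c translates into a rule for how a proposer picks the value of proposal n from the replies of a majority of acceptors. The sketch below covers only that value-selection rule, not lease negotiation or leader election; the `choose_value` function and the reply format are hypothetical names introduced for illustration.

```python
def choose_value(responses, total_acceptors, own_value):
    """Pick the value for a new proposal so that P2c holds.

    responses: one entry per acceptor that replied; each entry is either
        None (the acceptor has accepted no lower-numbered proposal) or a
        tuple (accepted_number, accepted_value).
    Returns None when the replies do not form a majority; otherwise the
    value of the highest-numbered accepted proposal, or own_value if no
    replying acceptor has accepted anything yet.
    """
    if len(responses) <= total_acceptors // 2:
        return None  # more than half of the acceptors are required
    accepted = [reply for reply in responses if reply is not None]
    if not accepted:
        return own_value  # the proposer is free to use its own value
    highest = max(accepted, key=lambda reply: reply[0])
    return highest[1]
```

With five acceptors, replies of `None`, `(2, "a")`, and `(5, "b")` form a majority, so the proposer must propose `"b"` rather than its own value.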
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410216506.6A CN104008152B (en) | 2014-05-21 | 2014-05-21 | Support the framework method of the distributed file system of mass data access |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104008152A (en) | 2014-08-27 |
CN104008152B (en) | 2017-12-01 |
Family
ID=51368809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410216506.6A Active CN104008152B (en) | 2014-05-21 | 2014-05-21 | Support the framework method of the distributed file system of mass data access |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104008152B (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016051512A1 (en) * | 2014-09-30 | 2016-04-07 | 株式会社日立製作所 | Distributed storage system |
CN104317947B (en) * | 2014-11-07 | 2017-12-12 | 南京烽火星空通信发展有限公司 | A kind of real-time architecture comparing system based on mass data |
CN104462335B (en) * | 2014-12-03 | 2017-12-29 | 北京和利时系统工程有限公司 | A kind of method and server agent for accessing data |
US10983732B2 (en) * | 2015-07-13 | 2021-04-20 | Pure Storage, Inc. | Method and system for accessing a file |
CN106557509A (en) * | 2015-09-29 | 2017-04-05 | 镇江雅迅软件有限责任公司 | A kind of distributed file system |
CN105630973A (en) * | 2015-12-25 | 2016-06-01 | 深圳市中博科创信息技术有限公司 | File storage method of cluster file system and cluster file system |
CN106210038B (en) * | 2016-07-06 | 2019-01-29 | 网易(杭州)网络有限公司 | The processing method and system of data operation request |
CN106708439A (en) * | 2016-12-23 | 2017-05-24 | 深圳市中博科创信息技术有限公司 | Node selection and calculation method and system in distributed file system |
CN106709045B (en) * | 2016-12-29 | 2020-09-15 | 北京同有飞骥科技股份有限公司 | Node selection method and device in distributed file system |
CN106686117B (en) * | 2017-01-20 | 2020-04-03 | 郑州云海信息技术有限公司 | Data storage processing system and method of distributed computing cluster |
CN106936899B (en) * | 2017-02-25 | 2021-02-05 | 九次方大数据信息集团有限公司 | Configuration method of distributed statistical analysis system and distributed statistical analysis system |
CN106789632A (en) * | 2017-02-25 | 2017-05-31 | 郑州云海信息技术有限公司 | A kind of method of the node-routing of large-scale distributed storage system |
CN110019501A (en) * | 2017-08-24 | 2019-07-16 | 深圳市金证科技股份有限公司 | A kind of collecting method, device and terminal device |
CN107832138B (en) * | 2017-09-21 | 2021-09-14 | 南京邮电大学 | Method for realizing flattened high-availability namenode model |
CN107613026A (en) * | 2017-10-31 | 2018-01-19 | 四川仕虹腾飞信息技术有限公司 | Distributed file management system based on cloud storage system |
CN108319634B (en) * | 2017-12-15 | 2021-08-06 | 深圳创新科技术有限公司 | Directory access method and device for distributed file system |
CN110071870B (en) * | 2018-01-24 | 2022-03-18 | 苏宁云商集团股份有限公司 | Alluxio-based routing method and device for multiple HDFS clusters |
CN108462737B (en) * | 2018-01-29 | 2021-02-02 | 哈尔滨工业大学深圳研究生院 | Batch processing and pipeline-based hierarchical data consistency protocol optimization method |
CN110120961B (en) * | 2018-02-06 | 2022-04-26 | 北京京东尚科信息技术有限公司 | Distributed service cluster and route synchronization method thereof |
CN109688211A (en) * | 2018-12-18 | 2019-04-26 | 杭州茂财网络技术有限公司 | Data distribution formula processing method |
CN109885550B (en) * | 2018-12-28 | 2022-09-13 | 安徽维德工业自动化有限公司 | File storage system based on all-connected routing layer |
CN111695018B (en) * | 2019-03-13 | 2023-05-30 | 阿里云计算有限公司 | Data processing method and device, distributed network system and computer equipment |
CN110381157A (en) * | 2019-07-26 | 2019-10-25 | 正链科技(深圳)有限公司 | A kind of distributed directional data storage P2P network based on Kademlia algorithm |
CN113220644B (en) * | 2021-05-28 | 2022-04-26 | 北京微纳星空科技有限公司 | File processing method, device, equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102737130A (en) * | 2012-06-21 | 2012-10-17 | 广州从兴电子开发有限公司 | Method and system for processing metadata of hadoop distributed file system (HDFS) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI476610B (en) * | 2008-04-29 | 2015-03-11 | Maxiscale Inc | Peer-to-peer redundant file server system and methods |
CN101840366B (en) * | 2010-05-13 | 2012-05-23 | 上海交通大学 | Storage method of loop chain type n+1 bit parity check code |
Non-Patent Citations (2)
Title |
---|
"Kademlia协议资源定位模型的分析与改进";劳炽元;《中国优秀硕士学位论文全文数据库 信息科技辑》;20110315;第I139-55页 * |
"Paxos 优化算法下的数据库元数据一致性方法研究";周一帆;《现代电子技术》;20130701;第36卷(第13期);第65-67,70页 * |
Also Published As
Publication number | Publication date |
---|---|
CN104008152A (en) | 2014-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104008152B (en) | Support the framework method of the distributed file system of mass data access | |
EP3837652B1 (en) | Distributed blockchain data storage under account model | |
US9558194B1 (en) | Scalable object store | |
Bonvin et al. | A self-organized, fault-tolerant and scalable replication scheme for cloud storage | |
CN103873501B (en) | A kind of cloud standby system and its data back up method | |
JP2021508876A (en) | Simultaneous transaction processing in a high-performance distributed recording system | |
CN109791594A (en) | Data are segmented in order to persistently be stored in multiple immutable data structures | |
CN104391930A (en) | Distributed file storage device and method | |
CN103442057A (en) | Cloud storage system based on user collaboration cloud | |
CN107734026A (en) | A kind of design method, device and the equipment of network attached storage cluster | |
CN106993064A (en) | A kind of system and its construction method and application that the storage of mass data scalability is realized based on Openstack cloud platforms | |
CN102420854A (en) | Distributed file system facing to cloud storage | |
CN101771537A (en) | Processing method and certificating method for distribution type certificating system and certificates of certification thereof | |
US11100094B2 (en) | Taking snapshots of blockchain data | |
CN112559637B (en) | Data processing method, device, equipment and medium based on distributed storage | |
CN113360456B (en) | Data archiving method, device, equipment and storage medium | |
CN105205402A (en) | Privacy cluster metadata separation based cloud storage privacy protection method | |
CN114301972A (en) | Block chain link point hierarchical deployment method and system based on cloud edge cooperation | |
Qin et al. | A secure and effective construction scheme for blockchain networks | |
CN104951475B (en) | Distributed file system and implementation method | |
CN106687943A (en) | Systems and methods to organize a computing system having multiple computers, distribute computing tasks among the computers, and maintain data integrity and redundancy in the computing system | |
CN106919470A (en) | A kind of data reconstruction method and device | |
US11194792B2 (en) | Taking snapshots of blockchain data | |
CN106170012A (en) | Distributed file system that a kind of facing cloud renders and structure and access method | |
CN110362590A (en) | Data managing method, device, system, electronic equipment and computer-readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |