CN112699094A

CN112699094A - File storage method, data retrieval method, corresponding device and system

Info

Publication number: CN112699094A
Application number: CN202110309954.0A
Authority: CN
Inventors: 谢家贵; 郭健; 李志平; 张波; 马旭锋; 魏星; 杨威; 高礼坤
Original assignee: China Academy of Information and Communications Technology CAICT
Current assignee: China Academy of Information and Communications Technology CAICT
Priority date: 2021-03-23
Filing date: 2021-03-23
Publication date: 2021-04-23
Anticipated expiration: 2041-03-23
Also published as: CN112699094B

Abstract

The embodiment of the application provides a file storage method, a data retrieval method, a corresponding device and a corresponding system, wherein the file storage method comprises the following steps: dividing a file to be stored into a plurality of metadata blocks, and calculating the hash value of each metadata block; calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node; determining a storage node corresponding to each metadata block according to the logical distance; and respectively storing the plurality of metadata blocks to corresponding storage nodes. By adopting the scheme in the application, the file is divided into a plurality of metadata blocks, the storage positions of the metadata blocks are determined according to the logical distance between each metadata block and the current node, and the seed server is replaced by the specific data storage and addressing rule, so that the target file can be obtained only through the seed file per se when the file is retrieved, and routing by the seed server is not needed.

Description

File storage method, data retrieval method, corresponding device and system

Technical Field

The present application relates to the field of distributed storage technologies, and in particular, to a file storage method, a data retrieval method, a corresponding apparatus, and a corresponding system.

Background

BitTorrent (BT for short) is a file distribution protocol. Its defining characteristic is that the more users download a given file, the faster the download becomes: the downloaders form a peer-to-peer network in which each downloader provides the data it has already downloaded to the other downloaders, so that each user's upload bandwidth is fully utilized.

The file downloading process based on the BT protocol comprises the following steps: (1) a client obtains a seed file through a web server; (2) the client parses the seed file to obtain the information of the seed server; (3) the client accesses the seed server to obtain a file storage list, from which it obtains the actual storage position of the target file; (4) the seed server adds the IP and the port number of the client to the file storage list; (5) the client requests the target file from other clients according to the IPs and port numbers recorded in the file storage list.

File distribution protocols typified by BT have the following drawbacks. On the one hand, if the seed server goes down or the corresponding routing information is deleted, the target file can no longer be downloaded through the seed, so the protocol can hardly guarantee the availability of the file. On the other hand, the protocol uses a centralized network model that requires the downloading party to trust the seed server completely, while the security of the seed server is guaranteed only by its operator. Once the seed server is attacked, it may maliciously return erroneous information that causes malicious files to be downloaded, so the protocol cannot guarantee the security of the file storage and retrieval process.

Disclosure of Invention

The embodiment of the application provides a file storage method, a data retrieval method, a corresponding device and a system, and aims to solve the problems that in the prior art, a file distribution protocol represented by BT adopts a centralized network model, so that the availability of files is poor and the security of the files is not high enough.

According to a first aspect of embodiments of the present application, there is provided a file storage method, including: dividing a file to be stored into a plurality of metadata blocks, and calculating the hash value of each metadata block; calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node; determining a storage node corresponding to each metadata block according to the logical distance; and respectively storing the plurality of metadata blocks to corresponding storage nodes.

According to a second aspect of embodiments of the present application, there is provided a data retrieval method, the method including: acquiring a retrieval index of a target file, and acquiring a corresponding Merkle tree according to the retrieval index, wherein the retrieval index is the root node hash of the Merkle tree; obtaining hash values of a plurality of metadata blocks forming the target file according to the Merkle tree; calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node; determining a storage node of each metadata block according to the logical distance; and respectively sending metadata block acquisition requests to corresponding storage nodes to acquire corresponding metadata blocks.

According to a third aspect of embodiments of the present application, there is provided a file storage apparatus, the apparatus including: the file segmentation module is used for segmenting a file to be stored into a plurality of metadata blocks and calculating the hash value of each metadata block; the first distance calculation module is used for calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node; the first node determining module is used for determining a storage node corresponding to each metadata block according to the logical distance; and the data block storage module is used for respectively storing the plurality of metadata blocks to corresponding storage nodes.

According to a fourth aspect of embodiments of the present application, there is provided a data retrieval apparatus, the apparatus including: the index processing module is used for acquiring a retrieval index of a target file and acquiring a corresponding Merkle tree according to the retrieval index, wherein the retrieval index is the root node hash of the Merkle tree; the hash acquisition module is used for acquiring hash values of a plurality of metadata blocks forming the target file according to the Merkle tree; the second distance calculation module is used for calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node; the second node determining module is used for determining the storage node of each metadata block according to the logical distance; and the data block acquisition module is used for respectively sending metadata block acquisition requests to the corresponding storage nodes so as to acquire the corresponding metadata blocks.

According to a fifth aspect of embodiments of the present application, there is provided a distributed storage system, the system including a plurality of nodes, each node being configured to: after a file to be stored is obtained, divide the file to be stored into a plurality of metadata blocks, and calculate the hash value of each metadata block; calculate the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node; determine a storage node of each metadata block in the distributed storage system according to the logical distance; store the plurality of metadata blocks to corresponding storage nodes respectively; after a retrieval index of a target file is acquired, acquire a corresponding Merkle tree according to the retrieval index, wherein the retrieval index is the root node hash of the Merkle tree; obtain hash values of a plurality of metadata blocks forming the target file according to the Merkle tree; calculate the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node; determine a storage node of each metadata block in the distributed storage system according to the logical distance; respectively send metadata block acquisition requests to corresponding storage nodes to acquire corresponding metadata blocks; after receiving a metadata block sent by any node in the distributed storage system, store the metadata block; and after receiving a metadata block acquisition request sent by any node in the distributed storage system, provide a corresponding metadata block to that node.

According to the technical scheme, the metadata block addressing is carried out in a logical distance mode, when the file is stored, the file is divided into a plurality of metadata blocks, the metadata blocks are stored on corresponding storage nodes according to the logical distance between each metadata block and the current node, and when data retrieval is carried out, the metadata blocks are requested to the corresponding storage nodes according to the logical distance between each metadata block and the current node. In the scheme, the seed server is replaced by the specific data storage and addressing rule, so that the required metadata block can be obtained only through the seed file when the file is retrieved, routing is not required by the seed server, the metadata block can be ensured to be obtained at any time because the seed server is not required to participate, meanwhile, because the whole network model is decentralized, the risk that the seed server is attacked does not exist, and the safety in the file storage and data retrieval process can be ensured.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 illustrates a schematic diagram of a peer-to-peer network as described in an embodiment of the present application;

FIG. 2 is a flowchart illustrating a file storage method according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating a file storage method according to an embodiment of the present application;

FIG. 4 shows a schematic diagram of the Merkle tree in an embodiment of the application;

FIG. 5 is a diagram illustrating a DHT in an embodiment of the present application;

FIG. 6 is a schematic diagram illustrating the determination of storage nodes for each metadata block based on logical distance;

FIG. 7 is a flow chart illustrating a data retrieval method provided by an embodiment of the present application;

FIG. 8 shows a detailed flowchart after step S250 in the data retrieval method provided in the embodiment of the present application;

FIG. 9 is a schematic diagram illustrating how data redundancy is provided through subscription in an embodiment of the present application;

FIG. 10 is a schematic diagram of a file storage device provided by an embodiment of the present application;

FIG. 11 shows a schematic diagram of a data retrieval device provided in an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the application and are not an exhaustive list of all embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

Based on the technical problems in the file distribution protocol represented by BT in the prior art, the present embodiment provides a file storage method and a data retrieval method, in which a seed server is replaced by a specific data storage and addressing rule, so that the storage locations of metadata blocks constituting a target file can be obtained only by the seed file itself when retrieving the file, without routing by the seed server.

An embodiment of the present application introduces a distributed storage system, where the distributed storage system includes a plurality of nodes, and each node can execute both the file storage method and the data retrieval method of the embodiment. The distributed storage system is implemented on a structured peer-to-peer network, which must be constructed before the method provided by the embodiment is executed. The peer-to-peer network is constructed with the Kademlia algorithm (Kad for short), in which each node in the peer-to-peer network manages the network through a distributed hash table (DHT). After the construction is completed, a peer-to-peer network as shown in FIG. 1 is formed. The specific functions performed by the various nodes in the distributed storage system are described in detail in the method embodiments below.

FIG. 2 shows a flowchart of the file storage method provided in this embodiment. Referring to FIG. 2, the method includes the following steps:

Step S110: Divide the file to be stored into a plurality of metadata blocks, and calculate the hash value of each metadata block.

In this embodiment, the metadata block serves as a basic unit for storage and retrieval.

When a file is stored, the current node divides the file to be stored into a plurality of metadata blocks, and calculates the hash value of each metadata block. In one embodiment, the current node divides the file to be stored evenly into pieces of size 4K, and any residual data smaller than 4K left after the division is also used as a metadata block, so that a plurality of metadata blocks are finally obtained.

Step S120: Calculate the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node.

Step S130: Determine the storage node corresponding to each metadata block according to the logical distance.

Step S140: Store the plurality of metadata blocks on the corresponding storage nodes respectively.
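Step S110 can be sketched as follows. This is a minimal illustration only: the source does not name a hash function, so SHA-256 is an assumption, and the function names `split_into_blocks` and `block_hash` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4 * 1024  # 4K pieces, as in the embodiment described above


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> bytes:
    """Hash one metadata block (SHA-256 is an assumption, not from the source)."""
    return hashlib.sha256(block).digest()


data = bytes(10_000)  # a 10,000-byte stand-in for the file to be stored
blocks = split_into_blocks(data)
hashes = [block_hash(b) for b in blocks]
print([len(b) for b in blocks])  # [4096, 4096, 1808]
```

The residual 1808 bytes form their own metadata block, matching the handling of data smaller than 4K described in the embodiment.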

In this embodiment, the relationship between the file and the metadata blocks is described using a Merkle tree. Referring to FIG. 3, after step S110, the method further includes:

Step S111: Generate a Merkle tree according to the hash value of each metadata block.

Step S112: Calculate the logical distance between the Merkle tree and the current node according to the root node hash of the Merkle tree and the node address of the current node.

Step S113: Determine the storage node corresponding to the Merkle tree according to the logical distance.

Step S114: Store the Merkle tree on the corresponding storage node.

The Merkle tree is a binary tree consisting of a set of leaf nodes, a set of intermediate nodes and a root node; FIG. 4 shows a schematic diagram of a Merkle tree. The file to be stored is divided into metadata blocks A0, A1, A2 and A3, and each leaf node of the Merkle tree corresponds to the hash value of one metadata block: N0, N1, N2 and N3 are leaf nodes whose values are the hash values of metadata blocks A0, A1, A2 and A3 respectively. A set of intermediate nodes N4 and N5 is obtained from the leaf nodes: the value of N4 is the hash value obtained by hashing leaf nodes N0 and N1, and the value of N5 is the hash value obtained by hashing leaf nodes N2 and N3. The root node N6 is obtained from the intermediate nodes: its value is the hash value obtained by hashing intermediate nodes N4 and N5, and this value is the root node hash of the Merkle tree.

In steps S120 to S140, the storage node corresponding to each metadata block is determined according to the hash value of that metadata block. In steps S111 to S114, the Merkle tree is itself treated as a metadata block, and the storage node corresponding to the Merkle tree is determined in the same way.
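The four-block tree of FIG. 4 can be sketched as follows. The source does not specify the hash function or how two child values are combined, so SHA-256 over the concatenation of the two child hashes is an assumption, as is duplicating the last node when a level has an odd number of nodes. The helper names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption, not from the source)."""
    return hashlib.sha256(data).digest()


def build_merkle_levels(leaf_hashes: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, leaves first and root level last."""
    if not leaf_hashes:
        raise ValueError("at least one leaf hash is required")
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        parents = []
        for i in range(0, len(current), 2):
            left = current[i]
            # Assumption: an unpaired last node is combined with itself.
            right = current[i + 1] if i + 1 < len(current) else left
            parents.append(h(left + right))
        levels.append(parents)
    return levels


blocks = [b"A0", b"A1", b"A2", b"A3"]           # stand-ins for metadata blocks
levels = build_merkle_levels([h(b) for b in blocks])
n0, n1, n2, n3 = levels[0]                       # leaf nodes N0..N3
n4, n5 = levels[1]                               # intermediate nodes N4, N5
(n6,) = levels[2]                                # root node N6
root_hash = n6                                   # used later as the retrieval index
```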

In steps S120 to S140, the storage position of each metadata block is determined according to the logical distance between the metadata block and the current node. The logical distance between each metadata block and the current node can be calculated as follows:

(1) Perform an XOR operation on the first N bits of the hash value of the metadata block and the first N bits of the node address of the current node to obtain a first distance.

Take the first N bits of the binary hash value of the metadata block and the first N bits of the binary node address of the current node, and perform an XOR operation on the N-bit hash value and the N-bit node address to obtain a first distance L1. N is a positive integer equal to the number of K buckets in the DHT.

It should be noted that, in the peer-to-peer network, each node has a node address whose form is similar to a hash value, the node addresses of different nodes are different, and each node manages the network through the distributed hash table. Each node maintains a distributed hash table, of which FIG. 5 shows a simple schematic diagram. The distributed hash table includes N lists, each of which is called a K bucket (K-Bucket); as shown in FIG. 5, the K buckets maintained by the node are K bucket 0 to K bucket n. Each K bucket holds the information of nodes at the same logical distance: for example, K bucket 0 includes information of nodes having a logical distance of 0 from the current node, K bucket 1 includes information of nodes having a logical distance of 1 from the current node, K bucket 2 includes information of nodes having a logical distance of 2 from the current node, and K bucket n includes information of nodes having a logical distance of n from the current node. Each K bucket stores the node address, the IP address, and the port number of each node. If the number of K buckets in the distributed hash table is 16, an XOR operation is performed on the first 16 bits of the hash value of the metadata block and the first 16 bits of the node address of the current node to obtain the first distance.

(2) Calculate the base-2 logarithm of the first distance to obtain a first logarithm value.

The first logarithm value is calculated as:

$$Y_1 = \log_2 L_1$$

where Y1 is the first logarithm value and L1 is the first distance.

(3) Obtain the logical distance between the metadata block and the current node according to the difference between N and the first logarithm value.

Subtract the first logarithm value from N to obtain their difference, and take the integer part of the difference, or the difference rounded to the nearest integer, as the logical distance between the metadata block and the current node. After the above calculation, the obtained logical distance is an integer in the range [0, N].

For example, assume that N = 16. The first 16 bits of the hash value of a certain metadata block are XORed with the first 16 bits of the node address of the current node to obtain a bit string, and the first distance L1 is obtained from this bit string. Assuming that the first distance L1 is 14524, the first logarithm value of L1 is Y1 = log2(14524) ≈ 13.826. Subtracting 13.826 from 16 gives a difference of 2.174, which rounds to 2, so the logical distance between the metadata block and the current node is determined to be 2.

After the logical distance between each metadata block and the current node is obtained, the nodes at the same logical distance are selected from the distributed hash table as the storage nodes of the metadata block.
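The three-step calculation above can be sketched as follows, using the rounding variant. The function names are hypothetical, and the handling of a first distance of 0 (identical N-bit prefixes, where the logarithm is undefined) is an assumption, since the source does not cover that case.

```python
import math


def first_n_bits(value: bytes, n: int) -> int:
    """Return the first n bits of a big-endian byte string as an integer."""
    total_bits = len(value) * 8
    if n > total_bits:
        raise ValueError("value is shorter than n bits")
    return int.from_bytes(value, "big") >> (total_bits - n)


def logical_distance(block_hash: bytes, node_address: bytes, n: int = 16) -> int:
    """Logical distance = round(N - log2(L1)), where L1 is the XOR of the N-bit prefixes."""
    l1 = first_n_bits(block_hash, n) ^ first_n_bits(node_address, n)  # step (1)
    if l1 == 0:
        # Assumption: log2(0) is undefined, so identical prefixes map to N here.
        return n
    y1 = math.log2(l1)                                                # step (2)
    return round(n - y1)                                              # step (3)


# Reproduces the worked example: 0x38BC XOR 0x0000 = 14524, log2(14524) ≈ 13.826.
print(logical_distance(bytes.fromhex("38bc"), bytes.fromhex("0000")))  # 2
```

With N = 16, a first distance of 65535 gives a logical distance of 0 and a first distance of 1 gives 16, so the result stays within [0, N].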

For example, as shown in FIG. 6, the file to be stored is divided into metadata blocks A0, A1, A2 and A3. Suppose the logical distance of metadata block A0 from the current node is 3, that of metadata block A1 is 5, that of metadata block A2 is 3, and that of metadata block A3 is 2. The current node then queries K bucket 3 in the DHT to obtain the storage node for metadata block A0, K bucket 5 to obtain the storage node for metadata block A1, K bucket 3 to obtain the storage node for metadata block A2, and K bucket 2 to obtain the storage node for metadata block A3.

If the distributed hash table contains no node at exactly the logical distance between the metadata block and the current node, a node at a logical distance close to that value may be selected from the distributed hash table as the storage node.
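The bucket lookup and its fallback can be sketched as follows. The `NodeInfo` fields follow the node address, IP address and port number that each K bucket is said to store; the function name, the tie-breaking rule (prefer the smaller distance) and the sample addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    node_address: str
    ip: str
    port: int


def select_storage_nodes(k_buckets: dict[int, list[NodeInfo]], distance: int) -> list[NodeInfo]:
    """Return the nodes at the given logical distance, or at the nearest populated distance."""
    if k_buckets.get(distance):
        return k_buckets[distance]
    populated = [d for d, nodes in k_buckets.items() if nodes]
    if not populated:
        return []
    # Assumption: ties between equally close buckets go to the smaller distance.
    nearest = min(populated, key=lambda d: (abs(d - distance), d))
    return k_buckets[nearest]


# Hypothetical routing table in which K bucket 5 happens to be empty.
k_buckets = {
    2: [NodeInfo("a1f0", "192.0.2.12", 7000)],
    3: [NodeInfo("b7c2", "192.0.2.13", 7000)],
    4: [NodeInfo("c9d4", "192.0.2.14", 7000)],
    5: [],
}
block_distances = {"A0": 3, "A1": 5, "A2": 3, "A3": 2}
placement = {name: select_storage_nodes(k_buckets, d) for name, d in block_distances.items()}
```

In this table A0 and A2 map to K bucket 3 and A3 to K bucket 2, while A1 falls back to K bucket 4 because K bucket 5 is empty.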

In steps S111 to S114, the storage location of the Merkle tree is determined according to the logical distance between the Merkle tree and the current node. The logical distance between the Merkle tree and the current node can be calculated as follows:

(1) Perform an XOR operation on the first N bits of the root node hash of the Merkle tree and the first N bits of the node address of the current node to obtain a second distance.

Take the first N bits of the binary root node hash of the Merkle tree and the first N bits of the binary node address of the current node, and perform an XOR operation on the N-bit root node hash and the N-bit node address to obtain a second distance L2.

(2) Calculate the base-2 logarithm of the second distance to obtain a second logarithm value.

The second logarithm value is calculated as:

$$Y_2 = \log_2 L_2$$

where Y2 is the second logarithm value and L2 is the second distance.

(3) Obtain the logical distance between the Merkle tree and the current node according to the difference between N and the second logarithm value.

Subtract the second logarithm value from N to obtain their difference, and take the integer part of the difference, or the difference rounded to the nearest integer, as the logical distance between the Merkle tree and the current node. After the above calculation, the obtained logical distance is an integer in the range [0, N].

After the logical distance between the Merkle tree and the current node is obtained, the node at the same logical distance is selected from the distributed hash table as the storage node of the Merkle tree.

Through the steps of the above embodiment, the current node has completed the storage process of the file.

Further, FIG. 7 shows a flowchart of the data retrieval method provided in this embodiment. Referring to FIG. 7, the method includes the following steps:

Step S210: Acquire a retrieval index of the target file, and acquire a corresponding Merkle tree according to the retrieval index, wherein the retrieval index is the root node hash of the Merkle tree.

To retrieve the target file, the retrieval index of the target file must first be obtained. The retrieval index is provided by the file sharer and is the root node hash of the Merkle tree generated when the target file was stored.

In one embodiment, step S210 includes: calculating the logical distance between the current node and the Merkle tree to be acquired according to the retrieval index and the node address of the current node; determining the storage node of the Merkle tree according to the logical distance; and sending a Merkle tree acquisition request to the corresponding storage node to acquire the Merkle tree.

After step S210, step S220 is executed: Obtain the hash values of the plurality of metadata blocks forming the target file according to the Merkle tree.

Each leaf node of the Merkle tree corresponds to the hash value of one metadata block, so the hash value of each metadata block forming the target file can be obtained from the Merkle tree.

Step S230: Calculate the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node.

Step S240: Determine the storage node of each metadata block according to the logical distance.

Step S250: Send metadata block acquisition requests to the corresponding storage nodes respectively to acquire the corresponding metadata blocks.

Through steps S210 to S250, each metadata block required can be retrieved. And after each metadata block is obtained, combining the obtained metadata blocks together to obtain the target file.
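The overall flow of steps S210 to S250 can be sketched as follows. Network access is abstracted behind two hypothetical callables, `fetch_leaf_hashes` and `fetch_block`, which stand in for locating a storage node by logical distance and sending it an acquisition request. SHA-256 and the assumption that leaf hashes are listed in file order are not from the source.

```python
import hashlib
from typing import Callable


def retrieve_file(
    retrieval_index: bytes,
    fetch_leaf_hashes: Callable[[bytes], list[bytes]],
    fetch_block: Callable[[bytes], bytes],
) -> bytes:
    """Fetch the Merkle tree leaves by root hash, then each block, then reassemble."""
    leaf_hashes = fetch_leaf_hashes(retrieval_index)       # S210 and S220
    blocks = [fetch_block(hv) for hv in leaf_hashes]       # S230 to S250
    return b"".join(blocks)                                # combine into the target file


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# In-memory stand-ins for the storage nodes of the network.
parts = [b"hello ", b"world"]
block_store = {h(p): p for p in parts}
root = h(h(parts[0]) + h(parts[1]))
tree_store = {root: [h(p) for p in parts]}

target = retrieve_file(root, tree_store.__getitem__, block_store.__getitem__)
print(target)  # b'hello world'
```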

Optionally, referring to FIG. 8, in another embodiment, a metadata block check is performed after step S250 to guard against metadata blocks that have been tampered with. After every acquired metadata block has been confirmed to be correct, the metadata blocks are combined to obtain the target file. The verification process comprises the following steps:

Step S310: Calculate the hash value of each acquired metadata block.

Step S320: Generate a Merkle tree to be verified according to the hash value of each acquired metadata block.

A Merkle tree is generated from the hash values of the acquired metadata blocks; this tree is the Merkle tree to be verified, and its root node hash is obtained. For the method of generating the Merkle tree to be verified, refer to the foregoing step S111 and the embodiment shown in FIG. 4.

Step S330: Judge whether the root node hash of the Merkle tree to be verified is the same as the retrieval index. If so, proceed to step S340; if not, proceed to step S350.

If a node in the network is attacked, the metadata blocks on that node may be tampered with, or the node may return malicious content such as a Trojan virus to the current node, so the present embodiment verifies the correctness of the metadata blocks through the Merkle tree. From the structure of the Merkle tree, tampering with the data content of any metadata block changes the hash value of the corresponding leaf node and, ultimately, the hash value of the root node. The root node hashes of the two Merkle trees are therefore compared: if they are consistent, the acquired metadata blocks are correct, and if they are inconsistent, at least one acquired metadata block is incorrect.

Of course, in other embodiments, the hash value calculated for each acquired metadata block may be compared with the hash value of the corresponding metadata block in the Merkle tree obtained in step S210 to determine whether each metadata block is correct.

Step S340: Combine the acquired metadata blocks to obtain the target file.

Step S350: Determine the abnormal metadata block, and send a metadata block acquisition request to nodes other than the original storage node of the abnormal metadata block.

If the root node hash of the Merkle tree to be verified differs from the root node hash of the Merkle tree corresponding to the target file, the abnormal metadata block can be identified by using the properties of the Merkle tree. Because a metadata block is stored on more than one node in the network, it can be obtained again from other nodes, so the current node sends a new metadata block acquisition request to nodes other than the original storage node of the abnormal metadata block.
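The check in steps S310 to S350 can be sketched as follows. As before, SHA-256, concatenation of child hashes and duplication of an unpaired last node are assumptions. Abnormal blocks are located here by comparing leaf hashes directly, which is one simple way to use the tree and not necessarily the procedure intended by the source.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise up to a single root hash."""
    if not leaf_hashes:
        raise ValueError("at least one leaf hash is required")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate an unpaired last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def find_abnormal_blocks(blocks: list[bytes], retrieval_index: bytes,
                         expected_leaf_hashes: list[bytes]) -> list[int]:
    """Return indices of blocks to re-request; an empty list means the root matched."""
    actual = [h(b) for b in blocks]                      # S310
    if merkle_root(actual) == retrieval_index:           # S320 and S330
        return []                                        # S340 may proceed
    return [i for i, (a, e) in enumerate(zip(actual, expected_leaf_hashes)) if a != e]  # S350


original = [b"A0", b"A1", b"A2", b"A3"]
expected = [h(b) for b in original]
index = merkle_root(expected)

received = list(original)
received[2] = b"tampered"
print(find_abnormal_blocks(received, index, expected))  # [2]
```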

It is understood that each node in the distributed storage system may be further configured to store a metadata block sent by any node in the distributed storage system after receiving the metadata block, and provide a corresponding metadata block to the node after receiving a metadata block obtaining request sent by any node in the distributed storage system.

Further, if a metadata block is stored on only one node, the metadata block becomes inaccessible when that node exits the peer-to-peer network. To solve this problem, the present embodiment allows a node to actively back up metadata blocks to other nodes.

The backup methods for metadata blocks in this embodiment include, but are not limited to, the following:

1. If any node in the peer-to-peer network receives a metadata block sent by another node in the peer-to-peer network, it stores the metadata block and at the same time broadcasts it to first adjacent nodes whose logical distance is smaller than i. i may be a small value, for example, 1, 2, or 3, so that the metadata block is backed up on first adjacent nodes close to the node; even if the node exits the peer-to-peer network, the metadata block can still be acquired through a first adjacent node.

Optionally, after receiving the metadata block, the node first determines whether the metadata block exists locally, if so, the node may not store the metadata block, and if not, the node stores the metadata block.

2. Data redundancy may be provided by way of a subscription. Referring to fig. 9, the specific process of this method is:

(1) The current node selects a second adjacent node with a logical distance smaller than j from the distributed hash table, and sends a metadata block subscription request to the second adjacent node. j may be a small value, such as 1, 2, or 3.

(2) The second adjacent node returns an allowance message to the current node, which indicates that the second adjacent node allows the current node to perform metadata block backup to the second adjacent node.

(3) The current node provides an initial list of metadata blocks to the second neighboring node. The initial metadata block list includes list information of a plurality of metadata blocks to be backed up.

(4) The second adjacent node selects the required metadata block from the initial metadata block list, forms a required metadata block list, and returns the required metadata block list to the current node.

(5) The current node backs up the corresponding metadata blocks to the second adjacent node according to the required metadata block list.

Through the processes of (1) to (5), the current node actively backs up the metadata block to a second adjacent node with the logical distance smaller than j.

3. When any node in the peer-to-peer network receives a metadata block acquisition request, it judges whether the corresponding metadata block exists locally. If it does, the node sends the metadata block to the node that sent the acquisition request. If it does not, the node acquires the metadata block from other nodes, forwards it to the requesting node, and counts the number of times the metadata block has been forwarded; if the forwarding count of the metadata block exceeds a preset threshold, the node stores the metadata block locally, either in a cache or in persistent storage.

Once the metadata block is stored locally, the node can return it directly to any node that later requests it, without acquiring it from other nodes, which reduces forwarding.

In this way, each node can actively cache the metadata block with higher heat.
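The forwarding-count rule described above can be sketched as follows. This is a minimal Python illustration, not the implementation of the embodiment: the class name `ForwardingNode`, the `fetch_remote` callback standing in for "acquiring the metadata block from other nodes", and the default threshold of 3 are all hypothetical.

```python
from collections import Counter
from typing import Callable


class ForwardingNode:
    """Hypothetical node that caches a block once it has been forwarded often."""

    def __init__(self, fetch_remote: Callable[[str], bytes], threshold: int = 3):
        self.local = {}                  # block hash -> block bytes held locally
        self.forward_counts = Counter()  # block hash -> number of forwards so far
        self.fetch_remote = fetch_remote
        self.threshold = threshold

    def handle_request(self, block_hash: str) -> bytes:
        # Block exists locally: return it directly, no forwarding needed.
        if block_hash in self.local:
            return self.local[block_hash]
        # Otherwise acquire it from other nodes and forward it to the requester.
        data = self.fetch_remote(block_hash)
        self.forward_counts[block_hash] += 1
        # Store locally once the forwarding count exceeds the preset threshold.
        if self.forward_counts[block_hash] > self.threshold:
            self.local[block_hash] = data
        return data
```

With a threshold of 2, for example, the third forward of the same block causes it to be stored, and the fourth request is served locally without contacting other nodes.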

In summary, the technical solution provided in the embodiments of the present application addresses metadata blocks by logical distance. When a file is stored, the file is divided into a plurality of metadata blocks, and each metadata block is stored on a corresponding storage node according to the logical distance between that metadata block and the current node. When data is retrieved, each metadata block is requested from the corresponding storage node according to the logical distance between that metadata block and the current node. Because specific data storage and addressing rules replace the seed server, the required metadata blocks can be obtained through the seed file alone when a file is retrieved, without routing by a seed server. Since no seed server needs to participate, the metadata blocks can be obtained at any time; and since the network model as a whole is decentralized, there is no risk of a seed server being attacked, so security during file storage and data retrieval can be ensured.

Furthermore, because the file is divided into a plurality of metadata blocks that are stored separately, each node stores only part of the file rather than the complete file, which reduces the storage space occupied.
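As an illustration only, the logical distance that drives this addressing might be computed as in the following Python sketch, which follows the form recited for the file storage method: an XOR of the first N bits of the block hash and of the node address, a base-2 logarithm of the result, and the difference between N and that logarithm. Two points are assumptions of the sketch rather than statements of the embodiment: the logarithm is rounded up to an integer (which yields values in the range [0, N]), and an XOR result of zero, for which the logarithm is undefined, is mapped to N. The function names are hypothetical.

```python
def first_n_bits(value: bytes, n: int) -> int:
    """Return the first n bits of a byte string as an integer."""
    total_bits = len(value) * 8
    if not 0 < n <= total_bits:
        raise ValueError("n must be between 1 and the bit length of value")
    return int.from_bytes(value, "big") >> (total_bits - n)


def logical_distance(block_hash: bytes, node_address: bytes, n: int) -> int:
    """Sketch of the XOR / log2 / difference computation (see assumptions above)."""
    # First distance: XOR of the first n bits of the hash and of the address.
    first_distance = first_n_bits(block_hash, n) ^ first_n_bits(node_address, n)
    if first_distance == 0:
        # Assumption: identical prefixes map to n (log2(0) is undefined).
        return n
    # Assumption: the logarithm is rounded up; for an integer d >= 1,
    # ceil(log2(d)) equals (d - 1).bit_length().
    first_log = (first_distance - 1).bit_length()
    # Logical distance: difference between n and the first logarithm value.
    return n - first_log
```

Under these assumptions the result is always an integer between 0 and N inclusive; a different rounding convention would shift some values by one.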

Based on the same inventive concept, an embodiment of the present application provides a file storage apparatus. Referring to Fig. 10, the apparatus includes:

a file dividing module 410, configured to divide a file to be stored into a plurality of metadata blocks, and calculate a hash value of each metadata block;

a first distance calculating module 420, configured to calculate a logical distance between each metadata block and a current node according to the hash value of each metadata block and the node address of the current node;

a first node determining module 430, configured to determine, according to the logical distance, a storage node corresponding to each metadata block;

a data block storage module 440, configured to store the plurality of metadata blocks on corresponding storage nodes, respectively.

It is to be understood that the implementation principle and technical effects of the file storage apparatus in this embodiment have been described in the foregoing file storage method; for brevity, for anything not mentioned for the file storage apparatus, refer to the corresponding description of the file storage method.

Based on the same inventive concept, an embodiment of the present application provides a data retrieval apparatus. Referring to Fig. 11, the apparatus includes:

an index processing module 510, configured to obtain a retrieval index of a target file, and obtain a corresponding Merkle tree according to the retrieval index, where the retrieval index is the root node hash of the Merkle tree;

a hash obtaining module 520, configured to obtain, according to the Merkle tree, hash values of a plurality of metadata blocks that form the target file;

a second distance calculating module 530, configured to calculate a logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node;

a second node determining module 540, configured to determine a storage node of each metadata block according to the logical distance;

a data block obtaining module 550, configured to send metadata block obtaining requests to corresponding storage nodes respectively, so as to obtain corresponding metadata blocks.

It is to be understood that the implementation principle and technical effects of the data retrieval apparatus in this embodiment have been described in the foregoing data retrieval method; for brevity, for anything not mentioned for the data retrieval apparatus, refer to the corresponding description of the data retrieval method.
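The relationship between the retrieval index and the metadata block hashes can be illustrated with a short Python sketch, in which the root node hash is recomputed from retrieved blocks and compared with the retrieval index. The sketch is hedged on several points that are assumptions here: SHA-256 is used as the hash function, parent nodes hash the concatenation of their two children, and an odd node at any level is paired with a copy of itself. The function names are hypothetical.

```python
import hashlib


def merkle_root(block_hashes: list) -> bytes:
    """Compute a root node hash from leaf hashes (conventions are assumptions)."""
    if not block_hashes:
        raise ValueError("at least one block hash is required")
    level = list(block_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last node when a level has odd length.
            level.append(level[-1])
        # Each parent is the hash of its two concatenated children.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def verify_blocks(blocks: list, retrieval_index: bytes) -> bool:
    """Check that retrieved blocks reproduce the root hash used as the index."""
    hashes = [hashlib.sha256(block).digest() for block in blocks]
    return merkle_root(hashes) == retrieval_index
```

If any retrieved block has been altered, the recomputed root differs from the retrieval index, so the mismatch signals that at least one block should be requested again from another node.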

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method of file storage, the method comprising:

dividing a file to be stored into a plurality of metadata blocks, and calculating the hash value of each metadata block;

calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node;

determining a storage node corresponding to each metadata block according to the logical distance;

and respectively storing the plurality of metadata blocks to corresponding storage nodes.

2. The method of claim 1, wherein after calculating the hash value of each metadata block, the method further comprises:

generating a Merkle tree according to the hash value of each metadata block;

calculating the logical distance between the Merkle tree and the current node according to the root node hash of the Merkle tree and the node address of the current node;

determining a storage node corresponding to the Merkle tree according to the logical distance;

and storing the Merkle tree on the corresponding storage node.

3. The method of claim 1, wherein calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node comprises:

performing XOR operation on the first N bits of the hash value of the metadata block and the first N bits of the node address of the current node to obtain a first distance, wherein N is a positive integer and is the same as the number of K buckets in a distributed hash table in the current node;

calculating the logarithm taking 2 as the base of the first distance to obtain a first logarithm value;

and obtaining the logical distance according to the difference between N and the first logarithm value, wherein the logical distance is an integer in the range [0, N].

4. The method according to any one of claims 1-3, further comprising:

after receiving metadata blocks transmitted by other nodes, storing the metadata blocks, and broadcasting the metadata blocks to a first adjacent node whose logical distance from the current node is less than i.

5. The method according to any one of claims 1-3, further comprising:

determining a second adjacent node whose logical distance from the current node is less than j;

sending a metadata block subscription request to the second adjacent node;

after receiving an allowance message returned by the second adjacent node, providing an initial metadata block list to the second adjacent node, wherein the allowance message indicates that the second adjacent node allows the current node to perform metadata block backup, and the initial metadata block list comprises list information of a plurality of metadata blocks to be backed up;

and after receiving a required metadata block list returned by the second adjacent node based on the initial metadata block list, backing up the corresponding metadata blocks in the required metadata block list to the second adjacent node.

6. The method of claim 1, further comprising:

when a metadata block acquisition request is received, judging whether a corresponding metadata block exists locally;

if not, acquiring corresponding metadata blocks from other nodes, and forwarding the acquired metadata blocks to a request node of the metadata block acquisition request;

and counting the forwarding times of the metadata block, and if the forwarding times of the metadata block exceed a preset threshold, storing the metadata block locally.

7. A method for data retrieval, the method comprising:

acquiring a retrieval index of a target file, and acquiring a corresponding Merkle tree according to the retrieval index, wherein the retrieval index is the root node hash of the Merkle tree;

obtaining hash values of a plurality of metadata blocks forming the target file according to the Merkle tree;

determining a storage node of each metadata block according to the logical distance;

and respectively sending metadata block acquisition requests to corresponding storage nodes to acquire corresponding metadata blocks.

8. The method according to claim 7, wherein obtaining the corresponding Merkle tree according to the retrieval index comprises:

calculating the logical distance between the current node and the Merkle tree to be acquired according to the retrieval index and the node address of the current node;

determining a storage node of the Merkle tree according to the logical distance;

and sending a Merkle tree acquisition request to the corresponding storage node to acquire the Merkle tree.

9. The method of claim 7, wherein after obtaining the corresponding metadata block, the method further comprises:

calculating the hash value of each acquired metadata block;

generating a to-be-verified Merkle tree according to the obtained hash value of each metadata block;

and judging whether the root node hash of the to-be-verified Merkle tree is the same as the retrieval index, and if so, combining the obtained plurality of metadata blocks to obtain the target file.

10. The method according to claim 9, wherein after judging whether the root node hash of the to-be-verified Merkle tree is the same as the retrieval index, the method further comprises:

if the root node hash of the to-be-verified Merkle tree is different from the retrieval index, determining an abnormal metadata block, and sending a metadata block acquisition request to nodes other than the original storage node of the abnormal metadata block.

11. A file storage apparatus, the apparatus comprising:

the file segmentation module is used for segmenting a file to be stored into a plurality of metadata blocks and calculating the hash value of each metadata block;

the first distance calculation module is used for calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node;

the first node determining module is used for determining a storage node corresponding to each metadata block according to the logical distance;

and the data block storage module is used for respectively storing the plurality of metadata blocks to corresponding storage nodes.

12. A data retrieval device, the device comprising:

the index processing module is used for acquiring a retrieval index of a target file and acquiring a corresponding Merkle tree according to the retrieval index, wherein the retrieval index is the root node hash of the Merkle tree;

the hash acquisition module is used for acquiring hash values of a plurality of metadata blocks forming the target file according to the Merkle tree;

the second distance calculation module is used for calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node;

the second node determining module is used for determining the storage node of each metadata block according to the logical distance;

and the data block acquisition module is used for respectively sending metadata block acquisition requests to the corresponding storage nodes so as to acquire the corresponding metadata blocks.

13. A distributed storage system, the system comprising a plurality of nodes, each node configured to:

after a file to be stored is obtained, the file to be stored is divided into a plurality of metadata blocks, and the hash value of each metadata block is calculated; calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node; determining a storage node of each metadata block in the distributed storage system according to the logical distance; storing the plurality of metadata blocks to corresponding storage nodes respectively;

after a retrieval index of a target file is acquired, acquiring a corresponding Merkle tree according to the retrieval index, wherein the retrieval index is the root node hash of the Merkle tree; obtaining hash values of a plurality of metadata blocks forming the target file according to the Merkle tree; calculating the logical distance between each metadata block and the current node according to the hash value of each metadata block and the node address of the current node; determining a storage node of each metadata block in the distributed storage system according to the logical distance; respectively sending metadata block acquisition requests to corresponding storage nodes to acquire corresponding metadata blocks;

after receiving a metadata block sent by any node in the distributed storage system, storing the metadata block;

after receiving a metadata block acquisition request sent by any node in the distributed storage system, providing a corresponding metadata block for the node.