WO2022121345A1

WO2022121345A1 - Distributed data storage method, data query method, device, and storage medium

Info

Publication number: WO2022121345A1
Application number: PCT/CN2021/111851
Authority: WO
Inventors: 袁兴强; 王志文; 吴思进
Original assignee: 杭州复杂美科技有限公司
Priority date: 2020-12-09
Filing date: 2021-08-10
Publication date: 2022-06-16
Also published as: CN112364209B; CN112364209A

Abstract

A distributed data storage method, which relates to the technical field of blockchains. The method comprises: S11, generating first data according to a block hash of a first number of consecutive blocks to be stored; S13, generating first archival data according to a first global index table corresponding to the first number of consecutive blocks; S15, according to the first data and a preset distributed data storage rule, determining a plurality of first blockchain nodes that are to receive the first archival data; S17, sending the first archival data to the first blockchain nodes for storing the first archival data; and S19, when a current node is not comprised in the first blockchain nodes, deleting the stored first global index table after a first period of time.

Description

Distributed data storage method, data query method, device and storage medium

Technical field

The present application relates to the field of blockchain technology, and in particular to a distributed data storage method, data query method, device and storage medium.

Background art

In earlier patent applications filed by the applicant (for details, see the applicant's applications 2018108842951 and 2018108840354), the state data of a blockchain can additionally be stored in a global index table to improve query efficiency.

In the above mechanism, every blockchain node stores a global index table, but most of the data in that table is historical state data rather than the latest state data. Historical state data is kept only for the convenience of queries, and it wastes a large amount of disk space.

Summary of the invention

In view of the above-mentioned defects or deficiencies in the prior art, it is desirable to provide a distributed data storage method, data query method, device and storage medium that save disk space on the basis of improving query efficiency.

In a first aspect, the present invention provides a distributed data storage method suitable for blockchain nodes. State data is stored in the form of a global index table, and each blockchain node stores an mvcc index table. The mvcc index table includes a number of first key-value pairs; the first key of a first key-value pair includes the first address of a first user, and the first value of the first key-value pair includes a first height set made up of each first block height at which the state of the first address changed. The method includes:

Generate first data according to the block hashes of a first number of consecutive blocks to be stored;

Generate first archived data according to the first global index table corresponding to the first number of consecutive blocks;

Determine a number of first blockchain nodes that will receive the first archived data according to the first data and preconfigured distributed data storage rules;

Send the first archived data to each of the first blockchain nodes, which store the first archived data;

When the current node is not among the first blockchain nodes, delete the stored first global index table after a first duration;

wherein the first archived data is used by blockchain nodes to:

Receive a first query instruction; wherein, the first query instruction includes the second address of the second user and the queried second block interval height;

Find the corresponding second height set in the mvcc index table according to the second block interval height and the second address;

Perform the following operations on each of the third block heights in the second set of heights:

Generate the second data according to the block hash of the continuous block corresponding to the second archive data where the third block height is located;

Find several second blockchain nodes storing the second archived data according to the second data and the distributed data storage rules;

Send a first query instruction to at least one second blockchain node to find corresponding target data.

In a second aspect, the present invention provides a data query method applicable to a blockchain node. The blockchain node stores data in a distributed manner according to the method of the first aspect, and the method includes:

Receive a first query instruction, wherein the first query instruction includes the second address of a second user and the queried second block interval height;

Find the corresponding second height set in the mvcc index table according to the second block interval height and the second address;

Perform the following operations on each third block height in the second height set:

Generate second data according to the block hashes of the consecutive blocks corresponding to the second archived data in which the third block height is located;

Find several second blockchain nodes storing the second archived data according to the second data and the distributed data storage rules;

Send the first query instruction to at least one second blockchain node to find the corresponding target data.

In a third aspect, the present invention also provides a device comprising one or more processors and a memory, wherein the memory contains instructions executable by the one or more processors to cause the one or more processors to perform the distributed data storage method and the data query method provided by the various embodiments of the present invention.

In a fourth aspect, the present invention further provides a storage medium storing a computer program, and the computer program enables a computer to execute the distributed data storage method and data query method provided according to various embodiments of the present invention.

According to the distributed data storage method, data query method, device and storage medium provided by the various embodiments of the present invention, first data is generated according to the block hashes of a first number of consecutive blocks to be stored; first archived data is generated according to the first global index table corresponding to the first number of consecutive blocks; several first blockchain nodes that will receive the first archived data are determined according to the first data and preconfigured distributed data storage rules; the first archived data is sent to each first blockchain node for storage; and, when the current node is not among the first blockchain nodes, the stored first global index table is deleted after a first duration. This saves disk space on the basis of improving query efficiency.

Description of drawings

Other features, objects and advantages of the present application will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:

FIG. 1 is a flowchart of a distributed data storage method according to an embodiment of the present invention.

FIG. 2 is a flowchart of a data query method according to an embodiment of the present invention.

FIG. 3 is a schematic structural diagram of a device according to an embodiment of the present invention.

Detailed description

The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the related invention, but not to limit the invention. In addition, it should be noted that, for the convenience of description, only the parts related to the invention are shown in the drawings.

It should be noted that the embodiments in the present application and the features of the embodiments may be combined with each other in the case of no conflict. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

FIG. 1 is a flowchart of a distributed data storage method according to an embodiment of the present invention. As shown in FIG. 1, in this embodiment, the present invention provides a distributed data storage method suitable for blockchain nodes. State data is stored in the form of a global index table, and each blockchain node stores an mvcc index table. The mvcc index table includes several first key-value pairs; the first key of a first key-value pair includes the first address of a first user, and the first value of the first key-value pair includes a first height set made up of each first block height at which the state of the first address changed. The method includes:

S11: Generate first data according to the block hashes of the first number of consecutive blocks to be stored;

S13: Generate first archive data according to the first global index table corresponding to the first number of consecutive blocks;

S15: Determine a number of first blockchain nodes that will receive the first archived data according to the first data and the preconfigured distributed data storage rules;

S17: Send the first archive data to each first blockchain node for storing the first archive data;

S19: when the current node is not included in the first blockchain node, delete the stored first global index table after the first duration;

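wherein the first archived data is used by blockchain nodes to:

Receive a first query instruction, wherein the first query instruction includes the second address of a second user and the queried second block interval height;

Find the corresponding second height set in the mvcc index table according to the second block interval height and the second address;

Perform the following operations on each third block height in the second height set:

Generate second data according to the block hashes of the consecutive blocks corresponding to the second archived data in which the third block height is located;

Find several second blockchain nodes storing the second archived data according to the second data and the distributed data storage rules;

Send the first query instruction to at least one second blockchain node to find the corresponding target data.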

Specifically, take the following as an example: S15 includes "calculating the first distance between the node id of each blockchain node and the first data; determining the second number of blockchain nodes with the smallest first distance as the first blockchain nodes"; correspondingly, finding several second blockchain nodes that store the second archived data according to the second data and the distributed data storage rules includes "calculating the second distance between the node id of each blockchain node and the second data; determining the second number of blockchain nodes with the smallest second distance as the second blockchain nodes"; the first number is 100, the second number is 5, and the first duration is 2 min. Assume that the first number of consecutive blocks to be stored are block(1)~block(100);

The blockchain node executes step S11 and generates the data chunkhash(1~100) according to blockhash(1)~blockhash(100);

The blockchain node executes step S13 and generates the archived data chunk(1~100) according to the global index table T(1~100) corresponding to block(1)~block(100);

The blockchain node executes step S15 and calculates the distance between the node id of each blockchain node and chunkhash(1~100); the 5 blockchain nodes with the smallest distance are determined as the blockchain nodes that will receive chunk(1~100); assume that these nodes are N1~N5;

The blockchain node executes step S17 and sends chunk(1~100) to N1~N5, and N1~N5 store chunk(1~100);

The blockchain node executes step S19: when the current node is not among N1~N5, it deletes the stored T(1~100) after 2 minutes.
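Steps S11 and S13 of the example above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the embodiment does not specify how chunkhash(1~100) is derived from the block hashes or how chunk(1~100) is serialized, so SHA-256 over the concatenated block hashes and a JSON encoding of (key, height, value) rows are assumptions, and the names chunk_hash and build_chunk are hypothetical.

```python
import hashlib
import json


def chunk_hash(block_hashes):
    """S11 sketch: derive one digest from the hashes of consecutive blocks.

    Assumption: SHA-256 over the block hashes in height order.
    """
    h = hashlib.sha256()
    for bh in block_hashes:
        h.update(bh)
    return h.digest()


def build_chunk(global_index_rows):
    """S13 sketch: pack the index rows of the block range into one archive blob.

    Assumption: rows are (key, height, value) tuples, serialized as sorted JSON.
    """
    rows = sorted(global_index_rows)
    return json.dumps(rows).encode("utf-8")


# Stand-in data for block(1)~block(100) and the table T(1~100).
block_hashes = [hashlib.sha256(f"block-{h}".encode()).digest() for h in range(1, 101)]
table_rows = [("addr(A)", 65, "balance=10"), ("addr(B)", 12, "balance=3")]

chunkhash_1_100 = chunk_hash(block_hashes)  # plays the role of chunkhash(1~100)
chunk_1_100 = build_chunk(table_rows)       # plays the role of chunk(1~100)
```

Because every node can recompute the same digest from the same block hashes, a querying node can later derive chunkhash(1~100) without holding the archived data itself.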

Suppose that the blockchain node N50 receives the query instruction "addr(A), [50,70]"; the height set recorded for addr(A) in the mvcc index table is {65, 105, 185}; N1~N5 store chunk(1~100), N6~N10 store chunk(101~200), N11~N15 store chunk(201~300), N16~N20 store chunk(301~400), and N21~N25 store chunk(401~500); the remaining state data has not yet been stored in a distributed manner;

N50 searches for the corresponding height set in the mvcc index table according to [50,70] and addr(A), and the found corresponding set is {65};

For 65:

N50 generates the data chunkhash(1~100) according to the block hashes (i.e. blockhash(1)~blockhash(100)) of the consecutive blocks corresponding to the archived data in which 65 is located;

N50 calculates the distance between the node id of each blockchain node and chunkhash(1~100); the 5 blockchain nodes with the smallest distance, N1~N5, are determined as the blockchain nodes that store chunk(1~100);

N50 sends addr(A),[50,70] to one or more nodes in N1~N5 (assuming that it is only sent to N1);

N1 traverses [50,70] to find the target data according to addr(A):

N1 finds the data corresponding to addr(A) at 65 in chunk(1~100);

N1 returns the data corresponding to addr(A) at 65 to N50.
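The first part of the query walk-through above (filtering the height set by the queried interval, then working out which archived chunk each remaining height falls in) can be sketched as follows. This is a sketch under assumptions: the mvcc index table is modelled as a plain dictionary, chunks are assumed to be aligned to block 1 with a fixed size of 100 as in the example, and the function names are hypothetical.

```python
CHUNK_SIZE = 100  # the "first number" used in the example


def heights_in_interval(mvcc_index, address, low, high):
    """Return the heights recorded for `address` that fall inside [low, high]."""
    return [h for h in mvcc_index.get(address, []) if low <= h <= high]


def chunk_range(height, chunk_size=CHUNK_SIZE):
    """Return (first, last) block height of the chunk containing `height`.

    Assumption: chunks cover 1~100, 101~200, ... as in the example.
    """
    first = ((height - 1) // chunk_size) * chunk_size + 1
    return first, first + chunk_size - 1


# mvcc index table entry from the example: addr(A) changed at 65, 105 and 185.
mvcc_index = {"addr(A)": [65, 105, 185]}

hits = heights_in_interval(mvcc_index, "addr(A)", 50, 70)
ranges = [chunk_range(h) for h in hits]
```

With the example data, only height 65 lies in [50,70], and it maps to the block range 1~100; N50 would then hash blockhash(1)~blockhash(100) and apply the same distance rule used during storage to locate the nodes holding that chunk.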

In further embodiments, S15 can also be configured according to actual needs, for example configured as "calculating the first distance between the node id of each blockchain node and the first data; determining the second number of blockchain nodes with the largest first distance as the first blockchain nodes"; correspondingly, finding several second blockchain nodes that store the second archived data according to the second data and the distributed data storage rules should then also be configured as "calculating the second distance between the node id of each blockchain node and the second data; determining the second number of blockchain nodes with the largest second distance as the second blockchain nodes", which can achieve the same technical effect.

In more embodiments, the first number can also be configured according to actual requirements, for example, it is configured as 1000, which can achieve the same technical effect.

In more embodiments, the second number can also be configured according to actual needs, for example, it is configured as 10, which can achieve the same technical effect.

In more embodiments, the first duration may also be configured according to actual requirements, for example, configured to be 1 min, which can achieve the same technical effect.

This embodiment makes it more convenient to obtain historical state data of a specified account in a certain height interval. The historical status data of the specified account only needs to be obtained from one node, and the data of one month, one quarter, one year or more can be queried, which improves the efficiency of data query.

Preferably, according to the first data and preconfigured distributed data storage rules, determining a number of first blockchain nodes that will receive the first archived data includes:

Calculate the first distance between the node id of each blockchain node and the first data;

Determine the second number of blockchain nodes with the smallest first distance as the first blockchain node;

Finding several second blockchain nodes storing the second archived data according to the second data and distributed data storage rules includes:

Calculate the second distance between the node id of each blockchain node and the second data;

The second number of blockchain nodes with the smallest second distance is determined as the second blockchain node.
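The selection rule above can be sketched as follows. The text does not define the distance function, so the XOR metric used here (as in Kademlia-style distributed hash tables) is an assumption, as are the 32-byte node ids and the function names; any metric that every node evaluates identically would serve the same purpose.

```python
import hashlib


def xor_distance(node_id: bytes, data: bytes) -> int:
    """Assumed distance: XOR of the two values interpreted as big-endian integers."""
    return int.from_bytes(node_id, "big") ^ int.from_bytes(data, "big")


def closest_nodes(node_ids, data, k):
    """Return the k node ids with the smallest distance to `data`."""
    return sorted(node_ids, key=lambda nid: xor_distance(nid, data))[:k]


# Hypothetical network of 50 nodes and a stand-in for the first data.
node_ids = [hashlib.sha256(f"node-{i}".encode()).digest() for i in range(1, 51)]
first_data = hashlib.sha256(b"stand-in for chunkhash(1~100)").digest()

storers = closest_nodes(node_ids, first_data, 5)  # the "second number" is 5
```

Since the result depends only on the node ids and the data, the storing node and any later querying node arrive at the same set of nodes without coordination, provided they see the same node list.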

For the principle of distributed data storage in the foregoing embodiment, reference may be made to the method shown in FIG. 1 , which will not be repeated here.

Preferably, the fourth block height of the first block with the largest block height among the first number of consecutive blocks is smaller than the difference between the current block height and the safe rollback depth.

The above-mentioned embodiment ensures that the fragmented data will not be rolled back, which improves user experience.
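The height condition above amounts to a single comparison, sketched below. The function name and the sample values (a current height of 250 and a safe rollback depth of 100) are hypothetical and chosen only for illustration.

```python
def can_archive(chunk_last_height: int, current_height: int, safe_rollback_depth: int) -> bool:
    """True when the highest block of the chunk is strictly below
    (current height - safe rollback depth), i.e. it can no longer be rolled back."""
    return chunk_last_height < current_height - safe_rollback_depth


# chunk(1~100) at current height 250 with a safe rollback depth of 100
ok_old = can_archive(100, 250, 100)
# chunk(101~200) under the same conditions is still within rollback range
ok_recent = can_archive(200, 250, 100)
```

Here chunk(1~100) qualifies because 100 is less than 150, while chunk(101~200) has to wait until the chain has grown further.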

Preferably, deleting the stored first global index table after the first duration includes:

Perform the following operations on each second key-value pair of the first global index table:

Determine whether the second value of the second key-value pair is the latest state data on the blockchain:

If yes, keep the second key-value pair;

After the first duration, delete the first global index table except for the retained second key-value pairs.

When a transaction is executed, a latest version of the state data needs to be obtained; therefore, a latest version of the data is kept locally, and the data of the historical version can be stored in a distributed manner.
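The retention rule above can be sketched as follows, assuming the global index table is keyed by (key, height) and that the node can look up the height of the latest version of each key. Both the table layout and the names are assumptions made for illustration; waiting for the first duration before deleting is left out.

```python
def prune_global_index(table, latest_height):
    """Keep only entries that are still the latest state of their key.

    table:         {(key, height): value} for the archived block range
    latest_height: {key: height of the key's latest version on the chain}
    """
    return {
        (key, height): value
        for (key, height), value in table.items()
        if latest_height.get(key) == height
    }


# T(1~100) stand-in: addr(A) changed again later (at 185), addr(B) did not.
table_1_100 = {
    ("addr(A)", 40): "balance=7",
    ("addr(A)", 65): "balance=10",
    ("addr(B)", 12): "balance=3",
}
latest_height = {"addr(A)": 185, "addr(B)": 12}

kept = prune_global_index(table_1_100, latest_height)
```

In this example only the addr(B) entry survives locally, because it is still the version a transaction would read; both addr(A) entries are historical and remain available through the archived chunk.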

Those skilled in the art will appreciate that a certain second key may not have been updated within the most recent several heights. For example, if the current height is 1,000,000 and the latest version of a certain second key is at height 899,999, the second key has not been updated for more than 100,000 heights. This means that the access frequency of the second key is very low, and in this case the second key should also be stored in a distributed manner.

FIG. 2 is a flowchart of a data query method according to an embodiment of the present invention. As shown in FIG. 2, in this embodiment, the present invention provides a data query method suitable for blockchain nodes. The blockchain nodes perform distributed storage of data according to the above distributed data storage method, and the above method includes:

S21: Receive a first query instruction; wherein, the first query instruction includes the second address of the second user and the queried second block interval height;

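S23: Find the corresponding second height set in the mvcc index table according to the second block interval height and the second address;

S25: Perform the following operations on each third block height in the second height set:

S251: Generate second data according to the block hashes of the consecutive blocks corresponding to the second archived data in which the third block height is located;

S252: Find several second blockchain nodes that store the second archived data according to the second data and the distributed data storage rules;

S253: Send the first query instruction to at least one second blockchain node to find the corresponding target data.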

The above embodiment makes it more convenient to obtain the historical status data of a specified account in a certain height interval, and can query data of one month, one quarter, one year or more, which improves the efficiency of data query.

For the data query principle of the foregoing embodiment, reference may be made to the method shown in FIG. 1 , which will not be repeated here.

As shown in FIG. 3, as another aspect, the present application also provides a device 300 including one or more central processing units (CPUs) 301, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage section 308 into a random access memory (RAM) 303. Various programs and data necessary for the operation of the device 300 are also stored in the RAM 303. The CPU 301, the ROM 302 and the RAM 303 are connected to each other through a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

The following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse, etc.; an output section 307 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 308 including a hard disk, etc.; and a communication section 309 including a network interface card such as a LAN card, a modem, and the like. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 310 as needed so that a computer program read therefrom is installed into the storage section 308 as needed.

In particular, according to an embodiment of the present disclosure, the method described in any of the above embodiments may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing any of the methods described above. In such an embodiment, the computer program may be downloaded and installed from the network via the communication section 309 and/or installed from the removable medium 311.

As yet another aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium may be the computer-readable storage medium included in the device of the foregoing embodiment, or it may be a computer-readable storage medium that exists on its own and is not assembled into a device. The computer-readable storage medium stores one or more programs that are used by one or more processors to perform the methods described in the present application.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or can be implemented by a combination of dedicated hardware and computer instructions.

The units or modules involved in the embodiments of the present application may be implemented in a software manner, and may also be implemented in a hardware manner. The described units or modules may also be provided in the processor, for example, each of the units may be a software program provided in a computer or a mobile smart device, or may be a separately configured hardware device. Wherein, the names of these units or modules do not constitute limitations on the units or modules themselves under certain circumstances.

The above description is only a preferred embodiment of the present application and an illustration of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to the technical solutions formed by the specific combination of the above-mentioned technical features, and should also cover, without departing from the concept of the present application, other technical solutions formed by any combination of the above-mentioned technical features or their equivalent features, for example, a technical solution formed by replacing the above-mentioned features with technical features of similar function disclosed in (but not limited to) the present application.

Claims

A distributed data storage method, characterized in that state data is stored in the form of a global index table, each blockchain node stores an mvcc index table, the mvcc index table includes several first key-value pairs, the first key of the first key-value pair includes the first address of a first user, and the first value of the first key-value pair includes a first height set made up of each first block height at which the state of the first address changes; the method is applicable to a blockchain node, and the method includes:

Generate the first data according to the block hash of the first number of consecutive blocks to be stored;

generating first archive data according to the first global index table corresponding to the first number of consecutive blocks;

Determine a number of first blockchain nodes that will receive the first archived data according to the first data and preconfigured distributed data storage rules;

sending the first archived data to each of the first blockchain nodes for storing the first archived data;

When the current node is not among the first blockchain nodes, delete the stored first global index table after a first duration;

Wherein, the first archived data is used by blockchain nodes to:

Receive a first query instruction; wherein, the first query instruction includes the second address of the second user and the queried second block interval height;

Search for a corresponding second height set in the mvcc index table according to the second block interval height and the second address;

Perform the following operations on each of the third block heights in the second height set:

Generate second data according to the block hash of the continuous block corresponding to the second archive data where the third block height is located;

Find several second blockchain nodes storing the second archived data according to the second data and the distributed data storage rules;

Send the first query instruction to at least one of the second blockchain nodes to find corresponding target data.
The method according to claim 1, wherein the determining, according to the first data and a preconfigured distributed data storage rule, a plurality of first blockchain nodes that will receive the first archived data comprises:

Calculate the first distance between the node id of each blockchain node and the first data;

Determining a second number of blockchain nodes with the smallest first distance as the first blockchain node;

The finding, according to the second data and the distributed data storage rules, several second blockchain nodes that store the second archive data includes:

Calculate the second distance between the node id of each blockchain node and the second data;

The second number of blockchain nodes with the smallest second distance is determined as the second blockchain node.
The method according to claim 1 or 2, wherein the fourth block height of the first block with the largest block height among the first number of consecutive blocks is smaller than the difference between the current block height and the safe rollback depth.
The method according to claim 1 or 2, wherein the deleting the stored first global index table after a first time period comprises:

Perform the following operations on each second key-value pair of the first global index table:

Determine whether the second value of the second key-value pair is the latest state data on the blockchain:

If yes, keep the second key-value pair;

After the first duration, delete the first global index table except for the retained second key-value pairs.
A data query method, characterized in that each blockchain node stores data in a distributed manner according to the method according to any one of claims 1-4, the method is applicable to blockchain nodes, and the method includes:

Receive a first query instruction; wherein, the first query instruction includes the second address of the second user and the queried second block interval height;

Find the corresponding second height set in the mvcc index table according to the second block interval height and the second address;

Perform the following operations on each of the third block heights in the second height set:

Generate second data according to the block hash of the continuous block corresponding to the second archive data where the third block height is located;

Find several second blockchain nodes storing the second archived data according to the second data and the distributed data storage rules;

Send the first query instruction to at least one of the second blockchain nodes to find corresponding target data.
A computer device, characterized in that the device comprises:

one or more processors;

memory for storing one or more programs,

The one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-5.
A storage medium storing a computer program, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-5 is implemented.