CN117421157A

CN117421157A - Data backup storage method and system based on block chain

Info

Publication number: CN117421157A
Application number: CN202311330373.0A
Authority: CN
Inventors: 姚昱旻; 王雪晴; 肖晶; 陈孝经; 刘齐军; 谭林
Original assignee: Guangdong Tianhe Guoyun Technology Co ltd
Current assignee: Guangdong Tianhe Guoyun Technology Co ltd
Priority date: 2023-10-13
Filing date: 2023-10-13
Publication date: 2024-01-19

Abstract

The invention discloses a blockchain-based data backup storage method and system. The method comprises the following steps: fragmenting the data to be fragmented to obtain N data blocks, where the data to be fragmented comprises at least two pieces of backup data corresponding to the target data to be stored, and N is an integer greater than 2; predicting, based on the number N of data blocks and the fragmentation strategy, the number M of backup nodes to be selected from the blockchain and the number K of data blocks to be stored on backup node i; determining a storage allocation strategy for the N data blocks across the M backup nodes based on M, K and the backup capability factor S of each backup node; and storing each encrypted data block, together with its unique identification tag, on the corresponding backup nodes among the M backup nodes according to the storage allocation strategy. The invention ensures the integrity and security of the backup data, while providing strong recovery capability so that the system can operate efficiently and stably in the face of faults and abnormal conditions.

Description

Data backup storage method and system based on block chain

Technical Field

The invention relates to the technical field of fragmented data backup storage in on-chain data processing, and in particular to a blockchain-based data backup storage method and system.

Background

Blockchain storage technology is a distributed data storage technology for storing and managing data in a blockchain network, and it is one of the core components of blockchain technology. Traditional data storage approaches typically rely on a centralized database or server, which leaves the data vulnerable to control, tampering and loss by a central authority. In contrast, blockchain storage technology distributes data across multiple nodes of a network, providing higher security, decentralization and tamper resistance. It is based on a series of blocks, each containing a number of data records, that are linked into a chain by a cryptographic hash algorithm: each block contains the hash value of the previous block, so the chain forms a complete history of the data. Blockchain storage technology also uses consensus algorithms to ensure that all nodes on the network agree on the state of the data.
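The hash linking described above can be illustrated with a minimal Python sketch. It is a teaching example only, not the block format of any particular blockchain: the block fields (`index`, `prev_hash`, `records`) and the use of SHA-256 over canonical JSON are illustrative assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List


def block_hash(block: Dict[str, Any]) -> str:
    """SHA-256 over a canonical (key-sorted) JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict[str, Any]], records: List[str]) -> None:
    """Append a block whose prev_hash commits to the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis placeholder
    chain.append({"index": len(chain), "prev_hash": prev, "records": records})


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Any edit to block i-1 breaks the prev_hash link stored in block i."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )
```

Note that a simple link check like this only exposes edits to blocks that have a successor; real networks additionally rely on the consensus algorithm to agree on the chain head.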

In the prior art, to protect the security and privacy of backup nodes, backup nodes are mostly chosen by random selection. However, random selection does not guarantee that the selected nodes are capable of completing node backup and restoration.

First, randomly selecting backup nodes from a node pool may reduce the risk of malicious attacks and data leakage, hinder predictive attacks and raise the difficulty for an attacker; however, malicious nodes may still be selected as backup nodes, compromising the security and privacy of the data.

In addition, in the prior art, data containing redundant data is divided into N data blocks that are distributed to multiple nodes of the blockchain network for storage. This improves storage security: when a single node is attacked and its data is lost, the integrity of the data is not affected. However, merely splitting the data into several parts still makes it difficult to restore the data when a backup node fails.

Finally, node recovery capability is currently often achieved through distributed storage and replication, that is, data is stored and replicated across multiple nodes so that when one node fails, other nodes can provide copies of the data, ensuring system availability and data integrity. However, this approach requires additional resources, which may reduce performance and increase costs. Moreover, data inconsistencies may arise during recovery, especially in distributed systems, and additional mechanisms may be needed to resolve them.

In summary, the prior art has shortcomings in the selection of backup nodes, the allocation of data across multiple backup nodes, and the means and techniques for node recovery.

Disclosure of Invention

The present invention aims to provide a blockchain-based data backup storage method and system that solve at least one of the above problems in the prior art.

To solve the above technical problems, the invention adopts the following technical solution:

A blockchain-based data backup storage method, characterized by comprising the following steps:

Step 1: fragmenting the data to be fragmented to obtain N data blocks, where the data to be fragmented comprises at least two pieces of backup data corresponding to the target data to be stored, and N is an integer greater than 2;

Step 2: predicting, based on the number N of data blocks and the fragmentation strategy, the number M of backup nodes to be selected from the blockchain and the number K of data blocks to be stored on backup node i;

Step 3: determining a storage allocation strategy for the N data blocks across the M backup nodes based on M, K and the backup capability factor S of each backup node;

and Step 4: storing each encrypted data block, together with its unique identification tag, on the corresponding backup nodes among the M backup nodes according to the storage allocation strategy.

Preferably, in step 3, the storage allocation strategy includes: storing each data block on at least two different backup nodes.

Preferably, the storage allocation strategy includes: storing, on the backup node with the largest backup capability factor S among the M backup nodes, at least some of the data blocks held by the remaining backup nodes.

Further, in step 3, a correspondence table between each backup node and the data blocks it stores is generated based on M, K and the storage allocation strategy.

Preferably, in step 3, the backup capability factor S of each backup node is determined based on the security of the backup node itself and/or the number of times the backup node has historically served as an accounting node.

Preferably, determining the backup capability factor S of each backup node based on the security of the backup node itself includes:

determining S according to the level of encryption technology and the reliability of the encryption algorithm used by the backup node: the higher the encryption technology level, the greater S; and the more reliable the encryption algorithm, the greater S;

determining the backup capability factor S of each backup node based on the number of times the backup node has historically served as an accounting node includes:

the more times the backup node has served as an accounting node, the greater S.

Further, the method comprises the following step:

Step 5: using a distributed storage consistency algorithm to keep the data on the backup nodes synchronized.

Based on the same inventive concept, the invention further provides a blockchain-based data backup storage system, characterized by comprising:

a slicing processing module, configured to fragment the data to be fragmented to obtain N data blocks, where the data to be fragmented comprises at least two pieces of backup data corresponding to the target data to be stored, and N is an integer greater than 2;

a prediction module, configured to predict, based on the number N of data blocks and the fragmentation strategy, the number M of backup nodes to be selected from the blockchain and the number K of data blocks to be stored on backup node i;

an allocation strategy determination module, configured to determine the storage allocation strategy for the N data blocks across the M backup nodes based on M, K and the backup capability factor S of each backup node;

and a distribution module, configured to store each encrypted data block, together with its unique identification tag, on the corresponding backup nodes among the M backup nodes according to the storage allocation strategy.

Further, the system comprises a data backup storage device, which comprises:

a management module, for managing the reading and fragmentation of data;

a privacy module, for ensuring the security and privacy of data;

a redundancy module, for redundant backup of data;

a cache module, for caching hot or frequently accessed data;

a replica module, responsible for the migration and tracking of data;

a disaster recovery module, for handling data recovery in the event of data loss or disaster;

and a P2P network module, for communication and data transmission between nodes, to ensure data synchronization and consistency between nodes.

Further, the system comprises a node recovery device, which comprises:

a consensus module, for ensuring consistency and consensus among the nodes;

a backup module, for handling the persistence and backup of data;

a distributed database module, for automatically reconstructing and repairing data in the event of node failure or data loss;

a monitoring module, for monitoring the state and health of nodes in real time, and raising an alarm or triggering a node recovery mechanism when a node fails or behaves abnormally;

an algorithm module, for defining node recovery policies and algorithms;

and a communication module, responsible for communication between nodes.

Compared with the prior art, the invention predicts the number of backup nodes and the number of data blocks each should store from the number of backup data blocks and the fragmentation strategy, and determines the storage allocation strategy by also taking the backup capability factor of each backup node into account, thereby ensuring the integrity and security of the backup data. In addition, the invention provides a data backup storage device and a node recovery device, which give the system strong recovery capability and allow it to operate efficiently and stably in the face of faults and abnormal conditions. The invention can effectively improve the backup and recovery capability of nodes while ensuring the security and privacy of the backup nodes.

Drawings

FIG. 1 is a flow chart of a blockchain-based data backup storage method according to an embodiment of the invention.

FIG. 2 is a block diagram of a data backup storage device according to an embodiment of the invention.

FIG. 3 is a block diagram of a node recovery device according to an embodiment of the invention.

Reference numerals: data backup storage device 1, management module 101, privacy module 102, redundancy module 103, cache module 104, replica module 105, disaster recovery module 106, P2P network module 107; node recovery device 2, consensus module 201, backup module 202, distributed database module 203, monitoring module 204, algorithm module 205, communication module 206.

Detailed Description

To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.

As shown in FIG. 1, a first aspect of the embodiments of the present invention provides a blockchain-based data backup storage method, comprising the following steps:

Step 1: fragmenting the data to be fragmented to obtain N data blocks, where the data to be fragmented comprises at least two pieces of backup data corresponding to the target data to be stored, and N is an integer greater than 2.

In step 1, the backup data is preferably, but not necessarily, fragmented using a consistent hashing algorithm, which mitigates data skew and keeps data redistribution to a minimum when a node fails.
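A common way to apply consistent hashing is to place nodes on a hash ring and map each data block key to the next node(s) clockwise. The sketch below assumes that reading; the patent does not specify how the algorithm is applied, and the class name `HashRing`, the 64 virtual nodes per physical node and the SHA-256-based ring positions are hypothetical choices.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _position(label: str) -> int:
    """Place a label on a 64-bit ring using the first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(label.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (vnode count is an assumption)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        # Keys whose primary owner was another node keep that owner.
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def nodes_for(self, key: str, replicas: int = 1) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if replicas > len({n for _, n in self._ring}):
            raise ValueError("not enough distinct nodes on the ring")
        i = bisect.bisect(self._ring, (_position(key), ""))
        found: List[str] = []
        while len(found) < replicas:
            node = self._ring[i % len(self._ring)][1]
            if node not in found:
                found.append(node)
            i += 1
        return found
```

The property that matters for this step is that removing a node only reassigns the blocks whose primary owner was that node, instead of reshuffling every block.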

Step 2: predicting, based on the number N of data blocks and the fragmentation strategy, the number M of backup nodes to be selected from the blockchain and the number K of data blocks to be stored on backup node i.

Predicting the number M of backup nodes to be selected from the blockchain preferably, but not necessarily, includes estimating the number of backup nodes required from the number N of data blocks and the fragmentation strategy, while keeping the number of backup nodes as small as possible to save resources.

The number K of data blocks to be stored on backup node i is preferably, but not necessarily, determined based on the storage capacity and network bandwidth of backup node i. The storage capacity of a backup node should match the number K of data blocks it is to store: a capacity that is too large wastes resources, while one that is too small leaves insufficient room for the data.
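The following sketch shows one way the predictions of K and M could be combined. It is an assumption-laden illustration rather than the patented procedure: the helper names `predict_k` and `predict_m`, the transfer window used to turn bandwidth into a block count, and the replication factor of 2 (borrowed from the preferred rule that every block is stored on at least two backup nodes) are all hypothetical.

```python
from typing import Dict, List, Tuple


def predict_k(capacity_bytes: int, bandwidth_bps: float, block_size: int,
              transfer_window_s: float, n_blocks: int) -> int:
    """K for one node: limited by storage, by what the link can move in the
    (hypothetical) transfer window, and by N (a node holds each block once)."""
    by_capacity = capacity_bytes // block_size
    by_bandwidth = int(bandwidth_bps * transfer_window_s // (8 * block_size))
    return min(by_capacity, by_bandwidth, n_blocks)


def predict_m(n_blocks: int, k_by_node: Dict[str, int],
              replicas: int = 2) -> Tuple[int, List[str]]:
    """Smallest M whose K values cover replicas * N block slots."""
    needed = replicas * n_blocks
    chosen: List[str] = []
    total = 0
    # Taking nodes in descending K order reaches the slot total with the
    # fewest nodes, i.e. keeps M as small as possible.
    for node in sorted(k_by_node, key=k_by_node.get, reverse=True):
        if total >= needed and len(chosen) >= replicas:
            break
        if k_by_node[node] > 0:
            chosen.append(node)
            total += k_by_node[node]
    if total < needed or len(chosen) < replicas:
        raise ValueError("candidate nodes lack capacity for the requested replicas")
    return len(chosen), chosen
```

For N = 7 and two copies per block, 14 block slots are needed; nodes offering K = 5, 3, 3 and 3 reach that total with M = 4.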

Step 3: determining a storage allocation strategy for the N data blocks across the M backup nodes based on M, K and the backup capability factor S of each backup node. When backup nodes are selected, more important data (i.e., a larger number of distinct data blocks) may deliberately be stored on the more capable backup nodes, according to the backup capability of each node.

In some preferred embodiments, step 3 further includes generating, based on M, K and the storage allocation strategy, a correspondence table between the backup nodes and the data block information each of them stores. The correspondence table lists each backup node together with the numbers of the backup data blocks it holds, which helps manage and track which backup node stores which fragmented data blocks.

In step 3, the storage allocation strategy preferably, but not necessarily, includes: storing each data block on at least two different backup nodes. This ensures that every backup data block is held by more than one backup node, so that if any single backup node is lost, the complete data can still be found on the other backup nodes.
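The at-least-two-nodes rule can be checked mechanically against a correspondence table. This minimal sketch assumes the table maps a node name to the numbers of the blocks it stores (blocks numbered 1 to N); the function names are hypothetical, and the sample table is the 7-block, 4-node example given later in this description.

```python
from typing import Dict, List, Set


def holders_by_block(table: Dict[str, List[int]], n_blocks: int) -> Dict[int, Set[str]]:
    """Invert the node -> blocks table into block -> set of holding nodes."""
    holders: Dict[int, Set[str]] = {b: set() for b in range(1, n_blocks + 1)}
    for node, blocks in table.items():
        for b in blocks:
            holders[b].add(node)
    return holders


def min_distinct_holders(table: Dict[str, List[int]], n_blocks: int) -> int:
    """Smallest number of distinct nodes holding any single block."""
    return min(len(h) for h in holders_by_block(table, n_blocks).values())


def survives_any_single_loss(table: Dict[str, List[int]], n_blocks: int) -> bool:
    """True if, whichever node is lost, the others still hold all N blocks."""
    everything = set(range(1, n_blocks + 1))
    for lost in table:
        remaining = set().union(*(set(bs) for n, bs in table.items() if n != lost))
        if remaining != everything:
            return False
    return True


example = {"A": [1, 2, 3], "B": [3, 4, 5], "C": [5, 6, 7], "D": [1, 2, 4, 6, 7]}
print(min_distinct_holders(example, 7), survives_any_single_loss(example, 7))
# 2 True
```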

In step 3, the backup capability factor S of each backup node is preferably, but not necessarily, determined based on the security of the backup node itself and/or the number of times the backup node has historically served as an accounting node.

Determining the backup capability factor S of each backup node based on the security of the backup node itself preferably, but not necessarily, includes:

determining S according to the level of encryption technology and the reliability of the encryption algorithm used by the backup node: the higher the encryption technology level, the greater S; and the more reliable the encryption algorithm, the greater S.

a. Encryption technology level:

Low-level encryption: nodes that use only simple or already-broken encryption methods may receive a low security rating.

Medium-level encryption: nodes that use standard but not state-of-the-art encryption methods receive a medium security rating.

Advanced encryption: nodes that use current state-of-the-art encryption techniques and practices receive a high security rating.

b. Reliability of the encryption algorithm used:

Known weaknesses: if the algorithm has known flaws or is vulnerable to certain attacks, the node's rating decreases.

Widely used and validated algorithms: nodes that use extensively tested and validated cryptographic algorithms (e.g., RSA, SHA-256) should receive higher scores.

The number of times a backup node has historically served as an accounting node can be regarded as a trust factor: a node that frequently serves as an accounting node is likely to be trusted within the blockchain network.

Low frequency: a node that has rarely or never served as an accounting node may receive a low score.

Medium frequency: a node that occasionally serves as an accounting node receives a medium score.

High frequency: a node that is frequently selected as the accounting node should receive a high score.
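To make the qualitative bands above concrete, the sketch below turns them into a number. Every numeric choice in it is a hypothetical assumption: the ordinal values per band, the accounting-count thresholds of 10 and 100, and the weights 0.4/0.3/0.3. The description only fixes the direction of each effect (a higher encryption level, a more reliable algorithm and more accounting history all increase S). Setting a weight to zero covers the "or" case in which S depends on security alone or on accounting history alone.

```python
from typing import Tuple

# Hypothetical ordinal values; the source only gives the ordering of bands.
ENCRYPTION_LEVEL = {"low": 1, "medium": 2, "advanced": 3}
ALGORITHM_RELIABILITY = {"known_flaws": 0, "standard": 1, "widely_validated": 2}


def accounting_band(times_as_accounting_node: int) -> int:
    """Map accounting history to low/medium/high bands (thresholds assumed)."""
    if times_as_accounting_node < 10:
        return 1  # rarely or never
    if times_as_accounting_node < 100:
        return 2  # occasionally
    return 3      # frequently


def backup_capability_factor(encryption_level: str, algorithm_reliability: str,
                             times_as_accounting_node: int,
                             weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of sub-scores normalized to [0, 1]; weights are hypothetical."""
    w_enc, w_alg, w_acc = weights
    enc = ENCRYPTION_LEVEL[encryption_level] / 3
    alg = ALGORITHM_RELIABILITY[algorithm_reliability] / 2
    acc = accounting_band(times_as_accounting_node) / 3
    return w_enc * enc + w_alg * alg + w_acc * acc
```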

In step 3, the storage allocation strategy preferably, but not necessarily, includes: storing, on the backup node with the largest backup capability factor S among the M backup nodes, at least some of the data blocks held by the remaining backup nodes.

A specific example of the storage allocation strategy is as follows:

The fragmented backup data is stored on the corresponding backup nodes according to the predicted correspondence table. For example, if the backup data is divided into 7 data blocks, a correspondence table is generated for these 7 blocks that specifies 4 backup nodes, storing 3 data blocks (blocks 1 to 3), 3 data blocks (blocks 3 to 5), 3 data blocks (blocks 5 to 7) and 5 data blocks (blocks 1, 2, 4, 6 and 7), respectively. In this table every data block appears on exactly two backup nodes. The per-node counts may also take other values, such as 3 or 4. Because the fourth backup node holds data blocks drawn from each of the other three backup nodes, its 5 data blocks (blocks 1, 2, 4, 6 and 7) are placed on the backup node that is both the most secure and has the largest number of accounting records, which improves the security of the fragmented backup data.
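One rule that reproduces the example table exactly is sketched below. It is an illustrative reconstruction, not the allocation algorithm disclosed in the patent: the M - 1 weaker nodes receive contiguous windows of blocks whose ends overlap by one block, and every block still held only once receives its second copy on the node with the largest S. The function name `allocate` and the S values used are hypothetical.

```python
from typing import Dict, List


def allocate(n_blocks: int, scores: Dict[str, float]) -> Dict[str, List[int]]:
    """Build a node -> data-block correspondence table (blocks numbered 1..N).

    Hypothetical rule: overlapping windows on the weaker nodes, then the
    remaining second copies on the node with the largest S."""
    m = len(scores)
    if m < 2 or n_blocks < m:
        raise ValueError("need M >= 2 and N >= M")
    ranked = sorted(scores, key=scores.get, reverse=True)
    strongest, others = ranked[0], ranked[1:]
    segments = len(others)
    # Evenly spaced boundaries from 1 to N; window j is [bounds[j], bounds[j+1]],
    # so consecutive windows share exactly one block.
    bounds = [1 + (j * (n_blocks - 1)) // segments for j in range(segments + 1)]
    table: Dict[str, List[int]] = {}
    copies = {b: 0 for b in range(1, n_blocks + 1)}
    for node, lo, hi in zip(others, bounds, bounds[1:]):
        table[node] = list(range(lo, hi + 1))
        for b in table[node]:
            copies[b] += 1
    # Every block held only once gets its second copy on the strongest node.
    table[strongest] = [b for b, c in copies.items() if c < 2]
    return table


table = allocate(7, {"A": 0.6, "B": 0.5, "C": 0.4, "D": 0.9})
print(table)
# {'A': [1, 2, 3], 'B': [3, 4, 5], 'C': [5, 6, 7], 'D': [1, 2, 4, 6, 7]}
```

With node D given the largest S, the output matches the 3/3/3/5 split described above, and every block ends up on exactly two nodes.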

In step 1, when the data to be fragmented is fragmented into N data blocks, each data block is simultaneously marked with a unique identification tag that serves as the unique identifier of that block. Alternatively, the hash value of each of the N data blocks may be computed and used as that block's unique identification tag. Hashing maps an input of arbitrary length (also called the pre-image) to a fixed-length output by means of a hash algorithm, i.e., it compresses a message of arbitrary length into a message digest of fixed length.
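Step 1 together with the hash-based tagging alternative can be sketched as follows. The equal-size contiguous split is an assumption (the description elsewhere prefers consistent hashing and does not fix a block layout), and the function names are hypothetical; the tag is simply the SHA-256 hex digest of the block, computed with Python's standard `hashlib`.

```python
import hashlib
from typing import List, Tuple


def fragment(data: bytes, n: int) -> List[bytes]:
    """Split data into n contiguous blocks whose sizes differ by at most one byte."""
    if n <= 2:
        raise ValueError("N must be an integer greater than 2")
    if len(data) < n:
        raise ValueError("data too short for N non-empty blocks")
    size, extra = divmod(len(data), n)
    blocks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def tag_blocks(blocks: List[bytes]) -> List[Tuple[str, bytes]]:
    """Use each block's SHA-256 digest (64 hex characters) as its tag."""
    return [(hashlib.sha256(b).hexdigest(), b) for b in blocks]
```

Because the tag is derived from content, two blocks with identical bytes receive identical tags; if tags must be unique even then, one variation (not described in the source) is to hash the block index together with the content.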

The invention also encrypts each data block using data encryption techniques, so that the data is not leaked even if a backup node is compromised.

The blockchain-based data backup storage method of the invention preferably, but not necessarily, further comprises step 5: using a distributed storage consistency algorithm to keep the data on the backup nodes synchronized. Replica management periodically checks and manages the copies of the backup data blocks so that the data blocks on every backup node remain up to date and complete.
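Periodic replica checking can be sketched as below. This is not a distributed consistency algorithm such as Raft or Paxos; it only shows the checking part of replica management, under the assumption that each block's identification tag is the SHA-256 digest of the bytes actually stored (i.e., of the encrypted block). The function name and data layout are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def audit_replicas(stores: Dict[str, Dict[str, bytes]], min_copies: int = 2
                   ) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Check every stored copy against its tag.

    stores maps node -> {tag: stored bytes}. Returns the corrupt (node, tag)
    pairs and, for each under-replicated tag, the nodes still holding a good
    copy (candidate sources for repair)."""
    healthy: Dict[str, List[str]] = defaultdict(list)
    corrupt: List[Tuple[str, str]] = []
    for node, store in stores.items():
        for tag, data in store.items():
            if hashlib.sha256(data).hexdigest() == tag:
                healthy[tag].append(node)
            else:
                corrupt.append((node, tag))
    all_tags = {tag for store in stores.values() for tag in store}
    under = {t: healthy.get(t, []) for t in sorted(all_tags)
             if len(healthy.get(t, [])) < min_copies}
    return corrupt, under
```

Checking against the tag, rather than voting between copies, still works when only two copies exist and they disagree.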

Based on the same inventive concept, a second aspect of the present invention further provides a blockchain-based data backup storage system, characterized by comprising:

To enhance the recovery capability of the nodes, the invention adopts a decentralized consensus mechanism and data storage mode, which improves the fault tolerance of the system and reduces the risk of a single point of failure. A new generation of distributed database technology is also adopted to provide higher performance, scalability and data consistency and to reduce the impact of node recovery. The invention provides two devices: a data backup storage device and a node recovery device. Together, the modules of these devices provide strong recovery capability, ensuring that the system operates efficiently and stably in the face of faults and abnormal conditions.

As shown in FIG. 2, the data backup storage device 1 preferably, but not necessarily, includes:

Management module 101: manages the reading and fragmentation of data.

Privacy module 102: ensures the security and privacy of the data through functions such as data encryption, access control, identity authentication and permission management, preventing unauthorized access and data leakage.

Redundancy module 103: provides redundant backup of data to ensure its reliability and high availability. The redundancy module 103 typically handles the copying, synchronization and recovery of data.

Cache module 104: improves the performance and efficiency of data access by temporarily storing hot or frequently accessed data in a cache to speed up reads.

Replica module 105: responsible for the migration and tracking of data, including the movement, import and export of data as well as change recording and version tracking.

Disaster recovery module 106: handles data recovery in the event of data loss or disaster. The disaster recovery module 106 typically implements the backup and restore policies that ensure data is not lost in unforeseen situations.

P2P network module 107: handles communication and data transmission between nodes to ensure data synchronization and consistency between them. In a blockchain system, the P2P network module 107 plays an important role in helping nodes share and propagate the fragmented data.
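The cache module's behavior can be illustrated with a least-recently-used (LRU) cache. The description does not name an eviction policy, so LRU, the class name `HotDataCache` and the `loader` callback that fetches a block from backup storage on a miss are assumptions made for this sketch.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class HotDataCache:
    """Minimal LRU cache for hot data blocks (eviction policy assumed)."""

    def __init__(self, capacity: int, loader: Callable[[Hashable], bytes]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.loader = loader  # fetches a block from backup storage on a miss
        self._items: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self.loader(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value
```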

As shown in FIG. 3, the node recovery device 2 preferably, but not necessarily, includes:

Consensus module 201: a decentralized consensus mechanism module that ensures consistency and consensus among all nodes, so that the nodes can reach agreement with one another during recovery and divergence and data inconsistency are avoided.

Backup module 202: handles the persistence and backup of data, ensuring that data is neither lost nor corrupted during node recovery and can be restored efficiently.

Distributed database module 203: uses up-to-date distributed database technology to automatically reconstruct and repair data in the event of node failure or data loss, ensuring the integrity and availability of the data.

Monitoring module 204: monitors the state and health of nodes in real time, and raises an alarm or triggers the node recovery mechanism when a node fails or behaves abnormally.

Algorithm module 205: defines the node recovery policies and algorithms, including data recovery priorities, data reconstruction algorithms and failed-node replacement policies.

Communication module 206: handles communication between nodes, allowing a failed node to coordinate with other nodes for data transmission and recovery.
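As an illustration of the kind of policy the algorithm module 205 might hold (data recovery priority plus failed-node replacement), the sketch below plans re-replication after a node failure. The priority rule (blocks with the fewest surviving copies first), the target rule (highest-S live node not already holding the block), the function name and the spare node `E` in the usage are all hypothetical; capacity limits are ignored.

```python
from typing import Dict, List, Tuple


def recovery_plan(table: Dict[str, List[int]], failed: str,
                  scores: Dict[str, float]) -> List[Tuple[int, str, str]]:
    """Return (block, source_node, target_node) steps restoring the copies
    lost with `failed`. Capacity limits are ignored in this sketch."""

    def survivors(block: int) -> List[str]:
        return [n for n, bs in table.items() if n != failed and block in bs]

    plan: List[Tuple[int, str, str]] = []
    # Priority: blocks with the fewest surviving copies are restored first.
    for block in sorted(table[failed], key=lambda b: len(survivors(b))):
        sources = survivors(block)
        if not sources:
            raise RuntimeError(f"block {block} has no surviving copy")
        candidates = [n for n in scores
                      if n != failed and block not in table.get(n, [])]
        if not candidates:
            raise RuntimeError(f"no replacement node available for block {block}")
        target = max(candidates, key=scores.get)  # highest-S eligible node
        plan.append((block, sources[0], target))
    return plan
```

Applied to the 7-block example table with node D failing and a spare node E (S = 0.8) available, every lost copy is re-created on E from the surviving holder.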

The rational backup node selection technique of the invention is significant for blockchain applications such as public chains, private chains, cross-chain interaction, decentralized applications and smart contract execution. It can improve the reliability, security and performance of a blockchain network, protect the integrity and availability of on-chain data, and improve the user experience.

The invention has the following beneficial effects:

(1) Integrity

Data integrity means that data is not tampered with, damaged or lost during storage and transmission. Maintaining integrity is essential because it ensures that the data is trustworthy and reliable. In this scheme, backup nodes are selected according to their capability, and every fragmented data block is guaranteed to appear on at least two different backup nodes, so that the complete data can be found on other backup nodes if one backup node is lost. The present invention records a complete transaction history for each data block. Maintaining the integrity of the data blocks makes their origin, changes and operations traceable, provides transparency, improves the quality and accuracy of the data, and reduces the transmission and use of erroneous data, giving users higher data security and reliability. Users can participate in the blockchain network with confidence that the data stored on the chain has not been tampered with or manipulated, which helps build trust and promotes the wide application and adoption of blockchain.

(2) Security

Data security means protecting data against the risk of unauthorized access, tampering, corruption or loss. Ensuring data security is critical to the credibility of the blockchain network and to protecting user privacy. In the invention, data blocks are distributed across multiple backup nodes in a reasoned way, and techniques such as data encryption, data fragmentation, consistent hashing and a distributed storage consistency algorithm secure the data and keep the copies on different backup nodes synchronized. This ensures data security and improves the efficiency and reliability of on-chain data processing. Encryption prevents unauthorized access, helping to protect sensitive information and personal privacy. The method reduces the various risks to which data is exposed and effectively resists tampering and fraud, making the blockchain a reliable tamper-resistant and fraud-resistant platform.

(3) Recoverability

Data recoverability refers to the ability to restore data, by backup means, to its original or a near-original state when it is lost, damaged or attacked. The invention provides a data backup storage device and a node recovery device that together deliver strong recovery capability. Even if data loss occurs, the data can be quickly recovered from backups rather than being permanently lost. Malicious tampering is also countered: if the data on one node is tampered with, the other nodes can detect the anomaly by comparing their respective copies, and once the tampering is found, the correct version of the data can be restored from backups or by other means, preserving the integrity and accuracy of the data.

The slicing processing module, the prediction module, the allocation strategy determination module, the distribution module and the like are used to implement the blockchain-based data backup storage method. Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division of functional units and modules is illustrated; in practical applications, the above functions may be assigned to different functional units and modules as needed, i.e., the internal structure of the system may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated in one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed method and system may be implemented in other manners. For example, the above-described method and system embodiments are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.

The units described as separate parts may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content covered by the computer readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer readable media do not include electrical carrier signals and telecommunication signals.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A blockchain-based data backup storage method, characterized by comprising the following steps:

2. The blockchain-based data backup storage method of claim 1, wherein in step 3, the storage allocation policy includes: storing each data block on at least two different backup nodes.

3. The blockchain-based data backup storage method of claim 2, wherein the storage allocation policy includes: storing, on the backup node with the largest backup capability factor S among the M backup nodes, at least some of the data blocks held by the remaining backup nodes.

4. The blockchain-based data backup storage method of claim 1, further comprising, in step 3, generating a correspondence table between the backup nodes and the data block information they respectively store, based on M, K and the storage allocation policy.

5. The blockchain-based data backup storage method of claim 1, wherein in step 3, the backup capability factor S of each backup node is determined based on the security of the backup node itself and/or the number of times the backup node has historically served as an accounting node.

6. The blockchain-based data backup storage method of claim 5, wherein the determining the backup capability factor S of each backup node based on the security of the backup node itself includes:

7. The blockchain-based data backup storage method of any of claims 1 to 6, further comprising:

8. A blockchain-based data backup storage system, comprising:

9. The blockchain-based data backup storage system of claim 8, further comprising a data backup storage device comprising: