CN117376364A - Data processing method and related equipment - Google Patents

Data processing method and related equipment

Info

Publication number
CN117376364A
Authority
CN
China
Prior art keywords
data
target data
management device
data management
storage
Legal status
Pending
Application number
CN202210983123.6A
Other languages
Chinese (zh)
Inventor
张子怡
曲强
杨锐捷
杜明晓
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to PCT/CN2023/081418 (WO2024001304A1)
Publication of CN117376364A
Legal status: Pending (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1061 Peer-to-peer [P2P] networks using node-based peer discovery mechanisms
    • H04L67/1065 Discovery involving distributed pre-established resource-based relationships among peers, e.g. based on distributed hash tables [DHT]
    • H04L67/1074 Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H04L67/1078 Resource delivery mechanisms
    • H04L67/108 Resource delivery mechanisms characterised by resources being split in blocks or fragments
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/126 Applying verification of the received information the source of the received data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data processing method applied to a distributed data management system comprising a plurality of data management devices. A first data management device of the plurality of data management devices corresponds to a first blockchain node of a blockchain network, and a second data management device corresponds to a second blockchain node of the blockchain network. The storage mounted by the first data management device and the storage mounted by the second data management device together form a storage resource pool of the blockchain network. The method comprises: a target data management device among the plurality of data management devices receives a data operation request, obtains the storage addresses of a plurality of data fragments of the target data from the blockchain network according to the data operation request, and performs IO on the target data in the storage resource pool according to those storage addresses. Because all interaction with the storage resource pool is handled by the data management devices, data consistency is ensured and the security, availability, and accessibility of the data are improved.

Description

Data processing method and related equipment
The present application claims priority to Chinese patent application No. 202210770817.1, filed with the China National Intellectual Property Administration on June 30, 2022 and entitled "method, apparatus, server, and storage medium for rich media storage", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of blockchain technology, and in particular, to a data processing method, system, apparatus, computing device cluster, computer readable storage medium, and computer program product.
Background
Blockchain technology is a decentralized architecture and computing paradigm that uses a blockchain data structure to verify and store data, distributed-node consensus algorithms to generate and update data, cryptography to secure data transmission and access, and smart contracts composed of automated script code to program and operate on data.
A network built on blockchain technology is called a blockchain network. The nodes in a blockchain network jointly maintain a distributed ledger that serves as the storage carrier and generally stores simple data structures such as key-value or relational data. As blockchains are widely applied in industries such as finance, energy, government affairs, aviation, agriculture, people's livelihood, and logistics, demand is growing for highly reliable on-chain storage of industry data, including rich media data such as video, audio, and images, and big data such as modeling files.
Putting rich media data or big data directly on the chain would consume a large amount of on-chain resources. The industry has therefore proposed a scheme that combines on-chain and off-chain storage: the rich media data or big data is stored in an off-chain storage system, while its hash value is recorded on the chain. To check the data, a user obtains the on-chain hash value, retrieves the data from the off-chain storage system, computes the hash value of the retrieved data, and compares it with the on-chain hash value to confirm data consistency.
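The verification step just described can be sketched in Python. This is a minimal illustration, not the scheme's actual implementation: plain dictionaries stand in for the ledger and for the off-chain storage system, and SHA-256 is assumed because the text does not name a hash algorithm.

```python
import hashlib

# Assumption: dicts stand in for the on-chain ledger and the off-chain store.
ledger: dict[str, str] = {}
off_chain_store: dict[str, bytes] = {}


def sha256_hex(data: bytes) -> str:
    # SHA-256 is an assumed choice; the text does not specify the hash algorithm.
    return hashlib.sha256(data).hexdigest()


def upload(key: str, data: bytes) -> None:
    off_chain_store[key] = data        # the large payload goes off-chain
    ledger[key] = sha256_hex(data)     # only its hash is recorded on-chain


def verify(key: str) -> bool:
    data = off_chain_store[key]        # retrieve the payload from off-chain storage
    return sha256_hex(data) == ledger[key]  # recompute and compare with the on-chain hash


upload("video-1", b"rich media bytes")
print(verify("video-1"))               # True
off_chain_store["video-1"] = b"tampered bytes"
print(verify("video-1"))               # False
```

The comparison only detects a mismatch; it cannot by itself restore the original data.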
However, the client, the transmission network, the storage network, and other components may carry stability and security risks that lead to problems such as inconsistent or tampered data, making it difficult to meet service requirements.
Disclosure of Invention
The application provides a data processing method that manages the uploading, downloading, and similar handling of data by introducing a distributed data management system. Specifically, the data management devices in the distributed data management system interact with a storage resource pool of the blockchain network, formed by the storage mounted by each data management device, to perform input/output operations such as uploading and downloading data, and related information such as the storage addresses of data fragments is recorded in the blockchain network. Even if data on the client, the transmission network, or the storage network becomes inconsistent because of stability or security problems, it can be recovered from the storage addresses of the data copies recorded on the chain. This ensures data consistency and improves the security, availability, and accessibility of the data. The application also provides a distributed data management system, a data management device, a computing device cluster, a computer-readable storage medium, and a computer program product corresponding to the method.
In a first aspect, the present application provides a data processing method applied to a distributed data management system that comprises a plurality of data management devices. A first data management device of the plurality of data management devices corresponds to a first blockchain node of a blockchain network, and a second data management device corresponds to a second blockchain node of the blockchain network. The storage mounted by the first data management device and the storage mounted by the second data management device together form a storage resource pool of the blockchain network.
A target data management device among the plurality of data management devices receives a data operation request for performing an input/output (IO) operation on target data. According to the data operation request, the target data management device obtains the storage addresses of a plurality of data fragments of the target data from the blockchain network and performs IO on the target data in the storage resource pool according to those storage addresses.
In this method, the storage resource pool is managed by the distributed data management system: all interactions with the storage resource pool (such as IO operations on the target data) are handled by a data management device in the distributed data management system, which records on the chain the storage addresses of the target data involved in the IO operation. Even if data on the client, the transmission network, or the storage network becomes inconsistent because of stability or security problems, it can be recovered from the storage addresses of the data copies recorded on the chain. This ensures data consistency and improves the security, availability, and accessibility of the data. In addition, recording information about IO operations on the chain makes those operations traceable.
In some possible implementations, the data operation request is a write request for writing (that is, uploading) the target data. Accordingly, the target data management device may obtain an allocation policy through a smart contract of the blockchain network according to the data operation request, and then allocate storage resources from the storage resource pool for the plurality of data fragments of the target data according to the allocation policy, thereby obtaining the storage addresses of the data fragments. The target data management device may then write the data fragments into the storage resource pool according to their storage addresses and store those storage addresses in the distributed ledger of the blockchain network.
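The write path can be sketched as follows, under stated assumptions: the storage resource pool is modeled as a dict from each mounted store to its contents, the distributed ledger as a dict, and the hypothetical `round_robin_policy` stands in for the allocation policy that the text obtains through a smart contract. The address format is likewise invented for illustration.

```python
Ledger = dict[str, list[tuple[str, str]]]
Pool = dict[str, dict[str, bytes]]


def round_robin_policy(store_ids: list[str], n_fragments: int) -> list[str]:
    # Hypothetical stand-in for the smart-contract allocation policy:
    # fragment i is placed on store i mod (number of stores).
    return [store_ids[i % len(store_ids)] for i in range(n_fragments)]


def handle_write(key: str, fragments: list[bytes], pool: Pool, ledger: Ledger,
                 policy=round_robin_policy) -> list[tuple[str, str]]:
    targets = policy(sorted(pool), len(fragments))   # allocate a store per fragment
    addresses = []
    for i, (fragment, store_id) in enumerate(zip(fragments, targets)):
        address = f"{key}/frag-{i}"                   # hypothetical address format
        pool[store_id][address] = fragment            # IO against the storage pool
        addresses.append((store_id, address))
    ledger[key] = addresses                           # record the addresses on the ledger
    return addresses


pool: Pool = {"store-A": {}, "store-B": {}}   # storage mounted by two devices
ledger: Ledger = {}
handle_write("obj-1", [b"ab", b"cd", b"ef"], pool, ledger)
print(ledger["obj-1"])
# [('store-A', 'obj-1/frag-0'), ('store-B', 'obj-1/frag-1'), ('store-A', 'obj-1/frag-2')]
```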
This method provides the blockchain network with a distributed data management system: a data management device in that system determines the allocation policy, and the data fragments of the target data are stored dispersedly according to that policy. This meets the requirements of distributed management, avoids the risk of administrator abuse inherent in centralized management, and builds a trusted system.
In some possible implementations, the target data management device may, according to the allocation policy, determine weights for different storage resources based on at least one of the capacity, bandwidth, and historical failure records of each storage, and allocate storage resources to the data fragments based on those weights to obtain the storage address of each data fragment. Storing the data fragments at storage addresses determined this way can shorten data storage and read times and reduce wasted storage space.
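The text names capacity, bandwidth, and failure history as inputs to the weights but gives no formula. The sketch below assumes a hypothetical scoring rule, `free_gb * bandwidth_mbps / (1 + failures)`, and a greedy placement that deducts each fragment's size before placing the next.

```python
def storage_weight(free_gb: float, bandwidth_mbps: float, failures: int) -> float:
    # Hypothetical scoring rule (the text gives no formula): more free capacity
    # and bandwidth raise the weight; each recorded failure lowers it.
    return free_gb * bandwidth_mbps / (1 + failures)


def allocate(fragment_sizes_gb: list[float], stores: dict[str, dict]) -> list[str]:
    """Greedily place each fragment on the eligible store with the highest weight."""
    free = {name: info["free_gb"] for name, info in stores.items()}
    placement = []
    for size in fragment_sizes_gb:
        candidates = [name for name in stores if free[name] >= size]
        if not candidates:
            raise ValueError(f"no store has {size} GB free")
        best = max(candidates, key=lambda name: storage_weight(
            free[name], stores[name]["bandwidth_mbps"], stores[name]["failures"]))
        free[best] -= size          # later fragments see the reduced capacity
        placement.append(best)
    return placement


stores = {
    "ssd-1": {"free_gb": 100, "bandwidth_mbps": 1000, "failures": 3},
    "hdd-1": {"free_gb": 500, "bandwidth_mbps": 200, "failures": 0},
    "hdd-2": {"free_gb": 300, "bandwidth_mbps": 200, "failures": 1},
}
print(allocate([200, 200, 50], stores))   # ['hdd-1', 'hdd-1', 'hdd-2']
```

Because free capacity is deducted after each placement, a store's weight falls as it fills, so later fragments drift toward other stores (the third fragment above lands on `hdd-2`).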
In some possible implementations, during upload, the target data management device may also obtain a slicing policy through a smart contract of the blockchain network according to the data operation request, and obtain from the slicing policy a slicing algorithm, a number of slices, and a number of copies of each data fragment. When performing IO, the target data management device can then slice the target data according to the slicing algorithm and the number of slices to obtain the plurality of data fragments, write each copy of each data fragment into the storage resource pool according to that copy's storage address, and store the storage address of each copy of each data fragment in the distributed ledger of the blockchain network.
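A sketch of slicing with replicas, assuming a policy dict with hypothetical keys `n_slices` and `n_copies` in place of the smart-contract slicing policy, and a fixed-size contiguous split as the slicing algorithm (the text does not specify one). Copies of each fragment are rotated across the stores so that, when there are at least as many stores as copies, no store holds two copies of the same fragment.

```python
import math

Pool = dict[str, dict[str, bytes]]


def slice_data(data: bytes, n_slices: int) -> list[bytes]:
    # Hypothetical slicing algorithm: contiguous pieces of ceil(len / n) bytes.
    size = math.ceil(len(data) / n_slices)
    return [data[i * size:(i + 1) * size] for i in range(n_slices)]


def write_with_copies(key: str, data: bytes, policy: dict, pool: Pool,
                      ledger: dict) -> list[bytes]:
    fragments = slice_data(data, policy["n_slices"])
    store_ids = sorted(pool)
    ledger[key] = {}
    for i, fragment in enumerate(fragments):
        copies = []
        for c in range(policy["n_copies"]):
            # Rotate the starting store so copies of one fragment land on different
            # stores (as long as n_copies <= number of stores).
            store_id = store_ids[(i + c) % len(store_ids)]
            address = f"{key}/frag-{i}/copy-{c}"
            pool[store_id][address] = fragment
            copies.append((store_id, address))
        ledger[key][i] = copies       # storage address of every copy, per fragment
    return fragments


pool: Pool = {"A": {}, "B": {}, "C": {}}
ledger: dict = {}
policy = {"n_slices": 3, "n_copies": 2}   # hypothetical policy read from a smart contract
print(write_with_copies("obj", b"abcdefgh", policy, pool, ledger))  # [b'abc', b'def', b'gh']
print(ledger["obj"][1])   # [('B', 'obj/frag-1/copy-0'), ('C', 'obj/frag-1/copy-1')]
```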
Slicing the target data according to the slicing policy obtained from the blockchain network and storing the resulting data fragments across the storage resource pool in a distributed manner can improve the efficiency of storing (uploading) and reading (downloading) the target data.
In some possible implementations, each data fragment has multiple copies, so even if some copies of a data fragment are lost, deleted, or tampered with, the data can be restored from the other copies. When writing the copies of each data fragment into the storage resource pool, the target data management device may write them to different types of storage media in the pool. Then, even if one or more types of storage media fail, the data can be recovered from the copies held on other types of storage media, which improves storage reliability and keeps the data safe.
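One way to honor the different-media rule is to skip any store whose media type already holds a copy. The sketch assumes each store is described by a hypothetical `(store_id, media_type)` pair listed in preference order.

```python
def place_copies_on_distinct_media(n_copies: int,
                                   stores: list[tuple[str, str]]) -> list[str]:
    """Choose n_copies stores so that no two copies share a storage media type.

    stores is a hypothetical list of (store_id, media_type) pairs in preference order.
    """
    chosen: list[str] = []
    used_media: set[str] = set()
    for store_id, media in stores:
        if len(chosen) == n_copies:
            break
        if media not in used_media:   # skip media types that already hold a copy
            chosen.append(store_id)
            used_media.add(media)
    if len(chosen) < n_copies:
        raise ValueError("not enough distinct media types for the requested copies")
    return chosen


stores = [("ssd-1", "ssd"), ("ssd-2", "ssd"), ("hdd-1", "hdd"), ("tape-1", "tape")]
print(place_copies_on_distinct_media(3, stores))   # ['ssd-1', 'hdd-1', 'tape-1']
```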
In some possible implementations, the number of copies of each data fragment equals the number of blockchain nodes. That is, for each data fragment of the target data, the target data management device can store one copy in the storage mounted by the data management device corresponding to each blockchain node of the blockchain network. This achieves the effect of storing the data fragment on the blockchain network and ensures storage reliability at a lower storage cost, without occupying large amounts of on-chain storage resources.
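When the number of copies equals the number of blockchain nodes, placement reduces to writing one copy into the storage behind each node. A minimal sketch, assuming a hypothetical map from node id to the store mounted by that node's data management device and an invented address format:

```python
def replicate_to_every_node(fragment_id: str, fragment: bytes,
                            node_storage: dict[str, dict[str, bytes]]) -> dict[str, str]:
    """Store one copy of a fragment in the storage mounted for every blockchain node.

    node_storage is a hypothetical map: node id -> store of that node's data
    management device. Returns node id -> storage address, which would then be
    recorded on the ledger.
    """
    addresses = {}
    for node_id, store in node_storage.items():
        address = f"{node_id}/{fragment_id}"     # hypothetical address format
        store[address] = fragment
        addresses[node_id] = address
    return addresses


nodes = {"node-1": {}, "node-2": {}, "node-3": {}}
addrs = replicate_to_every_node("obj/frag-0", b"abc", nodes)
print(len(addrs) == len(nodes))   # True: number of copies equals number of nodes
```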
In some possible implementations, the target data management device may further determine at least one of the hash value of the target data, the hash value of each of the plurality of data fragments, and the data attributes of the target data. The data attributes may include one or more of the creator, creation time, and topic, among others. The target data management device may then store at least one of the hash value of the target data, the hash value of each of the plurality of data fragments, and the data attributes of the target data in the distributed ledger of the blockchain network.
Data can then be queried by the hash value of the target data, the hash values of its data fragments, or the data attributes of the target data, which both speeds up queries and helps ensure their accuracy.
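The sketch below assumes a hypothetical ledger record layout holding the three kinds of metadata named above, with SHA-256 as the hash function, and shows how a query can filter on any combination of them.

```python
import hashlib
import time
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()   # assumed hash function


def build_record(data: bytes, fragments: list[bytes], creator: str, topic: str) -> dict:
    # Hypothetical ledger record: whole-object hash, per-fragment hashes, attributes.
    return {
        "data_hash": sha256_hex(data),
        "fragment_hashes": [sha256_hex(f) for f in fragments],
        "attributes": {"creator": creator, "created_at": int(time.time()), "topic": topic},
    }


def query(ledger: dict, *, data_hash: Optional[str] = None,
          fragment_hash: Optional[str] = None, **attributes) -> list:
    """Return the keys of records matching every criterion given."""
    hits = []
    for key, rec in ledger.items():
        if data_hash is not None and rec["data_hash"] != data_hash:
            continue
        if fragment_hash is not None and fragment_hash not in rec["fragment_hashes"]:
            continue
        if any(rec["attributes"].get(k) != v for k, v in attributes.items()):
            continue
        hits.append(key)
    return hits


ledger = {
    "obj-1": build_record(b"abcdef", [b"abc", b"def"], "alice", "survey"),
    "obj-2": build_record(b"xyz", [b"xyz"], "bob", "survey"),
}
print(query(ledger, topic="survey"))                            # ['obj-1', 'obj-2']
print(query(ledger, fragment_hash=sha256_hex(b"def")))          # ['obj-1']
print(query(ledger, data_hash=sha256_hex(b"xyz"), creator="bob"))  # ['obj-2']
```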
In some possible implementations, the data operation request may be a read request for reading (that is, downloading) the target data. Specifically, the target data management device may obtain the storage addresses of the plurality of data fragments of the target data from the distributed ledger of the blockchain network according to the read request, retrieve the data fragments from the storage resource pool according to those storage addresses, and aggregate the data fragments to obtain the target data.
In this method, the target data management device uses the blockchain network to read multiple data fragments concurrently from the storage resource pool and obtains the target data from them, which improves read (download) efficiency. The blockchain network also ensures the consistency of the data that is read.
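A sketch of the concurrent read and aggregation, reusing a hypothetical dict-based pool and ledger layout (a store id plus an address per fragment). Concatenation in fragment order is assumed as the aggregation step, and the thread-pool size of 8 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

Pool = dict[str, dict[str, bytes]]


def read_fragment(pool: Pool, location: tuple[str, str]) -> bytes:
    store_id, address = location
    return pool[store_id][address]


def handle_read(key: str, pool: Pool, ledger: dict[str, list[tuple[str, str]]]) -> bytes:
    locations = ledger[key]                   # addresses recorded on the ledger at write time
    with ThreadPoolExecutor(max_workers=8) as executor:
        # Executor.map runs the reads concurrently but yields results in input order,
        # so the fragments can be concatenated directly.
        fragments = list(executor.map(lambda loc: read_fragment(pool, loc), locations))
    return b"".join(fragments)


pool: Pool = {"store-A": {"obj-1/frag-0": b"ab", "obj-1/frag-2": b"ef"},
              "store-B": {"obj-1/frag-1": b"cd"}}
ledger = {"obj-1": [("store-A", "obj-1/frag-0"), ("store-B", "obj-1/frag-1"),
                    ("store-A", "obj-1/frag-2")]}
print(handle_read("obj-1", pool, ledger))   # b'abcdef'
```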
In some possible implementations, the storage resource pool may store multiple copies of each data fragment, so there are multiple paths to each fragment. Accordingly, when reading the data fragments from the storage resource pool, the target data management device can obtain the allocation policy from the blockchain network through the smart contract and, according to that policy, select a target path from the multiple paths using weights determined from at least one of the capacity, bandwidth, and historical failure records of each storage resource. The target path may be the path with the lowest cost or the lowest latency among the paths to the data fragment. The target data management device then accesses the target path to obtain each data fragment. This further shortens the read latency of the target data and reduces its read cost.
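The path choice can be sketched with a hypothetical cost model: access latency plus transfer time, multiplied by a penalty for recorded failures. The text lists the factors but not how they combine, so the formula and field names are assumptions.

```python
def path_cost(path: dict, size_mb: float) -> float:
    # Hypothetical cost model (the text names the factors but no formula):
    # access latency plus transfer time, scaled up by the path's failure history.
    transfer_ms = size_mb * 8 / path["bandwidth_mbps"] * 1000
    return (path["latency_ms"] + transfer_ms) * (1 + path["failures"])


def choose_target_path(paths: list[dict], size_mb: float) -> dict:
    # The target path is the one with the lowest modeled cost.
    return min(paths, key=lambda p: path_cost(p, size_mb))


paths = [   # one entry per stored copy of the same fragment
    {"store": "A", "latency_ms": 5, "bandwidth_mbps": 100, "failures": 2},
    {"store": "B", "latency_ms": 40, "bandwidth_mbps": 1000, "failures": 0},
    {"store": "C", "latency_ms": 20, "bandwidth_mbps": 1000, "failures": 1},
]
print(choose_target_path(paths, size_mb=10)["store"])   # B
```

With a 0.1 MB fragment the same model picks store `A` instead, because its low latency outweighs its failure penalty when the transfer itself is short.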
In some possible implementations, the target data management device may obtain an aggregation policy through a smart contract of the blockchain network according to the data operation request, and aggregate the plurality of data fragments according to the aggregation policy to obtain the target data.
This method aggregates the data fragments according to an aggregation policy stored on the chain to obtain the target data. If some data fragments in the storage resource pool have been tampered with, deleted, or lost, copies of those fragments can be obtained promptly and aggregated instead, ensuring data consistency.
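The fallback to other copies can be sketched as follows. Each fragment's recorded copies are tried in order until one matches the fragment hash stored on the chain; concatenation stands in for the aggregation policy, and the data layouts and SHA-256 are assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()   # assumed hash function


def fetch_intact_copy(copies, expected_hash, pool):
    """Return the first copy whose hash matches the on-chain fragment hash."""
    for store_id, address in copies:
        data = pool.get(store_id, {}).get(address)   # the copy may have been deleted
        if data is not None and sha256_hex(data) == expected_hash:
            return data
    raise LookupError("no intact copy of this fragment is available")


def aggregate(fragment_copies, fragment_hashes, pool) -> bytes:
    # Hypothetical aggregation policy: concatenate fragments in index order.
    return b"".join(fetch_intact_copy(copies, h, pool)
                    for copies, h in zip(fragment_copies, fragment_hashes))


pool = {"A": {"f0/c0": b"abc", "f1/c0": b"XXX"},   # f1's first copy was tampered with
        "B": {"f0/c1": b"abc", "f1/c1": b"def"}}
fragment_copies = [[("A", "f0/c0"), ("B", "f0/c1")],
                   [("A", "f1/c0"), ("B", "f1/c1")]]
fragment_hashes = [sha256_hex(b"abc"), sha256_hex(b"def")]   # as recorded on-chain
print(aggregate(fragment_copies, fragment_hashes, pool))     # b'abcdef'
```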
In some possible implementations, the target data management device may obtain the local hash values and the on-chain hash values of the plurality of data fragments. A local hash value is computed with a hash algorithm, for example by the data management device over the content of a locally stored data fragment. An on-chain hash value is a hash value stored in the blockchain network. The target data management device may first compare the local hash values with the on-chain hash values, thereby detecting tampered, deleted, or lost data fragments in advance.
When the target data management device determines that the local hash values match the on-chain hash values, it aggregates the data fragments to obtain aggregated data. It then computes the hash value of the aggregated data, obtains the hash value of the target data from the blockchain network, and compares the two. When the hash value of the aggregated data matches the hash value of the target data, the aggregated data is determined to be the target data.
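The two-stage check can be sketched as below, assuming SHA-256 and assuming the on-chain fragment hashes are stored as a list in fragment order.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()   # assumed hash function


def verify_and_aggregate(fragments: list[bytes], on_chain_fragment_hashes: list[str],
                         on_chain_data_hash: str) -> bytes:
    # Stage 1: hash each local fragment and compare with the on-chain fragment hashes.
    local_hashes = [sha256_hex(f) for f in fragments]
    if local_hashes != on_chain_fragment_hashes:
        raise ValueError("fragment hashes do not match the on-chain values")
    # Stage 2: aggregate, then compare the hash of the result with the on-chain data hash.
    aggregated = b"".join(fragments)
    if sha256_hex(aggregated) != on_chain_data_hash:
        raise ValueError("aggregated data does not match the on-chain data hash")
    return aggregated


fragments = [b"abc", b"def"]
chain_frag_hashes = [sha256_hex(b"abc"), sha256_hex(b"def")]
chain_data_hash = sha256_hex(b"abcdef")
print(verify_and_aggregate(fragments, chain_frag_hashes, chain_data_hash))  # b'abcdef'
```

The second stage guards against faults introduced during aggregation itself, which the per-fragment comparison cannot see.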
Thus, the accuracy of the read target data can be guaranteed through hash value verification.
In some possible implementations, the target data management device may obtain first meta-information of the data fragments in its mounted storage from the blockchain node corresponding to the target data management device, and obtain second meta-information of the same data fragments from its mounted storage. When the first meta-information does not match the second meta-information, the target data management device determines that a fault has occurred and stores the fault information in the distributed ledger of the blockchain network.
The target data management device can periodically scan its blockchain node and its mounted local storage and check the meta-information of the data fragments recorded on the blockchain node against that of the locally stored data fragments. This speeds up fault detection, improves inspection efficiency, and in turn supports fault recovery.
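A sketch of one scan pass, assuming the meta-information is a small dict per fragment (the hypothetical fields `hash` and `size`) and that a list stands in for the fault records on the ledger. In practice a scheduler would invoke this periodically.

```python
def scan_for_faults(chain_meta: dict[str, dict], local_meta: dict[str, dict]) -> list[dict]:
    """Compare the meta-information recorded on the blockchain node (first) with the
    meta-information read from the mounted storage (second), per fragment id."""
    faults = []
    for fragment_id, expected in chain_meta.items():
        actual = local_meta.get(fragment_id)
        if actual is None:
            faults.append({"fragment": fragment_id, "fault": "missing"})    # deleted or lost
        elif actual != expected:
            faults.append({"fragment": fragment_id, "fault": "mismatch"})   # e.g. tampered
    return faults


chain_meta = {"f0": {"hash": "h0", "size": 3}, "f1": {"hash": "h1", "size": 3},
              "f2": {"hash": "h2", "size": 2}}
local_meta = {"f0": {"hash": "h0", "size": 3}, "f1": {"hash": "zz", "size": 3}}
ledger_faults = []
ledger_faults.extend(scan_for_faults(chain_meta, local_meta))   # record faults on-chain
print(ledger_faults)
# [{'fragment': 'f1', 'fault': 'mismatch'}, {'fragment': 'f2', 'fault': 'missing'}]
```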
In some possible implementations, the target data management device may read fault information from the blockchain network. When the fault information indicates that data fragments in the storage mounted by the target data management device have been tampered with, deleted, or lost, the target data management device can obtain those data fragments from the storage mounted by other data management devices, store them locally, and then store the updated storage addresses in the distributed ledger of the blockchain network.
The target data management device reads the fault information stored on the chain and performs fault recovery based on the fault information relevant to itself, ensuring data consistency.
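A recovery sketch under hypothetical layouts: each device's store maps fragment ids to bytes, and the ledger maps each fragment id to the addresses of its copies. Before copying from a peer, the sketch checks the peer's copy against the on-chain fragment hash (SHA-256 assumed), since that copy could also be damaged.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()   # assumed hash function


def recover(device_id, faults, stores, fragment_hashes, ledger):
    """Restore each faulty local fragment from a peer's intact copy, then update the ledger.

    stores: device id -> {fragment id: bytes}; fragment_hashes: on-chain hash per fragment;
    ledger: fragment id -> {device id: address}. All layouts are hypothetical.
    """
    local = stores[device_id]
    recovered = []
    for fault in faults:
        fid = fault["fragment"]
        for peer_id, peer_store in stores.items():
            copy = peer_store.get(fid)
            if peer_id == device_id or copy is None or sha256_hex(copy) != fragment_hashes[fid]:
                continue                          # skip self, missing, or damaged peer copies
            local[fid] = copy                     # store the intact copy locally
            ledger.setdefault(fid, {})[device_id] = f"{device_id}/{fid}"  # updated address
            recovered.append(fid)
            break
    return recovered


stores = {"dev-1": {"f0": b"XXX"}, "dev-2": {"f0": b"abc", "f1": b"def"}}
fragment_hashes = {"f0": sha256_hex(b"abc"), "f1": sha256_hex(b"def")}
ledger = {"f0": {"dev-2": "dev-2/f0"}, "f1": {"dev-2": "dev-2/f1"}}
faults = [{"fragment": "f0", "fault": "mismatch"}, {"fragment": "f1", "fault": "missing"}]
print(recover("dev-1", faults, stores, fragment_hashes, ledger))   # ['f0', 'f1']
```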
In a second aspect, the present application provides a distributed data management system. The distributed data management system comprises a plurality of data management devices; a first data management device of the plurality of data management devices corresponds to a first blockchain node of a blockchain network, and a second data management device of the plurality of data management devices corresponds to a second blockchain node of the blockchain network; the storage mounted by the first data management device and the storage mounted by the second data management device together form a storage resource pool of the blockchain network;
a target data management device among the plurality of data management devices is configured to receive a data operation request, the data operation request being used to perform an input/output (IO) operation on target data;
the target data management device is further configured to obtain, according to the data operation request, storage addresses of a plurality of data fragments of the target data from the blockchain network, and perform IO on the target data in the storage resource pool according to the storage addresses of the plurality of data fragments.
In some possible implementations, the data operation request is a write request, and the target data management device is specifically configured to:
obtain an allocation policy through a smart contract of the blockchain network according to the data operation request;
allocate storage resources from the storage resource pool for a plurality of data fragments of the target data according to the allocation policy, to obtain storage addresses of the plurality of data fragments; and
write the plurality of data fragments into the storage resource pool according to their storage addresses, and store the storage addresses of the plurality of data fragments in a distributed ledger of the blockchain network.
In some possible implementations, the target data management device is further configured to:
obtain a slicing policy through a smart contract of the blockchain network according to the data operation request; and
obtain a slicing algorithm, a number of slices, and a number of copies of each data fragment according to the slicing policy;
the target data management device is specifically configured to:
slice the target data according to the slicing algorithm and the number of slices to obtain a plurality of data fragments of the target data; and
write each copy of each of the plurality of data fragments into the storage resource pool according to the storage address of that copy, and store the storage address of each copy of each data fragment in a distributed ledger of the blockchain network.
In some possible implementations, each data fragment has multiple copies;
the target data management device is specifically configured to:
write the multiple copies of each data fragment to different types of storage media in the storage resource pool.
In some possible implementations, the target data management device is further configured to:
determine at least one of a hash value of the target data, a hash value of each of the plurality of data fragments, and a data attribute of the target data; and
store at least one of the hash value of the target data, the hash value of each of the plurality of data fragments, and the data attribute of the target data in a distributed ledger of the blockchain network.
In some possible implementations, the data operation request is a read request, and the target data management device is specifically configured to:
obtain storage addresses of a plurality of data fragments of the target data from a distributed ledger of the blockchain network according to the read request;
obtain the plurality of data fragments from the storage resource pool according to the storage addresses of the plurality of data fragments; and
aggregate the plurality of data fragments to obtain the target data.
In some possible implementations, the target data management device is further configured to:
obtain an aggregation policy through a smart contract of the blockchain network according to the data operation request;
the target data management device is specifically configured to:
aggregate the plurality of data fragments according to the aggregation policy to obtain the target data.
In some possible implementations, the target data management device is specifically configured to:
obtain local hash values and on-chain hash values of the plurality of data fragments, where the local hash values are computed with a hash algorithm and the on-chain hash values are hash values stored in the blockchain network;
when the local hash values match the on-chain hash values, aggregate the plurality of data fragments to obtain aggregated data; and
determine a hash value of the aggregated data, obtain the hash value of the target data from the blockchain network, and determine that the aggregated data is the target data when the hash value of the aggregated data matches the hash value of the target data.
In some possible implementations, the target data management device is further configured to:
obtain first meta-information of the data fragments in the storage mounted by the target data management device from the blockchain node corresponding to the target data management device, and obtain second meta-information of those data fragments from the storage mounted by the target data management device; and
when the first meta-information does not match the second meta-information, determine that a fault has occurred and store the fault information in a distributed ledger of the blockchain network.
In some possible implementations, the target data management device is further configured to:
read fault information from the blockchain network;
when the fault information indicates that data fragments in the storage mounted by the target data management device have been tampered with, deleted, or lost, obtain those data fragments from the storage mounted by other data management devices and store them locally; and
store the updated storage addresses in a distributed ledger of the blockchain network.
In a third aspect, the present application provides a data management apparatus. The data management apparatus corresponds to a blockchain node in a blockchain network, and the storage mounted by the data management apparatus and the storage mounted by the other data management devices in the distributed data management system together form a storage resource pool of the blockchain network. The data management apparatus comprises:
a communication module, configured to receive a data operation request, the data operation request being used to perform an input/output (IO) operation on target data; and
a management module, configured to obtain storage addresses of a plurality of data fragments of the target data from the blockchain network according to the data operation request, and perform IO on the target data in the storage resource pool according to the storage addresses of the plurality of data fragments.
In some possible implementations, the data operation request is a write request, and the management module is specifically configured to:
obtain an allocation policy through a smart contract of the blockchain network according to the data operation request;
allocate storage resources from the storage resource pool for a plurality of data fragments of the target data according to the allocation policy, to obtain storage addresses of the plurality of data fragments; and
write the plurality of data fragments into the storage resource pool according to their storage addresses, and store the storage addresses of the plurality of data fragments in a distributed ledger of the blockchain network.
In some possible implementations, the management module is further configured to:
obtain a slicing policy through a smart contract of the blockchain network according to the data operation request; and
obtain a slicing algorithm, a number of slices, and a number of copies of each data fragment according to the slicing policy;
the management module is specifically configured to:
slicing the target data according to the slicing algorithm and the number of slices to obtain a plurality of data slices of the target data;
and writing the respective copies of each data slice into the storage resource pool according to the storage addresses of the respective copies of each data slice in the plurality of data slices, and storing the storage addresses of the respective copies of each data slice into a distributed ledger of the blockchain network.
In some possible implementations, each data slice includes multiple copies;
the management module is specifically configured to:
writing the multiple copies of each data slice to different types of storage media in the storage resource pool.
In some possible implementations, the management module is further configured to:
determining at least one of a hash value of the target data, a hash value of each of the plurality of data slices, and a data attribute of the target data;
storing at least one of the hash value of the target data, the hash value of each of the plurality of data slices, and the data attribute of the target data to a distributed ledger of the blockchain network.
In some possible implementations, the data operation request is a read request, and the management module is specifically configured to:
according to the read request, acquiring storage addresses of a plurality of data fragments of the target data from a distributed ledger of the blockchain network;
acquiring the plurality of data fragments from the storage resource pool according to the storage addresses of the plurality of data fragments;
and aggregating the plurality of data fragments to obtain the target data.
In some possible implementations, the management module is further configured to:
acquiring an aggregation strategy based on an intelligent contract of the blockchain network according to the data operation request;
the management module is specifically configured to:
and aggregating the plurality of data fragments according to the aggregation strategy to obtain the target data.
In some possible implementations, the management module is specifically configured to:
obtaining local hash values of the plurality of data fragments and on-chain hash values of the plurality of data fragments, where the local hash values are computed by a hash algorithm, and the on-chain hash values are hash values stored in the blockchain network;
when determining that the local hash values match the on-chain hash values, starting to aggregate the plurality of data fragments to obtain aggregated data; and
determining a hash value of the aggregated data, acquiring the hash value of the target data from the blockchain network, and determining the aggregated data as the target data when the hash value of the aggregated data matches the hash value of the target data.
In some possible implementations, the data management apparatus further includes:
a fault checking module, configured to acquire, from the blockchain node corresponding to the target data management device, first meta information of the data fragments in the storage mounted by the target data management device, and acquire second meta information of the data fragments from the storage mounted by the target data management device; and, when the first meta information does not match the second meta information, determine that a fault has occurred and store the fault information to a distributed ledger of the blockchain network.
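The comparison performed by the fault checking module can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes meta information consists of a fragment name, size, and hash (the fields listed for the first meta information under S410), and the names `FragmentMeta`, `check_faults`, and the fault labels `missing` and `tampered` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentMeta:
    """Meta information of one data fragment (hypothetical structure)."""
    name: str
    size: int
    digest: str


def check_faults(on_chain: dict[str, FragmentMeta],
                 local: dict[str, FragmentMeta]) -> list[dict[str, str]]:
    """Compare first meta information (read from the chain) with second meta
    information (read from the mounted storage) and return fault records."""
    faults = []
    for name in sorted(on_chain):  # sorted only to make the output order stable
        expected = on_chain[name]
        actual = local.get(name)
        if actual is None:
            # Recorded on the chain but absent from storage: deleted or lost.
            faults.append({"fragment": name, "fault": "missing"})
        elif actual != expected:
            # Present, but the size or hash differs: treated as tampered.
            faults.append({"fragment": name, "fault": "tampered"})
    return faults
```

In this sketch the returned list plays the role of the fault information that would be written to the distributed ledger; an empty list means the two sets of meta information match.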
In some possible implementations, the data management apparatus further includes:
a fault recovery module, configured to read fault information from the blockchain network; when the fault information indicates that a data fragment in the storage mounted by the target data management device has been tampered with, deleted, or lost, acquire the data fragment from the storage mounted by another data management device, store it locally, and store the updated storage address to the distributed ledger of the blockchain network.
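A possible shape for the fault recovery step is sketched below. It is an illustration only. It assumes the ledger provides, for each fragment, the list of devices holding copies and the fragment's SHA-256 hash. Checking a peer copy against that on-chain hash before accepting it is an extra safeguard added here; the text above does not specify it. The function name `recover_fragments` and the `device/fragment` address format are hypothetical, and plain dictionaries stand in for the mounted storages.

```python
import hashlib


def recover_fragments(faults: list[dict[str, str]],
                      on_chain_hashes: dict[str, str],
                      replica_locations: dict[str, list[str]],
                      peer_stores: dict[str, dict[str, bytes]],
                      local_id: str,
                      local_store: dict[str, bytes]) -> dict[str, str]:
    """Re-fetch faulty fragments from other devices and return the updated
    storage addresses that would then be written back to the ledger."""
    updated = {}
    for fault in faults:
        name = fault["fragment"]
        for device_id in replica_locations.get(name, []):
            if device_id == local_id:
                continue  # the local copy is the faulty one
            copy = peer_stores.get(device_id, {}).get(name)
            # Accept a peer copy only if it matches the on-chain hash.
            if copy is not None and hashlib.sha256(copy).hexdigest() == on_chain_hashes[name]:
                local_store[name] = copy
                updated[name] = f"{local_id}/{name}"  # hypothetical address format
                break
        else:
            raise RuntimeError(f"no intact copy of fragment {name!r} is available")
    return updated
```

The `for ... else` clause raises only when no peer yields an intact copy. A real system would then need another recovery path, which this sketch does not model.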
In a fourth aspect, the present application provides a cluster of computing devices. The cluster of computing devices includes at least one computing device including at least one processor and at least one memory. The at least one processor and the at least one memory are in communication with each other. The at least one processor is configured to execute instructions stored in the at least one memory to cause a computing device or cluster of computing devices to perform the data processing method according to the first aspect or any implementation of the first aspect.
In a fifth aspect, the present application provides a computer-readable storage medium storing instructions that instruct a computing device or a cluster of computing devices to perform the data processing method according to the first aspect or any implementation of the first aspect.
In a sixth aspect, the present application provides a computer program product containing instructions which, when run on a computing device or a cluster of computing devices, cause the computing device or cluster of computing devices to perform the data processing method according to the first aspect or any implementation of the first aspect.
Based on the implementations provided in the above aspects, the present application may further combine them to provide more implementations.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the following briefly describes the accompanying drawings used in the embodiments.
FIG. 1 is a schematic diagram of a distributed data management system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a distributed data management system according to an embodiment of the present application;
FIG. 3 is a schematic architecture diagram of a distributed data management system in a multi-scenario federation according to an embodiment of the present application;
FIG. 4 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of data uploading according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of data downloading according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of fault detection according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of fault recovery according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a distributed data management system according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a computing device cluster according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a computing device cluster according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a computing device cluster according to an embodiment of the present application.
Detailed Description
The terms "first" and "second" in the embodiments of the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more such features.
Some technical terms related to the embodiments of the present application will be first described.
A blockchain network, also simply referred to as a blockchain, is a peer-to-peer (P2P) network built on blockchain technology. The blockchain network includes a plurality of blockchain nodes, each of which is a peer node. In a blockchain network, the blockchain nodes collectively maintain a continuously growing, chained ledger composed of ordered data blocks. Each blockchain node stores a copy of this chained ledger, and the copies are kept consistent with one another; therefore, the chained ledger is also referred to as the distributed ledger of the blockchain network.
Based on how open read and write permissions are, blockchain networks may be classified into public chains (public blockchains), private chains (private blockchains), and consortium chains (consortium blockchains). A public chain opens read and write permissions to all nodes; a private chain opens them only to specific nodes; and a consortium chain opens them to the nodes that have joined the consortium (the consortium members).
The distributed ledger of a blockchain network is commonly used to store simple data structures such as key-value data and relational data. As blockchain technology is widely applied in industries such as finance, energy, government affairs, aviation, agriculture, people's livelihood, and logistics, there is a growing demand for highly reliable on-chain storage of industry-related data, such as rich media data (video, audio, images) or big data (modeling files).
Because on-chain storage consumes a large amount of resources, large-scale data such as rich media data (e.g., video, audio, images) or big data (e.g., modeling files) can be stored in an off-chain storage system while the hash values of the data are recorded on the chain. A user can then compare the on-chain hash value with a hash value computed from the off-chain data to check data consistency. However, the client, the transmission network, and the storage network may all carry stability and security risks that lead to problems such as inconsistent or tampered data, so this approach struggles to meet service requirements.
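To make the idea of a chained ledger of ordered blocks concrete, here is a minimal single-node sketch. It is illustrative only and makes several assumptions. Each block stores the SHA-256 hash of its predecessor, and the block hash is computed over a JSON encoding of the block's fields. Consensus, signatures, and replication across nodes are omitted. `Block`, `Ledger`, and `is_consistent` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass


def _digest(index: int, prev_hash: str, records: list) -> str:
    # The block hash covers the index, the previous block's hash and the
    # records, so changing any earlier block breaks every later link.
    payload = json.dumps([index, prev_hash, records], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclass
class Block:
    index: int
    prev_hash: str
    records: list
    hash: str


class Ledger:
    """One node's copy of a chained ledger (illustrative only)."""

    def __init__(self) -> None:
        genesis_prev = "0" * 64
        self.blocks = [Block(0, genesis_prev, [], _digest(0, genesis_prev, []))]

    def append(self, records: list) -> Block:
        prev = self.blocks[-1]
        index = prev.index + 1
        block = Block(index, prev.hash, records, _digest(index, prev.hash, records))
        self.blocks.append(block)
        return block

    def is_consistent(self) -> bool:
        for i, block in enumerate(self.blocks):
            if block.hash != _digest(block.index, block.prev_hash, block.records):
                return False  # block content no longer matches its stored hash
            if i > 0 and block.prev_hash != self.blocks[i - 1].hash:
                return False  # link to the previous block is broken
        return True
```

Each blockchain node would hold such a copy. Keeping the copies identical is the consensus protocol's job, which this sketch leaves out.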
In view of this, the embodiments of the present application provide a data processing method. The method may be applied to a distributed data management system that includes a plurality of data management devices. The distributed data management system is essentially a distributed storage engine mainly used to manage the storage of rich media data. It may therefore also be called a distributed rich media engine, and each data management device in it is a part of that engine. A first data management device among the plurality of data management devices corresponds to a first blockchain node of the blockchain network, and a second data management device corresponds to a second blockchain node of the blockchain network. The storage mounted by the first data management device and the storage mounted by the second data management device are used to form a storage resource pool of the blockchain network.
Specifically, a target data management device among the plurality of data management devices may receive a data operation request for performing an input/output (IO) operation on target data. The target data management device may then obtain, according to the data operation request, the storage addresses of a plurality of data slices (in some cases also simply referred to as slices) of the target data from the blockchain network, and perform IO on the target data in the storage resource pool according to those storage addresses.
In this method, the storage resource pool is managed by the distributed data management system. All interactions with the storage resource pool (such as IO operations on target data) are handled by a data management device in the distributed data management system, and that device records on the chain the storage addresses of the target data involved in each IO operation. Even if data becomes inconsistent because of stability or security problems in the client, the transmission network, or the storage network, it can be recovered based on the on-chain storage addresses of the data copies. This ensures data consistency and improves the security, availability, and accessibility of the data. In addition, the method can record information related to IO operations on the chain, which makes the operations traceable.
In order to make the technical solution of the present application clearer and easier to understand, the system architecture of the embodiments of the present application is described below with reference to the accompanying drawings.
Referring to the architecture diagram of the distributed data management system shown in FIG. 1, the distributed data management system 100 includes a plurality of data management devices 10. Each data management device 10 corresponds to one blockchain node 20 of the blockchain network 200 and mounts a storage 30. It should be noted that the data management device 10 in the embodiments of the present application supports managing and adapting to different storage media. For example, the data management device 10 may mount different storage media, including but not limited to a hard disk drive (HDD) or a solid state drive (SSD). The storages 30 mounted by the plurality of data management devices 10 may be used to form a storage resource pool 300 of the blockchain network.
In the example of fig. 1, the data management apparatus 10 may also interface with a blockchain client 40. A participant of the blockchain, such as a tenant on the cloud, may write large-scale data, such as rich media data or big data, to the storage resource pool through the blockchain client 40, or read large-scale data, such as rich media data or big data, from the storage resource pool 300 through the blockchain client 40.
Specifically, the data management apparatus 10 is configured to receive a data operation request, for example, one sent by a tenant through the blockchain client 40. The data operation request is used to perform an input/output (IO) operation on target data. The data management apparatus 10 obtains the storage addresses of a plurality of data slices of the target data from the blockchain network according to the data operation request, and performs IO on the target data according to those storage addresses. For example, when the data operation request is a write request, the data management apparatus 10 may split the target data into fragments, determine the storage address of each data fragment, and store each data fragment at its storage address. The data management apparatus 10 records on the chain the hash value of the target data, the hash values of the data fragments, and the storage addresses of the data fragments. For another example, when the data operation request is a read request, the data management apparatus 10 may acquire the storage addresses of the data fragments from the blockchain network, fetch the data fragments from those addresses, and aggregate them to obtain the target data. It should be noted that the data management apparatus 10 may verify the data fragments before aggregation by computing a hash value for each data fragment and comparing it with the corresponding on-chain hash value. Similarly, after aggregation, the data management apparatus 10 may check the hash value of the aggregated data to determine whether the aggregated data is the target data.
Storage resources are scattered when each party manages its own storage, and data is then easily lost or tampered with. To address this, the data management device 10 in the embodiments of the present application further provides corresponding customized contracts. These contracts expose interfaces, for example application programming interfaces (APIs), to each distributed storage engine for a storage sharding policy, a storage allocation policy (also called storage fragment routing), and a storage aggregation policy (a policy for aggregating data fragments). The data management devices 10 of the distributed data management system 100 can use the smart contracts of the blockchain network to reach consensus on these three policies. When a data management device performs data IO, it calculates the storage locations of the fragments (identified by their storage addresses) based on the allocation policy. The calculation takes into account the remaining storage capacity, the fragment type, the number of fragments, the bandwidth, the historical failure count, and the like. This reduces data IO time (storage or read time) and wasted storage space.
In addition, the data management device 10 makes IO operations traceable through smart contracts. The storage allocation policy, the storage aggregation policy, and their execution logic are agreed through contract consensus, and each storage write or read is approved by multi-party endorsement. This ensures data security and avoids storage inconsistencies or failures caused by data tampering. Further, the data management device 10 defines different data slicing algorithms through smart contracts and splits data into data slices that are unreadable on their own, so that no complete data can be obtained from any single storage medium. The data management device 10 reads the slices from the different storage media, aggregates them, and returns the result to the blockchain client. On the one hand, this makes the slicing methods extensible and aggregates data slices automatically, which simplifies user operations. On the other hand, the unreadable slices are scattered across storage media managed by different data management devices 10, so no single data management device 10 can obtain the data on its own, which protects data privacy.
The data management apparatus 10 shown in FIG. 1 may be a software apparatus deployed on computing devices independent of the blockchain nodes. It may also be a hardware apparatus, for example, a computing device that is independent of the blockchain nodes and is capable of managing large-scale data such as rich media data.
In some possible implementations, referring to the architecture diagram of the distributed data management system 100 shown in FIG. 2, the individual data management devices 10 of the distributed data management system 100 may also be deployed in the blockchain nodes 20, that is, each blockchain node 20 includes a blockchain kernel and a data management device 10. The data management device 10 may be middleware or a component integrated into the blockchain node 20.
The distributed data management system 100 of the embodiments of the present application may be applied to industries such as finance, energy, government affairs, aviation, agriculture, people's livelihood, and logistics. For example, the distributed data management system 100 may be applied to evidence storage of rich media data, files, and digital assets, to non-fungible token (NFT) transactions, and the like. Moreover, the distributed data management system 100 may serve as the underlying distributed storage for the metaverse or Web 3.0.
When applied in the above scenarios, the distributed data management system 100 supports deployment in a private cloud, a public cloud, a hybrid cloud, or edge nodes. A public cloud consists of cloud services that a cloud service provider offers to users over the public Internet. Users access the cloud through the Internet and use various services, including but not limited to computing, storage, and networking. A private cloud is a cloud computing usage mode that an enterprise builds to serve its own internal needs. It is used exclusively by that enterprise and may be deployed in the enterprise's own data center or in a cloud service provider's equipment room. A hybrid cloud combines a private cloud and a public cloud. An edge node is a network node that, compared with a cloud computing data center, has fewer intermediate links to the end users who access it. An edge node may be an equipment room or a physical device. Users get better responsiveness and connection speed when accessing an edge node than when accessing the origin server directly.
In some possible implementations, the distributed data management system 100 may also be deployed in a distributed manner across different environments. Referring to the architecture diagram of the distributed data management system 100 shown in FIG. 3, the data management devices 10 of the distributed data management system 100 may be deployed in a public cloud, a hybrid cloud, and edge nodes, respectively, to provide data management services for a multi-scenario federation.
Based on the distributed data management system 100 provided in the embodiment of the present application, the embodiment of the present application further provides a corresponding data processing method.
In order to make the technical solution of the present application clearer and easier to understand, the following describes the data processing method of the embodiment of the present application with reference to the accompanying drawings.
Referring to the flowchart of the data processing method shown in FIG. 4, the method comprises:
S402: the target data management device receives a data operation request sent by a blockchain client.
The target data management device may be any one of the data management devices 10 in the distributed data management system 100, for example, the first data management device described above, or the second data management device.
The data operation request is used to perform an input/output (IO) operation on the target data. It may be a write request for writing (storing) the target data, or a read request for reading the target data. Accordingly, the data operation request may include an operation type, such as read or write, that indicates whether the target data is to be written or read. The data operation request further includes meta information of the target data, for example, the name of the target data. Taking a rich media file as an example of the target data, the data operation request includes the operation type and the file name of the rich media file.
S404: the target data management device obtains the storage addresses of the data fragments of the target data from the blockchain network 200 according to the data operation request.
In this embodiment, the target data is stored as data fragments spread across the storage resource pool in a distributed storage manner. The target data management device may therefore first acquire the storage addresses of the data fragments of the target data from the blockchain network 200, according to the data operation request and based on the smart contract of the blockchain network 200. Writing and reading target data are described below as examples.
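A data operation request as described above could be modeled roughly as follows. This is a sketch under assumptions. The `payload` field is added here because a write request must carry the data to be written somehow, and the text does not say how. `OperationType` and `DataOperationRequest` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OperationType(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class DataOperationRequest:
    op: OperationType
    file_name: str                   # meta information naming the target data
    payload: Optional[bytes] = None  # assumed: content to write, None for reads

    def __post_init__(self) -> None:
        # Reject requests whose payload does not fit the operation type.
        if self.op is OperationType.WRITE and self.payload is None:
            raise ValueError("a write request must carry the target data")
        if self.op is OperationType.READ and self.payload is not None:
            raise ValueError("a read request must not carry a payload")
```

For example, `DataOperationRequest(OperationType.READ, "demo.mp4")` corresponds to a read request for a rich media file named `demo.mp4`.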
When the data operation request is a write request, the target data management device may acquire an allocation policy based on an intelligent contract of the blockchain network according to the data operation request, and then allocate storage resources for a plurality of data fragments of the target data from the storage resource pool 300 according to the allocation policy by the target data management device, so as to obtain storage addresses of the plurality of data fragments.
The allocation policy may be a weight-based allocation policy. Such a policy may determine a weight for each storage 30 based on its remaining capacity, the fragment type, the number of fragments, the bandwidth, and its historical failure count, and then determine the storage locations of the fragments from these weights. The slicing strategy can ensure that each data slice is stored on two to three storage media, which prevents data loss caused by the failure of an individual storage medium and ensures data security. The storage addresses of the data fragments may be recorded in the blockchain ledger by generating an index table.
When the data operation request is a read request, the storage addresses of the plurality of data fragments of the target data to be read were already recorded on the blockchain network during the write process. The target data management device may therefore acquire these storage addresses from the distributed ledger of the blockchain network 200 according to the read request.
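The weight-based allocation can be illustrated with a small sketch. The text does not give a weighting formula. The one below, free space times bandwidth divided by one plus the failure count, is purely an assumption, and it ignores fragment type. `StorageInfo`, `weight`, and `allocate` are hypothetical names. Each fragment's copies go to distinct storages, and a storage's remaining capacity is reduced after each placement, so later fragments see updated weights.

```python
from dataclasses import dataclass


@dataclass
class StorageInfo:
    storage_id: str
    remaining_bytes: int
    bandwidth_mbps: float
    failures: int  # historical failure count


def weight(s: StorageInfo) -> float:
    # Hypothetical scoring: free space and bandwidth raise the weight,
    # each historical failure lowers it.
    return s.remaining_bytes * s.bandwidth_mbps / (1 + s.failures)


def allocate(fragment_sizes: list[int], storages: list[StorageInfo],
             copies: int = 2) -> dict[int, list[str]]:
    """Map each fragment index to the storages that will hold its copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    placement: dict[int, list[str]] = {}
    for index, size in enumerate(fragment_sizes):
        candidates = [s for s in storages if s.remaining_bytes >= size]
        if len(candidates) < copies:
            raise RuntimeError(f"cannot place {copies} copies of fragment {index}")
        # Distinct storages, highest weight first, so that a single storage
        # failure cannot take out every copy of a fragment.
        chosen = sorted(candidates, key=weight, reverse=True)[:copies]
        for s in chosen:
            s.remaining_bytes -= size  # affects the weights for the next fragment
        placement[index] = [s.storage_id for s in chosen]
    return placement
```

With storages A (100 bytes free, 10 Mbps, 0 failures), B (100, 5, 0), and C (100, 9, 1), placing fragments of 60 and 30 bytes yields `{0: ["A", "B"], 1: ["C", "A"]}`. After the first placement, A's reduced free space drops its weight below C's.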
S406: the target data management device performs IO on the target data according to the storage addresses of the plurality of data fragments of the target data.
When the data operation request is a write request, the target data management device may write the plurality of data fragments into the storage resource pool 300 according to the storage addresses of the plurality of data fragments, and store those storage addresses into the distributed ledger of the blockchain network 200.
Further, before obtaining the allocation policy based on the smart contract of the blockchain network 200, the target data management device may also obtain a slicing policy based on that smart contract according to the data operation request. From the slicing policy, the target data management device obtains a slicing algorithm, a number of slices, and a number of copies of each data slice. The slicing algorithm may vary with the data type. For example, when the data is a video file, the slicing algorithm may include one or more of free splitting, even splitting, splitting by duration, and splitting by file size. The number of slices may be determined according to the size of the target data and the number of storage nodes in the storage resource pool 300. The number of copies of each data slice may be determined according to the reliability requirement of the target data. For example, when the target data has a high reliability requirement, each data slice may be stored as three copies, that is, the number of copies is 3.
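Below is a sketch of a slicing policy that uses the "split by file size" algorithm. It is hedged throughout. The 4 MiB target fragment size, the rule "at most one fragment per storage node", and the use of 2 copies for normal reliability are all assumptions. Only "3 copies for high reliability" comes from the text. `SlicingPolicy`, `make_policy`, and `split_by_size` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SlicingPolicy:
    algorithm: str
    fragment_count: int
    copies: int


def make_policy(data_len: int, node_count: int, high_reliability: bool,
                target_fragment_bytes: int = 4 * 1024 * 1024) -> SlicingPolicy:
    # Assumed rule: aim for roughly 4 MiB fragments, but never more
    # fragments than storage nodes, and always at least one fragment.
    wanted = math.ceil(data_len / target_fragment_bytes)
    fragment_count = max(1, min(wanted, node_count))
    return SlicingPolicy("split_by_size", fragment_count, 3 if high_reliability else 2)


def split_by_size(data: bytes, fragment_count: int) -> list[bytes]:
    """Split data into exactly fragment_count contiguous fragments whose
    sizes differ by at most one byte."""
    if fragment_count < 1:
        raise ValueError("fragment_count must be at least 1")
    base, extra = divmod(len(data), fragment_count)
    fragments, start = [], 0
    for i in range(fragment_count):
        end = start + base + (1 if i < extra else 0)
        fragments.append(data[start:end])
        start = end
    return fragments
```

Distributing the remainder one byte at a time, rather than using a fixed ceiling-sized chunk, guarantees that the requested number of fragments is always produced.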
Accordingly, the target data management device may write the respective copies of each of the plurality of data slices to the storage resource pool 300 based on the storage address of the respective copies of each of the plurality of data slices. The target data management device may also store the storage address of the respective copy of each data slice to blockchain network 200 for facilitating subsequent reads. Wherein the target data management device may record the storage address of each copy of each data slice to the distributed ledger of blockchain network 200 based on the smart contract.
Further, for multiple copies of a data shard, the target data management device may write multiple copies of each data shard to different types of storage media of the storage resource pool 300. In this way, even if a certain storage medium fails, the failure recovery can be performed based on the copies in other storage media.
In this embodiment, to facilitate verification of data during subsequent reading or querying, the target data management device may further determine at least one of a hash value of the target data, a hash value of each of the plurality of data slices, and a data attribute of the target data. The data attribute of the target data may include one or more of the creator, creation time, and theme of the target data. The target data management device may store at least one of these items to the blockchain network. As with the storage addresses of the data fragments, the target data management device may record at least one of the hash value of the target data, the hash values of the data fragments, and the data attribute of the target data to the distributed ledger based on the smart contract.
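One way to assemble the information recorded on the chain for a single write is sketched below. It combines the index table of storage addresses with the hash values and data attributes. The record layout, the choice of SHA-256, and the JSON serialization are assumptions; the text only says "a hash algorithm" and an index table. `build_index_record` is a hypothetical name, and submitting the string to a smart contract is not shown.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_index_record(file_name: str, data: bytes, fragments: list[bytes],
                       addresses: list[list[str]], attributes: dict[str, str]) -> str:
    """Serialize what would be recorded on the ledger for one write."""
    if len(fragments) != len(addresses):
        raise ValueError("every fragment needs a list of copy addresses")
    record = {
        "file_name": file_name,
        "file_hash": sha256_hex(data),  # hash of the whole target data
        "attributes": attributes,       # e.g. creator, creation time, theme
        "fragments": [
            {"index": i, "hash": sha256_hex(frag), "addresses": addrs}
            for i, (frag, addrs) in enumerate(zip(fragments, addresses))
        ],
    }
    # Sorted keys give the same byte string for the same content.
    return json.dumps(record, sort_keys=True)
```

The per-fragment hashes recorded here serve as the on-chain values that a later read compares against.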
When the data operation request is a read request, the target data management apparatus may acquire the storage addresses of the plurality of data fragments of the target data from the blockchain network 200 (for example, the distributed ledger of the blockchain network 200) according to the read request, then the target data management apparatus may acquire the plurality of data fragments from the storage resource pool 300 according to the storage addresses of the plurality of data fragments, and then the target data management apparatus may aggregate the plurality of data fragments to obtain the target data.
The target data management device may further obtain an aggregation policy based on the smart contract of the blockchain network 200 according to the data operation request, and then aggregate the plurality of data fragments according to the aggregation policy to obtain the target data. The aggregation policy corresponds to the slicing policy. For example, suppose the target data is a video and the slicing policy splits it by duration. The aggregation policy may then aggregate by duration: the target data management device sorts the data fragments by their start times (or end times) and concatenates the sorted fragments.
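The duration-based aggregation just described can be sketched as below. The sketch treats each fragment's content as opaque bytes that can be joined directly. That holds for some stream formats but not for every video container, so it is an assumption. The continuity check (adjacent fragments must meet exactly) is also an addition. `TimedFragment` and `aggregate_by_duration` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class TimedFragment:
    start_s: float  # start time of the fragment within the video, in seconds
    end_s: float    # end time, in seconds
    content: bytes


def aggregate_by_duration(fragments: list[TimedFragment]) -> bytes:
    """Sort fragments by start time and concatenate them."""
    ordered = sorted(fragments, key=lambda f: f.start_s)
    for prev, cur in zip(ordered, ordered[1:]):
        # A gap or overlap suggests a missing or duplicated fragment.
        if not math.isclose(prev.end_s, cur.start_s):
            raise ValueError(f"discontinuity between {prev.end_s}s and {cur.start_s}s")
    return b"".join(f.content for f in ordered)
```

Fragments may arrive from the storage resource pool in any order. Only their time metadata decides the final order.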
When each of the plurality of data slices of the target data has a plurality of copies, the target data management device may determine a target path from a plurality of paths for accessing the plurality of copies, where the target path may be a path with a minimum delay or a minimum cost, and pull the data slices according to the target path, so as to aggregate the data slices. When determining the target path, the target data management device may calculate the weight of each path based on at least one of the remaining storage capacity, the fragmentation type, the fragmentation number, the bandwidth, and the historical failure number of the storage 30 mounted by each data management device, and determine the target path from the plurality of paths based on the weight.
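Choosing the target path among several copies reduces to picking the path with the lowest cost. The sketch below uses only bandwidth and historical failure count. The cost formula `(1 + failures) / bandwidth` is an assumption, not the weighting the text refers to, and `CopyPath`, `path_cost`, and `choose_target_path` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CopyPath:
    device_id: str
    address: str
    bandwidth_mbps: float
    failures: int  # historical failure count of the device's storage


def path_cost(p: CopyPath) -> float:
    # Hypothetical cost: lower bandwidth and more past failures make a path worse.
    return (1 + p.failures) / p.bandwidth_mbps


def choose_target_path(paths: list[CopyPath]) -> CopyPath:
    """Return the cheapest path to one copy of a data slice."""
    if not paths:
        raise ValueError("no copy of this data slice is reachable")
    return min(paths, key=path_cost)
```

For three copies reachable at 200 Mbps with 3 past failures, 100 Mbps with none, and 50 Mbps with none, the costs are 0.02, 0.01, and 0.02, so the 100 Mbps path is chosen.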
In some possible implementations, the target data management device may also obtain local hash values and on-chain hash values of the plurality of data slices. A local hash value is computed by a hash algorithm: after obtaining a data fragment, the target data management device hashes the fragment's content to obtain its local hash value. An on-chain hash value is a hash value stored in the blockchain network 200: the target data management device triggers a read operation on the blockchain network 200 to read the stored hash values of the data fragments of the target data. The target data management device then compares the local hash values with the on-chain hash values. When they match (for example, each local hash value is identical to the corresponding on-chain hash value), it starts aggregating the plurality of data fragments to obtain aggregated data.
Further, the target data management device may also determine hash values of the aggregate data and obtain hash values of the target data from the blockchain network 200. Similarly, the target data management device may compare the hash value of the aggregate data with the hash value of the target data stored on the chain. When the hash value of the aggregate data matches the hash value of the target data, the target data management device determines that the aggregate data is the target data.
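The two-level check above (per fragment, then on the aggregate) can be sketched in Python as follows. SHA-256 is assumed as the hash algorithm, and plain dictionaries stand in for the values read from the distributed ledger; neither choice comes from the source.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_and_aggregate(fragments: Dict[str, bytes],
                         onchain_fragment_hashes: Dict[str, str],
                         onchain_target_hash: str,
                         order: List[str]) -> bytes:
    """Check each fragment against its on-chain hash, aggregate, then check the whole."""
    for name in order:
        local = sha256_hex(fragments[name])           # local hash value
        if local != onchain_fragment_hashes[name]:    # compare with on-chain hash value
            raise ValueError(f"fragment {name} failed hash check")
    aggregated = b"".join(fragments[name] for name in order)
    # Second level: the aggregate must match the on-chain hash of the target data.
    if sha256_hex(aggregated) != onchain_target_hash:
        raise ValueError("aggregated data does not match the target data hash")
    return aggregated
```

The per-fragment check localizes a bad fragment before any splicing work is done, while the final check catches errors in ordering or aggregation itself.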
S408: the target data management device returns the data operation result.
When the data operation request is a write request, the data operation result may be a write success or a write failure. If the write succeeds, the target data management device can proceed to other data operation requests. If the write fails, the target data management device may instruct the blockchain client to resend the write request to rewrite the target data.
When the data operation request is a read request, the data operation result may be a read success or a read failure. When the read succeeds, the data operation result may further include the target data read by the target data management device. When the read fails, the target data management device may instruct the blockchain client to resend the read request to re-read the target data.
It should be noted that S408 is optional in the embodiments of the present application. For example, when the data operation request is a write request, the target data management device need not return the data operation result.
S410: the target data management device obtains, from the blockchain network 200, the first meta information of the data fragments in the storage 30 mounted by the target data management device.
Specifically, the target data management device may periodically scan the blockchain network 200, more precisely the blockchain node 20 corresponding to the target data management device, to obtain the first meta information of the data slices in the storage 30 mounted by the target data management device. The first meta information is the meta information stored on the chain and may include one or more of the names, sizes, and hash values of the data fragments recorded on the chain.
The period at which the target data management device scans the blockchain network 200 may be set empirically, for example, to 5 minutes (min).
S412: the target data management device obtains the second meta information of the data fragments from the storage 30 mounted by the target data management device.
The target data management device may periodically scan its mounted storage 30 (also referred to as local storage) to obtain the second meta information of the data fragments in that storage. The second meta information is the meta information stored off the chain and may include one or more of the names, sizes, and hash values of the data fragments stored off the chain.
S414: the target data management device determines whether the first meta information matches the second meta information. If they do not match, S416 is performed.
Specifically, the target data management device may compare the first meta information with the second meta information to determine whether they match. A mismatch indicates a failure, for example, the storage medium of the local storage has been lost, or a data fragment has been deleted or tampered with; in this case, the target data management device may perform S416.
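A minimal Python sketch of the S414 comparison, assuming the first (on-chain) and second (local) meta information are both available as maps from fragment name to a hypothetical `FragmentMeta` record holding name, size, and hash. The fault-record layout is also an illustrative assumption: "missing" corresponds to a deleted fragment or lost medium, and "mismatch" to a tampered fragment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FragmentMeta:
    # Hypothetical meta information record; equality compares all three fields.
    name: str
    size: int
    sha256: str


def find_faults(onchain: Dict[str, FragmentMeta],
                local: Dict[str, FragmentMeta]) -> List[dict]:
    """Compare first (on-chain) and second (local) meta information; return fault records."""
    faults = []
    for name, expected in onchain.items():
        actual = local.get(name)
        if actual is None:
            faults.append({"fragment": name, "reason": "missing"})   # deleted or medium lost
        elif actual != expected:
            faults.append({"fragment": name, "reason": "mismatch"})  # size or hash differs
    return faults
```

An empty result corresponds to "storage normal"; a non-empty list is what S416 would write to the ledger as failure information.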
S416: the target data management device determines that a failure has occurred and stores failure information to the blockchain network.
The failure information may include the node identifier of the failed node or the fragment identifier of the failed data fragment. The node identifier may be one or more of the node name and the IP address of the node, and the fragment identifier may be the fragment name. When a data fragment is deleted or tampered with, the failure information may further include the meta information of the data to which the data fragment belongs.
The target data management device may record the failure information in the distributed ledger of the blockchain network 200 through the smart contract of the blockchain network 200. On the one hand, this lays the foundation for subsequent failure recovery; on the other hand, it enables tracing of data operations.
S418: the target data management device reads the failure information.
The target data management device may periodically read the failure information, for example, by accessing the blockchain node 20.
The period at which the target data management device scans the blockchain network 200 or the local storage for failure checking and the period at which it reads the failure information for failure recovery may be the same or different. For example, the failure recovery period may be longer than the failure check period.
In some examples, the period at which the target data management device scans the blockchain network 200 or the local storage for failure checking may be 5 minutes, and the period at which it reads the failure information for failure recovery may be 5 minutes or 10 minutes.
S420: the target data management device obtains copies of the data fragments from the storage mounted by other data management devices in the storage resource pool 300 according to the failure information.
When the failure information indicates that a storage medium in the storage 30 mounted by the target data management device has been lost, the target data management device may obtain copies of all data fragments stored on the lost storage medium from the storage mounted by other data management devices in the storage resource pool 300.
When the failure information indicates that a data fragment in the storage 30 mounted by the target data management device has been deleted or tampered with, the target data management device may obtain a copy of that data fragment from the storage mounted by other data management devices in the storage resource pool 300.
S422: the target data management device stores the copies of the data fragments locally.
The target data management device writes the copy of the data fragment into its mounted storage 30, thereby storing the data fragment locally. When a storage medium in the storage 30 mounted by the target data management device has been lost, the target data management device may first mount a new storage medium and then write the copy of the data fragment to the storage 30.
S424: the target data management device stores the updated storage address to the blockchain network.
During failure recovery, the storage address of the data fragment is updated accordingly. The target data management device may store the updated storage address in the blockchain network 200, for example, in the distributed ledger of the blockchain network 200 through the smart contract of the blockchain network 200.
It should be noted that S410 to S416 are one specific implementation of failure detection and S418 to S424 are one implementation of failure recovery; the data processing method in the embodiments of the present application may also omit S410 to S416 or S418 to S424.
Based on the above description, the embodiments of the present application provide a data processing method. In this method, the blockchain nodes 20 in the blockchain network 200 support mounting different types of external storage, which gives the blockchain network 200 the capability to store large-scale data such as rich media data and modeling data. The blockchain client provides a calling interface through which a user can upload or download large-scale data such as rich media data. Because the target data management device of the distributed data management system 100 participates in the upload or download process, the related information generated during upload or download is recorded on the chain. Even if the data is tampered with or deleted, it can be restored promptly based on the related information stored on the chain, which improves the security, availability, accessibility, and operation traceability of the data. In addition, the storage 30 mounted on each blockchain node 20 may be distributed storage, which provides an adaptive distributed storage resource pool for the decentralized blockchain network 200 and meets its decentralization requirement.
Fig. 4 is a flowchart of a data processing method according to an embodiment of the present application. The flows of data upload, data download, failure checking, and failure recovery are described in detail below with reference to the accompanying drawings.
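Steps S420 to S424 can be sketched in Python as below. In-memory dictionaries stand in for peer storage, local storage, and the ledger; SHA-256 as the fragment hash and the `device/fragment` address format are hypothetical choices for illustration only.

```python
import hashlib
from typing import Dict


def recover_fragment(name: str,
                     expected_sha256: str,
                     peers: Dict[str, Dict[str, bytes]],
                     local_store: Dict[str, bytes],
                     local_device: str,
                     ledger: Dict[str, str]) -> str:
    """Pull an intact copy from a peer, store it locally, and record the new address."""
    for store in peers.values():
        copy = store.get(name)
        # Only accept a copy whose hash matches the value recorded on the chain.
        if copy is not None and hashlib.sha256(copy).hexdigest() == expected_sha256:
            local_store[name] = copy                   # S422: write the copy locally
            new_address = f"{local_device}/{name}"     # hypothetical address format
            ledger[name] = new_address                 # S424: stand-in for a ledger write
            return new_address
    raise LookupError(f"no intact copy of {name} found")
```

Verifying each candidate copy before writing it prevents a tampered replica on another device from being propagated during repair.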
First, referring to the flowchart of data upload shown in fig. 5, the method includes the following steps:
Step 1: the blockchain client receives a write request and sends the write request to the target data management device.
The write request is used to write target data, which may be large-scale data such as rich media data or big data. Specifically, the write request is used to write the target data into the storage resource pool formed by the storage mounted by the blockchain nodes in the blockchain network, that is, to upload the target data to the storage resource pool of the blockchain network. Therefore, the write request is also called a data upload request.
It should be noted that the write request may include the target data. For example, the target data may be encapsulated in a payload of a write request, and transmitted to the target data management device via the write request, such that the target data management device uploads the target data to a pool of storage resources of the blockchain network.
Step 2: the target data management device in the distributed data management system verifies the target data and the signature according to the write request. If the verification passes, step 3 is executed; if it fails, step 8 is executed.
Specifically, because the target data may be tampered with during transmission, the target data management device may check the integrity of the target data. For example, the write request may carry an integrity check code. Upon receiving the write request, the target data management device calculates an integrity check code from the target data in the request and compares it with the integrity check code carried in the request. If the two are consistent, the integrity check passes; otherwise, the integrity check fails, indicating that the target data may have been tampered with.
In addition, the target data may be subject to a man-in-the-middle (MITM) attack during transmission. To counter this, the write request may also include the user's signature, which the target data management device verifies as well.
It should be noted that step 2 is optional in the embodiments of the present application and may be omitted. For example, when the entire system is deployed in a trusted environment, step 2 need not be performed.
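The integrity check in step 2 can be sketched in Python as follows, assuming the integrity check code is a SHA-256 hex digest of the payload (the source does not name the algorithm). Signature verification is not shown, because it depends on an asymmetric signature scheme the source does not specify.

```python
import hashlib
import hmac


def integrity_check(payload: bytes, carried_code: str) -> bool:
    """Recompute the check code over the received payload and compare it with the carried one."""
    local_code = hashlib.sha256(payload).hexdigest()  # assumed check-code algorithm
    # compare_digest avoids leaking, through timing, where the two codes differ.
    return hmac.compare_digest(local_code, carried_code)
```

A bare digest detects in-transit corruption, but an attacker who can modify the payload can also recompute the digest; this is why the signature check described in step 2 is still needed against man-in-the-middle tampering.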
Step 3: the target data management device acquires a slicing strategy from the blockchain network, and slices the target data to acquire a plurality of data slices.
Step 4: the target data management device obtains an allocation policy from the blockchain network and determines a storage address of each of the plurality of data slices according to the allocation policy.
Step 5: the target data management device writes the data fragments into the storage resource pool according to the storage address of each data fragment.
Step 6: the target data management device determines hash values of the respective data fragments and hash values of the target data.
For the specific implementation of steps 3 to 6, refer to the related content of the embodiment shown in fig. 4; details are not repeated here.
Step 7: the target data management device generates a transaction based on the hash values of the data fragments, the hash value of the target data, and the storage addresses of the data fragments, and records the transaction on the chain.
The target data management device may generate a transaction block from the hash values of the data fragments, the hash value of the target data, and the storage addresses of the data fragments. After the blockchain nodes reach consensus on the transaction block, the block is added to the distributed ledger, thereby recording the transaction on the chain. It should be noted that the target data management device may also add the data attributes to the transaction block so that the data attributes are stored on the chain.
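A hedged Python sketch of assembling the step 7 transaction payload. The field names, the canonical-JSON encoding, and deriving a `tx_id` from its SHA-256 digest are all assumptions for illustration; block packaging and consensus are handled by the blockchain platform and are not shown.

```python
import hashlib
import json
from typing import Dict, List, Optional


def build_transaction(target_hash: str,
                      fragment_hashes: Dict[str, str],
                      fragment_addresses: Dict[str, List[str]],
                      attributes: Optional[dict] = None) -> dict:
    """Assemble an upload transaction and derive an ID from its canonical JSON form."""
    body = {
        "target_hash": target_hash,
        # Sort fragments by name so the same upload always serializes identically.
        "fragments": [
            {"name": n, "hash": fragment_hashes[n], "addresses": fragment_addresses[n]}
            for n in sorted(fragment_hashes)
        ],
        "attributes": attributes or {},  # optional data attributes stored on the chain
    }
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return {"tx_id": hashlib.sha256(canonical).hexdigest(), "body": body}
```

Canonical serialization matters here: every node that re-encodes the same transaction must obtain the same bytes, or the derived identifiers and hashes would disagree.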
Step 8: the target data management device returns a verification failure notification.
The verification failure notification is used to indicate that the verification failed, and the blockchain client may resend the data upload request to re-upload the target data.
Then, referring to the flowchart of data download shown in fig. 6, the method includes the following steps:
Step 1: the blockchain client receives a read request and sends the read request to the target data management device.
The read request is used to read target data, which may be large-scale data such as rich media data or big data. Specifically, the read request is used to read the target data from the storage resource pool formed by the storage mounted by the blockchain nodes in the blockchain network, that is, to download the target data from the storage resource pool of the blockchain network 200. Therefore, the read request is also called a data download request.
Step 2: the target data management device in the distributed data management system verifies the signature according to the read request, obtains the access rights from the blockchain network, and verifies them. If the verification passes, step 3 is executed; if it fails, step 7 is executed.
For the specific implementation of signature verification by the target data management device, refer to the embodiment shown in fig. 5; details are not repeated here. The distributed ledger of the blockchain network may store the access rights of different data. The target data management device may obtain the access rights of the target data requested by the read request and determine whether the current user has access. If so, the access right verification passes; if not, it fails.
When the signature verification is passed and the access right verification is passed, step 3 may be performed, and when the signature verification is not passed or the access right verification is not passed, step 7 may be performed.
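The access right check in step 2 can be sketched in Python as a deny-by-default lookup. The layout (data identifier, then user, then a set of allowed actions) is a hypothetical stand-in for however the distributed ledger actually stores access rights.

```python
from typing import Dict, Set

# Hypothetical ledger layout: data ID -> user -> allowed actions.
AccessList = Dict[str, Dict[str, Set[str]]]


def check_access(acl: AccessList, data_id: str, user: str, action: str = "read") -> bool:
    """Deny by default: unknown data or an unknown user means no access."""
    return action in acl.get(data_id, {}).get(user, set())
```

Denying by default means that a missing or not-yet-synchronized ledger entry leads to a verification failure (step 7) rather than an unintended read.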
Step 3: the target data management device obtains the storage address of the data fragment from the blockchain network.
Step 4: the target data management device determines a target path from the storage addresses of the copies of each data fragment, pulls the data fragments from the storage resource pool along the target path, obtains the on-chain hash value and the local hash value of each data fragment, and checks the two against each other to ensure that the data fragments are correct.
Step 5: the target data management device aggregates the pieces of data and then compares the hash value of the aggregated data with the hash value of the target data stored on the chain. When the hash value of the aggregate data is consistent with the hash value of the target data stored on the chain, step 6 is performed.
Step 6: the target data management device returns the target data and the corresponding transaction data.
Step 7: the target data management device returns a verification failure notification.
The verification failure notification is used to indicate that the verification failed, and the blockchain client may resend the data download request to re-download the target data.
In this method, the blockchain client may use a data acquisition interface (also referred to as a data download interface) to query transaction data and data attributes by hash value. After a data query is initiated, the target data management device of the distributed data management system verifies the signature based on the blockchain system, obtains the fragment addresses from the chain, pulls the data fragments, and aggregates them. It then compares the hash value of the aggregated data with the hash value of the target data stored on the chain. If the verification passes, the target data is returned to the blockchain client; if it fails, other data fragments are read for aggregation and verification continues.
Next, referring to the flowchart of failure checking shown in fig. 7, the method includes the following steps:
Step 1: the target data management device periodically accesses the blockchain node to obtain the meta information of the data fragments in the local storage.
The meta information that the target data management device obtains from the blockchain node is also referred to as first meta information. The first meta information includes one or more of the name, size, or hash value of a data fragment.
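The retry behavior described above (reading other data when verification fails) can be sketched in Python as trying each copy address in turn until one passes the hash check. The `read` callback is a hypothetical stand-in for pulling bytes from a storage address, and SHA-256 is an assumed hash algorithm.

```python
import hashlib
from typing import Callable, List


def fetch_verified(name: str,
                   addresses: List[str],
                   expected_sha256: str,
                   read: Callable[[str], bytes]) -> bytes:
    """Try each copy address in turn; return the first copy whose hash matches."""
    for address in addresses:
        try:
            data = read(address)
        except OSError:
            continue  # unreachable copy: move on to the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # Hash mismatch: this copy is corrupt or tampered, so try the next copy.
    raise LookupError(f"no verified copy of fragment {name}")
```

The address list would typically be ordered by the path weights discussed earlier, so the cheapest healthy copy is tried first.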
Step 2: the target data management device periodically accesses the local storage to acquire the meta information of the data fragments in the local storage.
The meta information acquired from the local storage by the target data management apparatus is also referred to as second meta information. Wherein the second meta-information includes one or more of a name, a size, or a hash value of the data fragment.
Step 3: the target data management device compares the meta-information obtained from the blockchain node with the meta-information obtained from the local store. When the meta information is consistent, step 4 is executed, and when the meta information is inconsistent, step 5 is executed.
Step 4: the target data management device determines that the storage is normal and records an event log.
Step 5: the target data management device determines a storage failure and writes failure information to a blockchain node of the blockchain network.
The target data management device of the distributed data management system can ensure storage reliability by periodically checking its locally mounted storage medium. Specifically, the target data management device may query the hash values or data attributes of the locally stored data fragments and compare them with the hash values or data attributes obtained from the chain. If the local storage medium is lost, or a stored data fragment has been deleted or tampered with, the target data management device may determine a new storage address for the data fragment and put the failure information and the new storage address on the chain, so that other data management devices can obtain them. If the hash values and data attributes are consistent, it records a local event log and waits for the next polling cycle to check the data fragments in storage again.
Then, referring to the flowchart of failure recovery shown in fig. 8, the method includes the following steps:
Step 1: the target data management device periodically accesses the blockchain node to obtain fault information.
Step 2: the target data management device determines, based on the fault information, whether the fault involves its local storage. If so, step 3 is executed; otherwise, the process returns to step 1 to wait for the next polling cycle.
The fault information includes a recommended storage address. When the recommended storage address belongs to the storage (local storage) mounted by the target data management device, the fault information involves the local storage of the target data management device.
Step 3: the target data management device determines the storage address based on the allocation policy. If the determined storage address is consistent with the recommended storage address in the fault information, step 4 is executed; otherwise, the process returns to step 1 to wait for the next polling cycle.
Step 4: the target data management device accesses the storage resource pool to pull the copy of the data fragment, verifies the hash value, and writes the copy of the data fragment into the local storage according to the recommended storage address when the hash value passes the verification.
Step 5: the target data management device writes the updated storage location to the blockchain node.
In this method, the distributed data management system propagates fault information through the blockchain network and periodically repairs data fragments using a multi-copy mechanism, achieving highly available distributed storage. Each data management device periodically polls the fault information on the chain and checks whether its local storage is the recommended one. If so, it calculates weights to verify the storage location; if the storage addresses are consistent, it pulls the data fragment from a storage node holding a copy, verifies the hash value and data attributes, stores the data fragment on its local storage medium, clears the fault information, and puts the updated storage address on the chain.
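The source does not say how the allocation policy computes a storage address. One hedged way to let every device independently reach the same answer in step 3 is rendezvous (highest-random-weight) hashing, sketched below in Python; the technique, the `fragment|device` key format, and the simplification of a storage address to a device name are all assumptions.

```python
import hashlib
from typing import List


def rendezvous_pick(fragment: str, devices: List[str], copies: int = 1) -> List[str]:
    """Rank devices by hash(fragment, device) and return the top `copies` devices."""
    def score(device: str) -> int:
        digest = hashlib.sha256(f"{fragment}|{device}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(devices, key=score, reverse=True)[:copies]


def should_recover(fragment: str, recommended: str, me: str, candidates: List[str]) -> bool:
    """Step 3 check: proceed only if the fault record and the policy both point at this device."""
    return recommended == me and rendezvous_pick(fragment, candidates)[0] == me
```

Because the ranking depends only on the fragment and device names, every device computes the same result without coordination, and input order does not matter.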
Based on the data processing method provided in the embodiments of the present application, the embodiments of the present application further provide a distributed data management system 100 as described above. The distributed data management system 100 is described below in conjunction with the accompanying figures.
Referring to the schematic structural diagram of the distributed data management system 100 shown in fig. 9, the distributed data management system 100 includes a plurality of data management devices 10. A first data management device among the plurality of data management devices 10 corresponds to a first blockchain node of the blockchain network, and a second data management device among them corresponds to a second blockchain node of the blockchain network. The storage mounted by the first data management device and the storage mounted by the second data management device form a storage resource pool of the blockchain network.
A target data management device among the plurality of data management devices 10 is configured to receive a data operation request, where the data operation request is used to perform an input/output (IO) operation on target data;
and the target data management device is also used for acquiring the storage addresses of a plurality of data fragments of the target data from the blockchain network according to the data operation request, and carrying out IO on the target data in the storage resource pool according to the storage addresses of the plurality of data fragments.
In some possible implementations, the data operation request is a write request, and the target data management device is specifically configured to:
acquiring an allocation strategy based on a smart contract of the blockchain network according to the data operation request;
according to the allocation strategy, allocating storage resources for a plurality of data fragments of target data from a storage resource pool, and obtaining storage addresses of the plurality of data fragments;
and writing the plurality of data fragments into a storage resource pool according to the storage address of at least one data fragment, and storing the storage addresses of the plurality of data fragments into a distributed ledger of the blockchain network.
In some possible implementations, the target data management device is further configured to:
acquiring a slicing strategy based on a smart contract of the blockchain network according to the data operation request;
obtaining a slicing algorithm, the number of slices, and the number of copies of each data slice according to the slicing strategy;
the target data management device is specifically used for:
according to the slicing algorithm and the slicing quantity, slicing the target data to obtain a plurality of data slices of the target data;
and writing the respective copies of each data slice into a storage resource pool according to the storage address of the respective copy of each data slice in the plurality of data slices, and storing the storage address of the respective copy of each data slice into a distributed ledger of the blockchain network.
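A minimal Python sketch of slicing by count and enumerating the copies that each need a storage address. Near-equal contiguous byte ranges are an assumed slicing algorithm chosen for illustration; the source leaves the algorithm to the slicing strategy (for video, for example, slicing by duration).

```python
from typing import List, Tuple


def shard(data: bytes, num_slices: int) -> List[bytes]:
    """Split data into num_slices contiguous slices of near-equal size."""
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    base, extra = divmod(len(data), num_slices)
    slices, pos = [], 0
    for i in range(num_slices):
        size = base + (1 if i < extra else 0)  # the first `extra` slices get one more byte
        slices.append(data[pos:pos + size])
        pos += size
    return slices


def plan_copies(num_slices: int, num_copies: int) -> List[Tuple[int, int]]:
    """Enumerate (slice index, copy index) pairs that each need a storage address."""
    return [(s, c) for s in range(num_slices) for c in range(num_copies)]
```

Joining the slices in index order reproduces the original bytes, which is the property the aggregation step relies on.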
In some possible implementations, each data slice includes multiple copies;
the target data management device is specifically used for:
multiple copies of each data slice are written to different types of storage media of the storage resource pool.
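Placing each copy on a different type of storage medium can be sketched in Python as below. The media map (type name to device list) and the rotation rule that spreads load across types and devices are hypothetical choices, not taken from the source.

```python
from typing import Dict, List


def place_copies(slice_index: int, num_copies: int, media: Dict[str, List[str]]) -> List[str]:
    """Return one device per copy, each from a distinct storage media type."""
    types = sorted(t for t, devs in media.items() if devs)  # deterministic type order
    if num_copies > len(types):
        raise ValueError("not enough distinct media types for the requested copies")
    chosen = []
    for c in range(num_copies):
        # Distinct c values map to distinct types because num_copies <= len(types).
        media_type = types[(slice_index + c) % len(types)]
        devs = media[media_type]
        chosen.append(devs[slice_index % len(devs)])  # rotate devices within a type
    return chosen
```

Spreading copies across media types means that a failure mode specific to one medium (for example, a batch of faulty disks) cannot take out every copy of a slice.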
In some possible implementations, the target data management device is further configured to:
determining at least one of a hash value of the target data, a hash value of each of the plurality of data fragments, and a data attribute of the target data;
storing at least one of the hash value of the target data, the hash value of each of the plurality of data slices, and the data attribute of the target data to a distributed ledger of the blockchain network.
In some possible implementations, the data operation request is a read request, and the target data management device is specifically configured to:
according to the read request, obtaining storage addresses of a plurality of data fragments of target data from a distributed ledger of a blockchain network;
the target data management device is specifically used for:
acquiring a plurality of data fragments from a storage resource pool according to the storage addresses of the plurality of data fragments;
and aggregating the plurality of data fragments to obtain target data.
In some possible implementations, the target data management device is further configured to:
acquiring an aggregation strategy based on a smart contract of the blockchain network according to the data operation request;
the target data management device is specifically used for:
and aggregating the plurality of data fragments according to an aggregation strategy to obtain target data.
In some possible implementations, the target data management device is specifically configured to:
acquiring local hash values and on-chain hash values of the plurality of data fragments, where the local hash values are calculated with a hash algorithm and the on-chain hash values are hash values stored in the blockchain network;
when the local hash values match the on-chain hash values, starting to aggregate the plurality of data fragments to obtain aggregated data;
determining a hash value of the aggregated data, acquiring the hash value of the target data from the blockchain network, and determining the aggregated data as the target data when the hash value of the aggregated data matches the hash value of the target data.
In some possible implementations, the target data management device is further configured to:
acquiring first meta information of data fragments in the storage mounted by the target data management device from the blockchain node corresponding to the target data management device, and acquiring second meta information of the data fragments in the storage mounted by the target data management device;
and when the first meta information does not match the second meta information, determining that a failure has occurred, and storing the failure information in a distributed ledger of the blockchain network.
In some possible implementations, the target data management device is further configured to:
reading fault information from the blockchain network;
when the fault information indicates that a data fragment in the storage mounted by the target data management device has been tampered with, deleted, or lost, acquiring the data fragment from the storage mounted by other data management devices and storing it locally;
and storing the updated storage address to a distributed ledger of the blockchain network.
The target data management device may be any one of the plurality of data management devices 10, and may be, for example, the first data management device or the second data management device. The structure of the data management apparatus will be described below. As shown in fig. 9, the data management apparatus 10 includes:
The communication module 102 is configured to receive a data operation request, where the data operation request is used to perform input/output IO operation on target data;
the management module 104 is configured to obtain, according to the data operation request, storage addresses of a plurality of data slices of the target data from the blockchain network, and perform IO on the target data in the storage resource pool according to the storage addresses of the plurality of data slices.
The management module 104 is configured to implement the storage allocation policy function shown in fig. 1, fig. 2, and fig. 3: it determines the storage address of each data fragment based on the allocation policy and performs IO on the target data in the storage resource pool based on the storage addresses.
The communication module 102 and the management module 104 may be implemented by a hardware module or by a software module.
When implemented in software, the communication module 102, the management module 104 may be an application or application module running on a computing device or cluster of computing devices.
When implemented in hardware, the communication module 102 may be implemented by a transceiver module such as a network interface card or a transceiver. The management module 104 may be a device implemented using an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
In some possible implementations, the data operation request is a write request, and the management module 104 is specifically configured to:
acquiring an allocation strategy based on a smart contract of the blockchain network according to the data operation request;
according to the allocation strategy, allocating storage resources for a plurality of data fragments of the target data from the storage resource pool, and obtaining storage addresses of the plurality of data fragments;
and writing the data fragments into the storage resource pool according to the storage addresses of the at least one data fragment, and storing the storage addresses of the data fragments into a distributed ledger of the blockchain network.
In some possible implementations, the management module 104 is further configured to:
obtain a fragmentation policy based on a smart contract of the blockchain network according to the data operation request; and
obtain, according to the fragmentation policy, a fragmentation algorithm, the number of fragments, and the number of copies of each data fragment;
the management module 104 is specifically configured to:
fragment the target data according to the fragmentation algorithm and the number of fragments to obtain a plurality of data fragments of the target data; and
write the copies of each of the plurality of data fragments into the storage resource pool according to the storage addresses of those copies, and store the storage addresses of the copies of each data fragment in the distributed ledger of the blockchain network.
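A minimal Python sketch of fragmentation with replication follows. It assumes a single hypothetical fragmentation algorithm, "even" (a near-equal contiguous split), and places the copies of each fragment on consecutive nodes so that no two copies share a node; `FragmentationPolicy`, `fragment`, and `write_with_copies` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class FragmentationPolicy:
    algorithm: str       # assumed: only "even" (near-equal contiguous split) is sketched
    fragment_count: int
    copies: int


def fragment(data: bytes, policy: FragmentationPolicy) -> list:
    if policy.algorithm != "even":
        raise ValueError(f"unsupported algorithm: {policy.algorithm}")
    size, rem = divmod(len(data), policy.fragment_count)
    fragments, start = [], 0
    for i in range(policy.fragment_count):
        end = start + size + (1 if i < rem else 0)  # first `rem` fragments get one extra byte
        fragments.append(data[start:end])
        start = end
    return fragments


def write_with_copies(data_id, data, policy, nodes, pool, ledger):
    if policy.copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    addresses = []
    for i, frag in enumerate(fragment(data, policy)):
        copy_addrs = []
        for c in range(policy.copies):
            node = nodes[(i + c) % len(nodes)]  # consecutive nodes, so copies never share a node
            key = f"{data_id}/{i}/{c}"
            pool.setdefault(node, {})[key] = frag
            copy_addrs.append((node, key))
        addresses.append(copy_addrs)
    ledger[data_id] = addresses  # per fragment: the addresses of all its copies
    return addresses


policy = FragmentationPolicy("even", 3, 2)
print(fragment(b"abcdefg", policy))  # [b'abc', b'de', b'fg']
```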
In this implementation, the management module 104 is further configured to implement the storage fragmentation policy function shown in fig. 1, fig. 2, and fig. 3. It fragments the target data based on the fragmentation policy and performs IO on the target data in the storage resource pool based on the storage address of each data fragment of the target data.
In some possible implementations, each data fragment includes multiple copies;
the management module 104 is specifically configured to:
write the multiple copies of each data fragment to different types of storage media in the storage resource pool.
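Placing copies on different media types can be sketched as a greedy selection, assuming the pool exposes a hypothetical mapping from medium ID to media type (for example SSD, HDD, or tape). The function `place_copies` below is illustrative and simply picks the first medium of each not-yet-used type.

```python
def place_copies(copies: int, media: dict) -> list:
    """Pick one medium per copy so that no two copies share a media type.

    media maps a hypothetical medium ID to its type, e.g. {"hdd1": "HDD"}.
    """
    chosen, used_types = [], set()
    for medium_id, media_type in sorted(media.items()):
        if len(chosen) == copies:
            break
        if media_type not in used_types:
            chosen.append(medium_id)
            used_types.add(media_type)
    if len(chosen) < copies:
        raise ValueError(f"only {len(used_types)} media types available for {copies} copies")
    return chosen


media = {"ssd1": "SSD", "hdd1": "HDD", "ssd2": "SSD", "tape1": "TAPE"}
print(place_copies(3, media))  # ['hdd1', 'ssd1', 'tape1']
```

Spreading copies across media types means a defect or wear pattern specific to one media type is less likely to affect every copy at once.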
In some possible implementations, the management module 104 is further configured to:
determine at least one of a hash value of the target data, a hash value of each of the plurality of data fragments, and a data attribute of the target data; and
store at least one of the hash value of the target data, the hash value of each of the plurality of data fragments, and the data attribute of the target data in the distributed ledger of the blockchain network.
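The on-chain record described above can be sketched as follows. SHA-256 is an assumption (the text only says "hash value"), the ledger is modeled as an append-only Python list, and the attribute names are hypothetical examples.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_ledger_record(data_id: str, data: bytes, fragments: list, attributes: dict) -> dict:
    # SHA-256 is an assumed choice of hash algorithm.
    return {
        "data_id": data_id,
        "data_hash": sha256_hex(data),
        "fragment_hashes": [sha256_hex(f) for f in fragments],
        "attributes": dict(attributes),  # e.g. owner, size (hypothetical fields)
    }


ledger = []  # hypothetical stand-in for the distributed ledger
ledger.append(build_ledger_record("obj", b"abcd", [b"ab", b"cd"], {"owner": "alice", "size": 4}))
```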
In some possible implementations, the data operation request is a read request, and the management module 104 is specifically configured to:
obtain, according to the read request, the storage addresses of a plurality of data fragments of the target data from the distributed ledger of the blockchain network;
obtain the plurality of data fragments from the storage resource pool according to their storage addresses; and
aggregate the plurality of data fragments to obtain the target data.
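The read path can be sketched in the same in-memory style. The sketch assumes the ledger lists, for each fragment in order, the addresses of its copies, and it assumes aggregation is in-order concatenation; `fetch_fragment` and `read` are hypothetical names.

```python
def fetch_fragment(copy_addrs: list, pool: dict) -> bytes:
    # Try each recorded copy in order and return the first one that is present.
    for node, key in copy_addrs:
        frag = pool.get(node, {}).get(key)
        if frag is not None:
            return frag
    raise KeyError(f"no readable copy among {copy_addrs}")


def read(data_id: str, ledger: dict, pool: dict) -> bytes:
    # ledger[data_id] lists, per fragment in order, the addresses of its copies.
    fragments = [fetch_fragment(copies, pool) for copies in ledger[data_id]]
    return b"".join(fragments)  # aggregation by in-order concatenation (assumed)


ledger = {"obj": [[("n1", "obj/0/0"), ("n2", "obj/0/1")], [("n2", "obj/1/0")]]}
pool = {"n2": {"obj/0/1": b"he", "obj/1/0": b"llo"}}  # node n1 is unavailable
print(read("obj", ledger, pool))  # b'hello'
```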
In some possible implementations, the management module 104 is further configured to:
obtain an aggregation policy based on a smart contract of the blockchain network according to the data operation request;
the management module 104 is specifically configured to:
aggregate the plurality of data fragments according to the aggregation policy to obtain the target data.
In this implementation, the management module 104 is further configured to implement the storage aggregation policy function shown in fig. 1 or fig. 2 and fig. 3. Based on the aggregation policy, it aggregates the data fragments of the target data to restore the target data, thereby completing IO on the target data.
In some possible implementations, the management module 104 is specifically configured to:
obtain local hash values of the plurality of data fragments and the corresponding on-chain hash values, where the local hash values are computed using a hash algorithm and the on-chain hash values are the hash values stored in the blockchain network;
when the local hash values match the on-chain hash values, start aggregating the plurality of data fragments to obtain aggregated data; and
compute a hash value of the aggregated data, obtain the hash value of the target data from the blockchain network, and determine the aggregated data to be the target data when the hash value of the aggregated data matches the hash value of the target data.
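The two-stage check (per-fragment hashes before aggregation, whole-data hash after) can be sketched as below. SHA-256 and concatenation-based aggregation are assumptions, and `verify_and_aggregate` is a hypothetical helper that receives the on-chain values as plain arguments.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_and_aggregate(fragments: list, on_chain_fragment_hashes: list, on_chain_data_hash: str) -> bytes:
    if len(fragments) != len(on_chain_fragment_hashes):
        raise ValueError("fragment count does not match on-chain record")
    # Stage 1: every local fragment hash must match its on-chain hash.
    for i, (frag, expected) in enumerate(zip(fragments, on_chain_fragment_hashes)):
        if sha256_hex(frag) != expected:
            raise ValueError(f"fragment {i} failed hash check")
    # Stage 2: aggregate, then check the result against the target data's on-chain hash.
    aggregated = b"".join(fragments)
    if sha256_hex(aggregated) != on_chain_data_hash:
        raise ValueError("aggregated data failed hash check")
    return aggregated
```

The second check catches errors that per-fragment checks cannot, such as fragments aggregated in the wrong order.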
In this implementation, the management module 104 is further configured to implement the data computation and verification function shown in fig. 1 or fig. 2 and fig. 3. Specifically, the management module 104 computes the local hash values and compares them with the on-chain hash values to verify the data, thereby ensuring the accuracy of IO on the target data.
In some possible implementations, the data management device 10 further includes:
a fault checking module 106, configured to obtain, from the blockchain node corresponding to the target data management device, first meta information of the data fragments in the storage mounted by the target data management device, and to obtain second meta information of the data fragments in that mounted storage; when the first meta information does not match the second meta information, the fault checking module 106 determines that a fault has occurred and stores the fault information in the distributed ledger of the blockchain network.
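A minimal sketch of the fault check follows. It assumes the first meta information is a mapping from storage key to an expected SHA-256 digest, derives the second meta information by hashing what the mounted storage actually holds, and models the ledger as a list; `check_faults` and the fault-record fields are hypothetical.

```python
import hashlib


def check_faults(device_id: str, on_chain_meta: dict, local_store: dict, ledger: list) -> list:
    """Compare first meta information (from the chain) with second meta information
    (derived from the locally mounted storage).

    on_chain_meta: storage key -> expected SHA-256 hex digest (assumed meta format)
    local_store:   storage key -> bytes currently held in the mounted storage
    """
    faults = []
    for key, expected in on_chain_meta.items():
        data = local_store.get(key)
        if data is None:
            kind = "missing"  # deleted or lost; indistinguishable from local state alone
        elif hashlib.sha256(data).hexdigest() != expected:
            kind = "tampered"
        else:
            continue  # meta information matches, no fault
        faults.append({"device": device_id, "key": key, "kind": kind})
    ledger.extend(faults)  # record the fault information in the ledger
    return faults
```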
The fault checking module 106 may be implemented by a hardware module or by a software module.
When implemented in software, the fault checking module 106 may be an application or application module running on a computing device or cluster of computing devices.
When implemented in hardware, the fault checking module 106 may be a device implemented using an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
In some possible implementations, the data management device 10 further includes:
a fault recovery module 108, configured to read fault information from the blockchain network; when the fault information indicates that a data fragment in the storage mounted by the target data management device has been tampered with, deleted, or lost, the fault recovery module 108 obtains the data fragment from the storage mounted by another data management device, stores it locally, and stores the updated storage address in the distributed ledger of the blockchain network.
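Fault recovery can be sketched in the same in-memory model. The sketch assumes fault records name a device and a fragment ID, that the ledger maps each fragment ID to its copy addresses, and that copies on other devices are healthy; a fuller version would verify the source copy against its on-chain hash before copying. `recover` is a hypothetical name.

```python
def recover(device_id: str, faults: list, ledger_addresses: dict, pool: dict) -> list:
    """faults: fault records read from the ledger, e.g. {"device": ..., "fragment": ...}.
    ledger_addresses: fragment ID -> list of (node, key) copy addresses.
    pool: node -> {key: bytes}.
    """
    recovered = []
    for fault in faults:
        if fault["device"] != device_id:
            continue  # only repair fragments on this device's mounted storage
        frag_id = fault["fragment"]
        copies = ledger_addresses[frag_id]
        # Pick the first copy held by another device (assumed healthy here).
        source = next(((n, k) for n, k in copies if n != device_id and k in pool.get(n, {})), None)
        if source is None:
            raise RuntimeError(f"no healthy copy of {frag_id}")
        new_key = f"{frag_id}@{device_id}"
        pool.setdefault(device_id, {})[new_key] = pool[source[0]][source[1]]  # store locally
        # Replace this device's old address with the updated one in the ledger.
        ledger_addresses[frag_id] = [(n, k) for n, k in copies if n != device_id] + [(device_id, new_key)]
        recovered.append(frag_id)
    return recovered
```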
Similarly, the fault recovery module 108 described above may be implemented by a hardware module or by a software module.
When implemented in software, the fault recovery module 108 may be an application or application module running on a computing device or a cluster of computing devices.
When implemented in hardware, the fault recovery module 108 may be a device implemented using an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The present application also provides a computing device 1000. As shown in fig. 10, the computing device 1000 includes a bus 1002, a processor 1004, a memory 1006, and a communication interface 1008. The processor 1004, the memory 1006, and the communication interface 1008 communicate with one another via the bus 1002. The computing device 1000 may be a computing device in a central cloud, such as a central server, or a computing device in an edge cloud, such as an edge server. The computing device 1000 may also be a lightweight device, such as a terminal device like a smartphone or a smart wearable device. It should be understood that the present application does not limit the number of processors or memories in the computing device 1000.
The bus 1002 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one line is shown in fig. 10, which does not mean that there is only one bus or only one type of bus. The bus 1002 may include a pathway for transferring information between the components of the computing device 1000 (for example, the memory 1006, the processor 1004, and the communication interface 1008).
The processor 1004 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 1006 may include volatile memory, such as random access memory (RAM). The memory 1006 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The memory 1006 stores executable program code, which the processor 1004 executes to implement the foregoing data processing method. Specifically, the memory 1006 stores the instructions used by the distributed data management system 100 or the data management device 10 to perform the data processing method.
The communication interface 1008 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 1000 and other devices or communication networks.
An embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device 1000. The computing device 1000 may be a server, such as a central server or an edge server. In some embodiments, the computing device 1000 may also be a terminal device.
As shown in fig. 11, the computing device cluster includes at least one computing device 1000. The memory 1006 in one or more computing devices 1000 in the cluster may store the same instructions of the distributed data management system 100 for performing the data processing method.
In some possible implementations, one or more computing devices 1000 in the cluster of computing devices may also be used to execute portions of the instructions of the distributed data management system 100 for performing the data processing method. In other words, a combination of one or more computing devices 1000 may collectively execute instructions of the distributed data management system 100 for performing a data processing method.
It should be noted that the memory 1006 in different computing devices 1000 in a cluster of computing devices may store different instructions for performing part of the functions of the distributed data management system 100.
Fig. 12 shows one possible implementation. As shown in fig. 12, two computing devices 1000A and 1000B are connected through the communication interface 1008. The memory in the computing device 1000A stores instructions for performing the functions of the communication module 102 and the management module 104. The memory in the computing device 1000B stores instructions for performing the functions of the fault checking module 106 and the fault recovery module 108. In other words, the memories 1006 of the computing devices 1000A and 1000B together store the instructions of the distributed data management system 100 for performing the data processing method.
The connection manner of the computing device cluster shown in fig. 12 takes into account that, in the data processing method provided in this application, fault detection requires scanning the distributed ledger maintained by the blockchain nodes in the blockchain network, and fault recovery requires reading the fault information stored in the blockchain nodes. Therefore, the functions implemented by the communication module 102 and the management module 104 are assigned to the computing device 1000A, and the functions implemented by the fault checking module 106 and the fault recovery module 108 are assigned to the computing device 1000B.
It should be appreciated that the functionality of computing device 1000A shown in fig. 12 may also be performed by multiple computing devices 1000. Likewise, the functionality of computing device 1000B may also be performed by multiple computing devices 1000.
In some possible implementations, one or more computing devices in the computing device cluster may be connected through a network, which may be a wide area network, a local area network, or the like. Fig. 13 shows one possible implementation. As shown in fig. 13, two computing devices 1000C and 1000D are connected through a network; specifically, each computing device connects to the network through its own communication interface. In this type of implementation, the memory 1006 in the computing device 1000C stores instructions for performing the functions of the communication module 102 and the management module 104, and the memory 1006 in the computing device 1000D stores instructions for performing the functions of the fault checking module 106 and the fault recovery module 108.
The connection manner of the computing device cluster shown in fig. 13 takes into account that the data processing method provided in this application needs to scan the distributed ledger maintained by the blockchain nodes in the blockchain network, or to read the fault information stored in the blockchain nodes. Therefore, the functions implemented by the communication module 102 and the management module 104 are assigned to the computing device 1000C, and the functions implemented by the fault checking module 106 and the fault recovery module 108 are assigned to the computing device 1000D. It should be appreciated that the functionality of the computing device 1000C shown in fig. 13 may also be performed by multiple computing devices 1000. Likewise, the functionality of the computing device 1000D may also be performed by multiple computing devices 1000.
Embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium may be any usable medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state drive), or the like. The computer-readable storage medium includes instructions that instruct a computing device to perform the foregoing data processing method applied to the distributed data management system 100.
Embodiments of the present application also provide a computer program product comprising instructions. The computer program product may be a software or program product containing instructions capable of running on a computing device or cluster of computing devices or stored in any available medium. The computer program product, when run on at least one computing device (a computing device or a cluster of computing devices), causes the at least one computing device to perform the data processing method described above.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some of their technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (24)

1. A data processing method, characterized in that it is applied to a distributed data management system, the distributed data management system comprising a plurality of data management devices; a first data management device of the plurality of data management devices corresponds to a first blockchain node of a blockchain network, and a second data management device of the plurality of data management devices corresponds to a second blockchain node of the blockchain network; the storage mounted by the first data management device and the storage mounted by the second data management device are used to form a storage resource pool of the blockchain network; the method comprises the following steps:
a target data management device in the plurality of data management devices receives a data operation request, wherein the data operation request is used to perform an input/output (IO) operation on target data;
and the target data management device acquires the storage addresses of a plurality of data fragments of the target data from the blockchain network according to the data operation request, and carries out IO on the target data in the storage resource pool according to the storage addresses of the plurality of data fragments.
2. The method of claim 1, wherein the data operation request is a write request, wherein the target data management device obtains the storage addresses of the plurality of data slices of the target data from the blockchain network according to the data operation request, comprising:
the target data management device obtains an allocation strategy based on a smart contract of the blockchain network according to the data operation request;
the target data management device allocates storage resources for a plurality of data fragments of the target data from the storage resource pool according to the allocation strategy, and obtains storage addresses of the plurality of data fragments;
the target data management device performs IO on the target data in the storage resource pool according to the storage addresses of the data fragments, and the method comprises the following steps:
the target data management device writes the plurality of data fragments into the storage resource pool according to the storage addresses of the plurality of data fragments, and stores the storage addresses of the plurality of data fragments in the distributed ledger of the blockchain network.
3. The method according to claim 2, wherein the method further comprises:
the target data management device obtains a slicing strategy based on a smart contract of the blockchain network according to the data operation request;
the target data management device obtains a slicing algorithm, the number of slices and the number of copies of each data slice according to the slicing strategy;
the target data management device performs IO on the target data according to the storage addresses of the data fragments, which comprises the following steps:
the target data management device fragments the target data according to the fragmentation algorithm and the fragmentation quantity to obtain a plurality of data fragments of the target data;
the target data management device writes the copies of each data slice into the storage resource pool according to the storage addresses of the copies of each data slice in the plurality of data slices, and stores the storage addresses of the copies of each data slice into the distributed ledger of the blockchain network.
4. A method according to claim 3, wherein each data slice comprises a plurality of copies;
the target data management device writes respective copies of each data slice to the storage resource pool, comprising:
the target data management device writes multiple copies of each data slice to different types of storage media of the storage resource pool.
5. The method according to any one of claims 2 to 4, further comprising:
the target data management device determines at least one of a hash value of the target data, a hash value of each of the plurality of data slices, and a data attribute of the target data;
the target data management device stores at least one of the hash value of the target data, the hash value of each of the plurality of data slices, and the data attribute of the target data in the distributed ledger of the blockchain network.
6. The method of claim 1, wherein the data operation request is a read request, and wherein the target data management device obtains the storage addresses of the plurality of data slices of the target data from the blockchain network according to the data operation request, comprising:
the target data management device acquires storage addresses of a plurality of data fragments of the target data from a distributed ledger of the blockchain network according to the read request;
the target data management device performs IO on the target data according to the storage addresses of the data fragments, and the method comprises the following steps:
the target data management device acquires the plurality of data fragments from the storage resource pool according to the storage addresses of the plurality of data fragments;
and the target data management device aggregates the plurality of data fragments to obtain the target data.
7. The method of claim 6, wherein the method further comprises:
the target data management device obtains an aggregation strategy based on a smart contract of the blockchain network according to the data operation request;
the target data management device aggregates the plurality of data fragments to obtain the target data, including:
and the target data management device aggregates the plurality of data fragments according to the aggregation strategy to obtain the target data.
8. The method of claim 6, wherein the target data management device aggregates the plurality of data fragments to obtain the target data, comprising:
the target data management device obtains local hash values of the plurality of data fragments and on-chain hash values, wherein the local hash values are obtained through a hash algorithm, and the on-chain hash values are hash values stored in the blockchain network;
the target data management device determines that the local hash values match the on-chain hash values, and starts aggregating the plurality of data fragments to obtain aggregated data;
the target data management device determines a hash value of the aggregated data, obtains the hash value of the target data from the blockchain network, and determines the aggregated data to be the target data when the hash value of the aggregated data matches the hash value of the target data.
9. The method according to any one of claims 1 to 8, further comprising:
the target data management device acquires, from the blockchain node corresponding to the target data management device, first meta information of the data fragments in the storage mounted by the target data management device, and acquires second meta information of the data fragments in the storage mounted by the target data management device;
and when the first meta information does not match the second meta information, the target data management device determines that a fault has occurred, and stores the fault information in the distributed ledger of the blockchain network.
10. The method according to any one of claims 1 to 8, further comprising:
the target data management device reads fault information from the blockchain network;
when the fault information characterizes that the data fragments in the storage mounted by the target data management device are tampered, deleted or lost, the target data management device acquires the data fragments from the storage mounted by other data management devices and performs local storage;
the target data management device stores the updated storage address to a distributed ledger of the blockchain network.
11. A distributed data management system, the distributed data management system comprising a plurality of data management devices; a first data management device of the plurality of data management devices corresponds to a first blockchain node of a blockchain network, and a second data management device of the plurality of data management devices corresponds to a second blockchain node of the blockchain network; the storage mounted by the first data management device and the storage mounted by the second data management device are used to form a storage resource pool of the blockchain network;
a target data management device in the plurality of data management devices is configured to receive a data operation request, wherein the data operation request is used to perform an input/output (IO) operation on target data;
the target data management device is further configured to obtain, according to the data operation request, storage addresses of a plurality of data fragments of the target data from the blockchain network, and perform IO on the target data in the storage resource pool according to the storage addresses of the plurality of data fragments.
12. The system of claim 11, wherein the data operation request is a write request, and wherein the target data management device is specifically configured to:
acquiring an allocation strategy based on a smart contract of the blockchain network according to the data operation request;
according to the allocation strategy, allocating storage resources for a plurality of data fragments of the target data from the storage resource pool, and obtaining storage addresses of the plurality of data fragments;
and writing the plurality of data fragments into the storage resource pool according to the storage addresses of the plurality of data fragments, and storing the storage addresses of the plurality of data fragments in the distributed ledger of the blockchain network.
13. The system of claim 12, wherein the target data management device is further configured to:
acquiring a slicing strategy based on a smart contract of the blockchain network according to the data operation request;
obtaining a slicing algorithm, the number of slices and the number of copies of each data slice according to the slicing strategy;
the target data management device is specifically configured to:
according to the slicing algorithm and the slicing quantity, slicing the target data to obtain a plurality of data slices of the target data;
and writing the respective copies of each data slice into the storage resource pool according to the storage addresses of the respective copies of each data slice in the plurality of data slices, and storing the storage addresses of the respective copies of each data slice into a distributed ledger of the blockchain network.
14. The system of claim 13, wherein each data slice comprises a plurality of copies;
the target data management device is specifically configured to:
multiple copies of each data slice are written to different types of storage media of the storage resource pool.
15. The system according to any one of claims 12 to 14, wherein the target data management device is further configured to:
determining at least one of a hash value of the target data, a hash value of each of the plurality of data slices, and a data attribute of the target data;
storing at least one of the hash value of the target data, the hash value of each of the plurality of data slices, and the data attribute of the target data to a distributed ledger of the blockchain network.
16. The system of claim 11, wherein the data operation request is a read request, and wherein the target data management device is specifically configured to:
according to the read request, acquiring storage addresses of a plurality of data fragments of the target data from a distributed ledger of the blockchain network;
the target data management device is specifically configured to:
acquiring the plurality of data fragments from the storage resource pool according to the storage addresses of the plurality of data fragments;
and aggregating the plurality of data fragments to obtain the target data.
17. The system of claim 16, wherein the target data management device is further configured to:
acquiring an aggregation strategy based on a smart contract of the blockchain network according to the data operation request;
the target data management device is specifically configured to:
and aggregating the plurality of data fragments according to the aggregation strategy to obtain the target data.
18. The system according to claim 16, wherein the target data management device is specifically configured to:
obtaining local hash values of the plurality of data fragments and on-chain hash values, wherein the local hash values are obtained through a hash algorithm, and the on-chain hash values are hash values stored in the blockchain network;
determining that the local hash value is matched with the on-chain hash value, and starting aggregation of the plurality of data fragments to obtain aggregated data;
determining a hash value of the aggregated data, acquiring the hash value of the target data from the blockchain network, and determining the aggregated data as the target data when the hash value of the aggregated data is matched with the hash value of the target data.
19. The system according to any one of claims 11 to 18, wherein the target data management device is further configured to:
acquiring first meta information of data fragments in the storage mounted by the target data management device from a blockchain node corresponding to the target data management device, and acquiring second meta information of the data fragments in the storage mounted by the target data management device;
and when the first meta information does not match the second meta information, determining that a fault has occurred, and storing the fault information in the distributed ledger of the blockchain network.
20. The system according to any one of claims 11 to 18, wherein the target data management device is further configured to:
reading fault information from the blockchain network;
when the fault information characterizes that the data fragments in the storage mounted by the target data management device are tampered, deleted or lost, acquiring the data fragments from the storage mounted by other data management devices, and performing local storage;
and storing the updated storage address to a distributed ledger of the blockchain network.
21. A data management device, wherein the data management device corresponds to a blockchain node in a blockchain network, and the storage of the data management device and the storage of other data management devices in a distributed data management system are used for forming a storage resource pool of the blockchain network, and the data management device comprises:
a communication module, configured to receive a data operation request, wherein the data operation request is used to perform an input/output (IO) operation on target data; and
a management module, configured to obtain the storage addresses of a plurality of data fragments of the target data from the blockchain network according to the data operation request, and to perform IO on the target data in the storage resource pool according to the storage addresses of the plurality of data fragments.
22. A cluster of computing devices, the cluster of computing devices comprising at least one computing device, the at least one computing device comprising at least one processor and at least one memory, the at least one memory having computer-readable instructions stored therein; the at least one processor executing the computer readable instructions to cause the cluster of computing devices to perform the method of any one of claims 1 to 10.
23. A computer-readable storage medium comprising computer-readable instructions; the computer readable instructions are for implementing the method of any one of claims 1 to 10.
24. A computer program product comprising computer readable instructions; the computer readable instructions are for implementing the method of any one of claims 1 to 10.
CN202210983123.6A 2022-06-30 2022-08-16 Data processing method and related equipment Pending CN117376364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/081418 WO2024001304A1 (en) 2022-06-30 2023-03-14 Data processing method and related device

Applications Claiming Priority (2)

Application Number | Priority Date
CN2022107708171 | 2022-06-30
CN202210770817 | 2022-06-30

Publications (1)

Publication Number | Publication Date
CN117376364A | 2024-01-09

Family ID: 89393516

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202210983123.6A | Pending | CN117376364A | 2022-06-30 | 2022-08-16 | Data processing method and related equipment

Country Status (1)

Country | Link
CN (1) | CN117376364A

Similar Documents

Publication | Title
US11928029B2 | Backup of partitioned database tables
US11036591B2 | Restoring partitioned database tables from backup
US11327949B2 | Verification of database table partitions during backup
US11836151B2 | Synchronizing symbolic links
US9971823B2 | Dynamic replica failure detection and healing
US10922303B1 | Early detection of corrupt data partition exports
US20070234331A1 | Targeted automatic patch retrieval
CN108683668B | Resource checking method, device, storage medium and equipment in content distribution network
EP4207688A1 | Asynchronous bookkeeping method and apparatus for blockchain, medium, and electronic device
US11627122B2 | Inter-system linking method and node
US11442752B2 | Central storage management interface supporting native user interface versions
US11886390B2 | Data file partition and replication
US11816069B2 | Data deduplication in blockchain platforms
US20230122861A1 | Unified metadata search
US20200153889A1 | Method for uploading and downloading file, and server for executing the same
US20200341674A1 | Method, device and computer program product for restoring data
CN113395340A | Information updating method, device, equipment, system and readable storage medium
US11494493B1 | Software verification for network-accessible applications
WO2024001304A1 | Data processing method and related device
CN113885797B | Data storage method, device, equipment and storage medium
US10409492B2 | Multi-phase dispersed storage write process
US20160323379A1 | Distributed storage of software images in computing systems
US20230305837A1 | Data file partition and replication
CN110298031B | Dictionary service system and model version consistency distribution method
CN109154880B | Consistent storage data in a decentralized storage network

Legal Events

Code | Title
PB01 | Publication