CN116701539A - Distributed data storage method, device and terminal based on block chain - Google Patents


Publication number
CN116701539A
Authority
CN
China
Prior art keywords
data
storage
target
disaster recovery
fid
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310762035.8A
Other languages
Chinese (zh)
Inventor
何东
高秀寒
由楷
殷宏飞
顾宗杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a distributed data storage method, a distributed data storage device and a distributed data storage terminal based on a block chain, which are applied to a storage gateway, wherein the storage gateway is connected with a plurality of storage controllers, and each storage controller corresponds to one storage node; the method comprises the following steps: acquiring target data; encrypting the target data based on a hash algorithm to obtain a characteristic value, and taking the characteristic value as the FID of the target data; determining a target storage node based on a preset disaster recovery isolation strategy, storing target data to the target storage node, and determining a storage address of the target data based on the FID of the target data and the target storage node. The method and the device can avoid the risk of tampering of the data content caused by repeated uploading of the same-name file in the prior art, and reduce the redundancy of data storage.

Description

Distributed data storage method, device and terminal based on block chain
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a distributed data storage method, device and terminal based on a blockchain.
Background
With the continuous evolution and application of digital technology, application systems support more and more scenarios in production and daily life, and the data they store is tied to many aspects of the real world. With the development of blockchain technology, distributed and trusted real-world business data has gained an effective carrier, with successful practice in areas such as digital currency, supply chain finance, and asset digitization.
However, general files such as electronic documents and archive files still rely on traditional object storage, where a file can be uploaded again under the same name to modify or replace the original. This is a weak point in a blockchain trust system, and it fails to meet the service requirements of decentralization and tamper resistance.
Disclosure of Invention
The embodiment of the invention provides a distributed data storage method, device and terminal based on a blockchain, which are used for solving the problem of low safety of unstructured data storage.
In a first aspect, an embodiment of the present invention provides a blockchain-based distributed data storage method, which is applied to a storage gateway, where the storage gateway is connected to a plurality of storage controllers, and each storage controller corresponds to a storage node;
the method comprises the following steps:
acquiring target data;
encrypting the target data based on a hash algorithm to obtain a characteristic value, and taking the characteristic value as the FID of the target data;
determining a target storage node based on a preset disaster recovery isolation strategy, storing target data to the target storage node, and determining a storage address of the target data based on the FID of the target data and the target storage node.
In one possible implementation manner, encrypting the target data based on the hash algorithm to obtain the feature value, and taking the feature value as the FID of the target data includes:
If the target data is larger than the preset size, slicing the target data to obtain a plurality of sliced data;
encrypting each piece of data based on a hash algorithm to obtain characteristic values of each piece of data, and taking the characteristic values of each piece of data as FIDs of the corresponding piece of data;
arranging the FIDs of the piece data according to the sequence of the piece data to obtain an index object of the target data;
encrypting the index object based on the hash algorithm to obtain the characteristic value of the index object, and taking the characteristic value of the index object as the FID of the target data.
In one possible implementation manner, the storage gateway is connected with a plurality of storage controllers through a plurality of dump controllers, each dump controller corresponds to the plurality of storage controllers, each dump controller is used as a logic disaster recovery group, each dump controller forms a plurality of logic storage pools, and each logic storage pool comprises a plurality of logic disaster recovery groups;
before determining the target storage node based on the preset disaster recovery isolation policy, the method further comprises:
acquiring a target logical storage pool;
the determining the target storage node based on the preset disaster recovery isolation strategy comprises the following steps:
performing modular operation on the FID of the target data based on the number of the logic disaster recovery groups in the target logic storage pool, and taking the logic disaster recovery group with the sequence number i as a target logic disaster recovery group; wherein i is a modular operation result;
Selecting one or more target storage nodes from the target logic disaster recovery group based on a preset disaster recovery isolation strategy; the disaster recovery isolation strategy comprises the number of target nodes and/or the state of storage nodes.
In one possible implementation, each storage node has one or more data stored therein;
the method further comprises the steps of:
acquiring a logic disaster recovery group updating instruction; the logic disaster recovery group updating instruction comprises the number of updated logic disaster recovery groups and the serial numbers of the updated logic disaster recovery groups;
performing modular operation on the FID of the data based on the number of updated logical disaster recovery groups for each data, and taking the logical disaster recovery group with the sequence number of i as a target logical disaster recovery group of the data, wherein i is a modular operation result, and selecting one or more storage nodes from the target logical disaster recovery group of the data as target storage nodes of the data based on a preset disaster recovery isolation strategy;
and transferring each data to each corresponding target storage node.
In one possible implementation, obtaining the target logical storage pool includes:
identifying a user identity for uploading target data, and determining user permission based on the user identity;
And determining a logical storage pool corresponding to the user authority as a target logical storage pool.
In one possible implementation, the method further includes:
when a download request is acquired, parsing the storage address in the download request to obtain the logical disaster recovery group and the FID (file ID) corresponding to the data to be downloaded;
searching the data to be downloaded in a logic disaster recovery group corresponding to the data to be downloaded based on the FID corresponding to the data to be downloaded;
if the data to be downloaded is file data, the file data is written into the download buffer area to obtain the data to be downloaded.
In one possible implementation manner, after searching the data to be downloaded in the logical disaster recovery group corresponding to the data to be downloaded based on the FID corresponding to the data to be downloaded, the method further includes:
if the data to be downloaded is an index object, searching the fragment data corresponding to each FID in the index object in the logic disaster recovery group corresponding to the data to be downloaded;
and writing each piece of data into a downloading buffer area according to the sequence in the index object to obtain the data to be downloaded.
In a second aspect, an embodiment of the present invention provides a blockchain-based distributed data storage device, which is applied to a storage gateway, where the storage gateway is connected to a plurality of storage controllers, and each storage controller corresponds to a storage node;
The device comprises:
the acquisition module is used for acquiring target data;
the encryption module is used for encrypting the target data based on a hash algorithm to obtain a characteristic value, and taking the characteristic value as the FID of the target data;
the storage module is used for determining a target storage node based on a preset disaster recovery isolation strategy, storing target data to the target storage node, and determining a storage address of the target data based on the FID of the target data and the target storage node.
In a third aspect, embodiments of the present invention provide a terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect or any one of the possible implementations of the first aspect, when the computer program is executed.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method as described above in the first aspect or any one of the possible implementations of the first aspect.
The embodiment of the invention provides a distributed data storage method, a distributed data storage device and a distributed data storage terminal based on a blockchain. The hash value of the target data represents the original target data, so the content of the target data is tamper-resistant, and the risk in the prior art that data content is tampered with by repeatedly uploading a file with the same name is avoided. Meanwhile, only some of the storage nodes are selected for storage, which reduces data redundancy compared with the synchronization of data across all nodes in traditional blockchain technology. Finally, the storage address is determined based on the FID and the storage node, so that the target data can be located conveniently and its availability and reliability are ensured.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an implementation of a blockchain-based distributed data storage method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a blockchain-based distributed data storage device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the following description will be made by way of specific embodiments with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of an implementation of a blockchain-based distributed data storage method according to an embodiment of the present invention is shown, where the method is applied to a storage gateway, and the storage gateway is connected to a plurality of storage controllers, where each storage controller corresponds to a storage node;
the method is described in detail as follows:
step 101, acquiring target data.
In this embodiment, the target data may be unstructured data such as a document, a table, or the like, that is, data without a fixed structure. After the user uploads the target data to the storage gateway, the storage gateway communicates with the storage controller to store the target data in one or more storage nodes.
Step 102, encrypting the target data based on a hash algorithm to obtain a characteristic value, and taking the characteristic value as the FID of the target data.
In this embodiment, the FID, i.e., the file ID, functions similarly to a file name and is typically used to open or select a file. However, an ordinary file name has no fixed relationship with the file content, so the file name alone cannot verify whether the content has been tampered with. For example, if an existing file is replaced by a file with completely different content but the same file name, a later check sees only that the file name is unchanged and cannot discover that the content is different.
To address this problem, this embodiment borrows a technique from blockchain and computes the characteristic value with a hash algorithm such as SHA-256, using it as the FID of the target data. A hash algorithm is easy to compute in the forward direction, hard to invert, and sensitive to its input, so files with different contents correspond to different FIDs (up to a negligible collision probability). Whether the file content has been tampered with can then be detected by recomputing the FID from the content and comparing it with the recorded FID, which improves the security of data storage.
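A minimal sketch of this idea in Python, assuming the FID is represented as the hexadecimal SHA-256 digest of the file content. The function names `compute_fid` and `is_untampered` are hypothetical and do not appear in the patent.

```python
import hashlib


def compute_fid(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string (used here as the FID)."""
    return hashlib.sha256(content).hexdigest()


def is_untampered(fid: str, content: bytes) -> bool:
    """Recompute the digest from the content and compare it with the recorded FID."""
    return compute_fid(content) == fid


original = b"contract v1"
fid = compute_fid(original)
print(is_untampered(fid, original))        # True: content matches the FID
print(is_untampered(fid, b"contract v2"))  # False: replaced content is detected
```

Because the identifier is derived from the content, replacing a file with different content necessarily produces a different FID, regardless of the file name.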
Step 103, determining a target storage node based on a preset disaster recovery isolation policy, storing target data to the target storage node, and determining a storage address of the target data based on the FID of the target data and the target storage node.
In this embodiment, the disaster recovery capability is a measure of availability of the data storage system, and a more appropriate target storage node can be selected from a plurality of storage nodes through a disaster recovery isolation policy, so as to store target data, thereby ensuring reliability and availability of data storage.
After storage, the address of the target storage node and the FID of the target data may be concatenated to form the storage address. For example, if the address of the target storage node is F00, the storage address is F00/FID. The storage gateway may return the storage address to the user, who can use it later to locate the target data among the storage nodes.
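The address format can be illustrated with a short Python sketch. It assumes a single slash separates the location prefix (such as F00) from the FID; the helper names are hypothetical.

```python
from typing import Tuple


def build_storage_address(location: str, fid: str) -> str:
    """Join the location prefix (e.g. "F00") and the FID with a slash."""
    return f"{location}/{fid}"


def parse_storage_address(address: str) -> Tuple[str, str]:
    """Split a storage address back into (location, fid) at the last slash."""
    location, separator, fid = address.rpartition("/")
    if not separator or not location or not fid:
        raise ValueError(f"malformed storage address: {address!r}")
    return location, fid


address = build_storage_address("F00", "9f86d081")
print(address)                         # F00/9f86d081
print(parse_storage_address(address))  # ('F00', '9f86d081')
```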
In one possible implementation manner, encrypting the target data based on the hash algorithm to obtain the feature value, and taking the feature value as the FID of the target data includes:
if the target data is larger than the preset size, slicing the target data to obtain a plurality of sliced data;
encrypting each piece of data based on a hash algorithm to obtain characteristic values of each piece of data, and taking the characteristic values of each piece of data as FIDs of the corresponding piece of data;
arranging the FIDs of the piece data according to the sequence of the piece data to obtain an index object of the target data;
encrypting the index object based on the hash algorithm to obtain the characteristic value of the index object, and taking the characteristic value of the index object as the FID of the target data.
In this embodiment, after the user uploads the target data to the storage gateway, the storage gateway may limit the maximum size of an uploaded file, for example to 4 GiB. Uploading in multipart mode lifts the single-file limit; the limit then applies to the size of each part. The storage gateway does not store the file content; it streams the file object directly to the dump controller using the zero-copy technique implemented by the FileChannel class. The dump controller is responsible for communicating with the actual storage controllers, and it splits the file to bound the object size before forwarding it to the actual storage nodes.
The segmentation process covers two cases:
1. When the original file received by the dump controller is smaller than or equal to 1 MiB, its hash value is calculated directly with SHA-256 and defined as the FID.
2. When the original file received by the dump controller is larger than 1 MiB, the file is divided at 1 MiB boundaries into several 1 MiB byte arrays plus, if the size is not an exact multiple of 1 MiB, one final byte array smaller than 1 MiB. These are the pieces of fragment data. An FID is calculated with SHA-256 for each piece, an index object recording the FIDs of the pieces in order is generated, and SHA-256 is applied to the index object to produce the FID of the original file.
SHA-256 is a hash algorithm whose collision probability is low enough to be negligible, so the calculated hash value serves as the unique identifier of a storage object. If a storage object was not split, the file is represented by its own FID. If it was split, an index object recording the FIDs of the pieces in order is generated, and the SHA-256 value of that index object represents the original file.
The embodiment can control the size of data in the storage node, encrypt the fragmented data and the index object after the fragmentation of the larger target data, and comprehensively ensure the safety of the data.
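The two segmentation cases can be sketched in Python as follows. The patent does not specify how the index object is serialized, so this sketch assumes it is the chunk FIDs in order, one hex digest per line; `split_and_index` and `sha256_hex` are hypothetical names, and `chunk_size` is a parameter only so the behaviour can be tried on small inputs.

```python
import hashlib
from typing import Dict, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MiB, the threshold given in the description


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def split_and_index(content: bytes, chunk_size: int = CHUNK_SIZE) -> Tuple[str, Dict[str, bytes]]:
    """Return (FID of the file, objects to store keyed by their FID)."""
    # Case 1: small file, hashed directly.
    if len(content) <= chunk_size:
        fid = sha256_hex(content)
        return fid, {fid: content}

    # Case 2: large file, split into chunks and hashed per chunk.
    objects: Dict[str, bytes] = {}
    chunk_fids = []
    for offset in range(0, len(content), chunk_size):
        chunk = content[offset:offset + chunk_size]
        chunk_fid = sha256_hex(chunk)
        objects[chunk_fid] = chunk
        chunk_fids.append(chunk_fid)

    # Assumed serialization of the index object: ordered FIDs, one per line.
    index_object = "\n".join(chunk_fids).encode("ascii")
    file_fid = sha256_hex(index_object)
    objects[file_fid] = index_object
    return file_fid, objects


file_fid, stored = split_and_index(b"abcdefghij", chunk_size=4)
print(len(stored))  # 4: three chunks plus the index object
```

Since objects are keyed by content hash, two identical chunks map to the same FID and are stored once in this sketch.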
In one possible implementation manner, the storage gateway is connected with a plurality of storage controllers through a plurality of dump controllers, each dump controller corresponds to the plurality of storage controllers, each dump controller is used as a logic disaster recovery group, each dump controller forms a plurality of logic storage pools, and each logic storage pool comprises a plurality of logic disaster recovery groups;
before determining the target storage node based on the preset disaster recovery isolation policy, the method further comprises:
acquiring a target logical storage pool;
the determining the target storage node based on the preset disaster recovery isolation strategy comprises the following steps:
performing modular operation on the FID of the target data based on the number of the logic disaster recovery groups in the target logic storage pool, and taking the logic disaster recovery group with the sequence number i as a target logic disaster recovery group; wherein i is a modular operation result;
selecting one or more target storage nodes from the target logic disaster recovery group based on a preset disaster recovery isolation strategy; the disaster recovery isolation strategy comprises the number of target nodes and/or the state of storage nodes.
In this embodiment, the back-end storage is organized in three layers. The first layer is the logical storage pool: when uploading target data, the user may select a storage pool according to the service classification, so that files of different departments or systems are placed separately. The second layer is the logical disaster recovery group, which logically connects to the third-layer storage controllers that actually store the data. A logical disaster recovery group consists of a fixed number of slots. To simplify the modulo operation, the number of slots is an integer power of 2 and, once determined, can only be split, not merged; for example, 1024 slots initially, which may later be split into 2048. A logical disaster recovery group can also be configured with a replica count and with disaster recovery policies at several levels, such as machine room, rack, and host. Based on the replica count and the policy configuration, the group computes a placement that spreads storage objects across storage controllers on different hosts. The third layer is the storage controller, which corresponds one-to-one with a disk device; one host may have several storage controllers.
The data and metadata of the storage objects segmented in step S2 are then stored; the metadata includes information such as the file name, upload time, and file size. To dump a storage object segmented in step S2, the dump controller first finds the first-layer logical storage pool selected by the user. It then takes the FID defined in step S2 modulo the number of slots of the logical disaster recovery group in the second layer of that pool to determine the destination slot, and selects one or more storage controllers as target storage nodes according to the replica count in the disaster recovery isolation policy and the storage controller states that the dump controller monitors.
The dump controller sends the storage object to the selected storage controllers for storage. Depending on the configured data consistency requirement, for example returning once one replica is stored or only once all replicas are stored, it responds to the storage gateway after enough replicas have been stored and returns the FID of the file.
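The placement logic described above can be sketched in Python. This is an illustration under stated assumptions, not the patent's implementation: the FID is treated as a hexadecimal integer, host-level isolation is the only disaster recovery level modelled, and the names `StorageController`, `slot_for` and `select_targets` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageController:
    name: str
    host: str
    healthy: bool = True


def slot_for(fid_hex: str, slot_count: int) -> int:
    """Modulo operation on the FID: interpret the hex digest as an integer."""
    return int(fid_hex, 16) % slot_count


def select_targets(controllers: List[StorageController], replicas: int) -> List[StorageController]:
    """Pick `replicas` healthy controllers, each on a different host."""
    chosen: List[StorageController] = []
    used_hosts = set()
    for controller in controllers:
        if controller.healthy and controller.host not in used_hosts:
            chosen.append(controller)
            used_hosts.add(controller.host)
            if len(chosen) == replicas:
                return chosen
    raise RuntimeError("not enough healthy controllers on distinct hosts")


group = [
    StorageController("c1", "host-a"),
    StorageController("c2", "host-a"),
    StorageController("c3", "host-b", healthy=False),
    StorageController("c4", "host-c"),
]
print(slot_for("ff", 16))                             # 15
print([c.name for c in select_targets(group, 2)])     # ['c1', 'c4']
```

When the slot count is a power of 2, as the description requires, the modulo is equivalent to keeping the low-order bits of the FID.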
In one possible implementation, each storage node has one or more data stored therein;
the method further comprises the steps of:
acquiring a logic disaster recovery group updating instruction; the logic disaster recovery group updating instruction comprises the number of updated logic disaster recovery groups and the serial numbers of the updated logic disaster recovery groups;
Performing modular operation on the FID of the data based on the number of updated logical disaster recovery groups for each data, and taking the logical disaster recovery group with the sequence number of i as a target logical disaster recovery group of the data, wherein i is a modular operation result, and selecting one or more storage nodes from the target logical disaster recovery group of the data as target storage nodes of the data based on a preset disaster recovery isolation strategy;
and transferring each data to each corresponding target storage node.
In this embodiment, the storage controller may be configured to communicate with the dump controller periodically to report its own status, including the disk capacity, the FID of the stored object, and the number of the associated logical disaster recovery group. The dump controller monitors the information, alarms when the reporting period of the storage controller exceeds a threshold value, and timely discovers the storage fault risk.
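As a rough illustration of the monitoring rule, the following Python sketch flags storage controllers whose last status report is older than a threshold. The data layout (a mapping from controller name to last report timestamp in seconds) and the function name are assumptions made for the example.

```python
from typing import Dict, List


def overdue_controllers(last_report: Dict[str, float], now: float, threshold_seconds: float) -> List[str]:
    """Return the names of controllers whose last report is older than the threshold."""
    return sorted(
        name for name, reported_at in last_report.items()
        if now - reported_at > threshold_seconds
    )


reports = {"c1": 100.0, "c2": 40.0, "c3": 95.0}
print(overdue_controllers(reports, now=105.0, threshold_seconds=30.0))  # ['c2']
```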
The dump controller is also responsible for rebalancing the data in each storage node when changing the disk or adding a new storage controller, that is, after the serial numbers of the logic disaster recovery group or the storage nodes change, each stored data is dumped to the corresponding target storage node according to the new serial numbers through the steps in the embodiment, so as to ensure that the available storage controller can be accurately positioned for reading the data through the calculation of the FID.
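A rebalancing plan can be derived by recomputing the modulo for every stored FID under the new count, as in the Python sketch below. It assumes FIDs are hexadecimal strings and only computes which objects must move; the actual data transfer is out of scope. The names `slot_for` and `plan_rebalance` are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def slot_for(fid_hex: str, slot_count: int) -> int:
    return int(fid_hex, 16) % slot_count


def plan_rebalance(fids: Iterable[str], old_count: int, new_count: int) -> Dict[str, Tuple[int, int]]:
    """Map each FID that must move to its (old slot, new slot) pair."""
    moves: Dict[str, Tuple[int, int]] = {}
    for fid in fids:
        old_slot = slot_for(fid, old_count)
        new_slot = slot_for(fid, new_count)
        if old_slot != new_slot:
            moves[fid] = (old_slot, new_slot)
    return moves


# Doubling 8 slots to 16: an object in slot i either stays or moves to slot i + 8.
print(plan_rebalance(["05", "0d"], old_count=8, new_count=16))  # {'0d': (5, 13)}
```

This also shows why splitting a power-of-2 slot count is convenient: after doubling, each object either stays in its slot or moves to exactly one new slot, so only part of the data has to be transferred.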
In one possible implementation, obtaining the target logical storage pool includes:
identifying a user identity for uploading target data, and determining user permission based on the user identity;
and determining a logical storage pool corresponding to the user authority as a target logical storage pool.
In this embodiment, authentication based on an authentication controller and a dynamic admission controller may be set up in the storage gateway to implement user identity authentication:
Authentication controller: admits all requests by default; attribute-based permission control may be configured. The hierarchy is divided into users and storage groups, and the permission mapping is configured in the storage gateway by attribute. The user authenticates with a KEY and SECRET, and access to storage groups is controlled through attributes. The target logical storage pool can be determined directly from the user permission, so that users are quickly mapped to storage nodes.
Dynamic admission controller: none is configured by default. A WEBHOOK-based third-party call may be configured so that a custom third-party system performs access control on permissions. If any WEBHOOK returns a failure, the dynamic admission controller immediately denies the request.
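The two-stage check can be sketched in Python. Everything here is an assumption for illustration: credentials and the permission mapping are plain dictionaries, and each webhook is modelled as a local callable rather than an HTTP call to a third-party system. The names `authenticate` and `resolve_pool` are hypothetical.

```python
import hmac
from typing import Callable, Dict, Iterable, Optional

# A webhook is modelled as a callable: (access key, pool) -> allowed?
Webhook = Callable[[str, str], bool]


def authenticate(secrets: Dict[str, str], key: str, secret: str) -> bool:
    """Check the KEY/SECRET pair using a constant-time comparison."""
    expected = secrets.get(key)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), secret.encode())


def resolve_pool(
    key: str,
    secret: str,
    secrets: Dict[str, str],
    pool_by_key: Dict[str, str],
    webhooks: Iterable[Webhook] = (),
) -> Optional[str]:
    """Return the target logical storage pool, or None if the request is denied."""
    if not authenticate(secrets, key, secret):
        return None
    pool = pool_by_key.get(key)
    if pool is None:
        return None
    # all() stops at the first webhook that returns False, so one failure denies at once.
    if not all(hook(key, pool) for hook in webhooks):
        return None
    return pool


secrets = {"AK1": "s1"}
pools = {"AK1": "finance"}
print(resolve_pool("AK1", "s1", secrets, pools))                              # finance
print(resolve_pool("AK1", "s1", secrets, pools, [lambda k, p: False]))        # None
```

With no webhooks configured, `all()` over an empty sequence is true, which matches the default of having no dynamic admission check.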
In one possible implementation, the method further includes:
When a download request is acquired, parsing the storage address in the download request to obtain the logical disaster recovery group and the FID (file ID) corresponding to the data to be downloaded;
searching the data to be downloaded in a logic disaster recovery group corresponding to the data to be downloaded based on the FID corresponding to the data to be downloaded;
if the data to be downloaded is file data, the file data is written into the download buffer area to obtain the data to be downloaded.
In this embodiment, the storage address includes a serial number of a logical disaster recovery group where the data to be downloaded is located and an FID of the data to be downloaded, and the storage address is parsed to find a corresponding logical disaster recovery group, and then a corresponding file is found according to the FID, so as to download the data.
The specific steps can be as follows:
Step 201: To download a file, the user sends the storage gateway the resource address that the gateway returned when the file was uploaded. The user identity is verified and the logical storage pool is determined through authentication and the dynamic admission controller.
Step 202: The storage gateway parses the resource address and forwards it to the dump controller, which converts the request into a lookup of the specified FID in the corresponding logical storage pool and logical disaster recovery group.
Step 203: The dump controller uses the FID to locate the logical disaster recovery group and the storage object on the corresponding storage controller. If the storage object is a small file that was not split, the file and its metadata are returned directly. If the storage object is an index object, the pieces are looked up in order by the FIDs recorded in the index object, combined in that order with the file metadata, and returned to the storage gateway.
Step 204: The storage gateway writes the file into the buffer of the download request according to the response of the dump controller, and the user obtains the required file.
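The lookup and reassembly in steps 203 and 204 can be sketched in Python. The sketch assumes an in-memory store that maps each FID to a (kind, payload) pair, where kind is "data" or "index", and assumes the index object lists chunk FIDs one per line; neither detail is specified in the patent. Recomputing SHA-256 on read is an extra integrity check added for illustration.

```python
import hashlib
from typing import Dict, Tuple

Store = Dict[str, Tuple[str, bytes]]  # fid -> (kind, payload)


def _verified(store: Store, fid: str) -> Tuple[str, bytes]:
    """Fetch an object and check that its content still hashes to its FID."""
    kind, payload = store[fid]
    if hashlib.sha256(payload).hexdigest() != fid:
        raise ValueError(f"content of {fid} does not match its FID")
    return kind, payload


def download(store: Store, fid: str) -> bytes:
    """Return the file bytes for an FID, reassembling chunks if it is an index object."""
    kind, payload = _verified(store, fid)
    if kind == "data":
        return payload
    buffer = bytearray()
    for chunk_fid in payload.decode("ascii").split("\n"):
        _, chunk = _verified(store, chunk_fid)
        buffer.extend(chunk)  # written in the order recorded in the index object
    return bytes(buffer)


chunks = [b"hello ", b"world"]
chunk_fids = [hashlib.sha256(c).hexdigest() for c in chunks]
index = "\n".join(chunk_fids).encode("ascii")
index_fid = hashlib.sha256(index).hexdigest()
store: Store = {f: ("data", c) for f, c in zip(chunk_fids, chunks)}
store[index_fid] = ("index", index)
print(download(store, index_fid))  # b'hello world'
```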
In one possible implementation manner, after searching the data to be downloaded in the logical disaster recovery group corresponding to the data to be downloaded based on the FID corresponding to the data to be downloaded, the method further includes:
if the data to be downloaded is an index object, searching, in the logical disaster recovery group corresponding to the data to be downloaded, for the fragment data corresponding to each FID in the index object;
and writing each piece of fragment data into a download buffer in the order given by the index object to obtain the data to be downloaded.
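The download path described above can be sketched roughly as follows. This is a sketch under stated assumptions, not the patented implementation: the in-memory `Store` dictionary stands in for one logical disaster recovery group, the `"file"` / `"index"` tag is a hypothetical way of telling the two object kinds apart, and the index object is assumed to be encoded as one hex FID per line.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical stand-in for one logical disaster recovery group:
# FID -> (kind, payload), where kind is "file" or "index".
Store = Dict[str, Tuple[str, bytes]]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def put_file(store: Store, data: bytes) -> str:
    # Store an unfragmented object under the hash of its content.
    fid = sha256_hex(data)
    store[fid] = ("file", data)
    return fid


def put_index(store: Store, fragment_fids: List[str]) -> str:
    # Assumed encoding: one hex FID per line, in fragment order.
    payload = "\n".join(fragment_fids).encode("ascii")
    fid = sha256_hex(payload)
    store[fid] = ("index", payload)
    return fid


def download(store: Store, fid: str) -> bytes:
    kind, payload = store[fid]
    if kind == "file":
        return payload
    # Index object: fetch each fragment in order and append it to the buffer.
    buffer = bytearray()
    for fragment_fid in payload.decode("ascii").splitlines():
        _, fragment = store[fragment_fid]
        buffer += fragment
    return bytes(buffer)
```

In this sketch a small file is returned in a single lookup, while an index object costs one extra lookup per fragment, all within the same group.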
In this embodiment, any change to a file changes the FID of its storage object. This holds whether the file is a small file that needs no fragmentation, whose FID is the SHA-256 hash of the file itself, or a large file, whose FID is the SHA-256 hash of the index object built from the FIDs of its fragments. The uniqueness of each file version is thereby ensured.
According to the embodiment of the invention, the hash value of the target data represents the original target data, so that the content of the target data cannot be tampered with, which avoids the prior-art risk that data content is altered when a file with the same name is uploaded again. Meanwhile, only some of the storage nodes are selected for storage, which reduces data redundancy compared with synchronizing data across all nodes as in traditional blockchain technology. Finally, the storage address is determined based on the FID and the storage node, so that the target data can be located conveniently, ensuring the availability and reliability of the target data.
The invention ensures the uniqueness of an ordinary file by computing its SHA-256 hash. A large file is split into fragments, each fragment is hashed with SHA-256, an index object holding the ordered set of hash values is generated, and uniqueness is ensured by applying SHA-256 a second time to the index object. In addition, with the design of logical storage pools and disaster recovery groups, the actual storage location is obtained by calculation when the storage object is dumped, which solves the problem of excessive metadata caused by an excessive volume of storage objects. The uniqueness guarantee for file storage is realized by a single system without relying on a third-party blockchain product, overcoming the defect of traditional distributed storage, which focuses only on availability and reliability and cannot guarantee that data is not tampered with.
The invention hardens traditional distributed storage using the technical principles of blockchain, ensuring the uniqueness of a storage object by computing its hash value. After a large file is split, the hash value of each fragment is computed, the ordered set of hash values is stored in an index object with a simple data structure, and the index object is hashed again. This avoids the additional overhead of complex data structures such as Merkle trees and balanced trees while still meeting the uniqueness requirement for different versions of a file. In addition, by setting up logical storage pools and disaster recovery groups and determining the exact storage location by calculation, configuration flexibility is increased and the problem of continuously growing metadata is mitigated. The invention also does not depend on a third-party blockchain product to make files tamper-resistant; a user can choose to use it on its own or together with a third-party product to build a trustless, tamper-proof security system.
Compared with the prior art, the technical scheme provided by the invention introduces the technical principles of blockchain and takes trustlessness and tamper resistance, which traditional storage schemes do not consider, as its core goals. A complete user authentication and admission mechanism further emphasizes secure storage. Once uploaded, a file cannot be changed and the specified version is retained permanently; the resource address returned by the storage gateway always identifies the version of the file submitted at upload, which rules out arbitrary updating or replacement of the original file and reduces the trust cost for all stakeholders.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way.
The following are device embodiments of the invention, for details not described in detail therein, reference may be made to the corresponding method embodiments described above.
FIG. 2 is a schematic structural diagram of a distributed data storage device based on a blockchain, according to an embodiment of the present invention, and for convenience of explanation, only the portions related to the embodiment of the present invention are shown in detail below:
As shown in fig. 2, the blockchain-based distributed data storage device 2 is applied to a storage gateway, and the storage gateway is connected with a plurality of storage controllers, and each storage controller corresponds to one storage node;
the blockchain-based distributed data storage device 2 includes:
an acquisition module 21 for acquiring target data;
the encryption module 22 is configured to encrypt the target data based on a hash algorithm to obtain a characteristic value, and use the characteristic value as the FID of the target data;
the storage module 23 is configured to determine a target storage node based on a preset disaster recovery isolation policy, store target data to the target storage node, and determine a storage address of the target data based on the FID of the target data and the target storage node.
In one possible implementation, the encryption module 22 is specifically configured to:
if the target data is larger than a preset size, slicing the target data to obtain a plurality of pieces of fragment data;
encrypting each piece of fragment data based on a hash algorithm to obtain its characteristic value, and taking that characteristic value as the FID of the corresponding fragment data;
arranging the FIDs of the fragment data in the order of the fragment data to obtain an index object of the target data;
encrypting the index object based on the hash algorithm to obtain the characteristic value of the index object, and taking the characteristic value of the index object as the FID of the target data.
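The FID computation performed by the encryption module can be sketched as below. The description names SHA-256 as the hash; everything else here is an assumption for illustration, including the tiny `CHUNK_SIZE`, the function name `compute_fid`, and the encoding of the index object as newline-separated hex FIDs.

```python
import hashlib
from typing import List, Tuple

CHUNK_SIZE = 4  # assumed "preset size" in bytes, kept tiny for demonstration


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def compute_fid(data: bytes, chunk_size: int = CHUNK_SIZE) -> Tuple[str, List[str]]:
    """Return (FID of the target data, ordered fragment FIDs or [])."""
    if len(data) <= chunk_size:
        # Small file: the FID is the hash of the content itself.
        return sha256_hex(data), []
    # Large file: slice, hash each fragment, then hash the ordered FID list.
    fragments = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    fragment_fids = [sha256_hex(f) for f in fragments]
    index_object = "\n".join(fragment_fids).encode("ascii")
    return sha256_hex(index_object), fragment_fids
```

Because the top-level FID is a hash over the ordered fragment FIDs, changing a single byte changes the FID of the affected fragment and therefore the FID of the whole file, while the FIDs of untouched fragments stay the same.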
In one possible implementation, the storage gateway is connected with the plurality of storage controllers through a plurality of dump controllers; each dump controller corresponds to a plurality of storage controllers and serves as one logical disaster recovery group; the dump controllers form a plurality of logical storage pools, and each logical storage pool comprises a plurality of logical disaster recovery groups;
the storage module 23 is further configured to:
before determining a target storage node based on a preset disaster recovery isolation policy, acquiring a target logical storage pool;
the determining the target storage node based on the preset disaster recovery isolation policy comprises:
performing a modulo operation on the FID of the target data with the number of logical disaster recovery groups in the target logical storage pool as the modulus, and taking the logical disaster recovery group with sequence number i as the target logical disaster recovery group, where i is the result of the modulo operation;
selecting one or more target storage nodes from the target logical disaster recovery group based on the preset disaster recovery isolation policy, where the disaster recovery isolation policy comprises the number of target nodes and/or the state of the storage nodes.
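A minimal sketch of this placement step is given below. It assumes the FID is a hex string that can be read as an integer, and it models the disaster recovery isolation policy as nothing more than a replica count plus a per-node health flag; the names `StorageNode`, `target_group_index` and `select_nodes` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageNode:
    name: str
    healthy: bool  # assumed representation of the "state of the storage node"


def target_group_index(fid_hex: str, group_count: int) -> int:
    # i = FID mod (number of logical disaster recovery groups in the pool)
    if group_count <= 0:
        raise ValueError("group_count must be positive")
    return int(fid_hex, 16) % group_count


def select_nodes(nodes: List[StorageNode], replica_count: int) -> List[StorageNode]:
    # Assumed policy: take the first `replica_count` healthy nodes of the group.
    healthy = [n for n in nodes if n.healthy]
    if len(healthy) < replica_count:
        raise RuntimeError("not enough healthy storage nodes in the group")
    return healthy[:replica_count]
```

Since the group is derived from the FID by calculation, no per-object location table is needed to find it again, which is consistent with the stated goal of limiting metadata growth.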
In one possible implementation, one or more pieces of data are stored in each storage node;
the storage module 23 is further configured to:
acquiring a logical disaster recovery group update instruction, where the instruction comprises the updated number of logical disaster recovery groups and the serial number of each updated logical disaster recovery group;
for each piece of data, performing a modulo operation on the FID of the data with the updated number of logical disaster recovery groups as the modulus, taking the logical disaster recovery group with sequence number i as the target logical disaster recovery group of the data, where i is the result of the modulo operation, and selecting one or more storage nodes from the target logical disaster recovery group of the data as the target storage nodes of the data based on the preset disaster recovery isolation policy;
and transferring each piece of data to its corresponding target storage nodes.
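The re-placement triggered by a group update can be sketched as follows. This is an illustration under assumptions: FIDs are hex strings read as integers, only the group assignment is compared (node selection inside the group is left out), and `plan_migration` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def group_for(fid_hex: str, group_count: int) -> int:
    # Same modulo rule as at upload time, applied with a given group count.
    return int(fid_hex, 16) % group_count


def plan_migration(
    fids: List[str], old_count: int, new_count: int
) -> Dict[str, Tuple[int, int]]:
    """Map each FID whose group changes to (old_group, new_group)."""
    moves: Dict[str, Tuple[int, int]] = {}
    for fid in fids:
        old_group = group_for(fid, old_count)
        new_group = group_for(fid, new_count)
        if old_group != new_group:
            moves[fid] = (old_group, new_group)
    return moves
```

For example, when the group count grows from 4 to 6, an object with FID `0c` (12) stays in group 0, whereas an object with FID `04` moves from group 0 to group 4.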
In one possible implementation, the storage module 23 is specifically configured to:
identifying the identity of the user uploading the target data, and determining the user permission based on the user identity;
and determining the logical storage pool corresponding to the user permission as the target logical storage pool.
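As a rough illustration, the lookup from user identity to permission to logical storage pool could be as simple as the sketch below. The two tables, their contents, and the function name `target_pool` are invented for this example; the description only states that the pool is chosen according to the user permission.

```python
from typing import Dict

# Hypothetical tables; the source only describes identity -> permission -> pool.
USER_PERMISSIONS: Dict[str, str] = {"alice": "finance", "bob": "engineering"}
PERMISSION_POOLS: Dict[str, str] = {"finance": "pool-A", "engineering": "pool-B"}


def target_pool(user_id: str) -> str:
    # Resolve the user's permission first, then the pool bound to it.
    try:
        permission = USER_PERMISSIONS[user_id]
    except KeyError:
        raise PermissionError(f"unknown user: {user_id!r}") from None
    return PERMISSION_POOLS[permission]
```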
In one possible implementation, the storage module 23 is further configured to:
when a download request is acquired, parsing the storage address in the download request to obtain the logical disaster recovery group and the FID corresponding to the data to be downloaded;
searching for the data to be downloaded in the logical disaster recovery group corresponding to the data to be downloaded based on the FID corresponding to the data to be downloaded;
if the data to be downloaded is file data, writing the file data into a download buffer to obtain the data to be downloaded.
In one possible implementation, the storage module 23 is further configured to:
after searching for the data to be downloaded in the logical disaster recovery group corresponding to the data to be downloaded based on the FID corresponding to the data to be downloaded, if the data to be downloaded is an index object, searching, in the logical disaster recovery group corresponding to the data to be downloaded, for the fragment data corresponding to each FID in the index object;
and writing each piece of fragment data into the download buffer in the order given by the index object to obtain the data to be downloaded.
According to the embodiment of the invention, the hash value of the target data represents the original target data, so that the content of the target data cannot be tampered with, which avoids the prior-art risk that data content is altered when a file with the same name is uploaded again. Meanwhile, only some of the storage nodes are selected for storage, which reduces data redundancy compared with synchronizing data across all nodes as in traditional blockchain technology. Finally, the storage address is determined based on the FID and the storage node, so that the target data can be located conveniently, ensuring the availability and reliability of the target data.
Fig. 3 is a schematic diagram of a terminal according to an embodiment of the present invention. As shown in fig. 3, the terminal 3 of this embodiment includes: a processor 30, a memory 31 and a computer program 32 stored in said memory 31 and executable on said processor 30. The processor 30, when executing the computer program 32, performs the steps described above in various embodiments of the blockchain-based distributed data storage method, such as steps 101 through 103 shown in fig. 1. Alternatively, the processor 30 may perform the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules/units 21 to 23 shown in fig. 2, when executing the computer program 32.
Illustratively, the computer program 32 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 32 in the terminal 3. For example, the computer program 32 may be split into modules/units 21 to 23 shown in fig. 2.
The terminal 3 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The terminal 3 may include, but is not limited to, a processor 30, a memory 31. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the terminal 3 and does not constitute a limitation of the terminal 3, and may include more or less components than illustrated, or may combine certain components, or different components, e.g., the terminal may further include an input-output device, a network access device, a bus, etc.
The processor 30 may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 31 may be an internal storage unit of the terminal 3, such as a hard disk or a memory of the terminal 3. The memory 31 may be an external storage device of the terminal 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the terminal 3. The memory 31 is used for storing the computer program as well as other programs and data required by the terminal. The memory 31 may also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other manners. For example, the apparatus/terminal embodiments described above are merely illustrative; the division of the modules or units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the above method embodiments by instructing relevant hardware through a computer program. The computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of each of the above blockchain-based distributed data storage method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. A blockchain-based distributed data storage method, characterized in that the method is applied to a storage gateway, the storage gateway is connected with a plurality of storage controllers, and each storage controller corresponds to one storage node;
the method comprises the following steps:
acquiring target data;
encrypting the target data based on a hash algorithm to obtain a characteristic value, and taking the characteristic value as the FID of the target data;
determining a target storage node based on a preset disaster recovery isolation strategy, storing the target data to the target storage node, and determining a storage address of the target data based on the FID of the target data and the target storage node.
2. The blockchain-based distributed data storage method of claim 1, wherein encrypting the target data based on a hash algorithm to obtain a characteristic value, and taking the characteristic value as the FID of the target data comprises:
if the target data is larger than a preset size, slicing the target data to obtain a plurality of pieces of fragment data;
encrypting each piece of fragment data based on a hash algorithm to obtain its characteristic value, and taking that characteristic value as the FID of the corresponding fragment data;
arranging the FIDs of the fragment data in the order of the fragment data to obtain an index object of the target data;
encrypting the index object based on a hash algorithm to obtain a characteristic value of the index object, and taking the characteristic value of the index object as the FID of the target data.
3. The blockchain-based distributed data storage method of claim 1, wherein the storage gateway is connected with the plurality of storage controllers through a plurality of dump controllers, each dump controller corresponds to a plurality of storage controllers and serves as one logical disaster recovery group, the dump controllers form a plurality of logical storage pools, and each logical storage pool comprises a plurality of logical disaster recovery groups;
before the target storage node is determined based on the preset disaster recovery isolation policy, the method further comprises:
acquiring a target logical storage pool;
the determining the target storage node based on the preset disaster recovery isolation policy comprises:
performing a modulo operation on the FID of the target data with the number of logical disaster recovery groups in the target logical storage pool as the modulus, and taking the logical disaster recovery group with sequence number i as the target logical disaster recovery group, where i is the result of the modulo operation;
selecting one or more target storage nodes from the target logical disaster recovery group based on the preset disaster recovery isolation policy, where the disaster recovery isolation policy comprises the number of target nodes and/or the state of the storage nodes.
4. The blockchain-based distributed data storage method of claim 3, wherein one or more pieces of data are stored in each storage node;
the method further comprises:
acquiring a logical disaster recovery group update instruction, where the instruction comprises the updated number of logical disaster recovery groups and the serial number of each updated logical disaster recovery group;
for each piece of data, performing a modulo operation on the FID of the data with the updated number of logical disaster recovery groups as the modulus, taking the logical disaster recovery group with sequence number i as the target logical disaster recovery group of the data, where i is the result of the modulo operation, and selecting one or more storage nodes from the target logical disaster recovery group of the data as the target storage nodes of the data based on the preset disaster recovery isolation policy;
and transferring each piece of data to its corresponding target storage nodes.
5. The blockchain-based distributed data storage method of claim 3, wherein the acquiring the target logical storage pool comprises:
identifying the identity of the user uploading the target data, and determining the user permission based on the user identity;
and determining the logical storage pool corresponding to the user permission as the target logical storage pool.
6. The blockchain-based distributed data storage method of claim 3, further comprising:
when a download request is obtained, parsing the storage address in the download request to obtain the logical disaster recovery group and the FID corresponding to the data to be downloaded;
searching for the data to be downloaded in the logical disaster recovery group corresponding to the data to be downloaded based on the FID corresponding to the data to be downloaded;
and if the data to be downloaded is file data, writing the file data into a download buffer to obtain the data to be downloaded.
7. The blockchain-based distributed data storage method of claim 6, further comprising, after searching for the data to be downloaded in the logical disaster recovery group corresponding to the data to be downloaded based on the FID corresponding to the data to be downloaded:
if the data to be downloaded is an index object, searching, in the logical disaster recovery group corresponding to the data to be downloaded, for the fragment data corresponding to each FID in the index object;
and writing each piece of fragment data into the download buffer in the order given by the index object to obtain the data to be downloaded.
8. A distributed data storage device based on a blockchain is characterized by being applied to a storage gateway, wherein the storage gateway is connected with a plurality of storage controllers, and each storage controller corresponds to one storage node;
the device comprises:
the acquisition module is used for acquiring target data;
the encryption module is used for encrypting the target data based on a hash algorithm to obtain a characteristic value, and taking the characteristic value as the FID of the target data;
the storage module is used for determining a target storage node based on a preset disaster recovery isolation strategy, storing the target data to the target storage node, and determining a storage address of the target data based on the FID of the target data and the target storage node.
9. A terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of the preceding claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any of the preceding claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310762035.8A CN116701539A (en) 2023-06-26 2023-06-26 Distributed data storage method, device and terminal based on block chain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310762035.8A CN116701539A (en) 2023-06-26 2023-06-26 Distributed data storage method, device and terminal based on block chain

Publications (1)

Publication Number Publication Date
CN116701539A true CN116701539A (en) 2023-09-05

Family

ID=87825537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310762035.8A Pending CN116701539A (en) 2023-06-26 2023-06-26 Distributed data storage method, device and terminal based on block chain

Country Status (1)

Country Link
CN (1) CN116701539A (en)

Similar Documents

Publication Publication Date Title
US11500729B2 (en) System and method for preserving data using replication and blockchain notarization
US11334562B2 (en) Blockchain based data management system and method thereof
US9792306B1 (en) Data transfer between dissimilar deduplication systems
CN111090645B (en) Cloud storage-based data transmission method and device and computer equipment
EP3652885B1 (en) Secure token passing via blockchains
US8554743B2 (en) Optimization of a computing environment in which data management operations are performed
US11385830B2 (en) Data storage method, apparatus and system, and server, control node and medium
US11068446B2 (en) Multi-cloud bi-directional storage replication system and techniques
US10762051B1 (en) Reducing hash collisions in large scale data deduplication
CN102067148A (en) Methods and systems for determining file classifications
US11526494B2 (en) Blockchain-based computing system and method for managing transaction thereof
US11314885B2 (en) Cryptographic data entry blockchain data structure
CN112163412B (en) Data verification method and device, electronic equipment and storage medium
KR20210050959A (en) Blockchain based file management system and method thereof
CN111177257A (en) Data storage and access method, device and equipment of block chain
US11550913B2 (en) System and method for performing an antivirus scan using file level deduplication
US10142415B2 (en) Data migration
CN111832018A (en) Virus detection method, virus detection device, computer device and storage medium
CN116701539A (en) Distributed data storage method, device and terminal based on block chain
CN114089924B (en) Block chain account book data storage system and method
CN117376364A (en) Data processing method and related equipment
CN114138711A (en) File migration method and device, storage medium and electronic equipment
CN113051622B (en) Index construction method, device, equipment and storage medium
CN115129789A (en) Bucket index storage method, device and medium of distributed object storage system
CN112035471A (en) Transaction processing method and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination