CN114721594A - Distributed storage method, device, equipment and machine readable storage medium - Google Patents

Distributed storage method, device, equipment and machine readable storage medium

Info

Publication number
CN114721594A
Authority
CN
China
Prior art keywords
storage
data
data block
stored
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210329259.5A
Other languages
Chinese (zh)
Inventor
何培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Information Technologies Co Ltd
Original Assignee
New H3C Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Information Technologies Co Ltd
Priority to CN202210329259.5A
Publication of CN114721594A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a distributed storage method, apparatus, device and machine-readable storage medium. The method comprises: in response to a data storage request, dividing a data packet to be stored into a plurality of data blocks, calculating characteristic information of each data block, and storing the characteristic information of each data block across the storage nodes of a storage cluster in a sharded manner; comparing, according to the characteristic information, each data block divided from the data packet to be stored with each data block already stored in the storage cluster; and setting a pointer. In this technical solution, a deduplication storage system is built on distributed storage, and its characteristic information and data blocks are sharded across the storage nodes of the storage cluster. The system thus retains the high storage utilization of deduplicated storage while, at least during data storage, multiple storage nodes process requests concurrently, which enables load balancing and raises the upper limit of storage performance.

Description

Distributed storage method, device, equipment and machine readable storage medium
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a distributed storage method, apparatus, device, and machine-readable storage medium.
Background
Deduplication (data deduplication) is a technique for saving data storage space. A storage system often holds many duplicate copies of the same data, which occupy a large amount of hard disk space. With deduplication, only one copy of the data needs to be stored, which effectively improves storage utilization.
Existing deduplication storage systems are single-machine, single-node, or all-in-one backup appliances, and they suffer from limited capacity expansion, data loss on failure, and performance bottlenecks. Specifically, the maximum capacity of a single storage server is determined by the number of disks in the server and the capacity of each disk. Capacity can be expanded only by adding hard disks, and such vertical scaling has a hard upper limit, requires downtime, and is complex to operate. The deduplication pool consists of a fingerprint database, data blocks, and fingerprint indexes; once the fingerprints and data blocks on a single storage server are damaged, the data cannot be recovered. During storage, reads, writes, and network bandwidth are the three major factors that affect the storage window. Under concurrent storage operations, a single storage server becomes the performance bottleneck of the storage window and offers no load balancing.
Disclosure of Invention
In view of the above, the present disclosure provides a distributed storage method, a distributed storage apparatus, an electronic device, and a machine-readable storage medium, so as to address at least one of the above technical problems.
The specific technical solution is as follows:
The present disclosure provides a distributed storage method applied to a storage cluster. The method comprises: in response to a data storage request, dividing a data packet to be stored into a plurality of data blocks, calculating characteristic information of each data block, and storing the characteristic information of each data block across the storage nodes of the storage cluster in a sharded manner; comparing, according to the characteristic information, each data block divided from the data packet to be stored with each data block already stored in the storage cluster; when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associating a pointer to that stored data block with the characteristic information of the data block to be stored; and when the comparison shows that a data block to be stored differs from every data block already stored in the storage cluster, storing the data block in the storage cluster and associating a pointer to the newly stored data block with the characteristic information of the data block to be stored.
In one technical solution, dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information across the storage nodes of the storage cluster in a sharded manner further includes: recording, in a characteristic index, the characteristic information of each data block and the association between that characteristic information and the data packet.
In one technical solution, the method further comprises: in response to a data read request, obtaining from the characteristic index the characteristic information of each data block associated with the data packet to be read; querying the storage nodes of the storage cluster for matching characteristic information according to the characteristic information so obtained; locating and returning each data block associated with the data packet to be read by following the pointer of the matching characteristic information; and reassembling the data packet to be read from the returned data blocks.
In one technical solution, backup data is generated, where the backup data backs up the characteristic information stored in a distributed manner on the storage nodes of the storage cluster together with the correspondingly set pointers. If the storage nodes of the storage cluster change, the characteristic information and the correspondingly set pointers are stored in a distributed manner across the changed storage nodes according to the previously generated backup data. A change of the storage nodes of the storage cluster includes adding, removing, or replacing a storage node.
The present disclosure also provides a distributed storage apparatus applied to a storage cluster. The apparatus includes: a characteristic module configured to, in response to a data storage request, divide a data packet to be stored into a plurality of data blocks, calculate characteristic information of each data block, and store the characteristic information of each data block across the storage nodes of the storage cluster in a sharded manner; a comparison module configured to compare, according to the characteristic information, each data block divided from the data packet to be stored with each data block already stored in the storage cluster; and a storage module configured to, when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associate a pointer to that stored data block with the characteristic information of the data block to be stored. The storage module is further configured to, when the comparison shows that a data block to be stored differs from every data block already stored in the storage cluster, store the data block in the storage cluster and associate a pointer to the newly stored data block with the characteristic information of the data block to be stored.
In one technical solution, dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information across the storage nodes of the storage cluster in a sharded manner further includes: recording, in a characteristic index, the characteristic information of each data block and the association between that characteristic information and the data packet.
In one technical solution, the apparatus further includes: a reading module configured to, in response to a data read request, obtain from the characteristic index the characteristic information of each data block associated with the data packet to be read; a query module configured to query the storage nodes of the storage cluster for matching characteristic information according to the characteristic information so obtained; a transmission module configured to locate and return each data block associated with the data packet to be read by following the pointer of the matching characteristic information; and a data module configured to reassemble the data packet to be read from the returned data blocks.
In one technical solution, the apparatus further includes: a backup module configured to generate backup data, where the backup data backs up the characteristic information stored in a distributed manner on the storage nodes of the storage cluster together with the correspondingly set pointers; and a recovery module configured to, if the storage nodes of the storage cluster change, store the characteristic information and the correspondingly set pointers in a distributed manner across the changed storage nodes according to the previously generated backup data. A change of the storage nodes of the storage cluster includes adding, removing, or replacing a storage node.
The present disclosure also provides an electronic device including a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor executing the machine-executable instructions to implement the foregoing distributed storage method.
The present disclosure also provides a machine-readable storage medium having stored thereon machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the aforementioned distributed storage method.
The technical solution provided by the present disclosure brings at least the following beneficial effects:
A deduplication storage system is built on distributed storage, and its characteristic information and data blocks are sharded across the storage nodes of the storage cluster. The system thus retains the high storage utilization of deduplicated storage while, at least during data storage, multiple storage nodes process requests concurrently, which enables load balancing and raises the upper limit of storage performance.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure, and those skilled in the art can derive other drawings from them.
FIG. 1 is a flow chart of a distributed storage method in one embodiment of the present disclosure;
FIG. 2 is a block diagram of a distributed storage apparatus in one embodiment of the present disclosure;
FIG. 3 is a hardware configuration diagram of an electronic device in one embodiment of the present disclosure.
Detailed Description
The terminology used in the embodiments of the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information in the embodiments of the present disclosure, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. Moreover, depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to a determination".
The present disclosure provides a distributed storage method, apparatus, electronic device, and machine-readable storage medium to address at least one of the above technical problems.
The specific technical scheme is as follows.
In one embodiment, the present disclosure provides a distributed storage method applied to a storage cluster. The method includes: in response to a data storage request, dividing a data packet to be stored into a plurality of data blocks, calculating characteristic information of each data block, and storing the characteristic information of each data block across the storage nodes of the storage cluster in a sharded manner; comparing, according to the characteristic information, each data block divided from the data packet to be stored with each data block already stored in the storage cluster; when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associating a pointer to that stored data block with the characteristic information of the data block to be stored; and when the comparison shows that a data block to be stored differs from every data block already stored in the storage cluster, storing the data block in the storage cluster and associating a pointer to the newly stored data block with the characteristic information of the data block to be stored.
Specifically, as shown in FIG. 1, the method comprises the following steps:
Step S11: in response to a data storage request, divide the data packet to be stored into a plurality of data blocks, calculate characteristic information of each data block, and store the characteristic information of each data block across the storage nodes of the storage cluster in a sharded manner.
Step S12: compare, according to the characteristic information, each data block divided from the data packet to be stored with each data block already stored in the storage cluster.
Step S131: when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associate a pointer to that stored data block with the characteristic information of the data block to be stored.
Step S132: when the comparison shows that a data block to be stored differs from every data block already stored in the storage cluster, store the data block in the storage cluster and associate a pointer to the newly stored data block with the characteristic information of the data block to be stored.
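Steps S11 to S132 can be illustrated with a minimal in-memory sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: the `Cluster` class, the fixed `BLOCK_SIZE`, SHA-256 as the fingerprint algorithm, the "fingerprint modulo node count" sharding rule, and a pointer modeled as a `(node, key)` tuple.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # hypothetical fixed block size, kept tiny for illustration


class Cluster:
    """Toy in-memory stand-in for a storage cluster (not the patented system)."""

    def __init__(self, node_count: int) -> None:
        # Per node: a shard of fingerprint -> pointer, and a local block store.
        self.fp_shards: List[Dict[str, Tuple[int, str]]] = [dict() for _ in range(node_count)]
        self.block_stores: List[Dict[str, bytes]] = [dict() for _ in range(node_count)]

    def node_for(self, fp: str) -> int:
        # Assumed sharding rule: fingerprint value modulo the node count.
        return int(fp, 16) % len(self.fp_shards)

    def store_packet(self, packet: bytes) -> List[str]:
        fingerprints = []
        for offset in range(0, len(packet), BLOCK_SIZE):
            block = packet[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()      # S11: characteristic information
            node = self.node_for(fp)
            if fp not in self.fp_shards[node]:          # S12: compare by fingerprint
                self.block_stores[node][fp] = block     # S132: store the new block
                self.fp_shards[node][fp] = (node, fp)   # S132: pointer to the new block
            # S131: on a match, the existing pointer already refers to the stored block
            fingerprints.append(fp)
        return fingerprints


cluster = Cluster(node_count=3)
cluster.store_packet(b"AAAABBBBAAAA")
stored_blocks = sum(len(store) for store in cluster.block_stores)
print(stored_blocks)  # 2: the repeated "AAAA" block is stored once
```

Because the node is derived from the fingerprint itself, each comparison in this sketch touches only one node; a real cluster could choose a different placement and lookup strategy.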
A deduplication storage system is built on distributed storage, and its characteristic information and data blocks are sharded across the storage nodes of the storage cluster. The system thus retains the high storage utilization of deduplicated storage while, at least during data storage, multiple storage nodes process requests concurrently, which enables load balancing and raises the upper limit of storage performance.
In one embodiment, dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information across the storage nodes of the storage cluster in a sharded manner further includes: recording, in a characteristic index, the characteristic information of each data block and the association between that characteristic information and the data packet.
In one embodiment, in response to a data read request, the characteristic information of each data block associated with the data packet to be read is obtained from the characteristic index; the storage nodes of the storage cluster are queried for matching characteristic information according to the characteristic information so obtained; each data block associated with the data packet to be read is located and returned by following the pointer of the matching characteristic information; and the data packet to be read is reassembled from the returned data blocks.
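The read path can be sketched in the same toy style. The names `write`, `read`, `fp_shards`, `block_stores`, and `feature_index` are hypothetical, as are the SHA-256 fingerprint and the modulo sharding rule. The sketch also routes each lookup directly to the node that owns the fingerprint, which is one possible reading of "querying the storage nodes"; the disclosure does not specify the lookup strategy.

```python
import hashlib
from typing import Dict, List, Tuple

NODES = 3   # hypothetical cluster size
BLOCK = 4   # hypothetical block size in bytes

# Per-node fingerprint shard (fingerprint -> pointer) and per-node block store.
fp_shards: List[Dict[str, Tuple[int, str]]] = [dict() for _ in range(NODES)]
block_stores: List[Dict[str, bytes]] = [dict() for _ in range(NODES)]
# Characteristic index: packet id -> ordered fingerprints of its blocks.
feature_index: Dict[str, List[str]] = {}


def node_for(fp: str) -> int:
    # Assumed sharding rule: fingerprint value modulo the node count.
    return int(fp, 16) % NODES


def write(packet_id: str, packet: bytes) -> None:
    fps = []
    for i in range(0, len(packet), BLOCK):
        block = packet[i:i + BLOCK]
        fp = hashlib.sha256(block).hexdigest()
        n = node_for(fp)
        if fp not in fp_shards[n]:          # store each unique block only once
            block_stores[n][fp] = block
            fp_shards[n][fp] = (n, fp)
        fps.append(fp)
    feature_index[packet_id] = fps


def read(packet_id: str) -> bytes:
    parts = []
    for fp in feature_index[packet_id]:          # fingerprints from the index
        node, key = fp_shards[node_for(fp)][fp]  # matching fingerprint and its pointer
        parts.append(block_stores[node][key])    # follow the pointer to the block
    return b"".join(parts)                       # reassemble in recorded order


write("p1", b"AAAABBBBAAAACC")
print(read("p1"))  # b'AAAABBBBAAAACC'
```

Reassembly works only because the index keeps the fingerprints in block order; the deduplicated block store itself has no notion of where a block sits inside a packet.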
In one embodiment, backup data is generated, where the backup data backs up the characteristic information stored in a distributed manner on the storage nodes of the storage cluster together with the correspondingly set pointers. If the storage nodes of the storage cluster change, the characteristic information and the correspondingly set pointers are stored in a distributed manner across the changed storage nodes according to the previously generated backup data. A change of the storage nodes of the storage cluster includes adding, removing, or replacing a storage node.
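A minimal sketch of this backup-then-redistribute idea follows. It is an illustration under stated assumptions, not the disclosed implementation: `backup`, `redistribute`, and `node_for` are hypothetical helpers, the pointer is an opaque `(location, key)` tuple, the sharding rule is "fingerprint modulo node count", and the short hex strings stand in for real fingerprints.

```python
from typing import Dict, List, Tuple

Pointer = Tuple[str, str]   # hypothetical pointer: (block location, block key)
Shard = Dict[str, Pointer]  # one node's share of fingerprint -> pointer


def node_for(fp: str, node_count: int) -> int:
    # Assumed sharding rule: fingerprint value modulo the node count.
    return int(fp, 16) % node_count


def backup(shards: List[Shard]) -> Shard:
    """Merge every node's fingerprints and pointers into one backup copy."""
    merged: Shard = {}
    for shard in shards:
        merged.update(shard)
    return merged


def redistribute(backup_data: Shard, new_node_count: int) -> List[Shard]:
    """Re-shard the backed-up fingerprints over the changed set of nodes."""
    shards: List[Shard] = [dict() for _ in range(new_node_count)]
    for fp, pointer in backup_data.items():
        shards[node_for(fp, new_node_count)][fp] = pointer
    return shards


# Two-node cluster holding four toy fingerprints.
old: List[Shard] = [dict() for _ in range(2)]
for fp in ("0a", "0b", "0c", "0d"):
    old[node_for(fp, 2)][fp] = ("disk-x", fp)

snapshot = backup(old)
new = redistribute(snapshot, 3)   # a third node was added
print(sum(len(s) for s in new))   # 4: no fingerprint is lost
```

Only the fingerprints and pointers are redistributed in this sketch; whether the data blocks themselves also move when nodes change is left open here.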
In one embodiment, the characteristic information of a data block is the fingerprint of the data block, where the fingerprint is a value calculated from the data block by a preset algorithm, such as a hash algorithm. The fingerprint hash value serves as the unique identifier of the data block: when the fingerprints of two data blocks are the same, the two data blocks are considered to be the same data block. Other parameters that are unique to a data block may also be used as its characteristic information.
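As a small illustration of fingerprinting, the sketch below uses SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 and the `fingerprint` helper name are assumptions; the disclosure only requires some preset algorithm such as a hash algorithm.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Return the hex SHA-256 digest of a data block (assumed fingerprint algorithm)."""
    return hashlib.sha256(block).hexdigest()


a = fingerprint(b"hello world")
b = fingerprint(b"hello world")
c = fingerprint(b"hello worle")

print(a == b)  # True: identical blocks yield identical fingerprints
print(a == c)  # False: a one-byte change yields a different fingerprint
```

Treating equal fingerprints as equal blocks relies on hash collisions being negligibly rare for the chosen algorithm, which is the usual trade-off in fingerprint-based deduplication.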
The characteristic index stores the association between a data packet and the fingerprints of the data blocks into which the packet was divided. The fingerprints, and their association with the corresponding data packet, are recorded in the characteristic index in order, so that the block layout of a data packet and the corresponding fingerprints can be obtained from the characteristic index.
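One way to picture the characteristic index is as a mapping from a packet identifier to the ordered list of its block fingerprints. The sketch below assumes fixed-size blocks, SHA-256 fingerprints, and the hypothetical names `split_blocks`, `build_index_entry`, and `feature_index`; none of these are specified by the disclosure.

```python
import hashlib
from typing import Dict, List


def split_blocks(packet: bytes, size: int) -> List[bytes]:
    """Divide a packet into fixed-size blocks (the last block may be shorter)."""
    return [packet[i:i + size] for i in range(0, len(packet), size)]


def build_index_entry(packet: bytes, size: int) -> List[str]:
    """Fingerprints of the packet's blocks, in block order."""
    return [hashlib.sha256(block).hexdigest() for block in split_blocks(packet, size)]


feature_index: Dict[str, List[str]] = {}
feature_index["packet-1"] = build_index_entry(b"AAAABBBBAAAA", 4)

entry = feature_index["packet-1"]
print(len(entry))            # 3: one fingerprint per block, duplicates included
print(entry[0] == entry[2])  # True: the first and third blocks are identical
```

The index keeps one fingerprint per block position even when blocks repeat, which is what lets a packet be rebuilt from a deduplicated block store.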
In one embodiment, the method may be implemented with backup software installed on the servers of the storage cluster. A management server component and a storage server component of the backup software are deployed on a plurality of storage servers to form a distributed deduplication cluster, and the cluster does not distinguish between master nodes and slave nodes.
In one embodiment, the present disclosure also provides a distributed storage apparatus applied to a storage cluster, as shown in FIG. 2. The apparatus includes: a characteristic module 21 configured to, in response to a data storage request, divide a data packet to be stored into a plurality of data blocks, calculate characteristic information of each data block, and store the characteristic information of each data block across the storage nodes of the storage cluster in a sharded manner; a comparison module 22 configured to compare, according to the characteristic information, each data block divided from the data packet to be stored with each data block already stored in the storage cluster; and a storage module 23 configured to, when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associate a pointer to that stored data block with the characteristic information of the data block to be stored. The storage module is further configured to, when the comparison shows that a data block to be stored differs from every data block already stored in the storage cluster, store the data block in the storage cluster and associate a pointer to the newly stored data block with the characteristic information of the data block to be stored.
In one embodiment, dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information across the storage nodes of the storage cluster in a sharded manner further includes: recording, in a characteristic index, the characteristic information of each data block and the association between that characteristic information and the data packet.
In one embodiment, the apparatus further includes: a reading module configured to, in response to a data read request, obtain from the characteristic index the characteristic information of each data block associated with the data packet to be read; a query module configured to query the storage nodes of the storage cluster for matching characteristic information according to the characteristic information so obtained; a transmission module configured to locate and return each data block associated with the data packet to be read by following the pointer of the matching characteristic information; and a data module configured to reassemble the data packet to be read from the returned data blocks.
In one embodiment, the apparatus further includes: a backup module configured to generate backup data, where the backup data backs up the characteristic information stored in a distributed manner on the storage nodes of the storage cluster together with the correspondingly set pointers; and a recovery module configured to, if the storage nodes of the storage cluster change, store the characteristic information and the correspondingly set pointers in a distributed manner across the changed storage nodes according to the previously generated backup data. A change of the storage nodes of the storage cluster includes adding, removing, or replacing a storage node.
The present disclosure also provides an electronic device including a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor executing the machine-executable instructions to implement the foregoing distributed storage method.
The apparatus embodiments are the same as or similar to the corresponding method embodiments and are not described again here.
In one embodiment, the present disclosure provides an electronic device that includes a processor and a machine-readable storage medium. The machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the foregoing distributed storage method. At the hardware level, a schematic diagram of the hardware architecture is shown in FIG. 3.
In one embodiment, the present disclosure provides a machine-readable storage medium having stored thereon machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the aforementioned distributed storage method.
Here, a machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be: RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk or a DVD), a similar storage medium, or a combination thereof.
The systems, devices, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when the present disclosure is implemented, the functions of the various units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an embodiment of the present disclosure, and is not intended to limit the present disclosure. Various modifications and variations of this disclosure will occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the scope of the claims of the present disclosure.

Claims (10)

1. A distributed storage method applied to a storage cluster, the method comprising:
in response to a data storage request, dividing a data packet to be stored into a plurality of data blocks, calculating characteristic information of each data block, and storing the characteristic information of each data block to each storage node of the storage cluster in a fragmented manner;
comparing, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster;
when the comparison result is that a data block to be stored associated with the characteristic information is the same as a data block already stored in the storage cluster, setting a pointer that points to the data block already stored in the storage cluster and associating the pointer with the characteristic information of the data block to be stored;
and when the comparison result is that a data block to be stored associated with the characteristic information differs from every data block already stored in the storage cluster, storing the data block to be stored to the storage cluster, setting a pointer that points to the data block as stored in the storage cluster, and associating the pointer with the characteristic information of the data block to be stored.
2. The method according to claim 1, wherein dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information of each data block to each storage node of the storage cluster in a fragmented manner comprises:
recording, in a characteristic index, the characteristic information of each data block and the association between the characteristic information and the data packet.
3. The method of claim 2, further comprising:
in response to a data read request, acquiring, from the characteristic index, the characteristic information of each data block associated with a data packet to be read;
querying each storage node of the storage cluster for matching characteristic information according to the characteristic information of each data block associated with the data packet to be read;
locating and returning each data block associated with the data packet to be read according to the pointer associated with the matching characteristic information;
and reconstructing the data packet to be read from the returned data blocks.
4. The method of claim 1, further comprising:
generating backup data, wherein the backup data backs up the characteristic information and the correspondingly set pointers that are stored in a distributed manner on the storage nodes of the storage cluster;
if the storage nodes of the storage cluster change, storing the characteristic information and the correspondingly set pointers from the previously backed-up data in a distributed manner across the changed storage nodes of the storage cluster;
wherein the change of the storage nodes of the storage cluster comprises adding storage nodes, removing storage nodes, or replacing storage nodes.
5. A distributed storage apparatus applied to a storage cluster, the apparatus comprising:
a characteristic module configured to, in response to a data storage request, divide a data packet to be stored into a plurality of data blocks, calculate characteristic information of each data block, and store the characteristic information of each data block to each storage node of the storage cluster in a fragmented manner;
a comparison module configured to compare, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster;
a storage module configured to, when the comparison result is that a data block to be stored associated with the characteristic information is the same as a data block already stored in the storage cluster, set a pointer that points to the data block already stored in the storage cluster and associate the pointer with the characteristic information of the data block to be stored;
wherein the storage module is further configured to, when the comparison result is that a data block to be stored associated with the characteristic information differs from every data block already stored in the storage cluster, store the data block to be stored to the storage cluster, set a pointer that points to the data block as stored in the storage cluster, and associate the pointer with the characteristic information of the data block to be stored.
6. The apparatus according to claim 5, wherein dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information of each data block to each storage node of the storage cluster in a fragmented manner comprises:
recording, in a characteristic index, the characteristic information of each data block and the association between the characteristic information and the data packet.
7. The apparatus of claim 6, further comprising:
a reading module configured to, in response to a data read request, acquire, from the characteristic index, the characteristic information of each data block associated with a data packet to be read;
a query module configured to query each storage node of the storage cluster for matching characteristic information according to the characteristic information of each data block associated with the data packet to be read;
a transmission module configured to locate and return each data block associated with the data packet to be read according to the pointer associated with the matching characteristic information;
and a data module configured to reconstruct the data packet to be read from the returned data blocks.
8. The apparatus of claim 5, further comprising:
a backup module configured to generate backup data, wherein the backup data backs up the characteristic information and the correspondingly set pointers that are stored in a distributed manner on the storage nodes of the storage cluster;
a recovery module configured to, if the storage nodes of the storage cluster change, store the characteristic information and the correspondingly set pointers from the previously backed-up data in a distributed manner across the changed storage nodes of the storage cluster;
wherein the change of the storage nodes of the storage cluster comprises adding storage nodes, removing storage nodes, or replacing storage nodes.
9. An electronic device, comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor to perform the method of any one of claims 1 to 4.
10. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of any of claims 1-4.
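As a rough illustration of the flow recited in claims 1 to 4, the following Python sketch splits a data packet into blocks, computes characteristic information for each block, shards the characteristic-to-pointer mapping across nodes, and stores only blocks that are not already present. It is a minimal single-process sketch, not the claimed implementation. The class and method names (`DedupCluster`, `store`, `read`, `backup`, `reshard`), the use of SHA-256 as the characteristic information, the fixed block size, modulo-based shard placement, and the single in-memory block list are all assumptions made for illustration; the claims do not specify any of them. The sketch also treats equal fingerprints as equal blocks, which ignores hash collisions.

```python
import hashlib


class DedupCluster:
    """Hypothetical in-memory model of a deduplicating storage cluster."""

    def __init__(self, node_count, block_size=4):
        self.block_size = block_size  # assumed fixed-size chunking
        # Each "node" holds one shard of the mapping: fingerprint -> pointer.
        self.nodes = [dict() for _ in range(node_count)]
        # Block store; a pointer is simply an index into this list (assumption).
        self.blocks = []
        # Characteristic index: packet name -> ordered list of fingerprints.
        self.feature_index = {}

    def _shard(self, fingerprint):
        # Assumed placement rule: fingerprint value modulo the node count.
        return self.nodes[int(fingerprint, 16) % len(self.nodes)]

    def store(self, name, payload):
        fingerprints = []
        for offset in range(0, len(payload), self.block_size):
            block = payload[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            shard = self._shard(fingerprint)
            if fingerprint not in shard:
                # New block: store it and point the fingerprint at it.
                self.blocks.append(block)
                shard[fingerprint] = len(self.blocks) - 1
            # Duplicate block: the existing pointer is reused, nothing is written.
            fingerprints.append(fingerprint)
        self.feature_index[name] = fingerprints

    def read(self, name):
        # Look up each fingerprint on its shard, follow the pointer, and
        # concatenate the blocks in their original order.
        return b"".join(
            self.blocks[self._shard(fp)[fp]] for fp in self.feature_index[name]
        )

    def backup(self):
        # Merge every shard into one fingerprint -> pointer snapshot.
        snapshot = {}
        for node in self.nodes:
            snapshot.update(node)
        return snapshot

    def reshard(self, snapshot, node_count):
        # After nodes are added, removed, or replaced, redistribute the
        # backed-up fingerprints and pointers over the new node set.
        self.nodes = [dict() for _ in range(node_count)]
        for fingerprint, pointer in snapshot.items():
            self._shard(fingerprint)[fingerprint] = pointer


cluster = DedupCluster(node_count=3)
cluster.store("packet-a", b"abcdabcdxyz")   # blocks: abcd, abcd, xyz
assert len(cluster.blocks) == 2             # the repeated "abcd" is stored once
assert cluster.read("packet-a") == b"abcdabcdxyz"

snapshot = cluster.backup()
cluster.reshard(snapshot, node_count=5)     # simulate a node-set change
assert cluster.read("packet-a") == b"abcdabcdxyz"
```

In this sketch the backup captures only the fingerprints and pointers, so a change in the node set redistributes that mapping without moving the stored blocks.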
CN202210329259.5A 2022-03-31 2022-03-31 Distributed storage method, device, equipment and machine readable storage medium Pending CN114721594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210329259.5A CN114721594A (en) 2022-03-31 2022-03-31 Distributed storage method, device, equipment and machine readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210329259.5A CN114721594A (en) 2022-03-31 2022-03-31 Distributed storage method, device, equipment and machine readable storage medium

Publications (1)

Publication Number Publication Date
CN114721594A true CN114721594A (en) 2022-07-08

Family

ID=82239195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210329259.5A Pending CN114721594A (en) 2022-03-31 2022-03-31 Distributed storage method, device, equipment and machine readable storage medium

Country Status (1)

Country Link
CN (1) CN114721594A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116991329A (en) * 2023-09-25 2023-11-03 深圳市明泰智能技术有限公司 Data redundancy prevention method and system for self-service terminal equipment
CN117688106A (en) * 2024-02-04 2024-03-12 广东东华发思特软件有限公司 Efficient distributed data storage and retrieval system, method and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116991329A (en) * 2023-09-25 2023-11-03 深圳市明泰智能技术有限公司 Data redundancy prevention method and system for self-service terminal equipment
CN116991329B (en) * 2023-09-25 2023-12-08 深圳市明泰智能技术有限公司 Data redundancy prevention method and system for self-service terminal equipment
CN117688106A (en) * 2024-02-04 2024-03-12 广东东华发思特软件有限公司 Efficient distributed data storage and retrieval system, method and storage medium

Similar Documents

Publication Publication Date Title
CN110471795B (en) Block chain state data recovery method and device and electronic equipment
CN107807794B (en) Data storage method and device
US8782011B2 (en) System and method for scalable reference management in a deduplication based storage system
CN106874348B (en) File storage and index method and device and file reading method
CN111444196B (en) Method, device and equipment for generating Hash of global state in block chain type account book
CN114721594A (en) Distributed storage method, device, equipment and machine readable storage medium
CN109032803B (en) Data processing method and device and client
CN111444192B (en) Method, device and equipment for generating Hash of global state in block chain type account book
CN111522502B (en) Data deduplication method and device, electronic equipment and computer-readable storage medium
CN109145053B (en) Data processing method and device, client and server
CN113535670B (en) Virtual resource mirror image storage system and implementation method thereof
CN114936188A (en) Data processing method and device, electronic equipment and storage medium
CN108399175B (en) Data storage and query method and device
CN107145306B (en) Distributed data storage method and system
CN112800057B (en) Fingerprint table management method and device
CN115756955A (en) Data backup and data recovery method and device and computer equipment
CN114268501B (en) Data processing method, firewall generating method, computing device and storage medium
CN114785662B (en) Storage management method, device, equipment and machine-readable storage medium
CN114647658A (en) Data retrieval method, device, equipment and machine-readable storage medium
CN115421856A (en) Data recovery method and device
CN109791541B (en) Log serial number generation method and device and readable storage medium
CN113419792A (en) Event processing method and device, terminal equipment and storage medium
CN109032804B (en) Data processing method and device and server
CN112565373B (en) Method and device for removing duplicate of mirror image file
CN117539690B (en) Method, device, equipment, medium and product for merging and recovering multi-disk data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination