CN114721594A - Distributed storage method, device, equipment and machine readable storage medium
- Publication number
- CN114721594A (application CN202210329259.5A)
- Authority
- CN
- China
- Prior art keywords
- storage
- data
- data block
- stored
- characteristic information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
- G06F3/0641—De-duplication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0652—Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure provides a distributed storage method, apparatus, device, and machine-readable storage medium. The method comprises: in response to a data storage request, dividing a data packet to be stored into a plurality of data blocks, calculating characteristic information of each data block, and storing the characteristic information of the data blocks across the storage nodes of a storage cluster in a sharded manner; comparing, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster; and setting a pointer according to the comparison result. In this technical solution, the deduplication storage system is built on distributed storage, and its characteristic information and data blocks are sharded across the storage nodes of the storage cluster. The system therefore retains the high storage utilization of deduplicated storage while, at least during data storage, multiple storage nodes process requests concurrently, so that load balancing can be achieved and the upper limit of storage performance is raised.
Description
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a distributed storage method, apparatus, device, and machine-readable storage medium.
Background
Data deduplication is a technique for saving data storage space. A storage system often holds many copies of the same data, and these copies occupy a large amount of hard disk space. With deduplication, only one copy of each piece of duplicate data is stored, which effectively improves storage utilization.
Existing deduplication storage systems are single-machine, single-node, or all-in-one backup appliance systems. They suffer from limited capacity expansion, data loss on failure, and performance bottlenecks. Specifically: the maximum capacity of a single storage server is determined by the number of disks in the server and the capacity of each disk, so capacity can be expanded only by adding hard disks, and such vertical scaling is constrained by a hard upper limit, required downtime, and complex operations; the deduplication pool consists of a fingerprint database, data blocks, and fingerprint indexes, and once fingerprints and data blocks on a single storage server are damaged, the data cannot be recovered; and during storage, reads, writes, and network bandwidth are the three major factors affecting the storage window, so under concurrent storage operations a single storage server becomes the performance bottleneck of the storage window and lacks load balancing.
Disclosure of Invention
In view of the above, the present disclosure provides a distributed storage method, a distributed storage apparatus, an electronic device, and a machine-readable storage medium to address at least one of the above technical problems.
The specific technical solution is as follows:
The present disclosure provides a distributed storage method applied to a storage cluster. The method comprises: in response to a data storage request, dividing a data packet to be stored into a plurality of data blocks, calculating characteristic information of each data block, and storing the characteristic information of the data blocks across the storage nodes of the storage cluster in a sharded manner; comparing, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster; when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associating a pointer to the already stored data block with the characteristic information of the data block to be stored; and when the comparison shows that the data block to be stored differs from every data block already stored in the storage cluster, storing the data block to be stored in the storage cluster and associating a pointer to the newly stored data block with the characteristic information of that data block.
In one technical solution, dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information across the storage nodes of the storage cluster in a sharded manner includes: recording, in a characteristic index, the characteristic information of each data block and the association between that characteristic information and the data packet.
In one technical solution, the method further comprises: in response to a data read request, obtaining from the characteristic index the characteristic information of each data block associated with the data packet to be read; querying the storage nodes of the storage cluster for matching characteristic information according to the characteristic information so obtained; locating and returning, according to the pointers of the matching characteristic information, each data block associated with the data packet to be read; and reconstructing the data packet to be read from the returned data blocks.
In one technical solution, backup data is generated, the backup data being a backup of the characteristic information and the corresponding pointers stored in a distributed manner on the storage nodes of the storage cluster; if the storage nodes of the storage cluster change, the characteristic information and the corresponding pointers are stored in a distributed manner across the changed storage nodes according to the previously generated backup data. A change of the storage nodes of the storage cluster includes adding, removing, or replacing storage nodes.
The present disclosure also provides a distributed storage apparatus applied to a storage cluster. The apparatus includes: a characteristic module configured to, in response to a data storage request, divide a data packet to be stored into a plurality of data blocks, calculate characteristic information of each data block, and store the characteristic information of the data blocks across the storage nodes of the storage cluster in a sharded manner; a comparison module configured to compare, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster; and a storage module configured to, when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associate a pointer to the already stored data block with the characteristic information of the data block to be stored. The storage module is further configured to, when the comparison shows that the data block to be stored differs from every data block already stored in the storage cluster, store the data block to be stored in the storage cluster and associate a pointer to the newly stored data block with the characteristic information of that data block.
In one technical solution, dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information across the storage nodes of the storage cluster in a sharded manner includes: recording, in a characteristic index, the characteristic information of each data block and the association between that characteristic information and the data packet.
In one technical solution, the apparatus further includes: a reading module configured to, in response to a data read request, obtain from the characteristic index the characteristic information of each data block associated with the data packet to be read; a query module configured to query the storage nodes of the storage cluster for matching characteristic information according to the characteristic information so obtained; a transmission module configured to locate and return, according to the pointers of the matching characteristic information, each data block associated with the data packet to be read; and a data module configured to reconstruct the data packet to be read from the returned data blocks.
In one technical solution, the apparatus further includes: a backup module configured to generate backup data, the backup data being a backup of the characteristic information and the corresponding pointers stored in a distributed manner on the storage nodes of the storage cluster; and a recovery module configured to, if the storage nodes of the storage cluster change, store the characteristic information and the corresponding pointers in a distributed manner across the changed storage nodes according to the previously generated backup data. A change of the storage nodes of the storage cluster includes adding, removing, or replacing storage nodes.
The present disclosure also provides an electronic device including a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor executing the machine-executable instructions to implement the foregoing distributed storage method.
The present disclosure also provides a machine-readable storage medium having stored thereon machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the aforementioned distributed storage method.
The technical solution provided by the present disclosure brings at least the following beneficial effects:
The deduplication storage system is built on distributed storage, and its characteristic information and data blocks are sharded across the storage nodes of the storage cluster. The system therefore retains the high storage utilization of deduplicated storage while, at least during data storage, multiple storage nodes process requests concurrently, so that load balancing can be achieved and the upper limit of storage performance is raised.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present disclosure, and those skilled in the art can derive other drawings from them.
FIG. 1 is a flow chart of a distributed storage method in one embodiment of the present disclosure;
FIG. 2 is a block diagram of a distributed storage apparatus in one embodiment of the present disclosure;
FIG. 3 is a hardware configuration diagram of an electronic device in one embodiment of the present disclosure.
Detailed Description
The terminology used in the embodiments of the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information in the embodiments of the present disclosure, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. Moreover, depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".
The present disclosure provides a distributed storage method, apparatus, electronic device, and machine-readable storage medium to address at least one of the above technical problems.
The specific technical solution is as follows.
In one embodiment, the present disclosure provides a distributed storage method applied to a storage cluster. The method includes: in response to a data storage request, dividing a data packet to be stored into a plurality of data blocks, calculating characteristic information of each data block, and storing the characteristic information of the data blocks across the storage nodes of the storage cluster in a sharded manner; comparing, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster; when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associating a pointer to the already stored data block with the characteristic information of the data block to be stored; and when the comparison shows that the data block to be stored differs from every data block already stored in the storage cluster, storing the data block to be stored in the storage cluster and associating a pointer to the newly stored data block with the characteristic information of that data block.
Specifically, as shown in FIG. 1, the method comprises the following steps:
Step S11: in response to a data storage request, divide the data packet to be stored into a plurality of data blocks, calculate characteristic information of each data block, and store the characteristic information of the data blocks across the storage nodes of the storage cluster in a sharded manner.
Step S12: compare, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster.
Step S131: when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associate a pointer to the already stored data block with the characteristic information of the data block to be stored.
Step S132: when the comparison shows that the data block to be stored differs from every data block already stored in the storage cluster, store the data block to be stored in the storage cluster and associate a pointer to the newly stored data block with the characteristic information of that data block.
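Steps S11 to S132 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the fixed block size, the use of SHA-256 as the characteristic information, the placement rule (fingerprint value modulo the number of nodes), and the names `Node`, `split`, `owner`, and `store_packet` are all assumptions made for illustration. The sketch also assumes that a data block is kept on the same node as its fingerprint.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, kept tiny so the example is easy to trace


class Node:
    """One storage node holding a shard of fingerprints and data blocks."""

    def __init__(self):
        self.fingerprints = {}  # fingerprint -> pointer (node index, block id)
        self.blocks = {}        # block id -> block bytes


def split(packet, size=BLOCK_SIZE):
    """S11: divide the data packet into fixed-size data blocks."""
    return [packet[i:i + size] for i in range(0, len(packet), size)]


def owner(fingerprint, n_nodes):
    """Assumed sharding rule: fingerprint value modulo the node count."""
    return int(fingerprint, 16) % n_nodes


def store_packet(cluster, packet):
    """Store a packet; return its ordered fingerprints and the number of new blocks."""
    ordered_fingerprints = []
    written = 0
    for block in split(packet):
        fp = hashlib.sha256(block).hexdigest()   # S11: characteristic information
        idx = owner(fp, len(cluster))
        node = cluster[idx]
        if fp not in node.fingerprints:          # S12: compare by fingerprint
            block_id = len(node.blocks)          # S132: new block, store it
            node.blocks[block_id] = block
            node.fingerprints[fp] = (idx, block_id)
            written += 1
        # S131: otherwise the existing pointer already refers to the stored block
        ordered_fingerprints.append(fp)
    return ordered_fingerprints, written


cluster = [Node() for _ in range(3)]
fps, written = store_packet(cluster, b"AAAABBBBAAAA")  # blocks: AAAA, BBBB, AAAA
fps2, written2 = store_packet(cluster, b"BBBBCCCC")    # BBBB is already stored
```

In this run the first packet contains three blocks but only two distinct ones, so `written` is 2; the second packet adds only the `CCCC` block, so `written2` is 1. Because each fingerprint maps to exactly one node under the assumed rule, the duplicate check touches a single node per block.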
The deduplication storage system is built on distributed storage, and its characteristic information and data blocks are sharded across the storage nodes of the storage cluster. The system therefore retains the high storage utilization of deduplicated storage while, at least during data storage, multiple storage nodes process requests concurrently, so that load balancing can be achieved and the upper limit of storage performance is raised.
In one embodiment, dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information across the storage nodes of the storage cluster in a sharded manner includes: recording, in a characteristic index, the characteristic information of each data block and the association between that characteristic information and the data packet.
In one embodiment, in response to a data read request, the characteristic information of each data block associated with the data packet to be read is obtained from the characteristic index; the storage nodes of the storage cluster are queried for matching characteristic information according to the characteristic information so obtained; each data block associated with the data packet to be read is located and returned according to the pointers of the matching characteristic information; and the data packet to be read is reconstructed from the returned data blocks.
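The read path described above can be sketched as follows. The data layout is an assumption for illustration only: the characteristic index is modelled as a dictionary from a packet identifier to an ordered list of fingerprints, each node shard as a dictionary from fingerprint to pointer, and the block store as a dictionary keyed by pointer. The short fingerprint strings and the name `read_packet` are hypothetical.

```python
def read_packet(packet_id, characteristic_index, node_shards, block_store):
    """Rebuild a data packet from its ordered fingerprints."""
    parts = []
    for fp in characteristic_index[packet_id]:    # fingerprints in original block order
        pointer = None
        for shard in node_shards:                 # query each node for a match
            if fp in shard:
                pointer = shard[fp]
                break
        if pointer is None:
            raise KeyError(f"no node holds fingerprint {fp!r}")
        parts.append(block_store[pointer])        # follow the pointer to the block
    return b"".join(parts)                        # restore the packet


# Hand-built state: one packet made of three blocks, two of them identical.
characteristic_index = {"pkt-1": ["fp-a", "fp-b", "fp-a"]}
node_shards = [{"fp-a": ("node0", 0)}, {"fp-b": ("node1", 0)}]
block_store = {("node0", 0): b"AAAA", ("node1", 0): b"BBBB"}

restored = read_packet("pkt-1", characteristic_index, node_shards, block_store)
```

Here `restored` equals `b"AAAABBBBAAAA"`: the block `AAAA` is stored once but referenced twice through the same pointer. A real cluster would likely route each lookup directly to the node that owns the fingerprint instead of scanning every shard; the scan is kept only to mirror the wording of the text.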
In one embodiment, backup data is generated, the backup data being a backup of the characteristic information and the corresponding pointers stored in a distributed manner on the storage nodes of the storage cluster; if the storage nodes of the storage cluster change, the characteristic information and the corresponding pointers are stored in a distributed manner across the changed storage nodes according to the previously generated backup data. A change of the storage nodes of the storage cluster includes adding, removing, or replacing storage nodes.
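One possible reading of this backup-and-redistribute step is sketched below. The source does not specify the backup format or the placement rule, so both are assumptions: the backup is a merged snapshot of every node's fingerprint-to-pointer map, and placement is the fingerprint value modulo the node count. The names `backup` and `redistribute` and the two-character hexadecimal fingerprints are hypothetical, and the sketch moves only the metadata, not the data blocks themselves.

```python
def backup(node_shards):
    """Merge every node's fingerprint -> pointer shard into one snapshot."""
    snapshot = {}
    for shard in node_shards:
        snapshot.update(shard)
    return snapshot


def redistribute(snapshot, n_nodes):
    """Re-shard the backed-up fingerprints and pointers over n_nodes nodes."""
    shards = [{} for _ in range(n_nodes)]
    for fp, pointer in snapshot.items():
        shards[int(fp, 16) % n_nodes][fp] = pointer  # assumed placement rule
    return shards


# Two-node cluster before the change (fingerprints shortened for readability).
old_shards = [
    {"0a": "blk-1", "10": "blk-3"},
    {"0b": "blk-2", "ff": "blk-4"},
]

snapshot = backup(old_shards)            # taken before the node change
new_shards = redistribute(snapshot, 3)   # a third storage node was added
```

After the call, every fingerprint and pointer from the snapshot is present on exactly one of the three nodes. The same two functions cover removal or replacement of a node by passing the new node count.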
In one embodiment, the characteristic information of a data block is its fingerprint, that is, a hash value of the data block calculated by a preset algorithm such as a hash algorithm. The fingerprint is used as the unique identifier of the data block: when two data blocks have the same fingerprint, they are considered to be the same data block. Other parameters that uniquely identify a data block may also be used as its characteristic information.
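As a small illustration of fingerprinting, the sketch below hashes block contents with Python's standard `hashlib`. The choice of SHA-256 is an assumption; the source only says that a preset algorithm such as a hash algorithm is used. The function name `fingerprint` is hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Return a hex digest used as the block's characteristic information."""
    # SHA-256 is an assumed choice, not one named by the source.
    return hashlib.sha256(block).hexdigest()


same = fingerprint(b"hello world") == fingerprint(b"hello world")
different = fingerprint(b"hello world") == fingerprint(b"hello worle")
```

`same` is `True` and `different` is `False`: identical content always yields the same fingerprint, while a one-byte change yields a different one. Treating equal fingerprints as equal blocks relies on hash collisions being negligibly unlikely for the chosen algorithm.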
The characteristic index stores the association between a data packet and the fingerprints of the data blocks into which it was divided. The fingerprints, and their association with the corresponding data packet, are recorded in the characteristic index in order, so that the block division of a data packet and the corresponding fingerprints can be obtained from the characteristic index.
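A characteristic index of this kind might look like the following sketch. The class name `CharacteristicIndex`, its methods, and the decision to record each block's length as the "block division" information are assumptions for illustration; SHA-256 again stands in for the unspecified fingerprint algorithm.

```python
import hashlib


class CharacteristicIndex:
    """Maps a packet id to the ordered fingerprints of its data blocks."""

    def __init__(self):
        self._entries = {}  # packet id -> ordered list of (fingerprint, block length)

    def record(self, packet_id, blocks):
        """Record fingerprints in the order the blocks appear in the packet."""
        self._entries[packet_id] = [
            (hashlib.sha256(b).hexdigest(), len(b)) for b in blocks
        ]

    def fingerprints(self, packet_id):
        return [fp for fp, _ in self._entries[packet_id]]

    def block_lengths(self, packet_id):
        return [n for _, n in self._entries[packet_id]]


index = CharacteristicIndex()
index.record("pkt-1", [b"AAAA", b"BB", b"AAAA"])
fps = index.fingerprints("pkt-1")
lengths = index.block_lengths("pkt-1")
```

`lengths` is `[4, 2, 4]`, and the first and third fingerprints in `fps` are equal because the blocks are identical. Keeping the list ordered is what allows a later read to reassemble the packet in its original block sequence.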
In one embodiment, the method may be implemented with backup software installed on the servers of the storage cluster. A management server component and a storage server component of the backup software are deployed on a plurality of storage servers to form a distributed deduplication cluster, and the cluster does not distinguish between master and slave nodes.
In one embodiment, the present disclosure also provides a distributed storage apparatus, as shown in FIG. 2, applied to a storage cluster. The apparatus includes: a characteristic module 21 configured to, in response to a data storage request, divide a data packet to be stored into a plurality of data blocks, calculate characteristic information of each data block, and store the characteristic information of the data blocks across the storage nodes of the storage cluster in a sharded manner; a comparison module 22 configured to compare, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster; and a storage module 23 configured to, when the comparison shows that a data block to be stored is identical to a data block already stored in the storage cluster, associate a pointer to the already stored data block with the characteristic information of the data block to be stored. The storage module 23 is further configured to, when the comparison shows that the data block to be stored differs from every data block already stored in the storage cluster, store the data block to be stored in the storage cluster and associate a pointer to the newly stored data block with the characteristic information of that data block.
In one embodiment, dividing the data packet to be stored into a plurality of data blocks in response to the data storage request, calculating the characteristic information of each data block, and storing the characteristic information across the storage nodes of the storage cluster in a sharded manner includes: recording, in a characteristic index, the characteristic information of each data block and the association between that characteristic information and the data packet.
In one embodiment, the apparatus further includes: a reading module configured to, in response to a data read request, obtain from the characteristic index the characteristic information of each data block associated with the data packet to be read; a query module configured to query the storage nodes of the storage cluster for matching characteristic information according to the characteristic information so obtained; a transmission module configured to locate and return, according to the pointers of the matching characteristic information, each data block associated with the data packet to be read; and a data module configured to reconstruct the data packet to be read from the returned data blocks.
In one embodiment, the apparatus further includes: a backup module configured to generate backup data, the backup data being a backup of the characteristic information and the corresponding pointers stored in a distributed manner on the storage nodes of the storage cluster; and a recovery module configured to, if the storage nodes of the storage cluster change, store the characteristic information and the corresponding pointers in a distributed manner across the changed storage nodes according to the previously generated backup data. A change of the storage nodes of the storage cluster includes adding, removing, or replacing storage nodes.
The present disclosure also provides an electronic device including a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor executing the machine-executable instructions to implement the foregoing distributed storage method.
The apparatus embodiments are the same as or similar to the corresponding method embodiments and are not described again here.
In one embodiment, the present disclosure provides an electronic device that includes a processor and a machine-readable storage medium. The machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the foregoing distributed storage method. At the hardware level, a schematic diagram of the hardware architecture is shown in FIG. 3.
In one embodiment, the present disclosure provides a machine-readable storage medium having stored thereon machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the aforementioned distributed storage method.
Here, a machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid-state drive, any type of storage disk (e.g., an optical disk or a DVD), a similar storage medium, or a combination thereof.
The systems, devices, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when practicing the present disclosure, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an embodiment of the present disclosure, and is not intended to limit the present disclosure. Various modifications and variations of this disclosure will occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the scope of the claims of the present disclosure.
Claims (10)
1. A distributed storage method applied to a storage cluster, the method comprising:
in response to a data storage request, dividing a data packet to be stored into a plurality of data blocks, calculating characteristic information of each data block, and storing the characteristic information of the data blocks in a sharded manner across the storage nodes of the storage cluster;
comparing, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster;
when the comparison result shows that a data block to be stored is identical to a data block already stored in the storage cluster, setting a pointer to the already stored data block and associating the pointer with the characteristic information of the data block to be stored;
and when the comparison result shows that a data block to be stored differs from every data block already stored in the storage cluster, storing the data block to be stored in the storage cluster, setting a pointer to the newly stored data block, and associating the pointer with the characteristic information of the data block to be stored.
2. The method according to claim 1, wherein the dividing, in response to a data storage request, of a data packet to be stored into a plurality of data blocks, the calculating of characteristic information of each data block, and the storing of the characteristic information of the data blocks in a sharded manner across the storage nodes of the storage cluster comprises:
recording, in a characteristic index, the characteristic information of each data block and the association between the characteristic information and the data packet.
3. The method of claim 2, further comprising:
in response to a data reading request, acquiring, from the characteristic index, the characteristic information of each data block associated with a data packet to be read;
querying the storage nodes of the storage cluster for matching characteristic information according to the characteristic information of each data block associated with the data packet to be read;
locating and returning each data block associated with the data packet to be read according to the pointer associated with the matching characteristic information;
and reconstructing the data packet to be read from the returned data blocks.
4. The method of claim 1, further comprising:
generating backup data, wherein the backup data backs up the characteristic information, and the correspondingly set pointers, stored in a distributed manner on the storage nodes of the storage cluster;
if the storage nodes of the storage cluster change, storing the characteristic information and the correspondingly set pointers from the previously generated backup data in a distributed manner across the changed storage nodes of the storage cluster;
wherein the change of the storage nodes of the storage cluster comprises adding, removing, or replacing storage nodes.
5. A distributed storage apparatus, applied to a storage cluster, the apparatus comprising:
a characteristic module, configured to, in response to a data storage request, divide a data packet to be stored into a plurality of data blocks, calculate characteristic information of each data block, and store the characteristic information of the data blocks in a sharded manner across the storage nodes of the storage cluster;
a comparison module, configured to compare, according to the characteristic information, each data block divided from the data packet to be stored with the data blocks already stored in the storage cluster;
a storage module, configured to, when the comparison result shows that a data block to be stored is identical to a data block already stored in the storage cluster, set a pointer to the already stored data block and associate the pointer with the characteristic information of the data block to be stored;
wherein the storage module is further configured to, when the comparison result shows that a data block to be stored differs from every data block already stored in the storage cluster, store the data block to be stored in the storage cluster, set a pointer to the newly stored data block, and associate the pointer with the characteristic information of the data block to be stored.
6. The apparatus according to claim 5, wherein the dividing, in response to the data storage request, of the data packet to be stored into a plurality of data blocks, the calculating of characteristic information of each data block, and the storing of the characteristic information of the data blocks in a sharded manner across the storage nodes of the storage cluster comprises:
recording, in a characteristic index, the characteristic information of each data block and the association between the characteristic information and the data packet.
7. The apparatus of claim 6, further comprising:
a reading module, configured to, in response to a data reading request, acquire, from the characteristic index, the characteristic information of each data block associated with a data packet to be read;
a query module, configured to query the storage nodes of the storage cluster for matching characteristic information according to the characteristic information of each data block associated with the data packet to be read;
a transmission module, configured to locate and return each data block associated with the data packet to be read according to the pointer associated with the matching characteristic information;
and a data module, configured to reconstruct the data packet to be read from the returned data blocks.
8. The apparatus of claim 5, further comprising:
a backup module, configured to generate backup data, wherein the backup data backs up the characteristic information, and the correspondingly set pointers, stored in a distributed manner on the storage nodes of the storage cluster;
a recovery module, configured to, if the storage nodes of the storage cluster change, store the characteristic information and the correspondingly set pointers from the previously generated backup data in a distributed manner across the changed storage nodes of the storage cluster;
wherein the change of the storage nodes of the storage cluster comprises adding, removing, or replacing storage nodes.
9. An electronic device, comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor to perform the method of any one of claims 1 to 4.
10. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of any of claims 1-4.
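The store, read, and node-change flows recited in claims 1 to 4 can be illustrated with a minimal in-memory sketch. It is not the claimed implementation. Several details are assumptions made only for illustration: the class name `DedupCluster`, fixed-size blocks, SHA-256 as the characteristic information, sharding by fingerprint modulo the node count, and integer pointers into a single dictionary standing in for block storage. Blocks are treated as identical when their fingerprints match; a real system might additionally compare the block contents.

```python
import hashlib


class DedupCluster:
    """Toy in-memory model of a deduplicating storage cluster (illustrative only)."""

    def __init__(self, node_names, block_size=4):
        self.block_size = block_size
        # One fingerprint shard per storage node: fingerprint -> pointer.
        self.shards = {name: {} for name in node_names}
        self.blocks = {}         # pointer -> block bytes (stands in for block storage)
        self.feature_index = {}  # packet id -> ordered list of fingerprints
        self._next_pointer = 0

    @staticmethod
    def fingerprint(block):
        # Assumption: SHA-256 serves as the "characteristic information".
        return hashlib.sha256(block).hexdigest()

    def _node_for(self, fp):
        # Assumption: shard by fingerprint value modulo the node count.
        names = sorted(self.shards)
        return names[int(fp, 16) % len(names)]

    def store(self, packet_id, data):
        fps = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = self.fingerprint(block)
            shard = self.shards[self._node_for(fp)]
            if fp not in shard:
                # New block: store it and point its fingerprint at it.
                pointer = self._next_pointer
                self._next_pointer += 1
                self.blocks[pointer] = block
                shard[fp] = pointer
            # Identical block: the existing fingerprint -> pointer entry is reused.
            fps.append(fp)
        self.feature_index[packet_id] = fps

    def read(self, packet_id):
        parts = []
        for fp in self.feature_index[packet_id]:
            pointer = self.shards[self._node_for(fp)][fp]
            parts.append(self.blocks[pointer])
        return b"".join(parts)

    def backup(self):
        # Flatten every shard into one fingerprint -> pointer mapping.
        return {fp: ptr for shard in self.shards.values() for fp, ptr in shard.items()}

    def change_nodes(self, node_names, backup):
        # Redistribute the backed-up entries across the changed node set.
        self.shards = {name: {} for name in node_names}
        for fp, ptr in backup.items():
            self.shards[self._node_for(fp)][fp] = ptr


cluster = DedupCluster(["node-a", "node-b", "node-c"])
cluster.store("p1", b"AAAABBBBAAAA")   # the third block repeats the first
cluster.store("p2", b"BBBBCCCC")       # the first block repeats one from p1
snapshot = cluster.backup()
cluster.change_nodes(["node-a", "node-b"], snapshot)  # one node removed
restored = cluster.read("p1")
```

In this sketch only the fingerprint-to-pointer entries move when the node set changes; the stored blocks themselves stay where they are, which mirrors the claims' focus on backing up and redistributing the characteristic information and pointers.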
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210329259.5A CN114721594A (en) | 2022-03-31 | 2022-03-31 | Distributed storage method, device, equipment and machine readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210329259.5A CN114721594A (en) | 2022-03-31 | 2022-03-31 | Distributed storage method, device, equipment and machine readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114721594A (en) | 2022-07-08 |
Family
ID=82239195
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210329259.5A Pending CN114721594A (en) | 2022-03-31 | 2022-03-31 | Distributed storage method, device, equipment and machine readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114721594A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116991329A (en) * | 2023-09-25 | 2023-11-03 | 深圳市明泰智能技术有限公司 | Data redundancy prevention method and system for self-service terminal equipment |
CN117688106A (en) * | 2024-02-04 | 2024-03-12 | 广东东华发思特软件有限公司 | Efficient distributed data storage and retrieval system, method and storage medium |
2022-03-31: CN application CN202210329259.5A (publication CN114721594A), active, Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116991329A (en) * | 2023-09-25 | 2023-11-03 | 深圳市明泰智能技术有限公司 | Data redundancy prevention method and system for self-service terminal equipment |
CN116991329B (en) * | 2023-09-25 | 2023-12-08 | 深圳市明泰智能技术有限公司 | Data redundancy prevention method and system for self-service terminal equipment |
CN117688106A (en) * | 2024-02-04 | 2024-03-12 | 广东东华发思特软件有限公司 | Efficient distributed data storage and retrieval system, method and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110471795B (en) | Block chain state data recovery method and device and electronic equipment | |
CN107807794B (en) | Data storage method and device | |
US8782011B2 (en) | System and method for scalable reference management in a deduplication based storage system | |
CN106874348B (en) | File storage and index method and device and file reading method | |
CN111444196B (en) | Method, device and equipment for generating Hash of global state in block chain type account book | |
CN114721594A (en) | Distributed storage method, device, equipment and machine readable storage medium | |
CN109032803B (en) | Data processing method and device and client | |
CN111444192B (en) | Method, device and equipment for generating Hash of global state in block chain type account book | |
CN111522502B (en) | Data deduplication method and device, electronic equipment and computer-readable storage medium | |
CN109145053B (en) | Data processing method and device, client and server | |
CN113535670B (en) | Virtual resource mirror image storage system and implementation method thereof | |
CN114936188A (en) | Data processing method and device, electronic equipment and storage medium | |
CN108399175B (en) | Data storage and query method and device | |
CN107145306B (en) | Distributed data storage method and system | |
CN112800057B (en) | Fingerprint table management method and device | |
CN115756955A (en) | Data backup and data recovery method and device and computer equipment | |
CN114268501B (en) | Data processing method, firewall generating method, computing device and storage medium | |
CN114785662B (en) | Storage management method, device, equipment and machine-readable storage medium | |
CN114647658A (en) | Data retrieval method, device, equipment and machine-readable storage medium | |
CN115421856A (en) | Data recovery method and device | |
CN109791541B (en) | Log serial number generation method and device and readable storage medium | |
CN113419792A (en) | Event processing method and device, terminal equipment and storage medium | |
CN109032804B (en) | Data processing method and device and server | |
CN112565373B (en) | Method and device for removing duplicate of mirror image file | |
CN117539690B (en) | Method, device, equipment, medium and product for merging and recovering multi-disk data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||