CN112052124A - Data redundancy method and distributed storage cluster

Publication number: CN112052124A
Application number: CN202011025578.4A
Authority: CN (China)
Prior art keywords: sub, data, disk, management module, interval
Legal status: Granted (Active)
Other languages: Chinese (zh)
Granted publication: CN112052124B
Inventor: 苏伟
Applicant and assignee: Macrosan Technologies Co Ltd

Classifications

    • G06F 11/1469 Backup restoration techniques
    • G06F 3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F 3/064 Management of blocks
    • G06F 3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The present application provides a data redundancy method and a distributed storage cluster. The distributed storage cluster writes data in copy mode to guarantee its data write performance; once the data becomes cold, the data stored in copy mode is converted to erasure code storage, which reduces disk consumption and improves the disk utilization rate of the distributed storage cluster.

Description

Data redundancy method and distributed storage cluster
Technical Field
The present application relates to the field of storage technologies, and in particular, to a data redundancy method and a distributed storage cluster.
Background
In distributed storage clusters, data redundancy techniques are usually employed to ensure the reliability of data. Current data redundancy techniques mainly include replica technology and erasure code (EC) technology.
Replica technology copies the same data into multiple copies and stores the copies in different fault domains of the distributed storage cluster. The technique is simple to implement, but the system's disk utilization rate is low. Taking 2 copies as an example (i.e., the same data is stored twice in the distributed storage cluster), the disk utilization rate is only 50%.
EC technology divides data into multiple data fragments, computes check fragments from the data fragments, and then writes each data fragment and check fragment into a different fault domain. The data fragments that participate in the same check calculation, together with the resulting check fragments, form an EC stripe. With this technique, a check calculation must be performed every time data is written, which increases system overhead and affects write performance.
Disclosure of Invention
In view of this, the present application provides a data redundancy method and a distributed storage cluster, so as to reduce disk consumption and improve the disk utilization rate of the distributed storage cluster while ensuring write performance.
In order to achieve the purpose of the application, the application provides the following technical scheme:
in a first aspect, the present application provides a data redundancy method, which is applied to a distributed storage cluster, where the distributed storage cluster includes at least one cluster node, each cluster node includes at least one disk for storing data, each disk is divided into a plurality of Blocks according to a preset Block size, the distributed storage cluster is configured with at least one LUN, each LUN is divided into a plurality of logical intervals according to a preset Segment size, each logical interval is divided into a plurality of sub-logical intervals according to a preset Block size, each cluster node deploys a corresponding disk management module for each disk on the node, each Segment corresponds to a write bitmap, each bit in the write bitmap is used to identify whether data has been written into the corresponding sub-logical interval, each Segment also corresponds to a storage bitmap, each bit in the storage bitmap is used to identify the storage mode of the data in the corresponding sub-logical interval, and the distributed storage cluster writes data in an N-copy mode, where N is greater than or equal to 2, and the method includes the following steps:
when monitoring that the time length of the target Segment which is not accessed reaches the preset time length, a target cluster node in the at least one cluster node acquires a writing bitmap and a storage bitmap corresponding to the target Segment;
the target cluster node traverses the writing bitmap and the storage bitmap and finds N first sub-logic intervals with written data and a storage mode of a copy mode;
the target cluster node selects a target disk from N first disks used for storing the data corresponding to the target Segment;
for each first sub-logic interval, the target cluster node sends a read command for reading data corresponding to the first sub-logic interval to a target disk management module corresponding to the target disk;
the target cluster node calculates check data according to the data of each first sub-logic interval returned by the target disk management module;
the target cluster node sends a write command for indicating to write the verification data into a second disk to a second disk management module corresponding to the second disk, wherein the second disk is a disk which is specified in advance and used for storing the verification data corresponding to the target Segment;
the second disk management module allocates a first Block for the check data from the second disk, and writes the check data into the first Block;
and for each first disk, the target cluster node sends a deletion command for indicating deletion of the designated data copy to a first disk management module corresponding to the first disk, so that the data corresponding to the N first sub-logic intervals are stored in the N first disks respectively.
Optionally, after the target cluster node sends, to each first disk, a delete command for instructing to delete the designated data copy to the first disk management module corresponding to the first disk, the method further includes:
and for each first sub-logic interval, the target cluster node updates the storage mode identified by the corresponding bit of the first sub-logic interval in the storage bitmap to be an erasure code mode.
Optionally, the method further includes:
when the target cluster node receives a write request needing to be written into the target Segment, determining each second sub-logic interval in the target Segment related to the write request, and splitting the write request into sub-write requests aiming at each second sub-logic interval;
for each second sub-logical interval, the following processing is performed:
the target cluster node inquires the writing bitmap of the target Segment and determines whether data is written into the second sub-logic interval;
if the data is written into the second sub-logic interval, the target cluster node inquires the storage bitmap of the target Segment and determines the storage mode of the data corresponding to the second sub-logic interval;
if the storage mode of the data corresponding to the second sub-logic interval is an erasure code mode, the target cluster node sends a sub-write request aiming at the second sub-logic interval to each first disk management module respectively, and the sub-write request carries the data to be written into the second sub-logic interval and an erasure code mark;
when the first disk management module determines that the sub-write request carries an erasure code mark, allocating a second Block to the second sub-logic interval, writing data into the second Block, and recording a mapping relation between the second sub-logic interval and the second Block;
and the target cluster node updates the storage mode identified by the corresponding bit of the second sub-logic interval in the storage bitmap to be a copy mode.
Optionally, for each first disk, the target cluster node sends, to the first disk management module corresponding to the first disk, a delete command for instructing to delete the designated data copy, so that after the data corresponding to the N first sub-logical intervals are stored in the N first disks, respectively, the method further includes:
the first disk management module records the mapping relation between the first sub-logic interval to which the deleted data copy belongs and the Block in which the undeleted data copy is located, and adds an association mark to the mapping relation;
when the target cluster node receives a read request needing to read the target Segment, determining each third sub-logic interval in the target Segment related to the read request, and splitting the read request into sub-read requests aiming at each third sub-logic interval;
for each third sub-logical interval, the following processing is performed:
the target cluster node inquires the storage bitmap and determines a storage mode of data corresponding to a third sub-logic interval;
if the storage mode of the data corresponding to the third sub-logic interval is an erasure code mode, the target cluster node sends a first sub-read request aiming at the third sub-logic interval to each first disk management module respectively, and the first sub-read request does not include an association mark;
and when determining that the mapping relation between the locally recorded third sub-logic interval and the third Block does not have the associated mark, the first disk management module reads the corresponding data from the third Block and returns the corresponding data to the target cluster node.
Optionally, the write command includes a start address of each first sub-logic interval, and after the second disk management module allocates a first Block to the check data from the second disk and writes the check data into the first Block, the method further includes:
the second disk management module records the mapping relation between each first sub-logic interval and the first Block and adds an association mark;
after the target cluster node sends the first sub-read request for the third sub-logic interval to each first disk management module, the method further includes:
if the data of the third sub-logic interval is not read, the target cluster node sends a second sub-read request aiming at the third sub-logic interval to each first disk management module, wherein the second sub-read request carries an association mark;
when determining that the mapping relation between the locally recorded third sub-logic interval and the third Block has an association mark, the first disk management module reads corresponding data from the third Block and returns the corresponding data to the target cluster node;
the target cluster node sends a third sub-read request aiming at the third sub-logic interval to the second disk management module;
the second disk management module reads check data from a fourth Block and returns the check data to the target cluster node according to a locally recorded mapping relation between the third sub-logic interval and the fourth Block;
and the target cluster node performs verification calculation according to the data returned by the first disk management module and the verification data returned by the second disk management module to obtain the data corresponding to the third sub-logic interval.
Optionally, the method further includes:
when the target cluster node determines that a failed disk exists in the N first disks, the following processing is performed for each fourth sub-logic interval of the data written in the target Segment:
the target cluster node reads data corresponding to the fourth sub-logic interval;
the target cluster node sends a write command for indicating to write data corresponding to the fourth sub-logic interval to each third disk management module, wherein the third disk management module refers to a disk management module corresponding to a disk except for a failed disk and a disk returning data corresponding to the fourth sub-logic interval in the first disk and the second disk;
when determining that a mapping relation between a locally recorded fourth sub-logic interval and a fifth Block has an association mark, a third disk management module allocates a sixth Block to the fourth sub-logic interval and writes data corresponding to the fourth sub-logic interval into the sixth Block;
and the third disk management module updates the mapping relation between the fourth sub-logic interval and the fifth Block into the mapping relation between the fourth sub-logic interval and the sixth Block, and removes the corresponding association mark.
Optionally, after the third disk management module updates the mapping relationship between the fourth sub-logic interval and the fifth Block to the mapping relationship between the fourth sub-logic interval and the sixth Block, the method further includes:
the third disk management module judges whether a mapping relation comprising the fifth Block still exists;
if not, the fifth Block is recovered.
In a second aspect, the present application provides a distributed storage cluster, where the distributed storage cluster includes at least one cluster node, each cluster node includes at least one disk for storing data, each disk is divided into multiple Blocks according to a preset Block size, the distributed storage cluster is configured with at least one LUN, each LUN is divided into multiple logical intervals according to a preset Segment size, each logical interval is divided into multiple sub-logical intervals according to a preset Block size, each cluster node deploys a corresponding disk management module for each disk on the node, each Segment corresponds to a write bitmap, each bit in the write bitmap is used to identify whether data has been written into the corresponding sub-logical interval, each Segment further corresponds to a storage bitmap, each bit in the storage bitmap is used to identify the storage mode of the data in the corresponding sub-logical interval, and the distributed storage cluster writes data in an N-copy mode, where N is greater than or equal to 2;
the target cluster node in the at least one cluster node is used for acquiring a write bitmap and a storage bitmap corresponding to the target Segment when the monitored duration of not accessing the target Segment reaches a preset duration; traversing the writing bitmap and the storage bitmap, and finding N first sub-logic intervals in which data are written and the storage mode is a copy mode; selecting a target disk from N first disks for storing data corresponding to the target Segment; for each first sub-logic interval, sending a read command for reading data corresponding to the first sub-logic interval to a target disk management module corresponding to the target disk; calculating check data according to the data of each first sub-logic interval returned by the target disk management module; sending a write command for indicating to write the verification data into a second disk to a second disk management module corresponding to the second disk, wherein the second disk is a disk which is specified in advance and used for storing the verification data corresponding to the target Segment;
the second disk management module is configured to allocate a first Block to the check data from the second disk, and write the check data into the first Block;
the target cluster node is further configured to send, to each first disk, a deletion command for instructing to delete the designated data copy to the first disk management module corresponding to the first disk, so that the data corresponding to the N first sub-logical intervals are stored in the N first disks, respectively.
Optionally, the target cluster node is further configured to, for each first sub-logic interval, update a storage manner identified by a corresponding bit in the storage bitmap of the first sub-logic interval to be an erasure code manner.
Optionally, the target cluster node is further configured to, when receiving a write request that needs to be written into the target Segment, determine each second sub-logic interval in the target Segment that the write request relates to, and split the write request into sub-write requests for each second sub-logic interval; for each second sub-logic interval, inquiring the writing bitmap of the target Segment, and determining whether the second sub-logic interval is written with data; if the data is written into the second sub-logic interval, inquiring the storage bitmap of the target Segment, and determining the storage mode of the data corresponding to the second sub-logic interval; if the storage mode of the data corresponding to the second sub-logic interval is an erasure code mode, respectively sending a sub-write request aiming at the second sub-logic interval to each first disk management module, wherein the sub-write request carries the data to be written into the second sub-logic interval and an erasure code mark;
the first disk management module is further configured to allocate a second Block to the second sub-logic interval when it is determined that the sub-write request carries the erasure code flag, write data into the second Block, and record a mapping relationship between the second sub-logic interval and the second Block;
and the target cluster node is further configured to update the storage mode identified by the corresponding bit in the storage bitmap of the second sub-logic interval to be a copy mode.
Optionally, the first disk management module is further configured to record a mapping relationship between a first sub-logic interval to which the deleted data copy belongs and a Block in which the undeleted data copy is located, and add an association flag to the mapping relationship;
the target cluster node is further configured to, when receiving a read request that requires reading of the target Segment, determine each third sub-logic interval in the target Segment that the read request relates to, and split the read request into sub-read requests for each third sub-logic interval; inquiring the storage bitmap aiming at each third sub-logic interval, and determining a storage mode of data corresponding to the third sub-logic interval; if the storage mode of the data corresponding to the third sub-logic interval is an erasure code mode, respectively sending a first sub-read request aiming at the third sub-logic interval to each first disk management module, wherein the first sub-read request does not include an association mark;
and the first disk management module is further configured to, when it is determined that the mapping relationship between the locally recorded third sub-logic interval and the third Block does not have an association flag, read corresponding data from the third Block and return the corresponding data to the target cluster node.
Optionally, the second disk management module is further configured to record a mapping relationship between each first sub-logic interval and the first Block, and add an association flag;
the target cluster node is further configured to send a second sub-read request for the third sub-logic interval to each first disk management module if the data of the third sub-logic interval is not read, where the second sub-read request carries an association flag;
the first disk management module is further configured to, when it is determined that a mapping relationship between the locally recorded third sub-logic interval and a third Block has an association flag, read corresponding data from the third Block and return the corresponding data to the target cluster node;
the target cluster node is further configured to send a third sub-read request for the third sub-logic interval to the second disk management module;
the second disk management module is further configured to read check data from a fourth Block according to a locally recorded mapping relationship between the third sub-logic interval and the fourth Block, and return the check data to the target cluster node;
and the target cluster node is further configured to perform verification calculation according to the data returned by the first disk management module and the verification data returned by the second disk management module, so as to obtain data corresponding to the third sub-logic interval.
Optionally, the target cluster node is further configured to, when it is determined that a failed disk exists in the N first disks, read, for each fourth sub-logic interval in which data has been written in the target Segment, data corresponding to the fourth sub-logic interval; sending a write command for indicating to write data corresponding to the fourth sub-logic interval to each third disk management module, wherein the third disk management module refers to a disk management module corresponding to a disk except for the failed disk and the disk returning the data corresponding to the fourth sub-logic interval in the first disk and the second disk;
the third disk management module is configured to allocate a sixth Block to a fourth sub-logic interval when it is determined that a mapping relationship between the locally recorded fourth sub-logic interval and the fifth Block has an association flag, and write data corresponding to the fourth sub-logic interval into the sixth Block; and updating the mapping relation between the fourth sub-logic interval and the fifth Block into the mapping relation between the fourth sub-logic interval and the sixth Block, and removing the corresponding association mark.
Optionally, the third disk management module is further configured to determine whether a mapping relationship including the fifth Block still exists; if not, the fifth Block is recovered.
As can be seen from the above description, in the embodiments of the present application, the distributed storage cluster writes data in copy mode to guarantee its data write performance; once the data becomes cold, the data stored in copy mode is converted to erasure code storage, which reduces disk consumption and improves the disk utilization rate of the distributed storage cluster.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a schematic diagram of an exemplary illustrative distributed storage cluster employing replica techniques;
FIG. 2 is a schematic diagram illustrating a distributed storage cluster employing EC techniques;
FIG. 3 is a flow chart of a data redundancy method according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a mapping relationship of Seg1 in a distributed storage cluster according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating a mapping relationship of Seg1 in a distributed storage cluster according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating a write request processing flow according to an embodiment of the present application;
fig. 7 is a schematic diagram illustrating a mapping relationship of Seg1 in a distributed storage cluster according to an embodiment of the present application;
FIG. 8 is a flow diagram illustrating a read request processing flow according to an embodiment of the present application;
FIG. 9 is a flow chart illustrating a fault handling process according to an embodiment of the present application;
FIG. 10 illustrates a data reconstruction process according to an embodiment of the present application;
fig. 11 is a schematic diagram illustrating a mapping relationship of Seg1 in a distributed storage cluster according to an embodiment of the present application;
fig. 12 is a schematic diagram illustrating a mapping relationship of Seg1 in a distributed storage cluster according to an embodiment of the present application;
fig. 13 is a schematic diagram illustrating a mapping relationship of Seg1 in a distributed storage cluster according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the embodiments of the present application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of the embodiments of the present application, the negotiation information may also be referred to as second information, and similarly, the second information may also be referred to as negotiation information. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
A distributed storage cluster typically includes a plurality of servers (also referred to as cluster nodes). Each cluster node includes at least one disk (also referred to as a data disk) for storing data. In the following description, unless otherwise specified, a disk refers to a data disk.
In order to ensure the reliability of data, data redundancy technology is usually used to store data. The existing data redundancy technologies mainly include: replica techniques and EC techniques.
Referring to FIG. 1, a schematic diagram of an exemplary distributed storage cluster employing replica technology is shown. As can be seen from FIG. 1, the same data corresponds to 3 data copies (data copy 1 to data copy 3) in the distributed storage cluster, which are stored in cluster nodes Server1 to Server3, respectively. This storage mode is simple to implement, but the disk utilization rate is low; for example, the disk utilization rate of the 3-copy scheme shown in FIG. 1 is only 33%.
Referring to FIG. 2, a schematic diagram of an exemplary distributed storage cluster employing EC technology is shown. As can be seen from FIG. 2, the original data is divided into 3 data fragments, which are stored in cluster nodes Server1 to Server3, and a check calculation is performed on the 3 data fragments to obtain 1 check fragment, which is stored in cluster node Server4. Because this storage mode requires a check calculation, it increases system overhead and affects write performance.
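The utilization figures quoted above follow from simple ratios. The following Python sketch is illustrative only and not part of the original disclosure; it assumes disk utilization is defined as user data divided by total raw capacity consumed.

```python
# Illustrative arithmetic behind the utilization figures (assumption:
# utilization = user data / total raw capacity consumed).
def replica_utilization(n_copies: int) -> float:
    return 1 / n_copies

def ec_utilization(n_data: int, n_parity: int) -> float:
    return n_data / (n_data + n_parity)

print(round(replica_utilization(3), 2))  # 0.33 -> 3-copy storage, as in FIG. 1
print(ec_utilization(3, 1))              # 0.75 -> a 3+1 EC stripe, as in FIG. 2
```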
In view of the foregoing problems, an embodiment of the present application provides a data redundancy method: when data is written, it is written in copy mode to ensure data write performance. After the data becomes cold, the data stored in copy mode is converted to erasure code (EC) storage to reduce disk consumption and improve the disk utilization rate of the distributed storage cluster.
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application are described in detail below with reference to the accompanying drawings and specific embodiments:
referring to fig. 3, a flow chart of a data redundancy method according to an embodiment of the present application is shown. The flow is applied to a distributed storage cluster.
The distributed storage cluster includes at least one cluster node. Each cluster node includes at least one disk for storing data. Each disk is partitioned into blocks (which may be abbreviated as Blk) according to a preset Block size (e.g., 64 KB).
The distributed storage cluster is configured with at least one LUN, and each LUN is divided into a plurality of logical intervals according to a preset Segment size (e.g., 256MB), such as [ 0, 256MB ], [ 256MB, 512MB ], [ 512MB, 768MB ], and so on.
Each logical interval is divided into a plurality of sub-logical intervals, such as [ 0, 64KB ], [ 64KB, 128KB ], [ 128KB, 192KB ], and so on, according to a preset Block size (e.g., 64 KB).
During the data writing process, the distributed storage cluster establishes a mapping from logical resources to physical resources. The mapping mainly includes: a mapping from logical intervals to Segments (which may be abbreviated as Seg), a mapping from Segments to disks, and a mapping from sub-logical intervals to Blocks.
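As an illustration of how a LUN offset lands in a particular logical interval and sub-logical interval, the following sketch (not from the patent; it simply assumes the 256MB Segment size and 64KB Block size used in the examples below) shows the address arithmetic:

```python
SEGMENT_SIZE = 256 * 1024 * 1024   # preset Segment size (256MB in the examples below)
BLOCK_SIZE = 64 * 1024             # preset Block size (64KB in the examples below)

def locate(lun_offset: int):
    """Return the start addresses of the logical interval and of the
    sub-logical interval that contain a given LUN offset."""
    seg_start = (lun_offset // SEGMENT_SIZE) * SEGMENT_SIZE
    sub_start = (lun_offset // BLOCK_SIZE) * BLOCK_SIZE
    return seg_start, sub_start

# Offset 130KB of LUN1 falls in the logical interval [ 0, 256MB ]
# and the sub-logical interval [ 128KB, 192KB ].
print(locate(130 * 1024))  # (0, 131072)
```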
Referring to fig. 4, a schematic diagram of a mapping from a logical resource to a physical resource is shown in the embodiment of the present application. The distributed storage cluster shown in FIG. 4 includes 4 cluster nodes (Server1 to Server4), each including one disk for storing data. Of course, the number of disks in a cluster node is not limited in the embodiments of the present application; here, for simplicity of description, each cluster node is described as including one disk.
In fig. 4, data block A, data block B, and data block C have been written into the logical interval [ 0, 256MB ] of LUN1, and each data block is 64KB in size (for convenience of description, 64KB data blocks are taken as an example). Data block A has been written into the sub-logical interval [ 0, 64KB ]; data block B has been written into the sub-logical interval [ 128KB, 192KB ]; and data block C has been written into the sub-logical interval [ 192KB, 256KB ].
In fig. 4, the logical interval [ 0, 256MB ] has been mapped to Seg1, so the current mapping table of LUN1 can be represented as:
LUN1→[0:Seg1]
wherein "0" is the start address of the logical interval [ 0, 256MB ] for identifying the logical interval.
In order to ensure data writing performance, the embodiment of the present application writes data in copy mode. As shown in fig. 4, if disks OSD1 to OSD3 are designated to store the data corresponding to Seg1, the mapping relationship from Seg1 to each disk can be expressed as:
Seg1→[OSD1;OSD2;OSD3]
since data block A, data block B, and data block C are all located in Seg1, the data copies corresponding to these 3 data blocks are stored in OSD1 to OSD3, respectively. As shown in fig. 4, data blocks A, B, and C have corresponding data copies A1, B1, and C1 in OSD1; their corresponding data copies in OSD2 are denoted as A2, B2, and C2; and their corresponding data copies in OSD3 are denoted as A3, B3, and C3.
The storage position of each data copy in the disk is determined by the disk management module corresponding to each disk. Here, the disk management module generally refers to a service process of the disk. And the cluster node deploys a corresponding disk management module for each disk on the node.
Taking OSD1 as an example, the disk management module corresponding to OSD1 allocates corresponding Blocks to the sub-logical intervals to which the data copies A1, B1, and C1 belong, and establishes a mapping relationship between the sub-logical intervals and the Blocks, which can be expressed as:
Seg1→[0:Blk20;128KB:Blk10;192KB:Blk100]
the "0", "128KB" and "192KB" are respectively the start addresses of the sub-logical intervals [ 0, 64KB ], [ 128KB, 192KB ] and [ 192KB, 256KB ] to which the data copies A1, B1 and C1 belong, and are used to identify the corresponding sub-logical intervals. Blk20 is the Block to which the sub-logical interval [ 0, 64KB ] is mapped in OSD1; Blk10 is the Block to which the sub-logical interval [ 128KB, 192KB ] is mapped in OSD1; Blk100 is the Block to which the sub-logical interval [ 192KB, 256KB ] is mapped in OSD1.
Similarly, the mapping relationship of the sub-logical intervals [ 0, 64KB ], [ 128KB, 192KB ], [ 192KB, 256KB ] to which the data copies A2, B2, C2 belong in OSD2 can be expressed as:
Seg1→[0:Blk10;128KB:Blk100;192KB:Blk40]
The mapping relationship of the sub-logical intervals [ 0, 64KB ], [ 128KB, 192KB ], [ 192KB, 256KB ] to which the data copies A3, B3, C3 belong in OSD3 can be expressed as:
Seg1→[0:Blk20;128KB:Blk100;192KB:Blk60]
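A possible in-memory form of these per-disk mapping tables is sketched below. This is an illustration only (field names and the dict layout are assumptions, not taken from the patent); each entry maps a sub-logical interval start address to the Block holding its data, plus an optional association flag that is only added later, during the conversion to erasure code storage described below.

```python
# Illustrative layout of OSD1's mapping table for Seg1.
osd1_seg1_map = {
    0:          ("Blk20", None),   # [ 0, 64KB ]      -> data copy A1
    128 * 1024: ("Blk10", None),   # [ 128KB, 192KB ] -> data copy B1
    192 * 1024: ("Blk100", None),  # [ 192KB, 256KB ] -> data copy C1
}

def lookup(mapping: dict, sub_start: int):
    # Return (block id, association flag) for a sub-logical interval start address.
    return mapping[sub_start]

print(lookup(osd1_seg1_map, 128 * 1024))  # ('Blk10', None)
```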
based on the storage structure and the mapping relationship of the distributed storage cluster, a data redundancy process according to an embodiment of the present application is described below. As shown in fig. 3, the process may include the following steps:
step 301, when monitoring that the time length of the target Segment which is not accessed reaches the preset time length, the target cluster node acquires a write bitmap and a storage bitmap corresponding to the target Segment.
Here, the target cluster node may be any cluster node in the distributed storage cluster. Each cluster node monitors the Segments of the LUNs it is responsible for. Still taking the distributed storage cluster shown in FIG. 4 as an example, if Server1 is configured to process the access requests of LUN1, then Server1 monitors the Segments included in LUN1.
Herein, the Segment currently monitored by the target cluster node is referred to as a target Segment.
It is understood that "target cluster node" and "target Segment" are names used for convenience of differentiation and are not intended to be limiting.
When the target cluster node detects that the target Segment has not been accessed for a long time, the data corresponding to the target Segment is considered cold; performing the subsequent operations on the data corresponding to the target Segment at this time has the least impact on the distributed storage cluster.
And the target cluster node acquires a writing bitmap and a storage bitmap corresponding to the target Segment. Here, it should be noted that each Segment corresponds to one write bitmap and one storage bitmap. Each bit in the writing bitmap and the storage bitmap corresponds to a sub-logical interval of the Segment. Each bit of the writing bitmap is used for identifying whether data is written into the corresponding sub-logic interval or not; each bit of the storage bitmap is used for identifying the storage mode of the corresponding sub-logic interval. The storage mode can be a copy mode or an erasure code mode.
Still taking Seg1 shown in fig. 4 as an example, if the sub-logical intervals [ 0, 64KB ], [ 128KB, 192KB ], [ 192KB, 256KB ] of Seg1 have been written with data, the write bitmap (denoted as Wmap) currently corresponding to Seg1 can be represented as:
Seg1→[Wmap:10110000……000]
the writing bitmap sequentially identifies whether sub-logical intervals [ 0, 64KB ], [ 64KB, 128KB ], [ 128KB, 192KB ], [ 192KB, 256KB ], … …, [ 256MB-64KB, 256MB ] have data written therein from left to right. Wherein, a "1" indicates that data has been written in the corresponding sub-logic interval; "0" indicates that no data is written in the corresponding sub-logical interval.
The storage bitmap (denoted as Smap) corresponding to current Seg1 may be represented as:
Seg1→[Smap:00000000……000]
the storage bitmap sequentially identifies sub-logical intervals [ 0, 64KB ], [ 64KB, 128KB ], [ 128KB, 192KB ], [ 192KB, 256KB ], … …, [ 256MB-64KB, 256MB ] from left to right, and corresponds to the storage mode of data. Wherein, 0 represents that the storage mode of the data of the corresponding sub-logic interval is a copy mode; "1" indicates that the storage method of the data corresponding to the sub-logical interval is an erasure code method. Here, the default storage method of Segment-associated data is a copy method.
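A minimal sketch of these two per-Segment bitmaps follows. It is illustrative only and assumes the sizes used in the examples (one bit per 64KB sub-logical interval of a 256MB Segment, i.e. 4096 bits per bitmap); class and method names are not from the patent.

```python
SUB_INTERVALS = (256 * 1024 * 1024) // (64 * 1024)  # 4096 sub-logical intervals per Segment

class SegmentBitmaps:
    def __init__(self):
        self.wmap = bytearray(SUB_INTERVALS // 8)  # write bitmap: 1 = data written
        self.smap = bytearray(SUB_INTERVALS // 8)  # storage bitmap: 0 = copy, 1 = erasure code

    @staticmethod
    def _get(bm: bytearray, idx: int) -> int:
        return (bm[idx // 8] >> (7 - idx % 8)) & 1

    @staticmethod
    def _set(bm: bytearray, idx: int, value: int) -> None:
        if value:
            bm[idx // 8] |= 1 << (7 - idx % 8)
        else:
            bm[idx // 8] &= ~(1 << (7 - idx % 8)) & 0xFF

    def mark_written(self, idx):       self._set(self.wmap, idx, 1)
    def is_written(self, idx):         return self._get(self.wmap, idx) == 1
    def set_erasure_coded(self, idx):  self._set(self.smap, idx, 1)
    def is_erasure_coded(self, idx):   return self._get(self.smap, idx) == 1

# Reproduce the Wmap shown above: intervals [0,64KB], [128KB,192KB], [192KB,256KB] written.
bm = SegmentBitmaps()
for idx in (0, 2, 3):
    bm.mark_written(idx)
```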
Step 302, the target cluster node traverses the write bitmap and the storage bitmap of the target Segment, and finds N first sub-logical intervals in which data is written and the storage mode is a copy mode.
Here, N is the number of copies supported by the distributed storage cluster, and N is greater than or equal to 2. For example, N = 2 indicates that the cluster supports 2 copies; N = 3 indicates that the cluster supports 3 copies.
Here, the first sub-logical section refers to a sub-logical section to which data has been written and in which the storage method is the copy method. It is to be understood that the first sub-logical interval is only named for the convenience of distinction and is not intended to be limiting.
Still taking the distributed storage cluster shown in fig. 4 as an example, the distributed storage cluster supports 3 copies. The current Seg1 corresponds to a write bitmap of:
Seg1→[Wmap:10110000……000]
the storage bitmap corresponding to Seg1 is:
Seg1→[Smap:00000000……000]
the Server1 traverses the write bitmap and the storage bitmap of the Seg1, and it can be seen that the sub-logical intervals [ 0, 64KB ], [ 128KB, 192KB ], and [ 192KB, 256KB ] of the Seg1 have data written therein and the corresponding data are stored in a copy manner.
Step 303, the target cluster node selects a target disk from the N first disks for storing data corresponding to the target Segment.
Here, the disk for storing the data corresponding to the target Segment is referred to as a first disk. It is to be understood that the designation as first disk is merely for convenience of distinction and is not intended to be limiting.
As described above, the cluster designates disks OSD1 to OSD3 to store the data corresponding to Seg1, so disks OSD1 to OSD3 are all first disks used for storing the data corresponding to Seg1.
Since each first disk stores the data of the target Segment that is stored in copy mode, any first disk can be selected as the target disk. Of course, as an embodiment, the first disk on the cluster node with the lower load may be selected as the target disk. It is to be understood that "target disk" is merely a name used for ease of distinction and is not intended to be limiting.
Still taking the first disks OSD1 to OSD3 corresponding to Seg1 as an example, Server1 may select OSD1 as the target disk.
Step 304, aiming at each first sub-logic interval, the target cluster node sends a read command for reading data corresponding to the first sub-logic interval to a target disk management module corresponding to the target disk.
Here, the disk management module corresponding to the target disk is referred to as a target disk management module. It is to be understood that the reference to the target disk management module is merely a nomenclature for ease of distinction and is not intended to be limiting.
In this step, the target cluster node sends a read command to the target disk management module for each first sub-logic interval.
For example, Server1 sends read commands to the disk management module corresponding to OSD1 for the sub-logical intervals [ 0, 64KB ], [ 128KB, 192KB ], [ 192KB, 256KB ] of Seg1. According to the locally recorded mapping relationship between sub-logical intervals and Blocks:
Seg1→[0:Blk20;128KB:Blk10;192KB:Blk100]
the disk management module corresponding to OSD1 reads data copy A1 from Blk20, to which the sub-logical interval [ 0, 64KB ] is mapped; reads data copy B1 from Blk10, to which [ 128KB, 192KB ] is mapped; and reads data copy C1 from Blk100, to which [ 192KB, 256KB ] is mapped.
And 305, calculating verification data by the target cluster node according to the data of each first sub-logic interval returned by the target disk management module.
Still taking the data copies A1 (corresponding to data block A), B1 (corresponding to data block B), and C1 (corresponding to data block C) returned by the disk management module corresponding to OSD1 as an example, Server1 performs a check calculation on data blocks A, B, and C according to a preset erasure code algorithm to obtain check data P.
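The patent only states that a "preset erasure code algorithm" is used; the following sketch assumes a single-parity XOR scheme over equal-sized 64KB blocks, matching the 3 data blocks + 1 check block layout of this example, purely for illustration.

```python
def compute_check_data(data_blocks: list) -> bytes:
    """XOR single-parity over equal-sized data blocks (an assumed, illustrative scheme)."""
    assert len({len(b) for b in data_blocks}) == 1, "blocks must be equal-sized"
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

# P = A xor B xor C; any single lost block can later be rebuilt by XOR-ing
# the remaining data blocks with P.
P = compute_check_data([b"\x01" * 65536, b"\x02" * 65536, b"\x04" * 65536])
```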
Step 306, the target cluster node sends a write command for instructing to write the check data into the second disk to a second disk management module corresponding to the second disk. The write command includes a start address of each first sub-logical interval.
Here, the second disk is a disk designated in advance for storing the check data corresponding to the target Segment. For example, OSD4 is designated to store the check data corresponding to Seg1.
Here, the disk management module corresponding to the second disk is referred to as a second disk management module.
It is to be understood that the names of the second disk and the second disk management module are only for convenience of distinguishing and are not intended to be limiting.
Still taking OSD4 as the second disk as an example, Server1 sends a write command to the disk management module corresponding to OSD4. The write command instructs writing the check data P into OSD4, and includes the start address of the sub-logical interval [ 0, 64KB ] to which data block A (which participated in the check calculation) belongs, the start address of the sub-logical interval [ 128KB, 192KB ] to which data block B belongs, and the start address of the sub-logical interval [ 192KB, 256KB ] to which data block C belongs.
And 307, the second disk management module allocates a first Block for the check data from the second disk, and writes the check data into the first Block.
Here, the first Block is named for convenience of distinction and is not intended to be limiting.
After the check data is written in, the second disk management module records the mapping relation between each first sub-logic interval and the first Block, and adds an association mark for identifying that the first Block stores data associated with the data corresponding to the first sub-logic interval.
Still taking OSD4 as an example, the disk management module corresponding to OSD4 allocates Blk4 for the check data P, writes the check data P into Blk4 of OSD4, records the mapping relationships between the sub-logical intervals [ 0, 64KB ], [ 128KB, 192KB ], [ 192KB, 256KB ] and Blk4, and adds an association flag, which can be expressed as:
Seg1→[0:Blk4:R;128KB:Blk4:R;192KB:Blk4:R]
where R is the association flag.
Step 308, for each first disk, the target cluster node sends a deletion command for instructing to delete the designated data copy to the first disk management module corresponding to the first disk, so that the data corresponding to the N first sub-logical intervals are stored in the N first disks respectively.
Here, the disk management module corresponding to the first disk is referred to as a first disk management module. It is to be understood that the first disk management module is named for convenience of distinguishing and is not intended to be limiting.
Still taking OSD1, OSD2, and OSD3 shown in fig. 4 as an example, Server1 may issue a delete command for the sub-logical interval [ 128KB, 192KB ] and a delete command for the sub-logical interval [ 192KB, 256KB ] to OSD1. According to the locally recorded mapping relationship between sub-logical intervals and Blocks:
Seg1→[0:Blk20;128KB:Blk10;192KB:Blk100]
the disk management module corresponding to OSD1 deletes data copy B1 in Blk10, which is mapped to the sub-logical interval [ 128KB, 192KB ], deletes data copy C1 in Blk100, which is mapped to the sub-logical interval [ 192KB, 256KB ], and updates the mapping table of Seg1 to:
Seg1→[0:Blk20;128KB:Blk20:R;192KB:Blk20:R]
as can be seen from the mapping table, the sub-logical interval [ 128KB, 192KB ] to which the deleted data copy B1 belongs and the sub-logical interval [ 192KB, 256KB ] to which the deleted data copy C1 belongs are both mapped to Blk20, in which the undeleted data copy A1 is located, and the association flag R indicates that Blk20 stores data associated with the data corresponding to the sub-logical interval [ 128KB, 192KB ] and the sub-logical interval [ 192KB, 256KB ]. The association relationship in the embodiments of the present application refers to participating in the same check calculation, i.e., belonging to the same EC stripe.
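How the delete command reshapes OSD1's mapping table can be sketched as follows, continuing the illustrative dict layout introduced earlier (an assumption, not the patent's data structure): the sub-logical intervals whose copies were deleted are re-pointed at the Block of the surviving copy and tagged with the association flag "R", which means "same check calculation / same EC stripe" rather than "this Block holds my data".

```python
def apply_delete(mapping: dict, deleted_starts: list, surviving_start: int) -> None:
    """Repoint deleted sub-logical intervals at the surviving copy's Block and flag them."""
    surviving_block, _ = mapping[surviving_start]
    for start in deleted_starts:
        mapping[start] = (surviving_block, "R")

osd1_seg1_map = {0: ("Blk20", None), 131072: ("Blk10", None), 196608: ("Blk100", None)}
apply_delete(osd1_seg1_map, deleted_starts=[131072, 196608], surviving_start=0)
print(osd1_seg1_map)
# {0: ('Blk20', None), 131072: ('Blk20', 'R'), 196608: ('Blk20', 'R')}
```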
Similarly, Server1 may issue a delete command for the sub-logical interval [ 0, 64KB ] and a delete command for the sub-logical interval [ 192KB, 256KB ] to OSD2. According to the locally recorded mapping relationship between sub-logical intervals and Blocks:
Seg1→[0:Blk10;128KB:Blk100;192KB:Blk40]
the disk management module corresponding to OSD2 deletes data copy A2 in Blk10, which is mapped to the sub-logical interval [ 0, 64KB ], deletes data copy C2 in Blk40, which is mapped to the sub-logical interval [ 192KB, 256KB ], and updates the mapping table of Seg1 to:
Seg1→[0:Blk100:R;128KB:Blk100;192KB:Blk100:R]
that is, the sub-logical interval [ 0, 64KB ] to which the deleted data copy A2 belongs and the sub-logical interval [ 192KB, 256KB ] to which the deleted data copy C2 belongs are both mapped to Blk100, in which the undeleted data copy B2 is located, and the association flag R indicates that Blk100 stores data associated with the data corresponding to the sub-logical interval [ 0, 64KB ] and the sub-logical interval [ 192KB, 256KB ].
Server1 may also issue a delete command for the sub-logical interval [ 0, 64KB ] and a delete command for the sub-logical interval [ 128KB, 192KB ] to OSD3. According to the locally recorded mapping relationship between sub-logical intervals and Blocks:
Seg1→[0:Blk20;128KB:Blk100;192KB:Blk60]
the disk management module corresponding to OSD3 deletes data copy A3 in Blk20, which is mapped to the sub-logical interval [ 0, 64KB ], deletes data copy B3 in Blk100, which is mapped to the sub-logical interval [ 128KB, 192KB ], and updates the mapping table of Seg1 to:
Seg1→[0:Blk60:R;128KB:Blk60:R;192KB:Blk60]
that is, the sub-logical interval [ 0, 64KB ] to which the deleted data copy A3 belongs and the sub-logical interval [ 128KB, 192KB ] to which the deleted data copy B3 belongs are both mapped to Blk60, in which the undeleted data copy C3 is located, and the association flag R indicates that Blk60 stores data associated with the data corresponding to the sub-logical interval [ 0, 64KB ] and the sub-logical interval [ 128KB, 192KB ].
At this time, the mapping relationship of Seg1 in the distributed storage cluster is as shown in fig. 5. In FIG. 5, data copy A1 (corresponding to data block A), data copy B2 (corresponding to data block B), data copy C3 (corresponding to data block C), and check data P form an EC stripe. That is, the data blocks A, B, and C in Seg1 have been converted from storage in copy mode to storage in erasure code (EC) mode, which saves storage space in the distributed storage cluster.
In addition, it should be added that after the conversion to erasure code storage, the target cluster node needs to update, for each first sub-logical interval, the storage mode identified by its corresponding bit in the storage bitmap of the target Segment to the erasure code mode.
Still taking the sub-logical intervals [ 0, 64KB ], [ 128KB, 192KB ], [ 192KB, 256KB ] of Seg1 as an example, after the data blocks A, B, and C in these 3 sub-logical intervals have all been converted to erasure code storage, the storage bitmap of Seg1 is updated as follows:
Seg1→[Smap:10110000……000]
here, "1" indicates that the storage method of the data in the corresponding sub-logical interval is an erasure code method.
The flow shown in fig. 3 is completed.
As can be seen from the flow shown in fig. 3, in the embodiments of the present application, when data is written, it is written in copy mode to ensure data write performance. After the data becomes cold, the data stored in copy mode is converted to erasure code storage, so as to reduce disk consumption and improve the disk utilization rate of the distributed storage cluster.
Based on the storage manner, a processing procedure of the distributed storage cluster receiving the write request is described below. Referring to fig. 6, a write request processing flow is shown for the embodiment of the present application.
As shown in fig. 6, the process may include the following steps:
step 601, when the target cluster node receives a write request needing to be written into the target Segment, determining each second sub-logic interval in the target Segment related to the write request, and splitting the write request into sub-write requests for each second sub-logic interval.
Here, the second sub-logical interval is only named for convenience of distinction and is not intended to be limiting.
Still taking Server1 processing a write request for Seg1 as an example, assume the write range of the write request within Seg1 is [ 0, 256KB ]; the sub-logical intervals (second sub-logical intervals) of Seg1 involved in the write request are then [ 0, 64KB ], [ 64KB, 128KB ], [ 128KB, 192KB ], and [ 192KB, 256KB ]. Server1 splits the write request for Seg1 into sub-write requests for these sub-logical intervals.
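The splitting in step 601 can be sketched as follows; this is illustrative only, with offsets relative to the Segment and the write assumed to be Block-aligned for simplicity.

```python
def split_write(offset: int, data: bytes, block_size: int = 64 * 1024):
    """Split a Segment-relative write into Block-aligned sub-write requests."""
    sub_requests = []
    for i in range(0, len(data), block_size):
        sub_requests.append((offset + i, data[i:i + block_size]))
    return sub_requests

subs = split_write(0, b"\x00" * 256 * 1024)
print([start for start, _ in subs])  # [0, 65536, 131072, 196608]
```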
Subsequent processing is then performed for each second sub-logical interval.
Step 602, the target cluster node queries the write bitmap of the target Segment, and determines whether the data has been written in the second sub-logical interval.
If no data has been written into the second sub-logical interval, i.e., this is the first write, the data is directly written in copy mode, which is not described here again. If data has been written into the second sub-logical interval, go to step 603.
Still taking Seg1 as an example, the write bitmap of the current Seg1 is:
Seg1→[Wmap:10110000……000]
if the current second sub-logical interval to be processed is [ 64KB, 128KB ], the Server1 can know by querying the writing bitmap, and the value of the corresponding bit in the writing bitmap of the second sub-logical interval [ 64KB, 128KB ] is 0, which indicates that the data is not written in the second sub-logical interval [ 64KB, 128KB ], the data is directly stored in a copy mode.
If the current second sub-logical interval to be processed is [ 0, 64KB ], the Server1 can know by querying the writing bitmap, and the value of the corresponding bit in the writing bitmap of the second sub-logical interval [ 0, 64KB ] is 1, which indicates that data has been written in the second sub-logical interval [ 0, 64KB ], go to step 603.
Step 603, if the data is written into the second sub-logic interval, the target cluster node queries the storage bitmap of the target Segment, and determines the storage mode of the data corresponding to the second sub-logic interval.
Still taking the second sub-logical interval [ 0, 64KB ] as an example, Server1 has determined in step 602 that data has been written into this sub-logical interval, so Server1 continues to query the storage bitmap of Seg1:
Seg1→[Smap:10110000……000]
as can be seen from the storage bitmap, the value of the corresponding bit in the storage bitmap of the second sub-logical interval [ 0, 64KB ] is 1, which indicates that the storage mode of the corresponding data in the second sub-logical interval [ 0, 64KB ] is an erasure code mode.
Here, it should be noted that if the storage mode of the data corresponding to the second sub-logic interval is determined to be a copy mode by querying the storage bitmap, the data is directly written in the existing copy mode, and details are not described here. If the storage mode of the data corresponding to the second sub-logical interval is determined to be the erasure code mode, go to step 604.
Step 604, if the data corresponding to the second sub-logical interval is stored in erasure code mode, the target cluster node sends a sub-write request for the second sub-logical interval to each first disk management module, where the sub-write request carries the data to be written into the second sub-logical interval and an erasure code flag.
Still taking the second sub-logical interval [ 0, 64KB ] as an example: through steps 602 and 603, Server1 determines that data has been written to this interval and that the data is stored in erasure code mode. Server1 therefore sends a sub-write request for the sub-logical interval [ 0, 64KB ] to each of the disk management modules of OSD1 to OSD3 (the disks storing the data of Seg1), where each sub-write request carries the data block D to be written and an erasure code flag.
Step 605, when determining that the sub-write request carries the erasure code flag, the first disk management module allocates a second Block to the second sub-logical interval, writes the data into the second Block, and records the mapping relationship between the second sub-logical interval and the second Block.
That is, when the first disk management module determines that the sub-write request carries the erasure code flag, it allocates a new Block for the second sub-logical interval, referred to here as the second Block (again, a name used only for convenience of distinction and not intended to be limiting), and records the mapping relationship between the second sub-logical interval and the newly allocated second Block.
Still taking the second sub-logical interval [ 0, 64KB ] as an example: when the disk management module corresponding to OSD1 receives the sub-write request for this interval and determines that it carries the erasure code flag, it allocates a new Block for the sub-logical interval [ 0, 64KB ], denoted Blk1, writes the data block D into Blk1, and updates the mapping between the sub-logical intervals of Seg1 and Blocks. The updated mapping table of Seg1 is:
Seg1→[0:Blk1;128KB:Blk20:R;192KB:Blk20:R]
It should be noted that allocating a new Block for the sub-logical interval [ 0, 64KB ] prevents the data copy A1 in Blk20, to which this interval was originally mapped, from being overwritten. In this way, when data of the sub-logical interval [ 128KB, 192KB ] or [ 192KB, 256KB ] later needs to be restored, the data copy A1 can still be found in Blk20 through the association relationship. The specific process is described below.
Similarly, when the disk management module corresponding to OSD2 receives the sub-write request for the sub-logical interval [ 0, 64KB ] and determines that it carries the erasure code flag, it allocates a new Block for the sub-logical interval [ 0, 64KB ], denoted Blk2, writes the data block D into Blk2, and updates the mapping between the sub-logical intervals of Seg1 and Blocks. The updated mapping table of Seg1 is:
Seg1→[0:Blk2;128KB:Blk100;192KB:Blk100:R]
It should be noted that since the data of the sub-logical interval [ 0, 64KB ] itself is stored in Blk2, no association flag needs to be added to the mapping between the sub-logical interval [ 0, 64KB ] and Blk2.
Similarly, when the disk management module corresponding to OSD3 receives the sub-write request for the sub-logical interval [ 0, 64KB ] and determines that it carries the erasure code flag, it allocates a new Block for the sub-logical interval [ 0, 64KB ], denoted Blk3, writes the data block D into Blk3, and updates the mapping between the sub-logical intervals of Seg1 and Blocks. The updated mapping table of Seg1 is:
Seg1→[0:Blk3;128KB:Blk60:R;192KB:Blk60]
Likewise, since the data of the sub-logical interval [ 0, 64KB ] itself is stored in Blk3, no association flag needs to be added to the mapping between the sub-logical interval [ 0, 64KB ] and Blk3.
At this point, the mapping relationships of Seg1 in the distributed storage cluster are as shown in fig. 7, where D1 is the copy of data block D in OSD1, D2 is the copy of data block D in OSD2, and D3 is the copy of data block D in OSD3. That is, data block D is written in copy mode. One possible form of this disk-side write handling is sketched below.
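By way of illustration only, the following sketch shows what step 605 requires of a disk management module: on a sub-write carrying the erasure code flag, a fresh Block is allocated so that the previously mapped Block (which other intervals may still rely on for recovery) is not overwritten. The class, its in-memory mapping layout, and the Block allocator are assumptions made for the example, not the actual module implementation.

    class DiskManagementModule:
        """Illustrative in-memory model of one disk management module (not the real implementation)."""

        def __init__(self, name):
            self.name = name
            self.next_block = 1
            self.blocks = {}      # Block id -> bytes stored in that Block
            self.mapping = {}     # Segment -> {interval_start: (block_id, has_association_flag)}

        def handle_sub_write(self, segment, interval_start, data, erasure_flag):
            seg_map = self.mapping.setdefault(segment, {})
            if erasure_flag:
                # Never reuse the currently mapped Block: it may hold a surviving copy that
                # other sub-logical intervals still depend on for recovery (see the note on Blk20).
                block_id = self._alloc_block()
                self.blocks[block_id] = data
                seg_map[interval_start] = (block_id, False)   # direct mapping, no :R flag
            else:
                # Copy-mode path (first write or copy-mode overwrite), greatly simplified.
                if interval_start in seg_map:
                    block_id, _ = seg_map[interval_start]
                else:
                    block_id = self._alloc_block()
                self.blocks[block_id] = data
                seg_map[interval_start] = (block_id, False)

        def _alloc_block(self):
            block_id = self.next_block
            self.next_block += 1
            return block_id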
Step 606, the target cluster node updates the storage mode identified by the corresponding bit of the second sub-logical interval in the storage bitmap of the target Segment to copy mode.
Still taking Seg1 as an example, Server1 updates the bit corresponding to the sub-logical interval [ 0, 64KB ] in the storage bitmap of Seg1 to 0 (identifying copy mode). The updated storage bitmap of Seg1 can be represented as:
Seg1→[Smap:00110000……000]
This completes the flow shown in fig. 6, which implements the processing of a write request.
The processing performed by the distributed storage cluster on receiving a read request is described below. Fig. 8 shows a read request processing flow according to an embodiment of the present application.
As shown in fig. 8, the process may include the following steps:
Step 801, when the target cluster node receives a read request that needs to read the target Segment, the target cluster node determines each third sub-logical interval in the target Segment involved in the read request, and splits the read request into sub-read requests for each third sub-logical interval.
Here, the third sub-logical interval is only named for convenience of distinction and is not intended to be limiting.
Still taking Server1 processing a read request for Seg1 as an example: the read range of the request within Seg1 is [ 0, 256KB ], and the sub-logical intervals (third sub-logical intervals) of Seg1 involved in the read request are [ 0, 64KB ], [ 64KB, 128KB ], [ 128KB, 192KB ], and [ 192KB, 256KB ]. Server1 splits the read request for Seg1 into sub-read requests for each of these sub-logical intervals.
Subsequent processing is then performed for each third sub-logical interval.
Step 802, the target cluster node queries the storage bitmap of the target Segment, and determines the storage mode of the data corresponding to the third sub-logical interval.
If the data corresponding to the third sub-logical interval is stored in copy mode, a sub-read request for the third sub-logical interval can be sent to any first disk management module; if it is stored in erasure code mode, go to step 803.
Taking a read of the sub-logical interval [ 0, 64KB ] as an example, the current storage bitmap of Seg1 is:
Seg1→[Smap:00110000……000]
By querying the storage bitmap, Server1 finds that the bit corresponding to the sub-logical interval [ 0, 64KB ] is 0, indicating that the data of this interval is stored in copy mode. Server1 sends a sub-read request for this interval to the disk management module of any one of OSD1 to OSD3 (for example, OSD1), the disks storing the data of Seg1. The disk management module of OSD1 consults its locally recorded mapping table of Seg1:
Seg1→[0:Blk1;128KB:Blk20:R;192KB:Blk20:R]
It then reads the data copy D1 from Blk1, which is mapped to the sub-logical interval [ 0, 64KB ], and returns it to Server1.
Taking a read of the sub-logical interval [ 128KB, 192KB ] as an example, the current storage bitmap of Seg1 is:
Seg1→[Smap:00110000……000]
By querying the storage bitmap, Server1 finds that the bit corresponding to the sub-logical interval [ 128KB, 192KB ] is 1, indicating that the data of this interval is stored in erasure code mode, so processing continues at step 803.
Step 803, if the data corresponding to the third sub-logical interval is stored in erasure code mode, the target cluster node sends a first sub-read request for the third sub-logical interval to each first disk management module, where the first sub-read request does not include an association flag.
Here, the first sub-read request is named for convenience of distinction and is not intended to be limiting.
Still taking a read of the sub-logical interval [ 128KB, 192KB ] as an example: having determined in step 802 that the data of this interval is stored in erasure code mode, Server1 sends sub-read requests for the sub-logical interval [ 128KB, 192KB ] to the disk management modules of OSD1 to OSD3, respectively.
It should be noted that by sending a sub-read request that does not include the association flag, the target cluster node instructs the first disk management module not to return the associated data of the third sub-logical interval (i.e., data that is merely associated with the data of that interval).
Step 804, when determining that the mapping relationship between the locally recorded third sub-logic interval and the third Block does not have an association mark, the first disk management module reads corresponding data from the third Block and returns the corresponding data to the target cluster node.
Here, the third Block is named for convenience of distinction and is not intended to be limiting.
If no association flag exists in the mapping relationship between the third sub-logical interval and the third Block, the data stored in the third Block is the data of the third sub-logical interval itself, so the corresponding data can be read directly from the third Block and returned to the target cluster node.
If the mapping relationship does carry the association flag, the third Block stores only data associated with the data of the third sub-logical interval, so the first disk management module does not read data from the third Block. One possible form of this decision is sketched below.
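As a non-limiting illustration, the following sketch (reusing the DiskManagementModule structure assumed above) shows the step-804 decision: a sub-read without the association flag is served only from a Block whose mapping carries no flag, while a sub-read with the flag (used later in step 902) is served only from a flagged mapping. The parameter names are assumptions for the example.

    def handle_sub_read(module, segment, interval_start, want_associated=False):
        """Serve a sub-read on a DiskManagementModule; want_associated mirrors the association
        flag carried (step 902) or omitted (step 804) in the sub-read request."""
        entry = module.mapping.get(segment, {}).get(interval_start)
        if entry is None:
            return None                          # nothing mapped for this interval
        block_id, has_association_flag = entry
        if has_association_flag == want_associated:
            return module.blocks.get(block_id)   # the Block holds the kind of data asked for
        return None                              # refuse: the Block holds the other kind of data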
The following describes the cases in which the first sub-read requests are received by the disk management modules of OSD1 to OSD3, respectively:
After receiving the sub-read request (containing no association flag) for the sub-logical interval [ 128KB, 192KB ], the disk management module of OSD1 queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk1;128KB:Blk20:R;192KB:Blk20:R]
According to the mapping table, the sub-logical interval [ 128KB, 192KB ] is only associated with Blk20 (the mapping carries the association flag R), and the data stored in Blk20 is not the data of this interval, so the disk management module of OSD1 does not return data for the sub-logical interval [ 128KB, 192KB ] to Server1.
After receiving the sub-read request (containing no association flag) for the sub-logical interval [ 128KB, 192KB ], the disk management module of OSD2 queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk2;128KB:Blk100;192KB:Blk100:R]
According to the mapping table, the sub-logical interval [ 128KB, 192KB ] has a direct mapping (no association flag R) to Blk100, so the disk management module of OSD2 reads the data copy B2 from Blk100 on OSD2 and returns it to Server1.
After receiving the sub-read request (containing no association flag) for the sub-logical interval [ 128KB, 192KB ], the disk management module of OSD3 queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk3;128KB:Blk60:R;192KB:Blk60]
According to the mapping table, the sub-logical interval [ 128KB, 192KB ] is only associated with Blk60 (the mapping carries the association flag R), and the data stored in Blk60 is not the data of this interval, so the disk management module of OSD3 does not return data for the sub-logical interval [ 128KB, 192KB ] to Server1.
This completes the flow shown in fig. 8, which implements the processing of a read request.
For one embodiment, after step 803, if the target cluster node does not successfully read the data of the third sub-logical interval, the fault handling process shown in fig. 9 may be executed.
As shown in fig. 9, the process may include the following steps:
Step 901, the target cluster node sends a second sub-read request for the third sub-logical interval to each first disk management module, where the second sub-read request carries an association flag.
Here, the second sub-read request is named for convenience of distinction and is not intended to be limiting.
By sending a second sub-read request carrying the association flag, the target cluster node instructs the first disk management module to return the data associated with the data of the third sub-logical interval.
Step 902, when determining that the mapping relationship between the locally recorded third sub-logical interval and the third Block carries an association flag, the first disk management module reads the corresponding data from the third Block and returns it to the target cluster node.
Still taking the sub-logical interval [ 128KB, 192KB ] as an example: if the data of this interval could not be read through the flow shown in fig. 8 (for instance, because OSD2 has failed), Server1 sends sub-read requests carrying the association flag to the disk management modules of OSD1 to OSD3, respectively.
After receiving the sub-read request (carrying the association flag) for the sub-logical interval [ 128KB, 192KB ], the disk management module of OSD1 queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk1;128KB:Blk20:R;192KB:Blk20:R]
According to the mapping table, the sub-logical interval [ 128KB, 192KB ] is associated with Blk20 (the mapping carries the association flag R); that is, data associated with the data of this interval is stored in Blk20. The disk management module of OSD1 therefore reads the associated data A1 from Blk20 and returns it to Server1.
OSD2 has failed, so its disk management module cannot return data.
After receiving the sub-read request (carrying the association flag) for the sub-logical interval [ 128KB, 192KB ], the disk management module of OSD3 queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk3;128KB:Blk60:R;192KB:Blk60]
According to the mapping table, the sub-logical interval [ 128KB, 192KB ] is associated with Blk60 (the mapping carries the association flag R); that is, data associated with the data of this interval is stored in Blk60. The disk management module of OSD3 therefore reads the associated data C3 from Blk60 and returns it to Server1.
Step 903, the target cluster node sends a third sub-read request for the third sub-logical interval to the second disk management module, where the third sub-read request includes an association flag.
Here, the third sub-read request is named for convenience of description only and is not limited.
As described above, the second disk management module is the disk management module corresponding to the second disk, which stores the check data of the target Segment.
By sending the third sub-read request to the second disk management module, the target cluster node obtains the check data related to the data of the third sub-logical interval.
Step 904, the second disk management module reads the check data from the fourth Block according to the locally recorded mapping relationship between the third sub-logical interval and the fourth Block, and returns the check data to the target cluster node.
Here, the fourth Block is named only for convenience of description and is not intended to be limiting.
Taking the example in which Server1 sends a sub-read request for the sub-logical interval [ 128KB, 192KB ] to the disk management module of OSD4, that module queries its locally recorded mapping between sub-logical intervals and Blocks:
Seg1→[0:Blk4:R;128KB:Blk4:R;192KB:Blk4:R]
It then reads the check data P from Blk4, which is mapped to the sub-logical interval [ 128KB, 192KB ], and returns it to Server1.
Step 905, the target cluster node performs a check calculation according to the data returned by the first disk management modules and the check data returned by the second disk management module, to obtain the data corresponding to the third sub-logical interval.
For example, Server1 performs a check calculation on A1 returned by the disk management module of OSD1, C3 returned by the disk management module of OSD3, and P returned by the disk management module of OSD4, to recover the data B2 of the sub-logical interval [ 128KB, 192KB ] that was stored on the failed disk OSD2. One possible form of this reconstruction is sketched below.
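As a non-limiting illustration, the following sketch assumes a single-parity (XOR) check calculation over the 3+1 layout of the example, which is only the simplest case; the embodiment does not fix a particular erasure code, and other codes (for example Reed-Solomon) could be used instead.

    def xor_blocks(*blocks):
        """Byte-wise XOR of equally sized data blocks."""
        result = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                result[i] ^= byte
        return bytes(result)

    # The check data was computed as P = A xor B xor C when the Segment went cold,
    # so the missing copy is recovered as B2 = A1 xor C3 xor P.
    a1, c3 = b"\x11" * 8, b"\x22" * 8            # surviving associated data (toy values)
    p = xor_blocks(a1, b"\x0f" * 8, c3)          # pretend b"\x0f" * 8 is the lost B2
    b2 = xor_blocks(a1, c3, p)
    assert b2 == b"\x0f" * 8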
Thus, the flow shown in fig. 9 is completed, and data can still be read when a disk fails.
Based on the storage structure in the embodiment of the present application, a data reconstruction process after a disk failure is described below.
When it is determined that a failed disk exists in the N first disks, the target cluster node executes the data reconstruction process shown in fig. 10 for each fourth sub-logical interval into which data has been written in the target Segment. Here, the fourth sub-logical interval is only named for convenience of distinction and is not intended to be limiting.
As shown in fig. 10, the process may include the following steps:
Step 1001, the target cluster node reads the data corresponding to the fourth sub-logical interval.
The process of reading the data corresponding to the fourth sub-logic interval by the target cluster node may refer to the foregoing read request processing flow, which is not described herein again.
Taking the distributed storage cluster shown in fig. 7 as an example: if OSD2 fails, data reconstruction must be performed for each sub-logical interval of Seg1 to which data has been written ([ 0, 64KB ], [ 128KB, 192KB ], and [ 192KB, 256KB ]).
Taking the sub-logical interval [ 0, 64KB ] as an example, Server1 can read the corresponding data D1 directly from OSD1.
Taking the sub-logical interval [ 128KB, 192KB ] as an example: since OSD2 has failed, the data of this interval cannot be read directly, so Server1 reads A1, C3, and P and performs the check calculation to obtain the data B2 corresponding to the sub-logical interval [ 128KB, 192KB ].
Step 1002, the target cluster node sends a write command for instructing to write data corresponding to the fourth sub-logical interval to each third disk management module.
Here, the third disk management modules are the disk management modules corresponding to those of the N first disks and the second disk other than the failed disk and the disk that returned the data of the fourth sub-logical interval.
For example, Server1 reads the data D1 of the sub-logical interval [ 0, 64KB ] from OSD1, and then sends a write command instructing that the data D1 of the sub-logical interval [ 0, 64KB ] be written to the disk management modules of OSD3 and OSD4, i.e., all disks except OSD1 (the data source disk) and OSD2 (the failed disk).
Step 1003, when determining that the mapping relationship between the locally recorded fourth sub-logic interval and the fifth Block has an association mark, the third disk management module allocates a sixth Block to the fourth sub-logic interval, and writes data corresponding to the fourth sub-logic interval into the sixth Block.
Here, the fifth Block and the sixth Block are only named for convenience of distinction and are not intended to be limiting.
It should be noted that when the third disk management module determines that the locally recorded mapping between the fourth sub-logical interval and the fifth Block carries the association flag, this indicates that the corresponding disk holds (in the fifth Block) only data associated with the data of the fourth sub-logical interval, not that data itself. The third disk management module therefore needs to allocate a new Block (the sixth Block) for the fourth sub-logical interval, store the data corresponding to the fourth sub-logical interval in the sixth Block, and go to step 1004.
Conversely, when the locally recorded mapping between the fourth sub-logical interval and the fifth Block carries no association flag, the data of the fourth sub-logical interval is already stored in the fifth Block, so the rebuild operation need not be executed.
Step 1004, the third disk management module updates the mapping relationship between the fourth sub-logic interval and the fifth Block to the mapping relationship between the fourth sub-logic interval and the sixth Block, and removes the corresponding association flag.
That is, a mapping relationship is established between the fourth sub-logical interval and the Block (the sixth Block) that now stores its data. Because a real mapping relationship now exists, the data of the fourth sub-logical interval no longer needs to be recovered via the associated data, so the original association flag is deleted.
In addition, after the mapping relationship is updated, the third disk management module may further check whether any locally recorded mapping relationship still includes the fifth Block; if not, no data depends on the data in the fifth Block for restoration, so the fifth Block can be reclaimed to save storage resources. One possible form of this rebuild handling is sketched below.
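As a non-limiting illustration, the following sketch (again using the data structures assumed earlier) shows steps 1003-1004 on a third disk management module: only a mapping that still carries the association flag is rebuilt, the data lands in a newly allocated Block, the flag is removed, and a Block no longer referenced by any mapping is reclaimed.

    def handle_rebuild_write(module, segment, interval_start, data):
        """Steps 1003-1004 on a third disk management module (illustrative only)."""
        seg_map = module.mapping.get(segment, {})
        entry = seg_map.get(interval_start)
        if entry is None or entry[1] is False:
            return                                          # real data already present: nothing to rebuild
        old_block_id, _ = entry
        new_block_id = module._alloc_block()                # the "sixth Block" in the text
        module.blocks[new_block_id] = data
        seg_map[interval_start] = (new_block_id, False)     # direct mapping, association flag removed
        # Reclaim the old Block if no mapping on this module still references it.
        still_referenced = any(block_id == old_block_id
                               for intervals in module.mapping.values()
                               for block_id, _ in intervals.values())
        if not still_referenced:
            module.blocks.pop(old_block_id, None)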
Take the example in which Server1 sends write commands instructing that the data block D corresponding to the sub-logical interval [ 0, 64KB ] be written, to the disk management modules of OSD3 and OSD4, respectively:
After the disk management module of OSD3 receives the write command, it queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk3;128KB:Blk60:R;192KB:Blk60]
The mapping between the sub-logical interval [ 0, 64KB ] and Blk3 carries no association flag R, i.e., the data of the sub-logical interval [ 0, 64KB ] is already stored in Blk3, so the write operation need not be executed.
After the disk management module of OSD4 receives the write command, it queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk4:R;128KB:Blk4:R;192KB:Blk4:R]
The mapping between the sub-logical interval [ 0, 64KB ] and Blk4 carries the association flag R, so the disk management module of OSD4 allocates a new Block for the sub-logical interval [ 0, 64KB ], denoted Blk70, and writes the data of the sub-logical interval [ 0, 64KB ] into Blk70. In OSD4 this data is denoted D4 (a data copy of the data block D). The updated mapping table of Seg1 on the disk management module of OSD4 is:
Seg1→[0:Blk70;128KB:Blk4:R;192KB:Blk4:R]
at this time, the mapping relationship of Seg1 in the distributed storage cluster is as shown in fig. 11.
Continuing from fig. 11, Server1 sends write commands instructing that the data block B corresponding to the sub-logical interval [ 128KB, 192KB ] be written, to the disk management modules of OSD1, OSD3, and OSD4, respectively:
After the disk management module of OSD1 receives the write command, it queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk1;128KB:Blk20:R;192KB:Blk20:R]
The mapping between the sub-logical interval [ 128KB, 192KB ] and Blk20 carries the association flag R, so the disk management module of OSD1 allocates a new Block for the sub-logical interval [ 128KB, 192KB ], denoted Blk10, and writes the data of the sub-logical interval [ 128KB, 192KB ] into Blk10. In OSD1 this data is denoted B1 (a data copy of the data block B). The updated mapping table of Seg1 on the disk management module of OSD1 is:
Seg1→[0:Blk1;128KB:Blk10;192KB:Blk20:R]
After the disk management module of OSD3 receives the write command, it queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk3;128KB:Blk60:R;192KB:Blk60]
The mapping between the sub-logical interval [ 128KB, 192KB ] and Blk60 carries the association flag R, so the disk management module of OSD3 allocates a new Block for the sub-logical interval [ 128KB, 192KB ], denoted Blk20, and writes the data of the sub-logical interval [ 128KB, 192KB ] into Blk20. In OSD3 this data is denoted B3 (a data copy of the data block B). The updated mapping table of Seg1 on the disk management module of OSD3 is:
Seg1→[0:Blk3;128KB:Blk20;192KB:Blk60]
After the disk management module of OSD4 receives the write command, it queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk70;128KB:Blk4:R;192KB:Blk4:R]
The mapping between the sub-logical interval [ 128KB, 192KB ] and Blk4 carries the association flag R, so the disk management module of OSD4 allocates a new Block for the sub-logical interval [ 128KB, 192KB ], denoted Blk100, and writes the data of the sub-logical interval [ 128KB, 192KB ] into Blk100. In OSD4 this data is denoted B4 (a data copy of the data block B). The updated mapping table of Seg1 on the disk management module of OSD4 is:
Seg1→[0:Blk70;128KB:Blk100;192KB:Blk4:R]
at this time, the mapping relationship of Seg1 in the distributed storage cluster is as shown in fig. 12.
Continuing from fig. 12, Server1 sends write commands instructing that the data block C corresponding to the sub-logical interval [ 192KB, 256KB ] be written, to the disk management modules of OSD1 and OSD4, respectively:
After the disk management module of OSD1 receives the write command, it queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk1;128KB:Blk10;192KB:Blk20:R]
The mapping between the sub-logical interval [ 192KB, 256KB ] and Blk20 carries the association flag R, so the disk management module of OSD1 allocates a new Block for the sub-logical interval [ 192KB, 256KB ], denoted Blk100, and writes the data of the sub-logical interval [ 192KB, 256KB ] into Blk100. In OSD1 this data is denoted C1 (a data copy of the data block C). The updated mapping table of Seg1 on the disk management module of OSD1 is:
Seg1→[0:Blk1;128KB:Blk10;192KB:Blk100]
It should be noted that after updating 192KB:Blk20:R to 192KB:Blk100, the disk management module of OSD1 can determine that no locally recorded mapping still includes Blk20, i.e., the data A1 in Blk20 is no longer needed, so Blk20 can be reclaimed.
After the disk management module of OSD4 receives the write command, it queries its locally recorded mapping table of Seg1:
Seg1→[0:Blk70;128KB:Blk100;192KB:Blk4:R]
The mapping between the sub-logical interval [ 192KB, 256KB ] and Blk4 carries the association flag R, so the disk management module of OSD4 allocates a new Block for the sub-logical interval [ 192KB, 256KB ], denoted Blk20, and writes the data of the sub-logical interval [ 192KB, 256KB ] into Blk20. In OSD4 this data is denoted C4 (a data copy of the data block C). The updated mapping table of Seg1 on the disk management module of OSD4 is:
Seg1→[0:Blk70;128KB:Blk100;192KB:Blk20]
It should be noted that after updating 192KB:Blk4:R to 192KB:Blk20, the disk management module of OSD4 can determine that no locally recorded mapping still includes Blk4, i.e., the check data P in Blk4 is no longer needed, so Blk4 can be reclaimed.
At this time, the mapping relationship of Seg1 in the distributed storage cluster is as shown in fig. 13.
This completes the flow shown in fig. 10, through which data reconstruction is achieved. The target-node side of the rebuild is sketched below.
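As a non-limiting illustration, the following sketch shows how the target cluster node might drive steps 1001-1002, reusing the handle_rebuild_write sketch above. The written_intervals and read_interval helpers, and the disk objects with a module attribute, are purely hypothetical names introduced for this example.

    def rebuild_segment(target_node, segment, first_disks, parity_disk, failed_disk):
        """Illustrative target-node rebuild loop; helper names are hypothetical."""
        for interval in target_node.written_intervals(segment):
            # Step 1001: read the interval's data; this may go through the degraded-read path
            # (association-flag reads plus the check calculation) when its copy sat on the failed disk.
            data, source_disk = target_node.read_interval(segment, interval)
            # Step 1002: push the data to every disk module other than the failed disk
            # and the disk the data was read from.
            for disk in first_disks + [parity_disk]:
                if disk in (failed_disk, source_disk):
                    continue
                handle_rebuild_write(disk.module, segment, interval, data)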
The method provided by the embodiment of the present application is described above, and the distributed storage cluster provided by the embodiment of the present application is described below:
the distributed storage cluster comprises at least one cluster node, each cluster node comprises at least one disk for storing data, each disk is divided into a plurality of blocks according to the preset Block size, the distributed storage cluster is configured with at least one LUN, each LUN is divided into a plurality of logic sections according to the preset Segment size, each logic section is divided into a plurality of sub-logic sections according to the preset Block size, each cluster node deploys a corresponding disk management module for each disk on the node, each Segment corresponds to a write bitmap, each bit in the write bitmap is used for identifying whether the corresponding sub-logic section has been written with data, each Segment also corresponds to a storage bitmap, each bit in the storage bitmap is used for identifying the storage mode of the data of the corresponding sub-logic section, and the distributed storage cluster is written with an N copy mode, wherein N is greater than or equal to 2;
the target cluster node in the at least one cluster node is used for acquiring a write bitmap and a storage bitmap corresponding to the target Segment when the monitored duration of not accessing the target Segment reaches a preset duration; traversing the writing bitmap and the storage bitmap, and finding N first sub-logic intervals in which data are written and the storage mode is a copy mode; selecting a target disk from N first disks for storing data corresponding to the target Segment; for each first sub-logic interval, sending a read command for reading data corresponding to the first sub-logic interval to a target disk management module corresponding to the target disk; calculating check data according to the data of each first sub-logic interval returned by the target disk management module; sending a write command for indicating to write the verification data into a second disk to a second disk management module corresponding to the second disk, wherein the second disk is a disk which is specified in advance and used for storing the verification data corresponding to the target Segment;
the second disk management module is configured to allocate a first Block to the check data from the second disk, and write the check data into the first Block;
the target cluster node is further configured to, for each first disk, send a deletion command for instructing deletion of the designated data copy to the first disk management module corresponding to the first disk, so that the data corresponding to the N first sub-logical intervals are stored on the N first disks, respectively.
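As a non-limiting illustration, the following sketch outlines the cold-data conversion performed by the target cluster node, again assuming an XOR check calculation and the illustrative data structures used above. All method names on the segment and disk objects (read, write_parity, delete_copies_except, set_storage_mode) are hypothetical and introduced only for this example.

    def convert_cold_segment(segment, first_disks, parity_disk, block_size=64 * 1024):
        """Illustrative cold-data conversion: copies -> one copy per disk plus XOR check data."""
        # Find N written, copy-mode sub-logical intervals from the write and storage bitmaps.
        intervals = [i * block_size
                     for i, (w, s) in enumerate(zip(segment.wmap, segment.smap))
                     if w == "1" and s == "0"][:len(first_disks)]
        target_disk = first_disks[0]                       # any first disk still holding all copies
        chunks = [target_disk.module.read(segment.name, start) for start in intervals]
        parity = xor_blocks(*chunks)                       # check data (XOR assumed for illustration)
        parity_disk.module.write_parity(segment.name, intervals, parity)
        for disk, start in zip(first_disks, intervals):
            # Each first disk keeps the copy for one designated interval and deletes the others,
            # so the N intervals' data ends up spread one-per-disk.
            disk.module.delete_copies_except(segment.name, keep_interval=start)
        for start in intervals:
            segment.set_storage_mode(start, erasure=True)  # flip the storage-bitmap bit to erasure code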
As an embodiment, the target cluster node is further configured to, for each first sub-logical interval, update a storage manner identified by a corresponding bit in the storage bitmap of the first sub-logical interval to be an erasure code manner.
As an embodiment, the target cluster node is further configured to, when receiving a write request that needs to be written into the target Segment, determine second sub-logic intervals in the target Segment that the write request relates to, and split the write request into sub-write requests for each second sub-logic interval; for each second sub-logic interval, inquiring the writing bitmap of the target Segment, and determining whether the second sub-logic interval is written with data; if the data is written into the second sub-logic interval, inquiring the storage bitmap of the target Segment, and determining the storage mode of the data corresponding to the second sub-logic interval; if the storage mode of the data corresponding to the second sub-logic interval is an erasure code mode, respectively sending a sub-write request aiming at the second sub-logic interval to each first disk management module, wherein the sub-write request carries the data to be written into the second sub-logic interval and an erasure code mark;
the first disk management module is further configured to allocate a second Block to the second sub-logic interval when it is determined that the sub-write request carries the erasure code flag, write data into the second Block, and record a mapping relationship between the second sub-logic interval and the second Block;
and the target cluster node is further configured to update the storage mode identified by the corresponding bit in the storage bitmap of the second sub-logic interval to be a copy mode.
As an embodiment, the first disk management module is further configured to record a mapping relationship between a first sub-logic interval to which a deleted data copy belongs and a Block in which an undeleted data copy is located, and add an association tag to the mapping relationship;
the target cluster node is further configured to, when receiving a read request that requires reading of the target Segment, determine each third sub-logic interval in the target Segment that the read request relates to, and split the read request into sub-read requests for each third sub-logic interval; inquiring the storage bitmap aiming at each third sub-logic interval, and determining a storage mode of data corresponding to the third sub-logic interval; if the storage mode of the data corresponding to the third sub-logic interval is an erasure code mode, respectively sending a first sub-read request aiming at the third sub-logic interval to each first disk management module, wherein the first sub-read request does not include an association mark;
and the first disk management module is further configured to, when it is determined that the mapping relationship between the locally recorded third sub-logic interval and the third Block does not have an association flag, read corresponding data from the third Block and return the corresponding data to the target cluster node.
As an embodiment, the second disk management module is further configured to record a mapping relationship between each first sub-logic interval and the first Block, and add an association flag;
the target cluster node is further configured to send a second sub-read request for the third sub-logic interval to each first disk management module if the data of the third sub-logic interval is not read, where the second sub-read request carries an association flag;
the first disk management module is further configured to, when it is determined that a mapping relationship between the locally recorded third sub-logic interval and a third Block has an association flag, read corresponding data from the third Block and return the corresponding data to the target cluster node;
the target cluster node is further configured to send a third sub-read request for the third sub-logic interval to the second disk management module;
the second disk management module is further configured to read check data from a fourth Block according to a locally recorded mapping relationship between the third sub-logic interval and the fourth Block, and return the check data to the target cluster node;
and the target cluster node is further configured to perform verification calculation according to the data returned by the first disk management module and the verification data returned by the second disk management module, so as to obtain data corresponding to the third sub-logic interval.
As an embodiment, the target cluster node is further configured to, when it is determined that a failed disk exists in the N first disks, read, for each fourth sub-logical interval to which data has been written in the target Segment, data corresponding to the fourth sub-logical interval; sending a write command for indicating to write data corresponding to the fourth sub-logic interval to each third disk management module, wherein the third disk management module refers to a disk management module corresponding to a disk except for the failed disk and the disk returning the data corresponding to the fourth sub-logic interval in the first disk and the second disk;
the third disk management module is configured to allocate a sixth Block to a fourth sub-logic interval when it is determined that a mapping relationship between the locally recorded fourth sub-logic interval and the fifth Block has an association flag, and write data corresponding to the fourth sub-logic interval into the sixth Block; and updating the mapping relation between the fourth sub-logic interval and the fifth Block into the mapping relation between the fourth sub-logic interval and the sixth Block, and removing the corresponding association mark.
As an embodiment, the third disk management module is further configured to determine whether a mapping relationship including the fifth Block further exists; if not, the fifth Block is recovered.
As can be seen from the above description, in the embodiment of the present application, the distributed storage cluster writes data in copy mode, which ensures the data writing performance of the distributed storage cluster; and when the data becomes cold, the data stored in copy mode is converted to erasure code storage, which reduces disk consumption and improves the usable disk capacity ratio of the distributed storage cluster.
The above description is only a preferred embodiment of the present application, and should not be taken as limiting the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present application shall be included in the scope of the present application.

Claims (14)

1. A data redundancy method is applied to a distributed storage cluster, the distributed storage cluster comprises at least one cluster node, each cluster node comprises at least one disk for storing data, each disk is divided into a plurality of blocks according to a preset Block size, the distributed storage cluster is configured with at least one LUN, each LUN is divided into a plurality of logical sections according to a preset Segment size, each logical section is divided into a plurality of sub-logical sections according to the preset Block size, each cluster node is used for deploying a corresponding disk management module for each disk on the node, each Segment corresponds to a writing bitmap, each bit in the writing bitmap is used for identifying whether data are written into the corresponding sub-logical section or not, each Segment also corresponds to a storage bitmap, and each bit in the storage bitmap is used for identifying a storage mode of the data of the corresponding sub-logical section, the distributed storage cluster is written in an N copy mode, wherein N is greater than or equal to 2, and the method comprises the following steps:
when monitoring that the time length of the target Segment which is not accessed reaches the preset time length, a target cluster node in the at least one cluster node acquires a writing bitmap and a storage bitmap corresponding to the target Segment;
the target cluster node traverses the writing bitmap and the storage bitmap and finds N first sub-logic intervals with written data and a storage mode of a copy mode;
the target cluster node selects a target disk from N first disks used for storing the data corresponding to the target Segment;
for each first sub-logic interval, the target cluster node sends a read command for reading data corresponding to the first sub-logic interval to a target disk management module corresponding to the target disk;
the target cluster node calculates check data according to the data of each first sub-logic interval returned by the target disk management module;
the target cluster node sends a write command for indicating to write the verification data into a second disk to a second disk management module corresponding to the second disk, wherein the second disk is a disk which is specified in advance and used for storing the verification data corresponding to the target Segment;
the second disk management module allocates a first Block for the check data from the second disk, and writes the check data into the first Block;
and for each first disk, the target cluster node sends a deletion command for indicating deletion of the designated data copy to a first disk management module corresponding to the first disk, so that the data corresponding to the N first sub-logic intervals are stored in the N first disks respectively.
2. The method of claim 1, wherein after the target cluster node sends, for each first disk, a delete command for instructing deletion of the designated data copy to the first disk management module corresponding to the first disk, the method further comprises:
and for each first sub-logic interval, the target cluster node updates the storage mode identified by the corresponding bit of the first sub-logic interval in the storage bitmap to be an erasure code mode.
3. The method of claim 1, wherein the method further comprises:
when the target cluster node receives a write request needing to be written into the target Segment, determining each second sub-logic interval in the target Segment related to the write request, and splitting the write request into sub-write requests aiming at each second sub-logic interval;
for each second sub-logical interval, the following processing is performed:
the target cluster node inquires the writing bitmap of the target Segment and determines whether data is written into the second sub-logic interval;
if the data is written into the second sub-logic interval, the target cluster node inquires the storage bitmap of the target Segment and determines the storage mode of the data corresponding to the second sub-logic interval;
if the storage mode of the data corresponding to the second sub-logic interval is an erasure code mode, the target cluster node sends a sub-write request aiming at the second sub-logic interval to each first disk management module respectively, and the sub-write request carries the data to be written into the second sub-logic interval and an erasure code mark;
when the first disk management module determines that the sub-write request carries an erasure code mark, allocating a second Block to the second sub-logic interval, writing data into the second Block, and recording a mapping relation between the second sub-logic interval and the second Block;
and the target cluster node updates the storage mode identified by the corresponding bit of the second sub-logic interval in the storage bitmap to be a copy mode.
4. The method according to claim 1, wherein, for each first disk, after the target cluster node sends a delete command for instructing to delete the designated data copy to the first disk management module corresponding to the first disk, so that the data corresponding to the N first sub-logical intervals are stored in the N first disks, the method further includes:
the first disk management module records the mapping relation between a first sub-logic interval to which the deleted data copy belongs and a Block to which the undeleted data copy belongs, and adds an association mark to the mapping relation;
when the target cluster node receives a read request needing to read the target Segment, determining each third sub-logic interval in the target Segment related to the read request, and splitting the read request into sub-read requests aiming at each third sub-logic interval;
for each third sub-logical interval, the following processing is performed:
the target cluster node inquires the storage bitmap and determines a storage mode of data corresponding to a third sub-logic interval;
if the storage mode of the data corresponding to the third sub-logic interval is an erasure code mode, the target cluster node sends a first sub-read request aiming at the third sub-logic interval to each first disk management module respectively, and the first sub-read request does not include an association mark;
and when determining that the mapping relation between the locally recorded third sub-logic interval and the third Block does not have the associated mark, the first disk management module reads the corresponding data from the third Block and returns the corresponding data to the target cluster node.
5. The method of claim 4, wherein the write command includes a start address of each first sub-logical interval, and after the second disk management module allocates a first Block to the parity data from the second disk and writes the parity data into the first Block, the method further comprises:
the second disk management module records the mapping relation between each first sub-logic interval and the first Block and adds an association mark;
after the target cluster node sends the first sub-read request for the third sub-logic interval to each first disk management module, the method further includes:
if the data of the third sub-logic interval is not read, the target cluster node sends a second sub-read request aiming at the third sub-logic interval to each first disk management module, wherein the second sub-read request carries an association mark;
when determining that the mapping relation between the locally recorded third sub-logic interval and the third Block has an association mark, the first disk management module reads corresponding data from the third Block and returns the corresponding data to the target cluster node;
the target cluster node sends a third sub-read request aiming at the third sub-logic interval to the second disk management module;
the second disk management module reads check data from a fourth Block and returns the check data to the target cluster node according to a locally recorded mapping relation between the third sub-logic interval and the fourth Block;
and the target cluster node performs verification calculation according to the data returned by the first disk management module and the verification data returned by the second disk management module to obtain the data corresponding to the third sub-logic interval.
6. The method of claim 1, wherein the method further comprises:
when the target cluster node determines that a failed disk exists in the N first disks, the following processing is performed for each fourth sub-logic interval of the data written in the target Segment:
the target cluster node reads data corresponding to the fourth sub-logic interval;
the target cluster node sends a write command for indicating to write data corresponding to the fourth sub-logic interval to each third disk management module, wherein the third disk management module refers to a disk management module corresponding to a disk except for a failed disk and a disk returning data corresponding to the fourth sub-logic interval in the first disk and the second disk;
when determining that a mapping relation between a locally recorded fourth sub-logic interval and a fifth Block has an association mark, a third disk management module allocates a sixth Block to the fourth sub-logic interval and writes data corresponding to the fourth sub-logic interval into the sixth Block;
and the third disk management module updates the mapping relation between the fourth sub-logic interval and the fifth Block into the mapping relation between the fourth sub-logic interval and the sixth Block, and removes the corresponding association mark.
7. The method of claim 6, wherein after the third disk management module updates the mapping relationship between the fourth sub-logical interval and the fifth Block to the mapping relationship between the fourth sub-logical interval and the sixth Block, the method further comprises:
the third disk management module judges whether a mapping relation comprising the fifth Block still exists;
if not, the fifth Block is recovered.
8. A distributed storage cluster is characterized by comprising at least one cluster node, each cluster node comprises at least one disk for storing data, each disk is divided into a plurality of blocks according to a preset Block size, the distributed storage cluster is configured with at least one LUN, each LUN is divided into a plurality of logical intervals according to a preset Segment size, each logical interval is divided into a plurality of sub-logical intervals according to the preset Block size, each cluster node deploys a corresponding disk management module for each disk on the node, each Segment corresponds to a write bitmap, each bit in the write bitmap is used for identifying whether data are written into the corresponding sub-logical interval or not, each Segment also corresponds to a storage bitmap, and each bit in the storage bitmap is used for identifying a storage mode of the data of the corresponding sub-logical interval, the distributed storage cluster is written in by adopting an N copy mode, wherein N is more than or equal to 2;
the target cluster node in the at least one cluster node is used for acquiring a write bitmap and a storage bitmap corresponding to the target Segment when the monitored duration of not accessing the target Segment reaches a preset duration; traversing the writing bitmap and the storage bitmap, and finding N first sub-logic intervals in which data are written and the storage mode is a copy mode; selecting a target disk from N first disks for storing data corresponding to the target Segment; for each first sub-logic interval, sending a read command for reading data corresponding to the first sub-logic interval to a target disk management module corresponding to the target disk; calculating check data according to the data of each first sub-logic interval returned by the target disk management module; sending a write command for indicating to write the verification data into a second disk to a second disk management module corresponding to the second disk, wherein the second disk is a disk which is specified in advance and used for storing the verification data corresponding to the target Segment;
the second disk management module is configured to allocate a first Block to the check data from the second disk, and write the check data into the first Block;
the target cluster node is further configured to send, to each first disk, a deletion command for instructing to delete the designated data copy to the first disk management module corresponding to the first disk, so that the data corresponding to the N first sub-logical intervals are stored in the N first disks, respectively.
9. The cluster of claim 8, wherein:
and the target cluster node is further configured to, for each first sub-logic interval, update the storage mode identified by the corresponding bit of the first sub-logic interval in the storage bitmap to be an erasure code mode.
10. The cluster of claim 8, wherein:
the target cluster node is further configured to, when receiving a write request that needs to be written into the target Segment, determine each second sub-logic interval in the target Segment that the write request relates to, and split the write request into sub-write requests for each second sub-logic interval; for each second sub-logic interval, inquiring the writing bitmap of the target Segment, and determining whether the second sub-logic interval is written with data; if the data is written into the second sub-logic interval, inquiring the storage bitmap of the target Segment, and determining the storage mode of the data corresponding to the second sub-logic interval; if the storage mode of the data corresponding to the second sub-logic interval is an erasure code mode, respectively sending a sub-write request aiming at the second sub-logic interval to each first disk management module, wherein the sub-write request carries the data to be written into the second sub-logic interval and an erasure code mark;
the first disk management module is further configured to allocate a second Block to the second sub-logic interval when it is determined that the sub-write request carries the erasure code flag, write data into the second Block, and record a mapping relationship between the second sub-logic interval and the second Block;
and the target cluster node is further configured to update the storage mode identified by the corresponding bit in the storage bitmap of the second sub-logic interval to be a copy mode.
11. The cluster of claim 8, wherein:
the first disk management module is further configured to record a mapping relationship between a first sub-logic interval to which the deleted data copy belongs and a Block to which the undeleted data copy belongs, and add an association tag to the mapping relationship;
the target cluster node is further configured to, when receiving a read request that requires reading of the target Segment, determine each third sub-logic interval in the target Segment that the read request relates to, and split the read request into sub-read requests for each third sub-logic interval; inquiring the storage bitmap aiming at each third sub-logic interval, and determining a storage mode of data corresponding to the third sub-logic interval; if the storage mode of the data corresponding to the third sub-logic interval is an erasure code mode, respectively sending a first sub-read request aiming at the third sub-logic interval to each first disk management module, wherein the first sub-read request does not include an association mark;
and the first disk management module is further configured to, when it is determined that the mapping relationship between the locally recorded third sub-logic interval and the third Block does not have an association flag, read corresponding data from the third Block and return the corresponding data to the target cluster node.
12. The cluster of claim 11, wherein:
the second disk management module is further configured to record a mapping relationship between each first sub-logic interval and the first Block, and add an association tag;
the target cluster node is further configured to send a second sub-read request for the third sub-logic interval to each first disk management module if the data of the third sub-logic interval is not read, where the second sub-read request carries an association flag;
the first disk management module is further configured to, when it is determined that a mapping relationship between the locally recorded third sub-logic interval and a third Block has an association flag, read corresponding data from the third Block and return the corresponding data to the target cluster node;
the target cluster node is further configured to send a third sub-read request for the third sub-logic interval to the second disk management module;
the second disk management module is further configured to read check data from a fourth Block according to a locally recorded mapping relationship between the third sub-logic interval and the fourth Block, and return the check data to the target cluster node;
and the target cluster node is further configured to perform verification calculation according to the data returned by the first disk management module and the verification data returned by the second disk management module, so as to obtain data corresponding to the third sub-logic interval.
13. The cluster of claim 8, wherein:
the target cluster node is further configured to, when it is determined that a failed disk exists among the N first disks, read, for each fourth sub-logic interval in the target Segment to which data has been written, the data corresponding to the fourth sub-logic interval; and send, to each third disk management module, a write command instructing it to write the data corresponding to the fourth sub-logic interval, wherein the third disk management module refers to a disk management module corresponding to a disk, among the first disks and the second disk, other than the failed disk and the disk that returned the data corresponding to the fourth sub-logic interval;
and the third disk management module is configured to, when it is determined that the locally recorded mapping relationship between the fourth sub-logic interval and a fifth Block has an association flag, allocate a sixth Block to the fourth sub-logic interval and write the data corresponding to the fourth sub-logic interval into the sixth Block; and to update the mapping relationship between the fourth sub-logic interval and the fifth Block to a mapping relationship between the fourth sub-logic interval and the sixth Block, and remove the corresponding association flag.
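A sketch of the rebuild flow described above, under the assumption that the target cluster node already knows, per written sub-logic interval, which surviving module returned its data; the module interfaces and the read_sub_interval() helper reuse the conventions of the earlier sketches and are assumptions.

def rebuild_after_disk_failure(written_intervals, surviving_modules,
                               read_sub_interval):
    """written_intervals: iterable of (idx, source_module) pairs, where
    source_module is the module that returned the interval's data."""
    for idx, source_module in written_intervals:
        data = read_sub_interval(idx)      # read (or reconstruct) the data
        # "Third disk management modules": surviving modules other than the one
        # that returned the data (the failed disk's module is already excluded).
        for module in surviving_modules:
            if module is source_module:
                continue
            module.rebuild_write(idx, data)

class ThirdDiskManagementModule:
    def __init__(self):
        self.mapping = {}    # idx -> (block_id, association_flag)
        self.blocks = {}
        self._next_block = 0

    def rebuild_write(self, idx, data):
        entry = self.mapping.get(idx)
        if entry is None or not entry[1]:
            return                          # only flagged mappings are rewritten
        old_block, _ = entry                # the superseded "fifth Block"
        new_block = self._next_block        # allocate the "sixth Block"
        self._next_block += 1
        self.blocks[new_block] = data
        # Point the mapping at the new Block and clear the association flag;
        # reclaiming old_block is shown in the sketch after the next claim.
        self.mapping[idx] = (new_block, False)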
14. The cluster of claim 13, wherein:
the third disk management module is further configured to determine whether a mapping relationship that includes the fifth Block still exists; and if no such mapping relationship exists, reclaim the fifth Block.
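Continuing the same sketch, the Block reclamation recited above reduces to a reference check over the locally recorded mappings; the dictionary layout is the illustrative one used in the previous sketches.

def maybe_reclaim(mapping, blocks, old_block_id):
    # Free the superseded "fifth Block" only if no mapping still references it.
    still_referenced = any(block_id == old_block_id
                           for block_id, _flag in mapping.values())
    if not still_referenced:
        blocks.pop(old_block_id, None)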
CN202011025578.4A 2020-09-25 2020-09-25 Data redundancy method and distributed storage cluster Active CN112052124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011025578.4A CN112052124B (en) 2020-09-25 2020-09-25 Data redundancy method and distributed storage cluster

Publications (2)

Publication Number Publication Date
CN112052124A true CN112052124A (en) 2020-12-08
CN112052124B CN112052124B (en) 2023-09-22

Family

ID=73604833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011025578.4A Active CN112052124B (en) 2020-09-25 2020-09-25 Data redundancy method and distributed storage cluster

Country Status (1)

Country Link
CN (1) CN112052124B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060136691A1 (en) * 2004-12-20 2006-06-22 Brown Michael F Method to perform parallel data migration in a clustered storage environment
CN102981927A (en) * 2011-09-06 2013-03-20 阿里巴巴集团控股有限公司 Distribution type independent redundant disk array storage method and distribution type cluster storage system
CN102867035A (en) * 2012-08-28 2013-01-09 浪潮(北京)电子信息产业有限公司 High-availability method and device of distributed document system cluster
US20180300203A1 (en) * 2017-04-18 2018-10-18 Netapp, Inc. Systems and methods for backup and restore of distributed master-slave database clusters
CN109783016A (en) * 2018-12-25 2019-05-21 西安交通大学 A kind of elastic various dimensions redundancy approach in distributed memory system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU GUIHUA: "Research on Data Redundancy in Distributed Systems" (分布式系统中数据冗余的研究), Computer Knowledge and Technology (电脑知识与技术), No. 18 *

Also Published As

Publication number Publication date
CN112052124B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
CN108733761B (en) Data processing method, device and system
US9946460B2 (en) Storage subsystem and storage system architecture performing storage virtualization and method thereof
CN109947363B (en) Data caching method of distributed storage system
EP1569085B1 (en) Method and apparatus for increasing data storage capacity
CN107943867B (en) High-performance hierarchical storage system supporting heterogeneous storage
US7076622B2 (en) System and method for detecting and sharing common blocks in an object storage system
US5778411A (en) Method for virtual to physical mapping in a mapped compressed virtual storage subsystem
CN106951375B (en) Method and device for deleting snapshot volume in storage system
CN113868192B (en) Data storage device and method and distributed data storage system
US7933938B2 (en) File storage system, file storing method and file searching method therein
US11861204B2 (en) Storage system, memory management method, and management node
CN103399823B (en) The storage means of business datum, equipment and system
US20200341874A1 (en) Handling of offline storage disk
US11010072B2 (en) Data storage, distribution, reconstruction and recovery methods and devices, and data processing system
CN112181299B (en) Data restoration method and distributed storage cluster
US20220129346A1 (en) Data processing method and apparatus in storage system, and storage system
CN112052218B (en) Snapshot implementation method and distributed storage cluster
CN105068896B (en) Data processing method and device based on RAID backup
CN109189326B (en) Management method and device of distributed cluster
CN112052124B (en) Data redundancy method and distributed storage cluster
CN112083886B (en) Storage management method, system and device of NAS (network attached storage) equipment
CN111444114B (en) Method, device and system for processing data in nonvolatile memory
CN112052217B (en) Snapshot implementation method and device
CN113050891B (en) Method and device for protecting deduplication data
CN114281267B (en) Data migration method and device between distributed storage systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant