WO2019062856A1 - Data reconstruction method and apparatus, and data storage system - Google Patents

Data reconstruction method and apparatus, and data storage system

Publication number
WO2019062856A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
data
storage node
target
disk
Prior art date
Application number
PCT/CN2018/108342
Other languages
French (fr)
Chinese (zh)
Inventor
林鹏
汪渭春
林起芊
Original Assignee
杭州海康威视系统技术有限公司
Priority date
Filing date
Publication date
Application filed by 杭州海康威视系统技术有限公司
Publication of WO2019062856A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0635 Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/12 Shortest path evaluation
    • H04L45/122 Shortest path evaluation by minimising distances, e.g. by selecting a route with minimum of number of hops
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/29 Flow control; Congestion control using a combination of thresholds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • the present disclosure relates to the field of data storage technologies, and in particular, to a data reconstruction method and apparatus, and a data storage system.
  • SAS: Serial Attached Small Computer System Interface
  • the data storage system based on the SAS protocol includes: a metadata management server (Metadata Server, MDS for short), a SAS switch, and a plurality of storage nodes, and the plurality of storage nodes are connected to each other through the SAS switch.
  • Each storage node includes multiple disks.
  • a storage node in the data storage system cuts the target data into a plurality of data blocks (also referred to as striping the target data; the plurality of data blocks belong to the same stripe), and the plurality of data blocks are stored on separate disks.
  • the MDS can be used to store stripe information (a kind of metadata) of each data block, and the stripe information includes: a stripe identifier of the data block (that is, an identifier of the stripe to which the data block belongs), an identifier of the disk where the data block is located, the identifier of the data block, the data volume of the data block, and the erasure coding (EC) type of the data block.
  • the MDS may send a reconstruction instruction to the storage node, where the reconstruction instruction includes stripe information of each data block stored on the faulty disk, and the storage node may reconstruct (that is, restore) each data block stored on the failed disk in accordance with the stripe information of each data block in the reconstruction instruction.
  • however, a large number of data blocks are usually stored on a disk, so when the data processing capability of the storage node where the failed disk is located is weak, that storage node reconstructs the data stored on the failed disk slowly, and the data storage system has low data reconstruction efficiency.
  • the present disclosure provides a data reconstruction method and device, and a data storage system, which can solve the problem of low data reconstruction efficiency of the data storage system.
  • the technical solution is as follows:
  • a data reconstruction method for a metadata management server (MDS) in a data storage system, the data storage system further comprising: a Serial Attached Small Computer System Interface (SAS) switch and a plurality of storage nodes, the plurality of storage nodes are connected to each other by the SAS switch, and the method includes:
  • the n target storage nodes include a storage node different from the first storage node, and m ≥ n ≥ 1;
  • before the sending of the reconstruction instruction to the n target storage nodes of the plurality of storage nodes, the method further includes:
  • the n storage nodes with the shortest routing distance to the MDS among the plurality of storage nodes are determined as the n target storage nodes.
  • the sending of the reconstruction instructions to the n target storage nodes of the plurality of storage nodes respectively includes:
  • the method further includes:
  • the related data block is not stored on the target disk, and the related data block belongs to the same stripe as any one of the m data blocks.
  • the method further includes:
  • the storage information of each of the data blocks includes: an identifier of the target disk and an identifier of each of the data blocks;
  • each of the plurality of storage nodes includes: a storage disk and a cache disk, each storage node having read permission of the storage disk, and read and write permissions of the cache disk,
  • the n target storage nodes include a cache storage node, and the reconstruction instruction sent to the cache storage node is used to indicate: storing the reconstructed data block in a cache disk of the cache storage node; after the acquisition instruction is sent to the first storage node, the method further includes:
  • a data reconstruction apparatus for a metadata management server (MDS) in a data storage system, the data storage system further comprising: a Serial Attached Small Computer System Interface (SAS) switch and a plurality of storage nodes, the plurality of storage nodes are connected to each other by the SAS switch, and the apparatus includes:
  • a first sending module configured to send a reconstruction instruction to each of n target storage nodes of the plurality of storage nodes when detecting, in the first storage node, a faulty disk on which m data blocks are stored, where
  • the first storage node is any one of the plurality of storage nodes, the n reconstruction instructions sent to the n target storage nodes are used to indicate that the m data blocks are to be reconstructed and stored, the n target storage nodes include a storage node different from the first storage node, and m ≥ n ≥ 1;
  • a second sending module configured to send an acquisition instruction to the first storage node, where the acquisition instruction is used to instruct the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
  • the data reconstruction device further includes:
  • a first determining module configured to determine n storage nodes that are less loaded among the plurality of storage nodes as the n target storage nodes
  • the second determining module is configured to determine, as the n target storage nodes, the n storage nodes preset in the multiple storage nodes;
  • the third determining module is configured to determine, as the n target storage nodes, the n storage nodes with the shortest routing distance to the MDS among the plurality of storage nodes.
  • the first sending module is configured to:
  • the data reconstruction device further includes:
  • a first receiving module configured to receive a storage application message sent by the first storage node, where the storage application message includes a total data volume of the m data blocks;
  • a third sending module configured to send a storage instruction to the first storage node according to the storage application message, where the storage instruction is used to indicate that the reconstructed m data blocks are to be stored on a target disk, the target disk being a disk in the first storage node whose storage capacity is greater than or equal to the total data volume.
  • the related data block is not stored on the target disk, and the related data block belongs to the same stripe as any one of the m data blocks.
  • the data reconstruction device further includes:
  • a second receiving module configured to receive storage information of each of the data blocks sent by the first storage node, where the storage information of each data block includes: an identifier of the target disk and an identifier of each of the data blocks;
  • a fourth determining module configured to determine stripe information of each of the data blocks according to the identifier of each of the data blocks in the storage information of each data block;
  • a modification module configured to modify the identifier of the disk where each of the data blocks is located, in the stripe information of each data block, to the identifier of the target disk.
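The target-disk rule and the stripe-information update described above can be sketched in Python. This is only an illustration under stated assumptions: the `StripeRecord` type, the `free_bytes` map of healthy disks in the first storage node, and both function names are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class StripeRecord:
    stripe_id: str
    wwn: str   # identifier of the disk holding the block
    key: str   # identifier of the data block

def pick_target_disk(free_bytes, stripe_info, rebuilt_keys, total_bytes):
    """Pick a disk with enough capacity that holds no related data block.

    free_bytes: {wwn: free capacity} for healthy disks of the first storage node
    (an assumed input; the disclosure does not say how capacity is tracked).
    """
    rebuilt_stripes = {r.stripe_id for r in stripe_info if r.key in rebuilt_keys}
    # A related block shares a stripe with a rebuilt block but is not itself rebuilt.
    excluded = {r.wwn for r in stripe_info
                if r.stripe_id in rebuilt_stripes and r.key not in rebuilt_keys}
    for wwn, free in sorted(free_bytes.items()):
        if free >= total_bytes and wwn not in excluded:
            return wwn
    return None  # no suitable disk found

def retarget(stripe_info, rebuilt_keys, target_wwn):
    """Rewrite the disk identifier of every rebuilt block to the target disk."""
    return [replace(r, wwn=target_wwn) if r.key in rebuilt_keys else r
            for r in stripe_info]
```

Keeping related blocks off the target disk preserves the property that blocks of one stripe sit on separate disks, so a later single-disk failure still loses at most one block per stripe.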
  • each of the plurality of storage nodes includes: a storage disk and a cache disk, each storage node having read permission of the storage disk, and read and write permissions of the cache disk,
  • the n target storage nodes include a cache storage node, and the reconstruction instruction sent to the cache storage node is used to indicate that the reconstructed data block is to be stored in a cache disk of the cache storage node, and the data reconstruction device further includes:
  • a third receiving module configured to receive the acquired information message sent by the first storage node, where the acquired information message is used to indicate that the first storage node has acquired and stored the reconstructed m data blocks;
  • a fourth sending module configured to send a delete instruction to the cache storage node, where the delete instruction is used to instruct the cache storage node to delete a data block stored on a cache disk.
  • a data storage system comprising: a metadata management server (MDS), a plurality of storage nodes, and a Serial Attached Small Computer System Interface (SAS) switch, wherein the plurality of storage nodes are interconnected through the SAS switch, and the MDS comprises the data reconstruction device of the second aspect.
  • a fourth aspect provides a computer device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the bus; the memory is configured to store a computer program; and the processor is configured to execute the program stored in the memory to implement the method steps described in the first aspect.
  • a computer readable storage medium having stored therein a computer program, the computer program being executed by a processor to implement the method steps of the first aspect.
  • a data storage system comprising: a metadata management server (MDS), a plurality of storage nodes, and a Serial Attached Small Computer System Interface (SAS) switch, wherein the plurality of storage nodes are interconnected through the SAS switch, and the MDS comprises the computer device of claim 16.
  • a computer program product which, when run on a computer, causes the computer to perform the method steps of the first aspect.
  • when detecting the faulty disk in the first storage node, the MDS sends a reconstruction instruction to the n target storage nodes, so that the n target storage nodes reconstruct the data blocks on the failed disk, and the MDS may also instruct the first storage node to acquire the data blocks reconstructed by each target storage node.
  • when the data reconstruction capability of the first storage node is weak, since the n target storage nodes include storage nodes different from the first storage node, these other storage nodes can help the first storage node reconstruct the data blocks. Therefore, the first storage node needs to reconstruct less data, and the data blocks stored on the faulty disk are reconstructed faster, so the data reconstruction efficiency of the data storage system is improved.
  • FIG. 1 is a schematic structural diagram of a data storage system according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a disk in a storage node according to an embodiment of the present disclosure
  • FIG. 3 is a flowchart of a method for data reconstruction according to an embodiment of the present disclosure
  • FIG. 4 is a flowchart of another method for data reconstruction according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a data reconstruction apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of another data reconstruction apparatus according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of still another data reconstruction apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of still another data reconstruction apparatus according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic structural diagram of a data storage system according to an embodiment of the present disclosure.
  • the data storage system includes: an MDS 01, a plurality of storage nodes 02, a SAS switch 03, and an Ethernet switch 04.
  • the MDS 01 and the plurality of storage nodes 02 are connected by an Ethernet switch 04, and the plurality of storage nodes 02 are connected by a SAS switch 03.
  • the MDS 01 may be a server or a server cluster
  • the storage node 02 may be a device having a storage function, such as a server or a computer.
  • Each storage node includes multiple disks, each of which is used to store data.
  • each storage node may further include a processor, on which an object storage device (OSD) service, an audit server (AUDITOR), and a stripe server (SS) can run; that is, each storage node can run an OSD, an audit server, and an SS.
  • FIG. 2 is a schematic diagram of a disk in a storage node according to an embodiment of the present disclosure.
  • FIG. 2 shows a total of five storage nodes 02.
  • Each storage node 02 can include multiple disks.
  • the OSD running on each storage node is capable of reading data stored on any disk through the SAS switch.
  • the user terminal can store data on, and read data from, the disks of the storage nodes in FIG. 1.
  • when the user terminal needs to write the target data A to disk, the user terminal can send a write request to the MDS, at which time the MDS allocates an SS for the user terminal and assigns an EC type to the target data A. Then, the user terminal can transmit the target data A that needs to be stored to the SS allocated by the MDS. After receiving the target data A, the SS needs to apply to the MDS for a stripe resource.
  • the MDS may allocate a stripe resource to the SS according to the erasure coding (EC) type assigned to the target data A, and the stripe resource may include disks on multiple storage nodes.
  • EC: Erasure Coding
  • the information of the stripe resource allocated by the MDS to the SS is: {<stripe_id, OSD_1, wwn_1>, <stripe_id, OSD_1, wwn_2>, <stripe_id, OSD_1, wwn_3>, <stripe_id, OSD_1, wwn_4>, <stripe_id, OSD_1, wwn_5>}.
  • the content between each adjacent "<" and ">" represents information of a disk on a storage node, and the content between "{" and "}" represents the stripe resource allocated for the data; stripe_id is the stripe identifier and wwn is the disk identifier. That is, the MDS allocates five disks on the five storage nodes for the target data A, namely the disks wwn_1, wwn_2, wwn_3, wwn_4, and wwn_5, and the MDS allocates the write rights of the five disks to OSD_1. These five disks are used to store the striped target data A, and the stripe identifiers in the information of the stripe resource are all the same.
  • the stripe resources allocated by the MDS include a total of n disks on n storage nodes, and each disk is used to store one data block of the target data A.
  • the SS may also generate a key for each data block of the target data A; the key of each data block may be used as an identifier of the data block, and may also identify whether the data block is an original object block (target data A1) or a redundant object block (target data A2). Then, the SS can obtain <stripe_id, OSD, wwn, key, value> by adding the data block and the key of the data block to the information of the disk, where value represents a data block.
  • the information of the stripe resource becomes {<stripe_id, OSD_1, wwn_1, key_1, value_1>, <stripe_id, OSD_1, wwn_2, key_2, value_2>, <stripe_id, OSD_1, wwn_3, key_3, value_3>, <stripe_id, OSD_1, wwn_4, key_4, value_4>, <stripe_id, OSD_1, wwn_5, key_5, value_5>}.
  • the SS may also send <wwn, key, value> in the information of each disk to the corresponding OSD, where the OSD is the OSD indicated by the OSD identifier in the information of the disk.
  • after receiving the <wwn, key, value>, the OSD can write <key, value> to the disk indicated by wwn.
  • a write success message is returned to the SS, and the SS can determine according to the write success message that the data block was written successfully.
  • the stripe information of the target data A in the information of the stripe resource may be returned to the MDS for storage.
  • the stripe information of the target data A may be: {<stripe_id, wwn_1, key_1>, <stripe_id, wwn_2, key_2>, <stripe_id, wwn_3, key_3>, <stripe_id, wwn_4, key_4>, <stripe_id, wwn_5, key_5>}.
  • the content between "{" and "}" represents the stripe information of the target data A, the content between each adjacent "<" and ">" represents the storage information of one data block, and the stripe information of the target data A is also referred to as stripe information of each data block of the target data A.
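As a rough illustration of the write path described above, the following Python sketch stores one data block per allocated disk and derives the stripe information that would be returned to the MDS. The `DiskSlot` type, the in-memory `disks` dictionary standing in for physical disks, and the `key_<i>` naming scheme are assumptions made for the example only.

```python
from typing import NamedTuple

class DiskSlot(NamedTuple):
    """One entry of the stripe resource: stripe_id, OSD identifier, disk identifier."""
    stripe_id: str
    osd: str
    wwn: str

def write_stripe(slots, blocks, disks):
    """Write one block per allocated disk and return the stripe information.

    disks: {wwn: {key: value}}, a hypothetical in-memory stand-in for real disks.
    """
    if len(slots) != len(blocks):
        raise ValueError("one data block per allocated disk is expected")
    stripe_info = []
    for i, (slot, value) in enumerate(zip(slots, blocks), start=1):
        key = f"key_{i}"  # hypothetical key scheme; the SS generates the real keys
        # The OSD named in the slot writes the key/value pair to the disk wwn.
        disks.setdefault(slot.wwn, {})[key] = value
        # Only stripe_id, wwn and key go back to the MDS, not the value itself.
        stripe_info.append((slot.stripe_id, slot.wwn, key))
    return stripe_info
```

The stripe information deliberately omits the block contents, so the MDS holds only metadata.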
  • when the user terminal needs to read the target data A stored on disk, the user terminal sends a read request to the MDS.
  • the MDS can read the previously recorded stripe information of the target data A according to the read request.
  • the stripe information of the target data A is {<stripe_id, wwn_1, key_1>, <stripe_id, wwn_2, key_2>, <stripe_id, wwn_3, key_3>, <stripe_id, wwn_4, key_4>, <stripe_id, wwn_5, key_5>}.
  • the MDS may determine the OSD that previously stored the target data A according to the disk indicated by wwn in the stripe information, and send the stripe information of the target data A and the identifier of the user terminal to the SS local to that OSD (the OSD and the SS run on the same storage node).
  • the SS may send the <wwn, key> in the storage information of each original object block (target data A1) in the stripe information to the local OSD. Then, the OSD can read the data block (also called value) on the disk indicated by wwn according to the key in the received <wwn, key>, and return <key, value> to the local SS.
  • after receiving all the <key, value> pairs returned by the OSD, the SS can combine them to obtain {<key_1, value_1>, <key_2, value_2>, <key_3, value_3>} (assuming value_1, value_2, and value_3 are the original object blocks (target data A1), and value_4 and value_5 are the redundant object blocks (target data A2)). Finally, the SS can package value_1, value_2, and value_3 in {<key_1, value_1>, <key_2, value_2>, <key_3, value_3>} to obtain the target data A, and send the target data A to the user terminal.
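The read path above can be sketched in Python as follows. The in-memory `disks` dictionary, the `original_keys` set marking original object blocks, and the use of simple byte concatenation for "packaging" are assumptions for illustration; the disclosure does not specify how the SS packages the values.

```python
def read_target_data(stripe_info, original_keys, disks):
    """Rebuild the target data from its original object blocks.

    stripe_info: list of (stripe_id, wwn, key) records in stripe order
    original_keys: keys of original (non-redundant) blocks, assumed known to the SS
    disks: {wwn: {key: value}}, a hypothetical stand-in for the physical disks
    """
    pairs = []
    for _stripe_id, wwn, key in stripe_info:
        if key in original_keys:
            # The local OSD reads the block on disk wwn and returns the key/value pair.
            pairs.append((key, disks[wwn][key]))
    # Packaging is modelled here as concatenation in stripe order (an assumption).
    return b"".join(value for _key, value in pairs)
```

Redundant blocks are skipped on a healthy read; they matter only when a block has to be reconstructed.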
  • the embodiment of the present disclosure provides a data reconstruction method for reconstructing the data blocks stored on a failed disk.
  • FIG. 3 is a flowchart of a method for data reconstruction according to an embodiment of the present disclosure.
  • the data reconstruction method may be used for an MDS in a data storage system (such as MDS 01 shown in FIG. 1). As shown in FIG. 3, the data reconstruction method includes:
  • Step 301 When detecting, in the first storage node, a faulty disk on which m data blocks are stored, send a reconstruction instruction to each of n target storage nodes of the plurality of storage nodes, where the first storage node is any one of the plurality of storage nodes, the n reconstruction instructions sent to the n target storage nodes are used to indicate that the m data blocks are to be reconstructed and stored, the n target storage nodes include a storage node different from the first storage node, and m ≥ n ≥ 1.
  • Step 302 Send an acquisition instruction to the first storage node, where the acquisition instruction is used to instruct the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
  • the embodiment of the present disclosure provides a data reconstruction method. When detecting the faulty disk in the first storage node, the MDS sends a reconstruction instruction to the n target storage nodes, so that the n target storage nodes reconstruct the data blocks on the failed disk, and the MDS may also instruct the first storage node to acquire the data blocks reconstructed by each target storage node.
  • when the data reconstruction capability of the first storage node is weak, since the n target storage nodes include storage nodes different from the first storage node, these other storage nodes can help the first storage node reconstruct the data blocks. Therefore, the first storage node needs to reconstruct less data, and the data blocks stored on the faulty disk are reconstructed faster, so the data reconstruction efficiency of the data storage system is improved.
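Steps 301 and 302 can be illustrated with a small Python sketch that checks the stated constraints (m ≥ n ≥ 1, and at least one target differing from the first storage node) and builds the two kinds of instruction. The dictionary message shapes, the function name, and the round-robin split of blocks across targets are assumptions for the example; the disclosure leaves the message format open at this point.

```python
def build_instructions(first_node, targets, block_records):
    """Return (reconstruction instructions per target, acquisition instruction).

    block_records: list of (stripe_id, wwn, key) for the m blocks on the faulty disk.
    """
    m, n = len(block_records), len(targets)
    if not (m >= n >= 1):
        raise ValueError("expected m >= n >= 1")
    if len(set(targets)) != n:
        raise ValueError("target storage nodes must be distinct")
    if all(t == first_node for t in targets):
        raise ValueError("at least one target must differ from the first storage node")
    # Step 301: one reconstruction instruction per target (round-robin split, assumed).
    reconstruct = {t: block_records[i::n] for i, t in enumerate(targets)}
    # Step 302: tell the first storage node where to fetch each rebuilt block.
    acquire = {
        "to": first_node,
        "sources": {t: [rec[2] for rec in recs] for t, recs in reconstruct.items()},
    }
    return reconstruct, acquire
```

Because m ≥ n, the round-robin split gives every target at least one block to reconstruct.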
  • FIG. 4 is a flowchart of another method for data reconstruction according to an embodiment of the present disclosure. As shown in FIG. 4, the data reconstruction method includes:
  • Step 401 The first storage node sends a fault message to the MDS, where the fault message is used to indicate that a faulty disk is present in the first storage node.
  • each storage node in FIG. 1 may include multiple disks, and the multiple disks include a storage disk and a cache disk.
  • the storage disk and the cache disk may be solid state drives (SSDs), serial hard disks (also called SATA hard disks), or SAS disks.
  • the cache disk is a solid state drive (SSD), and the storage disk is a serial hard disk (also called a SATA hard disk) or a SAS disk.
  • the first storage node is any one of the plurality of storage nodes, and the failed disk may be a storage disk of the first storage node.
  • a plurality of disks (six disks shown in FIG. 2) of each storage node 02 may include: five storage disks and one cache disk.
  • the OSD running on each storage node 02 has the right to write data to the cache disk on the storage node 02, and has the right to read data for each storage disk and each cache disk on the storage node 02.
  • the number of the storage disks in the storage node 02 may be any integer greater than or equal to 1
  • the number of the cache disks may be any integer greater than or equal to 1, which is not limited in the embodiment of the present disclosure.
  • the OSD can be used to monitor whether the storage disk in the storage node is faulty.
  • when the OSD determines that a storage disk is a faulty disk, it sends a fault message to the MDS, and the fault message can be used to indicate the faulty disk.
  • the fault message can include an identifier of the failed disk.
  • Step 402 The MDS determines a faulty disk in the first storage node according to the fault message.
  • the MDS can parse the fault message, obtain the identifier of the faulty disk in the fault message, and determine the faulty disk in the first storage node.
  • Step 403 The MDS acquires stripe information of m data blocks stored on the faulty disk.
  • the MDS stores stripe information of each data block stored on each disk in the data storage system. After the MDS determines the faulty disk in the first storage node, the MDS may determine the m data blocks stored on the faulty disk, and obtain stripe information of each of the m data blocks, where m ≥ 1.
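Steps 401 to 403 amount to a lookup keyed by the failed disk's identifier. A minimal Python sketch follows; the JSON fault-message format, the `failed_disk` field name, and the flat `stripe_table` list are hypothetical choices for illustration, not the format used by the disclosure.

```python
import json

def make_fault_message(failed_wwn):
    """OSD side (Step 401): report the identifier of the failed disk."""
    return json.dumps({"failed_disk": failed_wwn})

def blocks_on_failed_disk(fault_message, stripe_table):
    """MDS side (Steps 402 and 403): find the m blocks stored on the failed disk.

    stripe_table: list of (stripe_id, wwn, key) records held by the MDS.
    """
    failed_wwn = json.loads(fault_message)["failed_disk"]
    return [rec for rec in stripe_table if rec[1] == failed_wwn]
```

A real MDS would likely index stripe information by disk identifier rather than scan a list; the linear scan just keeps the sketch short.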
  • Step 404 The MDS determines n target storage nodes in the data storage system.
  • the MDS can determine a target storage node in the data storage system in a variety of implementable ways.
  • the following three implementable modes will be exemplified in the embodiments of the present disclosure.
  • the MDS may first determine the load of each storage node in the data storage system other than the first storage node. It should be noted that the load of a storage node may be positively correlated with at least one performance parameter of the storage node, where the performance parameters of the storage node include: usage of the processor in the storage node, usage of the storage (including all disks) of the storage node, and storage efficiency of the storage node.
  • the MDS can determine a target storage node based on the load of each storage node other than the first storage node. For example, the MDS may compare the loads of the storage nodes other than the first storage node in the data storage system, and determine the one storage node with the smallest load as the target storage node. That is, when selecting a target storage node that needs to perform a reconstruction task, the MDS may select, from the storage nodes other than the first storage node, the storage node with the minimum load (and therefore the higher data processing capability) as the target storage node, to ensure that the target storage node can perform the task of reconstructing the data blocks faster and to improve the efficiency of data reconstruction.
  • the storage nodes in the data storage system include a preset storage node different from the first storage node, and the preset storage node may be a storage node with higher data processing capability.
  • the MDS can directly determine the preset storage node as the target storage node.
  • the MDS may first determine the routing distance between the MDS and each storage node other than the first storage node in the data storage system, and determine a target storage node according to these routing distances.
  • the MDS may compare the routing distances between the MDS and the storage nodes other than the first storage node, and determine the storage node with the smallest routing distance from the MDS as the target storage node. That is, when selecting the target storage node that needs to perform the reconstruction task, the MDS may select, from the storage nodes other than the first storage node, the storage node with the shortest routing distance to the MDS as the target storage node, to ensure that the MDS can quickly allocate the task of reconstructing the data blocks to the target storage node, thereby improving the efficiency of data reconstruction.
  • the MDS can determine n target storage nodes in the data storage system in a variety of implementable ways.
  • the following three implementable modes will be exemplified in the embodiments of the present disclosure.
  • the MDS may first determine the load of each storage node in the data storage system.
  • the MDS can determine n target storage nodes based on the load of each storage node. For example, a preset number threshold n may be pre-stored on the MDS.
  • the MDS can compare the loads of the storage nodes in the data storage system and determine the n storage nodes with the smallest loads as the n target storage nodes. That is, when the MDS selects the target storage nodes that need to perform the reconstruction task, storage nodes with smaller loads (and therefore higher data processing capability) can be selected as the target storage nodes, to ensure that the target storage nodes can perform the task of reconstructing the data blocks faster and to improve the efficiency of data reconstruction.
  • the storage nodes in the data storage system include n preset storage nodes, and the n preset storage nodes may be storage nodes with higher data processing capability.
  • the MDS can directly determine the preset n storage nodes as n target storage nodes.
  • the MDS may first determine a routing distance between each storage node and the MDS in the data storage system, and determine n target storage nodes according to a routing distance between each storage node and the MDS. For example, a preset number threshold n may be pre-stored on the MDS.
  • The MDS can compare the routing distances between the MDS and the storage nodes of the data storage system, and determine the n storage nodes with the smaller routing distances from the MDS as the n target storage nodes. That is, when selecting the target storage nodes that are to perform the reconstruction task, the MDS may select storage nodes that are closer to the MDS in routing distance, so that the MDS can quickly allocate the task of reconstructing data blocks to each target storage node, which improves the efficiency of data reconstruction.
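The three selection modes described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the `Node` record, its `load` and `route_distance` fields, the `select_targets` function, and the tie-breaking by node identifier are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Node:
    node_id: int
    load: float          # hypothetical load metric; smaller means less busy
    route_distance: int  # hypothetical routing distance to the MDS, e.g. hop count


def select_targets(nodes: Sequence[Node], n: int, mode: str,
                   preset_ids: Optional[Sequence[int]] = None) -> List[int]:
    """Return the identifiers of the n target storage nodes for one mode."""
    if not 1 <= n <= len(nodes):
        raise ValueError("n must be between 1 and the number of storage nodes")
    if mode == "load":        # first mode: the n nodes with the smaller loads
        ranked = sorted(nodes, key=lambda node: (node.load, node.node_id))
    elif mode == "preset":    # second mode: n nodes configured in advance
        if preset_ids is None or len(preset_ids) != n:
            raise ValueError("the preset mode needs exactly n preset node ids")
        return list(preset_ids)
    elif mode == "distance":  # third mode: the n nodes nearest to the MDS
        ranked = sorted(nodes, key=lambda node: (node.route_distance, node.node_id))
    else:
        raise ValueError(f"unknown selection mode: {mode}")
    return [node.node_id for node in ranked[:n]]


nodes = [Node(1, 0.9, 1), Node(2, 0.2, 3), Node(3, 0.5, 2), Node(4, 0.1, 4)]
print(select_targets(nodes, 2, "load"))      # the two least-loaded nodes
print(select_targets(nodes, 2, "distance"))  # the two nodes nearest to the MDS
```

The sketch does not enforce the requirement that at least one target storage node differ from the first storage node; a caller would have to check that separately.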
  • Step 405 The MDS separately sends n reconstruction instructions to the n target storage nodes.
  • The MDS may determine at least one data block corresponding to each target storage node according to the loads of the n target storage nodes and the stripe information of the m data blocks. The sum of the data amounts of all the data blocks corresponding to a target storage node is negatively correlated with the load of that target storage node.
  • If the load of a target storage node is large, the target storage node can reconstruct only a smaller amount of data, so the sum of the data amounts of all the data blocks corresponding to it is small; if the load of a target storage node is small, the target storage node can reconstruct a larger amount of data, so the sum of the data amounts of all the data blocks corresponding to it is large. That is, the data reconstruction capability of a target storage node is related to its load, and the MDS needs to allocate the data blocks to be reconstructed to each target storage node according to the load and reconstruction capability of that node.
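One way to obtain an allocation in which lightly loaded nodes receive more data is a greedy heuristic, sketched below. The disclosure does not specify an allocation algorithm, so everything here is an assumption: the function name `allocate_blocks`, loads expressed as integer percentages, and the weight `100 - load` used as a stand-in for reconstruction capability. The heuristic aims at the negative correlation but does not prove it for every input.

```python
from typing import Dict, List


def allocate_blocks(block_sizes: Dict[str, int],
                    node_load_pct: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign each block key to a target node, favouring lightly loaded nodes."""
    # Hypothetical capability weight: the idle share of each node, in percent.
    weights = {node: 100 - load for node, load in node_load_pct.items()}
    if any(weight <= 0 for weight in weights.values()):
        raise ValueError("every target node needs a load below 100 percent")
    if len(block_sizes) < len(weights):
        raise ValueError("need m >= n so that every target node gets a block")

    assigned = {node: 0 for node in weights}
    plan: Dict[str, List[str]] = {node: [] for node in weights}
    # Place the largest blocks first; each block goes to the node whose
    # assigned data is smallest relative to its weight.
    for key in sorted(block_sizes, key=lambda k: (-block_sizes[k], k)):
        node = min(weights,
                   key=lambda nd: (assigned[nd] / weights[nd], -weights[nd], nd))
        plan[node].append(key)
        assigned[node] += block_sizes[key]
    return plan


sizes = {"key_1": 40, "key_2": 30, "key_3": 20, "key_4": 20, "key_5": 10, "key_6": 10}
loads = {"A": 20, "B": 50, "C": 80}
plan = allocate_blocks(sizes, loads)
totals = {node: sum(sizes[k] for k in keys) for node, keys in plan.items()}
print(plan)
print(totals)
```

In this example the least-loaded node A ends up with the largest total and the most-loaded node C with the smallest, and every target node receives at least one block because all block sizes are positive.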
  • the MDS may generate a reconstruction instruction corresponding to each target storage node according to the stripe information of the data block.
  • The reconstruction instruction corresponding to each target storage node is used to instruct that node to reconstruct and store the data blocks corresponding to it.
  • The MDS can send the reconstruction instruction corresponding to each target storage node to the audit server running on that target storage node.
  • The reconstruction instruction corresponding to each target storage node includes: the stripe information of each data block corresponding to the target storage node, and indication information used to indicate whether the target storage node is the storage node where the faulty disk is located.
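The contents of a reconstruction instruction listed above can be modelled with a small record type. This is a sketch under assumptions: the names `ReconstructionInstruction`, `StripeEntry`, and `build_instructions` are hypothetical, and a stripe entry is reduced to a (stripe identifier, disk identifier, block key) triple.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

StripeEntry = Tuple[str, str, str]  # (stripe_id, disk identifier, block key)


@dataclass(frozen=True)
class ReconstructionInstruction:
    target_node: str
    stripe_info: Dict[str, Tuple[StripeEntry, ...]]  # stripe entries per block key
    on_faulty_node: bool  # True if target_node is where the faulty disk is located


def build_instructions(plan: Dict[str, List[str]],
                       stripes: Dict[str, List[StripeEntry]],
                       faulty_node: str) -> List[ReconstructionInstruction]:
    """Create one instruction per target node from a block allocation plan."""
    instructions = []
    for node, block_keys in plan.items():
        info = {key: tuple(stripes[key]) for key in block_keys}
        instructions.append(
            ReconstructionInstruction(node, info, node == faulty_node))
    return instructions


stripes = {
    "key_1": [("stripe_a", "wwn_1", "key_1"), ("stripe_a", "wwn_2", "key_2")],
    "key_7": [("stripe_b", "wwn_1", "key_7"), ("stripe_b", "wwn_3", "key_8")],
}
plan = {"node_1": ["key_1"], "node_2": ["key_7"]}
for instruction in build_instructions(plan, stripes, faulty_node="node_1"):
    print(instruction.target_node, instruction.on_faulty_node)
```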
  • Step 406 The n target storage nodes reconstruct and store the data block according to the received reconstruction instruction.
  • the audit server running on each target storage node may parse the reconstruction instruction to obtain stripe information of at least one data block that needs to be reconstructed.
  • The audit server running on each target storage node can also read, through the local OSD and according to the stripe information of each data block that needs to be reconstructed, the valid data required for reconstructing that data block; the valid data is stored on at least one disk in the data storage system. It should be noted that, assuming that data block X is a data block that needs to be reconstructed and that the stripe information of data block X includes the storage information of data block X and the storage information of data block Y, the valid data required for reconstructing data block X is data block Y. Afterwards, the audit server running on each target storage node can reconstruct the corresponding data block according to the valid data that was read and the received reconstruction instruction.
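To make the idea of rebuilding a lost block from the surviving blocks of its stripe concrete, the sketch below uses single-parity XOR coding. This is a stand-in chosen for brevity: the disclosure only says that each data block has an erasure coding type and does not say which code is used, and the function name `xor_blocks` is hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks in a stripe must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A stripe with three data blocks and one parity block.
d1, d2, d3 = b"abcd", b"wxyz", b"\x01\x02\x03\x04"
parity = xor_blocks([d1, d2, d3])

# Suppose d2 was on the faulty disk: the surviving blocks are the valid data.
rebuilt = xor_blocks([d1, d3, parity])
print(rebuilt == d2)
```

With XOR parity, any single missing block of a stripe equals the XOR of all the others, which is why the surviving blocks are sufficient here.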
  • The audit server running on each target storage node may also parse the reconstruction instruction to obtain the indication information used to indicate whether the target storage node is the storage node where the faulty disk is located.
  • When the indication information indicates that the target storage node is not the storage node where the faulty disk is located, the audit server running on the target storage node may determine that the reconstruction instruction instructs the target storage node to store the reconstructed data block on the cache disk.
  • After reconstructing the data block, the audit server running on the target storage node may send the reconstructed data block to the local OSD and instruct the local OSD to write the data block to the cache disk in the target storage node.
  • The audit server running on the first storage node may determine that the reconstruction instruction instructs the first storage node to store the reconstructed data block on the cache disk. After reconstructing the data block, the audit server running on the first storage node may send the reconstructed data block to the local OSD and instruct the local OSD to write the data block to the cache disk in the first storage node.
  • Alternatively, the audit server running on the first storage node may determine that the reconstruction instruction instructs the target storage node (here, the first storage node) to store the reconstructed data block on a storage disk.
  • the audit server running on the first storage node may send a storage request message to the MDS, the storage request message including the total data amount of the m data blocks.
  • The MDS may send, according to the storage request message, a storage instruction to the audit server running on the first storage node, where the storage instruction is used to instruct the first storage node to store the reconstructed m data blocks on the target disk. The target disk may be a storage disk in the first storage node whose available storage capacity is greater than or equal to the total data amount of the m data blocks and on which no related data block is stored, where a related data block is a data block that belongs to the same stripe as any one of the m data blocks.
  • The audit server running on the first storage node may send the reconstructed data block to the local OSD and instruct the local OSD to write the data block to the target disk in the first storage node.
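The two conditions on the target disk (enough available capacity, and no block from any stripe that is being rebuilt) can be checked as in the sketch below. The `Disk` record, its fields, and the `pick_target_disk` function are hypothetical; returning the first qualifying disk is also an assumption, since the text does not say how to choose among several candidates.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Disk:
    wwn: str                    # disk identifier
    free_bytes: int             # available storage capacity
    stripe_ids: FrozenSet[str]  # stripes that already have a block on this disk


def pick_target_disk(disks: Iterable[Disk], total_bytes: int,
                     rebuilt_stripe_ids: Set[str]) -> Optional[str]:
    """Return the identifier of the first disk that satisfies both conditions."""
    for disk in disks:
        has_room = disk.free_bytes >= total_bytes
        holds_related_block = bool(disk.stripe_ids & rebuilt_stripe_ids)
        if has_room and not holds_related_block:
            return disk.wwn
    return None  # no suitable storage disk in the first storage node


disks = [
    Disk("wwn_a", 50, frozenset()),         # too small
    Disk("wwn_b", 500, frozenset({"s1"})),  # already holds a block of stripe s1
    Disk("wwn_x", 500, frozenset({"s9"})),  # large enough, unrelated stripes only
]
print(pick_target_disk(disks, 100, {"s1", "s2"}))
```

Keeping two blocks of the same stripe off one disk matters because a later failure of that disk would otherwise lose two blocks of the stripe at once.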
  • Step 407 The n target storage nodes respectively send a reconstruction complete message to the MDS.
  • the target storage node may send a reconstruction completion message to the MDS.
  • the reconstruction completion message sent by each target storage node may include: an identifier of each data block reconstructed by the target storage node, and an identifier of a disk stored by each of the data blocks.
  • Step 408 The MDS sends an acquisition instruction to the first storage node, where the acquisition instruction is used to instruct the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
  • After receiving the reconstruction complete messages sent by all the target storage nodes, the MDS may determine that each target storage node has completed the data reconstruction task allocated by the MDS. At this time, the MDS may send an acquisition instruction to the audit server running on the first storage node, to instruct the audit server to acquire and store the data blocks reconstructed by each target storage node. It should be noted that the acquisition instruction may include: the identifier of each of the m data blocks and the identifier of the disk on which each data block is stored.
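The bookkeeping the MDS needs between steps 407 and 408 can be sketched as a small tracker that collects completion messages and releases the acquisition instruction only once every target storage node has reported. The class name `ReconstructionTracker` and the representation of the acquisition instruction as a mapping from block identifier to disk identifier are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional


class ReconstructionTracker:
    """Collects reconstruction complete messages on the MDS side."""

    def __init__(self, target_nodes: Iterable[str]) -> None:
        self._pending = set(target_nodes)
        self._block_to_disk: Dict[str, str] = {}

    def on_complete(self, node: str,
                    block_to_disk: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Record one completion; return the acquisition payload when all are in."""
        if node not in self._pending:
            raise ValueError(f"unexpected or duplicate completion from {node}")
        self._pending.remove(node)
        self._block_to_disk.update(block_to_disk)
        if self._pending:
            return None  # still waiting for other target storage nodes
        return dict(self._block_to_disk)


tracker = ReconstructionTracker(["node_2", "node_3"])
print(tracker.on_complete("node_2", {"key_2": "cache_disk_2"}))
print(tracker.on_complete("node_3", {"key_3": "cache_disk_3"}))
```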
  • Step 409 The first storage node acquires and stores the reconstructed m data blocks according to the acquisition instruction.
  • The audit server running on the first storage node may determine, according to the acquisition instruction, the identifier of the disk on which each of the reconstructed m data blocks is stored, and obtain each reconstructed data block from the corresponding disk.
  • The audit server running on the first storage node can read or copy, through the local OSD and the SAS switch, the reconstructed data blocks stored on the cache disks of other storage nodes.
  • The audit server running on the first storage node can directly read the reconstructed data block stored on the local cache disk.
  • The audit server running on the first storage node can obtain the reconstructed data block without performing the step of reading the data block.
  • the audit server running on the first storage node may store the reconstructed data block.
  • The audit server running on the first storage node may further send a storage application message to the MDS, where the storage application message can include the total data amount of the m data blocks.
  • The MDS may send, according to the storage application message, a storage instruction to the audit server running on the first storage node, where the storage instruction is used to instruct the first storage node to store the reconstructed m data blocks on the target disk. Then, the audit server running on the first storage node can store the obtained reconstructed m data blocks on the target disk.
  • The audit server running on the first storage node may directly store the obtained reconstructed data blocks on the target disk in step 409, without storing again the reconstructed data blocks that the first storage node has already stored, which still guarantees that the reconstructed data blocks are stored on the target disk.
  • Step 410 The first storage node sends an acquisition completion message to the MDS.
  • The audit server running on the first storage node may send an acquisition completion message to the MDS, where the acquisition completion message may be used to indicate that the first storage node has acquired the data blocks reconstructed by each target storage node.
  • Step 411 The first storage node sends, to the MDS, storage information of each of the m data blocks, where the storage information of each data block includes: an identifier of the target disk and an identifier of each data block.
  • The storage information of each data block may be sent to the MDS, where the storage information includes the identifier of the data block (such as the key of the data block) and the identifier of the target disk where the data block is located.
  • Step 412 The MDS updates the stripe information of each data block in the m data blocks.
  • The MDS may look up the stripe information of each data block according to the identifier of the data block in the storage information of that data block, and modify, in the stripe information of each data block, the identifier of the disk where the data block is located to the identifier of the target disk.
  • The identifier of the data block X is key_1.
  • The stripe information of the data block X may be: {<stripe_id, wwn_1, key_1>, <stripe_id, wwn_2, key_2>, <stripe_id, wwn_3, key_3>, <stripe_id, wwn_4, key_4>, <stripe_id, wwn_5, key_5>}
  • The MDS can modify wwn_1 in the stripe information to wwn_x (the identifier of the target disk),
  • so that the stripe information of the data block X is updated to: {<stripe_id, wwn_x, key_1>, <stripe_id, wwn_2, key_2>, <stripe_id, wwn_3, key
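The stripe update in step 412 amounts to replacing one disk identifier in the stripe record of the rebuilt block. The sketch below assumes the stripe information is held as a list of (stripe identifier, disk identifier, block key) triples; the function name `relocate_block` is hypothetical.

```python
from typing import List, Tuple

StripeEntry = Tuple[str, str, str]  # (stripe_id, disk identifier, block key)


def relocate_block(stripe: List[StripeEntry], block_key: str,
                   target_wwn: str) -> List[StripeEntry]:
    """Return a copy of the stripe with block_key pointing at target_wwn."""
    if all(key != block_key for _, _, key in stripe):
        raise KeyError(block_key)
    return [(stripe_id, target_wwn if key == block_key else wwn, key)
            for stripe_id, wwn, key in stripe]


stripe = [("stripe_id", f"wwn_{i}", f"key_{i}") for i in range(1, 6)]
updated = relocate_block(stripe, "key_1", "wwn_x")
print(updated[0])  # the entry for key_1 now names the target disk
print(updated[1])  # the other entries are unchanged
```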
  • Step 413 The MDS sends a delete instruction to each cache storage node among the n target storage nodes, where the delete instruction is used to instruct the cache storage node to delete the data block stored on the cache disk of the cache storage node.
  • the n target storage nodes include cache storage nodes, and each cache storage node stores the reconstructed data blocks on the cache disk of the cache storage node after reconstructing the data block.
  • each target storage node is a cache storage node.
  • the first storage node is excluded from the n target storage nodes.
  • Each target storage node is a cache storage node.
  • The MDS may send a delete instruction to the OSD running on each cache storage node, to instruct the OSD running on each cache storage node to delete the data block (that is, the reconstructed data block) stored on the cache disk of that cache storage node.
  • Step 414 Each cache storage node deletes a data block stored on a cache disk of the cache storage node according to the delete instruction.
  • the OSD running on each cache storage node can directly delete the data block stored on the cache disk of the cache storage node.
  • The MDS may determine that the storage node 1, the storage node 2, the storage node 3, the storage node 4, and the storage node 5 are all target storage nodes.
  • the MDS can also send reconstruction instructions to the storage node 1, the storage node 2, the storage node 3, the storage node 4, and the storage node 5, respectively.
  • the storage node 1 can reconstruct the data block 1 according to the received reconstruction instruction
  • the storage node 2 can reconstruct the data block 2 according to the received reconstruction instruction
  • the storage node 3 can reconstruct the data block 3 according to the received reconstruction instruction.
  • the storage node 4 can reconstruct the data block 4 according to the received reconstruction instruction
  • the storage node 5 can reconstruct the data block 5 according to the received reconstruction instruction.
  • the storage disk 1-1 (faulty disk) stores the data block 1, the data block 2, the data block 3, the data block 4, and the data block 5.
  • the storage node 1 may also send a storage request message to the MDS, and the MDS may send a storage instruction to the storage node 1 for instructing the storage node 1 to store the data block on the storage disk 6-1 (target disk).
  • the storage node 1 can store the reconstructed data block 1 on the storage disk 6-1
  • the storage node 2 can store the reconstructed data block 2 on the cache disk 2
  • the storage node 3 can store the reconstructed data block 3 on the cache disk 3
  • the storage node 4 can store the reconstructed data block 4 on the cache disk 4
  • the storage node 5 can store the reconstructed data block 5 on the cache disk 5.
  • Each target storage node may send a reconstruction complete message to the MDS, so that the MDS sends an acquisition instruction to the storage node 1 after receiving the reconstruction complete messages sent by all the target storage nodes.
  • According to the received acquisition instruction, the storage node 1 can obtain, through the SAS switch, the reconstructed data block 2, data block 3, data block 4, and data block 5 stored on the cache disk 2, the cache disk 3, the cache disk 4, and the cache disk 5, and also store the data block 2, the data block 3, the data block 4, and the data block 5 on the storage disk 6-1.
  • The storage node 1 may also send an acquisition complete message to the MDS, and the MDS may, according to the received acquisition complete message, send a delete instruction to each of the storage node 2, the storage node 3, the storage node 4, and the storage node 5, to instruct them to delete the data blocks stored on their local cache disks.
  • The storage node 1 can also send the storage information of the data blocks to the MDS.
  • the MDS can update the stripe information of the data block according to the storage information of the data block.
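The five-node example above can be replayed as a small in-memory simulation. This is only a sketch of the message flow: no real disks, SAS switch, or network are involved, the disk and block names are taken from the example, and the function `run_example` and its data structures are hypothetical.

```python
from typing import Dict, Set, Tuple

FAULTY_NODE = 1                   # storage node 1 hosts the faulty storage disk 1-1
TARGET_DISK = "storage disk 6-1"  # target disk named in the example


def run_example() -> Tuple[Set[str], Dict[int, Set[str]], Dict[str, str]]:
    target_nodes = [1, 2, 3, 4, 5]
    # MDS view: the disk holding each block (all on the faulty disk at first).
    block_disk = {f"data block {i}": "storage disk 1-1" for i in target_nodes}
    cache_disks: Dict[int, Set[str]] = {
        node: set() for node in target_nodes if node != FAULTY_NODE}
    target_disk: Set[str] = set()
    completed: Set[int] = set()

    # Node i rebuilds data block i. Node 1 writes to the target disk, the
    # other nodes write to their own cache disks, then all report completion.
    for node in target_nodes:
        block = f"data block {node}"
        if node == FAULTY_NODE:
            target_disk.add(block)
        else:
            cache_disks[node].add(block)
        completed.add(node)

    # The MDS issues the acquisition instruction only after every node reported;
    # node 1 then copies the cached blocks onto the target disk.
    if completed == set(target_nodes):
        for cached in cache_disks.values():
            target_disk |= cached

    # Node 1 reports the storage information; the MDS updates its records.
    for block in target_disk:
        block_disk[block] = TARGET_DISK

    # The MDS sends delete instructions; cache storage nodes clear their caches.
    for cached in cache_disks.values():
        cached.clear()
    return target_disk, cache_disks, block_disk


final_target, final_caches, final_records = run_example()
print(sorted(final_target))
print(final_caches)
```

After the run, all five blocks reside on the target disk, every cache disk is empty, and the MDS records point each block at the target disk.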
  • the embodiment of the present disclosure provides a data reconstruction method.
  • The MDS sends reconstruction instructions to the n target storage nodes, so that the n target storage nodes reconstruct the data blocks on the faulty disk, and the MDS may also instruct the first storage node to acquire the data blocks reconstructed by each target storage node.
  • Even when the data reconstruction capability of the first storage node is weak, because the n target storage nodes include storage nodes other than the first storage node, those other storage nodes can help the first storage node reconstruct the data blocks. The first storage node therefore has less data to reconstruct, the data blocks stored on the faulty disk are reconstructed faster, and the data reconstruction efficiency of the data storage system is improved.
  • FIG. 5 is a schematic structural diagram of a data reconstruction apparatus according to an embodiment of the present disclosure.
  • The data reconstruction apparatus may be used in an MDS (such as the MDS shown in FIG. 1) in a data storage system. As shown in FIG. 5, the data reconstruction apparatus 50 can include:
  • a first sending module 501, configured to send reconstruction instructions respectively to n target storage nodes among the plurality of storage nodes when detecting, in the first storage node, a faulty disk on which m data blocks are stored, where the first storage node is any one of the plurality of storage nodes, the n reconstruction instructions sent to the n target reconstruction nodes are used to instruct that the m data blocks be reconstructed and stored, the n target storage nodes include a storage node different from the first storage node, and m≥n≥1;
  • the second sending module 502 is configured to send an acquisition instruction to the first storage node, where the obtaining instruction is used to instruct the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
  • The embodiment of the present disclosure provides a data reconstruction apparatus. When a faulty disk is detected in the first storage node, the first sending module sends reconstruction instructions to the n target storage nodes, so that the n target storage nodes reconstruct the data blocks on the faulty disk, and the second sending module may instruct the first storage node to acquire the data blocks reconstructed by each target storage node.
  • Even when the data reconstruction capability of the first storage node is weak, because the n target storage nodes include storage nodes other than the first storage node, those other storage nodes can help the first storage node reconstruct the data blocks. The first storage node therefore has less data to reconstruct, the data blocks stored on the faulty disk are reconstructed faster, and the data reconstruction efficiency of the data storage system is improved.
  • the data reconstruction device 50 may further include:
  • a first determining module (not shown in FIG. 5), configured to determine n storage nodes with smaller loads among the plurality of storage nodes as n target storage nodes;
  • a second determining module (not shown in FIG. 5), configured to determine n storage nodes preset among the plurality of storage nodes as n target storage nodes;
  • the third determining module (not shown in FIG. 5) is configured to determine n storage nodes that are closest to the routing distance of the MDS among the plurality of storage nodes as n target storage nodes.
  • The first sending module 501 is configured to: determine the load of each target storage node; determine, according to the loads of the n target storage nodes, at least one data block corresponding to each target storage node, where the sum of the data amounts of all the data blocks corresponding to a target storage node is negatively correlated with the load of that target storage node; generate a reconstruction instruction corresponding to each target storage node, where the reconstruction instruction corresponding to each target storage node is used to instruct that the data blocks corresponding to that target storage node be reconstructed and stored; and send to each target storage node its corresponding reconstruction instruction.
  • FIG. 6 is a schematic structural diagram of another data reconstruction apparatus according to an embodiment of the present disclosure.
  • the data reconstruction apparatus 50 may further include:
  • the first receiving module 503 is configured to receive a storage application message sent by the first storage node, where the storage application message includes a total data volume of the m data blocks;
  • the third sending module 504 is configured to send a storage instruction to the first storage node according to the storage application message, where the storage instruction is used to instruct that the reconstructed m data blocks be stored on the target disk, the target disk is a storage disk in the first storage node whose available storage capacity is greater than or equal to the total data amount of the m data blocks, no related data block is stored on the target disk, and a related data block belongs to the same stripe as any one of the m data blocks.
  • FIG. 7 is a schematic structural diagram of another data reconstruction apparatus according to an embodiment of the present disclosure.
  • the data reconstruction apparatus 50 may further include:
  • the second receiving module 505 is configured to receive storage information of each data block sent by the first storage node, where the storage information of each data block includes: an identifier of the target disk and an identifier of each data block;
  • a fourth determining module 506, configured to determine strip information of each data block according to an identifier of each data block in the storage information of each data block;
  • the modifying module 507 is configured to modify, in the stripe information of each data block, the identifier of the disk where the data block is located to the identifier of the target disk.
  • each of the plurality of storage nodes includes: a storage disk and a cache disk, each storage node has read permission of the storage disk, and read and write permissions of the cache disk, and the n target storage nodes include a cache storage node.
  • The reconstruction instruction sent to the cache storage node is used to instruct that the reconstructed data block be stored on the cache disk of the cache storage node.
  • FIG. 8 is a schematic structural diagram of another data reconstruction apparatus according to an embodiment of the present disclosure. As shown in FIG. 8, on the basis of FIG. 5, the data reconstruction apparatus 50 may further include:
  • the third receiving module 508 is configured to receive the acquisition completion message sent by the first storage node, where the acquisition completion message is used to indicate that the first storage node has acquired and stored the reconstructed m data blocks.
  • the fourth sending module 509 is configured to send a delete instruction to the cache storage node, where the delete instruction is used to instruct the cache storage node to delete the data block stored on the cache disk.
  • The embodiment of the present disclosure provides a data reconstruction apparatus. When a faulty disk is detected in the first storage node, the first sending module sends reconstruction instructions to the n target storage nodes, so that the n target storage nodes reconstruct the data blocks on the faulty disk, and the second sending module may instruct the first storage node to acquire the data blocks reconstructed by each target storage node.
  • Even when the data reconstruction capability of the first storage node is weak, because the n target storage nodes include storage nodes other than the first storage node, those other storage nodes can help the first storage node reconstruct the data blocks. The first storage node therefore has less data to reconstruct, the data blocks stored on the faulty disk are reconstructed faster, and the data reconstruction efficiency of the data storage system is improved.
  • Embodiments of the present disclosure provide a computer device having a computer program running thereon, the processor in the computer device executing a computer program to implement the data reconstruction method described above.
  • the MDS in the data storage system shown in Figure 1 can include the computer device.
  • Embodiments of the present disclosure provide a storage medium on which a computer program is stored, and a processor executes a computer program to implement the data reconstruction method described above.
  • Embodiments of the present disclosure provide a computer program product that, when executed on a computer, causes the computer to perform the data reconstruction method described above.
  • The method embodiments provided by the embodiments of the present disclosure and the corresponding apparatus embodiments may be cross-referenced, which is not limited by the embodiments of the present disclosure.
  • The sequence of the steps of the method embodiments provided by the embodiments of the present disclosure can be appropriately adjusted, and steps can also be added or removed according to the situation. Any variation of the method that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the protection scope of the present disclosure, and is therefore not described again.
  • A person skilled in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Abstract

A data reconstruction method, apparatus, and system, relating to the technical field of data storage. The method comprises: when detecting, in a first storage node, a faulty disk on which m data blocks are stored, sending reconstruction commands respectively to n target storage nodes amongst a plurality of storage nodes, the first storage node being any storage node amongst the plurality of storage nodes, the n reconstruction commands sent to the n target reconstruction nodes being used for instructing that the m data blocks be reconstructed and stored, the n target storage nodes including a storage node different from the first storage node, and m≥n≥1 (301); and sending an acquisition command to the first storage node, the acquisition command being used for instructing the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes (302). The method solves the problem of low data reconstruction efficiency in data storage systems and increases the efficiency of data reconstruction in data storage systems; the method is used for data reconstruction.

Description

Data reconstruction method and apparatus, and data storage system
Cross-reference to related applications
The present disclosure claims priority to Chinese Patent Application No. 201710903893.4, filed on September 29, 2017, the entire contents of which are incorporated herein by reference as part of the present disclosure.
Technical field
The present disclosure relates to the field of data storage technologies, and in particular, to a data reconstruction method and apparatus, and a data storage system.
Background
With the development of data storage technology, data storage systems based on the Serial Attached Small Computer System Interface (SAS) protocol have been widely used.
In the related art, a data storage system based on the SAS protocol includes a metadata management server (Metadata Server, MDS for short), a SAS switch, and a plurality of storage nodes; the storage nodes are connected to each other through the SAS switch, and each storage node includes a plurality of disks. When a user terminal needs to store target data in the data storage system, one storage node in the data storage system cuts the target data into a plurality of data blocks (also referred to as striping the target data; the plurality of data blocks belong to the same stripe) and stores the data blocks on different disks. The MDS may be used to store the stripe information (a kind of metadata) of each data block. The stripe information includes: the stripe identifier of the data block (that is, the identifier of the stripe to which the data block belongs), the identifier of the disk where the data block is located, the identifier of the data block, the data amount of the data block, and the erasure coding (EC) type of the data block. When a faulty disk appears on a storage node, the MDS may send a reconstruction instruction to that storage node, where the reconstruction instruction includes the stripe information of each data block stored on the faulty disk, and the storage node may reconstruct, that is, restore, each data block stored on the faulty disk according to the stripe information of each data block in the reconstruction instruction.
However, because a disk usually stores many data blocks, when the data processing capability of the storage node where the faulty disk is located is weak, that storage node reconstructs the data stored on the faulty disk slowly, and the data reconstruction efficiency of the data storage system is therefore low.
Summary of the invention
The present disclosure provides a data reconstruction method and apparatus, and a data storage system, which can solve the problem of low data reconstruction efficiency in data storage systems. The technical solutions are as follows:
In a first aspect, a data reconstruction method is provided for a metadata management server (MDS) in a data storage system, the data storage system further including a Serial Attached Small Computer System Interface (SAS) switch and a plurality of storage nodes, the plurality of storage nodes being connected to each other through the SAS switch, the method including:
when detecting, in a first storage node, a faulty disk on which m data blocks are stored, sending reconstruction instructions respectively to n target storage nodes among the plurality of storage nodes, where the first storage node is any one of the plurality of storage nodes, the n reconstruction instructions sent to the n target reconstruction nodes are used to instruct that the m data blocks be reconstructed and stored, the n target storage nodes include a storage node different from the first storage node, and m≥n≥1;
sending an acquisition instruction to the first storage node, where the acquisition instruction is used to instruct the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
Optionally, m≥n≥2, and before sending a reconstruction instruction to each of the n target storage nodes among the plurality of storage nodes, the method further includes:
determining the n storage nodes with the smallest load among the plurality of storage nodes as the n target storage nodes;
or determining n preset storage nodes among the plurality of storage nodes as the n target storage nodes;
or determining the n storage nodes with the shortest routing distance to the MDS among the plurality of storage nodes as the n target storage nodes.
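The load-based and distance-based strategies above differ only in the ranking criterion. A minimal sketch, assuming the MDS keeps a per-node metric in a dictionary (the names `pick_targets`, `loads`, and `route_hops` are hypothetical and do not appear in the disclosure):

```python
import heapq

def pick_targets(metric, n):
    """Return the n node ids with the smallest metric value.

    metric maps a node id to a number, for example its load or its
    routing distance to the MDS (hypothetical inputs for this sketch).
    """
    if not 1 <= n <= len(metric):
        raise ValueError("n must be between 1 and the number of nodes")
    return heapq.nsmallest(n, metric, key=metric.get)

loads = {"node1": 0.9, "node2": 0.2, "node3": 0.5, "node4": 0.1}
route_hops = {"node1": 1, "node2": 3, "node3": 2, "node4": 4}

least_loaded = pick_targets(loads, 2)   # ranks nodes by load
nearest = pick_targets(route_hops, 2)   # ranks nodes by routing distance
```

The preset strategy needs no ranking at all: the MDS would simply read the n node ids from its configuration.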
Optionally, sending a reconstruction instruction to each of the n target storage nodes among the plurality of storage nodes includes:
determining the load of each target storage node;
determining, according to the loads of the n target storage nodes, at least one data block corresponding to each target storage node, where the total data volume of all data blocks corresponding to a target storage node is negatively correlated with the load of that target storage node;
generating a reconstruction instruction corresponding to each target storage node, where the reconstruction instruction corresponding to a target storage node instructs that node to reconstruct and store the data blocks corresponding to it;
sending to each target storage node its corresponding reconstruction instruction.
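The allocation step above can be sketched as a greedy assignment. This is only one possible way to obtain a negative correlation between assigned volume and load; the disclosure does not prescribe an algorithm. The function name `assign_blocks` and the assumption that loads are normalised to the range [0, 1) are hypothetical.

```python
def assign_blocks(block_sizes, node_loads):
    """Map each target node to the block ids it should reconstruct.

    block_sizes: block id -> size in bytes.
    node_loads: node id -> load in [0, 1) (assumed normalisation).
    """
    if len(block_sizes) < len(node_loads):
        raise ValueError("need at least one block per target node (m >= n)")
    if any(not 0 <= load < 1 for load in node_loads.values()):
        raise ValueError("loads must lie in [0, 1)")

    spare = {node: 1.0 - load for node, load in node_loads.items()}
    plan = {node: [] for node in node_loads}
    volume = {node: 0 for node in node_loads}

    blocks = sorted(block_sizes, key=block_sizes.get, reverse=True)
    # First pass: every node gets one block, lightest-loaded node first.
    for node in sorted(node_loads, key=node_loads.get):
        block = blocks.pop(0)
        plan[node].append(block)
        volume[node] += block_sizes[block]
    # Second pass: give each remaining block to the node whose volume,
    # scaled by its spare capacity, would stay smallest.
    for block in blocks:
        size = block_sizes[block]
        node = min(spare, key=lambda n: (volume[n] + size) / spare[n])
        plan[node].append(block)
        volume[node] += size
    return plan
```

With six equally sized blocks and two nodes whose loads are 0.2 and 0.6, this sketch gives four blocks to the lighter node and two to the heavier one.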
Optionally, after sending a reconstruction instruction to each of the n target storage nodes among the plurality of storage nodes, the method further includes:
receiving a storage application message sent by the first storage node, where the storage application message includes the total data volume of the m data blocks;
sending a storage instruction to the first storage node according to the storage application message, where the storage instruction instructs the first storage node to store the reconstructed m data blocks on a target disk, and the target disk is a disk in the first storage node whose available storage capacity is greater than or equal to the total data volume.
Optionally, no related data block is stored on the target disk, where a related data block is a data block that belongs to the same stripe as any one of the m data blocks.
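The two conditions on the target disk (enough free capacity, and no block from any affected stripe) can be checked together. A minimal sketch, assuming the MDS tracks free space and resident stripe ids per disk; the record layout and the name `pick_target_disk` are hypothetical:

```python
def pick_target_disk(disks, total_size, failed_stripes):
    """Return the wwn of the first suitable disk, or None if there is none.

    disks: list of dicts with keys "wwn", "free" (bytes) and "stripes"
           (set of stripe ids that already have a block on the disk).
    failed_stripes: set of stripe ids of the m blocks being rebuilt.
    """
    for disk in disks:
        has_room = disk["free"] >= total_size
        shares_stripe = bool(disk["stripes"] & failed_stripes)
        if has_room and not shares_stripe:
            return disk["wwn"]
    return None
```

Excluding disks that already hold a block of the same stripe keeps a single later disk failure from removing two blocks of one stripe at once.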
Optionally, after sending the storage instruction to the first storage node, the method further includes:
receiving storage information of each data block sent by the first storage node, where the storage information of each data block includes the identifier of the target disk and the identifier of that data block;
determining the stripe information of each data block according to the identifier of that data block in its storage information;
modifying, in the stripe information of each data block, the identifier of the disk on which that data block is located to the identifier of the target disk.
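The metadata update above amounts to rewriting the disk identifier of every rebuilt block. A minimal sketch, assuming stripe information is kept as (stripe_id, wwn, key) triples and storage information as (target wwn, key) pairs; the function name `update_stripe_info` is hypothetical:

```python
def update_stripe_info(stripe_info, storage_info):
    """Return stripe info with rebuilt blocks pointing at their new disk.

    stripe_info: list of (stripe_id, wwn, key) triples.
    storage_info: list of (target_wwn, key) pairs reported by the
                  first storage node.
    """
    new_disk = {key: wwn for wwn, key in storage_info}
    return [(sid, new_disk.get(key, wwn), key) for sid, wwn, key in stripe_info]
```

Blocks whose keys are not reported keep their original disk identifier, so the update touches only the blocks that were on the failed disk.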
Optionally, each of the plurality of storage nodes includes storage disks and a cache disk, each storage node has read permission on the storage disks and read and write permission on the cache disk, the n target storage nodes include a cache storage node, and the reconstruction instruction sent to the cache storage node instructs it to store the reconstructed data blocks on its cache disk. After sending the acquisition instruction to the first storage node, the method further includes:
receiving an acquisition completion message sent by the first storage node, where the acquisition completion message indicates that the first storage node has finished acquiring and storing the reconstructed m data blocks;
sending a deletion instruction to the cache storage node, where the deletion instruction instructs the cache storage node to delete the data blocks stored on its cache disk.
In a second aspect, a data reconstruction apparatus is provided for a metadata management server (MDS) in a data storage system. The data storage system further includes a Serial Attached SCSI (SAS) switch and a plurality of storage nodes, the plurality of storage nodes being connected to each other through the SAS switch. The apparatus includes:
a first sending module, configured to send, when a failed disk storing m data blocks is detected in a first storage node, a reconstruction instruction to each of n target storage nodes among the plurality of storage nodes, where the first storage node is any one of the plurality of storage nodes, the n reconstruction instructions sent to the n target storage nodes instruct those nodes to reconstruct and store the m data blocks, the n target storage nodes include a storage node different from the first storage node, and m≥n≥1;
a second sending module, configured to send an acquisition instruction to the first storage node, where the acquisition instruction instructs the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
Optionally, m≥n≥2, and the data reconstruction apparatus further includes:
a first determining module, configured to determine the n storage nodes with the smallest load among the plurality of storage nodes as the n target storage nodes;
or a second determining module, configured to determine n preset storage nodes among the plurality of storage nodes as the n target storage nodes;
or a third determining module, configured to determine the n storage nodes with the shortest routing distance to the MDS among the plurality of storage nodes as the n target storage nodes.
Optionally, the first sending module is configured to:
determine the load of each target storage node;
determine, according to the loads of the n target storage nodes, at least one data block corresponding to each target storage node, where the total data volume of all data blocks corresponding to a target storage node is negatively correlated with the load of that target storage node;
generate a reconstruction instruction corresponding to each target storage node, where the reconstruction instruction corresponding to a target storage node instructs that node to reconstruct and store the data blocks corresponding to it;
send to each target storage node its corresponding reconstruction instruction.
Optionally, the data reconstruction apparatus further includes:
a first receiving module, configured to receive a storage application message sent by the first storage node, where the storage application message includes the total data volume of the m data blocks;
a third sending module, configured to send a storage instruction to the first storage node according to the storage application message, where the storage instruction instructs the first storage node to store the reconstructed m data blocks on a target disk, and the target disk is a disk in the first storage node whose available storage capacity is greater than or equal to the total data volume.
Optionally, no related data block is stored on the target disk, where a related data block is a data block that belongs to the same stripe as any one of the m data blocks.
Optionally, the data reconstruction apparatus further includes:
a second receiving module, configured to receive storage information of each data block sent by the first storage node, where the storage information of each data block includes the identifier of the target disk and the identifier of that data block;
a fourth determining module, configured to determine the stripe information of each data block according to the identifier of that data block in its storage information;
a modification module, configured to modify, in the stripe information of each data block, the identifier of the disk on which that data block is located to the identifier of the target disk.
Optionally, each of the plurality of storage nodes includes storage disks and a cache disk, each storage node has read permission on the storage disks and read and write permission on the cache disk, the n target storage nodes include a cache storage node, and the reconstruction instruction sent to the cache storage node instructs it to store the reconstructed data blocks on its cache disk. The data reconstruction apparatus further includes:
a third receiving module, configured to receive an acquisition completion message sent by the first storage node, where the acquisition completion message indicates that the first storage node has finished acquiring and storing the reconstructed m data blocks;
a fourth sending module, configured to send a deletion instruction to the cache storage node, where the deletion instruction instructs the cache storage node to delete the data blocks stored on its cache disk.
In a third aspect, a data storage system is provided. The data storage system includes a metadata management server (MDS), a plurality of storage nodes, and a Serial Attached SCSI (SAS) switch. The plurality of storage nodes are connected to each other through the SAS switch, and the MDS includes the data reconstruction apparatus of the second aspect.
In a fourth aspect, a computer device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the bus; the memory is configured to store a computer program; and the processor is configured to execute the program stored in the memory to implement the method steps of the first aspect.
In a fifth aspect, a computer-readable storage medium is provided. The storage medium stores a computer program that, when executed by a processor, implements the method steps of the first aspect.
In a sixth aspect, a data storage system is provided. The data storage system includes a metadata management server (MDS), a plurality of storage nodes, and a Serial Attached SCSI (SAS) switch. The plurality of storage nodes are connected to each other through the SAS switch, and the MDS includes the computer device of claim 16.
In a seventh aspect, a computer program product is provided that, when run on a computer, causes the computer to perform the method steps of the first aspect.
The technical solutions provided by the present disclosure have the following beneficial effects:
When the MDS detects a failed disk in the first storage node, it sends reconstruction instructions to n target storage nodes so that the n target storage nodes reconstruct the data blocks on the failed disk, and the MDS can also instruct the first storage node to acquire the data blocks reconstructed by each target storage node. When the data reconstruction capability of the first storage node is weak, the n target storage nodes include storage nodes other than the first storage node, so those other storage nodes can help the first storage node reconstruct the data blocks. The first storage node therefore has less data to reconstruct, the data blocks stored on the failed disk are reconstructed faster, and the data reconstruction efficiency of the data storage system is improved.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a data storage system according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the disks in storage nodes according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a data reconstruction method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of another data reconstruction method according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a data reconstruction apparatus according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of another data reconstruction apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of yet another data reconstruction apparatus according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of still another data reconstruction apparatus according to an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the embodiments of the present disclosure are described in further detail below with reference to the drawings.
FIG. 1 is a schematic structural diagram of a data storage system according to an embodiment of the present disclosure. As shown in FIG. 1, the data storage system includes an MDS 01, a plurality of storage nodes 02, a SAS switch 03, and an Ethernet switch 04. The MDS 01 is connected to the plurality of storage nodes 02 through the Ethernet switch 04, and the plurality of storage nodes 02 are connected to each other through the SAS switch 03. For example, the MDS 01 may be a server or a server cluster, and a storage node 02 may be a device with a storage function, such as a server or a computer.
Each storage node includes multiple disks, each of which is used to store data. Optionally, each storage node may further include a processor, on which a storage server (Object Storage Device, OSD), an audit server (AUDITOR), and a slicing server (Stripe Server, SS) may run; that is, an OSD, an audit server, and an SS may run on each storage node. FIG. 2 is a schematic diagram of the disks in storage nodes according to an embodiment of the present disclosure and shows five storage nodes 02. Each storage node 02 may include multiple disks. One OSD, one SS, and one audit server (none of which are shown in FIG. 2) may run on each storage node. The OSD running on each storage node can read the data stored on any disk through the SAS switch.
A user terminal can store data on the disks of the storage nodes in FIG. 1 and read data from those disks.
On the one hand, when the user terminal needs to write target data A to disk, it sends a write request to the MDS. The MDS then allocates an SS to the user terminal and assigns an erasure coding (EC) type to target data A. The user terminal can then send target data A to the SS allocated by the MDS. After receiving target data A, the SS applies to the MDS for a stripe resource. The MDS allocates a stripe resource to the SS according to the EC type assigned to target data A, and the stripe resource may include disks on multiple storage nodes.
For example, the stripe resource information that the MDS allocates to the SS is: {<stripe_id, OSD_1, wwn_1>, <stripe_id, OSD_1, wwn_2>, <stripe_id, OSD_1, wwn_3>, <stripe_id, OSD_1, wwn_4>, <stripe_id, OSD_1, wwn_5>}. Each pair of "<" and ">" encloses the information of one disk on a storage node, "{" and "}" enclose the stripe resource allocated for the data, stripe_id is the stripe identifier, and wwn is the disk identifier. In other words, the MDS allocates five disks on five storage nodes to target data A, namely disks wwn_1, wwn_2, wwn_3, wwn_4, and wwn_5, and grants write permission on these five disks to OSD_1. All five disks are used to store the striped target data A, and every entry in the stripe resource information carries the same stripe identifier.
The SS can then slice target data A according to a preset data block size (that is, stripe target data A) to obtain k original object blocks A1 of target data A (that is, k data blocks), and generate m redundant object blocks A2 of target data A (that is, m data blocks) according to the EC type of target data A (k+m=n). The stripe resource allocated by the MDS includes n disks on n storage nodes in total, and each disk stores one data block of target data A.
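The slicing and redundancy step can be illustrated with the simplest possible erasure code. The sketch below is an assumption for illustration only: it uses a single XOR parity block (one redundant block) instead of whatever EC type the system actually assigns, and the helper names `split_blocks` and `xor_blocks` are hypothetical.

```python
def split_blocks(data, block_size):
    """Slice data into fixed-size blocks, zero-padding the last one."""
    padded_len = -(-len(data) // block_size) * block_size
    data = data.ljust(padded_len, b"\0")
    return [data[i:i + block_size] for i in range(0, padded_len, block_size)]

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

originals = split_blocks(b"abcdefgh", 3)   # k = 3 original blocks
parity = xor_blocks(originals)             # 1 redundant block (XOR parity)

# If the disk holding originals[1] fails, the block can be rebuilt from
# the surviving blocks of the same stripe.
rebuilt = xor_blocks([originals[0], originals[2], parity])
```

Reconstruction in this sketch reads every surviving block of the stripe, which is why spreading the rebuild of many blocks over several nodes can shorten the total rebuild time.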
The SS can also generate a key for each data block of target data A. The key of a data block serves as the identifier of that data block and also indicates whether the data block is an original object block A1 or a redundant object block A2. The SS can then add each data block and its key to the corresponding disk information to obtain <stripe_id, OSD, wwn, key, value>, where value denotes the data block. The stripe resource information then becomes {<stripe_id, OSD_1, wwn_1, key_1, value_1>, <stripe_id, OSD_1, wwn_2, key_2, value_2>, <stripe_id, OSD_1, wwn_3, key_3, value_3>, <stripe_id, OSD_1, wwn_4, key_4, value_4>, <stripe_id, OSD_1, wwn_5, key_5, value_5>}.
Further, the SS can send the <wwn, key, value> part of each disk's information to the corresponding OSD, that is, the OSD indicated by the OSD identifier in that disk's information. After receiving <wwn, key, value>, the OSD writes <key, value> to the disk indicated by wwn. When the write completes, the OSD returns a write success message to the SS, from which the SS determines that the data block was written successfully. After the SS determines that every data block of target data A (including the original object blocks A1 and the redundant object blocks A2) has been written successfully, it returns the stripe information of target data A, taken from the stripe resource information, to the MDS for storage. The stripe information of target data A may be: {<stripe_id, wwn_1, key_1>, <stripe_id, wwn_2, key_2>, <stripe_id, wwn_3, key_3>, <stripe_id, wwn_4, key_4>, <stripe_id, wwn_5, key_5>}. Here "{" and "}" enclose the stripe information of target data A, and each pair of "<" and ">" encloses the storage information of one data block. The stripe information of target data A is also referred to as the stripe information of each data block of target data A.
On the other hand, when the user terminal needs to read target data A stored on disk, it sends a read request to the MDS. According to the read request, the MDS reads the previously recorded stripe information of target data A. For example, the stripe information of target data A is {<stripe_id, wwn_1, key_1>, <stripe_id, wwn_2, key_2>, <stripe_id, wwn_3, key_3>, <stripe_id, wwn_4, key_4>, <stripe_id, wwn_5, key_5>}. The MDS can then determine, from the disks indicated by wwn in the stripe information, the OSD that previously stored target data A, and send the stripe information of target data A and the identifier of the user terminal to the SS local to that OSD (the OSD and the SS run on the same storage node).
After receiving the stripe information of target data A, the SS sends the <wwn, key> part of the storage information of each original object block A1 in the stripe information to the local OSD. The OSD then uses the key in each received <wwn, key> to read the data block (also called the value) from the disk indicated by wwn, and returns <key, value> to the local SS. After receiving all <key, value> pairs returned by the OSD, the SS combines them to obtain {<key_1, value_1>, <key_2, value_2>, <key_3, value_3>} (assuming that value_1, value_2, and value_3 are original object blocks A1, and value_4 and value_5 are redundant object blocks A2). Finally, the SS packs value_1, value_2, and value_3 from {<key_1, value_1>, <key_2, value_2>, <key_3, value_3>} into target data A and sends target data A to the user terminal.
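The read path above can be sketched as a lookup followed by concatenation. In this sketch the disks are modelled as nested dictionaries and the set `original_keys` stands in for the property that a key reveals whether its block is original or redundant; both, along with the name `read_target_data`, are hypothetical.

```python
def read_target_data(stripe_info, disks, original_keys):
    """Reassemble target data from its original object blocks.

    stripe_info: list of (stripe_id, wwn, key) triples in block order.
    disks: wwn -> {key: value}, a stand-in for the OSD's disk reads.
    original_keys: keys that identify original (non-redundant) blocks.
    """
    values = [
        disks[wwn][key]
        for _stripe_id, wwn, key in stripe_info
        if key in original_keys
    ]
    return b"".join(values)
```

The redundant blocks are skipped on a healthy read; they are needed only when a block has to be reconstructed.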
It should be noted that disks in a storage node are prone to failure, and when a disk fails the user cannot read the data blocks stored on it. The embodiments of the present disclosure therefore provide a data reconstruction method for reconstructing the data blocks stored on a failed disk.
FIG. 3 is a flowchart of a data reconstruction method according to an embodiment of the present disclosure. The data reconstruction method can be used by the MDS in a data storage system (such as the MDS 01 shown in FIG. 1). As shown in FIG. 3, the data reconstruction method includes:
Step 301: When a failed disk storing m data blocks is detected in a first storage node, send a reconstruction instruction to each of n target storage nodes among the plurality of storage nodes, where the first storage node is any one of the plurality of storage nodes, the n reconstruction instructions sent to the n target storage nodes instruct those nodes to reconstruct and store the m data blocks, the n target storage nodes include a storage node different from the first storage node, and m≥n≥1.
Step 302: Send an acquisition instruction to the first storage node, where the acquisition instruction instructs the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
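Steps 301 and 302 can be sketched as the list of messages the MDS would emit. The message layout, the round-robin split of blocks, and the name `plan_reconstruction` are assumptions made for illustration; the disclosure does not define a wire format.

```python
def plan_reconstruction(first_node, failed_blocks, targets):
    """Return (destination, message) pairs for steps 301 and 302."""
    if not 1 <= len(targets) <= len(failed_blocks):
        raise ValueError("requires m >= n >= 1")
    if all(node == first_node for node in targets):
        raise ValueError("targets must include a node other than first_node")

    messages = []
    # Step 301: one reconstruction instruction per target storage node.
    for i, node in enumerate(targets):
        share = failed_blocks[i::len(targets)]  # simple round-robin split
        messages.append((node, {"type": "reconstruct", "blocks": share}))
    # Step 302: the first storage node collects the rebuilt blocks.
    messages.append((first_node, {"type": "acquire", "sources": list(targets)}))
    return messages
```

The two validation checks mirror the constraints m≥n≥1 and the requirement that at least one target differs from the first storage node.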
In summary, the embodiments of the present disclosure provide a data reconstruction method. When the MDS detects a failed disk in the first storage node, it sends reconstruction instructions to n target storage nodes so that the n target storage nodes reconstruct the data blocks on the failed disk, and the MDS can also instruct the first storage node to acquire the data blocks reconstructed by each target storage node. When the data reconstruction capability of the first storage node is weak, the n target storage nodes include storage nodes other than the first storage node, so those other storage nodes can help the first storage node reconstruct the data blocks. The first storage node therefore has less data to reconstruct, the data blocks stored on the failed disk are reconstructed faster, and the data reconstruction efficiency of the data storage system is improved.
FIG. 4 is a flowchart of another data reconstruction method according to an embodiment of the present disclosure. As shown in FIG. 4, the data reconstruction method includes:
Step 401: The first storage node sends a fault message to the MDS, where the fault message indicates that a failed disk has appeared in the first storage node.
It should be noted that each storage node in FIG. 1 may include multiple disks, and the multiple disks include storage disks and cache disks. The storage disks and cache disks may be solid state drives (SSDs), Serial ATA (SATA) hard disks, or SAS disks. Optionally, the cache disks are SSDs, and the storage disks are SATA hard disks or SAS disks. The first storage node is any one of the plurality of storage nodes, and the failed disk may be one of the storage disks in the first storage node.
Referring to FIG. 2, the multiple disks of each storage node 02 (six disks are shown in FIG. 2) may include five storage disks and one cache disk. The OSD running on each storage node 02 has permission to write data to the cache disk on that storage node 02, and has permission to read data from each storage disk and each cache disk on that storage node 02. It should be noted that the number of storage disks in a storage node 02 may be any integer greater than or equal to 1, and the number of cache disks may also be any integer greater than or equal to 1; the embodiments of the present disclosure do not limit this. Optionally, the OSD can monitor whether a storage disk in the storage node has failed. When a storage disk fails, the OSD determines that the storage disk is a failed disk and sends a fault message to the MDS. The fault message indicates the failed disk; for example, it may include the identifier of the failed disk.
Step 402: The MDS determines the failed disk in the first storage node according to the fault message.
After receiving the fault message, the MDS can parse it to obtain the identifier of the failed disk, and thereby determine the failed disk in the first storage node.
Step 403: The MDS obtains the stripe information of the m data blocks stored on the failed disk.
It should be noted that the MDS stores the stripe information of every data block stored on every disk in the data storage system. After determining the failed disk in the first storage node, the MDS can determine the m data blocks stored on the failed disk and obtain the stripe information of each of the m data blocks, where m≥1.
Step 404: The MDS determines n target storage nodes in the data storage system.
For example, m ≥ n ≥ 1; that is, the MDS may select one storage node in the data storage system as the target storage node, or select multiple storage nodes as target storage nodes. When n = 1, the single target storage node selected by the MDS is not the first storage node; when n ≥ 2, the multiple target storage nodes selected by the MDS may or may not include the first storage node.
On the one hand, when n = 1, the MDS may determine the single target storage node in the data storage system in any of several implementations. The embodiments of the present disclosure describe the following three implementations as examples.
In a first implementation, the MDS may first determine the load of each storage node in the data storage system other than the first storage node. It should be noted that the load of a storage node may be positively correlated with at least one performance parameter of the storage node, where the performance parameters of the storage node include: the processor usage of the storage node, the memory occupancy of the storage node (including all disks in the storage node), and the storage efficiency of the storage node.
The MDS may determine one target storage node according to the load of each storage node other than the first storage node. For example, the MDS may compare the loads of the storage nodes other than the first storage node in the data storage system and determine the storage node with the smallest load as the target storage node. That is, when selecting the target storage node that is to perform the reconstruction task, the MDS may select the least-loaded storage node (which has the higher data processing capability) other than the first storage node, so that the target storage node can reconstruct the data blocks quickly, which improves the efficiency of data reconstruction.
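As an illustration only, the first implementation for n = 1 can be sketched as follows. The `NodeStats` type, the `node_load` metric (a plain average of two of the performance parameters), and the node identifiers are hypothetical; the disclosure only requires that the load be positively correlated with at least one performance parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStats:
    node_id: str
    cpu_usage: float      # processor usage as a fraction in [0, 1]
    memory_usage: float   # memory occupancy as a fraction in [0, 1]


def node_load(stats: NodeStats) -> float:
    # Hypothetical load metric: grows with each performance parameter.
    return (stats.cpu_usage + stats.memory_usage) / 2


def pick_single_target(nodes: List[NodeStats], first_node_id: str) -> str:
    # The first storage node (which holds the faulty disk) is excluded when n = 1.
    candidates = [n for n in nodes if n.node_id != first_node_id]
    if not candidates:
        raise ValueError("no storage node other than the first storage node")
    return min(candidates, key=node_load).node_id
```

With this sketch, a first storage node that happens to be the least loaded is still skipped, and the least-loaded remaining node is returned.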
In a second implementation, the storage nodes in the data storage system include one preset storage node different from the first storage node, and the preset storage node may be a storage node with higher data processing capability. The MDS may directly determine this preset storage node as the target storage node.
In a third implementation, the MDS may first determine the routing distance between the MDS and each storage node in the data storage system other than the first storage node, and determine one target storage node according to these routing distances. For example, the MDS may compare the routing distances between the MDS and the storage nodes other than the first storage node, and determine the storage node with the smallest routing distance to the MDS as the target storage node. That is, when selecting the target storage node that is to perform the reconstruction task, the MDS may select, among the storage nodes other than the first storage node, the storage node closest to the MDS in routing distance, so that the MDS can subsequently assign the block reconstruction task to the target storage node quickly, which improves the efficiency of data reconstruction.
On the other hand, when n ≥ 2, the MDS may determine the n target storage nodes in the data storage system in any of several implementations. The embodiments of the present disclosure describe the following three implementations as examples.
In a first implementation, the MDS may first determine the load of each storage node in the data storage system, and determine the n target storage nodes according to these loads. For example, a preset number threshold n may be stored on the MDS in advance. The MDS may compare the loads of the storage nodes in the data storage system and determine the n storage nodes with the smallest loads as the n target storage nodes. That is, when selecting the target storage nodes that are to perform the reconstruction task, the MDS may select storage nodes with smaller loads (and therefore higher data processing capability), so that the target storage nodes can reconstruct the data blocks quickly, which improves the efficiency of data reconstruction.
In a second implementation, the storage nodes in the data storage system include n preset storage nodes, and the n preset storage nodes may be storage nodes with higher data processing capability. The MDS may directly determine the n preset storage nodes as the n target storage nodes.
In a third implementation, the MDS may first determine the routing distance between the MDS and each storage node in the data storage system, and determine the n target storage nodes according to these routing distances. For example, a preset number threshold n may be stored on the MDS in advance. The MDS may compare the routing distances between the MDS and the storage nodes in the data storage system, and determine the n storage nodes with the smallest routing distances to the MDS as the n target storage nodes. That is, when selecting the target storage nodes that are to perform the reconstruction task, the MDS may select storage nodes that are close to the MDS in routing distance, so that the MDS can subsequently assign the block reconstruction tasks to each target storage node quickly, which improves the efficiency of data reconstruction.
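The first and third implementations for n ≥ 2 differ only in the metric being compared (load or routing distance). A minimal sketch, assuming the metric has already been collected into a dictionary keyed by a hypothetical node identifier, is:

```python
import heapq
from typing import Dict, List


def pick_targets(metric_by_node: Dict[str, float], n: int) -> List[str]:
    """Return the n nodes with the smallest metric (load or routing distance)."""
    if not 1 <= n <= len(metric_by_node):
        raise ValueError("n must be between 1 and the number of storage nodes")
    # Ties are broken by node identifier so the result is deterministic.
    return heapq.nsmallest(
        n, metric_by_node, key=lambda node: (metric_by_node[node], node)
    )
```

Because the first storage node is not excluded here, it may or may not appear in the result, which matches the n ≥ 2 case described above.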
Step 405: The MDS sends n reconstruction instructions to the n target storage nodes, one to each.
After determining the stripe information of each of the m data blocks and the n target storage nodes, the MDS may determine, according to the loads of the n target storage nodes and the stripe information of the m data blocks, at least one data block corresponding to each target storage node. The sum of the data amounts of all data blocks corresponding to a target storage node is negatively correlated with the load of that target storage node.
If the load of a target storage node is large, the target storage node can reconstruct only a small amount of data, so the sum of the data amounts of all data blocks corresponding to it is small; if the load of a target storage node is small, the target storage node can reconstruct a larger amount of data, so the sum of the data amounts of all data blocks corresponding to it is large. That is, the data reconstruction capability of a target storage node depends on its load, and the MDS needs to assign the data blocks to be reconstructed to each target storage node appropriately, according to the load and reconstruction capability of that node.
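One way to obtain an assignment in which the data amount per node falls as the load rises is a greedy heuristic. The sketch below is an assumption, not the method of the disclosure: it takes loads as fractions in [0, 1), uses the hypothetical weight `1 - load` as spare capacity, seeds every target node with one block (so that each node corresponds to at least one data block), and then places each remaining block on the node whose weighted data amount would stay smallest.

```python
from typing import Dict, List


def assign_blocks(
    block_sizes: Dict[str, int], node_loads: Dict[str, float]
) -> Dict[str, List[str]]:
    if len(block_sizes) < len(node_loads):
        raise ValueError("requires m >= n")
    # Hypothetical weight: spare capacity of each target storage node.
    weight = {node: 1.0 - load for node, load in node_loads.items()}
    if any(w <= 0 for w in weight.values()):
        raise ValueError("loads must be below 1.0")

    blocks = sorted(block_sizes, key=lambda b: (-block_sizes[b], b))
    nodes = sorted(weight, key=lambda nd: (-weight[nd], nd))
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    assigned = {node: 0 for node in nodes}

    for i, block in enumerate(blocks):
        size = block_sizes[block]
        if i < len(nodes):
            node = nodes[i]  # seed: every target node gets at least one block
        else:
            node = min(nodes, key=lambda nd: ((assigned[nd] + size) / weight[nd], nd))
        plan[node].append(block)
        assigned[node] += size
    return plan
```

For five equally sized blocks and two target nodes with loads 0.2 and 0.8, the lightly loaded node ends up with four blocks and the heavily loaded node with one.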
After determining the data blocks corresponding to each target storage node, the MDS may generate the reconstruction instruction for each target storage node according to the stripe information of those data blocks. The reconstruction instruction for a target storage node instructs that node to reconstruct and store the data blocks corresponding to it. The MDS may then send each reconstruction instruction to the audit server running on the corresponding target storage node.
For example, the reconstruction instruction for each target storage node includes: the stripe information of each data block corresponding to that target storage node, and indication information indicating whether that target storage node is the storage node where the faulty disk is located.
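The content of a reconstruction instruction can be pictured as a small record. In the sketch below, the class name, field names, and the `(stripe_id, wwn, key)` tuple layout are hypothetical; only the two kinds of content (per-block stripe information and the faulty-node indication) come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

StripeMember = Tuple[str, str, str]  # (stripe_id, disk identifier, block key)


@dataclass
class ReconstructInstruction:
    target_node: str
    stripes: Dict[str, List[StripeMember]]  # block key -> stripe information
    is_faulty_node: bool                    # True if this node holds the faulty disk


def build_instructions(
    plan: Dict[str, List[str]],
    stripe_info: Dict[str, List[StripeMember]],
    first_node: str,
) -> List[ReconstructInstruction]:
    # One instruction per target storage node, carrying only that node's blocks.
    return [
        ReconstructInstruction(
            target_node=node,
            stripes={key: stripe_info[key] for key in keys},
            is_faulty_node=(node == first_node),
        )
        for node, keys in plan.items()
    ]
```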
Step 406: The n target storage nodes reconstruct and store the data blocks according to the received reconstruction instructions.
After receiving the reconstruction instruction, the audit server running on each target storage node may parse it to obtain the stripe information of the at least one data block to be reconstructed.
The audit server running on each target storage node may further read, through the local OSD and according to the stripe information of each data block to be reconstructed, the valid data required to reconstruct that data block, where the valid data is stored on at least one disk in the data storage system. It should be noted that, assuming data block X is a data block to be reconstructed and the stripe information of data block X includes the storage information of data block X and the storage information of data block Y, the valid data required to reconstruct data block X is data block Y. The audit server running on each target storage node may then reconstruct the corresponding data blocks according to the valid data read and the received reconstruction instruction.
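The disclosure does not state which redundancy code the stripes use. Purely as an assumption for illustration, the sketch below takes a stripe protected by single XOR parity, in which any one lost block equals the XOR of all surviving blocks of the same stripe (the "valid data").

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equally sized blocks byte by byte (assumed single-parity stripe)."""
    if not blocks:
        raise ValueError("at least one block is required")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks in a stripe must have the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Usage: build the parity once, then rebuild a lost block from the survivors.
d1, d2, d3 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks([d1, d2, d3])
rebuilt_d1 = xor_blocks([d2, d3, parity])
```

Other codes (for example Reed-Solomon) would replace `xor_blocks` but leave the surrounding flow unchanged: read the surviving members named in the stripe information, then compute the missing block.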
Further, after receiving the reconstruction instruction, the audit server running on each target storage node may parse it to obtain the indication information indicating whether that target storage node is the storage node where the faulty disk is located.
In a first aspect, when the indication information indicates that the target storage node is not the storage node where the faulty disk is located, the audit server running on the target storage node may determine that the reconstruction instruction instructs the target storage node to store the reconstructed data blocks on its cache disk. After finishing the reconstruction of a data block, the audit server running on the target storage node may send the reconstructed data block to the local OSD and instruct the local OSD to write the data block to the cache disk of the target storage node.
In a second aspect, when the indication information indicates that the target storage node is the storage node where the faulty disk is located (that is, the first storage node), the audit server running on the first storage node may determine that the reconstruction instruction instructs the first storage node to store the reconstructed data blocks on its cache disk. After finishing the reconstruction of a data block, the audit server running on the first storage node may send the reconstructed data block to the local OSD and instruct the local OSD to write the data block to the cache disk of the first storage node.
In a third aspect, when the indication information indicates that the target storage node is the storage node where the faulty disk is located (that is, the first storage node), the audit server running on the first storage node may alternatively determine that the reconstruction instruction instructs the target storage node to store the reconstructed data blocks on a storage disk. The audit server running on the first storage node may send a storage request message to the MDS, where the storage request message includes the total data amount of the m data blocks. According to the storage request message, the MDS may send a storage instruction to the audit server running on the first storage node, where the storage instruction instructs the first storage node to store the reconstructed m data blocks on a target disk. The target disk may be a storage disk in the first storage node whose available storage capacity is greater than or equal to the total data amount of the m data blocks and on which no related data block is stored, where a related data block is a data block belonging to the same stripe as any of the m data blocks. After finishing the reconstruction of a data block, the audit server running on the first storage node may send the reconstructed data block to the local OSD and instruct the local OSD to write the data block to the target disk in the first storage node.
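The two conditions on the target disk (enough free capacity, and no block from any stripe that lost a block) can be checked as in the following sketch. The `Disk` record, its fields, and the first-match policy are hypothetical; the disclosure does not say how the MDS chooses among several qualifying disks.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Disk:
    wwn: str              # disk identifier
    free_bytes: int       # available storage capacity
    stripe_ids: Set[str]  # stripes that already have a block on this disk


def choose_target_disk(
    disks: List[Disk], faulty_wwn: str, lost_stripe_ids: Set[str], total_bytes: int
) -> Optional[str]:
    for disk in disks:
        if disk.wwn == faulty_wwn:
            continue  # the faulty disk itself cannot be the target
        if disk.free_bytes < total_bytes:
            continue  # must hold all m reconstructed blocks
        if disk.stripe_ids & lost_stripe_ids:
            continue  # already stores a related block of an affected stripe
        return disk.wwn
    return None  # no storage disk in the first storage node qualifies
```

Excluding disks that already hold a related block keeps two blocks of the same stripe off one disk, so a later single-disk failure cannot remove two members of that stripe at once.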
Step 407: The n target storage nodes each send a reconstruction completion message to the MDS.
After a target storage node has reconstructed all of its data blocks according to the reconstruction instruction and stored all of the reconstructed data blocks, that target storage node may send a reconstruction completion message to the MDS. The reconstruction completion message sent by each target storage node may include: the identifier of each data block reconstructed by that target storage node, and the identifier of the disk on which each such data block is stored.
Step 408: The MDS sends an acquisition instruction to the first storage node, where the acquisition instruction instructs the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
After receiving the reconstruction completion messages sent by the n target storage nodes, the MDS may determine that every target storage node has completed the data reconstruction task assigned by the MDS. The MDS may then send an acquisition instruction to the audit server running on the first storage node, to instruct the audit server to acquire and store the data blocks reconstructed by each target storage node. It should be noted that the acquisition instruction may include: the identifier of each of the m data blocks, and the identifier of the disk on which each data block is stored.
Step 409: The first storage node acquires and stores the reconstructed m data blocks according to the acquisition instruction.
According to the acquisition instruction, the audit server running on the first storage node may determine the identifier of the disk on which each of the reconstructed m data blocks is stored, and acquire the reconstructed data blocks from the corresponding disks.
For example, when a reconstructed data block is stored on a disk that is not in the first storage node, the audit server running on the first storage node may read or copy the reconstructed data block from that disk (a cache disk of another storage node) through the local OSD and the SAS switch. When a reconstructed data block is stored on a cache disk of the first storage node, the audit server running on the first storage node may directly read the reconstructed data block from the local cache disk. When a reconstructed data block is stored on the target disk of the first storage node, the audit server running on the first storage node already has the reconstructed data block and does not need to perform any read step.
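The three acquisition cases can be summarized as a small decision function. This is a sketch only: the function name, the string labels, and the way disks are identified are hypothetical, and the actual I/O through the OSD and the SAS switch is omitted.

```python
from typing import Set


def acquisition_action(
    block_disk: str, local_cache_disks: Set[str], target_disk: str
) -> str:
    """Decide how the first storage node obtains one reconstructed block."""
    if block_disk == target_disk:
        return "none"         # already on the target disk, nothing to read
    if block_disk in local_cache_disks:
        return "local_read"   # read from the first storage node's own cache disk
    return "remote_read"      # read via the local OSD and the SAS switch
```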
After acquiring the reconstructed data blocks, the audit server running on the first storage node may store them.
For example, when the reconstructed data blocks are stored according to the first aspect or the second aspect in step 406, then in step 409 the audit server running on the first storage node may further send a storage request message to the MDS, where the storage request message may include the total data amount of the m data blocks. According to the storage request message, the MDS may send a storage instruction to the audit server running on the first storage node, where the storage instruction instructs the first storage node to store the reconstructed m data blocks on the target disk. The audit server running on the first storage node may then store the acquired reconstructed m data blocks on the target disk.
When the reconstructed data blocks are stored according to the third aspect in step 406, then in step 409 the audit server running on the first storage node may store the acquired reconstructed data blocks directly on the target disk. The data blocks that the first storage node reconstructed itself are already on the target disk, so no step of storing them a second time is needed.
Step 410: The first storage node sends an acquisition completion message to the MDS.
After acquiring the reconstructed data blocks, the audit server running on the first storage node may send an acquisition completion message to the MDS, where the acquisition completion message may indicate that the first storage node has finished acquiring the data blocks reconstructed by each target storage node.
Step 411: The first storage node sends storage information of each of the m data blocks to the MDS, where the storage information of each data block includes the identifier of the target disk and the identifier of that data block.
After determining that each reconstructed data block has been written to the target disk, the audit server running on the first storage node may send the storage information of each data block to the MDS, where the storage information includes the identifier of the data block (for example, the key of the data block) and the identifier of the target disk on which the data block is located.
Step 412: The MDS updates the stripe information of each of the m data blocks.
After receiving the storage information of each data block, the MDS may look up the stripe information of each data block according to the data block identifier in its storage information, and change the identifier of the disk on which the data block is located, as recorded in the stripe information, to the identifier of the target disk.
For example, the identifier of data block X is key_1. Before the disk identifier in the stripe information is modified, the stripe information of data block X may be: {<stripe_id, wwn_1, key_1>, <stripe_id, wwn_2, key_2>, <stripe_id, wwn_3, key_3>, <stripe_id, wwn_4, key_4>, <stripe_id, wwn_5, key_5>}. In step 412, the MDS may change wwn_1 in the stripe information to wwn_x (the identifier of the target disk), so that the stripe information of data block X is updated to: {<stripe_id, wwn_x, key_1>, <stripe_id, wwn_2, key_2>, <stripe_id, wwn_3, key_3>, <stripe_id, wwn_4, key_4>, <stripe_id, wwn_5, key_5>}.
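The update in this example amounts to replacing one disk identifier in the stripe record. A minimal sketch, assuming the stripe information is held as a list of `(stripe_id, wwn, key)` tuples (a hypothetical in-memory layout), is:

```python
from typing import List, Tuple

StripeMember = Tuple[str, str, str]  # (stripe_id, disk identifier, block key)


def update_stripe_info(
    stripe: List[StripeMember], block_key: str, target_wwn: str
) -> List[StripeMember]:
    """Return a copy of the stripe with the given block moved to target_wwn."""
    return [
        (stripe_id, target_wwn if key == block_key else wwn, key)
        for stripe_id, wwn, key in stripe
    ]


before = [("stripe_id", f"wwn_{i}", f"key_{i}") for i in range(1, 6)]
after = update_stripe_info(before, "key_1", "wwn_x")
```

Only the entry for `key_1` changes; the other four members of the stripe keep their original disk identifiers.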
Step 413: The MDS sends a delete instruction to each cache storage node among the n target storage nodes, where the delete instruction instructs the cache storage node to delete the data blocks stored on its cache disk.
It should be noted that the n target storage nodes include cache storage nodes, and each cache storage node, after reconstructing its data blocks, stores the reconstructed data blocks on its own cache disk. For example, when step 406 is implemented according to the first aspect or the second aspect, every target storage node is a cache storage node; when step 406 is implemented according to the third aspect, every target storage node other than the first storage node is a cache storage node.
After receiving the acquisition completion message sent by the audit server running on the first storage node, the MDS may send a delete instruction to the OSD running on each cache storage node, to instruct that OSD to delete the data blocks (that is, the reconstructed data blocks) stored on the cache disk of the cache storage node.
Step 414: Each cache storage node deletes the data blocks stored on its cache disk according to the delete instruction.
After receiving the delete instruction, the OSD running on each cache storage node may directly delete the data blocks stored on the cache disk of that cache storage node.
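Which target storage nodes receive the delete instruction depends on how step 406 stored the blocks. The sketch below encodes that rule; the function name and the boolean flag are hypothetical.

```python
from typing import List


def cache_storage_nodes(
    target_nodes: List[str], first_node: str, first_node_used_target_disk: bool
) -> List[str]:
    """List the target storage nodes that hold reconstructed blocks on a cache disk."""
    if first_node_used_target_disk:
        # Third aspect: the first storage node wrote straight to the target disk.
        return [node for node in target_nodes if node != first_node]
    # First or second aspect: every target storage node used its cache disk.
    return list(target_nodes)
```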
For example, as shown in FIG. 2, assume that the first storage node is storage node 1 and the faulty disk is storage disk 1-1 on the first storage node. The MDS may then determine that storage node 1, storage node 2, storage node 3, storage node 4, and storage node 5 are all target storage nodes.
It should be noted that storage disk 1-1 (the faulty disk) stores data block 1, data block 2, data block 3, data block 4, and data block 5. The MDS may send a reconstruction instruction to each of storage node 1, storage node 2, storage node 3, storage node 4, and storage node 5. According to the received reconstruction instructions, storage node 1 may reconstruct data block 1, storage node 2 may reconstruct data block 2, storage node 3 may reconstruct data block 3, storage node 4 may reconstruct data block 4, and storage node 5 may reconstruct data block 5.
Storage node 1 may further send a storage request message to the MDS, and the MDS may send a storage instruction to storage node 1, where the storage instruction instructs storage node 1 to store data blocks on storage disk 6-1 (the target disk). Storage node 1 may store the reconstructed data block 1 on storage disk 6-1, storage node 2 may store the reconstructed data block 2 on cache disk 2, storage node 3 may store the reconstructed data block 3 on cache disk 3, storage node 4 may store the reconstructed data block 4 on cache disk 4, and storage node 5 may store the reconstructed data block 5 on cache disk 5.
After storing its reconstructed data block, each target storage node may send a reconstruction completion message to the MDS, so that the MDS, after receiving the reconstruction completion messages from all target storage nodes, sends an acquisition instruction to storage node 1. According to the received acquisition instruction, storage node 1 may acquire, through the SAS switch, the reconstructed data block 2, data block 3, data block 4, and data block 5 stored on cache disk 2, cache disk 3, cache disk 4, and cache disk 5, and also store data block 2, data block 3, data block 4, and data block 5 on storage disk 6-1.
After acquiring the data blocks reconstructed by each target storage node, storage node 1 may further send an acquisition completion message to the MDS. According to the received acquisition completion message, the MDS may send a delete instruction to each of storage node 2, storage node 3, storage node 4, and storage node 5, to instruct each of them to delete the data blocks stored on its local cache disk. After storing a data block on storage disk 6-1, storage node 1 may further send the storage information of that data block to the MDS, and the MDS may update the stripe information of the data block according to its storage information.
In summary, the embodiments of the present disclosure provide a data reconstruction method. When the MDS detects a faulty disk in the first storage node, it sends reconstruction instructions to n target storage nodes so that the n target storage nodes reconstruct the data blocks on the faulty disk, and the MDS may further instruct the first storage node to acquire the data blocks reconstructed by each target storage node. When the data reconstruction capability of the first storage node is weak, the n target storage nodes include other storage nodes different from the first storage node, and these other storage nodes can help the first storage node reconstruct the data blocks. The first storage node therefore has less data to reconstruct and the data blocks stored on the faulty disk are reconstructed faster, which improves the data reconstruction efficiency of the data storage system.
FIG. 5 is a schematic structural diagram of a data reconstruction apparatus according to an embodiment of the present disclosure. The data reconstruction apparatus may be used in an MDS in a data storage system (such as the MDS shown in FIG. 1). As shown in FIG. 5, the data reconstruction apparatus 50 may include:
a first sending module 501, configured to, when a faulty disk storing m data blocks is detected in a first storage node, send a reconstruction instruction to each of n target storage nodes among multiple storage nodes, where the first storage node is any one of the multiple storage nodes, the n reconstruction instructions sent to the n target storage nodes instruct reconstruction and storage of the m data blocks, the n target storage nodes include a storage node different from the first storage node, and m ≥ n ≥ 1;
a second sending module 502, configured to send an acquisition instruction to the first storage node, where the acquisition instruction instructs the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
In summary, the embodiments of the present disclosure provide a data reconstruction apparatus. Upon detecting a failed disk in the first storage node, the first sending module sends reconstruction instructions to n target storage nodes so that the n target storage nodes reconstruct the data blocks on the failed disk, and the second sending module can instruct the first storage node to acquire the data blocks reconstructed by each target storage node. When the data reconstruction capability of the first storage node is weak, the n target storage nodes include storage nodes other than the first storage node, so those other storage nodes can help the first storage node reconstruct the data blocks. The first storage node therefore has less data to reconstruct, the data blocks stored on the failed disk are reconstructed faster, and the data reconstruction efficiency of the data storage system is improved.
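The two-step flow handled by the first and second sending modules can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the `Instruction` record, the `plan_reconstruction` function, and the round-robin split of blocks across targets are all assumptions introduced here, since the text only requires that the n targets together reconstruct the m blocks and that at least one target differs from the first storage node.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    kind: str        # "reconstruct" or "acquire" (hypothetical labels)
    receiver: str    # id of the node that receives the instruction
    block_ids: list  # data blocks covered by the instruction
    sources: list = field(default_factory=list)  # for "acquire": nodes holding rebuilt blocks


def plan_reconstruction(first_node, failed_blocks, target_nodes):
    """Return the instructions an MDS might emit for one failed disk."""
    m, n = len(failed_blocks), len(target_nodes)
    if not (m >= n >= 1):
        raise ValueError("requires m >= n >= 1")
    if len(set(target_nodes)) != n:
        raise ValueError("target nodes must be distinct")
    if set(target_nodes) == {first_node}:
        raise ValueError("at least one target must differ from the first node")

    # Assumed policy: spread the m blocks over the n targets round-robin.
    shares = {target: [] for target in target_nodes}
    for i, block in enumerate(failed_blocks):
        shares[target_nodes[i % n]].append(block)

    # Step 1: one reconstruction instruction per target storage node.
    plan = [Instruction("reconstruct", t, blocks) for t, blocks in shares.items()]
    # Step 2: one acquisition instruction to the first storage node.
    plan.append(Instruction("acquire", first_node, list(failed_blocks),
                            sources=list(target_nodes)))
    return plan
```

For example, `plan_reconstruction("node-A", ["b1", "b2", "b3"], ["node-B", "node-C"])` yields two reconstruction instructions followed by one acquisition instruction addressed to `node-A`.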
Optionally, m≥n≥2, and the data reconstruction apparatus 50 may further include:
A first determining module (not shown in FIG. 5), configured to determine the n storage nodes with relatively small loads among the plurality of storage nodes as the n target storage nodes;
Or, a second determining module (not shown in FIG. 5), configured to determine n preset storage nodes among the plurality of storage nodes as the n target storage nodes;
Or, a third determining module (not shown in FIG. 5), configured to determine the n storage nodes with the shortest routing distance to the MDS among the plurality of storage nodes as the n target storage nodes.
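The three alternative ways of choosing the n target storage nodes can be sketched as a single selection function. The function name `select_targets`, the strategy labels, and the `load` and `hops` fields are hypothetical; the text does not specify how load or routing distance is measured, so plain numbers are assumed here.

```python
def select_targets(nodes, n, strategy="least_loaded", preset=None):
    """Pick n target storage nodes.

    nodes:  dict mapping node id -> {"load": float, "hops": int}
            ("hops" stands in for routing distance to the MDS; an assumption)
    preset: ordered list of node ids, used only by the "preset" strategy
    """
    if not (1 <= n <= len(nodes)):
        raise ValueError("n must be between 1 and the number of storage nodes")
    if strategy == "least_loaded":
        # First determining module: the n nodes with the smallest loads.
        return sorted(nodes, key=lambda k: nodes[k]["load"])[:n]
    if strategy == "preset":
        # Second determining module: n nodes configured in advance.
        if preset is None or len(preset) < n:
            raise ValueError("preset list must contain at least n nodes")
        return list(preset[:n])
    if strategy == "nearest":
        # Third determining module: the n nodes closest to the MDS.
        return sorted(nodes, key=lambda k: nodes[k]["hops"])[:n]
    raise ValueError(f"unknown strategy: {strategy}")
```

Because this option assumes n≥2 and the selected nodes are distinct, at least one selected node necessarily differs from the first storage node.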
Optionally, the first sending module 501 may be configured to: determine the load of each target storage node; determine, according to the loads of the n target storage nodes, at least one data block corresponding to each target storage node, where the total data amount of all data blocks corresponding to a target storage node is negatively correlated with the load of that target storage node; generate a reconstruction instruction for each target storage node, where the reconstruction instruction for each target storage node instructs that node to reconstruct and store its corresponding data blocks; and send each target storage node its corresponding reconstruction instruction.
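One way to make the assigned data amount fall as node load rises is a greedy, weight-proportional assignment, sketched below. The weighting `1 - load`, the function name `assign_blocks`, and the representation of load as a fraction in [0, 1) are assumptions made for illustration; the text only states the negative correlation. Note that this simple sketch does not by itself guarantee that every target receives at least one block, which the text requires.

```python
def assign_blocks(block_sizes, node_loads):
    """Assign blocks so that lightly loaded nodes receive more data.

    block_sizes: dict mapping block id -> size (any consistent unit)
    node_loads:  dict mapping node id -> load as a fraction in [0, 1)
    Returns a dict mapping node id -> list of block ids.
    """
    # Assumed weighting: spare capacity is taken to be 1 - load.
    weights = {node: 1.0 - load for node, load in node_loads.items()}
    if any(w <= 0 for w in weights.values()):
        raise ValueError("loads must be below 1.0")

    assigned = {node: [] for node in node_loads}
    totals = {node: 0 for node in node_loads}
    # Place the largest blocks first; each block goes to the node whose
    # weighted total would be smallest after receiving it.
    for block in sorted(block_sizes, key=block_sizes.get, reverse=True):
        size = block_sizes[block]
        node = min(weights, key=lambda k: (totals[k] + size) / weights[k])
        assigned[node].append(block)
        totals[node] += size
    return assigned
```

With three equally sized blocks and loads of 0.2 and 0.5, the less loaded node ends up with two blocks and the more loaded node with one.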
Optionally, FIG. 6 is a schematic structural diagram of another data reconstruction apparatus provided by an embodiment of the present disclosure. As shown in FIG. 6, on the basis of FIG. 5, the data reconstruction apparatus 50 may further include:
A first receiving module 503, configured to receive a storage application message sent by the first storage node, where the storage application message includes the total data amount of the m data blocks;
A third sending module 504, configured to send a storage instruction to the first storage node according to the storage application message, where the storage instruction instructs the first storage node to store the reconstructed m data blocks on a target disk, and the target disk is a disk in the first storage node whose available storage capacity is greater than or equal to the total data amount.
Optionally, no related data block is stored on the target disk, where a related data block is a data block that belongs to the same stripe as any one of the m data blocks.
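The target disk conditions (enough free capacity, and optionally no block from any stripe that the m failed blocks belong to) can be sketched as a simple filter. The function name `pick_target_disk` and the dictionary layout of each disk record are hypothetical, and picking the first qualifying disk is an assumption; the text does not say how to choose among several qualifying disks.

```python
def pick_target_disk(disks, total_bytes, failed_stripes):
    """Return the id of a disk that can hold the reconstructed blocks, or None.

    disks:          list of {"id": str, "free": int, "stripes": set of stripe ids}
    total_bytes:    total data amount of the m blocks, from the storage application message
    failed_stripes: set of stripe ids that the m failed blocks belong to
    """
    for disk in disks:
        has_room = disk["free"] >= total_bytes
        # Optional condition: the disk holds no block from an affected stripe,
        # so one later disk failure cannot take out two blocks of a stripe.
        no_related_block = not (disk["stripes"] & failed_stripes)
        if has_room and no_related_block:
            return disk["id"]
    return None
```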
Optionally, FIG. 7 is a schematic structural diagram of yet another data reconstruction apparatus provided by an embodiment of the present disclosure. As shown in FIG. 7, on the basis of FIG. 6, the data reconstruction apparatus 50 may further include:
A second receiving module 505, configured to receive storage information of each data block sent by the first storage node, where the storage information of each data block includes the identifier of the target disk and the identifier of that data block;
A fourth determining module 506, configured to determine the stripe information of each data block according to the identifier of that data block in its storage information;
A modifying module 507, configured to change, in the stripe information of each data block, the identifier of the disk on which that data block is located to the identifier of the target disk.
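The metadata update performed by modules 505 to 507 can be sketched as a lookup followed by an in-place change. The two tables used here (`block_index` mapping a block to its stripe, and `stripe_table` mapping a stripe to the disk of each of its blocks) are assumed data structures; the text does not describe how the MDS stores stripe information.

```python
def update_stripe_info(stripe_table, block_index, storage_infos):
    """Point each reconstructed block at its new disk in the stripe metadata.

    stripe_table:  dict stripe id -> {block id: disk id}
    block_index:   dict block id -> stripe id
    storage_infos: iterable of (block id, target disk id) pairs reported
                   by the first storage node
    """
    for block_id, target_disk in storage_infos:
        # Use the block identifier to find the stripe information.
        stripe_id = block_index[block_id]
        # Replace the failed disk's identifier with the target disk's.
        stripe_table[stripe_id][block_id] = target_disk
    return stripe_table
```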
Optionally, each of the plurality of storage nodes includes a storage disk and a cache disk, and each storage node has read permission on the storage disk and read and write permission on the cache disk. The n target storage nodes include a cache storage node, and the reconstruction instruction sent to the cache storage node instructs it to store the reconstructed data blocks on its cache disk. FIG. 8 is a schematic structural diagram of still another data reconstruction apparatus provided by an embodiment of the present disclosure. As shown in FIG. 8, on the basis of FIG. 5, the data reconstruction apparatus 50 may further include:
A third receiving module 508, configured to receive an acquisition completion message sent by the first storage node, where the acquisition completion message indicates that the first storage node has finished acquiring and storing the reconstructed m data blocks;
A fourth sending module 509, configured to send a deletion instruction to the cache storage node, where the deletion instruction instructs the cache storage node to delete the data blocks stored on its cache disk.
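The cleanup step can be sketched as a handler that turns an acquisition completion message into deletion instructions for the cache storage nodes. The message layout, the function name `on_acquisition_complete`, and the safety check that every cached block appears in the completion message are assumptions added for illustration; the text only says that deletion follows the completion message.

```python
def on_acquisition_complete(message, cached_blocks):
    """Build deletion instructions once the first storage node reports completion.

    message:       {"node": str, "stored_blocks": iterable of block ids}
    cached_blocks: dict cache node id -> list of block ids on its cache disk
    Returns a list of (cache node id, block ids to delete) pairs.
    """
    stored = set(message["stored_blocks"])
    deletions = []
    for cache_node, blocks in cached_blocks.items():
        # Assumed guard: never delete a cached block that has not been
        # confirmed as stored by the first storage node.
        if not set(blocks) <= stored:
            raise ValueError(f"{cache_node} still holds unacquired blocks")
        deletions.append((cache_node, list(blocks)))
    return deletions
```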
In summary, the embodiments of the present disclosure provide a data reconstruction apparatus. Upon detecting a failed disk in the first storage node, the first sending module sends reconstruction instructions to n target storage nodes so that the n target storage nodes reconstruct the data blocks on the failed disk, and the second sending module can instruct the first storage node to acquire the data blocks reconstructed by each target storage node. When the data reconstruction capability of the first storage node is weak, the n target storage nodes include storage nodes other than the first storage node, so those other storage nodes can help the first storage node reconstruct the data blocks. The first storage node therefore has less data to reconstruct, the data blocks stored on the failed disk are reconstructed faster, and the data reconstruction efficiency of the data storage system is improved.
An embodiment of the present disclosure provides a computer device on which a computer program runs; a processor in the computer device executes the computer program to implement the data reconstruction method described above. The MDS in the data storage system shown in FIG. 1 may include the computer device.
An embodiment of the present disclosure provides a storage medium on which a computer program is stored; a processor executes the computer program to implement the data reconstruction method described above.
An embodiment of the present disclosure provides a computer program product that, when run on a computer, causes the computer to perform the data reconstruction method described above.
It should be noted that the method embodiments and the corresponding apparatus embodiments provided by the present disclosure may be referred to each other, which is not limited by the embodiments of the present disclosure. The order of the steps of the method embodiments may be adjusted as appropriate, and steps may be added or removed as circumstances require. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present disclosure, and is therefore not described further.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely optional embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (18)

  1. A data reconstruction method, applied to a metadata management server (MDS) in a data storage system, the data storage system further comprising a Serial Attached Small Computer System Interface (SAS) switch and a plurality of storage nodes, the plurality of storage nodes being connected to one another through the SAS switch, the method comprising:
    upon detecting that a failed disk storing m data blocks has appeared in a first storage node, sending a reconstruction instruction to each of n target storage nodes among the plurality of storage nodes, wherein the first storage node is any one of the plurality of storage nodes, the n reconstruction instructions sent to the n target storage nodes are used to instruct reconstruction and storage of the m data blocks, the n target storage nodes include a storage node different from the first storage node, and m≥n≥1; and
    sending an acquisition instruction to the first storage node, wherein the acquisition instruction is used to instruct the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
  2. The method according to claim 1, wherein m≥n≥2, and before sending the reconstruction instruction to each of the n target storage nodes among the plurality of storage nodes, the method further comprises:
    determining n storage nodes with relatively small loads among the plurality of storage nodes as the n target storage nodes;
    or, determining n preset storage nodes among the plurality of storage nodes as the n target storage nodes;
    or, determining the n storage nodes with the shortest routing distance to the MDS among the plurality of storage nodes as the n target storage nodes.
  3. The method according to claim 1 or 2, wherein sending the reconstruction instruction to each of the n target storage nodes among the plurality of storage nodes comprises:
    determining a load of each target storage node;
    determining, according to the loads of the n target storage nodes, at least one data block corresponding to each target storage node, wherein the total data amount of all data blocks corresponding to a target storage node is negatively correlated with the load of that target storage node;
    generating a reconstruction instruction corresponding to each target storage node, wherein the reconstruction instruction corresponding to each target storage node is used to instruct reconstruction and storage of the data blocks corresponding to that target storage node; and
    sending to each target storage node its corresponding reconstruction instruction.
  4. The method according to claim 1, wherein after sending the reconstruction instruction to each of the n target storage nodes among the plurality of storage nodes, the method further comprises:
    receiving a storage application message sent by the first storage node, wherein the storage application message includes the total data amount of the m data blocks; and
    sending a storage instruction to the first storage node according to the storage application message, wherein the storage instruction is used to instruct storage of the reconstructed m data blocks on a target disk, and the target disk is a disk in the first storage node whose available storage capacity is greater than or equal to the total data amount.
  5. The method according to claim 4, wherein
    no related data block is stored on the target disk, a related data block being a data block that belongs to the same stripe as any one of the m data blocks.
  6. The method according to claim 4 or 5, wherein after sending the storage instruction to the first storage node, the method further comprises:
    receiving storage information of each data block sent by the first storage node, wherein the storage information of each data block includes the identifier of the target disk and the identifier of that data block;
    determining stripe information of each data block according to the identifier of that data block in its storage information; and
    changing, in the stripe information of each data block, the identifier of the disk on which that data block is located to the identifier of the target disk.
  7. The method according to claim 1, wherein each of the plurality of storage nodes comprises a storage disk and a cache disk, each storage node has read permission on the storage disk and read and write permission on the cache disk, the n target storage nodes include a cache storage node, the reconstruction instruction sent to the cache storage node is used to instruct storage of the reconstructed data blocks on the cache disk of the cache storage node, and after sending the acquisition instruction to the first storage node, the method further comprises:
    receiving an acquisition completion message sent by the first storage node, wherein the acquisition completion message is used to indicate that the first storage node has finished acquiring and storing the reconstructed m data blocks; and
    sending a deletion instruction to the cache storage node, wherein the deletion instruction is used to instruct the cache storage node to delete the data blocks stored on its cache disk.
  8. A data reconstruction apparatus, applied to a metadata management server (MDS) in a data storage system, the data storage system further comprising a Serial Attached Small Computer System Interface (SAS) switch and a plurality of storage nodes, the plurality of storage nodes being connected to one another through the SAS switch, the data reconstruction apparatus comprising:
    a first sending module, configured to, upon detecting that a failed disk storing m data blocks has appeared in a first storage node, send a reconstruction instruction to each of n target storage nodes among the plurality of storage nodes, wherein the first storage node is any one of the plurality of storage nodes, the n reconstruction instructions sent to the n target storage nodes are used to instruct reconstruction and storage of the m data blocks, the n target storage nodes include a storage node different from the first storage node, and m≥n≥1; and
    a second sending module, configured to send an acquisition instruction to the first storage node, wherein the acquisition instruction is used to instruct the first storage node to acquire and store the m data blocks reconstructed by the n target storage nodes.
  9. The data reconstruction apparatus according to claim 8, wherein m≥n≥2, and the data reconstruction apparatus further comprises:
    a first determining module, configured to determine n storage nodes with relatively small loads among the plurality of storage nodes as the n target storage nodes;
    or, a second determining module, configured to determine n preset storage nodes among the plurality of storage nodes as the n target storage nodes;
    or, a third determining module, configured to determine the n storage nodes with the shortest routing distance to the MDS among the plurality of storage nodes as the n target storage nodes.
  10. The data reconstruction apparatus according to claim 8 or 9, wherein the first sending module is configured to:
    determine a load of each target storage node;
    determine, according to the loads of the n target storage nodes, at least one data block corresponding to each target storage node, wherein the total data amount of all data blocks corresponding to a target storage node is negatively correlated with the load of that target storage node;
    generate a reconstruction instruction corresponding to each target storage node, wherein the reconstruction instruction corresponding to each target storage node is used to instruct reconstruction and storage of the data blocks corresponding to that target storage node; and
    send to each target storage node its corresponding reconstruction instruction.
  11. The data reconstruction apparatus according to claim 8, further comprising:
    a first receiving module, configured to receive a storage application message sent by the first storage node, wherein the storage application message includes the total data amount of the m data blocks; and
    a third sending module, configured to send a storage instruction to the first storage node according to the storage application message, wherein the storage instruction is used to instruct storage of the reconstructed m data blocks on a target disk, and the target disk is a disk in the first storage node whose available storage capacity is greater than or equal to the total data amount.
  12. The data reconstruction apparatus according to claim 11, wherein
    no related data block is stored on the target disk, a related data block being a data block that belongs to the same stripe as any one of the m data blocks.
  13. The data reconstruction apparatus according to claim 11 or 12, further comprising:
    a second receiving module, configured to receive storage information of each data block sent by the first storage node, wherein the storage information of each data block includes the identifier of the target disk and the identifier of that data block;
    a fourth determining module, configured to determine stripe information of each data block according to the identifier of that data block in its storage information; and
    a modifying module, configured to change, in the stripe information of each data block, the identifier of the disk on which that data block is located to the identifier of the target disk.
  14. The data reconstruction apparatus according to claim 8, wherein each of the plurality of storage nodes comprises a storage disk and a cache disk, each storage node has read permission on the storage disk and read and write permission on the cache disk, the n target storage nodes include a cache storage node, the reconstruction instruction sent to the cache storage node is used to instruct storage of the reconstructed data blocks on the cache disk of the cache storage node, and the data reconstruction apparatus further comprises:
    a third receiving module, configured to receive an acquisition completion message sent by the first storage node, wherein the acquisition completion message is used to indicate that the first storage node has finished acquiring and storing the reconstructed m data blocks; and
    a fourth sending module, configured to send a deletion instruction to the cache storage node, wherein the deletion instruction is used to instruct the cache storage node to delete the data blocks stored on its cache disk.
  15. A data storage system, comprising a metadata management server (MDS), a plurality of storage nodes, and a Serial Attached Small Computer System Interface (SAS) switch, wherein the plurality of storage nodes are connected to one another through the SAS switch, and the MDS comprises the data reconstruction apparatus according to any one of claims 8 to 14.
  16. A computer device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to execute the program stored in the memory to implement the data reconstruction method according to any one of claims 1 to 7.
  17. A data storage system, comprising a metadata management server (MDS), a plurality of storage nodes, and a Serial Attached Small Computer System Interface (SAS) switch, wherein the plurality of storage nodes are connected to one another through the SAS switch, and the MDS comprises the computer device according to claim 16.
  18. A computer-readable storage medium, storing a computer program that, when executed by a processor, implements the data reconstruction method according to any one of claims 1 to 7.
PCT/CN2018/108342 2017-09-29 2018-09-28 Data reconstruction method and apparatus, and data storage system WO2019062856A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710903893.4A CN109582213B (en) 2017-09-29 2017-09-29 Data reconstruction method and device and data storage system
CN201710903893.4 2017-09-29

Publications (1)

Publication Number Publication Date
WO2019062856A1 true WO2019062856A1 (en) 2019-04-04

Family

ID=65900908

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108342 WO2019062856A1 (en) 2017-09-29 2018-09-28 Data reconstruction method and apparatus, and data storage system

Country Status (2)

Country Link
CN (1) CN109582213B (en)
WO (1) WO2019062856A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579384B (en) * 2019-09-27 2023-07-04 杭州海康威视数字技术股份有限公司 Method, device and system for monitoring nodes of SAS domain and nodes
CN110989934B (en) * 2019-12-05 2023-08-25 达闼机器人股份有限公司 Block chain link point data storage method, block chain system and block chain node
CN112214447A (en) * 2020-10-10 2021-01-12 中科声龙科技发展(北京)有限公司 Dynamic reconstruction method, system and device for workload certification operation chip cluster data
CN113672174B (en) * 2021-08-03 2024-05-07 中移(杭州)信息技术有限公司 Data reconstruction method, device, storage medium and apparatus
CN114415970B (en) * 2022-03-25 2022-06-17 北京金山云网络技术有限公司 Disk fault processing method and device of distributed storage system and server

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101986276A (en) * 2010-10-21 2011-03-16 成都市华为赛门铁克科技有限公司 Methods and systems for storing and recovering files and server
CN104050250A (en) * 2011-12-31 2014-09-17 北京奇虎科技有限公司 Distributed key-value query method and query engine system
US20140317222A1 (en) * 2012-01-13 2014-10-23 Hui Li Data Storage Method, Device and Distributed Network Storage System
CN105335250A (en) * 2014-07-28 2016-02-17 浙江大华技术股份有限公司 Distributed file system-based data recovery method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100587692C (en) * 2007-01-26 2010-02-03 华中科技大学 Method and system for promoting metadata service reliability
CN101515296A (en) * 2009-03-06 2009-08-26 成都市华为赛门铁克科技有限公司 Data updating method and device
US9830240B2 (en) * 2015-05-14 2017-11-28 Cisco Technology, Inc. Smart storage recovery in a distributed storage system
CN106662983B (en) * 2015-12-31 2019-04-12 华为技术有限公司 The methods, devices and systems of data reconstruction in distributed memory system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400241A (en) * 2019-11-14 2020-07-10 杭州海康威视系统技术有限公司 Data reconstruction method and device
CN111400241B (en) * 2019-11-14 2024-04-05 杭州海康威视系统技术有限公司 Data reconstruction method and device
CN111124292A (en) * 2019-12-10 2020-05-08 新华三大数据技术有限公司 Data refreshing method and device, cache node and distributed storage system
CN111124292B (en) * 2019-12-10 2022-08-19 新华三大数据技术有限公司 Data refreshing method and device, cache node and distributed storage system

Also Published As

Publication number Publication date
CN109582213B (en) 2020-10-30
CN109582213A (en) 2019-04-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18861486; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18861486; Country of ref document: EP; Kind code of ref document: A1)